Herman Wong

Lewiston, ID

DevOps Engineer with 7+ years in IT, coming from the Ops side. I focus on eliminating manual work through infrastructure automation - turning multi-week manual efforts into repeatable, version-controlled pipelines you can run with a single click.

Most of my recent work has been Terraform and Ansible on AWS. I've built everything from multi-cloud VPN connectivity (AWS-to-Azure, dual tunnels with BGP) to fully automated Splunk HA cluster deployments that cut provisioning from 153 hours to under 5. I write CI/CD pipelines in GitLab CI and GitHub Actions, build hardened container images, and use Ansible Molecule for testing infrastructure code locally before it touches a cloud account.

I also integrate AI tooling (GitHub Copilot, Claude) into infrastructure workflows - not as a gimmick, but to speed up the write-test-fix loop on Ansible roles and Terraform modules.

Key Areas of Expertise:

  • Cloud Infrastructure: AWS (EC2, VPC, EKS, Transit Gateway, IAM, S3, RDS, KMS), Azure (VPN Gateway, Entra ID), multi-cloud connectivity
  • Infrastructure as Code: Terraform, Terragrunt, Ansible (roles, collections, Molecule testing), Packer AMI/container builds
  • CI/CD & Automation: GitLab CI, GitHub Actions, container image pipelines with Trivy scanning, automated deployment orchestration
  • Monitoring & Observability: Splunk HA clusters, Elasticsearch, Prometheus, Grafana, vRealize Operations
  • Scripting: Python, PowerShell, Bash, Golang (practical modifications to existing tools)
  • AI-Augmented Workflows: Claude Code, GitHub Copilot, MCP integrations (Firecrawl, Chrome DevTools), orchestrator-agent patterns for token-aware automation
Note:

This website is a static site hosted in an AWS S3 bucket behind a CloudFront CDN. Changes are made in VS Code, synced to GitHub, and then deployed to AWS via a simple CI/CD pipeline with GitHub Actions.


Experience

DevOps Engineer

Coalfire
  • Terraform Drift Detection Tool Enhancement (Golang): Re-architected an open-source drift detection tool (driftctl) to replace 107 service-specific API enumerators with a single AWS Config API call for resource discovery, eliminating API rate limiting issues and reducing maintenance overhead. Integrated the terraform-exec Golang library to run terraform init + terraform plan against all root modules, enabling true configuration drift detection rather than just unmanaged resource identification. Implemented categorization logic to filter false positives by identifying CloudFormation-managed resources, AWS-managed service-linked roles, and resources unsupported by AWS Config or Terraform.
  • Infrastructure Automation & Deployment: Built Terraform configs for cloud resource provisioning (EC2, Kinesis Firehose, ALB, Security Groups, SQS) and Ansible roles for configuration management. Splunk HA Cluster deployments went from 153 hours of manual work to under 5 hours, fully automated through standardized playbooks. SQS queues handle the S3-notification-based log ingestion path for AWS services that cannot write directly to CloudWatch (ALB access logs being a primary example); Kinesis Firehose handles the real-time streaming path.
  • Ansible Molecule Container-Based Testing: Set up Ansible Molecule with Docker containers for local role and collection testing, replacing slow cloud-based validation cycles where you'd wait for EC2 instances just to test a config change. Configured multi-node test scenarios simulating production topologies (Elasticsearch clusters, Kibana, Fleet Server) with systemd support and custom networking. Catches configuration errors locally before anything touches a cloud account.
  • AI-Assisted Development with Continuous Feedback Loop: Integrated GitHub Copilot Agent Mode with Molecule testing so the AI could run tests, read failures, and fix issues on its own in a loop. Created instruction files defining test commands, error patterns, and fix strategies so the AI handles the tedious Run → Analyze → Fix → Repeat cycle, which cut debugging time on complex Ansible roles significantly.
  • Multi-Cloud Architecture: Built Terraform configs for HA Site-to-Site VPN between AWS Transit Gateway and Azure VPN Gateway with dual tunnels and BGP routing. What was a 3-week manual process took 2 days with IaC. Failover works automatically. Diagnosed tunnel establishment failures by enabling VPN logging via Terraform and analyzing IKE negotiation errors - root cause was incompatible default encryption protocols between AWS and Azure, resolved by explicitly defining compatible Diffie-Hellman Group 24 cipher configurations on both platforms.
  • CI/CD Pipeline Development: Built GitLab CI/CD pipelines that orchestrate Terraform + Ansible workflows. Push a button and 45 minutes later you have Active Directory domains, PKI services, Palo Alto NGFW, and clustered applications (GitLab/Elasticsearch/Splunk) fully deployed.
  • Configuration Management Migration: Inherited an environment where all EC2 provisioning was done through ad-hoc User Data scripts — a fire-and-forget approach with no visibility into failures, no idempotency, and no orchestration between dependent systems. The Splunk cluster setup alone involved 9+ Python scripts and a custom module with hard-coded hostname logic that couldn't scale beyond the original node count. Windows Domain Controllers were configured via PowerShell/DSC scripts downloaded from S3 at runtime (a workaround for User Data size limits), with replica DC setup implemented as a 45-minute sleep followed by a domain join attempt. Ansible Tower itself was bootstrapped through shell scripts wrapping its own setup tooling, stitched together with jq for inventory configuration. Migrated this entire surface area to native Ansible roles: idempotent, version-controlled, and with real-time stdout feedback so failures surface immediately and in context rather than requiring engineers to SSH into individual nodes and dig through cloud-init logs. The improvement was most significant for clustered applications, where the original scripts had no mechanism to coordinate across nodes at all.
  • Cost Optimization: Set up AWS cost controls in Terraform: Cost Allocation Tags with enforced policies, Instance Scheduler to shut down resources off-hours, and cloud-nuke via GitHub Actions OIDC to wipe sandbox environments on schedule. Made it obvious who's spending what.
  • Process Automation: Wrote Selenium (Python) scripts to automate browser-based configurations for systems with no API. When there's no API, you either click through a web UI manually or you script a browser to do it for you.
  • Jira SLA Metrics Reporting Tool (Python, AI-Assisted): Manager needed a way to pull SLA violation and compliance data out of client Jira deployments. Used GitHub Copilot to write a ~1,300-line single-file Python script that queries Jira's REST API for SLA breaches, MTTR/MTTA by priority, ticket activity trends, and closure quality metrics. Kept it as one file deliberately so Tier 1-2 support engineers can copy/paste it into client environments without dealing with pip packages or project structure. Supports AWS Secrets Manager for credentials in production, with env var fallback for dev. Tested locally using Ansible Molecule to spin up a Jira instance in Docker containers rather than developing against production.
  • Packer AMI Templating with Jinja2: Built an Ansible Jinja2 templating tool that generates valid Packer HCL templates for multiple Linux distributions (AL2, AL2023, Ubuntu 20.04/22.04, RHEL8/9) and AMI purposes (base-os, docker, eks). Shared build steps (cloud-init wait, package updates, Python/Ansible install, LVM compliance volumes, STIG hardening) live in one place; distro-specific variations are handled by the template. GitHub Actions CI pipeline runs monthly sandbox builds with automated validation.
  • Cross-Platform Container Image Build Pipeline: Built a container build pipeline using go-task, building hardened images from IronBank ubi9-minimal base with Trivy scanning and OpenSCAP RHEL9 STIG checks. Runs on both GitHub Actions (external, with build validation and badges) and airgapped GitLab with ECR push for internal deployments. Makefile handles local builds for bootstrapping Docker executor images.
  • RegScale GRC Platform Infrastructure Deployment: Built the full cloud infrastructure for RegScale (Governance, Risk, and Compliance) in Terraform: EKS cluster with AWS-managed node groups for multi-environment workloads (dev/prod/default namespaces), IRSA roles for granular service account permissions (VPC-CNI, AWS Load Balancer Controller, External DNS). Backend was multi-AZ RDS SQL Server (Express for dev, Standard Edition for prod) with KMS encryption and automated backups, plus multi-AZ EFS for persistent storage. Networking was VPC with multi-tier subnets (public/private/database/intra), least-privilege security groups, and AWS Load Balancer Controller via Helm for ingress.
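The MTTR math in the Jira reporting tool above reduces to a small aggregation. A minimal sketch of that calculation (field names are illustrative, not the actual script's):

```python
from datetime import datetime

def mttr_by_priority(issues):
    """Group resolved issues by priority and average time-to-resolution in hours.

    `issues` is a list of dicts with ISO-8601 `created`/`resolved` timestamps,
    standing in for fields pulled from Jira's REST API.
    """
    buckets = {}
    for issue in issues:
        if not issue.get("resolved"):
            continue  # open tickets don't count toward MTTR
        created = datetime.fromisoformat(issue["created"])
        resolved = datetime.fromisoformat(issue["resolved"])
        hours = (resolved - created).total_seconds() / 3600
        buckets.setdefault(issue["priority"], []).append(hours)
    return {p: sum(h) / len(h) for p, h in buckets.items()}
```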
May 2022 - Jan 2026

VMware Systems Administrator

General Dynamics Information Technology - SPAWAR (Government Client)
  • VMware Horizon VDI Performance Optimization: Diagnosed VDI recompose operations taking 3-5 hours due to Horizon View Agent defaulting to KMS activation instead of AD-based activation. Implemented registry fix reducing recompose time to 30 minutes per pool (97% reduction), dramatically cutting maintenance windows and increasing VDI availability.
  • VDI Infrastructure Recovery: Resolved critical issue where VDI pools could not be managed or recomposed for 4 months. Diagnosed SQL database deployment group ID mismatch with ADAM database values on View Connection Server and corrected, restoring full pool management capability across production environment.
  • C2PC Legacy Software VDI Integration: Integrated C2PC (military Command and Control software by Northrop Grumman) with Horizon View VDI infrastructure it was never designed for. C2PC's SQL Server instances are hostname-based, causing the application to spam critical alerts and become unresponsive after VDI pool recompose operations changed clone hostnames. Wrote a PowerShell script to connect to SQL Server, drop the stale database, create a new database, and run the vendor's table-recreation script; registered it as a Horizon View post-synchronization script so it ran automatically after every pool publish. Successfully tested in MOC lab and deployed across all 6 production MOCs, resolving recurring C2PC failures after pool recompose.
  • VxRail Upgrade Troubleshooting & PKI Certificate Remediation: When VxRail upgrades stalled for one month across 6 production enclaves despite vendor support, volunteered to investigate. Identified two configuration issues: new SubCA certificate not added to VxRail Manager trust store after CA rotation, and missing /etc/hosts entries for ESXi hosts. Located documentation on vendor portal that team members couldn't access. Successfully upgraded 5 of 6 production enclaves within one week after diagnosis. Also diagnosed a separate incident where serious problems in new systems were caused by incorrectly issued Subordinate CA (PKI/SSL) certificates; worked with government SME to reissue correctly configured certificates and reinstall them across all affected infrastructure components (vCenter, VxRail) to prevent service impacts.
  • Log4J Vulnerability Remediation: Assessed and executed patching procedures for vROPs, Horizon View, and vCenter to address Log4J vulnerability across production environments. Resolved legacy View Security Server blocking issue requiring Flash-based pairing password reset; adapted unsupported PowerCLI script as workaround. Trained team on VMware stack patching procedures.
  • Unified Access Gateway HA Architecture: Researched and proposed a 2x VCS (View Connection Server) + 4x UAG architecture replacing the original 4x VCS setup. Configured load-balanced UAG appliances so VDI sessions stay active during Connection Server restarts, meaning users no longer get kicked during monthly patching.
  • Infrastructure Monitoring Stack: Production environments had no centralized visibility — troubleshooting meant logging into individual systems to check logs. Deployed vRealize Operations Manager with vCenter SSO integration and DISA Compliance Pack for STIG monitoring. Integrated vRealize Log Insight with Content Packs for AD, Exchange, SQL, Windows, Horizon View, and Linux, all funneled through vROps so you have one place to look.
  • Internal Documentation Platform (DokuWiki): There was no internal documentation across production enclaves. Confluence was available but leadership wanted an air-gap-friendly alternative they controlled. Deployed DokuWiki on a Synology NAS with plugins for backup, syntax highlighting, discussions, and tagging. Set up namespace hierarchy covering infrastructure domains and operational areas. Ended up being the primary author, writing runbooks for vROPs/vLOG SSO, VxRail troubleshooting, and Horizon View administration.
  • Windows Failover Clustering: Monthly patching took file servers offline, disrupting user access across production enclaves. Set up native clustered VMDK on vSAN with Windows Failover Clustering and anti-affinity rules to keep file servers on separate hosts, giving users zero-downtime file access during patching. Wrote and tested a PowerShell script to automate cluster creation and configuration.
  • PowerShell DSC Patch Orchestration: Wrote PowerShell DSC configurations for orchestrated patching across 6 production enclaves (separate Windows domain environments). Primary VMs in each role get patched first, secondaries wait until primary completes. DSC Pull Server with a config data file to toggle orchestration on/off. Enabled fully patching all 6 enclaves in a single day.
  • PowerShell DSC Software Inventory via vRealize Log Insight: Wrote a custom PowerShell DSC class-based resource that reads the uninstall registry hive to determine installed software names and versions, then writes custom Events for vRealize Log Insight to consume. Provided automated inventory of all software and versions across 100+ VMs per enclave without requiring agents or third-party tools.
  • Office 2019 Air-Gapped Distribution: Air-gapped downstream servers had no path to Microsoft's update infrastructure, leaving Office 2019 unpatched across the fleet — a compliance gap surfaced during reaccreditation. Wrote two PowerShell scripts: the first pulled the latest Office 2019 installer files from Microsoft's CDN via the Office Deployment Tool onto an internet-connected WSUS server; the second synced only changed files to air-gapped downstream servers using SHA hash comparison to skip unchanged files. Deployed to four production environments as the ongoing patch delivery mechanism for Office.
  • Adobe Flash Killswitch — Proactive Discovery and Fleet-Wide Remediation: Production Horizon View admin portals were going to become inaccessible on January 12, 2021 due to a time-based killswitch built into Adobe Flash — an obscure end-of-life mechanism most teams weren't aware of. Discovered the issue proactively by manually advancing the system date on a management VM to February 2021 and confirming the Horizon portal became unreachable. Found and documented an mms.cfg allowlist configuration that bypasses the killswitch. Led the contractor team in deploying the fix across all production enclaves before the cutoff date, preventing a maintenance window that would have blindsided operations.
  • Domain Controller Outage Diagnosis and Recovery: Domain controllers in a production enclave dropped to Public network profiles after NLA misclassified their NICs, causing domain authentication failures across VMs and VDI pools. After a reboot to fix the NLA issue, logins stopped working entirely. Event Log analysis showed SYSVOL replication had been silently failing because the C: drive on the replica domain controller was full — no SYSVOL replication meant domain auth was broken. Cleared C:\Windows\Temp by mapping C$ from the healthy primary domain controller, rebooted the replica, and restored full domain authentication.
  • SubCA Certificate / NDES Misconfiguration Root Cause Analysis: NDES web server configuration was failing with generic 500 errors across three new production enclaves despite identical configurations — months of work had stalled with no progress. Volunteered to investigate and traced the 500 errors to DNS resolution failures, which exposed a deeper structural issue: the Subordinate CA certificate had incorrect CRL/AIA distribution point URLs, causing all enclave assets to reference an airgapped legacy enclave domain controller for Certificate Revocation checks — a single point of failure that crossed environment boundaries. Worked with the architect and team lead to reissue the SubCA certificates with correct CRL/AIA locations. This unblocked NDES deployment and eliminated the cross-environment CRL dependency.
  • Certificate Trust List Sync for Air-Gapped Networks: Air-gapped downstream WSUS servers had no mechanism to receive updated Windows Certificate Trust Lists — a gap surfaced as a reaccreditation finding. Wrote two PowerShell scripts: the first used certutil -SyncWithWU on the internet-connected WSUS to pull the latest CTL; the second downloaded it to air-gapped downstream servers. Added a custom GPO to redirect the domain CTL URL to the local WSUS path so endpoints pulled from the internal source. Deployed across all production enclaves and closed the reaccreditation finding.
  • Windows 10 WSUS GPO Regression After 1909 Upgrade: All Windows 10 VMs across all production enclaves stopped receiving WSUS-targeted patches after being upgraded past Windows 10 1909 — they were silently ignoring existing Windows Update GPOs. A secondary issue was that Dual-Scan behavior was causing VMs to attempt outbound connections to Microsoft's public Windows Update servers, generating unnecessary firewall noise in an air-gapped environment. Researched Microsoft's whitepaper on GPO changes introduced in 1909+, identified the new required policy settings, tested in lab, and documented the changes for the team. Also added settings to disable Dual-Scan, eliminating the firewall traffic.
  • vRealize Log Insight Appliance Recovery from Filesystem Corruption: A production vRealize Log Insight appliance became unbootable with no obvious cause. Analyzed journalctl boot logs, identified inode corruption on the core data volume (/dev/mapper/data-core), and ran fsck to repair it. Appliance recovered and booted successfully without data loss or a rebuild.
  • DISM Air-Gapped OS Repair Procedure: DOD air-gapped environments had no documented method to repair Windows component store corruption short of rebuilding the VM — the standard DISM repair source is Windows Update, which isn't reachable. Researched and validated an alternative: using the WinSxS folder from a healthy VM running the same OS version at equal or higher patch level as an offline repair source. Tested the method in lab and documented it as a viable repair option for engineers facing component store corruption in air-gapped environments.
  • PowerShell Pester Infrastructure Health Test Suite: No standardized validation existed for verifying the health of critical environment infrastructure before and after patching cycles. Wrote a PowerShell Pester test suite covering Active Directory, Exchange DAG, File Server Failover Cluster, Horizon View, Admin Server, and C2PCGW. Tests gave engineers an automated health baseline instead of manual spot-checking. Formally handed off the suite during offboarding alongside the DSC orchestration scripts, vROPs/vLOG configurations, and WSUS tooling.
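The hash-comparison delta sync from the Office distribution work above is language-neutral logic. A sketch in Python rather than the original PowerShell (function names illustrative):

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large installers don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_changed(src: Path, dst: Path) -> list[str]:
    """Copy only files whose SHA-256 differs from (or is missing on) the target."""
    copied = []
    for src_file in src.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        if dst_file.exists() and sha256(dst_file) == sha256(src_file):
            continue  # unchanged: skip the transfer
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(str(rel))
    return copied
```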
April 2021 - April 2022

Systems Administrator

Zachary Piper Solutions - SPAWAR (Government Client)
  • WSUS Performance Optimization: Applied WSUS metadata cleanup and Microsoft-recommended optimizations across 14 production enclaves, getting 20-30x improvement in patch deployment speed. Fixed chronic delta import failures on air-gapped networks that previously required importing a full OVF of the upstream WSUS server. Monthly patch export/import went from 100+ hours to 8 hours.
  • WSUS Infrastructure & Storage: Diagnosed and corrected VMFS deployment issues where thin-provisioned disks consumed maximum allocated space due to improper datastore initialization. Used vmkfstools to zero datastores and reclaim disk space, recovering 200-300GB. Converted thick-provisioned WSUS servers to thin-provisioned to resolve storage constraints. Reconfigured lab WSUS to pull patches from DISA WSUS servers over SSL, replacing unsecured connections to Microsoft update servers.
  • Exchange DAG Maintenance Automation: Wrote PowerShell scripts to start and stop DAG (Database Availability Group) maintenance mode for Microsoft Exchange mailbox servers, including checks to ensure all relevant services were properly started and stopped prior to major Cumulative Update upgrades. Prevents email loss by gracefully failing over nodes prior to patching.
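The pre-flight gating in the DAG maintenance scripts amounts to checking observed service states against required ones before proceeding. A rough Python sketch of that logic (the real scripts were PowerShell; the service names here are examples):

```python
def safe_to_patch(service_states: dict[str, str],
                  required_stopped: set[str],
                  required_running: set[str]) -> tuple[bool, list[str]]:
    """Pre-flight check before entering maintenance mode.

    Every service that must be stopped or running is verified against the
    observed state map; any mismatch blocks the upgrade with a reason.
    """
    problems = []
    for svc in sorted(required_stopped):
        if service_states.get(svc) != "Stopped":
            problems.append(f"{svc} should be Stopped, is {service_states.get(svc, 'missing')}")
    for svc in sorted(required_running):
        if service_states.get(svc) != "Running":
            problems.append(f"{svc} should be Running, is {service_states.get(svc, 'missing')}")
    return (not problems, problems)
```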
October 2020 - April 2021

PC Technician

Abbtech - US Army Corps of Engineers IT
  • Enterprise System Health Monitoring: Wrote multi-threaded PowerShell tools using runspaces to scan 600+ computers in parallel, checking SCCM client health (installation, registry, WMI namespace), Bitlocker encryption status with Active Directory recovery key validation, Avamar backup agent, Windows Update compliance via registry UBR version tracking, GPO freshness (stale registry.pol blocks SCCM updates), TPM/Secure Boot, and Bomgar agent components. System health checks went from days to hours. Built during COVID downtime when ticket volume was minimal.
  • WPF Diagnostic Dashboard: Reverse-engineered an existing VBScript diagnostic tool (basically end-of-life language) that was natively single-threaded and painfully slow. Rebuilt the whole thing in PowerShell as a WPF GUI, using runspace pools for parallel CIM queries with DCOM protocol (since WinRM wasn't enabled), making it about 3-5x faster. Shows real-time status of 50+ system properties: SCCM health, Bitlocker status, McAfee HBSS components (Engine version, DAT version/date, EPO server, HIPS, DLP), Windows services, VPN detection, and network config.
  • BITS-Based Software Deployment: Needed to deploy 20-30GB CAD software (AutoCAD, Revit) to remote workers over VPN during COVID. Noticed SCCM uses BITS for fault-tolerant transfers, so I wrote a PowerShell script that does the same thing - asynchronous low-priority background transfers that survive VPN disconnections and resume automatically. Users didn't need to bring laptops on-site; installations completed fully remotely via Bomgar while they worked normally. My peers were doing File Explorer copies that failed after 10+ hours when users disconnected.
  • SCCM Enterprise Incident Investigation: Assisted SCCM Administrators in diagnosing an enterprise-wide issue where SCCM clients were uninstalling themselves across hundreds of machines, causing operational impacts (inability to deliver OS patches and software). Traced Event Logs to identify that a recently enabled SCCM add-on was the root cause, enabling the team to resolve the issue and restore patch delivery capability.
  • Microsoft Office Mass Activation Failure: Diagnosed and resolved a work-stopping incident where hundreds of users had Microsoft Office deactivated and unable to open. Root cause analysis identified a misconfigured GPO (missing DNS suffix SearchList entry) as the culprit. Provided the Tier 3 Active Directory management team with the specific GPO configuration changes needed to resolve the issue enterprise-wide.
  • Remote SCCM Patch Orchestration: Wrote a PowerShell tool that uses WMI remote registry to write SCCM reboot notification registry values and restart ccmexec on target computers, triggering Software Center restart prompts without mandatory enforcement. Users restart at their convenience, no phone calls or office visits. I resolved patching tickets entirely remotely while peers were asking users to drive to the office.
  • Bitlocker Compliance Auditing: Built automated Bitlocker compliance reporting that queries Active Directory computer objects for msFVE-RecoveryInformation to find systems missing AD-stored recovery keys. Domain-wide scanner compares on-device recovery passwords with AD records to catch sync failures that need re-escrow.
  • System Remediation Tooling: Wrote PowerShell tools for common break/fix scenarios: Windows Update Agent reset (stop BITS/WUA services, clear SoftwareDistribution folder, re-register 30+ DLLs per Microsoft docs) and WMI repository repair (adapted Microsoft's script to filter autorecover/deleteinstance/deleteclass MOF files that broke on Windows 10+). Fixed machines remotely instead of reimaging.
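The runspace fan-out pattern behind the fleet-scanning tools above maps directly onto a thread pool. A minimal Python sketch with a stubbed per-host probe (the real checks queried SCCM, BitLocker, and backup agents over the network):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_host(hostname: str) -> dict:
    """Stand-in for the real per-host probes; fakes a result so the
    fan-out pattern is runnable without a domain to scan."""
    return {"host": hostname, "healthy": not hostname.endswith("13")}

def scan_fleet(hostnames: list[str], workers: int = 32) -> list[dict]:
    """Fan the health check out across a thread pool, mirroring how
    PowerShell runspace pools parallelize per-machine queries."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(check_host, h): h for h in hostnames}
        for fut in as_completed(futures):
            results.append(fut.result())
    return sorted(results, key=lambda r: r["host"])
```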
April 2019 - September 2020

Helpdesk Technician

Hawaii Tech Support
  • Network Discovery & Asset Documentation (Auvik): Deployed Auvik network monitoring across MSP client environments, discovered undocumented switches, firewalls, and servers still running factory-default credentials. Configured WinRM, SNMP, and SSH integration for device profiling. Documented all network assets, changed default credentials on discovered iDRAC/iLO and SonicWall devices, and built baseline inventories for environments that had none.
  • OS Deployment Automation (MDT/WDS): Technicians were spending hours babysitting individual Windows 10 installs. Set up MDT/WDS imaging server with VLAN isolation for PXE boot automation. New computers go from bare metal to fully imaged with standard software (RMM, Webroot, Office) in 30 minutes instead of 3+ hours of hands-on work.
  • Out-of-Band Server Management: Configured iDRAC/iLO access for client servers to enable remote power operations after power outages. Documented credentials and created procedures for other technicians, eliminating site visits previously required to physically power on servers when UPS-protected network equipment recovered but servers remained offline.
  • Remote Endpoint Remediation: Figured out how to install RMM agents remotely using Webroot shell command execution and domain-joined psexec for computers that Auvik discovered but had no management software. No more client site visits or asking users to bring laptops in just to install an agent.
  • Azure Recovery Services Deployment: Set up Azure Recovery Services for a non-profit client who had Azure credits sitting unused, gave them free direct-to-cloud offsite backups instead of relying solely on on-premises backup infrastructure.
  • Active Directory Lifecycle Automation: Set up PowerShell + Task Scheduler to automatically prune stale Active Directory computer and user accounts. Keeps the directory clean without someone manually auditing it.
  • Tableau Service Startup Automation: Resolved recurring issue where Tableau Server services failed to start automatically after monthly patching-related server restarts. Worked with Tableau vendor support to identify required startup command sequence, then implemented PowerShell + Task Scheduler automation to run the commands before and after patching. Tableau support noted the issue for future software improvement.
  • SQL Server Backup Automation: Configured SQL Server Agent to automate cleanup of SQL Server backups, preventing storage exhaustion from accumulated backup files.
  • Client Communication Automation: Utilized Microsoft Flows (Power Automate) to automate regular email notices to clients, reducing manual follow-up overhead.
  • Email Deliverability Configuration: Client emails kept getting flagged as spam. Set up SPF/DKIM/DMARC records in their DNS to fix email authentication.
February 2018 - February 2019

Projects

Elastic Cloud on Kubernetes on AWS EKS

  • Deployed EKS v1.33 cluster via Terragrunt with Spot instances (m5.xlarge family) across 2 AZs for 60-80% compute savings. IRSA roles for AWS Load Balancer Controller, External DNS, and EBS CSI Driver so there are zero static AWS credentials on the cluster.
  • ArgoCD manages the entire stack through an App of Apps pattern with 14 child Applications across 7 sync waves. Kustomize base/overlay structure handles differences between local Rancher Desktop K3s and AWS EKS without duplicating manifests. Automated self-heal and prune keep the cluster in sync with Git.
  • Full Elastic stack (Elasticsearch 8.17.4, Kibana, Fleet Server, Elastic Agent) deployed as ECK Operator custom resources. 3-node Elasticsearch cluster on encrypted gp3 EBS volumes. Elastic Agent runs as a DaemonSet collecting system and Kubernetes metrics from every node.
  • Keycloak OIDC SSO across Elasticsearch, Kibana, ArgoCD, and Kiali. Solved an EKS hairpin NAT issue by splitting OIDC front-channel calls (public ALB URL) from back-channel token/JWKS calls (internal K8s DNS). Realm bootstrap and Elasticsearch role mappings handled by ArgoCD PostSync Jobs.
  • Istio service mesh in sidecar mode on the Elastic namespace for L7 traffic observability through Kiali and Prometheus. 15 NetworkPolicies across 3 namespaces implementing default-deny-ingress with selective allow rules.
  • AWS ALB with ACM wildcard TLS termination and CIDR-restricted access. External DNS auto-creates Route 53 records from Ingress annotations. cert-manager with a self-signed CA issues internal TLS certs for all internal-only components.
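The sync-wave ordering above is, at heart, a stable sort on one annotation. A minimal sketch of how ArgoCD orders Applications by the standard argocd.argoproj.io/sync-wave annotation, defaulting to wave 0 (application names illustrative):

```python
def order_by_sync_wave(apps: list[dict]) -> list[str]:
    """Order Applications the way ArgoCD sync waves do: lower wave first,
    name as a tie-breaker. Wave comes from the sync-wave annotation."""
    def wave(app: dict) -> int:
        annotations = app.get("metadata", {}).get("annotations", {})
        return int(annotations.get("argocd.argoproj.io/sync-wave", "0"))

    ordered = sorted(apps, key=lambda a: (wave(a), a["metadata"]["name"]))
    return [a["metadata"]["name"] for a in ordered]
```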
February 2026

AI Job Search Automation with Orchestrator-Agent Pattern

  • Built an AI-assisted job search system using GitHub Copilot prompt files with Claude models. Uses an orchestrator + agent pattern where orchestrators coordinate isolated subagents so each one gets clean context instead of drowning in accumulated job description tokens.
  • Searches multiple platforms (Greenhouse, Lever, Ashby, Workday, Hiring Cafe) using Firecrawl MCP for scraping and Chrome DevTools MCP for browser automation. Scoring framework ranks positions 0-10 with boosters, penalties, and disqualifiers.
  • Resume tailoring pipeline uses Claude Code slash commands to extract keywords from job postings, match against a full CV, and generate DOCX resumes and cover letters via Python docx library. Only uses skills actually listed in the CV, no fabrication.
  • Config is modular: YAML for inclusions/exclusions, markdown for scoring rules, CSV for company monitoring targets. Easy to swap out for different roles or tech stacks.
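The scoring framework boils down to additive boosters and penalties with hard disqualifiers. A simplified sketch of that shape (keyword tables and weights are illustrative, not the actual rules):

```python
def score_job(job: dict, boosters: dict[str, float],
              penalties: dict[str, float], disqualifiers: set[str]) -> float:
    """Score a posting 0-10. Any disqualifying phrase zeroes it outright;
    otherwise boosters add and penalties subtract from a neutral baseline,
    clamped to the 0-10 range."""
    text = job["description"].lower()
    if any(term in text for term in disqualifiers):
        return 0.0
    score = 5.0
    score += sum(v for term, v in boosters.items() if term in text)
    score -= sum(v for term, v in penalties.items() if term in text)
    return max(0.0, min(10.0, score))
```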
February 2026

Multi-Distribution Container Build Pipeline

  • CI/CD pipeline that builds 20 container images across 4 Linux distros (Alpine, UBI9, Ubuntu, Amazon Linux 2023) and 5 DevOps tools (Ansible, Terraform, Packer, Python, Golang) using GitHub Actions matrix strategy for parallel builds.
  • Trivy scanning for CVEs, secrets, and misconfigs on every build. Generates SPDX SBOMs for supply chain visibility.
  • All images run as non-root with distro-specific optimizations. Shared requirements files so you're not duplicating package lists across 20 Dockerfiles.
  • Makefile handles local builds on Apple Silicon (ARM64), so you can build, test, scan, and shell into containers without waiting for CI.
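The 4 × 5 matrix expansion above mirrors what a GitHub Actions matrix strategy fans out into parallel jobs. A sketch of the fan-out (the image tag format is illustrative):

```python
from itertools import product

DISTROS = ["alpine", "ubi9", "ubuntu", "al2023"]
TOOLS = ["ansible", "terraform", "packer", "python", "golang"]

def build_matrix(distros=DISTROS, tools=TOOLS) -> list[str]:
    """Expand the distro x tool matrix into one image tag per combination,
    the same shape a CI matrix strategy turns into parallel build jobs."""
    return [f"{tool}-{distro}" for distro, tool in product(distros, tools)]
```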
February 2026

Apache Kafka Cluster with AI-Assisted Development Workflow

  • Ansible collection that deploys an 8-node Apache Kafka cluster (3 ZooKeeper, 3 Kafka brokers, Root CA for TLS certs, Kafka UI dashboard). Tested entirely in Ansible Molecule containers, no cloud costs for iteration.
  • Found that GitHub Copilot Agent Mode's autonomous Run → Analyze → Fix loops consume one premium request regardless of iteration count. Complex debugging sessions that would normally burn through quota cost virtually nothing.
January 2026

Container-based Code Testing with Ansible Molecule

  • Set up Ansible Molecule for container-based infrastructure testing that deploys a 2-node monitoring stack (Grafana/Prometheus server + PostgreSQL node) in under 5 minutes locally.
  • Verifies application files, services, and API endpoints are working correctly after deployment. Full test cycle runs faster than waiting for a single EC2 instance to boot.
  • GitHub Actions CI runs the same tests across multiple Linux distros in parallel.
July 2025
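The post-deployment checks above run in Molecule's verify stage. A hypothetical `verify.yml` sketch (the group name and exact service/endpoint checks are illustrative; Grafana's `/api/health` endpoint is its standard health check):

```yaml
# Hypothetical Molecule verify.yml: assert services and endpoints after converge.
- name: Verify monitoring stack
  hosts: monitoring          # group name is illustrative
  tasks:
    - name: Grafana health endpoint answers
      ansible.builtin.uri:
        url: http://localhost:3000/api/health
        status_code: 200
    - name: Gather service state
      ansible.builtin.service_facts:
    - name: Prometheus service is running
      ansible.builtin.assert:
        that:
          - ansible_facts.services['prometheus.service'].state == 'running'
```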

Model Context Protocol (MCP) Server for Infrastructure as Code

  • Built a Python MCP server that gives AI tools persistent memory for Terraform and Ansible resources: it caches resource state and tracks versions so the AI doesn't lose context between sessions.
  • Includes an XML-based task tracking system so the AI can pick up where it left off across infrastructure deployment iterations instead of starting from scratch.
  • Open-sourced with tests, documentation, and integration examples.
February 2025
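The core of the persistent-memory idea is a version-tracking cache that survives between sessions. A minimal sketch, assuming a JSON file on disk (class and field names are hypothetical, not the server's actual API):

```python
import json
from pathlib import Path

# Hypothetical sketch of the persistent resource cache: each Terraform/Ansible
# resource keeps a version history, so context survives between AI sessions.
class ResourceCache:
    def __init__(self, path="cache.json"):
        self.path = Path(path)
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def record(self, name, state):
        """Append a new version of a resource's state."""
        self.store.setdefault(name, {"versions": []})["versions"].append(state)

    def latest(self, name):
        """Return the most recent version, or None if unknown."""
        versions = self.store.get(name, {}).get("versions", [])
        return versions[-1] if versions else None

    def save(self):
        self.path.write_text(json.dumps(self.store, indent=2))

cache = ResourceCache()
cache.record("aws_vpc.main", {"cidr_block": "10.0.0.0/16"})
cache.record("aws_vpc.main", {"cidr_block": "10.1.0.0/16"})
print(cache.latest("aws_vpc.main"))  # the most recent version wins
```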

Terragrunt and Ansible AWS Cloud Lab

  • Personal lab integrating Terraform and Ansible for AWS infrastructure: Terraform provisions resources and Ansible configures them across dev/staging/prod environments.
  • Terragrunt wrapper eliminates code duplication between environments. Same module, different tfvars.
  • Reusable modules for common patterns (VPC, EC2, security groups) so spinning up a new environment is just a new Terragrunt config.
July 2022
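"Same module, different tfvars" looks like this in Terragrunt: each environment directory holds a small config that points at the shared module and supplies its own inputs. A hypothetical `dev/vpc/terragrunt.hcl` (paths and input names are illustrative):

```hcl
# Hypothetical per-environment terragrunt.hcl: same module, different inputs.
include "root" {
  path = find_in_parent_folders()   # pulls in shared remote state / provider config
}

terraform {
  source = "../../../modules/vpc"   # one reusable module for every environment
}

inputs = {
  environment = "dev"
  cidr_block  = "10.10.0.0/16"
}
```

Spinning up staging is then just a sibling directory with a different `inputs` block.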

AWS Cloud Resume Challenge

  • Built a static resume site on AWS: S3 for hosting, CloudFront for HTTPS delivery with a managed cache policy, ACM for TLS, and Route 53 for DNS. Root domain redirects to www via S3 website redirect.
  • Visitor counter built with API Gateway, Lambda (Python), and DynamoDB. Lambda uses an atomic DynamoDB UpdateItem so the count is accurate even with concurrent hits. API Gateway has rate limiting configured to avoid surprise costs.
  • GitHub Actions CI/CD pipeline syncs the repo to S3 and invalidates the CloudFront cache on every push to main. Originally used IAM access keys; later migrated to GitHub OIDC so no long-lived credentials are stored in GitHub secrets.
  • Infrastructure migrated from AWS SAM/CloudFormation to Terraform in 2025. Existing S3, CloudFront, ACM, and Route 53 resources were imported into state. The SAM counter stack was replaced with new Terraform-managed resources and the old CloudFormation stacks deleted. State stored in S3 with native locking.
March 2022
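The atomic counter works because DynamoDB's `ADD` update expression increments server-side in a single `UpdateItem` call, so concurrent hits never read-modify-write a stale count. A sketch of the request the Lambda builds (table and key names are illustrative, not the live resources):

```python
# Hypothetical sketch of the atomic visitor counter: one UpdateItem with an
# ADD expression increments server-side, safe under concurrent requests.
def build_update_request(table="visitor-counts", page="resume"):
    return {
        "TableName": table,
        "Key": {"page": {"S": page}},
        "UpdateExpression": "ADD hits :inc",              # atomic increment
        "ExpressionAttributeValues": {":inc": {"N": "1"}},
        "ReturnValues": "UPDATED_NEW",                    # hand back the new count
    }

# In the real handler this dict is passed straight to the DynamoDB client:
#   boto3.client("dynamodb").update_item(**build_update_request())
print(build_update_request()["UpdateExpression"])  # → ADD hits :inc
```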

Skills

Operating Systems & Environments

Cloud & DevOps
  • AWS Services: EC2, S3, VPC, DynamoDB, RDS, Route 53, IAM, SSM, EKS, KMS, ASG, ELB, Kinesis Firehose, SQS, Transit Gateway, CloudFormation, Lambda, CloudWatch
  • Azure Services: VPN Gateway, Site-to-Site VPN, Entra ID, Virtual Networks, Resource Manager, Azure Monitor, Azure Policy, Azure Arc
  • Multi-Cloud Integration: Cross-cloud connectivity (AWS-to-Azure VPN), hybrid architecture
  • CI/CD Platforms: GitLab CI, GitHub Actions, Git workflows, ArgoCD for GitOps
  • Container Orchestration: Docker, Kubernetes, AWS EKS, IRSA roles for service accounts, External DNS, AWS Load Balancer Controller, Helm package management
  • EKS Components: VPC-CNI networking, CoreDNS, kube-proxy, OIDC provider integration, multi-namespace isolation, EKS add-ons management
  • Persistent Storage: EFS integration with Kubernetes, multi-AZ mount targets, CSI drivers, StatefulSets
  • IaC Tools: Terraform, Terragrunt, AWS CloudFormation, Packer for image automation
  • Configuration Management: Ansible playbooks, roles, and collections for enterprise deployments
  • Infrastructure Testing: Ansible Molecule for container-based testing, automated validation pipelines
  • Automation Frameworks: PowerShell DSC, Selenium for web automation, Python scripting
  • Monitoring & Observability: Splunk HA Cluster, Elasticsearch, Prometheus, Grafana, CloudWatch, RDS Performance Insights

Infrastructure
  • Windows Core Services: Active Directory, GPO, DNS, PKI, PowerShell, Exchange DAG, WSUS, MDT/WDS, BITS
  • Windows High Availability: Windows Failover Clustering, File Server Clusters
  • VMware Platform: vCenter, vSphere, VxRail, vSAN, Nutanix
  • VMware VDI: Horizon View, Unified Access Gateway (UAG), App Volumes
  • VMware Monitoring: vRealize Operations Manager, vRealize Log Insight
  • VMware Automation: PowerCLI
  • Linux: Bash, Cron, LVM, Systemd
  • Server Management: WinRM, SNMP, iDRAC/iLO
  • Email Security: SPF, DKIM, DMARC

AI-Assisted Development
  • AI Tools & Integration: Claude API, Claude Code CLI, GitHub Copilot, Python Anthropic SDK
  • Model Context Protocol (MCP): Firecrawl MCP for web scraping, Chrome DevTools MCP for browser automation, built custom MCP servers
  • AI Architecture Patterns: Orchestrator-agent pattern for breaking up token-heavy workloads; each subagent gets a clean context instead of accumulating noise
  • Practical Usage: AI pair programming for Terraform and Ansible, prompt engineering for structured output, building repeatable AI workflows with the same rigor as infrastructure code (version-controlled configs, modular design, testable components)
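The orchestrator-agent pattern above can be sketched in a few lines: the orchestrator splits the job and hands each subagent only the facts it needs, never the full accumulated transcript. A minimal illustration with a stub in place of a real agent call (all names are hypothetical):

```python
# Hypothetical sketch of the orchestrator-agent pattern: each subagent gets a
# clean, minimal context instead of the whole conversation history.
def run_subagent(task, context):
    """Stand-in for a real agent call (e.g. via the Anthropic SDK)."""
    return f"done: {task} ({len(context)} context items)"

def orchestrate(tasks, shared_facts):
    results = []
    for task in tasks:
        # Filter the shared facts down to what this subtask actually needs.
        context = [f for f in shared_facts if f["for"] == task]
        results.append(run_subagent(task, context))
    return results

print(orchestrate(
    ["write-role", "write-tests"],
    [{"for": "write-role", "fact": "target distro is UBI9"}],
))
```

The token saving comes from the filter step: context grows with the subtask, not with the session.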

Certifications


Education

Kapiolani Community College

Associate of Applied Science
Information Technology
December 2017