Production infrastructure platform for a growing SaaS product — Terraform and Terragrunt across seven AWS accounts applied through Atlantis, Kubernetes workload delivery via ArgoCD with Flagger-based canary analysis, and two custom operators handling namespace provisioning and progressive delivery policy, replacing a manual process that blocked engineering teams for days at a time.
Terraform, Terragrunt, Atlantis, ArgoCD, ApplicationSets, Flagger, Kubernetes, EKS, Karpenter, Helm, GitHub Actions, Go, controller-runtime, Kubebuilder, OPA Gatekeeper, Conftest, External Secrets Operator, Cert Manager, Prometheus Operator, AWS Load Balancer Controller, External DNS, Fluent Bit, Amazon ECR, AWS Secrets Manager, AWS Organizations, MSK, Aurora PostgreSQL, ElastiCache, OpenSearch, Transit Gateway, IRSA, OIDC
Peak throughput: 5K req/s
p99 latency: 28 ms (↓ from 140 ms)
Annual compute savings: ~$124K (54% Spot)
AWS accounts: 7 (single region)
Deploy frequency: 50+ / day
MTTR: 8 min (auto-rollback)
Terraform modules: 18 reusable
Static credentials: Eliminated
Open-source Terraform Provider in Go that manages Pingdom monitoring resources — checks, contacts, and teams — as infrastructure code, with a layered expand/flatten/normalize model that enforces strict state consistency and eliminates phantom diffs across plan/apply cycles.
Go, Terraform Plugin SDK, Pingdom API, HCL
Resources managed: 3 types
Check protocols: HTTP / Ping / TCP
Plan stability: Idempotent
Import support: All resources
Streaming analytics pipeline that ingests Twitter data with Kafka, processes it through Spark Structured Streaming, stores semantic vectors in Milvus, and runs on Kubernetes for scale and resilience.
Python, Kafka, Spark Structured Streaming, Kubernetes, Milvus, Docker
Streaming engine: Spark
Message bus: Kafka
Vector store: Milvus
Orchestration: Kubernetes