A B2B SaaS company came to us spending $45K/month on AWS with a committed-use discount expiring in 3 months. Their ML team had standardized on Vertex AI and BigQuery for analytics, but their production workloads were still on EKS. They wanted to consolidate everything on GCP — without any customer-visible downtime.
Five weeks later, they were fully running on GCP. Here's how we did it.
Phase 1: Architecture mapping (Week 1)
Before touching anything, we mapped every service, dependency, and data flow:
- 12 microservices on EKS across 3 node groups
- RDS PostgreSQL (primary + read replica) — 800GB
- ElastiCache Redis — session store and job queue
- S3 — 2TB of user-uploaded assets
- CloudFront — CDN for static assets
- Route 53 — DNS with health checks
- SQS + Lambda — event processing pipeline
Every component got a GCP equivalent assigned:
- EKS → GKE
- RDS → Cloud SQL
- ElastiCache → Memorystore
- S3 → Cloud Storage
- CloudFront → Cloud CDN
- Route 53 → Cloud DNS
- SQS + Lambda → Pub/Sub + Cloud Run
Phase 2: Parallel infrastructure (Week 2)
We stood up the entire GCP environment with Terraform, mirroring the AWS topology service for service. Key decisions:
- GKE Autopilot for the Kubernetes cluster (less node management overhead)
- Cloud SQL with high availability (regional) and automated backups
- VPN tunnel between AWS VPC and GCP VPC for the migration period
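The core of that Terraform looked roughly like this. A sketch only — names, project, region, and machine tier are placeholders, not the client's actual configuration:

```hcl
# Autopilot cluster: Google manages the nodes, so no node-group config at all.
resource "google_container_cluster" "primary" {
  name             = "prod-autopilot"
  location         = "us-central1"
  enable_autopilot = true
}

# Regional Cloud SQL instance: standby in a second zone, automatic failover.
resource "google_sql_database_instance" "primary" {
  name             = "prod-postgres"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier              = "db-custom-8-32768"
    availability_type = "REGIONAL" # high availability
    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
    }
  }
}
```

Autopilot was the biggest simplification: the three EKS node groups had no equivalent to manage at all.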
Phase 3: Data migration (Weeks 3–4)
The hardest part. We ran both systems in parallel:
Database
We used pglogical for continuous logical replication from RDS to Cloud SQL. Initial sync took 6 hours for the 800GB database. After that, changes replicated in near-real-time (sub-second lag). We monitored replication lag continuously and set alerts for anything above 5 seconds.
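The pglogical setup boils down to a handful of SQL calls. Sketched here with placeholder DSNs and names — the real setup also handled sequences and DDL separately:

```sql
-- On RDS (the provider):
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
  node_name := 'rds_provider',
  dsn := 'host=rds-endpoint dbname=app user=replicator'
);
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- On Cloud SQL (the subscriber), after restoring the schema:
SELECT pglogical.create_subscription(
  subscription_name := 'rds_to_cloudsql',
  provider_dsn := 'host=rds-endpoint dbname=app user=replicator'
);

-- Lag check on the provider (the number we alerted on above 5 seconds):
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;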
Object storage
We used rclone with 64 parallel transfers to sync S3 to Cloud Storage. Initial sync: 4 hours. Then a continuous sync job running every 15 minutes to catch new uploads. We configured the application to dual-write to both S3 and GCS during the migration window.
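The dual-write shim is the piece most worth sketching. A minimal version in Python, assuming storage was already behind an interface — the class and method names here are hypothetical, not the client's actual code:

```python
class DualWriteStorage:
    """Write to both backends, read from the primary.

    Sketch only: `primary` and `secondary` are any objects exposing
    put(key, data) and get(key). In production these wrapped the S3
    and GCS SDK clients behind the existing storage interface.
    """

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def put(self, key, data):
        self.primary.put(key, data)    # source of truth during migration
        self.secondary.put(key, data)  # keeps GCS in sync for cutover

    def get(self, key):
        return self.primary.get(key)   # reads stay on the current primary


class InMemoryBackend:
    """Stand-in backend for the sketch."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]
```

At cutover, flipping `primary` and `secondary` moves reads to GCS without a deploy of new logic; once AWS is decommissioned, the shim collapses back to a single backend.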
Redis
Redis data is ephemeral (sessions and cache). We didn't migrate it — we let it rebuild naturally after cutover. The application handled cache misses gracefully.
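"Handled cache misses gracefully" just means the reads were cache-aside: a miss falls through to the database and repopulates the cache. A sketch, with hypothetical names — `cache` is anything with Redis-style get/set, `load_from_db` is the existing loader:

```python
def get_session(cache, load_from_db, session_id):
    """Cache-aside read: serve from cache, rebuild from the DB on a miss."""
    cached = cache.get(session_id)
    if cached is not None:
        return cached
    session = load_from_db(session_id)  # miss: fall back to source of truth
    if session is not None:
        cache.set(session_id, session)  # warm the new (empty) Memorystore
    return session
```

Because every read path already looked like this, an empty Redis after cutover cost only one slow request per key, not an outage.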
Phase 4: Application deployment (Week 4)
We deployed all 12 services to GKE using the same Helm charts (Kubernetes is Kubernetes). The only changes were:
- Database connection strings → Cloud SQL proxy
- S3 SDK calls → GCS SDK (we had already abstracted storage behind an interface)
- SQS consumers → Pub/Sub consumers
- IAM roles → GCP Workload Identity
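The Workload Identity change was mostly annotation plumbing. A sketch with placeholder names and project — each Kubernetes service account maps to a GCP service account:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-server
  namespace: prod
  annotations:
    iam.gke.io/gcp-service-account: api-server@my-project.iam.gserviceaccount.com
```

The GCP side grants `roles/iam.workloadIdentityUser` on that service account to the member `serviceAccount:my-project.svc.id.goog[prod/api-server]`. After that, pods get GCP credentials automatically — no more long-lived AWS access keys in secrets.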
Phase 5: DNS cutover (Week 5)
The zero-downtime moment. Our strategy:
- Lower DNS TTL to 60 seconds (done 48 hours before cutover)
- Verify GKE is serving traffic correctly via a staging domain
- Verify database replication lag is under 1 second
- Switch DNS from AWS ALB to GCP load balancer
- Monitor error rates for 30 minutes
- If clean → stop replication and decommission AWS read path
- If errors → revert DNS (60-second TTL means recovery in ~1 minute)
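The go/no-go decision in that checklist can be sketched as a simple gate. The thresholds below are illustrative, and in practice the inputs came from the monitoring stack rather than hardcoded constants:

```python
def go_no_go(error_rate, replication_lag_s,
             max_error_rate=0.001, max_lag_s=1.0):
    """Cutover gate: return 'proceed', or 'revert: <reason>'.

    Sketch only -- error_rate is the fraction of 5xx responses in the
    monitoring window, replication_lag_s the pglogical lag in seconds.
    """
    if replication_lag_s > max_lag_s:
        return "revert: replication lag too high"
    if error_rate > max_error_rate:
        return "revert: elevated error rate"
    return "proceed"
```

The point of writing it down, even this crudely, is that the revert path is decided before the cutover, not debated during it.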
The cutover was clean. Zero errors, zero customer complaints. We kept the AWS infrastructure running in read-only mode for 1 week as a safety net, then decommissioned it.
Results
- Downtime: Zero (DNS-based cutover with 60s TTL)
- Migration duration: 5 weeks end-to-end
- Cost savings: $45K → $31K/month (31% reduction, mostly from GKE Autopilot and committed-use discounts)
- Performance: 15% improvement in API latency (co-located with Vertex AI and BigQuery)
"We were terrified of the migration. DevOps Team made it feel routine. Not a single customer noticed."