Scaling to 100K Users on a $200/month Budget: Our Kubernetes Playbook

The exact infra stack we used to help a UAE startup handle 100,000 concurrent users without a single second of downtime on launch day.

Last year we helped a UAE-based consumer app launch on national television. We had 48 hours of warning. The client expected 20,000 users. They got 100,000 in the first two hours.

The app didn't go down. The total infrastructure cost for that day was under $40. Here's exactly what we built and why it worked.

The Stack

We deployed on DigitalOcean Kubernetes (DOKS) — not AWS or GCP, despite what every enterprise playbook says. For a startup with tight margins, DOKS is dramatically simpler to operate and the per-node cost is significantly lower. Our base cluster was three nodes at $48/month each. Total: $144/month.

On top of that: a managed PostgreSQL database ($15/month), Redis for caching and session management ($10/month), and Cloudflare's free tier for DDoS protection and CDN. Total baseline: $169/month.
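The baseline arithmetic can be checked with a short sketch. The prices are the ones quoted in this post. The 30-day month and all names in the code are assumptions made for illustration.

```python
# Monthly prices quoted in the post (USD).
NODE_PRICE = 48
NODE_COUNT = 3
POSTGRES_PRICE = 15
REDIS_PRICE = 10
CLOUDFLARE_PRICE = 0  # free tier


def monthly_baseline(nodes: int = NODE_COUNT) -> int:
    """Return the monthly bill for a cluster with the given number of nodes."""
    return nodes * NODE_PRICE + POSTGRES_PRICE + REDIS_PRICE + CLOUDFLARE_PRICE


baseline = monthly_baseline()
# Assumption: a 30-day month, used only to get a rough per-day figure.
per_day = baseline / 30
print(f"baseline: ${baseline}/month, roughly ${per_day:.2f}/day")
```

Calling `monthly_baseline(5)` gives the monthly rate for a five-node cluster, which is a quick way to see what extra nodes would cost if they stayed on permanently.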

The API Architecture

The backend was a Node.js API, containerised with a minimal Alpine-based Docker image (final image: 87MB). We deployed it as a Kubernetes Deployment with three replicas and configured Horizontal Pod Autoscaling (HPA) targeting 60% CPU utilization. At peak load, it automatically scaled to eleven pods within ninety seconds.
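To make the autoscaling behaviour concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0. The 60% target and the three-replica floor come from this post. The maximum of 20 replicas and the example CPU readings are assumptions, not our production values.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float = 60.0,  # target from the post
    min_replicas: int = 3,         # assumption: floor equals the initial replica count
    max_replicas: int = 20,        # assumption: hypothetical ceiling
    tolerance: float = 0.1,        # Kubernetes default tolerance
) -> int:
    """Approximate the HPA replica calculation for a single CPU metric."""
    ratio = current_cpu_pct / target_cpu_pct
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical reading: three pods averaging 220% of their CPU request.
print(desired_replicas(3, 220.0))
```

The real controller also applies stabilization windows and readiness checks, so treat this as the core formula only.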

The critical decision was what not to put in the API. Authentication, static assets, and read-heavy data endpoints were all handled before the request even reached our pods — more on that below.

The Caching Strategy

This is where most scaling problems live: the database. We protected it with three layers. First, Cloudflare cached all public-facing static content at the edge. Second, a Redis cache sat in front of our database for any data that was the same for all users (product listings, configuration, leaderboards), with a 60-second TTL. Third, we used database connection pooling via PgBouncer to prevent the Postgres connection limit from becoming a bottleneck. The third layer is pooling rather than caching, but it serves the same goal: keeping load off Postgres.
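The second layer follows the cache-aside pattern. Below is a minimal sketch that uses an in-memory dictionary as a stand-in for Redis so that it runs on its own. The class name, the key, and the loader function are hypothetical. Only the 60-second TTL comes from our setup.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache-aside layer with a fixed TTL."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = loader()     # miss or expired: one query refreshes the entry
        self._entries[key] = (now + self._ttl, value)
        return value


db_queries = [0]  # counts how often the fake database is hit


def load_product_listings() -> list:
    """Hypothetical loader standing in for a real Postgres query."""
    db_queries[0] += 1
    return ["widget", "gadget"]


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60.0, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.get_or_load("product_listings", load_product_listings)
print(db_queries[0])  # one load served all 1000 requests inside the TTL
```

Because the cached data is identical for every user, the database sees at most one refresh per key per TTL window in this sketch, regardless of how many requests arrive. A production version would also need to handle concurrent misses on the same key.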

The result: at 100K concurrent users, our database was handling about 800 queries per second. Without the caching layers, it would have been closer to 80,000.
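Put as a ratio, those two figures imply that about 99% of would-be queries never reached Postgres. A small sketch of the arithmetic follows. The function name is hypothetical, and the 80,000 figure is our estimate rather than a measurement.

```python
def cache_offload(db_qps_with_cache: float, db_qps_without_cache: float) -> float:
    """Fraction of would-be database queries absorbed before reaching Postgres."""
    return 1.0 - db_qps_with_cache / db_qps_without_cache


offload = cache_offload(800, 80_000)
print(f"{offload:.0%} of queries never reached the database")
```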

What Helm Charts Got Us

We packaged the entire deployment as a Helm chart. This meant that when we needed to scale the cluster mid-launch by adding two more nodes, it was a single command. No manual configuration, no SSH into servers, no crossed fingers. The new nodes joined the cluster, the Kubernetes scheduler placed the extra pods requested by the HPA onto them, and traffic was balanced automatically.

The One Thing That Almost Broke

WebSocket connections. We had a real-time feature that opened a persistent WebSocket per user. At 100K users, that's 100K open connections, and our original setup tried to route them through the same ingress controller as normal HTTP traffic. An ingress controller tuned for short-lived HTTP requests is a poor fit for that many long-lived connections.

We caught it in load testing 24 hours before launch. The fix was a dedicated WebSocket service with a separate LoadBalancer, bypassing the ingress entirely. Forty minutes of work that prevented a catastrophic failure.

The Honest Part

This architecture worked because the application was well-designed. There were no N+1 queries. There were no synchronous operations that should have been async. There was no shared mutable state between pods. Infrastructure can absorb a lot of sins, but it can't fix bad application architecture at scale.
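For readers unfamiliar with the term, an N+1 query pattern issues one query to fetch a list and then one more query per item in it. The sketch below contrasts it with a single joined query, using an in-memory SQLite database so that it runs anywhere. The tables, data, and function names are hypothetical and are not taken from the client's application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (user_id INTEGER, points INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO scores VALUES (1, 10), (1, 5), (2, 7), (3, 1);
""")

query_count = [0]  # counts round trips to the database


def run(sql: str, params: tuple = ()) -> list:
    """Execute one query and record that a round trip happened."""
    query_count[0] += 1
    return conn.execute(sql, params).fetchall()


def leaderboard_n_plus_one() -> dict:
    """One query for the users, then one more query per user."""
    result = {}
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        rows = run(
            "SELECT COALESCE(SUM(points), 0) FROM scores WHERE user_id = ?",
            (user_id,),
        )
        result[name] = rows[0][0]
    return result


def leaderboard_single_query() -> dict:
    """Same result from one joined, grouped query."""
    rows = run("""
        SELECT u.name, COALESCE(SUM(s.points), 0)
        FROM users u LEFT JOIN scores s ON s.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.id
    """)
    return dict(rows)
```

With three users the first version makes four round trips and the second makes one. The gap grows linearly with the number of rows, which is why the pattern only becomes visible under load.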

If you're planning a high-traffic launch and want someone to stress-test your architecture before the day arrives, we do launch readiness audits. We've seen exactly what breaks and exactly when.
