WebSocket Scaling: Designing Realtime Systems
A multiplayer backend goes live. Long-lived sessions, bursts of match events, and chat traffic punish shallow scaling.
Briefing
Realtime systems & persistent connections
Realtime systems maintain persistent connections and push events continuously instead of only answering short HTTP requests.
Capacity planning changes when users hold open sessions. Connection count, fan-out, and event bursts matter as much as request rate.
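Persistent connections scale differently from plain HTTP. Each open session holds a socket and memory on the server for as long as it lasts, so load tracks concurrent sessions rather than requests per second.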
- Provision dedicated WebSocket capacity for long-lived sessions.
- Use streams or queues to absorb bursty event traffic.
- Scale connection handling separately from ordinary application servers.
Contract
99.9% availability target
120ms latency target
$860/mo budget
Traffic shape
Live event traffic with long-lived sessions and bursts. Baseline 220 users; peak around 3,400 users over 60 hours.
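A rough sizing sketch for this traffic shape follows. Only the baseline and peak user counts come from the scenario. The per-node connection capacity, the headroom fraction, the event rate, and the one-connection-per-user assumption are placeholders to replace with measured values.

```python
import math


def ws_nodes_needed(peak_connections, connections_per_node, headroom=0.3):
    """Nodes required so peak load uses at most (1 - headroom) of each node."""
    usable = connections_per_node * (1 - headroom)
    return math.ceil(peak_connections / usable)


def fanout_messages_per_second(events_per_second, recipients_per_event):
    """Outbound messages generated when each event is pushed to many sessions."""
    return events_per_second * recipients_per_event


BASELINE_USERS = 220          # from the scenario
PEAK_USERS = 3_400            # from the scenario
CONNECTIONS_PER_NODE = 1_000  # assumption: measure your own WebSocket servers
EVENTS_PER_SECOND = 20        # assumption: broadcast events during a burst

print(round(PEAK_USERS / BASELINE_USERS, 1))                      # 15.5
print(ws_nodes_needed(PEAK_USERS, CONNECTIONS_PER_NODE))          # 5
print(fanout_messages_per_second(EVENTS_PER_SECOND, PEAK_USERS))  # 68000
```

Peak is roughly 15 times baseline, so capacity sized for the baseline alone would be far short. The fan-out figure is the worst case where every event reaches every connected user.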
Available components
- Server: HTTP request handler. Every web app needs at least one server. More servers let you handle more simultaneous requests before latency starts climbing.
- Postgres: Primary data store. Without a database, your app has no memory. Most dynamic requests eventually depend on it.
- LB: Load balancer. If you run more than one server, something needs to decide where each request goes. That is the load balancer.
- WS: WebSocket server. Chat, multiplayer games, and live dashboards need open connections instead of one request at a time.
- Stream: Durable event stream. Streams are shock absorbers for high-volume ingestion and fan-out systems where one producer feeds many downstream consumers.
- Queue: Async job buffer. Moving background work out of the request path keeps the app responsive even when extra processing is needed.
- Worker: Background job processor. Separating background work keeps checkout, page loads, and other user actions from competing with batch processing.
- Redis: In-memory cache layer. Popular pages, profiles, and product data often get requested again and again. Serving those from memory is much faster and cheaper.
- Replica: Read-only DB copy. Many applications read far more often than they write. Replicas let you spread those reads across more machines.
- Rate limiter: Request throttle. During abuse events, legitimate traffic competes with junk traffic for server capacity. Filtering noisy traffic at the edge protects the rest of the stack.
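The scenario does not say how the rate limiter component works internally. As an illustration only, here is a token bucket, one common throttling technique. The `TokenBucket` class and its parameters are hypothetical names for this sketch.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so a fresh client can burst
        self.clock = clock          # injectable clock makes testing easy
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: at most 5 messages/second per client, with bursts of up to 10.
bucket = TokenBucket(rate=5, capacity=10)
accepted = sum(bucket.allow() for _ in range(25))
print(accepted)  # at least 10; the rest of the burst is rejected
```

In a realtime system a bucket like this would typically be kept per connection or per client key, so one noisy session cannot starve the others.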
Common mistakes
- Meeting traffic spikes without pre-positioned headroom and fast offload paths.
- Scaling streams without scaling workers to match, which lets backlog grow.
- Retrying without backoff, priority, or enough worker capacity, which creates self-inflicted load.
- Running realtime systems without reconnect backoff or connection-aware capacity planning.
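Reconnect backoff matters because a server restart drops every session at once, and clients that retry immediately arrive as a single wave. One widely used approach is exponential backoff with full jitter, sketched below. The function name and the `base` and `cap` defaults are assumptions for illustration.

```python
import random


def reconnect_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Full jitter: pick a random delay between 0 and the exponential ceiling,
    so clients that disconnected together do not retry together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Usage: the ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to 30s.
for attempt in range(6):
    print(f"attempt {attempt}: wait {reconnect_delay(attempt):.2f}s")
```

The cap keeps long outages from pushing delays to minutes, and the jitter spreads reconnects across the whole window instead of clustering them at the ceiling.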
Interview adjacency
- Design a chat app
- Design multiplayer presence
- Scale WebSockets