Checkout Contention: Write-Heavy System Design Practice

A payments startup lands a partner launch. Checkout must stay fast while retries, auth, and DB reads compete for capacity.

Briefing

Contention & resource competition

Contention happens when unrelated workloads compete for the same limited server, database, or ingress capacity.

A system can fail even when total capacity looks adequate if the wrong workload monopolizes the bottleneck.

Isolating workloads from each other keeps one overloaded path from cascading into failures elsewhere.

  • Move slow or retry-heavy work behind queues.
  • Separate auth or background work when it competes with core product traffic.
  • Watch which workload causes the bottleneck, not only which component turns red.
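One way to picture the isolation advice above is a bulkhead: each workload gets its own cap on concurrent work, so a retry storm can exhaust only its own slots. This is a minimal sketch, not the scenario's implementation; the `Bulkhead` class, the `admit` helper, and the 8/2 slot split are assumptions invented for illustration.

```python
import threading


class Bulkhead:
    """Caps the number of concurrent in-flight requests for one workload."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_enter(self) -> bool:
        # Non-blocking: reject at once instead of waiting behind a full pool.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        # Call when the request finishes so the slot can be reused.
        self._slots.release()


# Hypothetical split of 10 server slots between two workloads.
bulkheads = {"checkout": Bulkhead(8), "retry": Bulkhead(2)}


def admit(workload: str) -> bool:
    """Return True if the workload still has a free slot of its own."""
    return bulkheads[workload].try_enter()
```

With this split, a third concurrent retry is rejected while a checkout request is still admitted, because the two workloads no longer draw from the same pool.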

Contract

Uptime: 99.85%

P95 latency: 160 ms

Budget: $720/mo
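The contract numbers can be turned into concrete budgets. The sketch below assumes a 30-day month and a nearest-rank percentile; both are assumptions, since the scenario does not state its measurement window or percentile method, and the function names are made up for this example.

```python
def downtime_budget_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed in a window at the given uptime target."""
    return (1 - uptime_pct / 100) * days * 24 * 60


def p95_ms(samples_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def meets_latency_contract(samples_ms: list, limit_ms: float = 160) -> bool:
    """True if the P95 of the samples is within the contract limit."""
    return p95_ms(samples_ms) <= limit_ms
```

Under these assumptions, 99.85% uptime leaves roughly 64.8 minutes of downtime per 30-day month. Because the latency target is a P95, up to 5% of requests can be slow without breaching it.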

Traffic shape

Morning surge that tests capacity during a narrow peak. Baseline 150 users; peak around 3,000 users over 54 hours.

Available components

Server

HTTP request handler. Every web app needs at least one server. More servers let you handle more simultaneous requests before latency starts climbing.

Postgres

Primary data store. Without a database, your app has no memory. Most dynamic requests eventually depend on it.

Redis

In-memory cache layer. Popular pages, profiles, and product data often get requested again and again. Serving those from memory is much faster and cheaper.
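A common way to use a cache like this is the cache-aside pattern: check memory first and fall through to the database only on a miss. The sketch below uses a plain dictionary as a stand-in for Redis so it runs without a server; `TTLCache`, `get_or_load`, and the injectable `clock` are hypothetical names, not part of any Redis client.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-process stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the database is not touched
        value = loader(key)  # miss or expired: one trip to the source
        self._entries[key] = (now + self._ttl, value)
        return value
```

Repeated reads of the same key inside the TTL reach the loader once. This suits product data that tolerates brief staleness; it does not help the write path of a checkout.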

LB

Load balancer. If you run more than one server, something needs to decide where each request goes. That is the load balancer.

Queue

Async job buffer. Moving background work out of the request path keeps the app responsive even when extra processing is needed.

Worker

Background job processor. Separating background work keeps checkout, page loads, and other user actions from competing with batch processing.
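The queue and worker pair can be sketched with Python's standard `queue` module. This is an in-process illustration under assumed names (`handle_checkout`, `drain`) and an assumed bound of 2; a real deployment would use a separate queue service and worker processes.

```python
import queue

# Small bound chosen for illustration; a bounded queue makes overload visible.
jobs: "queue.Queue[str]" = queue.Queue(maxsize=2)


def handle_checkout(order_id: str) -> int:
    """Request path: accept the order and defer the slow follow-up work."""
    try:
        jobs.put_nowait(order_id)
    except queue.Full:
        return 503  # shed load rather than block the request thread
    return 202  # accepted; the worker finishes the job later


def drain(process) -> int:
    """Worker side: process everything currently queued, return the count."""
    done = 0
    while True:
        try:
            order_id = jobs.get_nowait()
        except queue.Empty:
            return done
        process(order_id)
        jobs.task_done()
        done += 1
```

The request handler returns immediately in both cases, so its latency no longer depends on how long the background work takes. Only the depth of the queue does.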

Auth

Session and token gate. Authentication can become a heavy repeated workload, especially when every request needs identity checks.

Replica

Read-only DB copy. Many applications read far more often than they write. Replicas let you spread those reads across more machines.
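Spreading reads usually means routing by statement type. The sketch below is a simplified illustration: `ReadWriteRouter`, the `needs_fresh` flag, and the prefix check on the SQL text are assumptions, and a production router would need a more careful notion of what counts as a read.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Sends writes to the primary and rotates reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str, needs_fresh: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        # Replicas can lag behind the primary, so reads that must observe
        # a write that just happened stay on the primary.
        if is_read and not needs_fresh and self._replicas is not None:
            return next(self._replicas)
        return self._primary
```

In a write-heavy checkout, replicas relieve the primary only of its read share; the writes themselves still land on one machine.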

Rate limiter

Request throttle. During abuse events, legitimate traffic competes with junk traffic for server capacity. Filtering noisy traffic at the edge protects the rest of the stack.
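A token bucket is one common way to build such a throttle; the scenario does not say which algorithm its rate limiter uses, so treat this as an illustrative sketch. The `TokenBucket` class and its parameters are assumed names, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `burst`, then a steady `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst passes
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller would typically answer HTTP 429
```

Keeping one bucket per client or API key means a noisy caller drains only its own tokens.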

Common mistakes

  • Decoupling and fallback paths keep third-party failures from becoming total outages.
  • Timeouts and fallbacks are reliability controls, not polish. Slow dependencies need hard limits.
  • Authentication pressure should be isolated from core product compute.
  • Connection pools can fail before raw database capacity reaches 100%.
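Two of the points above lend themselves to a quick sketch: connection pool arithmetic and a hard deadline with a fallback. The helper names and example numbers are assumptions for illustration. The limit of 100 matches PostgreSQL's default `max_connections`, which a real deployment may have changed.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def pool_overcommit(servers: int, pool_size: int, max_connections: int) -> int:
    """Connections requested beyond what the database accepts (0 if it fits)."""
    return max(0, servers * pool_size - max_connections)


_executor = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, timeout_s: float, fallback):
    """Return fn() if it finishes within timeout_s, otherwise the fallback."""
    future = _executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running after the deadline. A real client
        # also needs its own socket-level timeout so the slot is freed.
        return fallback
```

For example, six servers with a pool of 20 each ask for 120 connections, which is 20 more than a limit of 100 allows, even if the database's CPU is mostly idle. Scaling out the server tier can therefore make the database fail sooner.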

Interview adjacency

  • Design payment checkout
  • Design order processing
  • Handle write contention