Scenario / The budget call at h18

On-Call Simulation: Diagnose an Overprovisioned System

A healthy internal tool is wildly overprovisioned. Finance needs a material cost cut before the next billing cycle, and the cut must not breach the SLA during the remaining traffic peak.

Briefing

Capacity planning in reverse

Rightsizing means matching deployed capacity to observed demand and required headroom.
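As a rough illustration of that definition, the sketch below sizes a tier from observed peak demand plus a headroom margin. The function name, the per-server capacity of 400 users, and the 25% headroom are assumptions made for the example; the scenario does not state them.

```python
import math


def required_instances(peak_demand, per_instance_capacity,
                       headroom=0.25, min_instances=1):
    """Instances needed to serve peak_demand with a fractional headroom margin.

    headroom=0.25 means "plan for 25% more than the observed peak".
    min_instances is a floor, e.g. for redundancy.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    needed = math.ceil(peak_demand * (1 + headroom) / per_instance_capacity)
    return max(needed, min_instances)


# Hypothetical capacity: one server comfortably handles 400 concurrent users.
servers_for_peak = required_instances(900, 400)      # 1125 / 400, rounded up
servers_for_baseline = required_instances(120, 400)  # 150 / 400, rounded up
```

Anything deployed beyond the figure computed for the peak is a candidate for removal, provided the tier's redundancy floor is still met.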

Overprovisioning hides design mistakes and spends budget that could protect a real bottleneck later.

Removing things is engineering too.

  • Cut duplicate capacity from tiers with low load first.
  • Keep redundancy where failure would immediately breach the SLA.
  • Watch the remaining traffic peak after every cut.
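One way to turn the three rules above into a cut plan is sketched below. It visits the least-loaded tiers first, never goes below a per-tier redundancy floor, and only removes an instance if the projected peak utilization stays under a ceiling. The `Tier` fields, the 70% ceiling, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    instances: int
    peak_utilization: float  # fraction of the tier's total capacity used at peak
    min_instances: int = 1   # redundancy floor that must survive every cut


def plan_cuts(tiers, max_utilization=0.7):
    """Return {tier name: instances to remove}, least-loaded tiers first."""
    plan = {}
    for tier in sorted(tiers, key=lambda t: t.peak_utilization):
        # Peak load expressed in "fully busy instance" equivalents.
        load = tier.peak_utilization * tier.instances
        keep = tier.instances
        # Drop one instance at a time while the projected peak stays safe.
        while keep > tier.min_instances and load / (keep - 1) <= max_utilization:
            keep -= 1
        plan[tier.name] = tier.instances - keep
    return plan


# Hypothetical fleet: numbers are illustrative, not taken from the scenario.
fleet = [
    Tier("server", instances=8, peak_utilization=0.15, min_instances=2),
    Tier("replica", instances=3, peak_utilization=0.20, min_instances=1),
    Tier("postgres", instances=1, peak_utilization=0.50, min_instances=1),
]
cuts = plan_cuts(fleet)
```

In practice each cut would be applied one at a time, with the next peak observed before continuing, rather than executing the whole plan at once.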

Contract

Uptime: 99.4%

P95 latency: 300 ms

Budget: $300/mo
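To make the uptime target concrete, the sketch below converts an uptime percentage into an allowed-downtime budget. The function name is illustrative, and the two windows used (a 30-day month and a 36-hour run) are assumptions about how the contract is measured.

```python
def downtime_budget_minutes(uptime_pct, window_hours):
    """Minutes of downtime allowed in a window at the given uptime target."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    if window_hours < 0:
        raise ValueError("window_hours must not be negative")
    return (1 - uptime_pct / 100) * window_hours * 60


# Assumed measurement windows; the contract does not say which applies.
monthly_budget = downtime_budget_minutes(99.4, 30 * 24)  # roughly 259 minutes
run_budget = downtime_budget_minutes(99.4, 36)           # roughly 13 minutes
```

A budget this small over a short window is why removing redundancy from a tier whose failure takes the whole tool down is riskier than it looks on a utilization chart.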

Traffic shape

Daily traffic curve with a predictable high-traffic window. Baseline 120 users; peak around 900 users over 36 hours.
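A daily curve like this can be approximated for back-of-the-envelope planning. The sketch below uses the stated baseline and peak, but the peak hour (18:00) and the width of the high-traffic window are hypothetical parameters chosen for illustration, not values given by the scenario.

```python
import math

BASELINE_USERS = 120  # from the traffic description
PEAK_USERS = 900      # from the traffic description


def users_at(hour, peak_hour=18.0, width_hours=2.0):
    """Approximate concurrent users at a given hour of the run.

    Models the high-traffic window as a smooth bump that repeats every 24 hours.
    peak_hour and width_hours are assumed, not taken from the scenario.
    """
    hour_of_day = hour % 24
    distance = abs(hour_of_day - peak_hour)
    distance = min(distance, 24 - distance)  # wrap around midnight
    bump = math.exp(-0.5 * (distance / width_hours) ** 2)
    return BASELINE_USERS + (PEAK_USERS - BASELINE_USERS) * bump


# Sample the curve hourly across a 36-hour run.
curve = [users_at(h) for h in range(36)]
```

Under these assumptions the fleet sits near baseline for most of the day, so capacity sized for the peak is idle outside the window, and any cut has to be validated against the peak hours specifically.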

Available components

Server

HTTP request handler. Every web app needs at least one server. More servers let you handle more simultaneous requests before latency starts climbing.

Postgres

Primary data store. Without a database, your app has no memory. Most dynamic requests eventually depend on it.

LB

Load balancer. If you run more than one server, something needs to decide where each request goes. That is the load balancer.

Redis

In-memory cache layer. Popular pages, profiles, and product data often get requested again and again. Serving those from memory is much faster and cheaper.

Replica

Read-only DB copy. Many applications read far more often than they write. Replicas let you spread those reads across more machines.

Queue

Async job buffer. Moving background work out of the request path keeps the app responsive even when extra processing is needed.

Worker

Background job processor. Separating background work keeps checkout, page loads, and other user actions from competing with batch processing.

Rate limiter

Request throttle. During abuse events, legitimate traffic competes with junk traffic for server capacity. Filtering noisy traffic at the edge protects the rest of the stack.

Common mistakes

  • Cost optimization should follow measured bottlenecks, not gut feel.
  • Rolling recovery preserves availability better than restarting the whole fleet.
  • Query-heavy paths need indexes, replicas, or search offload before peak traffic.
  • Connection pools can fail before raw database capacity reaches 100%.
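The connection-pool point can be checked with simple arithmetic: every app instance holds its own pool, so total connections scale with fleet size rather than with query load. The sketch below is illustrative only. The function names and the pool size of 20 are hypothetical, and the limit of 100 connections with 3 reserved is an assumption modeled on common PostgreSQL defaults; check the actual server settings.

```python
def connection_demand(app_instances, pool_size_per_instance):
    """Connections the fleet can hold open if every pool fills up."""
    return app_instances * pool_size_per_instance


def pool_exhausted(app_instances, pool_size_per_instance,
                   max_connections=100, reserved=3):
    """True if full pools would exceed the connections available to the app.

    max_connections and reserved are assumed values, not scenario data.
    """
    available = max_connections - reserved
    return connection_demand(app_instances, pool_size_per_instance) > available


# Hypothetical fleet: 8 servers with a pool of 20 connections each.
before = pool_exhausted(8, 20)  # 160 wanted vs 97 available
after = pool_exhausted(3, 20)   # 60 wanted vs 97 available
```

Under these assumptions the overprovisioned fleet can exhaust connections while database CPU is nearly idle, and trimming servers relieves that pressure as a side effect.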

Interview adjacency

  • Rightsize infrastructure
  • Capacity planning
  • Handle production handover