System Design Glossary

A

Auth gateway

An auth gateway is a dedicated layer for login, session, and token validation so those checks do not bottleneck the main app servers.

Autoscaling

Autoscaling adds or removes capacity based on metrics, but it reacts with a delay and can scale the wrong tier when the metric does not reflect the real bottleneck. In Mutexmachine it is the idea behind mid-simulation deploys that arrive after traffic has already changed.

B

Budget

Budget is the maximum monthly infrastructure spend allowed by the contract. In the simulator, extra capacity can save uptime but still fail the run if it overspends.

Backpressure

Backpressure happens when producers create work faster than downstream systems can consume it, causing queues or streams to pile up.

Blue-green deploy

A blue-green deploy runs old and new versions side by side, then shifts traffic once the new version is healthy. In Mutexmachine it is the safer pattern behind deploy rollback and DNS cutover incidents.

Bulkhead isolation

Bulkhead isolation separates resource pools so one failing workload cannot consume everything. In Mutexmachine it appears in contention, auth, worker, and graceful degradation scenarios.
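
As a rough sketch of the idea, each workload can be given its own fixed number of slots so a busy pool is refused work instead of borrowing from its neighbors. The Bulkhead class, the pool names, and the limits below are hypothetical and chosen only for illustration.

```python
import threading


class Bulkhead:
    """Gives each named workload its own fixed pool of slots (illustrative)."""

    def __init__(self, limits):
        # One bounded semaphore per pool; pools never share slots.
        self._slots = {
            name: threading.BoundedSemaphore(size) for name, size in limits.items()
        }

    def try_acquire(self, pool):
        # Non-blocking: a full pool is rejected immediately instead of waiting.
        return self._slots[pool].acquire(blocking=False)

    def release(self, pool):
        self._slots[pool].release()
```

With limits such as {"reports": 1, "checkout": 2}, a stuck report job can hold only its single slot while checkout keeps both of its own.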

Bottleneck

A bottleneck is the tier currently limiting overall system throughput. Once it saturates, the rest of the architecture starts queuing behind it.

C

Cache

A cache stores frequently requested data in fast memory so repeated reads do not hit the database every time.
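
A minimal cache-aside style lookup might look like the sketch below. The class name, the load_from_db callback, and the db_reads counter are assumptions made for illustration; a real cache would also need expiry and a size limit.

```python
class ReadCache:
    """Checks memory first and falls back to the database on a miss."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # hypothetical function: key -> value
        self.store = {}
        self.db_reads = 0

    def get(self, key):
        if key in self.store:
            return self.store[key]        # hit: the database is not touched
        self.db_reads += 1                # miss: one database read
        value = self.load_from_db(key)
        self.store[key] = value           # remember it for the next reader
        return value
```

Reading the same key twice costs one database read, which is the whole point of the tier.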

Cache stampede

A cache stampede happens when cached data expires and many requests rush to the database at the same time to rebuild it.

Connection pool

A connection pool is the limited set of database connections shared by the app. When the pool is exhausted, new queries have to wait or fail outright.

CDN

A CDN caches static content close to users so files can be served from the edge without reaching your origin servers.

Circuit breaker states

Circuit breakers move between closed, open, and half-open states to stop slow dependencies from consuming upstream capacity. In Mutexmachine they explain graceful degradation and dependency latency incidents.
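
The three states can be sketched as a small state machine. This is a simplified illustration, not a production breaker: the class, the threshold, and the timeout values are assumptions, and the half-open state here admits every request until the first result is recorded.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timing can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.state == "open":
            # After the timeout, move to half-open and let trial traffic through.
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"
                return True
            return False            # still open: fail fast
        return True                 # closed or half-open

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        # A failed trial, or too many consecutive failures, opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Callers check allow_request() before contacting the dependency and report the outcome afterward, so a slow dependency is skipped quickly instead of holding upstream threads.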

CAP theorem

The CAP theorem says that during a network partition a distributed system must give up either consistency or availability; it cannot guarantee both. In practice it is a reminder to choose the partition behavior deliberately. In Mutexmachine it appears as tradeoffs around failover, stale reads, and graceful degradation.

Cold start

A cold start is the period when a component exists but has not warmed up enough to be fully effective. In Mutexmachine it appears when caches come online and hit rate climbs gradually.

Connection draining

Connection draining lets existing requests finish before a server is removed from rotation. In Mutexmachine it is the safer operational pattern behind rollbacks, restarts, and deploy incidents.

D

Database

The database is the durable source of truth for application state. It often becomes the hardest component to scale because reads and writes converge there.

DDoS

A distributed denial-of-service attack floods a system with traffic in order to overwhelm capacity and knock the service offline.

Dependency latency

Dependency latency happens when a downstream service slows down and upstream servers wait on it, holding threads and connections open until pressure spreads.

Dead letter queue

A dead letter queue stores messages that have failed too many times so healthy work can keep moving. In Mutexmachine it explains poison-message and retry-limit incidents.
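
A minimal sketch of the retry limit, assuming hashable messages and a hypothetical handler function that raises on failure:

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process every message; park any that fails max_attempts times."""
    dead_letters = []
    attempts = {}
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_attempts:
                dead_letters.append(msg)  # stop retrying the poison message
            else:
                queue.append(msg)         # requeue behind the healthy work
    return dead_letters
```

The poison message is retried a bounded number of times and then set aside, while the messages behind it are still processed.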

E

Error rate

Error rate is the percentage of requests that fail instead of returning a valid response. A rising error rate usually points to a capacity shortfall or a failing dependency.

Event stream

An event stream carries a continuous flow of messages or events between producers and consumers and helps smooth sudden bursts.

Exponential backoff

Exponential backoff spaces retries farther apart after each failure, often with jitter to avoid synchronization. In Mutexmachine it is the mitigation behind retry storms, reconnect storms, and thundering herd failures.
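
One common variant is "full jitter", where each delay is drawn at random below an exponentially growing ceiling. The sketch below assumes that variant; the function name and the default base, cap, and attempt count are illustrative, not values from the simulator.

```python
import random


def backoff_delays(base=0.1, cap=10.0, attempts=6, rng=None):
    """Return one randomized delay in seconds per retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles on each attempt until it reaches the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick uniformly in [0, ceiling] so clients desynchronize.
        delays.append(rng.uniform(0, ceiling))
    return delays
```

Because every client draws its own random delay, a fleet that failed at the same moment does not retry at the same moment.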

Eventual consistency

Eventual consistency means replicas or async systems may be temporarily stale but converge later. In Mutexmachine it appears in queues, streams, read replicas, and async processing scenarios.

Error budget

An error budget is the amount of failure a service can spend before it violates its reliability target. In Mutexmachine it is the practical meaning of strict uptime contracts.
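
The arithmetic is simple enough to sketch. The function below treats the budget as minutes of full downtime and assumes a 30-day window by default; both choices are illustrative.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of full downtime allowed in the window for an uptime target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)
```

For example, a 99.9% target over 30 days leaves about 43.2 minutes of downtime, while 99% leaves about 432 minutes.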

F

Failover

Failover is the act of shifting traffic or responsibility to a backup component when the primary one is unhealthy.

Fan-out

Fan-out sends one event or update to many downstream consumers or clients. In Mutexmachine it appears in realtime and event-stream scenarios where one burst can multiply work.

G

Graceful degradation

Graceful degradation keeps the core service alive by returning a simpler or less personalized response when a dependency is slow or unavailable.

H

Horizontal scaling

Horizontal scaling means adding more instances of a component, like more servers, instead of making one machine larger.

Hot key

A hot key is one cached object, database row, or partition that receives far more traffic than its neighbors. In Mutexmachine it appears when one expired cache key pushes many reads back to storage.

Head-of-line blocking

Head-of-line blocking happens when one slow item at the front of a queue delays everything behind it. In Mutexmachine it shows up as latency growth when connections, workers, or queues are saturated.

Health check

A health check is a small probe that decides whether a component should receive traffic. In Mutexmachine it is implied by failover, load balancer routing, and rollback incidents.

I

Idempotency

Idempotency means a repeated operation produces the same final result instead of duplicating work. In Mutexmachine it matters when retries, queues, and payment-like writes appear in incidents and post-mortems.
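
One common way to get this property is an idempotency key supplied by the caller. The sketch below is a hypothetical in-memory service, not a real payment API; the class, method, and field names are assumptions.

```python
class PaymentService:
    def __init__(self):
        self.results = {}   # idempotency key -> result of the first attempt
        self.charges = []   # the side effects that must not be duplicated

    def charge(self, idempotency_key, account, amount):
        # A retry with a known key returns the stored result and does nothing else.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges.append((account, amount))
        result = {"charge_id": len(self.charges), "amount": amount}
        self.results[idempotency_key] = result
        return result
```

A client that times out and retries with the same key gets the original result back, and the account is charged once.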

Incident response

Incident response is the practice of stabilizing a live degraded system under time pressure, prioritizing fast risk reduction before deeper root-cause fixes.

L

Latency

Latency is how long a request takes to get a response. High latency means the system is still working, but users feel the slowdown.

Load balancer

A load balancer distributes incoming traffic across multiple servers so one machine does not take all the requests alone.

O

Object storage

Object storage is designed for large files like images, videos, and uploads. It keeps bulk media out of the app server and database path.

Origin shield

An origin shield is a caching layer that protects your origin from many edge cache misses arriving at once. In Mutexmachine it appears in CDN origin miss and media delivery incidents.

P

P95 latency

P95 latency is the response time below which 95% of requests complete. It is a common way to measure whether users are getting consistently fast responses.

P50/P95/P99

P50, P95, and P99 are latency percentiles: P50 is the typical (median) request, while P95 and P99 describe the slow tail that an average hides. In Mutexmachine the contract uses P95 latency because tail slowdown is what users feel during pressure.
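
A small sketch using the nearest-rank method shows how a percentile is read off a set of samples. Other interpolation methods exist and give slightly different values; the function name and sample data here are illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 100 hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies = [20] * 90 + [200] * 8 + [2000] * 2
p50 = percentile(latencies, 50)   # 20
p95 = percentile(latencies, 95)   # 200
p99 = percentile(latencies, 99)   # 2000
```

The mean of these samples is 74 ms, which describes neither the typical 20 ms request nor the 2000 ms tail.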

Q

Queue

A queue buffers work that does not need to happen inside the user request path so that bursts can be absorbed and processed later.

Queue depth

Queue depth is the number of messages waiting to be processed. A growing queue means producers are faster than workers, so the queue is buying time rather than solving the bottleneck.
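
A back-of-the-envelope sketch, assuming constant produce and consume rates (real traffic is burstier), shows how quickly the gap accumulates. The function and parameter names are illustrative.

```python
def queue_depth_after(seconds, produce_rate, consume_rate, start_depth=0):
    """Estimated depth after a period of constant rates, in messages."""
    # Depth changes by the rate gap each second and cannot go below empty.
    return max(0, start_depth + (produce_rate - consume_rate) * seconds)
```

With 120 messages per second arriving and 100 per second processed, one minute adds 1,200 messages of backlog; the queue only drains once consumers outpace producers.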

R

RPS

RPS means requests per second. It is the live traffic rate your backend must absorb, and peak RPS is where hidden bottlenecks usually appear in the simulator.

Read/write split

A read/write split separates requests that only read data from requests that change data. Caches and replicas help reads, while queues and write isolation help writes.

Replica

A replica is a read-only copy of the database that can answer read queries and reduce pressure on the primary database.

Rate limiting

Rate limiting caps how many requests a client can make in a given time window, which helps protect systems during abuse or sudden spikes.

Replication lag

Replication lag is the delay between a primary system accepting a write and replicas or consumers seeing it. In Mutexmachine it appears when read replicas or streams fall behind under write pressure.

Retry storm

A retry storm happens when failing requests are retried so aggressively that the retries become new load. In Mutexmachine it appears in worker and dependency incidents where backoff is missing.

Rate limiting algorithms

Rate limiting algorithms decide how requests are counted and smoothed before excess traffic is rejected. In Mutexmachine they appear when rate limiters protect ingress during abuse or misconfiguration.
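
The token bucket is one widely used example of such an algorithm: tokens refill at a steady rate, each request spends one, and the bucket size bounds the burst. The sketch below is a minimal single-threaded illustration; the class and parameter names are assumptions.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the timing can be tested
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # bucket empty: reject the request
```

A bucket with rate 2 and capacity 3 accepts a burst of three requests, then settles to two per second.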

S

SLA

A Service Level Agreement is the contract that defines the minimum uptime, maximum latency, and maximum cost your architecture must meet.

Search index

A search index is a separate structure optimized for fast search queries, which prevents complex search reads from hammering the primary database.

Sharding

Sharding splits data across multiple partitions so one database node does not own every read and write. In Mutexmachine it is the deeper scaling idea behind write-heavy database pressure.
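
The simplest routing scheme hashes the key and takes the result modulo the shard count. The sketch below assumes that scheme; the function name and key format are illustrative.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in [0, shard_count)."""
    # A stable hash keeps routing identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Modulo routing is easy to reason about, but changing the shard count remaps most keys, which is why some systems prefer consistent hashing.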

Service degradation

Service degradation means the system is still running but with slower responses, fewer features, or reduced correctness. In Mutexmachine it appears before full breaches and in graceful degradation incidents.

SLO vs SLA

An SLO is an internal reliability target, while an SLA is an external promise or contract. In Mutexmachine the contract behaves like an SLA, while the safety margin you design in plays the role of an SLO.

T

Thundering herd

A thundering herd happens when many clients retry, reconnect, or rebuild cached data at the same time. In Mutexmachine it appears in cache, restart, and database connection incidents.

TTL jitter

TTL jitter adds small random variation to cache expiration times so many keys do not expire together. In Mutexmachine it is the prevention pattern behind cache stampede and hot-key incidents.
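
A minimal sketch, assuming a symmetric jitter of plus or minus 10% around the base TTL; the function name and default fraction are illustrative.

```python
import random


def jittered_ttl(base_ttl, jitter_fraction=0.1, rng=random):
    """Return base_ttl shifted by a random amount within +/- jitter_fraction."""
    spread = base_ttl * jitter_fraction
    return base_ttl + rng.uniform(-spread, spread)
```

Keys written together with a 300-second base TTL then expire somewhere between roughly 270 and 330 seconds, so their rebuilds are spread out instead of landing on the database at once.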

Throughput vs latency

Throughput is how much work a system completes per second, while latency is how long one request waits. In Mutexmachine adding capacity can improve throughput while queues and dependencies still hurt latency.

U

Uptime

Uptime is the percentage of time a service stays operational. Higher uptime targets leave less room for outages or degraded performance.

V

Vertical vs horizontal scaling

Vertical scaling makes one machine larger, while horizontal scaling adds more machines. In Mutexmachine most deploy choices teach horizontal scaling because redundancy and load distribution matter under incidents.

W

Worker

Workers are background processes that drain queued or streamed work outside the main request path.

WebSocket

A WebSocket keeps a client connection open so the server can push realtime updates without opening a new request each time.

Write amplification

Write amplification means one user action creates multiple internal writes, such as audit logs, analytics events, search updates, and primary database rows.