Key Question
How do we prevent a single abusive client or a “retry storm” from crashing our entire system?
Deep Dive
A distributed system has finite resources (CPU, Memory, Database connections). If the incoming request rate exceeds the system’s capacity, the result is Congestion Collapse: the system spends more time managing overhead than doing actual work, and latency spikes to infinity.
Rate Limiting is the practice of rejecting requests once a certain threshold is reached to protect the system’s health.
Common Algorithms
- Token Bucket: The system has a “bucket” that refills with tokens at a fixed rate. Each request costs one token. If the bucket is empty, the request is rejected. This allows for small “bursts” of traffic but enforces a long-term average.
- Leaky Bucket: Imagine a bucket with a hole in the bottom. Requests enter the bucket, but they “leak” out (are processed) at a constant rate. This smoothes out traffic perfectly, eliminating bursts.
- Fixed Window: You allow 100 requests per minute. At 12:01:00, the counter resets. This is simple but dangerous: if a client sends 100 requests at 12:00:59 and another 100 at 12:01:01, the system sees a spike of 200 requests in 2 seconds.
Distributed Rate Limiting
In a distributed system, rate limiting is hard because the “counter” must be shared across multiple servers. If you store the counter in a local variable, a client could hit 10 different servers and get 10x the limit.
The solution is usually an external, fast Key-Value store like Redis. Each server checks Redis before processing a request. To minimize the network “tax” of checking Redis for every request, some systems use Local Batching: each server reserves 10 tokens from Redis at once and doles them out locally.
Key Takeaways
- Rate limiting is about Fairness and Self-Preservation.
- Leaky Bucket = Smooth traffic; Token Bucket = Controlled bursts.
- Distributed rate limiting requires a centralized fast counter (like Redis).
Exercises
- Which algorithm is better for an API that expects occasional bursts of traffic?
- A client is rate-limited. What HTTP status code should the server return?
- How does “Local Batching” improve performance in a distributed rate limiter?
👁️ View Solutions
- Token Bucket, as it allows the client to use up accumulated tokens quickly.
429 Too Many Requests.- It reduces the number of network round-trips to the central counter (e.g., Redis), lowering latency and reducing load on the counter itself.