Rate Limiting Strategies
Express·2 min read·Feb 26, 2026
In backend development, rate limiting is a strategy for controlling how many requests clients can make to an API endpoint within a given time window before being rejected.
Rate limiting is essential for:
- Protection from abuse / DoS: A client or attacker might try to flood your API and tie up resources.
- Fairness & QoS: No single client can hog your capacity and degrade performance for everyone else.
- Cost control: Upstream services you call may enforce their own rate limits or charge per request.
- User experience: It's better to throttle politely than to crash or slow to a crawl.
But rate limiting isn't just about stopping "bad guys" — it's also about making your API predictable and reliable under load.
In this lesson, I'll walk you through three classic rate-limiting strategies: fixed window, sliding window, and token bucket.
Fixed Window Strategy
The fixed window strategy divides time into fixed slots (e.g., 1 minute) and counts how many requests a client makes within each slot.
Once the client has used up the maximum number of requests allowed per slot, all further requests are rejected until the next slot begins and the counter resets.
For example, a fixed window of 10 requests per minute means that a client can only make 10 requests between 12:00:00–12:00:59, then another 10 between 12:01:00–12:01:59, and so on.
This strategy is often used for internal/admin endpoints.
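To make the counting logic concrete, here is a minimal in-memory sketch in TypeScript. It is an illustration under a few assumptions rather than a production implementation: the names FixedWindowLimiter, allow, and clientId are made up for this example, state lives in a single process, and old entries are never cleaned up.

```typescript
// Illustrative fixed-window limiter. All names here are hypothetical.
type WindowEntry = { window: number; count: number };

class FixedWindowLimiter {
  // clientId -> current slot index and the number of requests counted in it
  private counters = new Map<string, WindowEntry>();
  private limit: number;    // max requests allowed per slot
  private windowMs: number; // slot length in milliseconds

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  // nowMs is injectable so the behavior can be checked without waiting.
  allow(clientId: string, nowMs: number = Date.now()): boolean {
    // Slots are aligned to the clock: every timestamp inside the same
    // slot maps to the same index.
    const window = Math.floor(nowMs / this.windowMs);

    let entry = this.counters.get(clientId);
    if (entry === undefined || entry.window !== window) {
      // First request seen in a new slot: start counting from zero.
      entry = { window, count: 0 };
      this.counters.set(clientId, entry);
    }

    if (entry.count >= this.limit) {
      return false; // limit used up, blocked until the slot resets
    }
    entry.count += 1;
    return true;
  }
}

// 10 requests per minute, as in the example above.
const limiter = new FixedWindowLimiter(10, 60000);
```

In an Express app, a middleware could call limiter.allow() with some client identifier (the IP address or an API key, for instance) and respond with HTTP 429 Too Many Requests whenever it returns false. How you identify clients is a design choice this sketch leaves open.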
Pros and cons
- ✅ Very simple to implement.
- ✅ Minimal memory usage.