Rate Limiting Strategies

Express·2 min read·Feb 26, 2026

In backend development, rate limiting is a strategy for controlling how many requests clients can make to an API endpoint within a given time window before being rejected.

Rate limiting is essential for:

  • Protection from abuse / DoS — A client or attacker might try to flood your API and tie up resources.
  • Fairness & QoS — You ensure no single client can hog your capacity and degrade performance for everyone else.
  • Cost control — The external services your API depends on might enforce their own rate limits or charge per request.
  • User experience — Better to throttle politely than crash or slow to a crawl.

But rate limiting isn't just about stopping "bad guys" — it's also about making your API predictable and reliable under load.

In this lesson, I'll walk you through three classic rate-limiting strategies: fixed window, sliding window, and token bucket.

Fixed Window Strategy

The fixed window strategy divides time into fixed slots (e.g., 1 minute) and counts how many requests each client makes within the current slot.

Once the client reaches the maximum number of requests allowed per slot, all further requests are rejected until the next slot begins.

For example, a fixed window of 10 requests per minute means that a client can make at most 10 requests between 12:00:00 and 12:00:59, then another 10 between 12:01:00 and 12:01:59, and so on.
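To make the counting concrete, here is a minimal in-memory sketch of the idea in TypeScript. It is only an illustration under a few assumptions: the names `FixedWindowLimiter`, `allow`, and `clientId` are hypothetical, state lives in a single process, and the current time can be passed in so the behaviour is easy to try without waiting a full minute.

```typescript
// Hypothetical sketch of a fixed window limiter (names are illustrative only).
class FixedWindowLimiter {
  // Per-client state: which window we are counting in, and the count so far.
  private counters = new Map<string, { window: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  // nowMs defaults to the wall clock but can be injected for experiments.
  allow(clientId: string, nowMs: number = Date.now()): boolean {
    // Every timestamp inside the same slot maps to the same window index.
    const window = Math.floor(nowMs / this.windowMs);

    let entry = this.counters.get(clientId);
    // First request from this client, or a new slot has started: reset the count.
    if (entry === undefined || entry.window !== window) {
      entry = { window, count: 0 };
      this.counters.set(clientId, entry);
    }

    if (entry.count >= this.limit) {
      return false; // limit already reached in this slot
    }

    entry.count += 1;
    return true;
  }
}

// Usage: 10 requests per minute, tracked per client.
const limiter = new FixedWindowLimiter(10, 60_000);
const allowed = limiter.allow("client-123");
console.log(allowed ? "request accepted" : "request rejected");
```

In an Express app, a check like this would typically sit in a middleware that picks a client identifier (an API key or IP address, for instance) and responds with HTTP 429 when `allow` returns false. Note that this sketch never evicts idle clients from the map, which a real service would need to handle.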

This strategy is often used for internal/admin endpoints.

Pros and cons

  • ✅ Very simple to implement.
  • ✅ Minimal memory usage.