API Rate Limiting

Protect your API from abuse while ensuring fair usage for all clients

Why Rate Limiting?

🛑️ Prevent Abuse

Stop malicious actors from overwhelming your API with excessive requests, scraping data, or attempting brute-force attacks.

⚖️ Fair Usage

Ensure all clients get equitable access to your API resources. Prevent a single client from monopolizing server capacity.

💰 Cost Control

Manage infrastructure costs by limiting resource consumption. Essential for APIs with metered billing or limited compute resources.

🚀 DDoS Protection

First line of defense against distributed denial-of-service attacks. Limit request rates before they reach your application logic.

Rate Limit Headers

Standard HTTP headers communicate rate limit status to clients. Include these in every API response to help clients manage their request patterns.

X-RateLimit-Limit      Maximum requests allowed in the time window
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the limit resets
GET /api/users
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
Content-Type: application/json

{
  "data": [...]
}

💡 Draft IETF Standard Headers

Consider using the proposed standard headers from IETF draft-ietf-httpapi-ratelimit-headers:

Standard names: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
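
For example, the response shown earlier could expose the same state with the draft names. Note that the draft defines RateLimit-Reset as seconds until the window resets, not a Unix timestamp:

HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 95
RateLimit-Reset: 60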

Rate Limiting Algorithms

Token Bucket

Burst-Friendly

Tokens are added to a bucket at a fixed rate. Each request consumes a token. Allows controlled bursts while maintaining average rate limits (see the In-Memory Token Bucket implementation below).

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Allows bursts of up to 100 requests

Sliding Window

Smooth Limiting

Counts requests within a rolling time window. Provides smoother rate limiting without the boundary issues of fixed windows (see the Redis Sliding Window Counter below).

Window size: 60 seconds
Max requests: 100 per window
Window slides with each request

Fixed Window

Simple & Efficient

Counts requests within fixed time periods (e.g., per minute). Simple to implement, but can allow bursts at window boundaries: a client could send 100 requests at 0:59 and 100 more at 1:00, for 200 requests in two seconds.

Window: 00:00 - 00:59
Max: 100 requests per minute
Counter resets at window start

Algorithm Comparison

Algorithm        Memory    Burst Handling  Accuracy   Best For
Token Bucket     Low       Excellent       Good       APIs needing burst tolerance
Sliding Window   Medium    Good            Excellent  Strict rate enforcement
Fixed Window     Very Low  Poor            Moderate   Simple implementations

429 Too Many Requests

When a client exceeds the rate limit, respond with HTTP status 429 Too Many Requests. Include helpful information to guide client behavior. See our error handling guide for response format best practices.

  • Always return a clear error message
  • Include the Retry-After header
  • Provide machine-readable error codes
  • Optionally include rate limit headers

Example 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000060
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests",
  "retry_after": 60
}

Retry-After Header

⏱️ Seconds Format

Specify the number of seconds to wait before retrying:

Example: Retry-After: 60

Simple and easy to parse. Preferred for rate limiting responses.

📅 HTTP Date Format

Specify an exact date/time when requests can resume:

Example: Retry-After: Sat, 21 Dec 2024 16:00:00 GMT

Useful for scheduled maintenance or planned downtime.

🔧 Client Handling

Clients should always respect the Retry-After header:

// Promise-based sleep helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, retriesLeft = 3) {
  const response = await fetch(url);

  if (response.status === 429 && retriesLeft > 0) {
    // Assumes the seconds format; Retry-After can also be an HTTP date
    const retryAfter = response.headers.get('Retry-After');
    const waitMs = (parseInt(retryAfter, 10) || 1) * 1000;

    await sleep(waitMs);
    return fetchWithRetry(url, retriesLeft - 1);
  }

  return response;
}

Client Best Practices

📈 Exponential Backoff

Progressively increase wait time between retries: 1s, 2s, 4s, 8s, etc. Add jitter to prevent thundering herd problems.

delay = min(base * 2^attempt, maxDelay)
delay += random(0, delay * 0.1)
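
A minimal JavaScript sketch of this formula (the base and cap values are illustrative):

// Exponential backoff with up to 10% random jitter
function backoffDelay(attempt, baseMs = 1000, maxDelayMs = 30000) {
  const delay = Math.min(baseMs * 2 ** attempt, maxDelayMs);
  return delay + Math.random() * delay * 0.1; // jitter avoids thundering herd
}
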
⏰ Respect Retry-After

Always honor the Retry-After header. Don't retry before the suggested time; it wastes resources and may extend your rate limit penalty.

💾 Cache Responses

Implement proper HTTP caching using Cache-Control, ETag, and Last-Modified headers. Avoid unnecessary requests for unchanged data.
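
A sketch of conditional requests using ETag and If-None-Match (the in-memory cache shape is illustrative):

// Reuse the cached body when the server replies 304 Not Modified
const etagCache = new Map(); // url -> { etag, body }

async function cachedFetch(url) {
  const entry = etagCache.get(url);
  const headers = entry ? { 'If-None-Match': entry.etag } : {};
  const response = await fetch(url, { headers });

  // 304 carries no body; serve the cached copy instead
  if (response.status === 304 && entry) return entry.body;

  const body = await response.json();
  const etag = response.headers.get('ETag');
  if (etag) etagCache.set(url, { etag, body });
  return body;
}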

📦 Batch Requests

Combine multiple operations into single requests when the API supports it. Reduces total request count and improves efficiency.
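
A sketch against a hypothetical bulk endpoint (the /api/batch path and payload shape are illustrative, not part of any particular API):

// One POST to a hypothetical batch endpoint instead of N separate GETs
async function getUsersBatch(ids) {
  const response = await fetch('/api/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: ids.map((id) => ({ method: 'GET', path: `/api/users/${id}` }))
    })
  });
  return response.json();
}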

📊 Monitor Headers

Track X-RateLimit-Remaining and proactively slow down before hitting limits. Better than waiting for 429 errors.
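
For instance, a client can check the remaining budget after each call and pause before the limit is hit (the threshold of 5 is arbitrary):

// Proactively wait for the window to reset when the budget runs low
async function politeFetch(url, threshold = 5) {
  const response = await fetch(url);
  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');

  if (remaining !== null && reset !== null && parseInt(remaining, 10) < threshold) {
    const waitMs = Math.max(0, parseInt(reset, 10) * 1000 - Date.now());
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return response;
}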

🔄 Queue Requests

Implement a request queue that respects rate limits. Process requests at a sustainable pace rather than bursting.
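
A minimal sketch of such a queue (the rate is fixed; a production version would also honor Retry-After):

// Drain queued jobs at a fixed, sustainable rate
class RequestQueue {
  constructor(requestsPerSecond) {
    this.intervalMs = 1000 / requestsPerSecond;
    this.queue = [];
    this.timer = null;
  }

  enqueue(fn) {
    // fn returns a Promise, e.g. () => fetch('/api/users')
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      if (!this.timer) this.start();
    });
  }

  start() {
    this.timer = setInterval(() => {
      const job = this.queue.shift();
      if (!job) {
        clearInterval(this.timer); // stop ticking when idle
        this.timer = null;
        return;
      }
      job.fn().then(job.resolve, job.reject);
    }, this.intervalMs);
  }
}

// Usage: const queue = new RequestQueue(10); queue.enqueue(() => fetch('/api/users'));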

Implementation Examples

Redis Sliding Window Counter

Sliding window rate limiter using Redis, suitable for distributed systems. Note that the read-then-write sequence below is not atomic; for strict enforcement under heavy concurrency, wrap it in a MULTI/EXEC transaction or a Lua script.

const Redis = require('ioredis');
const redis = new Redis();

async function isRateLimited(clientId, limit = 100, windowSec = 60) {
  const key = `ratelimit:${clientId}`;
  const now = Date.now();
  const windowStart = now - (windowSec * 1000);
  
  // Remove old entries outside the window
  await redis.zremrangebyscore(key, 0, windowStart);
  
  // Count requests in current window
  const count = await redis.zcard(key);
  
  if (count >= limit) {
    // The oldest timestamp in the window determines when capacity frees up
    const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES');
    const resetTime = Math.ceil((parseInt(oldestEntry[1], 10) + windowSec * 1000 - now) / 1000);
    return { limited: true, remaining: 0, resetIn: resetTime };
  }
  
  // Add current request; random suffix keeps members unique within the same millisecond
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, windowSec);
  
  return { limited: false, remaining: limit - count - 1, resetIn: windowSec };
}
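
Usage sketch, following the convention of the other examples (clientId is whatever key you limit on, such as an API key or IP address):

// const { limited, remaining, resetIn } = await isRateLimited(clientId);
// if limited, respond with 429 and set Retry-After: resetIn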

In-Memory Token Bucket

Simple implementation for single-server deployments.

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;      // Max tokens
    this.tokens = capacity;        // Current tokens
    this.refillRate = refillRate;  // Tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  consume(tokens = 1) {
    this.refill();
    
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    
    const waitTime = (tokens - this.tokens) / this.refillRate;
    return { allowed: false, remaining: 0, retryAfter: Math.ceil(waitTime) };
  }
}

// Usage: const limiter = new TokenBucket(100, 10);

Express.js Rate Limit Middleware

Fixed-window middleware with rate limit headers and a 429 response. State lives in process memory, so it suits a single server; use a shared store like the Redis example above for multi-instance deployments.

const rateLimit = (options = {}) => {
  const { limit = 100, windowMs = 60000 } = options;
  const clients = new Map(); // entries are never evicted; add periodic cleanup in production

  return (req, res, next) => {
    // Behind a proxy, enable Express's 'trust proxy' setting so req.ip reflects the real client
    const clientId = req.ip || req.headers['x-forwarded-for'];
    const now = Date.now();
    
    let client = clients.get(clientId);
    if (!client || now - client.windowStart > windowMs) {
      client = { count: 0, windowStart: now };
      clients.set(clientId, client);
    }
    
    client.count++;
    const remaining = Math.max(0, limit - client.count);
    const resetTime = Math.ceil((client.windowStart + windowMs) / 1000);
    
    res.set({
      'X-RateLimit-Limit': limit,
      'X-RateLimit-Remaining': remaining,
      'X-RateLimit-Reset': resetTime
    });
    
    if (client.count > limit) {
      const retryAfter = Math.ceil((client.windowStart + windowMs - now) / 1000);
      res.set('Retry-After', retryAfter);
      return res.status(429).json({
        error: 'rate_limit_exceeded',
        message: 'Too many requests',
        retry_after: retryAfter
      });
    }
    
    next();
  };
};

// Usage: app.use('/api', rateLimit({ limit: 100, windowMs: 60000 }));