API Rate Limiting

Protect your API from abuse while ensuring fair usage for all clients

Why Rate Limiting?

🛑️ Prevent Abuse

Stop malicious actors from overwhelming your API with excessive requests, scraping data, or attempting brute-force attacks.

⚖️ Fair Usage

Ensure all clients get equitable access to your API resources. Prevent a single client from monopolizing server capacity.

💰 Cost Control

Manage infrastructure costs by limiting resource consumption. Essential for APIs with metered billing or limited compute resources.

🚀 DDoS Protection

First line of defense against distributed denial-of-service attacks. Limit request rates before they reach your application logic.

Rate Limit Headers

Standard HTTP headers communicate rate limit status to clients. Include these in every API response to help clients manage their request patterns.

X-RateLimit-Limit      Maximum requests allowed in the time window
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the limit resets
GET /api/users
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
Content-Type: application/json

{
  "data": [...]
}

💡 Draft IETF Standard Headers

Consider using the proposed standard headers from IETF draft-ietf-httpapi-ratelimit-headers:

Standard names: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
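
For example, the response shown earlier could expose the same state with the draft names. Note that the draft defines RateLimit-Reset as seconds until the window resets, not a Unix timestamp:

HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 95
RateLimit-Reset: 60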

Rate Limiting Algorithms

Token Bucket

Burst-Friendly

Tokens are added to a bucket at a fixed rate. Each request consumes a token. Allows controlled bursts while maintaining average rate limits (see the In-Memory Token Bucket implementation below).

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Allows bursts of up to 100 requests

Sliding Window

Smooth Limiting

Counts requests within a rolling time window. Provides smoother rate limiting without the boundary issues of fixed windows (see the Redis Sliding Window Counter below).

Window size: 60 seconds
Max requests: 100 per window
Window slides with each request

Fixed Window

Simple & Efficient

Counts requests within fixed time periods (e.g., per minute). Simple to implement, but can allow bursts at window boundaries: a client could send 100 requests at 0:59 and 100 more at 1:00, for 200 requests in two seconds.

Window: 00:00 - 00:59
Max: 100 requests per minute
Counter resets at window start

Algorithm Comparison

Algorithm        Memory    Burst Handling  Accuracy   Best For
Token Bucket     Low       Excellent       Good       APIs needing burst tolerance
Sliding Window   Medium    Good            Excellent  Strict rate enforcement
Fixed Window     Very Low  Poor            Moderate   Simple implementations

429 Too Many Requests

When a client exceeds the rate limit, respond with HTTP status 429 Too Many Requests. Include helpful information to guide client behavior. See our error handling guide for response format best practices.

  • Always return a clear error message
  • Include the Retry-After header
  • Provide machine-readable error codes
  • Optionally include rate limit headers

Example 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000060
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests",
  "retry_after": 60
}

Retry-After Header

⏱️ Seconds Format

Specify the number of seconds to wait before retrying:

Example: Retry-After: 60

Simple and easy to parse. Preferred for rate limiting responses.

📅 HTTP Date Format

Specify an exact date/time when requests can resume:

Example: Retry-After: Sat, 21 Dec 2024 16:00:00 GMT

Useful for scheduled maintenance or planned downtime.

🔧 Client Handling

Clients should always respect the Retry-After header:

// Promise-based sleep helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, retriesLeft = 3) {
  const response = await fetch(url);

  if (response.status === 429 && retriesLeft > 0) {
    // Assumes the seconds format; Retry-After can also be an HTTP date
    const retryAfter = response.headers.get('Retry-After');
    const waitMs = (parseInt(retryAfter, 10) || 1) * 1000;

    await sleep(waitMs);
    return fetchWithRetry(url, retriesLeft - 1);
  }

  return response;
}

Client Best Practices

📈 Exponential Backoff

Progressively increase wait time between retries: 1s, 2s, 4s, 8s, etc. Add jitter to prevent thundering herd problems.

delay = min(base * 2^attempt, maxDelay)
delay += random(0, delay * 0.1)
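
A minimal JavaScript sketch of this formula (the base and cap values are illustrative):

// Exponential backoff with up to 10% random jitter
function backoffDelay(attempt, baseMs = 1000, maxDelayMs = 30000) {
  const delay = Math.min(baseMs * 2 ** attempt, maxDelayMs);
  return delay + Math.random() * delay * 0.1; // jitter avoids thundering herd
}
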
⏰ Respect Retry-After

Always honor the Retry-After header. Don't retry before the suggested time; it wastes resources and may extend your rate limit penalty.

💾 Cache Responses

Implement proper HTTP caching using Cache-Control, ETag, and Last-Modified headers. Avoid unnecessary requests for unchanged data.
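
A sketch of conditional requests using ETag and If-None-Match (the in-memory cache shape is illustrative):

// Reuse the cached body when the server replies 304 Not Modified
const etagCache = new Map(); // url -> { etag, body }

async function cachedFetch(url) {
  const entry = etagCache.get(url);
  const headers = entry ? { 'If-None-Match': entry.etag } : {};
  const response = await fetch(url, { headers });

  // 304 carries no body; serve the cached copy instead
  if (response.status === 304 && entry) return entry.body;

  const body = await response.json();
  const etag = response.headers.get('ETag');
  if (etag) etagCache.set(url, { etag, body });
  return body;
}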

📦 Batch Requests

Combine multiple operations into single requests when the API supports it. Reduces total request count and improves efficiency.
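
A sketch against a hypothetical bulk endpoint (the /api/batch path and payload shape are illustrative, not part of any particular API):

// One POST to a hypothetical batch endpoint instead of N separate GETs
async function getUsersBatch(ids) {
  const response = await fetch('/api/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: ids.map((id) => ({ method: 'GET', path: `/api/users/${id}` }))
    })
  });
  return response.json();
}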

📊 Monitor Headers

Track X-RateLimit-Remaining and proactively slow down before hitting limits. Better than waiting for 429 errors.
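
For instance, a client can check the remaining budget after each call and pause before the limit is hit (the threshold of 5 is arbitrary):

// Proactively wait for the window to reset when the budget runs low
async function politeFetch(url, threshold = 5) {
  const response = await fetch(url);
  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');

  if (remaining !== null && reset !== null && parseInt(remaining, 10) < threshold) {
    const waitMs = Math.max(0, parseInt(reset, 10) * 1000 - Date.now());
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return response;
}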

🔄 Queue Requests

Implement a request queue that respects rate limits. Process requests at a sustainable pace rather than bursting.
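
A minimal sketch of such a queue (the rate is fixed; a production version would also honor Retry-After):

// Drain queued jobs at a fixed, sustainable rate
class RequestQueue {
  constructor(requestsPerSecond) {
    this.intervalMs = 1000 / requestsPerSecond;
    this.queue = [];
    this.timer = null;
  }

  enqueue(fn) {
    // fn returns a Promise, e.g. () => fetch('/api/users')
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      if (!this.timer) this.start();
    });
  }

  start() {
    this.timer = setInterval(() => {
      const job = this.queue.shift();
      if (!job) {
        clearInterval(this.timer); // stop ticking when idle
        this.timer = null;
        return;
      }
      job.fn().then(job.resolve, job.reject);
    }, this.intervalMs);
  }
}

// Usage: const queue = new RequestQueue(10); queue.enqueue(() => fetch('/api/users'));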

Implementation Examples

Redis Sliding Window Counter

Sliding window rate limiter using Redis, suitable for distributed systems. Note that the read-then-write sequence below is not atomic; for strict enforcement under heavy concurrency, wrap it in a MULTI/EXEC transaction or a Lua script.

const Redis = require('ioredis');
const redis = new Redis();

async function isRateLimited(clientId, limit = 100, windowSec = 60) {
  const key = `ratelimit:${clientId}`;
  const now = Date.now();
  const windowStart = now - (windowSec * 1000);
  
  // Remove old entries outside the window
  await redis.zremrangebyscore(key, 0, windowStart);
  
  // Count requests in current window
  const count = await redis.zcard(key);
  
  if (count >= limit) {
    // The oldest timestamp in the window determines when capacity frees up
    const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES');
    const resetTime = Math.ceil((parseInt(oldestEntry[1], 10) + windowSec * 1000 - now) / 1000);
    return { limited: true, remaining: 0, resetIn: resetTime };
  }
  
  // Add current request; random suffix keeps members unique within the same millisecond
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, windowSec);
  
  return { limited: false, remaining: limit - count - 1, resetIn: windowSec };
}
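
Usage sketch, following the convention of the other examples (clientId is whatever key you limit on, such as an API key or IP address):

// const { limited, remaining, resetIn } = await isRateLimited(clientId);
// if limited, respond with 429 and set Retry-After: resetIn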

In-Memory Token Bucket

Simple implementation for single-server deployments.

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;      // Max tokens
    this.tokens = capacity;        // Current tokens
    this.refillRate = refillRate;  // Tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  consume(tokens = 1) {
    this.refill();
    
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    
    const waitTime = (tokens - this.tokens) / this.refillRate;
    return { allowed: false, remaining: 0, retryAfter: Math.ceil(waitTime) };
  }
}

// Usage: const limiter = new TokenBucket(100, 10);

Express.js Rate Limit Middleware

Fixed-window middleware with rate limit headers and a 429 response. State lives in process memory, so it suits a single server; use a shared store like the Redis example above for multi-instance deployments.

const rateLimit = (options = {}) => {
  const { limit = 100, windowMs = 60000 } = options;
  const clients = new Map(); // entries are never evicted; add periodic cleanup in production

  return (req, res, next) => {
    // Behind a proxy, enable Express's 'trust proxy' setting so req.ip reflects the real client
    const clientId = req.ip || req.headers['x-forwarded-for'];
    const now = Date.now();
    
    let client = clients.get(clientId);
    if (!client || now - client.windowStart > windowMs) {
      client = { count: 0, windowStart: now };
      clients.set(clientId, client);
    }
    
    client.count++;
    const remaining = Math.max(0, limit - client.count);
    const resetTime = Math.ceil((client.windowStart + windowMs) / 1000);
    
    res.set({
      'X-RateLimit-Limit': limit,
      'X-RateLimit-Remaining': remaining,
      'X-RateLimit-Reset': resetTime
    });
    
    if (client.count > limit) {
      const retryAfter = Math.ceil((client.windowStart + windowMs - now) / 1000);
      res.set('Retry-After', retryAfter);
      return res.status(429).json({
        error: 'rate_limit_exceeded',
        message: 'Too many requests',
        retry_after: retryAfter
      });
    }
    
    next();
  };
};

// Usage: app.use('/api', rateLimit({ limit: 100, windowMs: 60000 }));