API Rate Limiting
Protect your API from abuse while ensuring fair usage for all clients
Why Rate Limiting?
Prevent Abuse
Stop malicious actors from overwhelming your API with excessive requests, scraping data, or attempting brute-force attacks.
Fair Usage
Ensure all clients get equitable access to your API resources. Prevent a single client from monopolizing server capacity.
Cost Control
Manage infrastructure costs by limiting resource consumption. Essential for APIs with metered billing or limited compute resources.
DDoS Protection
First line of defense against distributed denial-of-service attacks. Limit request rates before they reach your application logic.
Rate Limit Headers
Standard HTTP headers communicate rate limit status to clients. Include these in every API response to help clients manage their request patterns.
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
Content-Type: application/json
{
"data": [...]
}
💡 Draft IETF Standard Headers
Consider using the proposed standard headers from IETF draft-ietf-httpapi-ratelimit-headers:
RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
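With those names, a response might look like the following (a sketch; the draft has gone through several revisions, and it expresses the reset as delta-seconds rather than a Unix timestamp, so check the current text before adopting):
HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 95
RateLimit-Reset: 60
Content-Type: application/json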
Rate Limiting Algorithms
Token Bucket (Burst-Friendly)
Tokens are added to a bucket at a fixed rate. Each request consumes a token. Allows controlled bursts while maintaining average rate limits.
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Allows bursts up to 100 requests
Sliding Window (Smooth Limiting)
Counts requests within a rolling time window. Provides smoother rate limiting without the boundary issues of fixed windows.
Window size: 60 seconds
Max requests: 100 per window
Window slides with each request
Fixed Window (Simple & Efficient)
Counts requests within fixed time periods (e.g., per minute). Simple to implement, but can allow bursts at window boundaries.
Window: 00:00 - 00:59
Max: 100 requests per minute
Counter resets at window start
Algorithm Comparison
Algorithm        Burst behavior                        Trade-off
Token Bucket     Controlled bursts up to capacity      Maintains the average rate via refill bookkeeping
Sliding Window   No boundary bursts; smooth limiting   Tracks per-request state within the window
Fixed Window     Bursts possible at window edges       Simplest to implement; counter resets each window
429 Too Many Requests
When a client exceeds the rate limit, respond with HTTP status 429 Too Many Requests. Include helpful information to guide client behavior. See our error handling guide for response format best practices.
- Always return a clear error message
- Include the Retry-After header
- Provide machine-readable error codes
- Optionally include rate limit headers
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000060
Content-Type: application/json
{
"error": "rate_limit_exceeded",
"message": "Too many requests",
"retry_after": 60
}
Retry-After Header
⏱️ Seconds Format
Specify the number of seconds to wait before retrying:
Retry-After: 60
Simple and easy to parse. Preferred for rate limiting responses.
📅 HTTP Date Format
Specify an exact date/time when requests can resume:
Retry-After: Sat, 21 Dec 2024 16:00:00 GMT
Useful for scheduled maintenance or planned downtime.
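Because the header can carry either form, a small helper (a sketch; the function name is ours) can normalize both into a wait time in milliseconds:
function retryAfterToMs(headerValue) {
  if (headerValue === null) return 1000; // no header: fall back to a short default wait
  const seconds = Number(headerValue);
  if (!Number.isNaN(seconds)) return seconds * 1000; // delta-seconds form
  const resumeAt = Date.parse(headerValue); // HTTP-date form
  return Number.isNaN(resumeAt) ? 1000 : Math.max(0, resumeAt - Date.now());
}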
🔧 Client Handling
Clients should always respect the Retry-After header:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function fetchWithRetry(url, attempt = 0) {
  const response = await fetch(url);
  if (response.status === 429 && attempt < 5) { // cap retries so we never loop forever
    const retryAfter = response.headers.get('Retry-After');
    const waitMs = retryAfterToMs(retryAfter); // handles both header formats (see above)
    await sleep(waitMs);
    return fetchWithRetry(url, attempt + 1);
  }
  return response;
}
Client Best Practices
Exponential Backoff
Progressively increase wait time between retries: 1s, 2s, 4s, 8s, etc. Add jitter to prevent thundering herd problems.
delay = min(base * 2^attempt, maxDelay)
delay += random(0, delay * 0.1)
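In JavaScript, that formula might look like this (the base and cap values are illustrative):
function backoffDelay(attempt, baseMs = 1000, maxDelayMs = 30000) {
  const delay = Math.min(baseMs * 2 ** attempt, maxDelayMs);
  return delay + Math.random() * delay * 0.1; // add up to 10% jitter
}
// attempt 0 -> ~1s, attempt 1 -> ~2s, attempt 2 -> ~4s, capped at 30s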
Respect Retry-After
Always honor the Retry-After header. Don't retry before the suggested time; doing so wastes resources and may extend your rate limit penalty.
Cache Responses
Implement proper HTTP caching using Cache-Control, ETag, and Last-Modified headers. Avoid unnecessary requests for unchanged data.
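As a sketch of conditional requests, a client can remember each URL's ETag and replay it via If-None-Match; a 304 response means the cached body is still valid (the in-memory cache here is illustrative):
const etagCache = new Map(); // url -> { etag, body }
async function cachedFetch(url) {
  const cached = etagCache.get(url);
  const headers = cached ? { 'If-None-Match': cached.etag } : {};
  const response = await fetch(url, { headers });
  if (response.status === 304 && cached) return cached.body; // unchanged: reuse the cached copy
  const body = await response.json();
  const etag = response.headers.get('ETag');
  if (etag) etagCache.set(url, { etag, body });
  return body;
}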
Batch Requests
Combine multiple operations into single requests when the API supports it. Reduces total request count and improves efficiency.
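For instance, assuming a hypothetical /api/batch endpoint that accepts an array of operations (batching support and payload shape vary by API):
// Hypothetical endpoint and payload shape; adapt to your API's batching contract
async function batchGet(ids) {
  const operations = ids.map((id) => ({ method: 'GET', path: `/items/${id}` }));
  const response = await fetch('/api/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ operations })
  });
  return response.json(); // one request against the limit instead of ids.length
}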
Monitor Headers
Track X-RateLimit-Remaining and proactively slow down before hitting limits. Better than waiting for 429 errors.
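A sketch of proactive throttling using the headers from this guide (the threshold of 10 is arbitrary):
async function politeFetch(url) {
  const response = await fetch(url);
  const remaining = response.headers.get('X-RateLimit-Remaining');
  const reset = response.headers.get('X-RateLimit-Reset');
  if (remaining !== null && reset !== null && Number(remaining) < 10) {
    // Close to the limit: pause until the window resets instead of risking a 429
    const waitMs = Math.max(0, Number(reset) * 1000 - Date.now()); // reset is a Unix timestamp, as above
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return response;
}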
Queue Requests
Implement a request queue that respects rate limits. Process requests at a sustainable pace rather than bursting.
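A minimal in-process queue that spaces requests at a fixed, sustainable interval (a sketch; a production version would also handle retries and priorities):
class RequestQueue {
  constructor(intervalMs) {
    this.intervalMs = intervalMs; // minimum spacing between requests
    this.queue = [];
    this.running = false;
  }
  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      if (!this.running) this.drain();
    });
  }
  async drain() {
    this.running = true;
    while (this.queue.length > 0) {
      const { task, resolve, reject } = this.queue.shift();
      task().then(resolve, reject);
      await new Promise((r) => setTimeout(r, this.intervalMs));
    }
    this.running = false;
  }
}
// Usage: const q = new RequestQueue(100); q.enqueue(() => fetch('/api/items'));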
Implementation Examples
Redis Sliding Window Counter
Distributed rate limiter using Redis sorted sets, suitable for multi-server deployments (see the atomicity note below).
const Redis = require('ioredis');
const redis = new Redis();
async function isRateLimited(clientId, limit = 100, windowSec = 60) {
const key = `ratelimit:${clientId}`;
const now = Date.now();
const windowStart = now - (windowSec * 1000);
// Remove old entries outside the window
await redis.zremrangebyscore(key, 0, windowStart);
// Count requests in current window
const count = await redis.zcard(key);
if (count >= limit) {
const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES');
const resetTime = Math.ceil((parseInt(oldestEntry[1]) + windowSec * 1000 - now) / 1000);
return { limited: true, remaining: 0, resetIn: resetTime };
}
// Add current request
await redis.zadd(key, now, `${now}-${Math.random()}`);
await redis.expire(key, windowSec);
return { limited: false, remaining: limit - count - 1, resetIn: windowSec };
}
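A route handler might wire it up like this (Express-style sketch; the endpoint and response shape are illustrative):
app.get('/api/items', async (req, res) => {
  const { limited, remaining, resetIn } = await isRateLimited(req.ip);
  res.set('X-RateLimit-Remaining', String(remaining));
  if (limited) {
    res.set('Retry-After', String(resetIn));
    return res.status(429).json({ error: 'rate_limit_exceeded', retry_after: resetIn });
  }
  res.json({ data: [] });
});
Note that the count check and the zadd are separate round trips, so a concurrent burst can briefly slip past the limit; wrapping the commands in a MULTI transaction or a Lua script closes that gap.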
In-Memory Token Bucket
Simple implementation for single-server deployments.
class TokenBucket {
constructor(capacity, refillRate) {
this.capacity = capacity; // Max tokens
this.tokens = capacity; // Current tokens
this.refillRate = refillRate; // Tokens per second
this.lastRefill = Date.now();
}
refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.capacity,
this.tokens + elapsed * this.refillRate
);
this.lastRefill = now;
}
consume(tokens = 1) {
this.refill();
if (this.tokens >= tokens) {
this.tokens -= tokens;
return { allowed: true, remaining: Math.floor(this.tokens) };
}
const waitTime = (tokens - this.tokens) / this.refillRate;
return { allowed: false, remaining: 0, retryAfter: Math.ceil(waitTime) };
}
}
// Usage:
const limiter = new TokenBucket(100, 10); // capacity 100, refills at 10 tokens/sec
const { allowed, retryAfter } = limiter.consume();
if (!allowed) console.log(`Rate limited; retry in ${retryAfter}s`);
Express.js Rate Limit Middleware
Complete middleware with headers and error handling.
const rateLimit = (options = {}) => {
const { limit = 100, windowMs = 60000 } = options;
const clients = new Map(); // NOTE: in production, evict stale entries periodically so this map doesn't grow without bound
return (req, res, next) => {
const clientId = req.ip || req.headers['x-forwarded-for'];
const now = Date.now();
let client = clients.get(clientId);
if (!client || now - client.windowStart > windowMs) {
client = { count: 0, windowStart: now };
clients.set(clientId, client);
}
client.count++;
const remaining = Math.max(0, limit - client.count);
const resetTime = Math.ceil((client.windowStart + windowMs) / 1000);
res.set({
'X-RateLimit-Limit': limit,
'X-RateLimit-Remaining': remaining,
'X-RateLimit-Reset': resetTime
});
if (client.count > limit) {
const retryAfter = Math.ceil((client.windowStart + windowMs - now) / 1000);
res.set('Retry-After', retryAfter);
return res.status(429).json({
error: 'rate_limit_exceeded',
message: 'Too many requests',
retry_after: retryAfter
});
}
next();
};
};
// Usage: app.use('/api', rateLimit({ limit: 100, windowMs: 60000 }));