Designing REST APIs for AI Agents
Idempotency, rate limiting, structured outputs, async patterns, and tool-calling schema design for LLM consumers
Why AI Agents Are Different API Consumers
Gartner predicts that by 2026, more than 30% of API demand growth will come from AI agents — not humans. Designing for agents requires a different mindset from designing for human-driven browsers or mobile apps.
AI agents are fully automated. There is no human in the loop to resolve ambiguity, retry intelligently, or handle edge cases. They call APIs at machine speed — potentially thousands of requests per minute — and they interpret your error messages literally. Your response shape, your error format, and your retry behaviour all become part of your API's functional contract.
| Property | Human Client | AI Agent Client |
|---|---|---|
| Retry behaviour | Manual, rare | Automatic, aggressive |
| Error interpretation | Reads UI message | Parses JSON error field |
| Request rate | ~1–5 req/s | 100–10,000+ req/s |
| Ambiguity handling | Asks user | Fails or hallucinates |
| Long operations | Waits + spins | Needs 202 + polling |
| Documentation | Reads prose docs | Reads field descriptions in schema |
OpenAI / Anthropic Tool Calling Schema
Tool calling (function calling) is the primary interface between LLMs and your REST API. You define each endpoint as a JSON Schema function definition. The LLM reads the description and parameters to decide when and how to invoke your API.
// OpenAI-format tool definition for a REST endpoint
// (Anthropic's format is equivalent but flatter: name, description, input_schema)
const tools = [
{
type: 'function',
function: {
name: 'create_order',
description: 'Create a new customer order. Use this when the user wants to purchase items. ' +
'Returns an order ID and initial status. Always include an idempotency_key ' +
'generated from the conversation context to prevent duplicate orders.',
parameters: {
type: 'object',
required: ['customer_id', 'items'],
additionalProperties: false,
properties: {
customer_id: {
type: 'string',
format: 'uuid',
description: 'The UUID of the customer placing the order'
},
items: {
type: 'array',
minItems: 1,
maxItems: 50,
description: 'List of products and quantities to order',
items: {
type: 'object',
required: ['product_id', 'quantity'],
properties: {
product_id: { type: 'string', format: 'uuid', description: 'Product UUID' },
quantity: { type: 'integer', minimum: 1, maximum: 100, description: 'Units to order' }
}
}
},
idempotency_key: {
type: 'string',
minLength: 8,
maxLength: 64,
description: 'Client-generated unique key to prevent duplicate orders on retry. ' +
'Generate from: sha256(user_id + product_ids + timestamp_minute)'
}
}
}
}
},
{
type: 'function',
function: {
name: 'get_order_status',
description: 'Get the current status of an order by its ID.',
parameters: {
type: 'object',
required: ['order_id'],
additionalProperties: false,
properties: {
order_id: { type: 'string', format: 'uuid', description: 'The order UUID to check' }
}
}
}
}
];
// Execute tool call from LLM response
async function executeTool(toolCall) {
const args = JSON.parse(toolCall.function.arguments);
switch (toolCall.function.name) {
case 'create_order':
return await fetch('https://api.example.com/orders', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer ' + process.env.API_TOKEN,
'Idempotency-Key': args.idempotency_key || crypto.randomUUID() // random fallback won't dedupe retries; prefer the model-supplied deterministic key
},
body: JSON.stringify(args)
}).then(r => r.json());
case 'get_order_status':
return await fetch('https://api.example.com/orders/' + args.order_id, {
headers: { 'Authorization': 'Bearer ' + process.env.API_TOKEN }
}).then(r => r.json());
}
}
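For context, the surrounding tool-call loop might look like the minimal sketch below. It assumes the official openai Node SDK; the model name, depth guard, and message wiring are illustrative, not prescriptive.
// Minimal tool-call loop (sketch; assumes the official `openai` Node SDK)
const OpenAI = require('openai');
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function runAgentTurn(messages, depth = 0) {
  if (depth > 5) throw new Error('Tool loop exceeded max turns'); // guard against runaway loops
  const response = await client.chat.completions.create({
    model: 'gpt-4o', // illustrative model name
    messages,
    tools
  });
  const message = response.choices[0].message;
  if (!message.tool_calls) return message; // plain text answer, no tool needed

  messages.push(message);
  for (const toolCall of message.tool_calls) {
    const result = await executeTool(toolCall);
    // Feed the JSON result (success or structured error) back to the model
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(result)
    });
  }
  return runAgentTurn(messages, depth + 1); // let the model read results and continue
}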
Description is the Most Important Field
The LLM decides whether to call your tool based on the description; the example after this list shows the difference. Be explicit about:
- When to use this tool (not just what it does)
- What parameters to generate (e.g. how to form an idempotency key)
- What to do with the response
- Edge cases and error conditions
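For instance, contrast a bare description with one that encodes usage guidance. The cancel_order tool and its "already_shipped" error code here are hypothetical.
// ❌ Bare description: the model knows what the tool does, not when or how to call it
description: 'Cancels an order.'

// ✅ Encodes when to call, what to pass, and what comes back
description: 'Cancel an existing order. Use this only when the user explicitly asks to cancel ' +
  'and the order status is "pending" or "processing". Pass the order_id returned by ' +
  'create_order or get_order_status; never invent one. Returns the updated status. ' +
  'If the order has already shipped, the API returns the error code "already_shipped".'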
Schema Design for LLM Consumption
LLMs read field names and descriptions to understand your data model. Clear, descriptive naming reduces hallucinated values:
// ❌ Opaque field names — agent will guess
{ "amt": 2999, "cur": "usd", "cid": "c_123" }
// ✅ Descriptive — agent understands context
{ "amount_cents": 2999, "currency": "usd", "customer_id": "cust_abc123" }
// ✅ Use enums wherever possible — constrain the agent's choices
{
"status": {
"type": "string",
"enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
"description": "Current order lifecycle status"
}
}
// ✅ Flat responses over deeply nested (agents struggle with deep paths)
// ❌ Avoid:
{ "data": { "order": { "meta": { "id": "123" } } } }
// ✅ Prefer:
{ "order_id": "123", "status": "pending", "total_cents": 2999 }
Idempotency is Critical
AI agents retry aggressively. A network timeout on a payment endpoint will cause the agent to retry — potentially creating a duplicate charge. Every POST and PATCH endpoint with side effects must support idempotency keys.
See our full Idempotency guide for the Redis-backed implementation. For agent APIs, add guidance in the tool description on how to generate a deterministic key.
// Deterministic idempotency key generation for agents
// Key should be stable for the same "intent" — regenerating on retry must return same key
const crypto = require('node:crypto');

function generateIdempotencyKey(userId, action, contextHash) {
return crypto
.createHash('sha256')
.update(`${userId}:${action}:${contextHash}`)
.digest('base64url')
.slice(0, 32);
}
// On the API side: idempotency middleware (see /idempotency for full implementation)
app.post('/orders', idempotencyMiddleware, createOrderHandler);
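For orientation, a minimal Redis-backed sketch of that middleware is below. It assumes a connected node-redis v4 client and omits the locking the full guide adds for concurrent retries.
// Minimal idempotency middleware (sketch; assumes a connected node-redis v4 client)
function idempotencyMiddleware(req, res, next) {
  const key = req.headers['idempotency-key'];
  if (!key) {
    return res.status(400).json({
      error: 'missing_idempotency_key',
      message: 'Provide an Idempotency-Key header on this endpoint.',
      retry: false
    });
  }
  const redisKey = `idem:${key}`;
  redis.get(redisKey).then((cached) => {
    if (cached) return res.status(200).json(JSON.parse(cached)); // replay stored response
    // Capture the response body so a retry with the same key gets the same result
    const originalJson = res.json.bind(res);
    res.json = (body) => {
      redis.set(redisKey, JSON.stringify(body), { EX: 24 * 60 * 60 }); // 24h TTL
      return originalJson(body);
    };
    next();
  }).catch(next);
}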
Rate Limiting for Agent Traffic
A single AI agent workflow can generate thousands of API calls per minute. Standard per-user rate limits designed for humans will not protect against runaway agent loops. Implement agent-aware rate limits:
// Per-agent-id rate limiting middleware
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis'); // recent versions export { RedisStore } instead
const { createClient } = require('redis');
const redis = createClient(); // call redis.connect() during startup
// Identify agent traffic via a header
app.use((req, res, next) => {
req.agentId = req.headers['x-agent-id'] || null;
req.isAgent = !!req.agentId;
next();
});
const agentRateLimit = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: (req) => req.isAgent ? 200 : 60, // agents: 200/min, humans: 60/min
keyGenerator: (req) => req.agentId || req.ip,
store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }),
handler: (req, res) => {
    res.set('Retry-After', '60'); // standard header; well-behaved clients honour it
    res.status(429).json({
error: 'rate_limit_exceeded',
message: 'Too many requests from this agent. Back off and retry.',
retry_after_seconds: 60,
// Structured so the LLM knows exactly what to do
suggested_action: 'Wait 60 seconds, then retry with exponential backoff'
});
}
});
app.use('/api/', agentRateLimit);
// Per-agent request metrics: feed these to your monitoring system to alert on bursts
app.use('/api/', (req, res, next) => {
if (req.isAgent) {
metrics.increment('agent_requests', { agent_id: req.agentId });
}
next();
});
Async Patterns: 202 Accepted + Polling
Long-running operations (report generation, bulk imports, AI processing) must be async. Agents cannot block — they need to poll or receive a webhook callback.
// 202 Accepted pattern: immediate response, poll for completion
app.post('/reports', auth, async (req, res) => {
const jobId = crypto.randomUUID();
// Start async job (queue, background worker)
await jobQueue.add('generate-report', { ...req.body, jobId });
// Return immediately with job reference
res.status(202).json({
job_id: jobId,
status: 'pending',
status_url: `https://api.example.com/jobs/${jobId}`,
estimated_completion_seconds: 30,
// Tell the agent exactly what to do next
instructions: 'Poll the status_url every 5 seconds until status is "completed" or "failed"'
});
});
// Job status endpoint — agent polls this
app.get('/jobs/:jobId', auth, async (req, res) => {
const job = await getJob(req.params.jobId);
if (!job) return res.status(404).json({ error: 'Job not found' });
if (job.status === 'completed') {
return res.json({
job_id: job.id,
status: 'completed',
result_url: `https://api.example.com/reports/${job.reportId}`,
completed_at: job.completedAt
});
}
res.json({
job_id: job.id,
status: job.status, // "pending" | "processing" | "failed"
progress_percent: job.progress || 0,
retry_after_seconds: 5
});
});
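For jobs too long to poll comfortably, the same endpoint can accept an optional callback URL instead. Below is a sketch of the worker-side notification; the callback_url field, WEBHOOK_SECRET, and signing scheme are assumptions, not part of the API above.
// Worker side: notify the caller's callback URL when the job finishes (sketch)
const crypto = require('node:crypto');

async function onJobCompleted(job) {
  if (!job.callbackUrl) return; // caller opted for polling instead
  const payload = {
    job_id: job.id,
    status: 'completed',
    result_url: `https://api.example.com/reports/${job.reportId}`
  };
  const body = JSON.stringify(payload);
  await fetch(job.callbackUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Sign the payload so receivers can verify it came from you (scheme is an assumption)
      'X-Signature': crypto.createHmac('sha256', process.env.WEBHOOK_SECRET)
        .update(body)
        .digest('hex')
    },
    body
  });
}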
Structured Output Requirements
Agents parse your responses programmatically. Every error, edge case, and exception must return structured JSON — never HTML error pages, never plain text.
// Agent-friendly error format — parseable, actionable
{
"error": "insufficient_inventory", // machine-readable code
"message": "Product prod_xyz is out of stock. Available quantity: 0.",
"details": {
"product_id": "prod_xyz",
"requested_quantity": 10,
"available_quantity": 0
},
"suggested_action": "Remove this product from the order or reduce quantity to 0",
"retry": false // should the agent retry this request?
}
// Always set Content-Type: application/json — even for errors
// Configure your error handler to never send HTML
app.use((err, req, res, next) => {
  const status = err.status || 500;
  res.status(status).json({
    error: err.code || 'internal_error',
    message: err.message || 'An unexpected error occurred',
    retry: status >= 500 // server errors may be retryable
  });
});
Agent Authentication
Each AI agent should have its own scoped credentials — not shared credentials with human users. This enables per-agent rate limiting, audit logging, and instant revocation.
// Per-agent API keys with scoped permissions
// Issue via: POST /api-keys { "name": "order-agent-prod", "scopes": ["orders:read", "orders:write"] }
// Agent includes in header: Authorization: Bearer ak_agent_abc123
// OAuth2 Client Credentials for agents (see /oauth2)
// Each agent has its own client_id + client_secret
// Scoped to only the permissions it needs
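A minimal version of that check might look like the sketch below; apiKeys.lookup and the key shape are hypothetical stand-ins for your key store.
// Per-agent key authentication + scope check (sketch; apiKeys.lookup is hypothetical)
function requireScope(scope) {
  return async (req, res, next) => {
    const token = (req.headers.authorization || '').replace('Bearer ', '');
    const key = await apiKeys.lookup(token); // hypothetical key store lookup
    if (!key) return res.status(401).json({ error: 'invalid_api_key', retry: false });
    if (!key.scopes.includes(scope)) {
      return res.status(403).json({
        error: 'insufficient_scope',
        message: `This key lacks the "${scope}" scope.`,
        retry: false
      });
    }
    req.agentId = key.agentId; // ties into the per-agent rate limiting above
    next();
  };
}

app.post('/orders', requireScope('orders:write'), createOrderHandler);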
For full OAuth2 client credentials flow implementation, see our OAuth2 guide. For centralized agent traffic management, use an API gateway with per-agent routing policies.
Observability for Agent Traffic
Agent traffic patterns are fundamentally different from human traffic — same endpoint, machine-speed bursts. Standard dashboards won't surface agent-specific anomalies without segmentation:
// Log agent context on every request
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
logger.info({
method: req.method,
path: req.path,
status: res.statusCode,
duration_ms: Date.now() - start,
agent_id: req.headers['x-agent-id'],
is_agent: !!req.headers['x-agent-id'],
idempotency_key: req.headers['idempotency-key']
});
});
next();
});
Model Context Protocol Bridge
The Model Context Protocol (MCP) standardizes how AI agents discover and call your REST API without custom integration code per LLM provider. An MCP server wraps your REST endpoints as typed "tools" that any MCP-compatible client can call through a standard interface, regardless of the underlying model (Claude, GPT-4, Gemini).
Think of MCP as the "OpenAPI for AI tool calling" — define your tools once, use them across all LLM providers.
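As a rough illustration, a minimal MCP server wrapping the order-status endpoint could look like this. It assumes the official @modelcontextprotocol/sdk TypeScript package (ESM-only) and zod for parameter schemas; the exact API surface may differ between SDK versions.
// Minimal MCP server wrapping the orders API (sketch; ESM-only SDK)
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'orders-api', version: '1.0.0' });

server.tool(
  'get_order_status',
  'Get the current status of an order by its ID.',
  { order_id: z.string().uuid().describe('The order UUID to check') },
  async ({ order_id }) => {
    // Proxy the call through to the REST API defined earlier
    const res = await fetch(`https://api.example.com/orders/${order_id}`, {
      headers: { Authorization: `Bearer ${process.env.API_TOKEN}` }
    });
    return { content: [{ type: 'text', text: JSON.stringify(await res.json()) }] };
  }
);

await server.connect(new StdioServerTransport());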
FAQ
Why are AI agents different from human API consumers?
AI agents are fully automated — there is no human in the loop to resolve ambiguity. They call APIs at machine speed, potentially thousands per minute, and interpret error messages literally. This demands clearer contracts, stricter validation, better error messages, and robust idempotency.
How do I make my REST API available as an OpenAI tool?
Define your endpoint as a JSON Schema function definition with name, description, and parameters. Pass it in the tools array of your OpenAI API call. When the model decides to invoke your endpoint, its response message contains a tool_calls array with the function name and arguments.
Do I need idempotency keys for AI agent APIs?
Yes — critical. AI agents retry aggressively on network errors. Without idempotency, a network timeout can create duplicate orders or payments. Always support an Idempotency-Key header on any POST or PATCH with side effects.
Related Topics
AI agents retry aggressively — robust idempotency is non-negotiable. Use the API gateway layer for per-agent rate limiting and centralized token validation. The Model Context Protocol provides a standard interface for exposing your REST API to any LLM provider.