Designing REST APIs for AI Agents
Idempotency, rate limiting, structured outputs, async patterns, and tool-calling schema design for LLM consumers
Why AI Agents Are Different API Consumers
Gartner predicts that by 2026, more than 30% of API demand growth will come from AI agents — not humans. Designing for agents requires a different mindset from designing for human-driven browsers or mobile apps.
AI agents are fully automated. There is no human in the loop to resolve ambiguity, retry intelligently, or handle edge cases. They call APIs at machine speed — potentially thousands of requests per minute — and they interpret your error messages literally. Your response shape, your error format, and your retry behaviour all become part of your API's functional contract.
| Property | Human Client | AI Agent Client |
|---|---|---|
| Retry behaviour | Manual, rare | Automatic, aggressive |
| Error interpretation | Reads UI message | Parses JSON error field |
| Request rate | ~1–5 req/s | 100–10,000+ req/s |
| Ambiguity handling | Asks user | Fails or hallucinates |
| Long operations | Waits + spins | Needs 202 + polling |
| Documentation | Reads prose docs | Reads field descriptions in schema |
OpenAI / Anthropic Tool Calling Schema
Tool calling (function calling) is the primary interface between LLMs and your REST API. You define each endpoint as a JSON Schema function definition. The LLM reads the description and parameters to decide when and how to invoke your API.
// OpenAI-format tool definition for a REST endpoint
// (Anthropic's format is equivalent but flatter: name, description, input_schema)
const tools = [
{
type: 'function',
function: {
name: 'create_order',
description: 'Create a new customer order. Use this when the user wants to purchase items. ' +
'Returns an order ID and initial status. Always include an idempotency_key ' +
'generated from the conversation context to prevent duplicate orders.',
parameters: {
type: 'object',
required: ['customer_id', 'items'],
additionalProperties: false,
properties: {
customer_id: {
type: 'string',
format: 'uuid',
description: 'The UUID of the customer placing the order'
},
items: {
type: 'array',
minItems: 1,
maxItems: 50,
description: 'List of products and quantities to order',
items: {
type: 'object',
required: ['product_id', 'quantity'],
properties: {
product_id: { type: 'string', format: 'uuid', description: 'Product UUID' },
quantity: { type: 'integer', minimum: 1, maximum: 100, description: 'Units to order' }
}
}
},
idempotency_key: {
type: 'string',
minLength: 8,
maxLength: 64,
description: 'Client-generated unique key to prevent duplicate orders on retry. ' +
'Generate from: sha256(user_id + product_ids + timestamp_minute)'
}
}
}
}
},
{
type: 'function',
function: {
name: 'get_order_status',
description: 'Get the current status of an order by its ID.',
parameters: {
type: 'object',
required: ['order_id'],
additionalProperties: false,
properties: {
order_id: { type: 'string', format: 'uuid', description: 'The order UUID to check' }
}
}
}
}
];
// Execute tool call from LLM response
async function executeTool(toolCall) {
const args = JSON.parse(toolCall.function.arguments);
switch (toolCall.function.name) {
case 'create_order':
return await fetch('https://api.example.com/orders', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer ' + process.env.API_TOKEN,
'Idempotency-Key': args.idempotency_key || crypto.randomUUID() // random fallback won't dedupe retries; prefer the model-supplied deterministic key
},
body: JSON.stringify(args)
}).then(r => r.json());
case 'get_order_status':
return await fetch('https://api.example.com/orders/' + args.order_id, {
headers: { 'Authorization': 'Bearer ' + process.env.API_TOKEN }
}).then(r => r.json());
}
}
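For context, the surrounding tool-call loop might look like the minimal sketch below. It assumes the official openai Node SDK; the model name, depth guard, and message wiring are illustrative, not prescriptive.
// Minimal tool-call loop (sketch; assumes the official `openai` Node SDK)
const OpenAI = require('openai');
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function runAgentTurn(messages, depth = 0) {
  if (depth > 5) throw new Error('Tool loop exceeded max turns'); // guard against runaway loops
  const response = await client.chat.completions.create({
    model: 'gpt-4o', // illustrative model name
    messages,
    tools
  });
  const message = response.choices[0].message;
  if (!message.tool_calls) return message; // plain text answer, no tool needed

  messages.push(message);
  for (const toolCall of message.tool_calls) {
    const result = await executeTool(toolCall);
    // Feed the JSON result (success or structured error) back to the model
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(result)
    });
  }
  return runAgentTurn(messages, depth + 1); // let the model read results and continue
}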
Description is the Most Important Field
The LLM decides whether to call your tool based on the description; the example after this list shows the difference. Be explicit about:
- When to use this tool (not just what it does)
- What parameters to generate (e.g. how to form an idempotency key)
- What to do with the response
- Edge cases and error conditions
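For instance, contrast a bare description with one that encodes usage guidance. The cancel_order tool and its "already_shipped" error code here are hypothetical.
// ❌ Bare description: the model knows what the tool does, not when or how to call it
description: 'Cancels an order.'

// ✅ Encodes when to call, what to pass, and what comes back
description: 'Cancel an existing order. Use this only when the user explicitly asks to cancel ' +
  'and the order status is "pending" or "processing". Pass the order_id returned by ' +
  'create_order or get_order_status; never invent one. Returns the updated status. ' +
  'If the order has already shipped, the API returns the error code "already_shipped".'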
Schema Design for LLM Consumption
LLMs read field names and descriptions to understand your data model. Clear, descriptive naming reduces hallucinated values:
// ❌ Opaque field names — agent will guess
{ "amt": 2999, "cur": "usd", "cid": "c_123" }
// ✅ Descriptive — agent understands context
{ "amount_cents": 2999, "currency": "usd", "customer_id": "cust_abc123" }
// ✅ Use enums wherever possible — constrain the agent's choices
{
"status": {
"type": "string",
"enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
"description": "Current order lifecycle status"
}
}
// ✅ Flat responses over deeply nested (agents struggle with deep paths)
// ❌ Avoid:
{ "data": { "order": { "meta": { "id": "123" } } } }
// ✅ Prefer:
{ "order_id": "123", "status": "pending", "total_cents": 2999 }
Idempotency is Critical
AI agents retry aggressively. A network timeout on a payment endpoint will cause the agent to retry — potentially creating a duplicate charge. Every POST and PATCH endpoint with side effects must support idempotency keys.
See our full Idempotency guide for the Redis-backed implementation. For agent APIs, add guidance in the tool description on how to generate a deterministic key.
// Deterministic idempotency key generation for agents
// Key should be stable for the same "intent" — regenerating on retry must return same key
const crypto = require('node:crypto');

function generateIdempotencyKey(userId, action, contextHash) {
return crypto
.createHash('sha256')
.update(`${userId}:${action}:${contextHash}`)
.digest('base64url')
.slice(0, 32);
}
// On the API side: idempotency middleware (see /idempotency for full implementation)
app.post('/orders', idempotencyMiddleware, createOrderHandler);
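For orientation, a minimal Redis-backed sketch of that middleware is below. It assumes a connected node-redis v4 client and omits the locking the full guide adds for concurrent retries.
// Minimal idempotency middleware (sketch; assumes a connected node-redis v4 client)
function idempotencyMiddleware(req, res, next) {
  const key = req.headers['idempotency-key'];
  if (!key) {
    return res.status(400).json({
      error: 'missing_idempotency_key',
      message: 'Provide an Idempotency-Key header on this endpoint.',
      retry: false
    });
  }
  const redisKey = `idem:${key}`;
  redis.get(redisKey).then((cached) => {
    if (cached) return res.status(200).json(JSON.parse(cached)); // replay stored response
    // Capture the response body so a retry with the same key gets the same result
    const originalJson = res.json.bind(res);
    res.json = (body) => {
      redis.set(redisKey, JSON.stringify(body), { EX: 24 * 60 * 60 }); // 24h TTL
      return originalJson(body);
    };
    next();
  }).catch(next);
}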
Rate Limiting for Agent Traffic
A single AI agent workflow can generate thousands of API calls per minute. Standard per-user rate limits designed for humans will not protect against runaway agent loops. Implement agent-aware rate limits:
// Per-agent-id rate limiting middleware
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis'); // recent versions export { RedisStore } instead
const { createClient } = require('redis');
const redis = createClient(); // call redis.connect() during startup
// Identify agent traffic via a header
app.use((req, res, next) => {
req.agentId = req.headers['x-agent-id'] || null;
req.isAgent = !!req.agentId;
next();
});
const agentRateLimit = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: (req) => req.isAgent ? 200 : 60, // agents: 200/min, humans: 60/min
keyGenerator: (req) => req.agentId || req.ip,
store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }),
handler: (req, res) => {
    res.set('Retry-After', '60'); // standard header; well-behaved clients honour it
    res.status(429).json({
error: 'rate_limit_exceeded',
message: 'Too many requests from this agent. Back off and retry.',
retry_after_seconds: 60,
// Structured so the LLM knows exactly what to do
suggested_action: 'Wait 60 seconds, then retry with exponential backoff'
});
}
});
app.use('/api/', agentRateLimit);
// Per-agent request metrics: feed these to your monitoring system to alert on bursts
app.use('/api/', (req, res, next) => {
if (req.isAgent) {
metrics.increment('agent_requests', { agent_id: req.agentId });
}
next();
});
Async Patterns: 202 Accepted + Polling
Long-running operations (report generation, bulk imports, AI processing) must be async. Agents cannot block — they need to poll or receive a webhook callback.
// 202 Accepted pattern: immediate response, poll for completion
app.post('/reports', auth, async (req, res) => {
const jobId = crypto.randomUUID();
// Start async job (queue, background worker)
await jobQueue.add('generate-report', { ...req.body, jobId });
// Return immediately with job reference
res.status(202).json({
job_id: jobId,
status: 'pending',
status_url: `https://api.example.com/jobs/${jobId}`,
estimated_completion_seconds: 30,
// Tell the agent exactly what to do next
instructions: 'Poll the status_url every 5 seconds until status is "completed" or "failed"'
});
});
// Job status endpoint — agent polls this
app.get('/jobs/:jobId', auth, async (req, res) => {
const job = await getJob(req.params.jobId);
if (!job) return res.status(404).json({ error: 'Job not found' });
if (job.status === 'completed') {
return res.json({
job_id: job.id,
status: 'completed',
result_url: `https://api.example.com/reports/${job.reportId}`,
completed_at: job.completedAt
});
}
res.json({
job_id: job.id,
status: job.status, // "pending" | "processing" | "failed"
progress_percent: job.progress || 0,
retry_after_seconds: 5
});
});
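For jobs too long to poll comfortably, the same endpoint can accept an optional callback URL instead. Below is a sketch of the worker-side notification; the callback_url field, WEBHOOK_SECRET, and signing scheme are assumptions, not part of the API above.
// Worker side: notify the caller's callback URL when the job finishes (sketch)
const crypto = require('node:crypto');

async function onJobCompleted(job) {
  if (!job.callbackUrl) return; // caller opted for polling instead
  const payload = {
    job_id: job.id,
    status: 'completed',
    result_url: `https://api.example.com/reports/${job.reportId}`
  };
  const body = JSON.stringify(payload);
  await fetch(job.callbackUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Sign the payload so receivers can verify it came from you (scheme is an assumption)
      'X-Signature': crypto.createHmac('sha256', process.env.WEBHOOK_SECRET)
        .update(body)
        .digest('hex')
    },
    body
  });
}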
Structured Output Requirements
Agents parse your responses programmatically. Every error, edge case, and exception must return structured JSON — never HTML error pages, never plain text.
// Agent-friendly error format — parseable, actionable
{
"error": "insufficient_inventory", // machine-readable code
"message": "Product prod_xyz is out of stock. Available quantity: 0.",
"details": {
"product_id": "prod_xyz",
"requested_quantity": 10,
"available_quantity": 0
},
"suggested_action": "Remove this product from the order or reduce quantity to 0",
"retry": false // should the agent retry this request?
}
// Always set Content-Type: application/json — even for errors
// Configure your error handler to never send HTML
app.use((err, req, res, next) => {
  const status = err.status || 500;
  res.status(status).json({
    error: err.code || 'internal_error',
    message: err.message || 'An unexpected error occurred',
    retry: status >= 500 // server errors may be retryable
  });
});
Agent Authentication
Each AI agent should have its own scoped credentials — not shared credentials with human users. This enables per-agent rate limiting, audit logging, and instant revocation.
// Per-agent API keys with scoped permissions
// Issue via: POST /api-keys { "name": "order-agent-prod", "scopes": ["orders:read", "orders:write"] }
// Agent includes in header: Authorization: Bearer ak_agent_abc123
// OAuth2 Client Credentials for agents (see /oauth2)
// Each agent has its own client_id + client_secret
// Scoped to only the permissions it needs
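A minimal version of that check might look like the sketch below; apiKeys.lookup and the key shape are hypothetical stand-ins for your key store.
// Per-agent key authentication + scope check (sketch; apiKeys.lookup is hypothetical)
function requireScope(scope) {
  return async (req, res, next) => {
    const token = (req.headers.authorization || '').replace('Bearer ', '');
    const key = await apiKeys.lookup(token); // hypothetical key store lookup
    if (!key) return res.status(401).json({ error: 'invalid_api_key', retry: false });
    if (!key.scopes.includes(scope)) {
      return res.status(403).json({
        error: 'insufficient_scope',
        message: `This key lacks the "${scope}" scope.`,
        retry: false
      });
    }
    req.agentId = key.agentId; // ties into the per-agent rate limiting above
    next();
  };
}

app.post('/orders', requireScope('orders:write'), createOrderHandler);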
For full OAuth2 client credentials flow implementation, see our OAuth2 guide. For centralized agent traffic management, use an API gateway with per-agent routing policies.
Observability for Agent Traffic
Agent traffic patterns are fundamentally different from human traffic — same endpoint, machine-speed bursts. Standard dashboards won't surface agent-specific anomalies without segmentation:
// Log agent context on every request
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
logger.info({
method: req.method,
path: req.path,
status: res.statusCode,
duration_ms: Date.now() - start,
agent_id: req.headers['x-agent-id'],
is_agent: !!req.headers['x-agent-id'],
idempotency_key: req.headers['idempotency-key']
});
});
next();
});
Model Context Protocol Bridge
The Model Context Protocol (MCP) standardizes how AI agents discover and call your REST API without custom integration code per LLM provider. An MCP server wraps your REST endpoints as typed "tools" that any MCP-compatible client can call through a standard interface, regardless of the underlying model (Claude, GPT-4, Gemini).
Think of MCP as the "OpenAPI for AI tool calling" — define your tools once, use them across all LLM providers.
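As a rough illustration, a minimal MCP server wrapping the order-status endpoint could look like this. It assumes the official @modelcontextprotocol/sdk TypeScript package (ESM-only) and zod for parameter schemas; the exact API surface may differ between SDK versions.
// Minimal MCP server wrapping the orders API (sketch; ESM-only SDK)
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'orders-api', version: '1.0.0' });

server.tool(
  'get_order_status',
  'Get the current status of an order by its ID.',
  { order_id: z.string().uuid().describe('The order UUID to check') },
  async ({ order_id }) => {
    // Proxy the call through to the REST API defined earlier
    const res = await fetch(`https://api.example.com/orders/${order_id}`, {
      headers: { Authorization: `Bearer ${process.env.API_TOKEN}` }
    });
    return { content: [{ type: 'text', text: JSON.stringify(await res.json()) }] };
  }
);

await server.connect(new StdioServerTransport());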
FAQ
Why are AI agents different from human API consumers?
AI agents are fully automated — there is no human in the loop to resolve ambiguity. They call APIs at machine speed, potentially thousands per minute, and interpret error messages literally. This demands clearer contracts, stricter validation, better error messages, and robust idempotency.
How do I make my REST API available as an OpenAI tool?
Define your endpoint as a JSON Schema function definition with name, description, and parameters. Pass it in the tools array of your OpenAI API call. When the model decides to invoke your endpoint, its response message contains a tool_calls array with the function name and arguments.
Do I need idempotency keys for AI agent APIs?
Yes — critical. AI agents retry aggressively on network errors. Without idempotency, a network timeout can create duplicate orders or payments. Always support an Idempotency-Key header on any POST or PATCH with side effects.
Related Topics
AI agents retry aggressively — robust idempotency is non-negotiable. Use the API gateway layer for per-agent rate limiting and centralized token validation. The Model Context Protocol provides a standard interface for exposing your REST API to any LLM provider.