Rate Limiting
The Nexus Platform applies rate limiting to protect service availability and ensure fair usage across all consumers. Rate limits are applied per workspace (or per IP address for unauthenticated requests) and are tracked independently for different categories of API endpoints.
Rate Limit Groups
API endpoints are organized into groups, each with its own independent rate limit counter. This means activity on one group does not consume the budget of another. For example, polling for session status will not reduce the number of queries you can send.
| Group | Endpoints | Limit | Window |
|---|---|---|---|
| Authentication | POST /v2/auth/access-tokens, POST /v2/auth/refresh | 20 requests | 60 seconds |
| Query | POST /v2/query | Workspace-specific | 60 seconds |
| Sessions | All /v2/sessions/* endpoints, GET /v2/sessions/:id/status | 300 requests | 60 seconds |
| Feedback | POST /v2/query/feedback, POST /v2/analytics/link-clicks | 500 requests | 60 seconds |
| Data Ingestion | All /v2/data-ingestion/* endpoints | 60 requests | 60 seconds |
| Health | GET /v2/health | 120 requests | 60 seconds |
The query endpoint has a workspace-specific rate limit that may differ from the defaults above. Contact your workspace administrator for details on your configured limits.
Endpoints not listed above fall under a general rate limit of 200 requests per 60 seconds.
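Because each group has its own counter, a client can mirror this server-side scheme locally to avoid sending requests it knows will be rejected. The sketch below is an illustrative client-side fixed-window counter, not the platform's actual implementation; the group names and limits are taken from the table above:

```javascript
// Minimal client-side fixed-window counter (illustrative; the server remains
// the source of truth for the real limits).
class FixedWindowLimiter {
  constructor(limit, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0;
    this.count = 0;
  }

  // Returns true if a request may be sent now, false if the local budget
  // for the current window is exhausted.
  tryAcquire(now = Date.now()) {
    if (now - this.windowStart >= this.windowMs) {
      // New window: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// One independent counter per group, mirroring the table above.
const limiters = {
  sessions: new FixedWindowLimiter(300),
  feedback: new FixedWindowLimiter(500),
  dataIngestion: new FixedWindowLimiter(60),
};
```

A local guard like this only reduces wasted calls; the response headers described next remain the authoritative signal.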
Response Headers
When rate limiting is active, responses include standard headers to help you track your usage:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum number of requests allowed in the current window |
| X-RateLimit-Remaining | Number of requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the current window resets |
| Retry-After | Seconds to wait before retrying (present only on 429 responses) |
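These headers can be used for proactive pacing: spread your remaining budget over the rest of the window instead of bursting and hitting the limit. The helper below is a sketch under the assumption that the headers are available as a plain object of strings; the function name and fallback values are illustrative:

```javascript
// Sketch: compute how long to wait before the next request, based on the
// X-RateLimit-* headers documented above. `headers` is assumed to be a plain
// object of header-name -> string value.
function pacingDelayMs(headers, nowSeconds = Math.floor(Date.now() / 1000)) {
  const remaining = parseInt(headers['X-RateLimit-Remaining'] ?? '1', 10);
  const reset = parseInt(headers['X-RateLimit-Reset'] ?? `${nowSeconds}`, 10);
  const windowLeft = Math.max(reset - nowSeconds, 0);
  if (remaining > 0) {
    // Spread the remaining budget evenly over the rest of the window.
    return Math.floor((windowLeft * 1000) / remaining);
  }
  // Budget exhausted: wait until the window resets.
  return windowLeft * 1000;
}
```

With a fetch Response you would pass `Object.fromEntries(response.headers)` (or read each header with `response.headers.get`) to adapt it to this shape.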
Handling Rate Limits
When you exceed a rate limit, the API responds with HTTP status 429 Too Many Requests:
```json
{
  "statusCode": 429,
  "message": "Rate limit exceeded. Try again in 42 seconds."
}
```
Recommended Approach
Use the Retry-After header to wait the appropriate amount of time before retrying:
```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Fall back to 60 seconds if the header is missing.
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      console.log(`Rate limited. Retrying in ${retryAfter}s...`);
      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
```
Best Practices
- Respect Retry-After: always use the value from the response header rather than a fixed delay
- Add jitter: when multiple clients retry simultaneously, add a small random delay to avoid thundering-herd effects
- Monitor remaining requests: use the X-RateLimit-Remaining header to proactively slow down before hitting the limit
- Batch where possible: for data ingestion, use batching to reduce the number of API calls (see Data Ingestion API)
- Cache responses: avoid redundant API calls by caching results that don't change frequently (e.g., session status)
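The first two practices can be combined in a small helper: take the base wait from the Retry-After header and add a random offset so that simultaneous clients do not all retry at the same instant. This is a sketch; the 1000 ms jitter cap is an illustrative choice, not a platform requirement:

```javascript
// Backoff = Retry-After (from the response header) plus up to `jitterCapMs`
// of random jitter to avoid thundering-herd retries.
function backoffMs(retryAfterSeconds, jitterCapMs = 1000) {
  return retryAfterSeconds * 1000 + Math.floor(Math.random() * jitterCapMs);
}
```

In the fetchWithRetry example above, the wait line would become `await new Promise(resolve => setTimeout(resolve, backoffMs(retryAfter)))`.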
Repeatedly sending requests after receiving a 429 response without waiting for the Retry-After period will not resolve faster and may extend the cooldown.