Rate Limiting
The Nexus Platform applies rate limiting to protect service availability and ensure fair usage across all consumers. Rate limits are applied per workspace (or per IP address for unauthenticated requests) and are tracked independently for different categories of API endpoints.
Rate Limit Groups
API endpoints are organized into groups, each with its own independent rate limit counter. This means activity on one group does not consume the budget of another. For example, polling for session status will not reduce the number of queries you can send.
| Group | Endpoints | Limit | Window |
|---|---|---|---|
| Authentication | POST /v2/auth/access-tokens, POST /v2/auth/refresh | 20 requests | 60 seconds |
| Query | POST /v2/query | Workspace-specific | 60 seconds |
| Sessions | All /v2/sessions/* endpoints, GET /v2/sessions/:id/status | 300 requests | 60 seconds |
| Feedback | POST /v2/query/feedback, POST /v2/analytics/link-clicks | 500 requests | 60 seconds |
| Data Ingestion | All /v2/data-ingestion/* endpoints | 60 requests | 60 seconds |
| Health | GET /v2/health | 120 requests | 60 seconds |
The query endpoint has a workspace-specific rate limit that may differ from the defaults above. Contact your workspace administrator for details on your configured limits.
Endpoints not listed above fall under a general rate limit of 200 requests per 60 seconds.
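Because each group has its own counter, a client can mirror this server-side scheme locally to avoid sending requests it knows will be rejected. The sketch below is an illustrative client-side fixed-window counter, not the platform's actual implementation; the group names and limits are taken from the table above:

```javascript
// Minimal client-side fixed-window counter (illustrative; the server remains
// the source of truth for the real limits).
class FixedWindowLimiter {
  constructor(limit, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0;
    this.count = 0;
  }

  // Returns true if a request may be sent now, false if the local budget
  // for the current window is exhausted.
  tryAcquire(now = Date.now()) {
    if (now - this.windowStart >= this.windowMs) {
      // New window: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// One independent counter per group, mirroring the table above.
const limiters = {
  sessions: new FixedWindowLimiter(300),
  feedback: new FixedWindowLimiter(500),
  dataIngestion: new FixedWindowLimiter(60),
};
```

A local guard like this only reduces wasted calls; the response headers described next remain the authoritative signal.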
Response Headers
When rate limiting is active, responses include standard headers to help you track your usage:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum number of requests allowed in the current window |
| X-RateLimit-Remaining | Number of requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the current window resets |
| Retry-After | Seconds to wait before retrying (present only on 429 responses) |
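These headers can be used for proactive pacing: spread your remaining budget over the rest of the window instead of bursting and hitting the limit. The helper below is a sketch under the assumption that the headers are available as a plain object of strings; the function name and fallback values are illustrative:

```javascript
// Sketch: compute how long to wait before the next request, based on the
// X-RateLimit-* headers documented above. `headers` is assumed to be a plain
// object of header-name -> string value.
function pacingDelayMs(headers, nowSeconds = Math.floor(Date.now() / 1000)) {
  const remaining = parseInt(headers['X-RateLimit-Remaining'] ?? '1', 10);
  const reset = parseInt(headers['X-RateLimit-Reset'] ?? `${nowSeconds}`, 10);
  const windowLeft = Math.max(reset - nowSeconds, 0);
  if (remaining > 0) {
    // Spread the remaining budget evenly over the rest of the window.
    return Math.floor((windowLeft * 1000) / remaining);
  }
  // Budget exhausted: wait until the window resets.
  return windowLeft * 1000;
}
```

With a fetch Response you would pass `Object.fromEntries(response.headers)` (or read each header with `response.headers.get`) to adapt it to this shape.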
Handling Rate Limits
When you exceed a rate limit, the API responds with HTTP status 429 Too Many Requests:
```json
{
  "statusCode": 429,
  "message": "Rate limit exceeded. Try again in 42 seconds."
}
```
Recommended Approach
Use the Retry-After header to wait the appropriate amount of time before retrying:
```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Fall back to 60 seconds if the header is missing.
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
      console.log(`Rate limited. Retrying in ${retryAfter}s...`);
      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
```
Best Practices
- Respect Retry-After: always use the value from the response header rather than a fixed delay
- Add jitter: when multiple clients retry simultaneously, add a small random delay to avoid thundering-herd effects
- Monitor remaining requests: use the X-RateLimit-Remaining header to proactively slow down before hitting the limit
- Batch where possible: for data ingestion, use batching to reduce the number of API calls (see Data Ingestion API)
- Cache responses: avoid redundant API calls by caching results that don't change frequently (e.g., session status)
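The first two practices can be combined in a small helper: take the base wait from the Retry-After header and add a random offset so that simultaneous clients do not all retry at the same instant. This is a sketch; the 1000 ms jitter cap is an illustrative choice, not a platform requirement:

```javascript
// Backoff = Retry-After (from the response header) plus up to `jitterCapMs`
// of random jitter to avoid thundering-herd retries.
function backoffMs(retryAfterSeconds, jitterCapMs = 1000) {
  return retryAfterSeconds * 1000 + Math.floor(Math.random() * jitterCapMs);
}
```

In the fetchWithRetry example above, the wait line would become `await new Promise(resolve => setTimeout(resolve, backoffMs(retryAfter)))`.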
Repeatedly sending requests after receiving a 429 response without waiting for the Retry-After period will not resolve faster and may extend the cooldown.