Data Ingestion API
The Data Ingestion API enables bulk import of content into your knowledge bases. Use this API to programmatically ingest large volumes of data from external systems, migrate content, or integrate with content management platforms.
Overview
The data-ingestion API provides asynchronous job-based processing for importing content at scale. Key features include:
- Bulk data import - Efficiently ingest hundreds or thousands of records in a single request
- Asynchronous processing - Jobs are processed in the background with progress tracking
- Multiple content formats - JSON payloads or multipart file uploads
- Flexible adapters - Generic JSON adapter or specialized formats
- Automatic deduplication - Content-hash based duplicate detection
- Progress tracking - Real-time monitoring of job processing
- Workspace isolation - All jobs are scoped to your workspace (and, where specified, a knowledge base)
Authentication
All data-ingestion endpoints require bearer token authentication with specific scopes.
For complete authentication details including token refresh and security best practices, see the Authentication Guide.
Quick Start
- Create an access token client in your workspace with the required scopes
- Obtain an access token using the client credentials flow
- Include the token in your requests using the Authorization header
Required Scopes
Create an access token client with these scopes:
- data-ingestion:write - Required for creating and cancelling jobs
- data-ingestion:read - Required for listing and checking job status
Or use * (full access) for all operations.
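Before calling the API, it can help to confirm that your token actually carries the scope an endpoint requires, so scope problems surface as a clear local error rather than a 403. The helper below is a minimal sketch that assumes the access token is a standard JWT with a space-delimited scope claim; the actual claim layout is defined by the Authentication Guide, so treat this as a convenience check only:

```javascript
// Sketch: check whether an access token carries a required scope.
// Assumption: the token is a standard JWT whose payload contains a
// space-delimited "scope" claim (and "*" means full access).
function tokenHasScope(accessToken, requiredScope) {
  const parts = accessToken.split(".");
  if (parts.length !== 3) return false; // not a JWT
  const payload = JSON.parse(
    Buffer.from(parts[1], "base64url").toString("utf8"),
  );
  const scopes = (payload.scope || "").split(" ");
  return scopes.includes("*") || scopes.includes(requiredScope);
}
```

For example, tokenHasScope(token, "data-ingestion:write") before a POST or DELETE call, and "data-ingestion:read" before a GET.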
Authentication Header
Include your access token in all requests:
{
"Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
Quick Start
1. Submit an Ingestion Job
Submit a bulk ingestion job with the generic JSON adapter:
// Submit a bulk data ingestion job with JSON payload
const response = await fetch("https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer YOUR_ACCESS_TOKEN",
},
body: JSON.stringify({
adapterType: "generic-json",
knowledgeBaseId: "550e8400-e29b-41d4-a716-446655440000",
data: {
records: [
{
id: "doc-123",
title: "Product Documentation",
content: "This is the main content of the document...",
contentType: "markdown",
metadata: {
author: "John Doe",
category: "technical",
version: "1.0",
},
url: "https://example.com/docs/product",
tags: ["documentation", "product", "technical"],
},
{
id: "doc-456",
title: "API Reference",
content: "Complete API reference documentation...",
contentType: "markdown",
metadata: {
author: "Jane Smith",
category: "api",
},
tags: ["api", "reference"],
},
],
},
}),
});
const job = await response.json();
console.log(job);
// {
// jobId: "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a",
// status: "pending"
// }
2. Check Job Status
Monitor the job’s progress:
// Check the status of an ingestion job
const jobId = "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a";
const response = await fetch(
`https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs/${jobId}?includePayload=false`,
{
method: "GET",
headers: {
Authorization: "Bearer YOUR_ACCESS_TOKEN",
},
},
);
const job = await response.json();
console.log(job);
// {
// id: "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a",
// status: "completed",
// data: {
// progress: {
// processed: 50,
// total: 50,
// successful: 45,
// failed: 2,
// skipped: 3
// },
// // ... more details
// }
// }
3. Wait for Completion
Poll the job status until processing completes:
// Poll job status until completion
async function waitForJobCompletion(jobId, accessToken) {
const maxAttempts = 60; // 5 minutes with 5-second intervals
const pollInterval = 5000; // 5 seconds
for (let attempt = 0; attempt < maxAttempts; attempt++) {
const response = await fetch(
`https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs/${jobId}`,
{
headers: { Authorization: `Bearer ${accessToken}` },
},
);
const job = await response.json();
if (job.status === "completed") {
console.log("Job completed successfully!");
console.log(`Created: ${job.result.entriesCreated} entries`);
console.log(`Skipped: ${job.result.entriesSkipped} entries`);
console.log(`Generated: ${job.result.embeddingsGenerated} embeddings`);
return job;
}
if (job.status === "failed") {
console.error("Job failed:", job.result.error);
throw new Error(job.result.error);
}
// Still processing
console.log(
`Progress: ${job.data.progress.processed}/${job.data.progress.total}`,
);
await new Promise((resolve) => setTimeout(resolve, pollInterval));
}
throw new Error("Job polling timeout");
}
// Usage
const jobId = "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a";
const result = await waitForJobCompletion(jobId, "YOUR_ACCESS_TOKEN");
Endpoints
POST /v2/data-ingestion/jobs
Submit a new bulk data ingestion job.
URL: POST /api/v2/data-ingestion/jobs
Authentication: Bearer token with data-ingestion:write scope
Content-Type: application/json or multipart/form-data
Request Format (JSON)
Required Fields:
adapterType (string)
The adapter to use for processing your data. Use the /adapters endpoint to discover available adapters.
{
"adapterType": "generic-json"
}
data (object)
Adapter-specific payload containing your records. Format depends on the adapter type.
For the generic-json adapter:
{
"data": {
"records": [
{
"id": "doc-123",
"title": "Example Document",
"content": "Content here...",
"contentType": "markdown",
"metadata": {
"author": "John Doe"
},
"tags": [
"example"
]
}
]
}
}
knowledgeBaseId (string, UUID)
UUID of the knowledge base to associate ingested entries with. All successfully created entries will be linked to this knowledge base.
{
"knowledgeBaseId": "550e8400-e29b-41d4-a716-446655440000"
}
Request Format (Multipart)
For large payloads exceeding 30MB, use multipart/form-data:
// Submit a large payload using multipart/form-data
const formData = new FormData();
formData.append("adapterType", "generic-json");
formData.append("knowledgeBaseId", "550e8400-e29b-41d4-a716-446655440000");
// Create a JSON file blob for large payloads
const payload = {
records: [
// ... many records
],
};
const jsonBlob = new Blob([JSON.stringify(payload)], {
type: "application/json",
});
formData.append("data", jsonBlob, "data.json");
const response = await fetch("https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs", {
method: "POST",
headers: {
Authorization: "Bearer YOUR_ACCESS_TOKEN",
// Content-Type is set automatically by FormData
},
body: formData,
});
const job = await response.json();
console.log(job.jobId);
Form Fields:
- adapterType (text field, required)
- knowledgeBaseId (text field, required, UUID)
- data (file or text field, required) - Must be a JSON file or JSON string
Response (202 Accepted)
{
"jobId": "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a",
"status": "pending"
}
The job is created immediately and processed asynchronously in the background.
GET /v2/data-ingestion/jobs
List ingestion jobs with optional filtering.
URL: GET /api/v2/data-ingestion/jobs
Authentication: Bearer token with data-ingestion:read scope
Query Parameters
status (optional): Filter by job status
- Values: pending, processing, completed, failed
limit (optional): Results per page
- Range: 1-100
- Default: 20
offset (optional): Records to skip
- Minimum: 0
- Default: 0
includePayload (optional): Include raw input data
- Default: false
- Set to true to include the original submitted payload
startDate (optional): Filter jobs created after date (ISO 8601)
endDate (optional): Filter jobs created before date (ISO 8601)
Example
// List ingestion jobs with filtering and pagination
const params = new URLSearchParams({
status: "completed",
limit: "20",
offset: "0",
startDate: "2025-01-01",
endDate: "2025-12-31",
});
const response = await fetch(`https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs?${params}`, {
method: "GET",
headers: {
Authorization: "Bearer YOUR_ACCESS_TOKEN",
},
});
const result = await response.json();
console.log(`Total jobs: ${result.total}`);
console.log(`Retrieved: ${result.jobs.length} jobs`);
result.jobs.forEach((job) => {
console.log(
`${job.id}: ${job.status} (${job.data.progress.processed}/${job.data.progress.total})`,
);
});
Response (200 OK)
Note: The structure of ingestion results payloads is subject to change as we refine job reporting and progress tracking.
{
"total": 150,
"jobs": [
{
"id": "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a",
"status": "completed",
"data": {
"adapterId": "generic-json",
"adapterType": "generic-json",
"recordCount": 50,
"progress": {
"processed": 50,
"total": 50,
"successful": 45,
"failed": 2,
"skipped": 3
},
"results": [
{
"sourceId": "doc-123",
"status": "success",
"entryId": "e8f5a3b1-2c4d-5e6f-7a8b-9c0d1e2f3a4b",
"embeddingCount": 12
}
]
},
"result": {
"sourceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"entriesCreated": 45,
"entriesSkipped": 3,
"entriesDeleted": 0,
"embeddingsGenerated": 540
},
"started_at": "2025-10-30T18:05:00.000Z",
"completed_at": "2025-10-30T18:08:30.000Z"
}
]
}
GET /v2/data-ingestion/jobs/:jobId
Get detailed status of a specific job.
URL: GET /api/v2/data-ingestion/jobs/:jobId
Authentication: Bearer token with data-ingestion:read scope
Query Parameters
includePayload (optional): Include raw input data
- Default: false
Response (200 OK)
Returns the same structure as a single job from the list endpoint.
DELETE /v2/data-ingestion/jobs/:jobId
Cancel a pending or processing job.
URL: DELETE /api/v2/data-ingestion/jobs/:jobId
Authentication: Bearer token with data-ingestion:write scope
Example
// Cancel a pending or processing job
const jobId = "7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a";
const response = await fetch(`https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs/${jobId}`, {
method: "DELETE",
headers: {
Authorization: "Bearer YOUR_ACCESS_TOKEN",
},
});
if (response.status === 204) {
console.log("Job cancelled successfully");
} else {
const error = await response.json();
console.error("Failed to cancel job:", error.message);
}
Response (204 No Content)
Empty response body on successful cancellation.
Note: Only jobs with status pending or processing can be cancelled. Completed or failed jobs cannot be cancelled.
GET /v2/data-ingestion/adapters
List available data ingestion adapters.
URL: GET /api/v2/data-ingestion/adapters
Authentication: Bearer token with data-ingestion:read scope
Example
// Get list of available data ingestion adapters
const response = await fetch("https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/adapters", {
method: "GET",
headers: {
Authorization: "Bearer YOUR_ACCESS_TOKEN",
},
});
const adapters = await response.json();
console.log(adapters);
// [
// { name: "generic-json", format: "json", version: "1.0.0" },
// { name: "content-hub", format: "content-hub-json", version: "1.0.0" }
// ]
Response (200 OK)
[
{
"name": "generic-json",
"format": "json",
"version": "1.0.0"
},
{
"name": "content-hub",
"format": "content-hub-json",
"version": "1.0.0"
}
]
Adapters
Note: Ingestion adapters and their payload formats are subject to change as we iterate on integration efforts with various content sources.
Adapters transform your data into a standardized format for ingestion into knowledge bases.
Generic JSON Adapter
Adapter Type: generic-json
Use Case: Simple bulk imports where data is already in a flat, standardized JSON structure.
Input Format:
{
"records": [
{
"id": "string",
"title": "string",
"content": "string",
"contentType": "markdown",
"metadata": {},
"deleted": false,
"url": "string",
"tags": [
"string"
]
}
]
}
Field Details:
- id (required): Unique identifier for the record
- title (optional): Document title (auto-generated if omitted)
- content (required): Main text content
- contentType (optional): Content classification (text, markdown, code, or qa_pair)
- metadata (optional): Flexible key-value pairs for custom data
- deleted (optional): Set to true to delete an existing entry
- url (optional): External reference URL
- tags (optional): Array of tags for categorization
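Given those field rules, a small client-side builder can catch malformed records before a job is submitted. This is a sketch for convenience only: the field names match the documented input format, but the checks here are not the server's actual schema validation, which remains authoritative.

```javascript
// Sketch: assemble a generic-json record, enforcing the documented
// required fields and contentType values client-side. The server's
// own schema validation is authoritative; this just fails fast.
function buildRecord({ id, content, title, contentType, metadata, url, tags }) {
  if (!id || typeof id !== "string") throw new Error("id is required");
  if (!content || typeof content !== "string") {
    throw new Error("content is required");
  }
  const allowedTypes = ["text", "markdown", "code", "qa_pair"];
  if (contentType && !allowedTypes.includes(contentType)) {
    throw new Error(`contentType must be one of: ${allowedTypes.join(", ")}`);
  }
  const record = { id, content };
  if (title) record.title = title;
  if (contentType) record.contentType = contentType;
  if (metadata) record.metadata = metadata;
  if (url) record.url = url;
  if (tags) record.tags = tags;
  return record;
}
```

Records produced this way drop straight into the data.records array of a job submission.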
Content Hub Adapter
Adapter Type: content-hub
Use Case: Importing content with HTML-encoded content and inline attributes from content management systems.
Input Format:
[
[
{
"content": {
"documentIdentifier": "Page://content/acme/resource-center/article-123",
"parseImage": false,
"deleted": false,
"inlineContent": {
"textContent": {
"data": "<h1>Title</h1><p>Content...</p>"
},
"type": "TEXT"
}
},
"metadata": {
"inlineAttributes": [
{
"key": "Title",
"value": {
"stringValue": "Article Title",
"type": "STRING"
}
},
{
"key": "keywords",
"value": {
"stringValue": "business,enterprise",
"type": "STRING"
}
}
]
}
}
],
{
"timeTaken": "250ms"
}
]
Processing:
- HTML entities are decoded
- HTML is converted to Markdown
- Inline attributes are converted to metadata
- Title is extracted from the Title attribute or the first heading
- Tags are extracted from keywords and custom entity fields
- Content type is detected based on taxonomy
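The attribute-to-metadata step above runs server-side, but it can be handy to preview what your inlineAttributes will flatten into before submitting. The sketch below approximates that mapping for STRING-typed attributes; it is an illustration of the payload shape, not the adapter's actual implementation:

```javascript
// Sketch: flatten Content Hub inlineAttributes into a plain metadata
// object, approximating the adapter's attribute-to-metadata step.
// Only STRING-typed values are handled here.
function attributesToMetadata(inlineAttributes) {
  const metadata = {};
  for (const attr of inlineAttributes) {
    if (attr.value && attr.value.type === "STRING") {
      metadata[attr.key] = attr.value.stringValue;
    }
  }
  return metadata;
}
```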
Job Processing
Lifecycle
Jobs follow this lifecycle:
- Pending - Job created, waiting to start
- Processing - Job is actively processing records
- Completed - All records processed successfully (or with tracked failures)
- Failed - Job failed due to unrecoverable error
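The lifecycle above splits into non-terminal states (pending, processing) and terminal states (completed, failed). Encoding that split in one place keeps polling and cancellation logic consistent, since terminal jobs never change status again and cancellation of them is rejected with 409. A small sketch:

```javascript
// Sketch: classify job statuses from the documented lifecycle.
// Terminal jobs never change status again, so polling can stop;
// only non-terminal jobs can be cancelled via DELETE.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);

function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

function isCancellable(status) {
  return status === "pending" || status === "processing";
}
```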
Background Processing
When you submit a job, it is processed asynchronously:
- Job Creation - Job is created and queued for processing
- Processing - Each record is validated, deduplicated, and transformed
- Completion - Job status is updated with results summary
You can poll the job status endpoint to track progress and retrieve results when complete.
Automatic Features
Deduplication: Content is automatically checked against existing entries to prevent duplicates.
Embedding Generation: Content is automatically chunked and embedded into vectors for semantic search, enabling AI-powered retrieval.
Error Handling
400 Bad Request
Request validation failed. Common causes:
- Invalid JSON format - Malformed request body
- Missing required fields - adapterType or data not provided
- Unknown adapter - Adapter type not recognized
- Payload validation failed - Data doesn’t match adapter’s schema
- Invalid file type - Multipart data field is not JSON
Example:
{
"statusCode": 400,
"error": "Bad Request",
"message": "Unknown adapter: invalid-adapter",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs"
}
401 Unauthorized
Authentication failed. Common causes:
- No access token provided - Missing Authorization header
- Invalid token format - Malformed bearer token
- Expired access token - Token has passed expiration time
- Invalid token - Token signature verification failed
Example:
{
"statusCode": 401,
"error": "Unauthorized",
"message": "Access token is required",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs"
}
403 Forbidden
Insufficient permissions. Common causes:
- Missing scope - Access token lacks data-ingestion:write or data-ingestion:read scope
- Workspace access denied - Resource belongs to a different workspace
Example:
{
"statusCode": 403,
"error": "Forbidden",
"message": "Insufficient scope: data-ingestion:write required",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs"
}
404 Not Found
Resource not found. Common causes:
- Job not found - Job ID doesn’t exist or belongs to different workspace
- Knowledge base not found - Specified knowledgeBaseId doesn’t exist
Example:
{
"statusCode": 404,
"error": "Not Found",
"message": "Job 7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a not found",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs/7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a"
}
409 Conflict
Request conflicts with current state. Common causes:
- Job already terminal - Attempting to cancel a completed or failed job
Example:
{
"statusCode": 409,
"error": "Conflict",
"message": "Job 7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a is already completed",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs/7c9e8e8a-5b5a-4f5d-8b5e-5f5d5b5a5b5a"
}
500 Internal Server Error
Unexpected server error. This indicates a problem with the platform.
Example:
{
"statusCode": 500,
"error": "Internal Server Error",
"message": "An unexpected error occurred",
"timestamp": "2025-10-30T18:10:00.000Z",
"path": "/v2/data-ingestion/jobs"
}
Error Handling Best Practices
Implement robust error handling in your integration:
// Robust error handling for data ingestion
async function submitIngestionJob(payload, accessToken) {
try {
const response = await fetch("https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${accessToken}`,
},
body: JSON.stringify(payload),
});
if (!response.ok) {
const error = await response.json();
switch (response.status) {
case 400:
console.error("Validation error:", error.message);
if (error.errors) {
error.errors.forEach((err) => {
console.error(`- ${err.path}: ${err.message}`);
});
}
break;
case 401:
console.error("Authentication failed:", error.message);
console.error(
"Check that your access token is valid and not expired",
);
break;
case 403:
console.error("Permission denied:", error.message);
console.error(
"Ensure your access token has data-ingestion:write scope",
);
break;
case 404:
console.error("Knowledge base not found:", error.message);
break;
default:
console.error("Unexpected error:", error.message);
}
throw new Error(error.message);
}
return await response.json();
} catch (error) {
if (error.name === "TypeError") {
console.error("Network error - check API URL and connectivity");
}
throw error;
}
}
// Usage
try {
const job = await submitIngestionJob(
{
adapterType: "generic-json",
data: {
records: [
/* ... */
],
},
},
"YOUR_ACCESS_TOKEN",
);
console.log("Job submitted:", job.jobId);
} catch (error) {
console.error("Failed to submit job:", error.message);
}
Best Practices
Job Monitoring
Poll with exponential backoff:
Start with short intervals and increase progressively to reduce API calls:
// Poll with exponential backoff. sleep is a small setTimeout wrapper;
// checkJobStatus wraps GET /api/v2/data-ingestion/jobs/:jobId.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let interval = 1000; // Start with 1 second
const maxInterval = 30000; // Max 30 seconds
while (job.status === "pending" || job.status === "processing") {
  await sleep(interval);
  job = await checkJobStatus(jobId);
  interval = Math.min(interval * 1.5, maxInterval);
}
Store job IDs immediately:
Save the returned jobId to your database immediately after job creation to enable recovery from crashes:
const { jobId } = await submitJob(payload);
await db.jobs.create({ id: jobId, status: "pending" });
File Size Limits
Default limit: 30MB for multipart uploads
Recommendations:
- For payloads under 1MB: Use JSON format
- For payloads 1MB-30MB: Use multipart format
- For payloads over 30MB: Split into multiple jobs
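Those thresholds can be applied programmatically by measuring the serialized payload, since the JSON string is what actually crosses the wire. A minimal sketch:

```javascript
// Sketch: pick a submission strategy from the serialized payload size,
// following the documented thresholds (1MB and 30MB).
const MB = 1024 * 1024;

function chooseSubmission(payload) {
  const bytes = Buffer.byteLength(JSON.stringify(payload), "utf8");
  if (bytes > 30 * MB) return "split"; // split into multiple jobs
  if (bytes > 1 * MB) return "multipart"; // multipart/form-data upload
  return "json"; // plain application/json body
}
```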
Large Datasets
Batch your records:
Submit jobs in batches of 100-500 records for optimal processing:
const batchSize = 250;
for (let i = 0; i < allRecords.length; i += batchSize) {
const batch = allRecords.slice(i, i + batchSize);
await submitJob({ adapterType: "generic-json", data: { records: batch } });
}
Rate limiting:
Respect platform rate limits by implementing delays between job submissions:
await submitJob(payload);
await sleep(500); // 500ms delay between submissions
cURL Examples
Complete Command-Line Reference
# Submit a job
curl -X POST "https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-d '{
"adapterType": "generic-json",
"knowledgeBaseId": "550e8400-e29b-41d4-a716-446655440000",
"data": {
"records": [
{
"id": "doc-123",
"title": "Example Document",
"content": "Document content here..."
}
]
}
}'
# Check job status
curl -X GET "https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs/JOB_ID" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN"
# List jobs
curl -X GET "https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs?status=completed&limit=10" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN"
# Cancel a job
curl -X DELETE "https://nexus-api.uat.knowbl.com/api/v2/data-ingestion/jobs/JOB_ID" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN"
Next Steps
- Query API - Query your ingested content with AI
- Sessions API - Track multi-turn conversations
- API Reference - Interactive Swagger documentation