Batches
Process bulk workloads asynchronously at 50% cost vs. realtime. Submit a JSONL of requests, poll until done, download results.
Ideal for: corpus embedding, large eval runs, batch summarization. 24-hour completion SLA (most batches complete in minutes).
Endpoints
| Method | Path | What it does |
|---|---|---|
| POST | /v1/batches | Create a batch from an uploaded input file |
| GET | /v1/batches | List your batches (newest first; ?limit=N, max 100) |
| GET | /v1/batches/{batch_id} | Status + counters + output/error file ids |
| POST | /v1/batches/{batch_id}/cancel | Cancel a batch (only if not yet in terminal state) |
Request body (create)
| Field | Type | Required | Description |
|---|---|---|---|
| input_file_id | string | yes | File id from /v1/files upload with purpose=batch_input |
| endpoint | string | yes | One of /v1/chat/completions, /v1/embeddings, /v1/rerank. All requests in the batch must target this endpoint. |
| completion_window | string | no | "24h" (only supported value; we usually finish in minutes) |
| metadata | object | no | Free-form key/value tags returned on every fetch. Use for your own bookkeeping. |
Supported endpoints inside a batch
| Endpoint | Use for |
|---|---|
| /v1/chat/completions | Bulk chat / summarization / classification |
| /v1/embeddings | Bulk corpus embedding |
| /v1/rerank | Bulk reranking |
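A client-side check can catch a bad create request before any file is uploaded. The following is a minimal sketch based only on the field rules listed above; `validate_create_params` is a hypothetical helper, not part of any SDK, and the server remains the source of truth for validation.

```python
ALLOWED_ENDPOINTS = {"/v1/chat/completions", "/v1/embeddings", "/v1/rerank"}

def validate_create_params(input_file_id, endpoint, completion_window="24h", metadata=None):
    """Return a create-request body, or raise ValueError on a bad field.

    Hypothetical helper: it mirrors the documented field rules only.
    """
    if not isinstance(input_file_id, str) or not input_file_id:
        raise ValueError("input_file_id must be a non-empty string")
    if endpoint not in ALLOWED_ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    if completion_window != "24h":
        raise ValueError('completion_window must be "24h"')
    body = {
        "input_file_id": input_file_id,
        "endpoint": endpoint,
        "completion_window": completion_window,
    }
    if metadata is not None:
        if not isinstance(metadata, dict):
            raise ValueError("metadata must be an object")
        body["metadata"] = metadata
    return body
```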
Input JSONL format
One request per line; each line is a JSON object with custom_id + method + url + body:
{"custom_id":"doc-001","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 1"]}}
{"custom_id":"doc-002","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 2"]}}
{"custom_id":"doc-003","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 3"]}}
custom_id is your identifier, returned in the output. Use it to match results back to your records.
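Generating the input file programmatically avoids hand-written JSON mistakes. The sketch below builds embedding requests in the line format shown above; the helper names and the `doc-NNN` id scheme are illustrative assumptions, and the model name is taken from the sample lines.

```python
import json

def build_embedding_lines(texts, model="epithre-embed", prefix="doc"):
    """Return one JSONL line per text, each with a unique custom_id."""
    lines = []
    for i, text in enumerate(texts, start=1):
        request = {
            "custom_id": f"{prefix}-{i:03d}",  # assumed id scheme, any unique string works
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": [text]},
        }
        # json.dumps escapes newlines inside strings, so each request stays on one line
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines

def write_jsonl(path, lines):
    """Write the lines to disk, one request per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_jsonl("requests.jsonl", build_embedding_lines(["text 1", "text 2"]))` produces a file in the format shown above.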
End-to-end example
import json
import time

# 1) Upload input (assumes `client` is an already-configured SDK client)
with open("requests.jsonl", "rb") as f:
    input_file = client.files.create(file=f, purpose="batch_input")

# 2) Create batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id)  # batch-abc123...

# 3) Poll until the batch reaches a terminal state
while batch.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(5)
    batch = client.batches.retrieve(batch.id)

# 4) Download output (output_file_id is null unless the batch completed)
if batch.status == "completed":
    out = client.files.content(batch.output_file_id)
    for line in out.text.splitlines():
        result = json.loads(line)
        if result["error"] is None:
            emb = result["response"]["body"]["data"][0]["embedding"]
            # ... store in your DB by result["custom_id"]
Batch object shape
{
  "id": "batch-abc123...",
  "object": "batch",
  "endpoint": "/v1/embeddings",
  "input_file_id": "file-input...",
  "output_file_id": "file-output...",
  "error_file_id": null,
  "completion_window": "24h",
  "status": "completed",
  "request_counts": {"total": 1000, "completed": 998, "failed": 2},
  "created_at": 1778765001,
  "in_progress_at": 1778765010,
  "completed_at": 1778765320,
  "expires_at": 1781357001,
  "metadata": {"job": "weekly-reembed"},
  "errors": null
}
output_file_id is null until status is completed. error_file_id is set only if there were per-line errors.
Output JSONL format
Output file mirrors input order. Each line corresponds to one request:
{"id": "batch_req_abc...", "custom_id": "doc-001", "response": {"status_code": 200, "body": {"object":"list","data":[...],"usage":{...}}}, "error": null}
{"id": "batch_req_def...", "custom_id": "doc-002", "response": null, "error": {"message": "...", "type": "invalid_request_error"}}
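Since each output line carries either a response or an error, it is often convenient to split the file into two maps keyed by custom_id. This is a minimal sketch assuming the record shape shown above; `split_results` is a hypothetical helper name.

```python
import json

def split_results(output_text):
    """Parse output JSONL into ({custom_id: response body}, {custom_id: error})."""
    succeeded, failed = {}, {}
    for line in output_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if record.get("error") is None:
            succeeded[record["custom_id"]] = record["response"]["body"]
        else:
            failed[record["custom_id"]] = record["error"]
    return succeeded, failed
```

Keying by custom_id rather than by line position keeps the matching step independent of the order in which results are written.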
Batch states
| Status | Meaning |
|---|---|
| validating | Pre-flight checks (file exists, lines parse) |
| in_progress | Worker processing requests |
| completed | All requests done, output_file_id ready |
| failed | Fatal error, error_file_id has per-line errors |
| cancelled | Cancelled by user via POST /v1/batches/{id}/cancel |
| expired | Batch passed its expires_at (default 30 days post-creation) |
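A polling loop should stop on any of the four terminal states, not only on completed. The sketch below treats completed, failed, cancelled, and expired as terminal and backs off between polls. The function name, the delay values, and the injectable `retrieve` and `sleep` parameters are assumptions made for illustration and testability.

```python
import time

TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def wait_for_batch(retrieve, batch_id, initial_delay=5.0, max_delay=60.0, sleep=time.sleep):
    """Call retrieve(batch_id) until the returned batch is in a terminal state.

    The delay doubles after each poll, capped at max_delay (assumed values).
    """
    delay = initial_delay
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With the client from the end-to-end example, this would be called as `wait_for_batch(client.batches.retrieve, batch.id)`.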
Cancellation
batch = client.batches.cancel(batch_id="batch-abc123...")
Cancellation flips status to cancelled immediately at the database level. In-flight requests for that batch may still complete (the worker checks status between requests), but no new requests will be dispatched. Any output produced up to the cancel point is discarded.
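Because cancellation only applies to batches that are not yet terminal, a caller may want to check the status first. The sketch below uses only the `retrieve` and `cancel` calls shown in this document; `cancel_if_active` is a hypothetical wrapper. A batch can still finish between the two calls, and how the API responds in that case is not specified here.

```python
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def cancel_if_active(client, batch_id):
    """Cancel the batch unless it is already terminal; return the latest batch object."""
    batch = client.batches.retrieve(batch_id)
    if batch.status in TERMINAL_STATES:
        return batch  # nothing left to cancel
    return client.batches.cancel(batch_id=batch_id)
```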
Limits
- Max 50,000 requests per batch
- One batch = one endpoint (mixed endpoints not allowed)
- 24-hour completion SLA (most batches finish in minutes)
- Output + error files auto-expire 30 days post-completion
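Workloads larger than the per-batch cap have to be split across several batches. A minimal sketch of that split, assuming the requests are already serialized as JSONL lines in a list (`chunk_requests` is a hypothetical helper):

```python
MAX_REQUESTS_PER_BATCH = 50_000  # per-batch cap from the limits above

def chunk_requests(lines, size=MAX_REQUESTS_PER_BATCH):
    """Yield consecutive slices of at most `size` request lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(lines), size):
        yield lines[start:start + size]
```

Each yielded chunk can then be written to its own input file and submitted as a separate batch.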
Pricing
50% off list price on all token costs:
- Chat: 0.5x base input/output rates
- Embed: 0.5x input token rate
- Rerank: 0.5x per-document rate
Stacks with prompt cache: cache_read inside a batch = 0.1x * 0.5x = 0.05x of base input, i.e. 95% off (one twentieth of the base input rate).
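The stacked discount can be checked with a few lines of arithmetic. The sketch below applies the multipliers stated above to a caller-supplied base rate. The per-million-token unit and the function name are assumptions for illustration, since no list prices are given here.

```python
BATCH_DISCOUNT = 0.5         # 50% off list price inside a batch
CACHE_READ_MULTIPLIER = 0.1  # cache_read rate relative to base input

def batch_input_cost(base_rate_per_mtok, fresh_tokens, cached_tokens=0):
    """Input-token cost inside a batch, for a base rate per million tokens."""
    fresh = fresh_tokens / 1_000_000 * base_rate_per_mtok * BATCH_DISCOUNT
    cached = (cached_tokens / 1_000_000 * base_rate_per_mtok
              * CACHE_READ_MULTIPLIER * BATCH_DISCOUNT)
    return fresh + cached
```

With a hypothetical base rate of 2.00 per million tokens, one million uncached plus one million cache-read input tokens would cost 1.00 + 0.10 = 1.10 in a batch, versus 2.00 + 0.20 = 2.20 in realtime.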
Getting notified on completion
Use webhooks instead of polling. Register a webhook for the batch.completed and batch.failed events. We POST to your URL when the batch reaches one of those terminal states.
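On the receiving side, a handler only needs to dispatch on the event name. The webhook payload shape is not documented in this section, so the sketch below assumes a JSON body with a top-level `type` field and the batch id under `data.id`. Both field names are hypothetical and should be checked against the actual payload.

```python
TERMINAL_EVENTS = {"batch.completed", "batch.failed"}

def handle_webhook(payload, on_completed, on_failed):
    """Dispatch a decoded webhook payload; return True if it was handled."""
    event_type = payload.get("type")              # assumed field name
    batch_id = payload.get("data", {}).get("id")  # assumed field name
    if event_type not in TERMINAL_EVENTS or batch_id is None:
        return False  # unknown event or malformed payload: ignore
    if event_type == "batch.completed":
        on_completed(batch_id)
    else:
        on_failed(batch_id)
    return True
```

The callbacks would typically fetch the batch with `client.batches.retrieve` and then download the output file, as in the end-to-end example.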