Batches

Process bulk workloads asynchronously at 50% of the realtime price. Submit a JSONL file of requests, poll until the batch is done, then download the results.

Ideal for: corpus embedding, large eval runs, batch summarization. 24-hour completion SLA (most batches complete in minutes).

Endpoints

Method  Path                            What it does
POST    /v1/batches                     Create a batch from an uploaded input file
GET     /v1/batches                     List your batches (newest first; ?limit=N, max 100)
GET     /v1/batches/{batch_id}          Status + counters + output/error file ids
POST    /v1/batches/{batch_id}/cancel   Cancel a batch (only if not yet in a terminal state)

Request body (create)

Field              Type    Required  Description
input_file_id      string  yes       File id from /v1/files upload with purpose=batch_input
endpoint           string  yes       One of /v1/chat/completions, /v1/embeddings, /v1/rerank. All requests in the batch must target this endpoint.
completion_window  string  no        "24h" (only supported value; we usually finish in minutes)
metadata           object  no        Free-form key/value tags returned on every fetch. Use for your own bookkeeping.
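
As a rough illustration of the fields above, the following sketch assembles the JSON body for POST /v1/batches and rejects endpoints the table does not list. The helper name build_create_body is hypothetical and not part of any SDK; the file id is the placeholder used elsewhere on this page.

```python
# Endpoints taken from the "endpoint" row of the request body table.
ALLOWED_ENDPOINTS = {"/v1/chat/completions", "/v1/embeddings", "/v1/rerank"}


def build_create_body(input_file_id, endpoint, metadata=None):
    """Return a dict usable as the JSON body of POST /v1/batches.

    Hypothetical helper: it only validates and assembles fields, it sends nothing.
    """
    if endpoint not in ALLOWED_ENDPOINTS:
        raise ValueError(f"unsupported batch endpoint: {endpoint!r}")
    body = {
        "input_file_id": input_file_id,
        "endpoint": endpoint,
        "completion_window": "24h",  # the only supported value
    }
    if metadata:
        # Free-form tags, echoed back on every fetch of the batch.
        body["metadata"] = dict(metadata)
    return body


body = build_create_body(
    "file-input...",  # placeholder id, as in the batch object example
    "/v1/embeddings",
    metadata={"job": "weekly-reembed"},
)
```

Validating the endpoint client-side is optional; it simply surfaces a typo before the input file is uploaded and the batch is created.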

Supported endpoints inside a batch

Endpoint               Use for
/v1/chat/completions   Bulk chat / summarization / classification
/v1/embeddings         Bulk corpus embedding
/v1/rerank             Bulk reranking

Input JSONL format

One request per line; each line is a JSON object with custom_id + method + url + body:

{"custom_id":"doc-001","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 1"]}}
{"custom_id":"doc-002","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 2"]}}
{"custom_id":"doc-003","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 3"]}}

custom_id is your identifier, returned in the output. Use it to match results back to your records.
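
A minimal sketch of generating such a file from a list of texts, assuming one text per request and sequential custom_id values in the doc-001 style shown above. The function name build_embedding_lines and the prefix parameter are illustrative, not part of the API.

```python
import json


def build_embedding_lines(texts, model="epithre-embed", prefix="doc"):
    """Return one JSONL line per text, each a batch request for /v1/embeddings."""
    lines = []
    for i, text in enumerate(texts, start=1):
        request = {
            "custom_id": f"{prefix}-{i:03d}",  # must be unique within the file
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": [text]},
        }
        # Compact separators keep each request on a single line.
        lines.append(json.dumps(request, separators=(",", ":")))
    return lines


lines = build_embedding_lines(["text 1", "text 2", "text 3"])
jsonl = "\n".join(lines) + "\n"
```

Writing jsonl to requests.jsonl yields a file that can be uploaded with purpose=batch_input. Keeping a mapping from custom_id to your own record ids makes it straightforward to join results later.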

End-to-end example

# 1) Upload input (assumes `client` is an already-initialized SDK client)
with open("requests.jsonl", "rb") as f:
    input_file = client.files.create(file=f, purpose="batch_input")

# 2) Create batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id)  # batch-abc123...

# 3) Poll until the batch reaches a terminal state
import time
while batch.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(5)
    batch = client.batches.retrieve(batch.id)

# 4) Download output (output_file_id is only set once status is "completed")
import json
if batch.status == "completed":
    out = client.files.content(batch.output_file_id)
    for line in out.text.splitlines():
        result = json.loads(line)
        if result["error"] is None:
            emb = result["response"]["body"]["data"][0]["embedding"]
            # ... store in your DB by result["custom_id"]

Batch object shape

{
  "id": "batch-abc123...",
  "object": "batch",
  "endpoint": "/v1/embeddings",
  "input_file_id": "file-input...",
  "output_file_id": "file-output...",
  "error_file_id": null,
  "completion_window": "24h",
  "status": "completed",
  "request_counts": {"total": 1000, "completed": 998, "failed": 2},
  "created_at": 1778765001,
  "in_progress_at": 1778765010,
  "completed_at": 1778765320,
  "expires_at": 1781357001,
  "metadata": {"job": "weekly-reembed"},
  "errors": null
}

output_file_id is null until status is completed. error_file_id is set only if there were per-line errors.

Output JSONL format

Output file mirrors input order. Each line corresponds to one request:

{"id": "batch_req_abc...", "custom_id": "doc-001",
 "response": {"status_code": 200, "body": {"object":"list","data":[...],"usage":{...}}},
 "error": null}
{"id": "batch_req_def...", "custom_id": "doc-002",
 "response": null, "error": {"message": "...", "type": "invalid_request_error"}}
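
One way to consume this file is to split it into successes and failures keyed by custom_id. The sketch below assumes each record sits on a single physical line (the examples above are wrapped for readability); split_results is a hypothetical helper, and the sample records use made-up ids, messages, and embedding values.

```python
import json


def split_results(jsonl_text):
    """Split batch output into (ok, failed) dicts keyed by custom_id."""
    ok, failed = {}, {}
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        if record.get("error") is None:
            ok[record["custom_id"]] = record["response"]["body"]
        else:
            failed[record["custom_id"]] = record["error"]
    return ok, failed


# Made-up sample data in the shape documented above.
sample = "\n".join([
    json.dumps({"id": "batch_req_1", "custom_id": "doc-001",
                "response": {"status_code": 200,
                             "body": {"object": "list",
                                      "data": [{"embedding": [0.1, 0.2]}]}},
                "error": None}),
    json.dumps({"id": "batch_req_2", "custom_id": "doc-002",
                "response": None,
                "error": {"message": "bad input",
                          "type": "invalid_request_error"}}),
])

ok, failed = split_results(sample)
```

Failed custom_id values can then be collected into a new input file and resubmitted as a smaller batch.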

Batch states

Status       Meaning
validating   Pre-flight checks (file exists, lines parse)
in_progress  Worker processing requests
completed    All requests done, output_file_id ready
failed       Fatal error, error_file_id has per-line errors
cancelled    Cancelled by user via POST /v1/batches/{batch_id}/cancel
expired      Batch passed its expires_at (default 30 days post-creation)
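
A polling loop should stop on any terminal state, not only completed. The sketch below treats completed, failed, cancelled, and expired as terminal (an inference from the table above) and backs off between polls. wait_for_batch and its retrieve callable are hypothetical: retrieve stands in for whatever fetches GET /v1/batches/{batch_id} and is assumed here to return a dict with a "status" key.

```python
import time

# Assumed terminal states, inferred from the status table.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def wait_for_batch(retrieve, batch_id, interval=5.0, max_interval=60.0,
                   timeout=24 * 3600, sleep=time.sleep):
    """Poll retrieve(batch_id) until the batch is terminal; return the last fetch.

    Raises TimeoutError once the accumulated sleep time reaches `timeout` seconds.
    """
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch["status"] in TERMINAL_STATES:
            return batch
        if waited >= timeout:
            raise TimeoutError(f"batch {batch_id} still {batch['status']}")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Stand-in for a real API call so the sketch runs without network access.
_states = iter(["validating", "in_progress", "in_progress", "completed"])


def fake_retrieve(batch_id):
    return {"id": batch_id, "status": next(_states)}


final = wait_for_batch(fake_retrieve, "batch-abc123", sleep=lambda seconds: None)
```

Injecting sleep makes the loop testable without waiting; in real use, leave the default and pass a function that wraps your client's retrieve call.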

Cancellation

batch = client.batches.cancel(batch_id="batch-abc123...")

Cancellation flips status to cancelled immediately at the database level. In-flight requests for that batch may still complete (the worker checks status between requests), but no new requests will be dispatched. Any output produced up to the cancel point is discarded.

Limits

Pricing

50% off list price on all token costs.

The batch discount stacks with the prompt cache: a cache_read inside a batch costs 0.1x * 0.5x = 0.05x of the base input price, i.e. 20x cheaper than an uncached realtime read (95% off).
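
The stacking arithmetic can be sketched as follows. The multipliers come from the line above; the base price of 2.00 per million input tokens is a made-up figure for illustration only, and input_cost is a hypothetical helper.

```python
BATCH_MULTIPLIER = 0.5       # 50% off list price inside a batch
CACHE_READ_MULTIPLIER = 0.1  # cache_read is billed at 0.1x of base input


def input_cost(tokens, base_price_per_million, batch=False, cache_read=False):
    """Return the input-token cost with the applicable discounts multiplied together."""
    multiplier = 1.0
    if batch:
        multiplier *= BATCH_MULTIPLIER
    if cache_read:
        multiplier *= CACHE_READ_MULTIPLIER
    return tokens / 1_000_000 * base_price_per_million * multiplier


BASE_PRICE = 2.00  # hypothetical list price per million input tokens

realtime = input_cost(10_000_000, BASE_PRICE)
batched = input_cost(10_000_000, BASE_PRICE, batch=True)
batched_cached = input_cost(10_000_000, BASE_PRICE, batch=True, cache_read=True)
```

With these assumed numbers, the batched and cached cost is one twentieth of the realtime cost, matching the 0.05x figure.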

Getting notified on completion

Use webhooks instead of polling. Register a webhook for the batch.completed and batch.failed events. We POST to your URL when the batch reaches a terminal state.
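
A receiving endpoint mostly needs to filter on the event type and hand the batch id to your own processing. The sketch below is framework-agnostic and makes a loud assumption: the payload shape {"type": ..., "data": {"id": ...}} is NOT documented on this page and is invented purely for illustration, as is the handle_webhook name. Check the webhook reference for the real schema.

```python
import json

# Event names taken from the text above.
BATCH_EVENTS = {"batch.completed", "batch.failed"}


def handle_webhook(raw_body):
    """Parse a webhook POST body; return a small summary dict or None.

    ASSUMPTION: the payload looks like {"type": "...", "data": {"id": "..."}}.
    This shape is hypothetical and must be replaced with the documented one.
    """
    event = json.loads(raw_body)
    if event.get("type") not in BATCH_EVENTS:
        return None  # ignore events this handler does not care about
    return {
        "batch_id": event["data"]["id"],
        "succeeded": event["type"] == "batch.completed",
    }


# Example with a made-up payload in the assumed shape.
summary = handle_webhook(
    json.dumps({"type": "batch.completed", "data": {"id": "batch-abc123"}})
)
```

On a batch.completed event, a handler like this would typically retrieve the batch by id and download its output file, exactly as in step 4 of the end-to-end example.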

See also