Batches

Process bulk workloads asynchronously at 50% of the realtime price. Submit a JSONL file of requests, poll until the batch finishes, then download the results.

Ideal for: corpus embedding, large eval runs, batch summarization. 24-hour completion SLA (most batches complete in minutes).

Endpoints

Method	Path	What it does
POST	`/v1/batches`	Create a batch from an uploaded input file
GET	`/v1/batches`	List your batches (newest first; `?limit=N`, max 100)
GET	`/v1/batches/{batch_id}`	Status + counters + output/error file ids
POST	`/v1/batches/{batch_id}/cancel`	Cancel a batch (only if not yet in terminal state)

Request body (create)

Field	Type	Required	Description
`input_file_id`	string	yes	File id from `/v1/files` upload with `purpose=batch_input`
`endpoint`	string	yes	One of `/v1/chat/completions`, `/v1/embeddings`, `/v1/rerank`. All requests in the batch must target this endpoint.
`completion_window`	string	no	`"24h"` (only supported value; we usually finish in minutes)
`metadata`	object	no	Free-form key/value tags returned on every fetch. Use for your own bookkeeping.
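
As an illustration of the fields above, the following sketch assembles a create-batch request body and rejects unsupported endpoints before anything is sent. The helper name `build_create_body` is hypothetical and not part of any SDK; only the field names and allowed values come from the table.

```python
# Endpoints accepted by the `endpoint` field, per the table above.
ALLOWED_ENDPOINTS = {"/v1/chat/completions", "/v1/embeddings", "/v1/rerank"}


def build_create_body(input_file_id, endpoint, metadata=None):
    """Build the JSON body for POST /v1/batches (hypothetical helper)."""
    if endpoint not in ALLOWED_ENDPOINTS:
        raise ValueError(f"unsupported batch endpoint: {endpoint!r}")
    body = {
        "input_file_id": input_file_id,
        "endpoint": endpoint,
        "completion_window": "24h",  # the only supported value
    }
    if metadata:
        body["metadata"] = metadata  # optional free-form tags
    return body
```

Validating the endpoint client-side is a design choice for faster feedback; the API remains the source of truth for what it accepts.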

Supported endpoints inside a batch

Endpoint	Use for
`/v1/chat/completions`	Bulk chat / summarization / classification
`/v1/embeddings`	Bulk corpus embedding
`/v1/rerank`	Bulk reranking

Input JSONL format

One request per line; each line is a JSON object with custom_id + method + url + body:

{"custom_id":"doc-001","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 1"]}}
{"custom_id":"doc-002","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 2"]}}
{"custom_id":"doc-003","method":"POST","url":"/v1/embeddings","body":{"model":"epithre-embed","input":["text 3"]}}

custom_id is your identifier, returned in the output. Use it to match results back to your records.
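
A rough sketch of producing such a file from a list of texts follows. The helper names and the `doc-NNN` id scheme are illustrative assumptions; the model name is the one used in the sample lines above.

```python
import json


def build_embedding_lines(texts, model="epithre-embed", prefix="doc"):
    """Return one JSONL line per text, each targeting /v1/embeddings."""
    lines = []
    for i, text in enumerate(texts, start=1):
        request = {
            "custom_id": f"{prefix}-{i:03d}",  # hypothetical id scheme, echoed back in the output
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": [text]},
        }
        # json.dumps without `indent` emits no newlines, so each request stays on one line.
        lines.append(json.dumps(request))
    return lines


def write_jsonl(lines, path):
    """Write the request lines to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_jsonl(build_embedding_lines(["text 1", "text 2"]), "requests.jsonl")` would produce a two-line input file.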

End-to-end example

# Assumes `client` is an already-initialized SDK client.
import json
import time

# 1) Upload input
with open("requests.jsonl", "rb") as f:
    input_file = client.files.create(file=f, purpose="batch_input")

# 2) Create batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id)  # batch-abc123...

# 3) Poll until the batch reaches a terminal state
while batch.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(5)
    batch = client.batches.retrieve(batch.id)

# 4) Download output (output_file_id is null unless status is completed)
if batch.status == "completed":
    out = client.files.content(batch.output_file_id)
    for line in out.text.splitlines():
        result = json.loads(line)
        if result["error"] is None:
            emb = result["response"]["body"]["data"][0]["embedding"]
            # ... store in your DB by result["custom_id"]
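
Step 4 above skips any line whose error is set. A minimal sketch of splitting an output file into successes and failures, keyed by custom_id, might look like the following. The function name `split_results` is an assumption, and the code assumes each record occupies exactly one physical line of the output file.

```python
import json


def split_results(output_text):
    """Return (successes, failures) dicts keyed by custom_id."""
    successes, failures = {}, {}
    for line in output_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("error") is None:
            # Successful request: keep the response body.
            successes[record["custom_id"]] = record["response"]["body"]
        else:
            # Failed request: keep the error object for later retry or logging.
            failures[record["custom_id"]] = record["error"]
    return successes, failures
```

Keeping failures keyed by custom_id makes it straightforward to rebuild a smaller input file containing only the requests that need a retry.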

Batch object shape

{
  "id": "batch-abc123...",
  "object": "batch",
  "endpoint": "/v1/embeddings",
  "input_file_id": "file-input...",
  "output_file_id": "file-output...",
  "error_file_id": null,
  "completion_window": "24h",
  "status": "completed",
  "request_counts": {"total": 1000, "completed": 998, "failed": 2},
  "created_at": 1778765001,
  "in_progress_at": 1778765010,
  "completed_at": 1778765320,
  "expires_at": 1781357001,
  "metadata": {"job": "weekly-reembed"},
  "errors": null
}

output_file_id is null until status is completed. error_file_id is set only if there were per-line errors.

Output JSONL format

The output file mirrors input order. Each line corresponds to one request (the records below are wrapped across lines for readability):

{"id": "batch_req_abc...", "custom_id": "doc-001",
 "response": {"status_code": 200, "body": {"object":"list","data":[...],"usage":{...}}},
 "error": null}
{"id": "batch_req_def...", "custom_id": "doc-002",
 "response": null, "error": {"message": "...", "type": "invalid_request_error"}}

Batch states

Status	Meaning
`validating`	Pre-flight checks (file exists, lines parse)
`in_progress`	Worker processing requests
`completed`	All requests done, output_file_id ready
`failed`	Fatal error, error_file_id has per-line errors
`cancelled`	Cancelled by user via `POST /v1/batches/{id}/cancel`
`expired`	Batch passed its expires_at (default 30 days post-creation)
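
Given the states above, a polling loop should stop on any of the four terminal states, not only `completed`. The sketch below adds exponential backoff. It assumes a `retrieve` callable that returns the batch as a dict with a `status` key; that callable, the function name, and the delay defaults are all illustrative rather than part of the API.

```python
import time

# States after which the batch status no longer changes (assumed from the table above).
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def wait_for_batch(retrieve, batch_id, initial_delay=5.0, max_delay=60.0, sleep=time.sleep):
    """Poll `retrieve(batch_id)` until the batch is terminal, doubling the delay each time."""
    delay = initial_delay
    while True:
        batch = retrieve(batch_id)
        if batch["status"] in TERMINAL_STATES:
            return batch
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.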

Cancellation

batch = client.batches.cancel(batch_id="batch-abc123...")

Cancellation flips status to cancelled immediately at the database level. In-flight requests for that batch may still complete (the worker checks status between requests), but no new requests will be dispatched. Any output produced up to the cancel point is discarded.
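
To make these semantics concrete, here is a toy model of a worker that checks the status between requests. It is purely illustrative and says nothing about how the service is actually implemented; `simulate_worker` and its `cancel_during` parameter are hypothetical.

```python
def simulate_worker(requests, cancel_during=None):
    """Toy model: return (final_status, finished_requests, retained_output).

    `cancel_during` is the index of the request that is in flight when the
    cancel call arrives, or None if the batch is never cancelled.
    """
    status = "in_progress"
    finished = []
    for index, request in enumerate(requests):
        if status == "cancelled":
            break  # status check between requests: nothing new is dispatched
        if index == cancel_during:
            status = "cancelled"  # cancel lands while this request is in flight
        finished.append(request)  # the in-flight request still runs to completion
    if status == "cancelled":
        return status, finished, []  # output produced so far is discarded
    return "completed", finished, list(finished)
```

In this model, cancelling during the second of three requests lets that request finish, skips the third, and leaves no retained output.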

Limits

Max 50,000 requests per batch
One batch = one endpoint (mixed endpoints not allowed)
24-hour completion SLA (most batches finish in minutes)
Output + error files auto-expire 30 days post-completion
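
Workloads larger than the per-batch cap need to be split across several batches. One possible way to do that, assuming the request lines are already in memory as a list (the helper name is illustrative):

```python
MAX_REQUESTS_PER_BATCH = 50_000  # limit stated above


def chunk_requests(lines, size=MAX_REQUESTS_PER_BATCH):
    """Split a list of request lines into consecutive chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [lines[i:i + size] for i in range(0, len(lines), size)]
```

Each chunk would then be written to its own input file and submitted as a separate batch, all targeting the same endpoint within a given chunk.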

Pricing

50% off list price on every supported endpoint:

Chat: 0.5x base input/output rates
Embed: 0.5x input token rate
Rerank: 0.5x per-document rate

Stacks with prompt caching: cache_read tokens inside a batch cost 0.1x * 0.5x = 0.05x of the base input rate, a 20x reduction.
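
The stacking arithmetic can be sketched as a small cost estimator. The multipliers come from the text above; the function name and the per-token rate used in the usage note are made-up placeholders, not published prices.

```python
BATCH_MULTIPLIER = 0.5       # batch discount on list price
CACHE_READ_MULTIPLIER = 0.1  # cache_read rate relative to base input


def batch_input_cost(base_rate_per_token, uncached_tokens, cached_tokens=0):
    """Estimate input cost for a batch, with cached tokens billed at 0.05x base."""
    uncached = uncached_tokens * base_rate_per_token * BATCH_MULTIPLIER
    cached = (cached_tokens * base_rate_per_token
              * CACHE_READ_MULTIPLIER * BATCH_MULTIPLIER)
    return uncached + cached
```

With a placeholder base rate of 2.0 per token, 100 uncached tokens plus 100 cached tokens would come to 100.0 + 10.0 = 110.0, versus 400.0 at the full realtime, uncached rate.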

Getting notified on completion

Use webhooks instead of polling. Register a webhook for the batch.completed and batch.failed events. We POST to your URL when the batch reaches a terminal state.
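
A receiving endpoint could dispatch on the event name along these lines. The payload shape is not documented here, so the `type` and `data.id` fields below are assumptions to verify against a real delivery; the function and callback names are likewise hypothetical.

```python
import json


def handle_webhook(raw_body, on_completed, on_failed):
    """Dispatch a webhook payload; return True if the event was handled."""
    event = json.loads(raw_body)
    event_type = event.get("type")              # ASSUMED field name
    batch_id = event.get("data", {}).get("id")  # ASSUMED field path
    if event_type == "batch.completed":
        on_completed(batch_id)
        return True
    if event_type == "batch.failed":
        on_failed(batch_id)
        return True
    return False  # unknown event: ignore rather than fail
```

Returning False for unrecognized events keeps the handler tolerant of event types added later.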