Knowledge and retrieval

POST /v1/retrieval

Upload PDF, TXT, or Markdown documents as knowledge files. Epithre automatically extracts the text, chunks it recursively, embeds the chunks with epithre-embed, and indexes them in a per-customer vector store. Queries are ranked by cosine similarity. The result is turnkey RAG without building your own chunking pipeline.

Upload a knowledge file

r = client.files.create(
    file=open("hotel_handbook.pdf", "rb"),
    purpose="knowledge",
)
print(r.id)        # file-abc...
print(r.status)    # initially "uploaded"

# A background processor extracts, chunks, and embeds the file asynchronously.
# Poll the file's status until it moves from "uploaded" to "processed"
# (usually under 30 s for typical documents).

Only processed files are queryable.
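
The polling step described above can be sketched as a small helper. This page does not show how to re-fetch a file's status, so the helper takes a caller-supplied function. The names wait_until_processed and fetch_status, and the timeout values, are illustrative assumptions, not part of the Epithre SDK.

```python
import time
from typing import Callable


def wait_until_processed(
    fetch_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> None:
    """Block until fetch_status() returns "processed", or raise on timeout.

    fetch_status is a hypothetical callable that re-fetches the file and
    returns its current status string ("uploaded" or "processed").
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "processed":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Pass in whatever call your client uses to look up the file again. Run retrieval queries against the file only after the helper returns.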

Retrieval query

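import os

import httpx

EPITHRE_KEY = os.environ["EPITHRE_KEY"]  # assumes your API key is exported under this name

resp = httpx.post(
    "https://api.epithre.com/v1/retrieval",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "file_ids": ["file-abc..."],
        # Indonesian: "reservation cancellation policy for foreign nationals"
        "query": "kebijakan pembatalan reservasi WNA",
        "top_k": 5,
    },
)
resp.raise_for_status()  # fail loudly on HTTP errors
r = resp.json()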

for hit in r["results"]:
    print(f"[{hit['score']:.3f}] chunk_{hit['chunk_index']} from {hit['file_id']}")
    print(hit["text"])

Request body

Field        Type    Required  Description
query        string  yes       Search query text (embedded server-side).
file_ids     array   no        Restrict the search to specific files. If omitted, all of your processed knowledge files are searched.
top_k        int     no        Number of results to return, 1-50 (default 5).
instruction  string  no        Retrieval instruction prefix. Improves relevance for specific task domains.
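
As a rough sketch of how these fields fit together, the helper below builds a request body and enforces the documented top_k range on the client side. The function name build_retrieval_payload is hypothetical. Leaving unset optional fields out of the body is an assumption about good practice here, not documented server behavior.

```python
from typing import Any, Optional, Sequence


def build_retrieval_payload(
    query: str,
    file_ids: Optional[Sequence[str]] = None,
    top_k: int = 5,
    instruction: Optional[str] = None,
) -> dict[str, Any]:
    """Build a /v1/retrieval request body, leaving out unset optional fields."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    payload: dict[str, Any] = {"query": query, "top_k": top_k}
    if file_ids:  # omitted means: search all processed knowledge files
        payload["file_ids"] = list(file_ids)
    if instruction:
        payload["instruction"] = instruction
    return payload
```

The returned dictionary can be passed directly as the json= argument of the httpx.post call.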

Response shape

{
  "object": "list",
  "results": [
    {"file_id": "file-abc...", "chunk_index": 12, "text": "...", "score": 0.83},
    {"file_id": "file-abc...", "chunk_index": 5,  "text": "...", "score": 0.71}
  ],
  "query_tokens": 8,
  "billed_tokens": 8
}

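score is the cosine similarity between the query embedding and the chunk embedding, reported in the [0, 1] range; higher means more relevant. Because the vectors are L2-normalized, it equals their dot product.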
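
To illustrate the normalization remark: once vectors are L2-normalized, cosine similarity reduces to a plain dot product. The vectors below are toy values chosen for easy arithmetic, not real epithre-embed output, and the helper names are illustrative.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query_vec = [3.0, 4.0, 0.0]  # toy vectors, both of length 5
chunk_vec = [4.0, 3.0, 0.0]

cos = cosine_similarity(query_vec, chunk_vec)
dot_of_normalized = dot(l2_normalize(query_vec), l2_normalize(chunk_vec))

# Both routes give the same number: 24 / 25.
assert math.isclose(cos, dot_of_normalized)
print(round(cos, 2))  # 0.96
```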

Supported file types

Format             Notes
PDF (.pdf)         Text extracted via PyMuPDF. Page breaks are preserved as [Page N] markers.
Plain text (.txt)  UTF-8, read as-is.
Markdown (.md)     UTF-8, read as-is. Markdown structure is preserved in chunks.

Office documents (.docx, .pptx) and HTML are not accepted directly; convert them to PDF or Markdown first.
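
A pre-upload check along these lines can catch unsupported formats before a request is made. It is a minimal sketch that looks only at the file extension. The names SUPPORTED_SUFFIXES and is_supported_knowledge_file are hypothetical, and how the server itself validates uploads is not specified here.

```python
from pathlib import Path

# Extensions taken from the supported-formats table above.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md"}


def is_supported_knowledge_file(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```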

Chunking

Files are split with a recursive character splitter that targets 1000 characters per chunk with a 200-character overlap. Paragraph boundaries are the preferred breakpoints; the splitter falls back to sentence boundaries, and then to a hard character split for very long contiguous text. Chunk size is capped at 9000 characters, which stays below the embedding input cap.
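
The strategy can be approximated in a few lines. This is an illustrative sketch of recursive splitting with overlap, not Epithre's actual implementation: the function names, the regular expressions used for paragraph and sentence boundaries, and the choice to size pieces at target minus overlap are all assumptions.

```python
import re

# Coarse-to-fine boundaries: after a blank line, then after sentence punctuation.
# Zero-width lookbehinds keep the separators attached, so no text is lost.
SEPARATORS = [r"(?<=\n\n)", r"(?<=[.!?]\s)"]


def _split_to_pieces(text: str, limit: int, level: int = 0) -> list[str]:
    """Break text into pieces of at most `limit` chars, preferring coarse boundaries."""
    if len(text) <= limit:
        return [text] if text else []
    if level >= len(SEPARATORS):
        # Hard character split for long contiguous text.
        return [text[i:i + limit] for i in range(0, len(text), limit)]
    pieces: list[str] = []
    for part in re.split(SEPARATORS[level], text):
        pieces.extend(_split_to_pieces(part, limit, level + 1))
    return pieces


def chunk_text(text: str, target: int = 1000, overlap: int = 200) -> list[str]:
    """Greedily pack pieces into chunks of at most `target` chars with overlap."""
    if not 0 <= overlap < target:
        raise ValueError("overlap must be non-negative and smaller than target")
    chunks: list[str] = []
    current = ""
    # Pieces are limited to target - overlap so that tail + piece always fits.
    for piece in _split_to_pieces(text, target - overlap):
        if current and len(current) + len(piece) > target:
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap:] if overlap else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

In this sketch every chunk after the first starts with the last 200 characters of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.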

Customer-tunable chunking parameters are not yet exposed. If your use case needs different chunking (e.g., splitting per Pasal, that is, per article, in Indonesian legal documents), let us know.

Wire pattern with chat

The natural RAG flow is to retrieve first, then ground a chat completion in the retrieved chunks:

hits = httpx.post(
    "https://api.epithre.com/v1/retrieval",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={"query": user_question, "top_k": 5},
).json()["results"]

context = "\n\n".join(f"[{h['file_id']} chunk {h['chunk_index']}] {h['text']}"
                     for h in hits)

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": [
            {"type": "text",
             "text": f"Answer based on these documents:\n\n{context}",
             "cache_control": {"type": "ephemeral"}}
        ]},
        {"role": "user", "content": user_question},
    ],
)

Marking the context block with cache_control, as in the system message above, means follow-up questions over the same retrieved set bill the cached context at the 10% read rate.

Pricing

Limits

See also