Knowledge and retrieval

POST /v1/retrieval

Upload PDF, TXT, or Markdown documents as knowledge files. Epithre automatically extracts the text, splits it into chunks recursively, embeds the chunks with epithre-embed, and indexes them in a per-customer vector store. Queries are ranked by cosine similarity. The result is turnkey RAG, with no chunking pipeline to build yourself.

Upload a knowledge file

r = client.files.create(
    file=open("hotel_handbook.pdf", "rb"),
    purpose="knowledge",
)
print(r.id)        # file-abc...
print(r.status)    # initially "uploaded"

# A background processor extracts, chunks, and embeds the file asynchronously.
# Poll r.status until it changes from "uploaded" to "processed" (usually <30s for typical documents).

Only processed files are queryable.
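
The polling step described above could be wrapped in a small helper. This is a minimal sketch, not part of the Epithre SDK: `wait_until_processed` and its `fetch_status` argument are hypothetical names, and how the status is re-fetched is left to the caller because this page does not specify that call.

```python
import time
from typing import Callable


def wait_until_processed(
    fetch_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns "processed" or the timeout elapses.

    fetch_status is a caller-supplied function (an assumption of this sketch)
    that returns the file's current status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "processed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing the status lookup as a callable keeps the helper independent of any particular client method, and the injectable `sleep` makes it easy to exercise without real delays.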

Retrieval query

import httpx
r = httpx.post(
    "https://api.epithre.com/v1/retrieval",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "file_ids": ["file-abc..."],
        "query": "kebijakan pembatalan reservasi WNA",
        "top_k": 5,
    },
).json()

for hit in r["results"]:
    print(f"[{hit['score']:.3f}] chunk_{hit['chunk_index']} from {hit['file_id']}")
    print(hit["text"])

Request body

Field	Type	Required	Description
`query`	string	yes	Search query text (embedded server-side).
`file_ids`	array	no	Restrict the search to specific files. Omit to search all of your processed knowledge files.
`top_k`	int	no	Number of results to return, 1-50 (default 5).
`instruction`	string	no	Retrieval instruction prefix. Improves relevance for specific task domains.
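
The field rules in the table can be checked client-side before sending a request. The helper below is a sketch under that reading of the table; `build_retrieval_payload` is a hypothetical name, and the server remains the authority on validation.

```python
from typing import Optional, Sequence


def build_retrieval_payload(
    query: str,
    file_ids: Optional[Sequence[str]] = None,
    top_k: int = 5,
    instruction: Optional[str] = None,
) -> dict:
    """Build a /v1/retrieval request body, omitting optional fields left unset."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    payload = {"query": query, "top_k": top_k}
    if file_ids is not None:
        # Omitting file_ids searches all processed knowledge files.
        payload["file_ids"] = list(file_ids)
    if instruction is not None:
        payload["instruction"] = instruction
    return payload
```

The returned dict can be passed as the `json=` argument of the `httpx.post` call shown earlier.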

Response shape

{
  "object": "list",
  "results": [
    {"file_id": "file-abc...", "chunk_index": 12, "text": "...", "score": 0.83},
    {"file_id": "file-abc...", "chunk_index": 5,  "text": "...", "score": 0.71}
  ],
  "query_tokens": 8,
  "billed_tokens": 8
}

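score is the cosine similarity between the query embedding and the chunk embedding, reported in the [0, 1] range. Because the vectors are L2-normalized, it equals their dot product.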
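
To see why L2 normalization matters here, the sketch below checks on two toy vectors that cosine similarity and the dot product of the normalized vectors agree. The vectors are made up for illustration and are not real epithre-embed outputs.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

raw_cosine = cosine(a, b)                                # 24 / 25
normalized_dot = dot(l2_normalize(a), l2_normalize(b))   # same value, up to rounding
print(math.isclose(raw_cosine, normalized_dot))          # True
```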

Supported file types

Format	Notes
PDF (`.pdf`)	Text extracted via PyMuPDF. Preserves page breaks as `[Page N]` markers.
Plain text (`.txt`)	UTF-8 read as-is.
Markdown (`.md`)	UTF-8 read as-is. Markdown structure preserved in chunks.

For Office docs (.docx, .pptx) or HTML: convert to PDF or Markdown first.

Chunking

Recursive character split: the target is 1000 characters per chunk with a 200-character overlap. Paragraph boundaries are the preferred breakpoints; the splitter falls back to sentence boundaries, then to a hard character split for very long contiguous text. Chunk size is capped at 9000 characters (below the embed cap).
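
The behaviour described above can be approximated in a few lines. This is a greedy window sketch, not Epithre's implementation: `chunk_text` is a hypothetical name, the separators `"\n\n"` and `". "` are assumed stand-ins for paragraph and sentence boundaries, and the 9000-character cap is not modelled because chunks here never exceed the target.

```python
from typing import List


def chunk_text(text: str, target: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most `target` chars, overlapping by `overlap`."""
    if not 0 <= overlap < target:
        raise ValueError("overlap must be non-negative and smaller than target")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + target, len(text))
        if end < len(text):
            # Only accept a breakpoint past the overlap region, so the next
            # chunk always starts later than this one.
            floor = start + overlap + 1
            for sep in ("\n\n", ". "):  # paragraph first, then sentence
                cut = text.rfind(sep, floor, end)
                if cut != -1:
                    end = cut + len(sep)
                    break
            # No separator found: `end` stays at the hard character limit.
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so consecutive chunks overlap
    return chunks
```

For text with no separators, such as 2500 contiguous characters, this yields three chunks of 1000, 1000, and 900 characters, each sharing 200 characters with its neighbour.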

Customer-tunable chunk parameters are not yet exposed. If you have a use case that needs different chunking (e.g., per-Pasal for Indonesian legal docs), let us know.

Wire pattern with chat

The natural RAG flow is to retrieve first, then ground a chat completion in the results:

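hits = httpx.post(
    ".../v1/retrieval",
    # Same bearer token as in the retrieval example above.
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "query": user_question,
        "top_k": 5,
    },
).json()["results"]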

context = "\n\n".join(f"[{h['file_id']} chunk {h['chunk_index']}] {h['text']}"
                     for h in hits)

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": [
            {"type": "text",
             "text": f"Answer based on these documents:\n\n{context}",
             "cache_control": {"type": "ephemeral"}}
        ]},
        {"role": "user", "content": user_question},
    ],
)

Mark the context block with cache_control, as above, so that follow-up questions over the same retrieved set are billed at the 10% cache-read rate.

Pricing

File upload + processing: standard epithre-embed rate (Rp1,500 / 1M tokens on chunk content).
Storage of chunks: free (subject to 1 GB user quota on parent file).
Retrieval query: one embed call at standard rate (typically <50 tokens for a query).
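
As a worked example of the rate above, the arithmetic can be sketched as follows. The 200,000-token document size is an illustrative figure, not a measurement, and `embed_cost_rp` is a hypothetical helper.

```python
RATE_RP_PER_MILLION_TOKENS = 1_500  # standard epithre-embed rate quoted above


def embed_cost_rp(tokens: int) -> float:
    """Cost in rupiah of embedding `tokens` tokens at the standard rate."""
    return tokens * RATE_RP_PER_MILLION_TOKENS / 1_000_000


print(embed_cost_rp(200_000))  # 300.0  (processing an illustrative 200k-token file)
print(embed_cost_rp(50))       # 0.075  (one 50-token retrieval query)
```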

Limits

Max 1 GB total knowledge storage per user.
Each file max 100 MB.
Files auto-expire 30 days after upload, or are removed earlier if you delete them.
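
A pre-upload check against the per-file limit and the supported formats might look like the sketch below. It makes two assumptions that this page does not confirm: that "100 MB" means 100 MiB, and that extensions are matched case-insensitively. `check_uploadable` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
import os

ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" is read as 100 MiB


def check_uploadable(path: str) -> None:
    """Raise ValueError if the file is an unsupported type or exceeds the size limit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"unsupported file type {ext!r}; convert to PDF or Markdown first"
        )
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes; the per-file limit is {MAX_FILE_BYTES}")
```

Calling `check_uploadable("hotel_handbook.pdf")` before `client.files.create(...)` fails fast locally instead of after a large upload.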