Knowledge and retrieval
POST /v1/retrieval
Upload PDF, TXT, or Markdown documents as knowledge files. Epithre automatically extracts the text, splits it into chunks recursively, embeds each chunk with epithre-embed, and indexes the chunks in a per-customer vector store. Queries are matched against chunks by cosine similarity, so you get turnkey RAG without building your own chunking pipeline.
Upload a knowledge file
```python
# `client` is an initialized Epithre SDK client (setup not shown here).
with open("hotel_handbook.pdf", "rb") as f:
    r = client.files.create(
        file=f,
        purpose="knowledge",
    )
print(r.id)      # file-abc...
print(r.status)  # initially "uploaded"
# A background processor extracts -> chunks -> embeds asynchronously.
# Poll the file's status until it moves from "uploaded" to "processed"
# (usually under 30 s for typical documents).
```
Only processed files are queryable.
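Because processing is asynchronous, a client usually waits for the "processed" status before querying. The sketch below is a generic polling loop, not part of the Epithre SDK: `wait_until_processed` is a hypothetical helper, and `get_status` is a callable you supply that re-fetches the file's current status (the SDK call for that is not documented on this page). Only the "uploaded" and "processed" statuses are documented here, so any other value simply keeps polling until the timeout.

```python
import time
from typing import Callable


def wait_until_processed(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> str:
    """Poll `get_status` until it returns "processed" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "processed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

The timeout is deliberately generous compared with the typical processing time, since large PDFs may take longer.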
Retrieval query
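```python
import os

import httpx

# Assumption: your API key is stored in this environment variable.
EPITHRE_KEY = os.environ["EPITHRE_KEY"]

r = httpx.post(
    "https://api.epithre.com/v1/retrieval",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "file_ids": ["file-abc..."],
        # Indonesian: "cancellation policy for reservations by foreign nationals (WNA)"
        "query": "kebijakan pembatalan reservasi WNA",
        "top_k": 5,
    },
)
r.raise_for_status()
for hit in r.json()["results"]:
    print(f"[{hit['score']:.3f}] chunk_{hit['chunk_index']} from {hit['file_id']}")
    print(hit["text"])
```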
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | yes | Search query text (embedded server-side). |
| file_ids | array | no | Restrict the search to specific files. If omitted, all of your processed knowledge files are searched. |
| top_k | int | no | Number of results to return, 1-50 (default 5). |
| instruction | string | no | Retrieval instruction prefix. Improves relevance for specific task domains. |
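The constraints in the table above can be checked client-side before sending a request. `build_retrieval_body` is a hypothetical helper (not part of any Epithre SDK) that mirrors the documented fields: `query` is required, `top_k` must be 1-50 with a default of 5, and `file_ids` and `instruction` are only included when given.

```python
from typing import Any, Dict, List, Optional


def build_retrieval_body(
    query: str,
    file_ids: Optional[List[str]] = None,
    top_k: int = 5,
    instruction: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a /v1/retrieval request body, checking the documented constraints."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    body: Dict[str, Any] = {"query": query, "top_k": top_k}
    # Omitting file_ids searches all of your processed knowledge files.
    if file_ids is not None:
        body["file_ids"] = file_ids
    if instruction is not None:
        body["instruction"] = instruction
    return body
```

Failing fast on an out-of-range `top_k` avoids a round trip that the server would reject anyway.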
Response shape
```json
{
  "object": "list",
  "results": [
    {"file_id": "file-abc...", "chunk_index": 12, "text": "...", "score": 0.83},
    {"file_id": "file-abc...", "chunk_index": 5, "text": "...", "score": 0.71}
  ],
  "query_tokens": 8,
  "billed_tokens": 8
}
```
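score is the cosine similarity between the query embedding and the chunk embedding. Because the vectors are L2-normalized, this equals their dot product. Cosine similarity is bounded by [-1, 1]; higher scores mean closer matches.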
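For L2-normalized vectors, cosine similarity reduces to a plain dot product, which is why vector stores often normalize at index time. The toy 2-D vectors below are illustrative only, not real epithre-embed outputs, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|), computed on the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))                 # 24 / 25
print(dot(l2_normalize(a), l2_normalize(b)))   # same value, via dot product only
```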
Supported file types
| Format | Notes |
|---|---|
| PDF (.pdf) | Text extracted via PyMuPDF. Page breaks are preserved as [Page N] markers. |
| Plain text (.txt) | Read as UTF-8, as-is. |
| Markdown (.md) | Read as UTF-8, as-is. Markdown structure is preserved in chunks. |
For Office docs (.docx, .pptx) or HTML: convert to PDF or Markdown first.
Chunking
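Text is split with a recursive character splitter that targets 1,000 characters per chunk with a 200-character overlap between consecutive chunks. Paragraph boundaries are the preferred breakpoints; the splitter falls back to sentence boundaries, then to a hard character split for very long runs of contiguous text. Chunk size is capped at 9,000 characters, which keeps every chunk below the embedding model's input limit.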
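A simplified sketch of a recursive character splitter with these parameters follows. It is an illustration, not Epithre's implementation. Assumptions: paragraphs are separated by "\n\n", sentence boundaries are approximated by ". ", overlap is taken as trailing characters of the previous chunk, and the 9,000-character cap is not modelled because every chunk here stays at or below `size`. `chunk_text` and `_split_pieces` are hypothetical names.

```python
from typing import List, Sequence

# Assumed separator order: paragraphs, then sentences, then a hard character split.
SEPARATORS = ("\n\n", ". ", "")


def _split_pieces(text: str, size: int, seps: Sequence[str]) -> List[str]:
    """Break text into pieces of at most `size` chars, preferring earlier separators."""
    if len(text) <= size:
        return [text]
    sep, rest = seps[0], seps[1:]
    if sep == "":
        # Last resort: hard split into fixed-size slices.
        return [text[i:i + size] for i in range(0, len(text), size)]
    parts = text.split(sep)
    pieces: List[str] = []
    for i, part in enumerate(parts):
        if i < len(parts) - 1:
            part += sep  # keep the separator so no text is lost
        if part:
            # A part that is still too long is split again with the next separator.
            pieces.extend(_split_pieces(part, size, rest))
    return pieces


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Greedily pack pieces into chunks of at most `size` chars with trailing overlap."""
    chunks: List[str] = []
    current = ""
    for piece in _split_pieces(text, size, SEPARATORS):
        if current and len(current) + len(piece) > size:
            chunks.append(current)
            # Carry over up to `overlap` trailing chars, leaving room for `piece`.
            keep = max(0, min(overlap, size - len(piece)))
            current = current[-keep:] if keep else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting first into separator-aligned pieces and then packing them greedily is what makes paragraph boundaries the preferred breakpoints: a sentence or hard split only happens inside a paragraph that is itself too long.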
Customer-tunable chunking parameters are not exposed yet. If your use case needs different chunking (e.g., splitting Indonesian legal documents per Pasal, i.e. per article), let us know.
Wire pattern with chat
The natural RAG flow is to retrieve relevant chunks first, then ground a chat completion in them:
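```python
# Retrieve first (same endpoint and Authorization header as the example above).
hits = httpx.post(
    ".../v1/retrieval",
    headers={"Authorization": f"Bearer {EPITHRE_KEY}"},
    json={
        "query": user_question,
        "top_k": 5,
    },
).json()["results"]

# Label each chunk with its source so the model can cite it.
context = "\n\n".join(
    f"[{h['file_id']} chunk {h['chunk_index']}] {h['text']}" for h in hits
)

resp = client.chat.completions.create(
    model="epithre-omni",
    messages=[
        {"role": "system", "content": [
            {"type": "text",
             "text": f"Answer based on these documents:\n\n{context}",
             # Mark the document context as cacheable for follow-up turns.
             "cache_control": {"type": "ephemeral"}},
        ]},
        {"role": "user", "content": user_question},
    ],
)
```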
Combine this with cache_control so that follow-up questions over the same retrieved context are billed at the 10% cache-read rate.
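To benefit from caching across follow-up turns, the cached system block should stay identical from turn to turn while new turns are appended after it. The sketch below assumes, as is typical of prompt caching but not stated on this page, that a cache hit requires an unchanged prefix. `build_messages` is a hypothetical helper; the message layout copies the example above.

```python
from typing import Dict, List


def build_messages(context: str, history: List[Dict[str, str]], question: str) -> List[dict]:
    """Build a chat request whose cached system block is identical on every turn."""
    system = {
        "role": "system",
        "content": [{
            "type": "text",
            "text": f"Answer based on these documents:\n\n{context}",
            "cache_control": {"type": "ephemeral"},
        }],
    }
    # Prior turns go after the system block so the cached prefix never changes.
    return [system, *history, {"role": "user", "content": question}]
```

Reusing the same `context` string (rather than re-running retrieval for every follow-up) is what keeps the prefix stable.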
Pricing
- File upload + processing: standard epithre-embed rate (Rp1,500 / 1M tokens of chunk content).
- Storage of chunks: free (the parent file counts toward the 1 GB per-user quota).
- Retrieval query: one embed call at standard rate (typically <50 tokens for a query).
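A rough cost estimate follows from the rate above. Because consecutive chunks overlap, more text is embedded than the source contains: with 1,000-character chunks and 200 characters of overlap, each new chunk advances about 800 characters, so roughly 1.25× the source is embedded. This sketch assumes billing counts the overlapped text (the pricing says tokens "on chunk content", but this is not confirmed) and that you estimate the source token count yourself; the helper names are hypothetical.

```python
RATE_RP_PER_MILLION = 1_500  # standard epithre-embed rate from the pricing list


def embed_cost_rp(tokens: int) -> float:
    """Embedding cost in rupiah for a given number of tokens."""
    return tokens * RATE_RP_PER_MILLION / 1_000_000


def overlap_factor(chunk_chars: int = 1000, overlap_chars: int = 200) -> float:
    """Approximate ratio of embedded text to source text caused by chunk overlap."""
    return chunk_chars / (chunk_chars - overlap_chars)


source_tokens = 400_000  # hypothetical document size
print(embed_cost_rp(int(source_tokens * overlap_factor())))  # 750.0 (rupiah)
print(embed_cost_rp(50))  # a 50-token query
```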
Limits
- Max 1 GB total knowledge storage per user.
- Each file max 100 MB.
- Files expire automatically 30 days after upload, or earlier if you delete them.
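The file-type and size limits can be checked before uploading. `check_upload` is a hypothetical helper; it assumes "100 MB" and "1 GB" mean binary units (MiB and GiB), which this page does not specify, and `used_bytes` is your own running total of stored knowledge files.

```python
import os

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}
MAX_FILE_BYTES = 100 * 1024 * 1024    # assumption: "100 MB" read as MiB
MAX_TOTAL_BYTES = 1024 * 1024 * 1024  # assumption: "1 GB" read as GiB


def check_upload(filename: str, size_bytes: int, used_bytes: int) -> None:
    """Raise ValueError if the file would violate a documented knowledge-file limit."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported type {ext!r}; convert to PDF or Markdown first")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 100 MB per-file limit")
    if used_bytes + size_bytes > MAX_TOTAL_BYTES:
        raise ValueError("upload would exceed the 1 GB per-user knowledge quota")
```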