Image generation app

epithre-iris is our image generation model. It ships with a style registry of three entries (none / dark / anime).

Basic generation

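import base64

# `client` is an already-configured SDK client.
resp = client.images.generate(
    model="epithre-iris",
    prompt="warung kopi pinggir jalan Jakarta malam, cinematic, golden hour, wide shot",
    size="768x768",
    num_steps=20,
)
# The image comes back base64-encoded; decode it and write it to disk.
with open("warkop.png", "wb") as f:
    f.write(base64.b64decode(resp.data[0].b64_json))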

size: maximum 960x960, default 768x768. Dimensions are rounded down to a multiple of 16.
num_steps: 4 (fast preview) to 30 (final render). 20 is the typical production setting.
seed: -1 for a random seed, or any other integer for reproducible output.
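
The size rules above can be mirrored client-side so you know which dimensions to expect back. This is a minimal sketch, assuming the 960 cap and the round-down apply to each side independently. `effective_size` is a hypothetical helper, not part of the SDK, and since this page doesn't say whether the API clamps or rejects oversized requests, the helper raises instead of guessing.

```python
def effective_size(size: str, max_side: int = 960, multiple: int = 16) -> str:
    """Return the size the service would presumably render for a 'WxH' string."""
    width, height = (int(part) for part in size.lower().split("x"))
    if width > max_side or height > max_side:
        raise ValueError(f"{size} exceeds the {max_side}x{max_side} maximum")
    # Round each side down to the nearest multiple of 16.
    width -= width % multiple
    height -= height % multiple
    if width == 0 or height == 0:
        raise ValueError(f"{size} is smaller than one {multiple}px block")
    return f"{width}x{height}"


print(effective_size("700x500"))  # 688x496
```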

Style LoRAs

# Anime style
resp = client.images.generate(
    model="epithre-iris",
    prompt="a young samurai in a bamboo forest, cherry blossoms falling",
    lora="anime",
    lora_strength=0.8,   # 0-1.5, default 0.6
)

# Dark / cinematic style
resp = client.images.generate(
    model="epithre-iris",
    prompt="abandoned colonial-era building in Bandung, mist, dramatic shadows",
    lora="dark",
    lora_strength=0.7,
)

LoRAs are triggerless: just set lora and the model picks up the style. No keyword-stuffing your prompt.
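
If style and strength come from user input, it can help to validate them before calling the API. The sketch below uses only the registry names and the 0-1.5 range listed on this page. `lora_kwargs` is a hypothetical helper, and leaving `lora` out entirely for the "none" style is an assumption based on the basic generation example, which passes no `lora` argument.

```python
VALID_LORAS = {"none", "dark", "anime"}
DEFAULT_STRENGTH = 0.6  # documented default


def lora_kwargs(lora: str = "none", strength: float = DEFAULT_STRENGTH) -> dict:
    """Build the style-related keyword arguments for images.generate()."""
    if lora not in VALID_LORAS:
        raise ValueError(f"unknown lora {lora!r}; expected one of {sorted(VALID_LORAS)}")
    if not 0.0 <= strength <= 1.5:
        raise ValueError(f"lora_strength must be within 0-1.5, got {strength}")
    if lora == "none":
        # Assumption: no style means the parameters are simply omitted.
        return {}
    return {"lora": lora, "lora_strength": strength}
```

Usage would look like `client.images.generate(model="epithre-iris", prompt=prompt, **lora_kwargs("anime", 0.8))`.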

Multi-reference editing

import base64, httpx, os

# Base64-encode the three reference images. In the prompt, refs[0] is "image 1".
refs = []
for i in range(3):
    with open(f"ref_{i}.png", "rb") as f:
        refs.append(base64.b64encode(f.read()).decode())

http_resp = httpx.post(
    "https://api.epithre.com/v1/images/edits",
    headers={"Authorization": f"Bearer {os.environ['EPITHRE_KEY']}"},
    json={
        "model": "epithre-iris",
        "prompt": "the product from image 1, in the studio setting from image 2, "
                  "shot in the style of image 3",
        "images": refs,
        "size": "640x640",
    },
    timeout=60,
)
http_resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
resp = http_resp.json()

Useful for: product-in-context shots, style transfer, A/B compositions.

Common prompt patterns

Goal	Prompt pattern
Photorealistic	"photorealistic, [subject], [lighting], shot on [camera/lens], [setting]"
Illustration	"illustration, [subject], [art style], [palette]" + `lora="anime"` for anime
Marketing thumbnail	"professional marketing thumbnail for: [topic], vibrant colors, no text"
Product shot	"studio product photo of [product], white background, soft lighting"
Concept art	"concept art, [subject], cinematic, [mood], [color scheme]"
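
The patterns in the table can be kept as templates so every prompt in an app follows the same shape. A small sketch: the dictionary keys and field names (`camera_lens`, `art_style`, and so on) are names chosen for this example, not API parameters.

```python
PROMPT_PATTERNS = {
    "photorealistic": "photorealistic, {subject}, {lighting}, shot on {camera_lens}, {setting}",
    "illustration": "illustration, {subject}, {art_style}, {palette}",
    "marketing_thumbnail": "professional marketing thumbnail for: {topic}, vibrant colors, no text",
    "product_shot": "studio product photo of {product}, white background, soft lighting",
    "concept_art": "concept art, {subject}, cinematic, {mood}, {color_scheme}",
}


def build_prompt(goal: str, **fields: str) -> str:
    """Fill one of the patterns; raises KeyError if the goal or a field is missing."""
    return PROMPT_PATTERNS[goal].format(**fields)


print(build_prompt("product_shot", product="a ceramic mug"))
# studio product photo of a ceramic mug, white background, soft lighting
```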

What works well

Indonesian scenes (warung, traditional architecture, batik patterns)
Photorealistic landscapes
Stylized portraits (especially with anime LoRA)
Product shots
Concept art

What works less well

Text in images: don't expect readable text on signs/billboards. The model will produce text-like shapes but rarely legible words.
Exact logos: it'll produce plausible-but-wrong company logos. Not for licensed brand work.
Exact faces of real people: refused for real public figures (IRIS_SAFETY_V2 filter).
Counting: "5 cats" typically yields anywhere from 3 to 7 cats. The model isn't a precise counter.

Latency

4 steps @ 768x768: 12-19s
20 steps @ 768x768: 45-60s
Anime LoRA adds ~5s

For interactive UX, show a stream of previews generated at 4 steps, then render the final image at 20 steps once the user confirms.
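
One way to wire up the preview-then-final flow is to build both requests from the same parameters and change only `num_steps`. The sketch below pins a seed on the preview and reuses it for the final render. Whether the same seed at 4 and 20 steps produces a matching composition is an assumption this page doesn't confirm, so check it on your own prompts. `preview_request` and `final_request` are hypothetical helpers.

```python
import random

PREVIEW_STEPS = 4   # fast preview
FINAL_STEPS = 20    # typical production setting


def preview_request(prompt: str, size: str = "768x768", seed=None) -> dict:
    """Keyword arguments for a fast preview, with an explicit seed so it can be reused."""
    if seed is None:
        seed = random.randrange(2**31)  # pick a concrete seed instead of sending -1
    return {"model": "epithre-iris", "prompt": prompt, "size": size,
            "num_steps": PREVIEW_STEPS, "seed": seed}


def final_request(preview: dict) -> dict:
    """Same request as the confirmed preview, rendered at the final step count."""
    return {**preview, "num_steps": FINAL_STEPS}
```

The preview the user confirms would then be re-sent as `client.images.generate(**final_request(chosen_preview))`.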

Cost

Rp750 per image. A typical app generating 10 previews + 1 final per user session costs 11 x Rp750 = Rp8,250 per session.
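
The same arithmetic as a helper, for budgeting other preview/final mixes. It assumes the flat Rp750 price applies to every image regardless of step count, which is how the per-session figure above is computed. `session_cost` is a hypothetical name.

```python
PRICE_PER_IMAGE_RP = 750


def session_cost(previews: int, finals: int = 1) -> int:
    """Cost in rupiah for one user session."""
    return (previews + finals) * PRICE_PER_IMAGE_RP


print(session_cost(10, 1))  # 8250
```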

Image generation isn't yet a batch-API target (the input format is different from chat/embed). For nightly poster/thumbnail/asset generation jobs, use direct realtime calls instead: you can fire many requests in parallel as long as a semaphore keeps you within your concurrency cap.

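import asyncio

async def generate_one(sem, prompt):
    # `async_client` is an already-configured async SDK client.
    async with sem:
        return await async_client.images.generate(
            model="epithre-iris", prompt=prompt, size="768x768")

async def main(prompts):
    sem = asyncio.Semaphore(3)  # Iris parallel cap
    # All prompts are scheduled at once; the semaphore keeps at most 3 in flight.
    return await asyncio.gather(*[generate_one(sem, p) for p in prompts])

# `prompts` is your list of prompt strings.
results = asyncio.run(main(prompts))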