Sitely — Cost Model

Assumptions

Schools 120

Queries / day · per school 50

Input tokens / query

Output tokens / query

Answer cache hit rate 35%

Repeat questions served from cache cost $0 in LLM calls.
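
As a rough illustration of how the answer cache reduces billable LLM calls, the sketch below applies the figures above (120 schools, 50 queries per day per school, 35% hit rate). The 30-day month and the function name are assumptions made for this example, not part of the model.

```python
def billable_queries_per_month(
    schools: int,
    queries_per_day: float,
    cache_hit_rate: float,
    days_per_month: int = 30,  # assumption: flat 30-day month
) -> float:
    """Queries that still reach the LLM after answer-cache hits are removed."""
    total = schools * queries_per_day * days_per_month
    # Cache hits cost $0 in LLM calls, so only the misses are billable.
    return total * (1.0 - cache_hit_rate)


total_queries = 120 * 50 * 30
billable = billable_queries_per_month(120, 50, 0.35)
print(f"total: {total_queries}, billable: {billable:.0f}")
```

Under these assumptions, about 117,000 of 180,000 monthly queries would still incur LLM cost.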

Prompt caching

90% off repeated context on OpenAI and Anthropic. Groq Llama models don't offer it, so the toggle has no effect on them.
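
One way to model the toggle is sketched below. The 90% discount comes from the text above; the share of input that counts as repeated context (`cached_fraction`), the token count, and the price are hypothetical placeholders, since the page does not state them.

```python
def input_cost_per_query(
    input_tokens: int,
    price_per_mtok: float,        # USD per 1M input tokens (placeholder value)
    cached_fraction: float = 0.0, # hypothetical: share of input that is repeated context
    cache_discount: float = 0.90, # 90% off repeated context, per the text above
    supports_caching: bool = True,
) -> float:
    """Input cost in USD for one query, with an optional prompt-caching discount."""
    if not supports_caching:
        # Providers without prompt caching pay full price on every token,
        # so the toggle has no effect.
        cached_fraction = 0.0
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    effective_tokens = fresh + cached * (1.0 - cache_discount)
    return effective_tokens * price_per_mtok / 1_000_000


# Placeholder inputs: 2,000 input tokens, $1.00 per 1M tokens, half of it repeated.
with_cache = input_cost_per_query(2000, 1.00, cached_fraction=0.5)
without_cache = input_cost_per_query(2000, 1.00, cached_fraction=0.5, supports_caching=False)
print(f"with caching: ${with_cache:.6f}, without: ${without_cache:.6f}")
```

This sketch ignores any surcharge on cache writes, so it gives a lower bound for providers that charge one.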

Indexing

Indexed tokens / school

~200 pages ≈ 800k tokens. Drives embedding cost.

Embeddings

Rescrape frequency
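
A minimal sketch of the embedding arithmetic, assuming every rescrape re-embeds all indexed tokens. The 120 schools and ~800k tokens per school come from this page; the $0.02 per 1M tokens price is the unverified estimate from the notes, and the rescrape counts are hypothetical.

```python
def embedding_cost_per_month(
    schools: int,
    tokens_per_school: int,
    price_per_mtok: float,       # USD per 1M embedded tokens (unverified estimate)
    rescrapes_per_month: float,  # hypothetical: how often the full index is rebuilt
) -> float:
    """Monthly embedding cost in USD when each rescrape re-embeds everything."""
    tokens_per_rebuild = schools * tokens_per_school
    return tokens_per_rebuild * rescrapes_per_month * price_per_mtok / 1_000_000


one_rebuild = embedding_cost_per_month(120, 800_000, 0.02, rescrapes_per_month=1)
daily_rebuild = embedding_cost_per_month(120, 800_000, 0.02, rescrapes_per_month=30)
print(f"one rebuild / mo: ${one_rebuild:.2f}, daily rebuilds: ${daily_rebuild:.2f}")
```

With these inputs a single full rebuild covers 96M tokens, or roughly $1.92, and the cost grows linearly with rescrape frequency.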

Fixed monthly

Compute / EC2

Vector DB

Monitoring

Setup labour

Projection · Llama 3.3 70B

Monthly running cost

$0 / mo

— per school

One time setup

initial index + labour

Cost / answer

— answers / mo

Provider comparison

Monthly running cost under the same assumptions.

Assumptions to verify

Tokens per query, queries per day, and cache hit rate are estimates, not measured. Infrastructure lines are placeholders and exclude load balancer, connection pooler, data egress, and backups. This models cost, not rate-limit capacity during spikes. Embedding price needs confirming against OpenAI's live pricing.

Notes & caveats

Prompt caching by provider. OpenAI and Anthropic discount cached input ~90%; Groq offers no caching on its Llama models, so it gets no discount here. Anthropic also charges ~1.25× on cache writes (only reads are discounted), so its real figure sits a little above this estimate.
Embedding price. Hosted figure uses OpenAI text-embedding-3-small at ~$0.02 / 1M tokens. Verify against the live pricing page before quoting.
Reindex is destructive. The current rebuild re-embeds all content on every run, so embedding cost scales with frequency × content size, not just changed pages. A change-detection pass would cut this materially.
Output is capped. At a 300-token cap, output barely moves the bill; input (retrieved chunks) dominates, which is why models with cheap input pricing look strong here.
Provider risk. Groq's catalogue is open-source only, and its future shifted after the Dec 2025 Nvidia deal. Don't lock into a single vendor; this comparison is also the hedge.
Not included. Bandwidth, support time, and any paid moderation classifier. Add them to the fixed monthly fields if relevant.
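
The change-detection pass mentioned in the reindex note could take many forms; the page does not say which. The sketch below assumes one simple option, comparing a SHA-256 digest of each page's text against the digest stored from the previous scrape. The function names and the URL-to-text mapping are hypothetical.

```python
import hashlib


def page_digest(text: str) -> str:
    """Stable fingerprint of a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_to_reembed(
    pages: dict[str, str],             # hypothetical shape: URL -> extracted text
    previous_digests: dict[str, str],  # URL -> digest saved by the last run
) -> tuple[list[str], dict[str, str]]:
    """Return the URLs whose content is new or changed, plus the digests to store."""
    current = {url: page_digest(text) for url, text in pages.items()}
    changed = [
        url for url, digest in current.items()
        if previous_digests.get(url) != digest
    ]
    return changed, current


# First run: nothing stored yet, so every page needs embedding.
first_scrape = {"/admissions": "Apply by March.", "/contact": "Call the office."}
changed, stored = pages_to_reembed(first_scrape, {})

# Second run: only the edited page is re-embedded.
second_scrape = {"/admissions": "Apply by April.", "/contact": "Call the office."}
changed_again, stored = pages_to_reembed(second_scrape, stored)
print(changed_again)
```

With a pass like this, embedding cost would track the number of changed pages rather than total content size. Pages removed from the site would still need separate handling, which this sketch leaves out.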