Projection · Llama 3.3 70B
Monthly running cost
$0 / mo
— per school
One time setup
$0
initial index + labour
Cost / answer
$0
— answers / mo
Provider comparison
Monthly running cost under the same assumptions.
Assumptions to verify
Tokens per query, queries per day, and cache hit rate are estimates, not measurements. Infrastructure lines are placeholders and exclude the load balancer, connection pooler, data egress, and backups. This models cost only, not rate-limit capacity during traffic spikes. The embedding price still needs confirming against OpenAI's live pricing page.
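To make these assumptions easier to check, here is a minimal sketch of how a monthly figure could be derived from them. It is not the calculator's actual code: the `Assumptions` fields, the 30-day month, and every number in the usage example are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    # Every field is an estimate to be replaced with measured values.
    queries_per_day: float
    input_tokens_per_query: float    # prompt plus retrieved chunks
    output_tokens_per_query: float   # bounded by the output cap
    cache_hit_rate: float            # share of input tokens read from cache, 0..1
    input_price_per_mtok: float      # USD per 1M uncached input tokens
    output_price_per_mtok: float     # USD per 1M output tokens
    cached_input_discount: float     # 0.9 means cached input costs 10% of list price
    fixed_monthly_usd: float = 0.0   # infrastructure placeholder lines
    days_per_month: float = 30.0     # assumed month length


def monthly_cost(a: Assumptions) -> dict:
    """Return cost per answer, answers per month, and total monthly cost in USD."""
    answers = a.queries_per_day * a.days_per_month
    cached_tokens = a.input_tokens_per_query * a.cache_hit_rate
    uncached_tokens = a.input_tokens_per_query - cached_tokens

    # Cached tokens are billed at the discounted rate, the rest at list price.
    input_cost = (
        uncached_tokens * a.input_price_per_mtok
        + cached_tokens * a.input_price_per_mtok * (1.0 - a.cached_input_discount)
    ) / 1_000_000
    output_cost = a.output_tokens_per_query * a.output_price_per_mtok / 1_000_000

    per_answer = input_cost + output_cost
    return {
        "cost_per_answer": per_answer,
        "answers_per_month": answers,
        "monthly_total": per_answer * answers + a.fixed_monthly_usd,
    }


if __name__ == "__main__":
    # Placeholder inputs only; none of these are real prices or measured traffic.
    example = Assumptions(
        queries_per_day=100,
        input_tokens_per_query=4000,
        output_tokens_per_query=300,
        cache_hit_rate=0.5,
        input_price_per_mtok=1.0,
        output_price_per_mtok=2.0,
        cached_input_discount=0.9,
        fixed_monthly_usd=10.0,
    )
    print(monthly_cost(example))
```

Keeping every estimate as a named field means a measured value can replace a guess without touching the arithmetic.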
Notes & caveats
- Prompt caching by provider. OpenAI and Anthropic discount cached input by up to ~90% (the exact rate depends on the model); Groq offers no caching on its Llama models, so it gets no discount here. Anthropic also charges ~1.25× on cache writes (only reads are discounted), so its real figure sits a little above this estimate.
- Embedding price. Hosted figure uses OpenAI text-embedding-3-small at ~$0.02 / 1M tokens. Verify against the live pricing page before quoting.
- Reindex is destructive. The current rebuild re-embeds all content on every run, so embedding cost scales with frequency × content size, not just with the pages that changed. A change-detection pass would cut this materially.
- Output is capped. At a 300-token cap, output barely moves the bill; input (retrieved chunks) dominates, which is why models with cheap input pricing look strong here.
- Provider risk. Groq's catalogue is limited to open-source models, and its future shifted after the Dec 2025 Nvidia deal. Don't lock into a single vendor; this comparison doubles as the hedge.
- Not included. Bandwidth, support time, and any paid moderation classifier. Add them to the fixed monthly fields if relevant.
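The change-detection pass mentioned under the reindex caveat could look roughly like the sketch below. It assumes each page has a stable ID and that a hash per page can be stored between runs; the function names and the `stored_hashes` mapping are hypothetical and not part of the current rebuild.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(pages: dict[str, str], stored_hashes: dict[str, str]):
    """Compare current pages against hashes saved by the previous run.

    Returns (to_embed, to_delete, new_hashes):
      to_embed   - IDs of new or changed pages that need fresh embeddings
      to_delete  - IDs present last run but missing now
      new_hashes - mapping to persist for the next run
    """
    new_hashes = {page_id: content_hash(text) for page_id, text in pages.items()}
    to_embed = [
        page_id for page_id, digest in new_hashes.items()
        if stored_hashes.get(page_id) != digest
    ]
    to_delete = [page_id for page_id in stored_hashes if page_id not in new_hashes]
    return to_embed, to_delete, new_hashes


def embedding_cost_usd(token_count: int, price_per_mtok: float = 0.02) -> float:
    """Embedding cost for a token count; the default price is the unverified ~$0.02 / 1M figure."""
    return token_count * price_per_mtok / 1_000_000


if __name__ == "__main__":
    previous = {"home": content_hash("Welcome"), "fees": content_hash("Old fees")}
    current = {"home": "Welcome", "fees": "New fees", "contact": "Call us"}
    to_embed, to_delete, _ = plan_reindex(current, previous)
    print(to_embed, to_delete)
```

With this approach the embedding bill tracks the number of changed pages per run instead of the full content size. The trade-off is one extra piece of state, the hash map, to persist between runs.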