Chatbot · 10k users · 5 requests/user/day · 400 input / 600 output tokens

OpenAI API

Rate-limit-bound

Rate-limit bound: the provider tier caps you at 100k users before you need to upgrade or split keys.

Cost / month: $10.5k (at 10k users)

Cost / request: $0.0070 (1.5M req/mo)

Cost / user: $1.05 (monthly)

Cost / year: $126.0k (constant traffic)

LLM tokens: $10.5k (100%)
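
The cost figures above follow from simple arithmetic on the workload. The sketch below reproduces them from the $0.0070 cost per request and the 10k users × 5 requests/day workload. The 30-day month and the function names are assumptions made for illustration, not details taken from the calculator itself.

```python
from decimal import Decimal

REQUESTS_PER_USER_PER_DAY = 5
DAYS_PER_MONTH = 30                    # assumption: a 30-day month
COST_PER_REQUEST = Decimal("0.0070")   # dollars, taken from the summary above


def monthly_requests(users: int) -> int:
    """Requests per month for a given number of users."""
    return users * REQUESTS_PER_USER_PER_DAY * DAYS_PER_MONTH


def cost_summary(users: int) -> dict:
    """Derive the monthly, per-user, and yearly cost from the per-request cost."""
    requests = monthly_requests(users)
    monthly = requests * COST_PER_REQUEST
    return {
        "requests_per_month": requests,
        "cost_per_month": monthly,
        "cost_per_user": monthly / users,
        "cost_per_year": monthly * 12,   # assumes constant traffic all year
    }


if __name__ == "__main__":
    for name, value in cost_summary(10_000).items():
        print(f"{name}: {value}")
```

Decimal arithmetic is used so the dollar amounts come out exact instead of picking up binary floating-point noise.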

p50 end-to-end: Infinity s (not computed)

LLM generation: Infinity s · NaN% (not computed)

Generation scales linearly with output tokens — shrink the response or pick a higher-throughput model to bring this down.
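
As a rough illustration of that linear relationship, the sketch below estimates generation time as output tokens divided by throughput, using the 600 output tokens of this workload and the approximate throughput figures quoted in the recommendations (Cerebras ~800 tok/s, Groq ~500 tok/s, Gemini Flash ~120 tok/s). It models generation only; time to first token, network, and queueing are deliberately left out, and the function name is hypothetical.

```python
OUTPUT_TOKENS = 600  # output tokens per request in this workload

# Approximate throughput figures quoted in the recommendations (tokens/second).
THROUGHPUT_TOK_PER_S = {
    "Cerebras": 800,
    "Groq": 500,
    "Gemini Flash": 120,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Generation time under a purely linear model: tokens / throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for provider, tps in THROUGHPUT_TOK_PER_S.items():
        seconds = generation_seconds(OUTPUT_TOKENS, tps)
        print(f"{provider}: {seconds:.2f} s for {OUTPUT_TOKENS} output tokens")
```

Under this model, halving the output tokens halves the generation time, and so does doubling the throughput.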

Total response exceeds 8.0 s (shown as Infinity s), past what chatbot users tolerate
→ Drop output tokens, switch to a higher-throughput model (Cerebras ~800 tok/s, Groq ~500 tok/s, Gemini Flash ~120 tok/s), or stream the response so users see progress.
Cost crosses $1,000/mo at 1k users, $5,000/mo at 5k users, $20,000/mo at 25k users, and $100,000/mo at 100k users
→ Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
Peak load exceeds provider rate limit at 100k users (3× peak/avg)
→ Move to a higher provider tier, distribute across multiple API keys, or buffer non-real-time traffic via batch API.
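
The scaling alerts above can be checked with a short calculation. The sketch below finds the user count at which monthly cost reaches each threshold (at $1.05 per user per month) and estimates peak request and token rates at 100k users with the 3× peak/avg ratio. The list of user tiers, the even spread of traffic over a 24-hour day, and all function names are assumptions for illustration. The provider's actual rate limit is not stated in the source, so it is not modeled here.

```python
COST_PER_USER_CENTS = 105          # $1.05 per user per month, from the summary
REQUESTS_PER_USER_PER_DAY = 5
TOKENS_PER_REQUEST = 400 + 600     # input + output tokens
PEAK_TO_AVG = 3                    # peak/avg ratio from the alert
SECONDS_PER_DAY = 86_400

# Hypothetical user tiers at which alerts are reported.
USER_TIERS = [1_000, 5_000, 10_000, 25_000, 50_000, 100_000]


def users_at_cost(threshold_dollars: int) -> int:
    """Smallest user count whose monthly cost reaches the threshold."""
    threshold_cents = threshold_dollars * 100
    return -(-threshold_cents // COST_PER_USER_CENTS)  # integer ceiling


def first_tier_at_or_above(users: int):
    """First tier at or above the given user count, or None if there is none."""
    return next((tier for tier in USER_TIERS if tier >= users), None)


def peak_requests_per_minute(users: int) -> float:
    """Peak requests/minute, assuming traffic averaged over a 24-hour day."""
    avg_per_second = users * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY
    return avg_per_second * PEAK_TO_AVG * 60


if __name__ == "__main__":
    for threshold in (1_000, 5_000, 20_000, 100_000):
        exact = users_at_cost(threshold)
        tier = first_tier_at_or_above(exact)
        print(f"${threshold:,}/mo reached at {exact:,} users (tier: {tier:,})")

    rpm = peak_requests_per_minute(100_000)
    print(f"Peak at 100k users: {rpm:,.0f} req/min, "
          f"{rpm * TOKENS_PER_REQUEST:,.0f} tokens/min")
```

Comparing the peak requests and tokens per minute against the limits published for your provider tier shows how much headroom remains before an upgrade or key split is needed.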

Same workload at 10k users, ranked by monthly cost.

Model	Cost / mo	Cost / req	p50 latency
DeepInfra (cheapest)	$570	$0.00038	12.35s
Groq	$1.1k	$0.00071	12.10s
DeepSeek API	$1.2k	$0.00077	6.11s
Fireworks AI	$1.4k	$0.00090	12.20s
Together AI	$1.4k	$0.00090	12.25s
Perplexity API	$1.5k	$0.0010	12.50s
Cerebras	$1.6k	$0.0011	12.05s
Z.AI GLM API	$4.8k	$0.0032	12.60s
Google Gemini API	$5.3k	$0.0035	n/a (Infinity)
Mistral API	$6.6k	$0.0044	9.87s
xAI Grok API	$10.2k	$0.0068	7.56s
Cohere API	$10.5k	$0.0070	11.02s
Azure OpenAI	$10.5k	$0.0070	12.60s
OpenAI API (current)	$10.5k	$0.0070	n/a (Infinity)
Amazon Bedrock	$15.3k	$0.010	12.60s
Anthropic API	$15.3k	$0.010	n/a (Infinity)

• No eval layer in stack. Add Langfuse, Braintrust, or DeepEval to catch quality regressions before users do.