Chatbot · 10k users · 5 requests/user/day · 400 input / 600 output tokens per request
OpenAI API
Rate-limit bound: the provider tier caps you at 100k users before you need to upgrade or split keys.
Cost / month: $10.5k at 10k users
Cost / request: $0.0070 (1.5M req/mo)
Cost / user: $1.05 monthly
Cost / year: $126.0k at constant traffic
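
The four figures above are tied together by simple arithmetic. The sketch below reproduces them under stated assumptions: "400/600" is read as 400 input and 600 output tokens per request, a month is 30 days, and the per-million-token prices are illustrative values chosen because they give the $0.0070 per-request figure; none of these are stated on this page.

```python
# Workload from the header line (token split read as input/output: an assumption).
USERS = 10_000
REQUESTS_PER_USER_PER_DAY = 5
DAYS_PER_MONTH = 30            # assumption: 30-day month
INPUT_TOKENS = 400
OUTPUT_TOKENS = 600

# Assumed prices in USD per million tokens (not given in the source).
PRICE_IN_PER_M = 2.50
PRICE_OUT_PER_M = 10.00


def cost_per_request(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Token cost of one request in USD."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


requests_per_month = USERS * REQUESTS_PER_USER_PER_DAY * DAYS_PER_MONTH
per_request = cost_per_request(INPUT_TOKENS, OUTPUT_TOKENS, PRICE_IN_PER_M, PRICE_OUT_PER_M)
per_month = per_request * requests_per_month
per_user = per_month / USERS
per_year = per_month * 12

print(f"requests/month: {requests_per_month:,}")
print(f"cost/request:   ${per_request:.4f}")
print(f"cost/month:     ${per_month:,.0f}")
print(f"cost/user:      ${per_user:.2f}")
print(f"cost/year:      ${per_year:,.0f}")
```

Because output tokens dominate the per-request cost under these assumed prices, trimming the response length moves the monthly total far more than trimming the prompt.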
Cost over time
Cost composition at 10k users
LLM tokens: $10.5k (100%)
Latency breakdown
p50 end-to-end: 3.90s
LLM time-to-first-token: 431ms · 11%
LLM generation: 3.47s · 89%
Generation scales linearly with output tokens — shrink the response or pick a higher-throughput model to bring this down.
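
The linear relationship can be sketched as time-to-first-token plus output tokens divided by decode throughput. This is a simplified model, assuming the 3.47s generation time corresponds to 600 output tokens (the workload's assumed response length) and that throughput stays constant; `p50_latency` is an illustrative helper, not part of any tool.

```python
TTFT_S = 0.431          # time-to-first-token from the breakdown
GENERATION_S = 3.47     # generation time from the breakdown
OUTPUT_TOKENS = 600     # assumption: response length behind the 3.47s figure

# Implied decode throughput in tokens per second (roughly 173).
TOKENS_PER_SECOND = OUTPUT_TOKENS / GENERATION_S


def p50_latency(output_tokens, ttft_s=TTFT_S, tokens_per_second=TOKENS_PER_SECOND):
    """Estimated end-to-end latency in seconds for a given response length."""
    return ttft_s + output_tokens / tokens_per_second


for tokens in (600, 300, 150):
    print(f"{tokens} output tokens -> {p50_latency(tokens):.2f}s")
```

Under this model, halving the response to 300 tokens brings the estimate from about 3.90s to about 2.17s, while time-to-first-token stays fixed.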
Breaking points
- Cost crosses $1,000/mo at 1k users → Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
- Cost crosses $5,000/mo at 5k users → Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
- Cost crosses $20,000/mo at 25k users → Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
- Cost crosses $100,000/mo at 100k users → Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
- Peak load exceeds provider rate limit at 100k users (3× peak/avg) → Move to a higher provider tier, distribute across multiple API keys, or buffer non-real-time traffic via the batch API.
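
The breaking points above can be approximated from two numbers on this page: the $1.05 monthly cost per user and the 3× peak-to-average ratio. The sketch below is a rough model, assuming cost is linear in users and average traffic is spread evenly over a 24-hour day; the function names are illustrative. The exact crossings it computes sit below the listed user counts, which presumably are the nearest sampled sizes. The provider's actual rate limit is not stated here, so only the demand side is computed.

```python
import math

COST_PER_USER_PER_MONTH = 1.05     # from the cost summary
REQUESTS_PER_USER_PER_DAY = 5      # from the workload header
PEAK_TO_AVG = 3                    # from the rate-limit breaking point
MINUTES_PER_DAY = 24 * 60


def users_at_cost(threshold_usd):
    """Smallest user count whose monthly cost reaches the threshold (linear cost assumed)."""
    return math.ceil(threshold_usd / COST_PER_USER_PER_MONTH)


def peak_requests_per_minute(users):
    """Peak request rate, assuming average load is uniform over the day."""
    average = users * REQUESTS_PER_USER_PER_DAY / MINUTES_PER_DAY
    return average * PEAK_TO_AVG


for threshold in (1_000, 5_000, 20_000, 100_000):
    print(f"${threshold:,}/mo is reached at about {users_at_cost(threshold):,} users")

print(f"peak load at 100k users: {peak_requests_per_minute(100_000):,.0f} requests/min")
```

Comparing the peak figure against the requests-per-minute and tokens-per-minute limits of the actual provider tier shows how much headroom remains before an upgrade or key split is needed.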
Provider comparison
Same workload at 10k users, ranked by monthly cost.
| Provider | Cost / mo | Cost / req | p50 latency |
|---|---|---|---|
| DeepInfra (cheapest) | $570 | $0.00038 | 12.35s |
| Groq | $1.1k | $0.00071 | 12.10s |
| Fireworks AI | $1.4k | $0.00090 | 12.20s |
| Together AI | $1.4k | $0.00090 | 12.25s |
| Perplexity API | $1.5k | $0.0010 | 12.50s |
| Cerebras | $1.6k | $0.0011 | 12.05s |
| Mistral API | $1.6k | $0.0011 | 10.74s |
| DeepSeek API | $1.7k | $0.0012 | n/a (shown as Infinity) |
| Google Gemini API | $2.4k | $0.0016 | 3.32s |
| OpenAI API (current) | $10.5k | $0.0070 | 3.90s |
| Azure OpenAI | $10.5k | $0.0070 | 12.60s |
| Cohere API | $10.5k | $0.0070 | 17.29s |
| Amazon Bedrock | $15.3k | $0.010 | 12.60s |
| Anthropic API | $15.7k | $0.010 | 12.79s |
| xAI Grok API | $20.4k | $0.014 | n/a (shown as Infinity) |
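
Ranking by cost alone hides the latency trade-off, so one way to read the table is to pick the cheapest provider that fits a latency budget. The sketch below does that over the table's rounded figures; `cheapest_within` and the budget values are illustrative, and rows whose latency is shown as Infinity are treated as having no usable estimate.

```python
import math

# (provider, monthly cost in USD, p50 latency in seconds), copied from the table.
# Costs are the table's rounded values; float("inf") marks latencies shown as Infinity.
PROVIDERS = [
    ("DeepInfra", 570, 12.35),
    ("Groq", 1_100, 12.10),
    ("Fireworks AI", 1_400, 12.20),
    ("Together AI", 1_400, 12.25),
    ("Perplexity API", 1_500, 12.50),
    ("Cerebras", 1_600, 12.05),
    ("Mistral API", 1_600, 10.74),
    ("DeepSeek API", 1_700, float("inf")),
    ("Google Gemini API", 2_400, 3.32),
    ("OpenAI API", 10_500, 3.90),
    ("Azure OpenAI", 10_500, 12.60),
    ("Cohere API", 10_500, 17.29),
    ("Amazon Bedrock", 15_300, 12.60),
    ("Anthropic API", 15_700, 12.79),
    ("xAI Grok API", 20_400, float("inf")),
]


def cheapest_within(latency_budget_s):
    """Cheapest provider whose p50 latency is finite and within the budget, else None."""
    candidates = [
        row for row in PROVIDERS
        if math.isfinite(row[2]) and row[2] <= latency_budget_s
    ]
    return min(candidates, key=lambda row: row[1], default=None)


for budget in (5.0, 11.0, 13.0):
    print(f"budget {budget:>4}s -> {cheapest_within(budget)}")
```

With a 5-second budget only two rows qualify, which is why a chatbot workload may reasonably stay on a more expensive provider than the cost ranking suggests.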
Switch away from this stack when
- Peak load exceeds the LLM provider's standard tier (around 100k users).
Structural risks
- No eval layer in stack. Add Langfuse, Braintrust, or DeepEval to catch quality regressions before users do.
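
To illustrate what an eval layer checks, here is a minimal golden-set regression gate in plain Python. It does not use the API of Langfuse, Braintrust, or DeepEval; the prompts, expected substrings, `stub_model`, and the 0.9 gate are all hypothetical placeholders standing in for a real test set and a real LLM call.

```python
# Hypothetical golden set: each case lists substrings a good answer must contain.
GOLDEN_SET = [
    {"prompt": "What is the refund window?", "must_contain": ["30 days"]},
    {"prompt": "How do I reset my password?", "must_contain": ["reset link", "email"]},
]

MIN_PASS_RATE = 0.9  # hypothetical release gate


def stub_model(prompt):
    """Stand-in for the real LLM call; returns canned answers for the demo."""
    canned = {
        "What is the refund window?": "Refunds are accepted within 30 days of purchase.",
        "How do I reset my password?": "We will send a reset link to your email address.",
    }
    return canned.get(prompt, "")


def pass_rate(model, golden_set):
    """Fraction of cases whose answer contains every required substring."""
    passed = 0
    for case in golden_set:
        answer = model(case["prompt"]).lower()
        if all(expected.lower() in answer for expected in case["must_contain"]):
            passed += 1
    return passed / len(golden_set)


rate = pass_rate(stub_model, GOLDEN_SET)
print(f"pass rate: {rate:.0%}, gate {'passed' if rate >= MIN_PASS_RATE else 'failed'}")
```

Running a check like this before each prompt or model change is the kind of safeguard the listed tools provide with richer scoring, tracing, and dashboards.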