Chatbot · 10k users · 5/day · 400/600 tokens

OpenAI API

Rate-limit-bound
Rate-limit bound: the provider tier caps you at 100k users before you need to upgrade or split keys.
Cost / month: $10.5k (at 10k users)
Cost / request: $0.0070 (1.5M req/mo)
Cost / user: $1.05 (monthly)
Cost / year: $126.0k (constant traffic)
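
The headline figures can be reproduced with a few lines of arithmetic. The sketch below is a rough reconstruction, not the calculator's actual code: it assumes "400/600 tokens" means 400 input and 600 output tokens per request, a 30-day month, and per-token prices of $2.50 and $10.00 per million input and output tokens. Those prices are not stated in the report; they are assumptions chosen because they happen to reproduce the $0.0070 per-request figure.

```python
# All constants below are assumptions; the report gives only the totals.
INPUT_TOKENS = 400           # assumed prompt tokens per request
OUTPUT_TOKENS = 600          # assumed completion tokens per request
INPUT_PRICE_PER_M = 2.50     # assumed USD per 1M input tokens
OUTPUT_PRICE_PER_M = 10.00   # assumed USD per 1M output tokens
DAYS_PER_MONTH = 30          # assumed month length


def cost_per_request(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """USD cost of one request given token counts and per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def monthly_requests(users, requests_per_user_per_day, days=DAYS_PER_MONTH):
    """Total requests per month at constant traffic."""
    return users * requests_per_user_per_day * days


def monthly_cost(users, requests_per_user_per_day, per_request_cost):
    """USD cost per month at constant traffic."""
    return monthly_requests(users, requests_per_user_per_day) * per_request_cost


if __name__ == "__main__":
    per_req = cost_per_request(
        INPUT_TOKENS, OUTPUT_TOKENS, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M
    )
    month = monthly_cost(10_000, 5, per_req)
    print(f"cost / request: ${per_req:.4f}")
    print(f"requests / month: {monthly_requests(10_000, 5):,}")
    print(f"cost / month: ${month:,.0f}")
    print(f"cost / user: ${month / 10_000:.2f}")
    print(f"cost / year: ${month * 12:,.0f}")
```

Under these assumptions the numbers line up with the cards above: about $0.0070 per request, 1.5M requests per month, and roughly $10.5k per month.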

Cost over time

[Chart: monthly cost ($0 to $1.2M) vs. monthly active users (1k to 1M), OpenAI API]

Cost composition at 10k users

LLM tokens: $10.5k (100%)

Latency breakdown

p50 end-to-end: 3.90s
LLM time-to-first-token: 431ms · 11%
LLM generation: 3.47s · 89%

Generation scales linearly with output tokens — shrink the response or pick a higher-throughput model to bring this down.
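
As a minimal sketch of that linear relationship, the p50 latency can be modelled as time-to-first-token plus output tokens divided by generation throughput. The throughput here is inferred from the figures above (3.47s of generation) under the assumption that the response is 600 output tokens; the function names are illustrative and not part of any tool.

```python
TTFT_S = 0.431           # time-to-first-token from the breakdown above
GENERATION_S = 3.47      # generation time from the breakdown above
OUTPUT_TOKENS = 600      # assumed output tokens per response


def implied_throughput(output_tokens=OUTPUT_TOKENS, generation_s=GENERATION_S):
    """Tokens per second implied by the measured generation time."""
    return output_tokens / generation_s


def estimate_p50_latency(output_tokens, ttft_s=TTFT_S, tokens_per_s=None):
    """Estimated end-to-end seconds: fixed TTFT plus linear generation time."""
    if tokens_per_s is None:
        tokens_per_s = implied_throughput()
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for tokens in (150, 300, 600):
        print(f"{tokens} output tokens: {estimate_p50_latency(tokens):.2f}s")
    # Doubling throughput halves only the generation part, not the TTFT.
    faster = estimate_p50_latency(600, tokens_per_s=2 * implied_throughput())
    print(f"600 output tokens at 2x throughput: {faster:.2f}s")
```

Because time-to-first-token is fixed, halving the response length cuts latency by a little less than half; the model ignores network overhead and queueing.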

Breaking points

  • Cost crosses $1,000/mo at 1k users
    Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
  • Cost crosses $5,000/mo at 5k users
    Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
  • Cost crosses $20,000/mo at 25k users
    Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
  • Cost crosses $100,000/mo at 100k users
    Move to a cheaper model, enable prompt caching, or send non-real-time traffic via the batch API.
  • Peak load exceeds provider rate limit at 100k users (3× peak/avg)
    Move to a higher provider tier, distribute across multiple API keys, or buffer non-real-time traffic via batch API.
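
The cost breaking points above are consistent with a simple scan: walk the sampled user tiers in ascending order and report the first tier whose monthly cost reaches each threshold. The sketch below assumes the $1.05 per-user monthly cost from the summary and the tier list from the chart axis; the peak-load helper assumes traffic is spread evenly over a day before applying the 3× peak/avg factor. The provider's actual rate limit is not given in the report, so it is left out.

```python
COST_PER_USER = 1.05  # USD per user per month, from the summary above
USER_TIERS = [1_000, 5_000, 10_000, 25_000, 50_000,
              100_000, 250_000, 500_000, 1_000_000]  # tiers on the chart axis
THRESHOLDS = [1_000, 5_000, 20_000, 100_000]         # USD per month


def first_tier_crossing(threshold, tiers=USER_TIERS, cost_per_user=COST_PER_USER):
    """Smallest sampled tier whose monthly cost meets the threshold, else None."""
    for users in sorted(tiers):
        if users * cost_per_user >= threshold:
            return users
    return None


def peak_requests_per_second(users, requests_per_user_per_day=5, peak_to_avg=3.0):
    """Peak request rate, assuming average load is uniform over 24 hours."""
    average = users * requests_per_user_per_day / 86_400
    return average * peak_to_avg


if __name__ == "__main__":
    for threshold in THRESHOLDS:
        tier = first_tier_crossing(threshold)
        print(f"${threshold:,}/mo is crossed at {tier:,} users")
    print(f"peak at 100k users: {peak_requests_per_second(100_000):.1f} req/s")
```

Since only sampled tiers are checked, the reported user count is the first tier at or past the threshold, not the exact crossing point (for example, $20,000/mo is actually reached a little above 19k users).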

Provider comparison

Same workload at 10k users, ranked by monthly cost.

Model                    Cost / mo    Cost / req    p50 latency (s)
DeepInfra (Cheapest)     $570         $0.00038      12.35
Groq                     $1.1k        $0.00071      12.10
Fireworks AI             $1.4k        $0.00090      12.20
Together AI              $1.4k        $0.00090      12.25
Perplexity API           $1.5k        $0.0010       12.50
Cerebras                 $1.6k        $0.0011       12.05
Mistral API              $1.6k        $0.0011       10.74
DeepSeek API             $1.7k        $0.0012       Infinity
Google Gemini API        $2.4k        $0.0016       3.32
OpenAI API (Current)     $10.5k       $0.0070       3.90
Azure OpenAI             $10.5k       $0.0070       12.60
Cohere API               $10.5k       $0.0070       17.29
Amazon Bedrock           $15.3k       $0.010        12.60
Anthropic API            $15.7k       $0.010        12.79
xAI Grok API             $20.4k       $0.014        Infinity

Structural risks

Switch away from this stack when

  • Peak load exceeds the LLM provider's standard tier (around 100k users).

Structural risks

  • No eval layer in stack. Add Langfuse, Braintrust, or DeepEval to catch quality regressions before users do.