LLM Infrastructure · Commercial · ✦ Free Tier

DeepInfra

Serverless GPU inference for open-source LLMs at low cost

App Infrastructure

About

DeepInfra provides serverless inference for hundreds of open-source models including Llama, Mistral, and Falcon, with pay-per-token pricing and an OpenAI-compatible API. No infrastructure management — just call the API and scale automatically.

Choose DeepInfra when…

  • you want to run open-source models without managing GPU infrastructure
  • you need the lowest cost per token for open models
  • you want an OpenAI-compatible API for easy integration
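Because the API is OpenAI-compatible, a call is just a standard chat-completions request pointed at DeepInfra's endpoint. A minimal sketch of building such a request follows; the base URL and model name are assumptions for illustration, so check DeepInfra's documentation for current values:

```python
import json

# Assumed OpenAI-compatible endpoint -- verify against DeepInfra's docs.
BASE_URL = "https://api.deepinfra.com/v1/openai"

def build_chat_request(api_key: str, prompt: str,
                       model: str = "meta-llama/Llama-3-8b-instruct"):
    """Return (url, headers, body) for an OpenAI-style chat completion.

    The model identifier above is a placeholder; DeepInfra lists the
    exact model names it serves.
    """
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # DEEPINFRA_API_KEY goes here
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request("sk-example", "Hello!")
```

Any HTTP client (or the official `openai` SDK with a custom base URL) can then send this request unchanged, which is what makes migration from OpenAI-hosted models low-effort.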

Builder Slot

Where do your models actually run?
Required for most stacks

LLM providers and inference servers — where the actual model computation happens

  • Dev Tools: Not applicable
  • App Infra: Required
  • Hybrid: Required


Stack Genome Detection

AIchitect's Genome scanner detects DeepInfra in your project via these signals:

Environment variables: DEEPINFRA_API_KEY
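The detection logic amounts to checking for that environment variable in a project's configuration. A minimal illustrative sketch, with a hypothetical function name (not AIchitect's actual scanner API):

```python
def detects_deepinfra(env: dict) -> bool:
    # Signal described above: presence of the DEEPINFRA_API_KEY
    # environment variable marks a project as using DeepInfra.
    return "DEEPINFRA_API_KEY" in env

print(detects_deepinfra({"DEEPINFRA_API_KEY": "abc123"}))  # True
print(detects_deepinfra({"OPENAI_API_KEY": "abc123"}))     # False
```

In practice a scanner would read `.env` files or CI configuration rather than a live process environment, but the signal itself is the same key name.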


Pricing

✦ Free tier available
  • Free trial: $0
  • Pay-as-you-go: per token

Badge

Add to your GitHub README

DeepInfra on AIchitect
[![DeepInfra](https://aichitect.dev/badge/tool/deepinfra)](https://aichitect.dev/tool/deepinfra)

Explore the full AI landscape

See how DeepInfra fits into the bigger picture — browse all 207 tools and their relationships.

Explore graph →