Prompt & Eval · Open Source · ✦ Free Tier

PromptFoo

CLI/library for prompt testing and red-teaming

5,000 stars · Health 80 · Active · App Infrastructure

About

Test and compare prompts across models. Built-in red-teaming, regression testing, and side-by-side model comparison.

Choose PromptFoo when…

  • You want CLI-first, config-driven LLM evals
  • Running eval suites in CI/CD pipelines is a goal
  • You need red-teaming and safety testing built in
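The config-driven workflow above centers on a single YAML file. A minimal sketch of a promptfooconfig.yaml (the prompt, model, and test values here are illustrative placeholders, not a recommended setup):

```yaml
# promptfooconfig.yaml — hypothetical example values
prompts:
  - "Summarize in one sentence: {{text}}"

providers:
  - openai:gpt-4o

tests:
  - vars:
      text: "Promptfoo is a CLI for testing LLM prompts."
    assert:
      - type: contains
        value: "Promptfoo"
```

Running `promptfoo eval` against this file scores every prompt/provider pair, which is what makes it straightforward to drop into a CI pipeline.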

Builder Slot

How do you know it's working?
Optional for most stacks

Tests, evals, and experiment tracking to measure and improve your AI output quality

Dev Tools: Not applicable
App Infra: Recommended
Hybrid: Optional


Stack Genome Detection

AIchitect's Genome scanner detects PromptFoo in your project via these signals:

npm packages: promptfoo
config files: promptfooconfig.yaml, promptfooconfig.yml
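The detection described above boils down to two file checks. A hypothetical sketch of that logic (this is not AIchitect's actual scanner code, just an illustration of the signals listed):

```python
import json
from pathlib import Path

def detects_promptfoo(project_root: str) -> bool:
    """Hypothetical detection of the two signals listed above."""
    root = Path(project_root)

    # Signal 1: "promptfoo" appears among the npm dependencies.
    pkg = root / "package.json"
    if pkg.is_file():
        data = json.loads(pkg.read_text())
        deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
        if "promptfoo" in deps:
            return True

    # Signal 2: a promptfoo config file sits at the project root.
    return any((root / name).is_file()
               for name in ("promptfooconfig.yaml", "promptfooconfig.yml"))
```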

Integrates with (2)

Langfuse · LLM Infrastructure

Langfuse production traces can be exported as eval datasets that Promptfoo uses for regression testing in CI.

Close the eval loop: real failures captured in Langfuse become the regression test cases Promptfoo runs on every deploy.
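The export step can be as simple as reshaping trace records into Promptfoo test cases. A hedged sketch, assuming traces have already been pulled from Langfuse as dicts with `input` and `expected_output` fields (these field names are assumptions for illustration, not Langfuse's actual export schema):

```python
import json

def traces_to_promptfoo_tests(traces, out_path="regression-tests.json"):
    """Turn captured trace records into Promptfoo test cases.

    `traces` is assumed to be a list of dicts shaped like
    {"input": ..., "expected_output": ...} — a stand-in for
    whatever your Langfuse export actually produces.
    """
    tests = [
        {
            "vars": {"input": t["input"]},
            # "similar" is a Promptfoo assertion type that scores
            # semantic similarity against the reference value.
            "assert": [{"type": "similar", "value": t["expected_output"]}],
        }
        for t in traces
    ]
    with open(out_path, "w") as f:
        json.dump(tests, f, indent=2)
    return tests
```

The resulting file can then be referenced from a Promptfoo config (e.g. `tests: file://regression-tests.json`), so every deploy re-runs yesterday's real failures as regression tests.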

OpenAI API · LLM Infrastructure

Promptfoo calls OpenAI's API directly to run prompts through configured test cases and compare outputs against assertions.

Automated prompt regression testing against GPT-4o — catch output quality changes before they reach production.
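A sketch of what such a regression config could look like. The assertion types shown (`contains`, `llm-rubric`) exist in Promptfoo, but the prompt and values are purely illustrative:

```yaml
# Illustrative values only
providers:
  - openai:gpt-4o

prompts:
  - "Answer the customer question: {{question}}"

tests:
  - vars:
      question: "How do I reset my password?"
    assert:
      - type: contains
        value: "reset"
      - type: llm-rubric
        value: "Gives clear, step-by-step instructions"
```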




Pricing

✦ Free tier available

In 4 stacks

Badge

Add to your GitHub README

[![PromptFoo](https://aichitect.dev/badge/tool/promptfoo)](https://aichitect.dev/tool/promptfoo)

Explore the full AI landscape

See how PromptFoo fits into the bigger picture — browse all 207 tools and their relationships.

Explore graph →