LiteLLM Backend (opt-in)
The upcoming 0.7.3 release ships an opt-in LiteLLMProvider that routes every LLM call through LiteLLM instead of Forge's native per-provider HTTP shapes. Enable it for unified routing, accurate per-call cost attribution, and access to LiteLLM's broader provider catalog.
Where this fits
LiteLLM is a routing-layer alternative to the native provider stack documented at LLM Providers. Both stacks share the same LlmProvider base class — switching between them is a one-env-var flip.
Enabling
pip install 'data-product-forge[litellm]'
export FLUID_LLM_BACKEND=litellm
fluid forge --domain retail
That's it. The dispatcher in cli/forge_copilot_llm_providers.py::get_llm_provider short-circuits to LiteLLMProvider when the env var is set; call_llm and call_llm_streaming route through it transparently.
What changes
| Aspect | Native stack | LiteLLM stack |
|---|---|---|
| Provider list | Anthropic / OpenAI / Gemini / Azure / Ollama (5 hand-rolled) | 100+ providers via LiteLLM's catalog |
| Cost attribution | Heuristic estimator (token counts × price/token table) | LiteLLM's per-call usage.cost field — accurate to the cent |
| Streaming | Native SSE per provider | LiteLLM's unified streaming wrapper |
| Tool use | Native per-provider tool-use shapes | LiteLLM's unified tools parameter |
| Prompt caching | Native (Anthropic only) | Whatever LiteLLM passes through |
| Token-usage capture | SSE wire decoder per provider | LiteLLM's usage callback |
Configuration
| Env var | Purpose |
|---|---|
FLUID_LLM_BACKEND=litellm | Activate the LiteLLM stack |
FLUID_LLM_MODEL | Model name. Use LiteLLM's model-name conventions (e.g. claude-3-5-sonnet-20241022, gpt-4.1-mini, gemini/gemini-2.5-pro). |
FLUID_LITELLM_MODEL_PREFIX | Override the prefix LiteLLM uses for niche providers — e.g. azure/, bedrock/. |
OPENAI_API_KEY / ANTHROPIC_API_KEY / GEMINI_API_KEY | Provider keys, same as the native stack. |
LITELLM_* | Any LiteLLM-specific env var (LiteLLM reads these directly; Forge doesn't filter them). |
Cost attribution
The native stack estimates cost from input/output token counts via an internal price table that gets stale every time providers change pricing. LiteLLM's usage.cost field is the authoritative per-call cost the LLM API itself reports, so:
# Internal — RunCostTracker.record_call signature gained usd_override
tracker.record_call(
provider="anthropic",
model="claude-3-5-sonnet-20241022",
input_tokens=8420,
output_tokens=1800,
usd_override=0.0286, # passed in from LiteLLM's reported cost
)
fluid stats (page) and the per-run cost.json show the LiteLLM-reported figure when FLUID_LLM_BACKEND=litellm. Older runs (or runs on the native stack) keep using the heuristic estimate.
Capability warnings
The capability catalog at fluid_build/copilot/agents/capability_catalog.py covers the native-stack provider/model combinations. When you switch to LiteLLM, the catalog still applies (LiteLLM is just a router) but the warnings reflect the underlying model — litellm + claude-sonnet-4-6 warns identically to anthropic + claude-sonnet-4-6.
If you point LiteLLM at a model not in the catalog, the run-start banner says "model X is not in the capability catalog" and the run continues with conservative defaults. See Capability Warnings.
When to use LiteLLM vs. native
| Use LiteLLM when | Use native when |
|---|---|
| You need a provider not in the native 5 (Bedrock, Vertex direct, Cohere, Mistral cloud, etc.) | You're on Anthropic / OpenAI / Gemini / Ollama and want fewest dependencies |
| You want per-call cost accurate to the cent (LiteLLM passes through provider billing) | You're cost-budgeting at the run level — heuristic is good enough |
| You're already running LiteLLM as a routing proxy in production and want Forge to use the same path | You want zero new runtime deps |
| You want to A/B test models without code changes | — |
The two stacks coexist safely — you can flip FLUID_LLM_BACKEND per run without touching contracts or config.
Caveats
- LiteLLM is an extra at install time (
pip install 'data-product-forge[litellm]'). Without it, settingFLUID_LLM_BACKEND=litellmraisesMissingExtraErrorat run start. - Tool-use behaviour matches LiteLLM's wrapper, not the native shape. If you've been depending on Anthropic's exact tool-use response shape (rare), switching to LiteLLM may surface differences.
- The agent-layer typed errors (
RateLimitError,ContextOverflowError, etc. — see Typed Errors) still fire under LiteLLM; the classifier handles both wire shapes.
See also
- LLM Providers — the native stack
- Capability Warnings — what the capability catalog enforces under both stacks
- Cost Tracking — how cost figures land in
.fluid/agents/<run-id>/cost.json fluid stats— aggregating cost across runs