LLM Providers

Each forge data-model run uses exactly one active LLM provider. The provider is selected from the --llm-provider CLI flag, the FLUID_LLM_PROVIDER environment variable, or the saved AI config:

fluid forge data-model from-intent intent.yaml \
  -o customer_orders.fluid.yaml \
  --llm-provider gemini

FLUID_LLM_PROVIDER=ollama \
FLUID_OLLAMA_MODEL=gemma4:latest \
fluid forge data-model from-intent intent.yaml -o customer_orders.fluid.yaml
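
The selection can be sketched as a small resolver. This sketch assumes a precedence of CLI flag over FLUID_LLM_PROVIDER over saved config; the page lists these three sources but does not state their order. The config is modelled as a plain dict with a hypothetical "provider" key, and resolve_provider is an illustrative name, not the CLI's code.

```python
import os
from typing import Mapping, Optional

# Provider identifiers; "azure" is an assumed spelling for Azure OpenAI.
SUPPORTED = {"anthropic", "openai", "gemini", "ollama", "azure"}

def resolve_provider(
    cli_flag: Optional[str],
    env: Mapping[str, str] = os.environ,
    saved_config: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the single active provider (assumed order: flag > env > config)."""
    candidates = (
        cli_flag,
        env.get("FLUID_LLM_PROVIDER"),
        (saved_config or {}).get("provider"),  # hypothetical config key
    )
    for value in candidates:
        if value:
            name = value.strip().lower()
            if name not in SUPPORTED:
                raise ValueError(f"unsupported LLM provider: {value!r}")
            return name
    raise ValueError("no LLM provider configured")
```

Under this sketch, the two shell examples above resolve to gemini (flag) and ollama (environment) respectively.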

Supported providers

Provider	Default / common model	Notes
Anthropic	`claude-sonnet-4-6`	Tool-forced structured output and provider-native prompt caching. Streamed runs report accurate token usage in cost summaries; earlier versions reported "missing usage" for every streamed call.
OpenAI	`gpt-4.1-mini`	Strict JSON Schema output where available; supports seed. Tiered runs use `gpt-4.1` for deep logical modeling. Set `FLUID_OPENAI_STRICT_SCHEMA=1` to harden the response-format schema for `gpt-4o`, `gpt-4.1`, and o-series models that reject permissive nested objects.
Gemini	`gemini-2.5-pro`	Uses Gemini's response schema where suitable, with validator repair when needed.
Ollama	Set via `FLUID_OLLAMA_MODEL`, e.g. `gemma4:latest`	Local-only; JSON mode is available only on models that support it. Capability and token-budget catalogs cover `gemma` 1–4, `qwen3-coder`, `qwen3`, `qwen2.5`, `llama3.1`/`3.2`/`3.3`, `mistral`, `mixtral`, `deepseek`, and `phi`. See Capability Warnings for per-family tool-use accuracy notes.
Azure OpenAI	Set via `FLUID_AZURE_DEPLOYMENT`	OpenAI-compatible wire format, addressed by deployment name.
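
To illustrate what hardening the response-format schema involves, here is a simplified sketch of a recursive strict-schema walker. It assumes OpenAI's strict-mode rules: every object sets additionalProperties to false and lists all of its properties as required. Following the FLUID_OPENAI_STRICT_SCHEMA description, free-form objects (no declared properties) become JSON-encoded strings. to_strict_schema is a hypothetical name. It handles only objects and arrays; forge's own walker may also treat $ref, anyOf, or nullable fields.

```python
from typing import Any, Dict

def to_strict_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a strict-mode copy of a JSON Schema node (simplified sketch)."""
    node = dict(schema)  # shallow copy; nested nodes are rebuilt below
    if node.get("type") == "object":
        props = node.get("properties")
        if not props:
            # Free-form object: strict mode cannot describe arbitrary keys,
            # so request a JSON-encoded string and json.loads() it afterwards.
            return {"type": "string", "description": node.get("description", "JSON-encoded object")}
        node["properties"] = {name: to_strict_schema(sub) for name, sub in props.items()}
        node["required"] = list(node["properties"])  # strict mode: every property listed
        node["additionalProperties"] = False
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        node["items"] = to_strict_schema(node["items"])
    return node
```

The input schema is left untouched, so the permissive version stays available for providers that accept it.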

Inspect the active catalog with:

fluid ai models
fluid ai models --provider gemini --json

Tiered mode

--tiered chooses different models within the same provider, never across providers. A typical layout is:

Tier	Role
deep	hardest reasoning and planning
balanced	main model-building execution
fast	routing, clarification, and light evaluation

If a provider has no distinct tier models configured, the CLI collapses tiered mode to a single-model run and emits a one-line warning. Ollama commonly runs this way unless the local model catalog is configured with separate fast, balanced, and deep models.
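
One reading of the collapse rule, as a sketch: tiers that are not configured fall back to the run's base model. If every tier ends up on the same model, the run is effectively single-model and one warning is emitted. plan_tiers and the mapping shape are hypothetical. The gpt-4.1 deep tier in the usage line mirrors the OpenAI row in the provider table.

```python
import warnings
from typing import Dict, Mapping, Optional

TIERS = ("deep", "balanced", "fast")

def plan_tiers(
    base_model: str,
    tier_models: Optional[Mapping[str, str]] = None,
    tiered: bool = True,
) -> Dict[str, str]:
    """Assign one model per tier, all from the same provider."""
    if not tiered:
        return {tier: base_model for tier in TIERS}
    configured = dict(tier_models or {})
    plan = {tier: configured.get(tier, base_model) for tier in TIERS}
    if len(set(plan.values())) == 1:
        # No distinct tier models: behave as a single-model run and say so once.
        warnings.warn("tiered mode collapsed to a single model: no distinct tier models configured")
    return plan
```

For example, plan_tiers("gpt-4.1-mini", {"deep": "gpt-4.1"}) puts gpt-4.1 on the deep tier and gpt-4.1-mini on the other two. plan_tiers("gemma4:latest") collapses with a warning.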

Tiered mode changes which model a stage uses, not which stages use a model. The deterministic stages stay deterministic:

Stage	Model use
Interview	Fast routing model
Logical modeler	Deep model
Contract forge	No model, deterministic
Transformation	No model, deterministic from `.model.json`
Validator	No model, deterministic
Self-evaluation	Fast routing model
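
The stage table reads naturally as a lookup in which deterministic stages map to no model. This is a sketch. The stage keys and model_for_stage are hypothetical names, and the tier-to-model values in the usage are only example values.

```python
from typing import Dict, Optional

# Tier used by each stage; None marks a deterministic stage that never calls a model.
STAGE_TIER: Dict[str, Optional[str]] = {
    "interview": "fast",
    "logical_modeler": "deep",
    "contract_forge": None,
    "transformation": None,  # derived deterministically from .model.json
    "validator": None,
    "self_evaluation": "fast",
}

def model_for_stage(stage: str, tiers: Dict[str, str]) -> Optional[str]:
    """Return the model a stage would call, or None for deterministic stages."""
    tier = STAGE_TIER[stage]
    return None if tier is None else tiers[tier]
```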

Strict provider testing

In normal use, forge can fall back to deterministic heuristics if an LLM call fails. For provider certification and E2E testing, disable that fallback with --require-llm:

fluid forge data-model from-intent intent.yaml \
  -o customer_orders.fluid.yaml \
  --llm-provider anthropic \
  --require-llm

--require-llm makes the run fail with an error when the provider cannot run, instead of falling back to heuristics. This prevents a smoke test from passing when it actually used heuristics.
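
A minimal sketch of the fallback policy, assuming the LLM call and the heuristic path are plain callables. run_stage and LLMUnavailable are hypothetical names. The point is only that the strict flag turns a silent fallback into an error.

```python
from typing import Callable, Tuple

class LLMUnavailable(RuntimeError):
    """Raised when the LLM is required but the provider call failed (hypothetical)."""

def run_stage(
    call_llm: Callable[[], str],
    heuristic: Callable[[], str],
    require_llm: bool,
) -> Tuple[str, str]:
    """Return (result, source), where source is "llm" or "heuristic"."""
    try:
        return call_llm(), "llm"
    except Exception as exc:  # any provider failure, for the purpose of this sketch
        if require_llm:
            raise LLMUnavailable(f"LLM provider failed: {exc}") from exc
        return heuristic(), "heuristic"
```

Returning the source alongside the result is what lets a test tell an LLM-backed run from a heuristic one.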

Deterministic runs

fluid forge data-model from-intent intent.yaml \
  -o customer_orders.fluid.yaml \
  --deterministic

--deterministic disables caching and tiering so that output can be replayed. All providers pin temperature=0; OpenAI, Ollama, and Azure OpenAI also pin a seed where supported.
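
The settings this flag implies can be sketched as a dict builder. Caching and tiering are off, temperature is pinned to 0 for every provider, and a seed is added only for the providers listed as taking one. The provider identifiers, the default seed value 0, and the dict keys are assumptions for illustration.

```python
from typing import Any, Dict

# Providers documented as pinning a seed (identifier spellings assumed).
SEED_PROVIDERS = {"openai", "ollama", "azure"}

def deterministic_settings(provider: str, seed: int = 0) -> Dict[str, Any]:
    """Request settings for a replayable run (sketch)."""
    settings: Dict[str, Any] = {"cache": False, "tiered": False, "temperature": 0}
    if provider in SEED_PROVIDERS:
        settings["seed"] = seed  # honored only where the model supports it
    return settings
```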

Environment variables

Provider + credentials

Env var	Purpose
`FLUID_LLM_PROVIDER`	Active provider for the run
`FLUID_LLM_MODEL`	Specific model override
`FLUID_LLM_TIMEOUT_SECONDS`	Provider HTTP timeout
`OPENAI_API_KEY`	OpenAI key
`ANTHROPIC_API_KEY`	Anthropic key
`GOOGLE_API_KEY` or `GEMINI_API_KEY`	Gemini key
`OLLAMA_HOST`	Ollama endpoint; local addresses only
`FLUID_OLLAMA_MODEL`	Ollama model name
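
OLLAMA_HOST accepts local addresses only. A minimal check could look like the following, assuming "local" means a loopback address or the name localhost, and that a bare host:port without a scheme is allowed. The rule forge actually applies may be broader or narrower (for example, private LAN ranges). is_local_ollama_host is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit

def is_local_ollama_host(value: str) -> bool:
    """True if OLLAMA_HOST points at a loopback address (assumed policy)."""
    url = value if "://" in value else f"http://{value}"
    host = urlsplit(url).hostname  # lower-cased; IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are rejected rather than resolved
```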

Agent-loop tuning

Env var	Purpose
`FLUID_AGENT_COMPACT_AFTER`	Number of iterations after which the multi-turn agent loop compacts older tool results to stay within the model's context window. Default `6`. Raise it for long-context Anthropic or Gemini runs; lower it for Ollama models with small context windows.
`FLUID_COMPACTION_STRATEGY`	`truncate` (default; char/token-aware truncation), `summarize` (LLM-backed; calls your provider's fast tier once per compaction trigger), or `hybrid` (truncates first, then summarizes the rest if still over budget). See Agentic primitives → Token-budget pre-flight & compaction.
`FLUID_TOKEN_COUNTER`	Internal; selects the token-counting backend. The default is a pure-Python, character-based heuristic, so the CLI does not require an external tokenizer.
`FLUID_OPENAI_STRICT_SCHEMA`	`1` to enable the recursive strict-schema walker for OpenAI's `response_format = json_schema` mode. This prevents the HTTP 400 "Invalid schema for response_format 'ForgeContract'" that some `gpt-4o`, `gpt-4.1`, and o-series deployments return when nested objects are free-form. Under strict mode, free-form fields are rewritten as JSON-encoded strings.
`FLUID_QUIET` / `FLUID_NONINTERACTIVE`	`1` to silence the v2-preview banner and capability-degradation warnings. The warnings are still recorded in telemetry.
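
The truncate strategy can be sketched as follows. The sketch assumes the agent history is a list of role/content messages and uses a rough four-characters-per-token estimate in the spirit of the default char-based counter. The message shape, the 4:1 ratio, keep_recent, max_tokens, and compact_history are all assumptions. The summarize and hybrid strategies would replace the truncation step with a call to the provider's fast tier.

```python
import os
from typing import Dict, List, Optional

CHARS_PER_TOKEN = 4  # assumed ratio for a char-based token estimate

def estimate_tokens(text: str) -> int:
    """Rough token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)

def compact_history(
    messages: List[Dict[str, str]],
    iteration: int,
    compact_after: Optional[int] = None,
    keep_recent: int = 2,
    max_tokens: int = 50,
) -> List[Dict[str, str]]:
    """Truncate older tool results once the loop is past the compaction threshold."""
    if compact_after is None:
        compact_after = int(os.environ.get("FLUID_AGENT_COMPACT_AFTER", "6"))
    if iteration <= compact_after:
        return messages  # below threshold: history untouched
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Keep the newest keep_recent tool results verbatim; older ones are candidates.
    older = set(tool_positions[:-keep_recent] if keep_recent else tool_positions)
    compacted = []
    for i, message in enumerate(messages):
        if i in older and estimate_tokens(message["content"]) > max_tokens:
            cut = message["content"][: max_tokens * CHARS_PER_TOKEN]
            message = {**message, "content": cut + " ...[truncated]"}
        compacted.append(message)
    return compacted
```

New message dicts are built for truncated entries, so the caller's original history is not mutated.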

Use fluid ai setup for interactive setup and key storage. Provider and model choices are saved in ~/.fluid/ai_config.json; API keys go to the OS keyring by default. Plaintext API-key persistence requires explicit opt-in with FLUID_ALLOW_PLAINTEXT_AI_SECRETS=1.
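
The storage rule can be sketched as a small decision function. keyring_available stands in for whether an OS keyring backend is usable. Refusing to store a key when no keyring exists and no opt-in is set is an assumption about behaviour the page does not spell out. choose_secret_backend is a hypothetical name.

```python
from typing import Mapping

def choose_secret_backend(keyring_available: bool, env: Mapping[str, str]) -> str:
    """Return where an API key may be persisted: "keyring" or "plaintext"."""
    if keyring_available:
        return "keyring"  # default: OS keyring
    if env.get("FLUID_ALLOW_PLAINTEXT_AI_SECRETS") == "1":
        return "plaintext"  # only with explicit opt-in
    raise PermissionError(
        "no OS keyring available; set FLUID_ALLOW_PLAINTEXT_AI_SECRETS=1 to allow plaintext storage"
    )
```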

Run-start capability warnings

When the selected provider/model's declared capabilities don't meet what the run needs (e.g. gpt-3.5 in agent-loop mode, or an Ollama model without tool-use support), or the model is not yet in the capability catalog, the CLI prints a one-paragraph warning at the start of fluid forge data-model from-intent and continues with degraded behaviour. See Capability Warnings for the full matrix and the exact banner shape.
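
A minimal sketch of the run-start check, assuming a catalog that maps model names to declared capability sets. The catalog entries below are placeholders, not forge's real catalog (see Capability Warnings for that). capability_warning and should_print_warnings are hypothetical helpers. The quiet check follows the FLUID_QUIET / FLUID_NONINTERACTIVE row of the environment-variable table. Telemetry recording is omitted.

```python
from typing import Mapping, Optional, Set

# Placeholder catalog: model -> declared capabilities (illustrative values only).
CATALOG: Mapping[str, Set[str]] = {
    "claude-sonnet-4-6": {"tool_use", "json_schema"},
    "gpt-3.5-turbo": {"json_mode"},
}

def capability_warning(model: str, needed: Set[str]) -> Optional[str]:
    """Return a warning, or None if the model declares everything the run needs."""
    declared = CATALOG.get(model)
    if declared is None:
        return f"{model} is not in the capability catalog; continuing with degraded behaviour."
    missing = needed - declared
    if missing:
        return f"{model} lacks {', '.join(sorted(missing))}; continuing with degraded behaviour."
    return None

def should_print_warnings(env: Mapping[str, str]) -> bool:
    """Warnings are silenced (but would still be recorded) when quiet mode is set."""
    return not (env.get("FLUID_QUIET") == "1" or env.get("FLUID_NONINTERACTIVE") == "1")
```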

Operator-facing errors

When a provider call fails, the CLI raises a typed exception that distinguishes rate limits, context overflows, authentication failures, transient server errors, and schema-validation failures. This lets retries honor Retry-After and lets the agent loop route corrective feedback to the LLM. See Typed Errors for the full reference.
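
The retry side can be sketched as follows. The sketch assumes that only rate limits and transient server errors are retryable, and that a rate-limit error carries the server's Retry-After value in seconds. The class names, backoff constants, and call_with_retries are hypothetical; see Typed Errors for the real exception types.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class ProviderError(Exception):
    """Base class for typed provider failures (hypothetical hierarchy)."""
    retryable = False

class RateLimitError(ProviderError):
    retryable = True

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds parsed from the Retry-After header

class TransientServerError(ProviderError):
    retryable = True

class AuthError(ProviderError):
    """Bad or missing credentials; retrying cannot help."""

class ContextOverflowError(ProviderError):
    """Prompt exceeded the context window; not retried unchanged."""

class SchemaValidationError(ProviderError):
    """Output failed validation; surfaced so the agent loop can send corrective feedback."""

def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ProviderError as exc:
            if not exc.retryable or attempt == max_attempts:
                raise  # non-retryable types propagate to the caller / agent loop
            delay = getattr(exc, "retry_after", None)
            if delay is None:
                delay = base_delay * 2 ** (attempt - 1)  # exponential backoff fallback
            sleep(delay)
    raise RuntimeError("retry loop exited without a result")
```

Injecting sleep keeps the policy testable without real waiting. A server-supplied Retry-After always wins over the local backoff schedule.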

For complete command journeys, see AI Forge And Data-Model Journeys.