Fluid Forge

Cost Tracking

Every fluid forge data-model invocation prints a per-run cost summary. CLI-only — no UI, no dashboard, just a one-block panel in the terminal:

Cost summary
─────────────────────────────────────────────────────────────────
  anthropic / claude-sonnet-4-6     12,453 in   3,827 out  $0.0247
  anthropic / claude-haiku-4-5         876 in     412 out  $0.0006
─────────────────────────────────────────────────────────────────
  total                            13,329 in   4,239 out  $0.0253

This page documents the price table, the per-org override path, the missing-usage warning footer, and the variant-lint surfacing — all V2 polish items shipped with V1.5.

Embedded price table

Prices live in fluid_build/copilot/cost.py::MODEL_PRICES_USD — USD per 1M tokens, (input_price, output_price) tuples. Source: each provider's public pricing page. Snapshot date is in the module docstring.

The table is a static Python dict, not a catalog pulled at runtime. A table that lags behind new model releases fails loud-but-safe: a model it does not list surfaces with $? instead of a misleading $0.00.

from typing import Dict, Tuple

# USD per 1M tokens: model name -> (input_price, output_price).
MODEL_PRICES_USD: Dict[str, Tuple[float, float]] = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-opus-4-7": (15.00, 75.00),
    "gpt-4.1": (2.50, 10.00),
    "gpt-4.1-mini": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.075, 0.30),
    # Ollama is local: provider-name match wins, returns $0 for any model.
    "*ollama*": (0.0, 0.0),
}

When forge-cli sees a model not in the table:

Cost summary
─────────────────────────────────────────────────────────────────
  openai / future-gpt-9000   1,000 in   500 out   $?
─────────────────────────────────────────────────────────────────
  total                      1,000 in   500 out   $?

  Note: no price table entry for 'future-gpt-9000'.
  Update fluid_build/copilot/cost.py:MODEL_PRICES_USD.

The total is $? whenever any row is unknown, which guards against partial sums that look authoritative.
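The lookup-and-sum behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in cost.py: `row_cost`, `run_total`, and `fmt` are hypothetical helper names, and the table is trimmed to one entry copied from the excerpt above. An unknown model yields None (rendered as $?), and a single None row makes the whole total None.

```python
from typing import Dict, List, Optional, Tuple

# Trimmed copy of the embedded table: USD per 1M tokens, (input, output).
PRICES: Dict[str, Tuple[float, float]] = {"gpt-4.1-mini": (0.15, 0.60)}


def row_cost(model: str, tokens_in: int, tokens_out: int) -> Optional[float]:
    """Return the USD cost of one row, or None when the model has no price."""
    price = PRICES.get(model)
    if price is None:
        return None
    return (tokens_in * price[0] + tokens_out * price[1]) / 1_000_000


def run_total(costs: List[Optional[float]]) -> Optional[float]:
    """Sum row costs; a single unknown row makes the total unknown."""
    if any(c is None for c in costs):
        return None
    return sum(costs)


def fmt(cost: Optional[float]) -> str:
    """Render a cost the way the summary panel does (hypothetical format)."""
    return "$?" if cost is None else f"${cost:.4f}"


known = row_cost("gpt-4.1-mini", 12_453, 3_827)
unknown = row_cost("future-gpt-9000", 1_000, 500)
print(fmt(known))                        # $0.0042
print(fmt(run_total([known, unknown])))  # $?
```

Returning None rather than 0.0 for an unknown model is what keeps the "unknown" state from being silently absorbed into a sum.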

Per-org price override

Enterprise customers negotiate rates that don't match the embedded list price. The override file at ~/.fluid/prices.json patches in your negotiated rates without forking forge-cli:

{
  "schema_version": 1,
  "prices": {
    "claude-sonnet-4-6": [2.40, 12.00],
    "gpt-4.1": [2.00, 8.00]
  }
}

The flat layout {"model": [in, out]} is also accepted — operators scribbling overrides don't have to look up the wrapped schema.

Path resolution order

  1. $FLUID_PRICES_JSON — explicit override (used by tests).
  2. $FLUID_HOME/prices.json if $FLUID_HOME is set.
  3. ~/.fluid/prices.json (default).
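The resolution order above could be implemented along these lines. This is a sketch, not the actual forge-cli code: `resolve_prices_path` is a hypothetical name, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_prices_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the override file location using the documented precedence."""
    env = os.environ if env is None else env
    explicit = env.get("FLUID_PRICES_JSON")
    if explicit:  # 1. explicit override
        return Path(explicit)
    fluid_home = env.get("FLUID_HOME")
    if fluid_home:  # 2. $FLUID_HOME/prices.json
        return Path(fluid_home) / "prices.json"
    return Path.home() / ".fluid" / "prices.json"  # 3. default


both = resolve_prices_path(
    {"FLUID_PRICES_JSON": "/tmp/p.json", "FLUID_HOME": "/opt/fluid"}
)
home_only = resolve_prices_path({"FLUID_HOME": "/opt/fluid"})
default = resolve_prices_path({})
print(both, home_only, default, sep="\n")
```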

Failure modes (always silent fallback)

  • Override file missing → embedded table wins. No warning.
  • Override JSON malformed → embedded table wins. Logged at DEBUG.
  • Negative price in override → that entry skipped (rest applied).
  • Wrong-shape entry (e.g. [0.10] instead of [0.10, 0.40]) → that entry skipped, rest applied.

The override file is operator-edited, so syntax errors are real possibilities. We never let a malformed override break a forge run.
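A loader with these fallback rules might look like the sketch below. The function names, the logger name, and the exact validation checks are assumptions made for illustration; only the behaviours (missing file, malformed JSON, negative price, wrong-shape entry, flat or wrapped layout) come from the list above.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Dict, Tuple

log = logging.getLogger("prices_override_sketch")
Prices = Dict[str, Tuple[float, float]]


def _valid_pair(pair: object) -> bool:
    """True for a two-element list of non-negative numbers."""
    return (
        isinstance(pair, (list, tuple))
        and len(pair) == 2
        and all(
            isinstance(p, (int, float)) and not isinstance(p, bool) and p >= 0
            for p in pair
        )
    )


def apply_overrides(embedded: Prices, path: Path) -> Prices:
    """Return the embedded prices patched with the valid entries from path."""
    merged = dict(embedded)
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return merged  # missing file: embedded table wins, no warning
    except (OSError, ValueError) as exc:
        log.debug("ignoring price override %s: %s", path, exc)
        return merged  # unreadable or malformed: embedded table wins
    if not isinstance(raw, dict):
        return merged
    # Accept both the wrapped schema and the flat {"model": [in, out]} layout.
    entries = raw["prices"] if isinstance(raw.get("prices"), dict) else raw
    for model, pair in entries.items():
        if _valid_pair(pair):  # bad entries are skipped, the rest applied
            merged[model] = (float(pair[0]), float(pair[1]))
    return merged


embedded = {"claude-sonnet-4-6": (3.00, 15.00), "gpt-4.1": (2.50, 10.00)}
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "prices.json"
    good.write_text(json.dumps({
        "schema_version": 1,
        "prices": {"claude-sonnet-4-6": [2.40, 12.00], "gpt-4.1": [-1, 8.00]},
    }))
    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json")
    patched = apply_overrides(embedded, good)
    fallback = apply_overrides(embedded, broken)
    absent = apply_overrides(embedded, Path(tmp) / "missing.json")

print(patched)
```

In this run the negative gpt-4.1 entry is dropped while the claude-sonnet-4-6 override still applies, and both the malformed and the missing file leave the embedded table unchanged.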

Missing-usage warning footer

Some providers ship empty usage blocks under load (or on streaming-cancellation paths, or on certain Azure deployments). Without a counter, the user would see a misleading "$0.0042" total with no hint that the figure is under-reported.

The V1.5 + V2 polish wires in a missing-usage counter that adds a note under the summary:

Cost summary
─────────────────────────────────────────────────────────────────
  openai / gpt-4.1-mini     12,453 in   3,827 out  $0.0042
─────────────────────────────────────────────────────────────────
  total                     12,453 in   3,827 out  $0.0042

  Note: 2 calls had no usage data; cost may be under-reported.

The counter increments on two paths:

  1. extract_usage exception — provider's usage extractor blew up. The call is recorded as missing without per-row token data.
  2. 0/0 token counts on a non-Ollama provider — the LLM responded but the provider ate the usage block.

Ollama is special-cased: its (0, 0) baseline is legitimate (local compute, no token counts) so 0/0 calls there don't flag.
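The two increment paths and the Ollama exemption can be sketched as below. `MissingUsageCounter` and its methods are hypothetical names, the exact provider-name comparison is a guess, and the singular "1 call" wording is an assumption (the source only shows the plural form).

```python
from typing import Callable, Dict


class MissingUsageCounter:
    """Counts calls whose token usage could not be trusted (sketch only)."""

    def __init__(self) -> None:
        self.missing = 0

    def observe(
        self, provider: str, extract: Callable[[], Dict[str, int]]
    ) -> Dict[str, int]:
        """Run a usage extractor and flag the call if usage is missing."""
        try:
            usage = extract()
        except Exception:
            # Path 1: the extractor itself failed, so no token data exists.
            self.missing += 1
            return {"input_tokens": 0, "output_tokens": 0}
        tokens_in = int(usage.get("input_tokens", 0) or 0)
        tokens_out = int(usage.get("output_tokens", 0) or 0)
        # Path 2: 0/0 counts, except on Ollama where 0/0 is legitimate.
        if tokens_in == 0 and tokens_out == 0 and provider.lower() != "ollama":
            self.missing += 1
        return {"input_tokens": tokens_in, "output_tokens": tokens_out}

    def footer(self) -> str:
        """Return the footer note, or an empty string when nothing is missing."""
        if self.missing == 0:
            return ""
        noun = "call" if self.missing == 1 else "calls"
        return f"Note: {self.missing} {noun} had no usage data; cost may be under-reported."


def failing_extractor() -> Dict[str, int]:
    raise KeyError("usage")


counter = MissingUsageCounter()
counter.observe("openai", lambda: {"input_tokens": 120, "output_tokens": 40})
counter.observe("openai", lambda: {})                # empty usage block
counter.observe("openai", failing_extractor)         # extractor raised
counter.observe("ollama", lambda: {"input_tokens": 0, "output_tokens": 0})
print(counter.footer())
```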

Streaming runs now report accurate usage

Before this fix, every SSE-streamed call landed on path 2 above because the iterator discarded the terminal usage event, so the missing-usage footer appeared on every run for any user with FLUID_LLM_STREAMING=1.

The provider classes now extract token usage from the SSE wire on all four supported providers: OpenAI's terminal usage chunk, Anthropic's message_start + message_delta accumulation, Gemini's usageMetadata, and Ollama's OpenAI-compatible final chunk (Ollama 0.3.x+). Each provider stashes the result in a thread-local that BaseStageAgent._call_once reads after the streaming context exits, so cost summaries on streamed runs now match the blocking-path numbers.
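As an illustration of the Anthropic case, the sketch below accumulates usage from already-parsed stream events and hands it over through a thread-local. It assumes the event shapes of Anthropic's streaming format as generally documented (input tokens on message_start, a cumulative output_tokens count on message_delta). The function names and the thread-local layout are hypothetical, not the fluid_build implementation.

```python
import threading
from typing import Dict, Iterable, Optional

_last_usage = threading.local()  # one slot per thread (assumed layout)


def accumulate_anthropic_usage(events: Iterable[dict]) -> Dict[str, int]:
    """Fold message_start and message_delta usage into one usage dict."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            start = event.get("message", {}).get("usage", {})
            usage["input_tokens"] = int(start.get("input_tokens", 0) or 0)
            usage["output_tokens"] = int(start.get("output_tokens", 0) or 0)
        elif kind == "message_delta":
            delta = event.get("usage", {})
            if "output_tokens" in delta:
                # Treated as cumulative, so it replaces the earlier value.
                usage["output_tokens"] = int(delta["output_tokens"] or 0)
    return usage


def stash_usage(usage: Dict[str, int]) -> None:
    """Called by the streaming side once the stream is exhausted."""
    _last_usage.value = dict(usage)


def pop_usage() -> Optional[Dict[str, int]]:
    """Called by the consumer after the streaming context exits."""
    value = getattr(_last_usage, "value", None)
    _last_usage.value = None
    return value


events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"text": "hi"}},
    {"type": "message_delta", "usage": {"output_tokens": 15}},
    {"type": "message_stop"},
]
usage = accumulate_anthropic_usage(events)
stash_usage(usage)
first = pop_usage()
second = pop_usage()
print(first, second)
```

Clearing the slot on read means a later call on the same thread that produces no usage cannot inherit stale numbers from the previous one.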

The counter resets per run. fluid forge data-model calls reset_run_tracker() at start so the summary reflects only the current invocation.

Variant-lint warning footer

When the dimensional variant validator runs (a lint pass per Kimball flavor), its warnings flow into the validation report. The V1.5 + V2 polish also surfaces them in the cost summary footer, so operators piping stdout to a log see the lint warning count next to the cost:

Cost summary
─────────────────────────────────────────────────────────────────
  anthropic / claude-sonnet-4-6   12,453 in   3,827 out  $0.0247
─────────────────────────────────────────────────────────────────
  total                           12,453 in   3,827 out  $0.0247

  Note: 2 variant-lint warnings on variant='snowflake'.
  See validation report for details.

The footer:

  • Shows one line per variant with non-zero warnings (sorted alphabetically).
  • Pluralises correctly ("1 warning" vs "2 warnings").
  • Replaces (not accumulates) on repair-loop reruns — the count reflects the FINAL pass, not all retries summed.
  • Is silent when every variant lint passes — no false alarms.
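The four footer rules can be sketched as below. The helper names are hypothetical, 'star' is used only as an illustrative second variant, and printing the "See validation report" line once after all variant lines is an assumption, since the source shows only the single-variant case.

```python
from typing import Dict, List


def record_lint_pass(counts: Dict[str, int], variant: str, warnings: int) -> None:
    """Store the latest pass for a variant, replacing any earlier count."""
    counts[variant] = warnings


def variant_lint_footer(counts: Dict[str, int]) -> List[str]:
    """Build footer lines: one per variant with warnings, sorted by name."""
    lines = [
        f"Note: {n} variant-lint warning{'' if n == 1 else 's'} "
        f"on variant='{variant}'."
        for variant, n in sorted(counts.items())
        if n > 0
    ]
    if lines:
        lines.append("See validation report for details.")
    return lines  # empty list means the footer stays silent


counts: Dict[str, int] = {}
record_lint_pass(counts, "snowflake", 5)  # first pass
record_lint_pass(counts, "snowflake", 2)  # repair-loop rerun replaces it
record_lint_pass(counts, "star", 1)
footer = variant_lint_footer(counts)
print("\n".join(footer))
```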

What gets tracked

Every staged LLM call goes through BaseStageAgent._call_once, which after parsing the response calls get_run_tracker().record_call():

get_run_tracker().record_call(
    provider=provider.name,
    model=config.model,
    input_tokens=int(usage.get("input_tokens", 0) or 0),
    output_tokens=int(usage.get("output_tokens", 0) or 0),
)

Anthropic prompt-cache tokens are visible in the breakdown

On Anthropic, the usage block also carries cache_read_input_tokens and cache_creation_input_tokens (and Gemini emits cachedContentTokenCount for its context-cache feature). When you run a multi-stage pipeline (fluid forge data-model from-intent) the system prompt is identical across stages, so the cache hit rate tends to be 80–90% on calls 2..N. Concretely: a 9-stage Anthropic run that would have charged for 36K input tokens at full rate often comes in around 7K equivalent input-token cost — the discount shows up in the per-call cost figures because the price table maps cache-read tokens to the discounted rate.

The tracker is a module-level singleton because it has to be written from threads (parallel-physical fan-out runs three agents concurrently) without threading a context object through the entire pipeline. The lock is per-instance.
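A module-level singleton with a per-instance lock might be shaped like this. Only the names get_run_tracker, reset_run_tracker, and record_call come from this page; the class name, the row storage, and the `totals` helper are assumptions made for the sketch.

```python
import threading
from typing import Dict, List, Tuple


class RunTracker:
    """Accumulates token counts per (provider, model) row (sketch only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # per-instance lock
        self._rows: Dict[Tuple[str, str], List[int]] = {}

    def record_call(
        self, provider: str, model: str, input_tokens: int, output_tokens: int
    ) -> None:
        with self._lock:
            row = self._rows.setdefault((provider, model), [0, 0])
            row[0] += input_tokens
            row[1] += output_tokens

    def totals(self) -> Tuple[int, int]:
        with self._lock:
            return (
                sum(r[0] for r in self._rows.values()),
                sum(r[1] for r in self._rows.values()),
            )


_tracker = RunTracker()  # module-level singleton


def get_run_tracker() -> RunTracker:
    return _tracker


def reset_run_tracker() -> None:
    """Swap in a fresh tracker; intended to run before any agent starts."""
    global _tracker
    _tracker = RunTracker()


def agent() -> None:
    for _ in range(100):
        get_run_tracker().record_call("anthropic", "claude-sonnet-4-6", 10, 2)


reset_run_tracker()
threads = [threading.Thread(target=agent) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
run_totals = get_run_tracker().totals()
print(run_totals)  # (3000, 600)
```

Because each agent fetches the tracker through the module-level accessor, no context object has to be passed down the pipeline, and the lock keeps the read-modify-write on each row atomic.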

Hermetic tests

The price table itself is regression-pinned:

from fluid_build.copilot.cost import MODEL_PRICES_USD


def test_price_table_entries_well_formed():
    """Every entry is an (in, out) tuple with non-negative numeric prices."""
    for model, prices in MODEL_PRICES_USD.items():
        assert isinstance(prices, tuple)
        assert len(prices) == 2
        for p in prices:
            assert isinstance(p, (int, float))
            assert p >= 0

Override semantics, missing-usage flags, and variant-lint surfacing are all covered by tests/copilot/test_cost_tracking.py (39 tests).

See also

  • Cost summary in fluid forge data-model
  • V1.5 architecture
Last Updated: 4/30/26, 5:21 PM
Contributors: fas89, Claude Opus 4.7