Fluid Forge

V1.5 + V2 Hardening — Release Notes

This page is the changelog for the V1.5 catalog integration and V2 polish releases of forge-cli.

Audience: anyone planning to upgrade, anyone auditing what's new.

Top-line summary

| Theme | Status | Test count |
| --- | --- | --- |
| V1.5: Catalog integration (7 catalogs) | shipped | 116 catalog tests |
| V1.5: Three-stage catalog metadata flow | shipped | covered by Logical / Builder / Validator suites |
| V1.5: Industry auto-detection from catalog tags | shipped | 39 tests |
| V1.5: fluid ai setup --source wizard | shipped | covered by interview e2e tests |
| V1.5: MCP forge_from_source + 5 read-only catalog tools | shipped | 23 catalog-MCP tests |
| V1.5: MCP tool inputSchema for typed autocomplete | shipped | 23 catalog-MCP tests (inc. 3 new pins) |
| V1.5: Per-adapter unit tests (5 SDKs stubbed) | shipped | 61 across BigQuery / Dataplex / Glue / DataHub / DMM |
| V1.5: CatalogAdapter ABC pinned in public API | shipped | 119 public API tests (33 new) |
| V1.5: Catalog docs (forge_docs) | shipped | 7 catalog pages + index + walkthrough |
| V1.5: CONTRIBUTING walkthrough | shipped | n/a (docs) |
| V1.5: Issue template catalog context | shipped | n/a (template) |
| V2: Cost-tracking missing-usage surfacing | shipped | 8 tests |
| V2: Per-org price override (~/.fluid/prices.json) | shipped | 7 tests |
| V2: capability_matrix in cache key | shipped | 5 tests |
| V2: Anthropic cache_control regression pin | shipped | 5 tests |
| V2: Variant-lint warning count in cost footer | shipped | 9 tests |

Full sweep as of this release: 9104 passed, 194 skipped, 0 failed in ~105s.

V1.5 — Catalog integration

Seven adapters

Source-side catalog adapters read metadata from existing catalogs (the complement to publish-target providers):

  • Snowflake Horizon
  • Databricks Unity
  • BigQuery
  • Dataplex
  • AWS Glue
  • DataHub
  • Data Mesh Manager

Each adapter follows the nine reusable patterns documented in _patterns.py. Adding a new adapter is a weekend project — see the contributor walkthrough.

Two surfaces, one pipeline

The CLI and the MCP server both dispatch to the same staged Logical pipeline:

# CLI
fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB --schema SEEDED \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml

// MCP (Claude Code, Cursor, any MCP client)
{
  "tool": "forge_from_source",
  "arguments": {
    "source": "snowflake",
    "credentials": { "credential_id": "snowflake-prod" },
    "scope": { "database": "BIZ_LAB", "schema": "SEEDED" },
    "technique": "data_vault_2",
    "output_path": "biz_lab.fluid.yaml"
  }
}

Three-stage catalog metadata flow

Catalog signal shapes every stage of the pipeline, not just the Logical input. See the full mapping table in V1.5 architecture.

Highlights:

  • Logical: catalog descriptions / FK constraints / glossary terms feed OSIDataset / OSIRelationship / OSI.ai_context.
  • Builder: catalog owner / sensitivity / lineage land in Fluid contract metadata verbatim — modeler does not re-invent.
  • Transformation: partition keys / clustering keys / quality rules / freshness SLAs become dbt configs.

System roles never become owners

Snowflake ACCOUNTADMIN / SYSADMIN / SECURITYADMIN / USERADMIN / ORGADMIN / PUBLIC are NOT promoted to metadata.owner.team. They land in labels.catalogCreatingRoles (audit only) so the contract reflects the business team, not the DDL-running role.
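
The routing rule above can be sketched roughly as follows. This is an illustrative sketch, not the forge-cli implementation: the function name `route_owner_role`, the `SYSTEM_ROLES` constant, and the choice to store the audit label as a list are all assumptions; only the role names and the `owner.team` / `labels.catalogCreatingRoles` targets come from the text.

```python
from __future__ import annotations

# Role names taken from the paragraph above; the constant name is hypothetical.
SYSTEM_ROLES = frozenset(
    {"ACCOUNTADMIN", "SYSADMIN", "SECURITYADMIN", "USERADMIN", "ORGADMIN", "PUBLIC"}
)


def route_owner_role(role: str | None) -> dict:
    """Route a catalog role to owner.team or to an audit-only label."""
    metadata: dict = {"owner": {}, "labels": {}}
    if not role:
        return metadata
    if role.upper() in SYSTEM_ROLES:
        # System roles are recorded for audit and never promoted to owner.
        # Storing them as a list is an assumption of this sketch.
        metadata["labels"]["catalogCreatingRoles"] = [role.upper()]
    else:
        metadata["owner"]["team"] = role
    return metadata
```

With this sketch, a table created by SYSADMIN would leave `owner` empty, while a business role such as a hypothetical `FINANCE_ANALYTICS` would become the owning team.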

Industry auto-detection

Catalog domain tags are matched against INDUSTRY_DOMAIN_HINTS:

| Tag | Industry pack |
| --- | --- |
| telco, cdr, network | telecommunications |
| healthcare, phi, clinical | healthcare |
| finance, pci, transaction | finance |
| retail, commerce, pos | retail |

Within a scope, the industry with the most tag hits wins. Operators can override the result with --industry.
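
A minimal sketch of the most-common-hit rule, assuming the hints are a flat tag-to-industry mapping. The real shape of INDUSTRY_DOMAIN_HINTS is not shown on this page, so the `HINTS` dict, the function name `detect_industry`, and the tie-breaking behaviour (first industry seen wins) are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter

# Assumed flat mapping mirroring the table above (hypothetical shape).
HINTS = {
    "telco": "telecommunications", "cdr": "telecommunications",
    "network": "telecommunications",
    "healthcare": "healthcare", "phi": "healthcare", "clinical": "healthcare",
    "finance": "finance", "pci": "finance", "transaction": "finance",
    "retail": "retail", "commerce": "retail", "pos": "retail",
}


def detect_industry(tags: list[str], override: str | None = None) -> str | None:
    """Return the industry with the most tag hits, or the explicit override."""
    if override:
        # Mirrors the --industry flag: an explicit choice always wins.
        return override
    hits = Counter(HINTS[t.lower()] for t in tags if t.lower() in HINTS)
    if not hits:
        return None
    # most_common keeps first-seen order among ties (assumed tie-break).
    return hits.most_common(1)[0][0]
```

For example, tags `["PHI", "clinical", "pos"]` produce two healthcare hits against one retail hit, so the sketch returns `"healthcare"`.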

fluid ai setup --source wizard

Per-source interactive setup with auth-method recommendations (★ marks the recommended path) and field-validation. Saves non-sensitive fields to ~/.fluid/sources.yaml; secrets go to the OS keyring.

fluid ai setup --source snowflake --name snowflake-prod
fluid ai status                              # lists configured sources

fluid ai status extended

The existing fluid ai status command now lists configured catalogs alongside the LLM provider config, giving a one-stop summary of "what's wired up on this machine."

MCP inputSchema populated

Every MCP tool now advertises a JSON Schema for its arguments at tools/list. Claude Code / Cursor / VS Code MCP clients can drive typed autocomplete on:

  • source enum (snowflake | unity | bigquery | dataplex | glue | datahub | datamesh_manager)
  • technique enum (data_vault_2 | dimensional)
  • credentials.credential_id (required)
  • scope.database, scope.schema, scope.tables, scope.catalog
  • output_path, logical_path

Closed enums plus additionalProperties: false defend against hallucinated free-form values.
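
To illustrate why closed enums and additionalProperties: false help, here is a sketch of what such a schema might look like, with a deliberately tiny checker. The schema below is reconstructed from the bullet list above and is an assumption, not the schema forge-cli actually emits (for example, the type of scope.tables and the set of top-level required fields are guesses). The `violations` helper is hypothetical and covers only enums, required keys, and unknown properties; it is not a full JSON Schema validator.

```python
FORGE_FROM_SOURCE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "source": {"type": "string", "enum": [
            "snowflake", "unity", "bigquery", "dataplex",
            "glue", "datahub", "datamesh_manager"]},
        "technique": {"type": "string", "enum": ["data_vault_2", "dimensional"]},
        "credentials": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"credential_id": {"type": "string"}},
            "required": ["credential_id"],
        },
        "scope": {
            "type": "object",
            "additionalProperties": False,
            "properties": {
                "database": {"type": "string"},
                "schema": {"type": "string"},
                "catalog": {"type": "string"},
                # Assumed to be a list of table names.
                "tables": {"type": "array", "items": {"type": "string"}},
            },
        },
        "output_path": {"type": "string"},
        "logical_path": {"type": "string"},
    },
}


def violations(value, schema, path="arguments"):
    """Collect enum, required-key, and unknown-property problems."""
    problems = []
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{path}: {value!r} not in enum")
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: required")
        for key, item in value.items():
            if key in props:
                problems.extend(violations(item, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                problems.append(f"{path}.{key}: unknown property")
    return problems
```

A call with `"source": "oracle"` or an invented top-level key is rejected by the closed schema instead of being passed through to the pipeline.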

Public API stability test

tests/test_public_api_stability.py adds 33 new entries pinning the V1.5 catalog surface:

  • CatalogAdapter, CatalogTable, CatalogColumn, CatalogForeignKey, CatalogLineage, CatalogScope, GlossaryTerm, LineageRef, SensitivityTag
  • CatalogConfigError, CatalogConnectionError, CatalogPermissionError, CredentialNotFoundError, CredentialResolver
  • Seven typed *Credentials classes
  • Seven concrete *CatalogAdapter classes
  • INDUSTRY_DOMAIN_HINTS, match_industry_from_domain, match_industry_from_catalog_tags, detect_industry_from_catalog_tables
  • run_from_catalog, CatalogPipelineResult, run_from_source_command

A future refactor that drops or renames any of these fails the test loudly. Removal requires a deprecation cycle or a v2 bump.
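
The pinning idea can be sketched as a simple presence check. The contents of tests/test_public_api_stability.py are not shown here, so this is only an assumed shape: `PINNED` lists three of the names from the bullets above, `missing_names` is a hypothetical helper, and a stand-in namespace replaces the real package so the sketch runs anywhere.

```python
import types

# A small subset of the pinned names listed above.
PINNED = ("CatalogAdapter", "CatalogTable", "run_from_catalog")


def missing_names(module, pinned=PINNED):
    """Return pinned names that the module no longer exposes."""
    return [name for name in pinned if not hasattr(module, name)]


# Stand-in for the public API module; run_from_catalog was "dropped".
fake_api = types.SimpleNamespace(CatalogAdapter=object, CatalogTable=object)

dropped = missing_names(fake_api)
```

A stability test built this way would assert that the returned list is empty, so a rename or removal fails with the exact missing name.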

V2 polish items

Cost-tracking missing-usage surfacing

Some providers ship empty usage blocks under load. Without a counter, users would see misleading "$0.0042" totals with no hint that the figure is under-reported.

Cost summary
─────────────────────────────────────────────────────────────────
  ...
─────────────────────────────────────────────────────────────────
  total                  12,453 in   3,827 out   $0.0042

  Note: 2 calls had no usage data; cost may be under-reported.

The counter increments on extract_usage exceptions AND on 0/0 token responses from non-Ollama providers. Ollama's (0, 0) baseline is legitimate (local compute), so it's exempt.
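
The counting rule above can be sketched like this. The class name `UsageCounter`, the `record` signature, and the singular form of the footer are assumptions; only the increment conditions, the Ollama exemption, and the plural footer wording come from the text.

```python
class UsageCounter:
    """Counts calls whose token usage could not be trusted."""

    def __init__(self):
        self.missing = 0

    def record(self, provider, extract):
        """Run a usage extractor and note whether usage data was missing."""
        try:
            tokens_in, tokens_out = extract()
        except Exception:
            # Extraction failures count as missing usage.
            self.missing += 1
            return (0, 0)
        # 0/0 is legitimate for Ollama (local compute), suspicious elsewhere.
        if (tokens_in, tokens_out) == (0, 0) and provider != "ollama":
            self.missing += 1
        return (tokens_in, tokens_out)

    def footer(self):
        """Return the footer note, or an empty string on a clean run."""
        if not self.missing:
            return ""
        noun = "call" if self.missing == 1 else "calls"
        return f"Note: {self.missing} {noun} had no usage data; cost may be under-reported."
```

Keeping the exception path and the 0/0 path on the same counter means the footer reports one number regardless of how the usage went missing.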

Per-org price override

~/.fluid/prices.json lets enterprise customers patch in their negotiated rates without forking forge-cli:

{
  "schema_version": 1,
  "prices": {
    "claude-sonnet-4-6": [2.40, 12.00]
  }
}

Both wrapped ({"prices": {...}}) and flat ({...}) layouts are accepted. Negative / wrong-shape entries are silently rejected per-entry.

See cost tracking for full details.
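
As a rough sketch of the acceptance rules described above, the loader below takes the file contents as a string so it can run without touching ~/.fluid/prices.json. The function name `load_price_overrides` is hypothetical, and treating each entry as an (input, output) price pair is an assumption; the page only shows a two-number list.

```python
import json


def load_price_overrides(text):
    """Parse price overrides, accepting wrapped and flat layouts."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return {}
    # Wrapped layout keeps entries under "prices"; flat layout is the dict itself.
    raw = data["prices"] if isinstance(data.get("prices"), dict) else data
    prices = {}
    for model, pair in raw.items():
        # Wrong-shape entries (not a two-item list) are skipped silently.
        if not (isinstance(pair, list) and len(pair) == 2):
            continue
        numeric = all(
            isinstance(p, (int, float)) and not isinstance(p, bool) for p in pair
        )
        # Negative or non-numeric rates are skipped silently as well.
        if not numeric or any(p < 0 for p in pair):
            continue
        prices[model] = (float(pair[0]), float(pair[1]))
    return prices
```

Because rejection is per entry, one malformed line in the file does not discard the other overrides.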

capability_matrix in cache key

generate_cache_key adds a fourth hash segment for the capability matrix:

sha256(model || sha256(prompt) || sha256(canonical(params)) || sha256(canonical(capability_matrix)))

Flipping a capability flag (extended-thinking budget, prompt-cache mode, structured-output strictness) now invalidates the cache cleanly. Two runs with identical model/prompt/params but different capability matrices hash to distinct keys.
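
A small sketch of the four-segment key, under stated assumptions: canonicalisation is taken to be JSON with sorted keys, the `||` in the formula is rendered as a literal separator between hex digests, and the name `cache_key_sketch` is hypothetical. The actual generate_cache_key may differ on all three points.

```python
import hashlib
import json


def _sha(text):
    """Hex SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _canonical(obj):
    """Assumed canonical form: compact JSON with sorted keys."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def cache_key_sketch(model, prompt, params, capability_matrix):
    """Hash model, prompt, params, and capability matrix into one key."""
    segments = [
        model,
        _sha(prompt),
        _sha(_canonical(params)),
        _sha(_canonical(capability_matrix)),  # the new fourth segment
    ]
    return _sha("||".join(segments))


key_a = cache_key_sketch("m", "hello", {"temperature": 0}, {"prompt_cache": True})
key_b = cache_key_sketch("m", "hello", {"temperature": 0}, {"prompt_cache": False})
```

Here `key_a` and `key_b` differ only because of the capability matrix, which is the behaviour the fourth segment is meant to guarantee.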

Anthropic cache_control regression pin

The Anthropic provider's system field is an array of content blocks with cache_control: {type: "ephemeral"} — required for prompt caching to engage server-side. A new test class TestAnthropicCacheControl pins:

  • system is array, not string
  • Ephemeral marker present on system block
  • Long (≥1024-token) system prompts retain marker
  • Short prompts still advertise marker (server no-ops below threshold)
  • Tool-request path also honors cache_control

Without these pins, a future refactor could "simplify" the system field back to a plain string, the cache_control would silently disappear, and the warm-cache regression test would flap intermittently.
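
The payload shape being pinned can be sketched as follows. The helper names `build_system_blocks` and `has_ephemeral_system` are hypothetical and do not reflect how TestAnthropicCacheControl is written; the block structure itself (a list of text blocks carrying cache_control) follows the description above.

```python
def build_system_blocks(system_prompt):
    """Wrap a system prompt as content blocks with an ephemeral cache marker."""
    return [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]


def has_ephemeral_system(payload):
    """True when system is a block array carrying the ephemeral marker."""
    system = payload.get("system")
    if not isinstance(system, list):
        # A plain string here is exactly the regression the pins guard against.
        return False
    return any(
        isinstance(block, dict)
        and block.get("cache_control") == {"type": "ephemeral"}
        for block in system
    )


payload = {"system": build_system_blocks("You are a data modeler.")}
```

A check like `has_ephemeral_system` fails immediately if the field is collapsed back to a string, rather than surfacing later as a flaky warm-cache test.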

Variant-lint warning footer

When the dimensional variant validator runs (per-Kimball-flavor lint), warning counts surface in the cost summary footer:

  Note: 2 variant-lint warnings on variant='snowflake'.
  See validation report for details.

The footer:

  • Replaces (rather than accumulates) on repair-loop reruns.
  • Pluralises correctly.
  • Sorts alphabetically when multiple variants have findings.
  • Stays silent on a clean pass.
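
The footer rules above could be expressed along these lines. This is a sketch with an assumed input shape (a dict of variant name to warning count) and a hypothetical function name; the message wording follows the example shown earlier in this section, and the singular form is inferred from "pluralises correctly".

```python
from __future__ import annotations


def variant_lint_footer(findings: dict[str, int]) -> list[str]:
    """Render footer lines from the latest variant-lint findings."""
    lines = []
    # Sorted so output is stable when several variants have findings.
    for variant in sorted(findings):
        count = findings[variant]
        if count <= 0:
            continue
        noun = "warning" if count == 1 else "warnings"
        lines.append(f"Note: {count} variant-lint {noun} on variant='{variant}'.")
    if lines:
        lines.append("See validation report for details.")
    # An empty list means a clean pass: nothing is printed.
    return lines
```

Because the function renders from whatever findings it is handed, passing only the most recent rerun's findings gives the "replace, not accumulate" behaviour on repair loops.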

Test sweep

tests/copilot/catalog/                  116 tests
  test_catalog_adapter_base.py            20
  test_catalog_adapter_bigquery.py        13
  test_catalog_adapter_datahub.py         11
  test_catalog_adapter_dataplex.py        10
  test_catalog_adapter_dmm.py              7
  test_catalog_adapter_glue.py            13
  test_catalog_mcp_tools.py               23
  test_credentials_resolver.py            21
  test_industry_autodetect_*               (within wider industry suite)

tests/copilot/test_cost_tracking.py      39 tests
tests/copilot/test_store_keys.py          9 tests
tests/test_provider_determinism_payloads.py  27 tests
tests/test_public_api_stability.py      119 tests

Full sweep:                            9104 passed, 194 skipped, 0 failed

Files added in this release

New module tree:

fluid_build/copilot/catalog/
├── __init__.py
├── _patterns.py                    # 9 reusable patterns
├── base.py                         # CatalogAdapter ABC + typed errors
├── bigquery.py
├── credentials.py                  # CredentialResolver + 7 *Credentials
├── datahub.py
├── datamesh_manager.py
├── dataplex.py
├── glue.py
├── models.py                       # CatalogTable, CatalogColumn, ...
├── snowflake.py
└── unity.py

fluid_build/forge_datamodel/from_catalog/
├── __init__.py
└── pipeline.py                     # run_from_catalog + CatalogPipelineResult

fluid_build/cli/
└── ai_source_setup.py              # `fluid ai setup --source` wizard

fluid_build/copilot/industry/
└── compiler.py                     # INDUSTRY_DOMAIN_HINTS + matchers

Key edits:

fluid_build/cli/forge_data_model.py  # from-source subcommand
fluid_build/cli/forge_copilot_interview.py  # AI-mode catalog branch
fluid_build/cli/mcp.py               # 6 source-catalog tools + inputSchema
fluid_build/cli/ai_setup.py          # --source / --name args + ai status
fluid_build/copilot/agents/coordinator.py    # from_catalog method
fluid_build/copilot/agents/logical_agent.py  # from_catalog + summary
fluid_build/forge_datamodel/emit/fluid_contract.py  # promote catalog signal
fluid_build/forge_datamodel/emit/validator.py        # variant-lint hook
fluid_build/copilot/cost.py          # missing-usage + override + variant-lint
fluid_build/copilot/store/keys.py    # capability_matrix segment
fluid_build/copilot/agents/base.py   # split try/except for usage extraction

Tests:

tests/copilot/catalog/test_catalog_adapter_bigquery.py     # NEW (13)
tests/copilot/catalog/test_catalog_adapter_dataplex.py     # NEW (10)
tests/copilot/catalog/test_catalog_adapter_glue.py         # NEW (13)
tests/copilot/catalog/test_catalog_adapter_datahub.py      # NEW (11)
tests/copilot/catalog/test_catalog_adapter_dmm.py          # NEW (7)
tests/copilot/test_industry_autodetect_from_catalog.py     # NEW (39)
tests/copilot/test_cost_tracking.py                        # +21 tests
tests/copilot/test_store_keys.py                           # +5 tests
tests/test_provider_determinism_payloads.py                # +5 tests
tests/test_public_api_stability.py                         # +33 entries
tests/copilot/catalog/test_catalog_mcp_tools.py            # +3 tests

Migration notes

Nothing breaks. Every catalog adapter ships behind an optional extra. The default pip install data-product-forge is unchanged.

Existing fluid forge data-model from-intent / from-ddl flows continue to work exactly as before — V1.5 adds a catalog entry point, never replaces an existing one.

The cost summary format adds new footer lines (missing-usage, variant-lint) only when the relevant counter is non-zero. Clean runs see the same summary as before.

The cache-key shape gained a fourth segment for capability_matrix, which means all entries cached before this release are invalidated on first run. This is by design: the new keying is correct, the old keying could collide on capability mismatches.

See also

  • Catalogs index
  • V1.5 architecture deep-dive
  • Credential resolver
  • Cost tracking
  • End-to-end walkthrough
Last Updated: 5/17/26, 6:10 PM
Contributors: fas89, Claude Opus 4.7, Claude Opus 4.7 (1M context)