V1.5 + V2 Hardening — Release Notes

This page is the changelog for the V1.5 catalog integration and V2 polish releases of forge-cli.

Audience: anyone planning to upgrade, anyone auditing what's new.

Top-line summary

Theme	Status	Test count
V1.5 — Catalog integration (7 catalogs)	shipped	116 catalog tests
V1.5 — Three-stage catalog metadata flow	shipped	covered by Logical / Builder / Validator suites
V1.5 — Industry auto-detection from catalog tags	shipped	39 tests
V1.5 — `fluid ai setup --source` wizard	shipped	covered by interview e2e tests
V1.5 — MCP `forge_from_source` + 5 read-only catalog tools	shipped	23 catalog-MCP tests
V1.5 — MCP tool `inputSchema` for typed autocomplete	shipped	23 catalog-MCP tests (inc. 3 new pins)
V1.5 — Per-adapter unit tests (5 SDKs stubbed)	shipped	61 across BigQuery / Dataplex / Glue / DataHub / DMM
V1.5 — `CatalogAdapter` ABC pinned in public API	shipped	119 public API tests (33 new)
V1.5 — Catalog docs (forge_docs)	shipped	7 catalog pages + index + walkthrough
V1.5 — CONTRIBUTING walkthrough	shipped	n/a (docs)
V1.5 — Issue template catalog context	shipped	n/a (template)
V2 — Cost-tracking missing-usage surfacing	shipped	8 tests
V2 — Per-org price override (`~/.fluid/prices.json`)	shipped	7 tests
V2 — `capability_matrix` in cache key	shipped	5 tests
V2 — Anthropic `cache_control` regression pin	shipped	5 tests
V2 — Variant-lint warning count in cost footer	shipped	9 tests

Full sweep as of this release: 9104 passed, 194 skipped, 0 failed in ~105s.

V1.5 — Catalog integration

Seven adapters

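Source-side catalog adapters read metadata FROM existing catalogs (the complement to publish-target providers). There is one adapter per `source` value: snowflake, unity, bigquery, dataplex, glue, datahub, and datamesh_manager.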

Each adapter follows the nine reusable patterns documented in _patterns.py. Adding a new adapter is a weekend project — see the contributor walkthrough.
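The real `CatalogAdapter` ABC and the nine patterns live in `base.py` and `_patterns.py`, and these notes do not list their methods. As a rough illustration only, the sketch below assumes an adapter exposes a single hypothetical `list_tables(scope)` method and receives its SDK client by injection. Injection is one way the per-adapter tests could stub each vendor SDK. Every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ScopeSketch:
    # Hypothetical stand-in for CatalogScope: which database/schema to read.
    database: str
    schema: str


@dataclass
class TableSketch:
    # Hypothetical stand-in for CatalogTable: a table plus its catalog description.
    name: str
    description: str = ""
    columns: list[str] = field(default_factory=list)


class AdapterSketch(ABC):
    """Hypothetical adapter contract; not the real CatalogAdapter ABC."""

    def __init__(self, client):
        # The SDK client is injected, so unit tests can pass a stub
        # instead of importing the real vendor SDK.
        self._client = client

    @abstractmethod
    def list_tables(self, scope: ScopeSketch) -> list[TableSketch]:
        """Return normalized table metadata for one scope."""


class SketchSnowflakeAdapter(AdapterSketch):
    def list_tables(self, scope: ScopeSketch) -> list[TableSketch]:
        rows = self._client.fetch_tables(scope.database, scope.schema)
        # Normalize vendor-shaped rows into the shared model.
        return [TableSketch(r["name"], r.get("comment", ""), r["columns"]) for r in rows]


class StubClient:
    # Stands in for a vendor SDK in tests.
    def fetch_tables(self, database, schema):
        return [{"name": "CUSTOMER", "comment": "Customer master", "columns": ["ID", "NAME"]}]


tables = SketchSnowflakeAdapter(StubClient()).list_tables(ScopeSketch("BIZ_LAB", "SEEDED"))
print(tables[0].name, tables[0].description)
```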

Two surfaces, one pipeline

Both surfaces dispatch to the same staged Logical pipeline:

# CLI
fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB --schema SEEDED \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml

// MCP — Claude Code, Cursor, any MCP client
{
  "tool": "forge_from_source",
  "arguments": {
    "source": "snowflake",
    "credentials": { "credential_id": "snowflake-prod" },
    "scope": { "database": "BIZ_LAB", "schema": "SEEDED" },
    "technique": "data_vault_2",
    "output_path": "biz_lab.fluid.yaml"
  }
}
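The two invocations above carry the same information. The helper below is a hypothetical illustration, not part of forge-cli, of how the CLI flags line up with the `forge_from_source` arguments. It assumes the CLI's hyphenated technique names map to the MCP enum by replacing `-` with `_`. That matches the `data-vault-2` / `data_vault_2` pair shown but is not documented here.

```python
def cli_to_mcp_arguments(source, credential_id, database, schema, technique, output):
    """Hypothetical mapping from `from-source` CLI flags to MCP tool arguments."""
    return {
        "source": source,
        # Credentials are passed by reference to a configured credential ID.
        "credentials": {"credential_id": credential_id},
        "scope": {"database": database, "schema": schema},
        # Assumption: CLI technique names are the MCP enum values with '-' for '_'.
        "technique": technique.replace("-", "_"),
        "output_path": output,
    }


args = cli_to_mcp_arguments(
    "snowflake", "snowflake-prod", "BIZ_LAB", "SEEDED", "data-vault-2", "biz_lab.fluid.yaml"
)
print(args["technique"])  # data_vault_2
```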

Three-stage catalog metadata flow

Catalog signal shapes every stage of the pipeline, not just the Logical input. See the full mapping table in V1.5 architecture.

Highlights:

Logical: catalog descriptions / FK constraints / glossary terms feed OSIDataset / OSIRelationship / OSI.ai_context.
Builder: catalog owner / sensitivity / lineage land verbatim in the Fluid contract metadata, so the modeler does not re-invent them.
Transformation: partition keys / clustering keys / quality rules / freshness SLAs become dbt configs.
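For the Transformation stage, turning catalog physical-layout signals into a dbt model config might look roughly like the sketch below. It assumes a BigQuery-style target, where dbt accepts `partition_by` as a dict with `field` and `data_type` and accepts `cluster_by` as a list of columns. The function name, its inputs, and the `materialized` default are hypothetical. Quality rules and freshness SLAs are left out.

```python
def layout_to_dbt_config(partition_key=None, partition_type="date", clustering_keys=()):
    """Hypothetical: map catalog partition/clustering keys to a dbt-bigquery model config."""
    # Assumption: partitioned models are materialized as tables.
    config = {"materialized": "table"}
    if partition_key:
        config["partition_by"] = {"field": partition_key, "data_type": partition_type}
    if clustering_keys:
        config["cluster_by"] = list(clustering_keys)
    return config
```

For example, `layout_to_dbt_config("event_date", clustering_keys=("customer_id",))` yields a config with both `partition_by` and `cluster_by`. A table with no layout signal gets only the materialization.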

System roles never become owners

Snowflake ACCOUNTADMIN / SYSADMIN / SECURITYADMIN / USERADMIN / ORGADMIN / PUBLIC are NOT promoted to metadata.owner.team. They land in labels.catalogCreatingRoles (audit only) so the contract reflects the business team, not the DDL-running role.
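A minimal sketch of that rule follows, assuming the catalog reports owners as plain role or team names. The function is hypothetical. The nested dicts mirror the dotted paths above, and storing `catalogCreatingRoles` as a list is an assumption.

```python
SYSTEM_ROLES = frozenset(
    {"ACCOUNTADMIN", "SYSADMIN", "SECURITYADMIN", "USERADMIN", "ORGADMIN", "PUBLIC"}
)


def promote_owner(catalog_owners):
    """Hypothetical: split catalog owners into a business team and audit-only roles."""
    team = None
    creating_roles = []
    for owner in catalog_owners:
        if owner.upper() in SYSTEM_ROLES:
            # System roles are kept for audit but never become the owner.
            creating_roles.append(owner)
        elif team is None:
            # Assumption: the first non-system owner becomes the team.
            team = owner
    return {
        "metadata": {"owner": {"team": team}},
        "labels": {"catalogCreatingRoles": creating_roles},
    }
```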

Industry auto-detection

Catalog domain tags are matched against INDUSTRY_DOMAIN_HINTS:

Tag	Industry pack
`telco`, `cdr`, `network`	telecommunications
`healthcare`, `phi`, `clinical`	healthcare
`finance`, `pci`, `transaction`	finance
`retail`, `commerce`, `pos`	retail

Within a scope, the industry with the most tag hits wins. Operators can override the result with --industry.
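A sketch of the majority vote, using the tag table above as a local dict. The real `INDUSTRY_DOMAIN_HINTS` and matcher signatures are not shown in these notes. The exact, case-insensitive tag matching and the tie-breaking are assumptions.

```python
from collections import Counter

# Tag -> industry pack, copied from the table above.
DOMAIN_HINTS = {
    "telco": "telecommunications", "cdr": "telecommunications", "network": "telecommunications",
    "healthcare": "healthcare", "phi": "healthcare", "clinical": "healthcare",
    "finance": "finance", "pci": "finance", "transaction": "finance",
    "retail": "retail", "commerce": "retail", "pos": "retail",
}


def detect_industry(tags, override=None):
    """Hypothetical: the most common hit across a scope's tags wins; an override wins outright."""
    if override:
        return override
    hits = Counter(DOMAIN_HINTS[t.lower()] for t in tags if t.lower() in DOMAIN_HINTS)
    if not hits:
        return None
    # Counter.most_common orders equal counts by first occurrence, so ties go
    # to the first-seen industry (the real matcher may break ties differently).
    return hits.most_common(1)[0][0]
```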

`fluid ai setup --source` wizard

Per-source interactive setup with auth-method recommendations (★ marks the recommended path) and field validation. Non-sensitive fields are saved to ~/.fluid/sources.yaml; secrets go to the OS keyring.

fluid ai setup --source snowflake --name snowflake-prod
fluid ai status                              # lists configured sources
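The split between ~/.fluid/sources.yaml and the OS keyring amounts to partitioning each source's answers by sensitivity. Below is a minimal sketch, assuming a hypothetical fixed set of secret field names. The real wizard presumably decides sensitivity per auth method. The keyring write itself is omitted so the example stays stdlib-only; a keyring library such as the `keyring` package exposes `set_password` for that step.

```python
# Hypothetical field names that must never be written to sources.yaml.
SECRET_FIELDS = frozenset({"password", "private_key", "token", "client_secret"})


def split_source_config(fields):
    """Split wizard answers into (non-sensitive fields for YAML, secrets for the keyring)."""
    public = {k: v for k, v in fields.items() if k not in SECRET_FIELDS}
    secrets = {k: v for k, v in fields.items() if k in SECRET_FIELDS}
    return public, secrets
```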

`fluid ai status` extended

The existing command now lists configured catalogs alongside the LLM provider config, giving a one-stop summary of "what's wired up on this machine."

MCP `inputSchema` populated

Every MCP tool now advertises a JSON Schema for its arguments at tools/list. Claude Code / Cursor / VS Code MCP clients can drive typed autocomplete on:

source enum (snowflake | unity | bigquery | dataplex | glue | datahub | datamesh_manager)
technique enum (data_vault_2 | dimensional)
credentials.credential_id (required)
scope.database, scope.schema, scope.tables, scope.catalog
output_path, logical_path

Closed enums plus additionalProperties: false defend against hallucinated free-form values.
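Below is a sketch of what the `forge_from_source` inputSchema could look like, reconstructed from the bullet list above. It includes a tiny checker to show how closed enums and `additionalProperties: false` reject hallucinated values. The array type for `scope.tables`, the top-level `required` list, and the nested `additionalProperties` are assumptions. The checker handles only the keywords used here and is not a full JSON Schema validator.

```python
SOURCES = ["snowflake", "unity", "bigquery", "dataplex", "glue", "datahub", "datamesh_manager"]

FORGE_FROM_SOURCE_SCHEMA = {
    "type": "object",
    "properties": {
        "source": {"type": "string", "enum": SOURCES},
        "technique": {"type": "string", "enum": ["data_vault_2", "dimensional"]},
        "credentials": {
            "type": "object",
            "properties": {"credential_id": {"type": "string"}},
            "required": ["credential_id"],
            "additionalProperties": False,
        },
        "scope": {
            "type": "object",
            "properties": {
                "database": {"type": "string"},
                "schema": {"type": "string"},
                "tables": {"type": "array", "items": {"type": "string"}},
                "catalog": {"type": "string"},
            },
            "additionalProperties": False,
        },
        "output_path": {"type": "string"},
        "logical_path": {"type": "string"},
    },
    "required": ["source", "credentials"],  # assumption
    "additionalProperties": False,
}


def check(schema, value, path="$"):
    """Check only enum / required / additionalProperties, recursing into objects."""
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing {key!r}")
        for key, sub in value.items():
            if key in props:
                errors.extend(check(props[key], sub, f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected {key!r}")
    return errors
```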

Public API stability test

tests/test_public_api_stability.py adds 33 new entries pinning the V1.5 catalog surface:

CatalogAdapter, CatalogTable, CatalogColumn, CatalogForeignKey, CatalogLineage, CatalogScope, GlossaryTerm, LineageRef, SensitivityTag
CatalogConfigError, CatalogConnectionError, CatalogPermissionError, CredentialNotFoundError, CredentialResolver
Seven typed *Credentials classes
Seven concrete *CatalogAdapter classes
INDUSTRY_DOMAIN_HINTS, match_industry_from_domain, match_industry_from_catalog_tags, detect_industry_from_catalog_tables
run_from_catalog, CatalogPipelineResult, run_from_source_command

A future refactor that drops or renames any of these fails the test loudly. Removal requires a deprecation cycle or a v2 bump.
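The real test file's import paths are not shown here. The sketch below assumes a pin reduces to a `hasattr` check over a list of exported names. It builds a stand-in module with `types.ModuleType` so it runs without forge-cli installed. The helper name is hypothetical.

```python
import types

# Names the stability test pins (a small subset of the list above, for illustration).
PINNED = ["CatalogAdapter", "CatalogScope", "run_from_catalog", "INDUSTRY_DOMAIN_HINTS"]


def missing_public_names(module, pinned):
    """Return pinned names the module no longer exports."""
    return [name for name in pinned if not hasattr(module, name)]


# Stand-in module; the real test would import the fluid_build package instead.
fake = types.ModuleType("fake_public_api")
for name in PINNED:
    setattr(fake, name, object())

assert missing_public_names(fake, PINNED) == []
delattr(fake, "run_from_catalog")  # simulate an accidental rename
print(missing_public_names(fake, PINNED))  # ['run_from_catalog']
```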

V2 polish items

Cost-tracking missing-usage surfacing

Some providers ship empty usage blocks under load. Without a counter, users would see misleading "$0.0042" totals with no hint that the figure is under-reported.

Cost summary
─────────────────────────────────────────────────────────────────
  ...
─────────────────────────────────────────────────────────────────
  total                  12,453 in   3,827 out   $0.0042

  Note: 2 calls had no usage data; cost may be under-reported.

The counter increments on extract_usage exceptions AND on 0/0 token responses from non-Ollama providers. Ollama's (0, 0) baseline is legitimate (local compute), so it's exempt.
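The counting rule can be sketched as follows. `extract_usage` is passed in as a callable that returns an `(input, output)` token pair, and the provider is identified by a plain string such as `"ollama"`. Both are assumptions; the real code lives in `fluid_build/copilot/cost.py` and `agents/base.py` and is not reproduced here.

```python
class UsageTracker:
    """Hypothetical sketch of the missing-usage counter described above."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0
        self.missing_usage = 0

    def record(self, provider, extract_usage, response):
        try:
            tokens_in, tokens_out = extract_usage(response)
        except Exception:
            # Extraction failed: count it rather than silently recording zero.
            self.missing_usage += 1
            return
        if (tokens_in, tokens_out) == (0, 0) and provider != "ollama":
            # A 0/0 usage block from a hosted provider means usage went missing.
            self.missing_usage += 1
            return
        self.input_tokens += tokens_in
        self.output_tokens += tokens_out

    def footer(self):
        if not self.missing_usage:
            return ""  # clean runs keep the old summary
        noun = "call" if self.missing_usage == 1 else "calls"
        return f"Note: {self.missing_usage} {noun} had no usage data; cost may be under-reported."
```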

Per-org price override

~/.fluid/prices.json lets enterprise customers patch in their negotiated rates without forking forge-cli:

{
  "schema_version": 1,
  "prices": {
    "claude-sonnet-4-6": [2.40, 12.00]
  }
}

Both wrapped ({"prices": {...}}) and flat ({...}) layouts are accepted. Negative or wrong-shape entries are silently rejected one at a time; valid entries in the same file still apply.

See cost tracking for full details.
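Below is a sketch of a loader that accepts both layouts and drops bad entries individually. It assumes each value is an `[input, output]` price pair; the units are not stated in these notes. It parses JSON text rather than reading the file. The function name is hypothetical.

```python
import json
import math


def load_price_overrides(text):
    """Hypothetical loader for ~/.fluid/prices.json accepting wrapped or flat layouts."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return {}
    # Wrapped layout: {"schema_version": 1, "prices": {...}}; otherwise treat as flat.
    entries = data["prices"] if isinstance(data.get("prices"), dict) else data
    prices = {}
    for model, pair in entries.items():
        # Each entry must be a two-number [input, output] pair; anything else is skipped.
        if not (isinstance(pair, list) and len(pair) == 2):
            continue
        if not all(isinstance(p, (int, float)) and not isinstance(p, bool) for p in pair):
            continue
        if any(p < 0 or not math.isfinite(p) for p in pair):
            continue
        prices[model] = (float(pair[0]), float(pair[1]))
    return prices
```

In the flat layout a stray `"schema_version": 1` is not a pair, so it is simply skipped like any other wrong-shape entry.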

`capability_matrix` in cache key

generate_cache_key adds a fourth hash segment for the capability matrix:

sha256(model || sha256(prompt) || sha256(canonical(params)) || sha256(canonical(capability_matrix)))

Flipping a capability flag (extended-thinking budget, prompt-cache mode, structured-output strictness) now invalidates the cache cleanly. Two runs with identical model/prompt/params but different capability matrices hash to distinct keys.
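Below is a re-creation of the four-segment formula with `hashlib`. It assumes "canonical" means sorted-key, whitespace-free JSON and that `||` is plain string concatenation. This is a sketch of the formula, not `generate_cache_key` itself.

```python
import hashlib
import json


def _h(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def _canonical(obj) -> str:
    # Assumption: canonical form is sorted-key JSON without extra whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def cache_key_sketch(model, prompt, params, capability_matrix):
    """Hypothetical: sha256(model || sha256(prompt) || sha256(params) || sha256(caps))."""
    parts = [model, _h(prompt), _h(_canonical(params)), _h(_canonical(capability_matrix))]
    return _h("".join(parts))
```

Because each inner digest is a fixed 64 hex characters, the concatenation is unambiguous. Reordering keys in `params` or the matrix does not change the key, but changing any value does.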

Anthropic `cache_control` regression pin

The Anthropic provider's system field is an array of content blocks with cache_control: {type: "ephemeral"} — required for prompt caching to engage server-side. A new test class TestAnthropicCacheControl pins:

system is array, not string
Ephemeral marker present on system block
Long (≥1024-token) system prompts retain marker
Short prompts still advertise marker (server no-ops below threshold)
Tool-request path also honors cache_control

Without these pins, a future refactor could "simplify" the system field back to a plain string, the cache_control would silently disappear, and the warm-cache regression test would flap intermittently.
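The Anthropic Messages API accepts `system` either as a string or as a list of text blocks, and only the block form can carry `cache_control`. Below is a minimal sketch of that payload shape and of the first two pinned invariants as plain assertions. The helper names are hypothetical, and the provider's real request builder is not shown in these notes.

```python
def build_system_blocks(system_prompt: str) -> list[dict]:
    """Shape the `system` field as content blocks carrying an ephemeral cache marker."""
    return [
        {
            "type": "text",
            "text": system_prompt,
            # Always attached; the server no-ops caching below the minimum length.
            "cache_control": {"type": "ephemeral"},
        }
    ]


def check_cache_control(payload: dict) -> None:
    """Pinned invariants: system is an array, and its block carries the marker."""
    system = payload["system"]
    assert isinstance(system, list), "system must be an array, not a string"
    assert system[-1].get("cache_control") == {"type": "ephemeral"}
```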

Variant-lint warning count in cost footer

When the dimensional variant validator runs (a per-Kimball-flavor lint), its warning counts surface in the cost summary footer:

  Note: 2 variant-lint warnings on variant='snowflake'.
  See validation report for details.

The footer:

Replaces (not accumulates) on repair-loop reruns.
Pluralises correctly.
Sorted alphabetically when multiple variants have findings.
Silent on clean pass.
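The four footer rules can be sketched as a small stateful class. The class is hypothetical. Printing one "Note:" line per variant when several variants have findings is an assumption, since only the single-variant layout is shown.

```python
class VariantLintFooter:
    """Hypothetical sketch of the variant-lint footer rules listed above."""

    def __init__(self):
        self._counts = {}

    def update(self, counts_by_variant):
        # Replace, don't accumulate: a repair-loop rerun reports only its own findings.
        self._counts = {v: n for v, n in counts_by_variant.items() if n > 0}

    def lines(self):
        if not self._counts:
            return []  # silent on a clean pass
        out = []
        for variant in sorted(self._counts):  # alphabetical when several variants fire
            n = self._counts[variant]
            noun = "warning" if n == 1 else "warnings"
            out.append(f"Note: {n} variant-lint {noun} on variant={variant!r}.")
        out.append("See validation report for details.")
        return out
```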

Test sweep

tests/copilot/catalog/                  116 tests
  test_catalog_adapter_base.py            20
  test_catalog_adapter_bigquery.py        13
  test_catalog_adapter_datahub.py         11
  test_catalog_adapter_dataplex.py        10
  test_catalog_adapter_dmm.py              7
  test_catalog_adapter_glue.py            13
  test_catalog_mcp_tools.py               23
  test_credentials_resolver.py            21
  test_industry_autodetect_*               (within wider industry suite)

tests/copilot/test_cost_tracking.py      39 tests
tests/copilot/test_store_keys.py          9 tests
tests/test_provider_determinism_payloads.py  27 tests
tests/test_public_api_stability.py      119 tests

Full sweep:                            9104 passed, 194 skipped, 0 failed

Files added in this release

New module tree:

fluid_build/copilot/catalog/
├── __init__.py
├── _patterns.py                    # 9 reusable patterns
├── base.py                         # CatalogAdapter ABC + typed errors
├── bigquery.py
├── credentials.py                  # CredentialResolver + 7 *Credentials
├── datahub.py
├── datamesh_manager.py
├── dataplex.py
├── glue.py
├── models.py                       # CatalogTable, CatalogColumn, ...
├── snowflake.py
└── unity.py

fluid_build/forge_datamodel/from_catalog/
├── __init__.py
└── pipeline.py                     # run_from_catalog + CatalogPipelineResult

fluid_build/cli/
└── ai_source_setup.py              # `fluid ai setup --source` wizard

fluid_build/copilot/industry/
└── compiler.py                     # INDUSTRY_DOMAIN_HINTS + matchers

Key edits:

fluid_build/cli/forge_data_model.py  # from-source subcommand
fluid_build/cli/forge_copilot_interview.py  # AI-mode catalog branch
fluid_build/cli/mcp.py               # 6 source-catalog tools + inputSchema
fluid_build/cli/ai_setup.py          # --source / --name args + ai status
fluid_build/copilot/agents/coordinator.py    # from_catalog method
fluid_build/copilot/agents/logical_agent.py  # from_catalog + summary
fluid_build/forge_datamodel/emit/fluid_contract.py  # promote catalog signal
fluid_build/forge_datamodel/emit/validator.py        # variant-lint hook
fluid_build/copilot/cost.py          # missing-usage + override + variant-lint
fluid_build/copilot/store/keys.py    # capability_matrix segment
fluid_build/copilot/agents/base.py   # split try/except for usage extraction

Tests:

tests/copilot/catalog/test_catalog_adapter_bigquery.py     # NEW (13)
tests/copilot/catalog/test_catalog_adapter_dataplex.py     # NEW (10)
tests/copilot/catalog/test_catalog_adapter_glue.py         # NEW (13)
tests/copilot/catalog/test_catalog_adapter_datahub.py      # NEW (11)
tests/copilot/catalog/test_catalog_adapter_dmm.py          # NEW (7)
tests/copilot/test_industry_autodetect_from_catalog.py     # NEW (39)
tests/copilot/test_cost_tracking.py                        # +21 tests
tests/copilot/test_store_keys.py                           # +5 tests
tests/test_provider_determinism_payloads.py                # +5 tests
tests/test_public_api_stability.py                         # +33 entries
tests/copilot/catalog/test_catalog_mcp_tools.py            # +3 tests

Migration notes

Nothing breaks. Every catalog adapter ships behind an optional extra. The default pip install data-product-forge is unchanged.

Existing fluid forge data-model from-intent / from-ddl flows continue to work exactly as before — V1.5 adds a catalog entry point, never replaces an existing one.

The cost summary format adds new footer lines (missing-usage, variant-lint) only when the relevant counter is non-zero. Clean runs see the same summary as before.

The cache-key shape gained a fourth segment for capability_matrix, so every entry cached before this release misses on the first run after upgrading and is recomputed. This is by design: the new keying is correct, whereas the old keying could collide when capability matrices differed.