V1.5 + V2 Hardening — Release Notes
This page is the changelog for the V1.5 catalog integration and V2 polish releases of forge-cli.
Audience: anyone planning to upgrade, anyone auditing what's new.
Top-line summary
| Theme | Status | Test count |
|---|---|---|
| V1.5 — Catalog integration (7 catalogs) | shipped | 116 catalog tests |
| V1.5 — Three-stage catalog metadata flow | shipped | covered by Logical / Builder / Validator suites |
| V1.5 — Industry auto-detection from catalog tags | shipped | 39 tests |
V1.5 — fluid ai setup --source wizard | shipped | covered by interview e2e tests |
V1.5 — MCP forge_from_source + 5 read-only catalog tools | shipped | 23 catalog-MCP tests |
V1.5 — MCP tool inputSchema for typed autocomplete | shipped | 23 catalog-MCP tests (inc. 3 new pins) |
| V1.5 — Per-adapter unit tests (5 SDKs stubbed) | shipped | 61 across BigQuery / Dataplex / Glue / DataHub / DMM |
V1.5 — CatalogAdapter ABC pinned in public API | shipped | 119 public API tests (33 new) |
| V1.5 — Catalog docs (forge_docs) | shipped | 7 catalog pages + index + walkthrough |
| V1.5 — CONTRIBUTING walkthrough | shipped | n/a (docs) |
| V1.5 — Issue template catalog context | shipped | n/a (template) |
| V2 — Cost-tracking missing-usage surfacing | shipped | 8 tests |
V2 — Per-org price override (~/.fluid/prices.json) | shipped | 7 tests |
V2 — capability_matrix in cache key | shipped | 5 tests |
V2 — Anthropic cache_control regression pin | shipped | 5 tests |
| V2 — Variant-lint warning count in cost footer | shipped | 9 tests |
Full sweep as of this release: 9104 passed, 194 skipped, 0 failed in ~105s.
V1.5 — Catalog integration
Seven adapters
Source-side catalog adapters reading metadata FROM existing catalogs (the complement to publish-target providers):
Each adapter follows the nine reusable patterns documented in _patterns.py. Adding a new adapter is a weekend project — see the contributor walkthrough.
Two surfaces, one pipeline
Both surfaces dispatch to the same staged Logical pipeline:
# CLI
fluid forge data-model from-source \
--source snowflake \
--credential-id snowflake-prod \
--database BIZ_LAB --schema SEEDED \
--technique data-vault-2 \
-o biz_lab.fluid.yaml
// MCP — Claude Code, Cursor, any MCP client
{
"tool": "forge_from_source",
"arguments": {
"source": "snowflake",
"credentials": { "credential_id": "snowflake-prod" },
"scope": { "database": "BIZ_LAB", "schema": "SEEDED" },
"technique": "data_vault_2",
"output_path": "biz_lab.fluid.yaml"
}
}
Three-stage catalog metadata flow
Catalog signal shapes every stage of the pipeline, not just the Logical input. See the full mapping table in V1.5 architecture.
Highlights:
- Logical: catalog descriptions / FK constraints / glossary terms feed
OSIDataset/OSIRelationship/OSI.ai_context. - Builder: catalog owner / sensitivity / lineage land in Fluid contract metadata verbatim — modeler does not re-invent.
- Transformation: partition keys / clustering keys / quality rules / freshness SLAs become dbt configs.
System roles never become owners
Snowflake ACCOUNTADMIN / SYSADMIN / SECURITYADMIN / USERADMIN / ORGADMIN / PUBLIC are NOT promoted to metadata.owner.team. They land in labels.catalogCreatingRoles (audit only) so the contract reflects the business team, not the DDL-running role.
Industry auto-detection
Catalog domain tags are matched against INDUSTRY_DOMAIN_HINTS:
| Tag | Industry pack |
|---|---|
telco, cdr, network | telecommunications |
healthcare, phi, clinical | healthcare |
finance, pci, transaction | finance |
retail, commerce, pos | retail |
Most-common-hit-per-scope wins. Operator can override with --industry.
fluid ai setup --source wizard
Per-source interactive setup with auth-method recommendations (★ marks the recommended path) and field-validation. Saves non-sensitive fields to ~/.fluid/sources.yaml; secrets go to the OS keyring.
fluid ai setup --source snowflake --name snowflake-prod
fluid ai status # lists configured sources
fluid ai status extended
Existing command now lists configured catalogs alongside the LLM provider config. One-stop summary of "what's wired up on this machine."
MCP inputSchema populated
Every MCP tool now advertises a JSON Schema for its arguments at tools/list. Claude Code / Cursor / VS Code MCP clients can drive typed autocomplete on:
sourceenum (snowflake | unity | bigquery | dataplex | glue | datahub | datamesh_manager)techniqueenum (data_vault_2 | dimensional)credentials.credential_id(required)scope.database,scope.schema,scope.tables,scope.catalogoutput_path,logical_path
Closed enums + additionalProperties: false defends against hallucinated free-form values.
Public API stability test
tests/test_public_api_stability.py adds 33 new entries pinning the V1.5 catalog surface:
CatalogAdapter,CatalogTable,CatalogColumn,CatalogForeignKey,CatalogLineage,CatalogScope,GlossaryTerm,LineageRef,SensitivityTagCatalogConfigError,CatalogConnectionError,CatalogPermissionError,CredentialNotFoundError,CredentialResolver- Seven typed
*Credentialsclasses - Seven concrete
*CatalogAdapterclasses INDUSTRY_DOMAIN_HINTS,match_industry_from_domain,match_industry_from_catalog_tags,detect_industry_from_catalog_tablesrun_from_catalog,CatalogPipelineResult,run_from_source_command
A future refactor that drops or renames any of these fails the test loudly. Removal requires a deprecation cycle or a v2 bump.
V2 polish items
Cost-tracking missing-usage surfacing
Some providers ship empty usage blocks under load. Without a counter, users would see misleading "$0.0042" totals with no hint that the figure is under-reported.
Cost summary
─────────────────────────────────────────────────────────────────
...
─────────────────────────────────────────────────────────────────
total 12,453 in 3,827 out $0.0042
Note: 2 calls had no usage data; cost may be under-reported.
The counter increments on extract_usage exceptions AND on 0/0 token responses from non-Ollama providers. Ollama's (0, 0) baseline is legitimate (local compute), so it's exempt.
Per-org price override
~/.fluid/prices.json lets enterprise customers patch in their negotiated rates without forking forge-cli:
{
"schema_version": 1,
"prices": {
"claude-sonnet-4-6": [2.40, 12.00]
}
}
Both wrapped ({"prices": {...}}) and flat ({...}) layouts are accepted. Negative / wrong-shape entries are silently rejected per-entry.
See cost tracking for full details.
capability_matrix in cache key
generate_cache_key adds a fourth hash segment for the capability matrix:
sha256(model || sha256(prompt) || sha256(canonical(params)) || sha256(canonical(capability_matrix)))
Flipping a capability flag (extended-thinking budget, prompt-cache mode, structured-output strictness) now invalidates the cache cleanly. Two runs with identical model/prompt/params but different capability matrices hash distinct.
Anthropic cache_control regression pin
The Anthropic provider's system field is an array of content blocks with cache_control: {type: "ephemeral"} — required for prompt caching to engage server-side. A new test class TestAnthropicCacheControl pins:
systemis array, not string- Ephemeral marker present on system block
- Long (≥1024-token) system prompts retain marker
- Short prompts still advertise marker (server no-ops below threshold)
- Tool-request path also honors
cache_control
Without these pins, a future refactor could "simplify" the system field back to a plain string, the cache_control would silently disappear, and the warm-cache regression test would flap intermittently.
Variant-lint warning footer
When the dimensional variant validator runs (per-Kimball-flavor lint), warning counts surface in the cost summary footer:
Note: 2 variant-lint warnings on variant='snowflake'.
See validation report for details.
The footer:
- Replaces (not accumulates) on repair-loop reruns.
- Pluralises correctly.
- Sorted alphabetically when multiple variants have findings.
- Silent on clean pass.
Test sweep
tests/copilot/catalog/ 116 tests
test_catalog_adapter_base.py 20
test_catalog_adapter_bigquery.py 13
test_catalog_adapter_datahub.py 11
test_catalog_adapter_dataplex.py 10
test_catalog_adapter_dmm.py 7
test_catalog_adapter_glue.py 13
test_catalog_mcp_tools.py 23
test_credentials_resolver.py 21
test_industry_autodetect_* (within wider industry suite)
tests/copilot/test_cost_tracking.py 39 tests
tests/copilot/test_store_keys.py 9 tests
tests/test_provider_determinism_payloads.py 27 tests
tests/test_public_api_stability.py 119 tests
Full sweep: 9104 passed, 194 skipped, 0 failed
Files added in this release
New module tree:
fluid_build/copilot/catalog/
├── __init__.py
├── _patterns.py # 9 reusable patterns
├── base.py # CatalogAdapter ABC + typed errors
├── bigquery.py
├── credentials.py # CredentialResolver + 7 *Credentials
├── datahub.py
├── datamesh_manager.py
├── dataplex.py
├── glue.py
├── models.py # CatalogTable, CatalogColumn, ...
├── snowflake.py
└── unity.py
fluid_build/forge_datamodel/from_catalog/
├── __init__.py
└── pipeline.py # run_from_catalog + CatalogPipelineResult
fluid_build/cli/
├── ai_source_setup.py # `fluid ai setup --source` wizard
fluid_build/copilot/industry/
└── compiler.py # INDUSTRY_DOMAIN_HINTS + matchers
Key edits:
fluid_build/cli/forge_data_model.py # from-source subcommand
fluid_build/cli/forge_copilot_interview.py # AI-mode catalog branch
fluid_build/cli/mcp.py # 6 source-catalog tools + inputSchema
fluid_build/cli/ai_setup.py # --source / --name args + ai status
fluid_build/copilot/agents/coordinator.py # from_catalog method
fluid_build/copilot/agents/logical_agent.py # from_catalog + summary
fluid_build/forge_datamodel/emit/fluid_contract.py # promote catalog signal
fluid_build/forge_datamodel/emit/validator.py # variant-lint hook
fluid_build/copilot/cost.py # missing-usage + override + variant-lint
fluid_build/copilot/store/keys.py # capability_matrix segment
fluid_build/copilot/agents/base.py # split try/except for usage extraction
Tests:
tests/copilot/catalog/test_catalog_adapter_bigquery.py # NEW (13)
tests/copilot/catalog/test_catalog_adapter_dataplex.py # NEW (10)
tests/copilot/catalog/test_catalog_adapter_glue.py # NEW (13)
tests/copilot/catalog/test_catalog_adapter_datahub.py # NEW (11)
tests/copilot/catalog/test_catalog_adapter_dmm.py # NEW (7)
tests/copilot/test_industry_autodetect_from_catalog.py # NEW (39)
tests/copilot/test_cost_tracking.py # +21 tests
tests/copilot/test_store_keys.py # +5 tests
tests/test_provider_determinism_payloads.py # +5 tests
tests/test_public_api_stability.py # +33 entries
tests/copilot/catalog/test_catalog_mcp_tools.py # +3 tests
Migration notes
Nothing breaks. Every catalog adapter ships behind an optional extra. The default pip install data-product-forge is unchanged.
Existing fluid forge data-model from-intent / from-ddl flows continue to work exactly as before — V1.5 adds a catalog entry point, never replaces an existing one.
The cost summary format adds new footer lines (missing-usage, variant-lint) only when the relevant counter is non-zero. Clean runs see the same summary as before.
The cache-key shape gained a fourth segment for capability_matrix, which means all entries cached before this release are invalidated on first run. This is by design: the new keying is correct, the old keying could collide on capability mismatches.