Fluid Forge

End-to-End Walkthrough: Catalog → Contract → Transformation

A five-minute walkthrough that takes you from "I have a Snowflake schema" to "I have a working dbt project + Fluid contract", using nothing but fluid commands to produce both.

The same flow works for every supported catalog. Pick your catalog from the catalogs index for the catalog-specific privilege grant, env-var setup, and auth options. The walkthrough below uses Snowflake.

Prerequisites

  • Python 3.10+
  • A Snowflake account with INFORMATION_SCHEMA read access on at least one schema
  • An LLM API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, or a local ollama runtime)
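
The prerequisites above can be checked before installing anything. The following is a minimal preflight sketch, not part of Fluid Forge: the function name check_prerequisites is hypothetical, and it assumes that a local ollama runtime exposes an `ollama` executable on PATH. It does not check Snowflake access.

```python
import os
import shutil
import sys

# Environment variable names come from the prerequisites list.
LLM_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def check_prerequisites(env, version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    has_key = any(env.get(name) for name in LLM_KEY_VARS)
    # Assumption: a local ollama runtime ships an `ollama` executable on PATH.
    has_ollama = which("ollama") is not None
    if not (has_key or has_ollama):
        problems.append("No LLM API key set and no ollama executable found")
    return problems
```

Calling check_prerequisites(os.environ) on your machine returns an empty list when the Python and LLM prerequisites are met.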

Install

pip install "data-product-forge[snowflake]"
# Or if you want every catalog at once:
# pip install "data-product-forge[catalogs]"

Step 1 — Configure your LLM provider (one-time)

fluid ai setup
# ? Which LLM provider? anthropic
# ? API key? <paste key into hidden prompt, never into docs>
# ? Default model? claude-sonnet-4-6
# ✓ Saved to ~/.fluid/ai_config.json

You only do this once per machine. The config is reused by every fluid forge and fluid generate invocation.

Step 2 — Configure your source catalog (one-time per source)

fluid ai setup --source snowflake --name snowflake-prod
# ? Catalog: snowflake
# ? Account: myorg-abc12345
# ? Auth method:
#   ★ key_pair (recommended)
#     password
#     externalbrowser
#     oauth
# ? Private key path: /Users/me/.ssh/snowflake_key.p8
# ? Default warehouse: COMPUTE_WH
# ? Default role: ANALYST
# ✓ Saved to ~/.fluid/sources.yaml (secrets in OS keyring)

fluid ai status confirms what's configured:

$ fluid ai status

LLM provider: anthropic / claude-sonnet-4-6
  - API key:           ✓ in OS keyring

Configured sources:
  - snowflake-prod     (snowflake / key_pair, ANALYST role)

Step 3 — Forge the data model

fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB --schema SEEDED \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml

What happens:

[CatalogConnect]   snowflake-prod (key_pair / ANALYST role)        0.4s
[CatalogList]      24 tables in BIZ_LAB.SEEDED                     2.1s
[CatalogInspect]   24 tables × (columns + tags + lineage)         11.7s
[IndustryDetect]   matched 'telco' from domain tags (18/24 tables)
[Logical]          24 tables → 23 hubs / 33 links / 23 satellites 19.4s
[Builder]          → biz_lab.fluid.yaml                            8.2s
[Transformation]   23 dbt models                                  18.5s
[Validator]        ✓ schema clean, no warnings                     0.6s

Cost summary
─────────────────────────────────────────────────────────────────
  anthropic / claude-sonnet-4-6   45,221 in  12,489 out  $0.3230
  anthropic / claude-haiku-4-5     2,891 in     876 out  $0.0073
─────────────────────────────────────────────────────────────────
  total                           48,112 in  13,365 out  $0.3303

✓ Forged to:
  biz_lab.fluid.yaml             — Fluid 0.7.3 contract
  biz_lab.fluid.yaml.model.json  — Logical IR sidecar (DV2)
  biz_lab.semantics.osi.yaml     — OSI v0.1.1 standalone
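
The dollar figures in the cost summary can be cross-checked by hand from the token counts. The sketch below does that arithmetic. The per-million-token prices are assumptions chosen for illustration (they happen to reproduce the figures shown); they are not values read from Fluid Forge or from ~/.fluid/ai_config.json.

```python
# Assumed prices in USD per million tokens (input, output).
# Illustrative assumptions only, not values taken from Fluid Forge.
ASSUMED_PRICES = {
    "claude-sonnet-4-6": (3, 15),
    "claude-haiku-4-5": (1, 5),
}

# Token counts (in, out) copied from the cost summary.
USAGE = {
    "claude-sonnet-4-6": (45_221, 12_489),
    "claude-haiku-4-5": (2_891, 876),
}


def cost_usd(model, tokens_in, tokens_out):
    price_in, price_out = ASSUMED_PRICES[model]
    # USD per million tokens equals micro-USD per token, so this sum is exact.
    micro_usd = tokens_in * price_in + tokens_out * price_out
    return micro_usd / 1_000_000


per_model = {model: cost_usd(model, *USAGE[model]) for model in USAGE}
total = sum(per_model.values())

for model, usd in per_model.items():
    print(f"{model:20s} ${usd:.4f}")
print(f"{'total':20s} ${total:.4f}")
```

Under these assumed prices the per-model lines and the total round to the same four-decimal values as the summary.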

Step 4 — Inspect the forged contract

# Print the first 50 lines of the contract.
head -n 50 biz_lab.fluid.yaml

# Or open the canvas — visual entity-relationship view.
fluid viz-graph biz_lab.fluid.yaml

The contract carries every signal the catalog provided:

# biz_lab.fluid.yaml (excerpt)
metadata:
  name: biz_lab
  domain: telco                       # auto-detected from catalog tags
  owner:
    team: data-eng                    # from OBJECT_TAGS.owner_team
                                      # (system roles like ACCOUNTADMIN excluded)
  steward: alice@example.com
  certification: certified            # from Snowflake Horizon
  lineage:
    upstream:
      - urn: snowflake://BIZ_LAB.RAW.ORDERS
        relationship: imports

agentPolicy:
  sensitiveData:
    - column: customer_email
      classification: PII             # from SYSTEM$CLASSIFY
    - column: payment_card_number
      classification: PCI

exposes:
  - exposeId: orders
    kind: table
    qos:
      freshnessSLO: PT24H             # from Dataplex/Unity if present
    contract:
      schema:
        - name: order_date
          type: DATE
          required: true
      dq:
        rules:
          - id: order_date_completeness
            type: completeness
            selector: order_date
            severity: error
          - id: order_date_uniqueness
            type: uniqueness
            selector: order_date
            severity: error
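
Because the contract is plain YAML, the same signals can be read programmatically once it is parsed. The sketch below works on a Python dict that mirrors the excerpt above; in practice the dict would come from parsing biz_lab.fluid.yaml with a YAML library. The helper names sensitive_columns and blocking_rules are hypothetical, and only the keys visible in the excerpt are assumed to exist.

```python
# A dict mirroring the excerpt; a real script would parse the YAML file instead.
contract = {
    "agentPolicy": {
        "sensitiveData": [
            {"column": "customer_email", "classification": "PII"},
            {"column": "payment_card_number", "classification": "PCI"},
        ]
    },
    "exposes": [
        {
            "exposeId": "orders",
            "contract": {
                "dq": {
                    "rules": [
                        {"id": "order_date_completeness", "type": "completeness",
                         "selector": "order_date", "severity": "error"},
                        {"id": "order_date_uniqueness", "type": "uniqueness",
                         "selector": "order_date", "severity": "error"},
                    ]
                }
            },
        }
    ],
}


def sensitive_columns(contract):
    """Map each flagged column to its classification."""
    entries = contract.get("agentPolicy", {}).get("sensitiveData", [])
    return {entry["column"]: entry["classification"] for entry in entries}


def blocking_rules(contract):
    """Return (exposeId, rule id) pairs for every rule with severity 'error'."""
    pairs = []
    for expose in contract.get("exposes", []):
        rules = expose.get("contract", {}).get("dq", {}).get("rules", [])
        pairs += [(expose["exposeId"], r["id"]) for r in rules if r["severity"] == "error"]
    return pairs
```

A check like this is handy in review: it shows at a glance which columns carry a classification and which quality rules would fail a run.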

Step 5 — Generate dbt transformations

fluid generate transformation biz_lab.fluid.yaml -o ./dbt_biz_lab --dbt-validate

You now have a working dbt project that:

  • Builds 23 hub/link/satellite models in topological order.
  • Honors partition keys from the catalog (dbt partition_by).
  • Tags every PII column under dbt meta:.
  • Includes dbt tests: (not_null / unique / accepted_values) derived from catalog quality rules.
  • Uses ref() / source() references that match the lineage chain from the catalog.
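
"Topological order" in the list above means that a model is built only after the models it depends on. In Data Vault, links and satellites reference hubs, so hubs come first. The following sketch illustrates the idea with Python's standard graphlib module; it is not how Fluid Forge or dbt computes the order, and the model names other than hub_customer are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of models it depends on (hypothetical names).
dependencies = {
    "hub_customer": set(),
    "hub_order": set(),
    "link_customer_order": {"hub_customer", "hub_order"},
    "sat_customer_details": {"hub_customer"},
}

# static_order() yields every model after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

The exact position of independent models can vary, but both hubs always precede the link and the satellite.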

Run the dbt project:

cd dbt_biz_lab
dbt run
dbt test

Step 6 — Iterate

If the modeled output isn't quite right, you have three options:

Option 1: Re-forge with different scope or technique

# Try Dimensional instead of Data Vault 2.0:
fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB --schema SEEDED \
  --technique dimensional \
  -o biz_lab.fluid.yaml

Option 2: Review the Logical sidecar, regenerate downstream artifacts

# Edit biz_lab.fluid.yaml.model.json directly.
$EDITOR biz_lab.fluid.yaml.model.json

# Regenerate the dbt project from the contract's labels.modelSidecar.
fluid generate transformation biz_lab.fluid.yaml -o ./dbt_biz_lab --overwrite --dbt-validate

Option 3: Drive interactive refinements via Claude Code / Cursor

Configure your IDE's MCP client (see MCP walkthrough), then:

Read biz_lab.fluid.yaml.model.json and add a Satellite to hub_customer for loyalty_status (SCD2). Regenerate the contract.

The agent calls read_logical_model → update_entity → regenerate_physical over the wire, then writes back to disk.

What's deterministic

  • Same catalog state + same intent/model input + same model + same capability_matrix → byte-identical contract on re-run (cache hit).
  • The --deterministic flag forces temperature 0 and seed 42 (where supported), turns the cache off, and emits a proof-of-determinism receipt.
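
One common way to get a cache hit for identical inputs is to hash a canonical serialization of everything that influences the output. The sketch below shows that general technique; it is an assumption about how such a cache key could be built, not Fluid Forge's actual scheme, and the function name cache_key and its parameters are hypothetical.

```python
import hashlib
import json


def cache_key(catalog_state, intent, model, capability_matrix):
    """Derive a stable key from the inputs listed in the first bullet."""
    payload = {
        "catalog_state": catalog_state,
        "intent": intent,
        "model": model,
        "capability_matrix": capability_matrix,
    }
    # sort_keys and fixed separators make the serialization independent of
    # dict insertion order, so equal inputs always hash to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any change to one input (for example, a new column in the catalog state) produces a different key and therefore a cache miss.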

What's audited

Every catalog read writes an audit event under ~/.fluid/store/audit/. Query with:

fluid memory show audit --filter catalog
fluid memory show audit --window 24h --filter forge_from_source

Credentials are NEVER written to the audit log. Only metadata about the call is recorded (catalog name, scope, table count, duration).
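
A simple way to guarantee that secrets never reach an audit log is to give the event type no field that could hold one. The sketch below illustrates that design with the four metadata items named above. The class name, field names, and JSON layout are hypothetical; the real on-disk format under ~/.fluid/store/audit/ is not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CatalogAuditEvent:
    # Only call metadata; there is deliberately no field for credentials.
    catalog: str
    scope: str
    table_count: int
    duration_s: float


def to_audit_line(event):
    """Serialize one event as a single JSON line."""
    return json.dumps(asdict(event), sort_keys=True)


# Illustrative values only.
event = CatalogAuditEvent("snowflake", "BIZ_LAB.SEEDED", 24, 11.7)
line = to_audit_line(event)
print(line)
```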

What's the cost ceiling

Set a per-run cost ceiling in ~/.fluid/ai_config.json:

{
  "cost_limit_usd_per_run": 5.00
}

The forge stops mid-pipeline if the running total exceeds the ceiling, prints the partial cost summary, and exits non-zero.
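
The stop-when-exceeded behavior can be pictured as a running meter that is checked after every LLM call. The sketch below is a minimal illustration of that logic, not Fluid Forge's implementation: CostMeter and CostCeilingExceeded are hypothetical names, and only the cost_limit_usd_per_run key comes from the config shown above.

```python
import json


class CostCeilingExceeded(RuntimeError):
    """Hypothetical error type used only in this sketch."""


class CostMeter:
    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.total_usd = 0.0

    def add(self, call_cost_usd):
        """Record one call's cost and stop once the running total passes the limit."""
        self.total_usd += call_cost_usd
        if self.total_usd > self.limit_usd:
            raise CostCeilingExceeded(
                f"running total ${self.total_usd:.4f} exceeds ceiling ${self.limit_usd:.2f}"
            )


# Same key as in the config fragment; parsed from a string to stay self-contained.
config = json.loads('{"cost_limit_usd_per_run": 5.00}')
meter = CostMeter(config["cost_limit_usd_per_run"])
```

A caller would wrap the pipeline in a try/except for the error, print the partial cost summary, and exit with a non-zero status, matching the behavior described above.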

Errors you might see (and what to do)

Error | What to do
CatalogConfigError: snowflake-connector-python missing | pip install "data-product-forge[snowflake]"
CatalogPermissionError: USAGE on schema BIZ_LAB.SEEDED required | Run the GRANT command in the suggestion list
CatalogConnectionError: Failed to connect | Verify SNOWFLAKE_ACCOUNT matches the URL in Snowsight
CredentialNotFoundError: No credentials configured | fluid ai setup --source snowflake --name snowflake-prod
1 call had no usage data; cost may be under-reported | The provider's response omitted the usage block, so the cost figure is partial

Each catalog page documents the catalog-specific errors. See the catalogs index.

Next steps

  • Add your own catalog adapter
  • V1.5 architecture deep-dive
  • MCP server walkthrough
  • Cost tracking details
Last Updated: 5/17/26, 6:10 PM
Contributors: fas89, Claude Opus 4.7, Claude Opus 4.7 (1M context)