Fluid Forge

AI Forge And Data-Model Journeys

This walkthrough shows the main fluid forge and fluid forge data-model paths a new user can take. AI helps with discovery, interview, semantic modeling, and review. Contract writing, validation, and dbt SQL generation are deterministic and derive from the forged logical model.

Credential Safety First

Never put an API key in an intent file, contract, docs page, shell history snippet, or Git commit. Use environment variables or fluid ai setup.

# Pick one hosted provider.
export GOOGLE_API_KEY="<your-gemini-key>"
# or
export OPENAI_API_KEY="<your-openai-key>"
# or
export ANTHROPIC_API_KEY="<your-anthropic-key>"

For local-only testing, use Ollama instead:

export OLLAMA_HOST=http://localhost:11434
export FLUID_OLLAMA_MODEL=gemma4:latest

fluid ai setup stores provider/model preferences under ~/.fluid/. API keys go to the OS keyring when available. Plaintext key persistence requires explicit opt-in with FLUID_ALLOW_PLAINTEXT_AI_SECRETS=1.
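One way to back up the rule above is a small scan that runs before a commit. The sketch below is not part of Fluid Forge: scan_for_keys is a hypothetical helper, and its idea of a literal key (a value after one of the variable names on this page that is not empty, not a <placeholder>, and not a $VARIABLE reference) is an assumption that will miss other secret formats.

```shell
# Hypothetical helper, not a Fluid Forge command.
# Flags files that appear to assign a literal value to a provider key variable.
scan_for_keys() {
  # Usage: scan_for_keys FILE...
  # Prints each offending file name; returns 1 if any file matched, 0 otherwise.
  local found=0
  local file
  # Assumption: a "real" value starts with something other than a quote,
  # "<" (placeholder), "$" (variable reference), or whitespace.
  local pattern="(GOOGLE|OPENAI|ANTHROPIC)_API_KEY[[:space:]]*[=:][[:space:]]*[\"']?[^\"'<\$[:space:]]"
  for file in "$@"; do
    if grep -Eq -- "$pattern" "$file"; then
      echo "$file"
      found=1
    fi
  done
  return "$found"
}
```

For example, scan_for_keys retail.intent.yaml customer_orders.fluid.yaml prints nothing and returns 0 when neither file contains a literal key.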

Choose The Right Entry Point

| Goal | Command | AI needed |
| --- | --- | --- |
| Create a blank contract scaffold | fluid forge --blank | No |
| Let the CLI interview you and scaffold a project | fluid forge | Optional but recommended |
| Forge a model from YAML/JSON business intent | fluid forge data-model from-intent | Optional |
| Reverse-engineer existing SQL DDL | fluid forge data-model from-ddl | Optional |
| Forge directly from a metadata catalog | fluid forge data-model from-source | Optional for modeling, catalog credentials required |
| Validate a forged artifact | fluid forge data-model validate | No |
| Compare two sidecars | fluid forge data-model diff | No |
| Teach memory from operator edits | fluid forge data-model learn | No |
| Generate dbt SQL | fluid generate transformation | No |

Provider Setup And Model Plan

Inspect the configured provider defaults and tier routing:

fluid ai models
fluid ai models --provider gemini
fluid ai models --provider openai --json

The current tier plan is provider-local:

| Stage | Typical mode |
| --- | --- |
| Interview / clarification | Fast routing model |
| Logical modeler | Deep model |
| Contract forge | Deterministic |
| Transformation/dbt | Deterministic from .model.json |
| Validator | Deterministic |
| Self-evaluation | Fast routing model |

For a hosted provider run, either complete interactive setup:

fluid ai setup
fluid ai status

or use environment variables for a single shell session:

export GOOGLE_API_KEY="<your-gemini-key>"
fluid forge data-model from-intent intent.yaml \
  -o customer_orders.fluid.yaml \
  --llm-provider gemini \
  --tiered \
  --require-llm

Use --require-llm when you are validating provider setup. Without it, a normal run may fall back to deterministic heuristics if the hosted provider is unavailable.
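Before a strict run, it can help to confirm which provider the current shell is actually configured for. The sketch below is a hypothetical helper, not a Fluid Forge command. It maps the environment variables shown on this page to the --llm-provider values used on this page; the precedence order is an assumption.

```shell
# Hypothetical helper, not a Fluid Forge command.
# Prints the first provider whose environment variable is set and non-empty.
pick_llm_provider() {
  # Assumption: hosted providers take precedence over a local Ollama host.
  if [ -n "${GOOGLE_API_KEY:-}" ]; then
    echo gemini
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo anthropic
  elif [ -n "${OLLAMA_HOST:-}" ]; then
    echo ollama
  else
    # Nothing configured: let the caller decide whether to stop.
    return 1
  fi
}
```

The printed name can be passed to --llm-provider together with --require-llm, and a non-zero return can stop a script before it reaches the CLI.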

Flow 1: Blank Scaffold, No AI

Use this when you want a contract skeleton and prefer to fill it by hand.

fluid forge --blank --target-dir ./customer-orders --non-interactive
cd customer-orders
fluid validate contract.fluid.yaml
fluid plan contract.fluid.yaml --out runtime/plan.json

This path writes a contract scaffold only. It does not create a logical model sidecar or dbt project.

Flow 2: Interactive AI Scaffold

Use this when you want the CLI to discover local context, ask a short interview, and scaffold the first product draft.

fluid forge

Useful variants:

fluid forge --domain retail
fluid forge --provider gcp --domain finance
fluid forge --discovery-path ./warehouse-ddl --discovery-path ./sample-data
fluid forge --no-discover
fluid forge --no-memory
fluid forge --save-memory
fluid forge --llm-provider openai --llm-model gpt-4.1-mini
fluid forge --llm-provider gemini --tiered --require-llm

The interview keeps three concepts separate:

| Interview concept | Meaning |
| --- | --- |
| Data model | Dimensional or Data Vault 2.0 |
| Transformation | dbt, SQL, Spark, Python, or custom code generation |
| Scheduler | None, Airflow, Dagster, or Prefect |

Choosing dbt as the transformation engine does not imply scheduling. Add a scheduler only when you want generated orchestration artifacts.

Flow 3: Intent To Model Doc To dbt

An intent file is the business request in YAML or JSON. For example:

data_product:
  name: customer_orders
  domain: retail
  description: Customer order analytics for revenue, basket, and store performance.

business_context: >
  The team needs a trusted customer order model that can support sales reporting,
  product performance analysis, and store operations.

grain:
  entity: order_line
  time_dimension: order_date

dimensions:
  entities:
    - customer
    - product
    - store
    - promotion

metrics:
  - name: total_revenue
    description: Sum of order line revenue after discounts.
  - name: order_count
    description: Count of unique customer orders.
  - name: average_order_value
    description: Total revenue divided by order count.

data_sources:
  - name: raw_orders
    system: snowflake
    table: RAW_RETAIL.ORDERS
  - name: raw_order_lines
    system: snowflake
    table: RAW_RETAIL.ORDER_LINES

business_rules:
  - Exclude cancelled orders from revenue metrics.
  - Treat returned items as negative revenue.

modeling:
  technique: dimensional

Or ask the CLI for a parseable example and schema:

fluid forge data-model from-intent --example retail > retail.intent.yaml
fluid forge data-model from-intent --schema > business-intent.schema.json
fluid forge data-model from-intent --validate retail.intent.yaml

Forge the contract and model artifacts:

fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --technique dimensional \
  --emit-osi-sidecar

For a deterministic local smoke test, add:

fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --technique dimensional \
  --deterministic \
  --emit-osi-sidecar

Expected artifacts:

customer_orders.fluid.yaml
customer_orders.fluid.yaml.model.json
customer_orders.fluid.yaml.model.md
customer_orders.fluid.yaml.semantics.osi.yaml

The .model.md file is the human review layer. It contains a Mermaid diagram, facts/dimensions or hubs/links/satellites, grain, metrics, source hints, and assumptions. The .model.json file is the machine source of truth for generation.
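A script can confirm that a forge run produced the files listed above before anyone reviews them. This is a minimal sketch with a hypothetical helper name. It assumes the run used --emit-osi-sidecar, so the .semantics.osi.yaml file is expected, and it treats an empty file as missing.

```shell
# Hypothetical helper, not a Fluid Forge command.
# Checks the contract and its three sidecars, named as in the artifact list above.
check_forge_artifacts() {
  # Usage: check_forge_artifacts CONTRACT_PATH
  local contract="$1"
  local missing=0
  local path
  for path in "$contract" \
              "$contract.model.json" \
              "$contract.model.md" \
              "$contract.semantics.osi.yaml"; do
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For a run without --emit-osi-sidecar, drop the last entry from the loop.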

Generate dbt from the forged sidecar:

fluid generate transformation customer_orders.fluid.yaml \
  -o ./dbt_customer_orders \
  --dbt-validate \
  --overwrite

For dbt output, zero generated models/**/*.sql files is a hard failure. A normal output directory includes dbt_project.yml, profiles.yml, models/sources.yml when source hints exist, and non-empty SQL model files.
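The same rule can be checked independently of the CLI, for example as a guard in a wrapper script. The sketch below uses hypothetical helper names and only looks at the file system; it does not reproduce whatever checks --dbt-validate performs.

```shell
# Hypothetical helpers, not Fluid Forge commands.
# Counts non-empty *.sql files anywhere under DBT_PROJECT_DIR/models.
count_dbt_models() {
  # Usage: count_dbt_models DBT_PROJECT_DIR
  # "-size +0c" keeps only files larger than zero bytes.
  find "$1/models" -type f -name '*.sql' -size +0c 2>/dev/null | wc -l | tr -d ' '
}

# Fails when no non-empty SQL model exists; otherwise prints the count.
require_dbt_models() {
  # Usage: require_dbt_models DBT_PROJECT_DIR
  local n
  n=$(count_dbt_models "$1")
  if [ "$n" -eq 0 ]; then
    echo "no non-empty SQL models under $1/models" >&2
    return 1
  fi
  echo "$n"
}
```

Calling require_dbt_models ./dbt_customer_orders after generation returns non-zero when the models directory is missing or contains only empty SQL files.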

Flow 4: Strict Hosted Provider Smoke

Use this when you want to prove the run really used a provider and did not fall back.

Gemini:

export GOOGLE_API_KEY="<your-gemini-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.gemini.fluid.yaml \
  --llm-provider gemini \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

OpenAI:

export OPENAI_API_KEY="<your-openai-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.openai.fluid.yaml \
  --llm-provider openai \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

Anthropic:

export ANTHROPIC_API_KEY="<your-anthropic-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.anthropic.fluid.yaml \
  --llm-provider anthropic \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

Ollama:

export OLLAMA_HOST=http://localhost:11434
export FLUID_OLLAMA_MODEL=gemma4:latest
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.ollama.fluid.yaml \
  --llm-provider ollama \
  --llm-model gemma4:latest \
  --require-llm \
  --emit-osi-sidecar

After any provider run, validate and generate:

fluid forge data-model validate customer_orders.gemini.fluid.yaml
fluid generate transformation customer_orders.gemini.fluid.yaml \
  -o ./dbt_customer_orders_gemini \
  --dbt-validate \
  --overwrite

Flow 5: DDL To Model

Use DDL when you already have warehouse table definitions.

fluid forge data-model from-ddl \
  --ddl warehouse/orders.sql warehouse/customers.sql \
  --source-type snowflake \
  --technique dimensional \
  -o customer_orders_ddl.fluid.yaml \
  --emit-osi-sidecar

For live Snowflake schemas, dump first, then forge:

fluid forge data-model dump-ddl \
  --database BIZ_LAB \
  --schema SEEDED \
  -o biz_lab.sql

fluid forge data-model from-ddl \
  --ddl biz_lab.sql \
  --source-type snowflake \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml

DDL is excellent for table and column evidence. If the generated dbt project needs exact physical source mappings, prefer from-source or enrich the intent with data_sources so models/sources.yml can be generated correctly.

Flow 6: Metadata Catalog To Model

Use from-source when metadata already lives in Snowflake, Unity Catalog, BigQuery, Dataplex, Glue, DataHub, or Data Mesh Manager.

One-time source credential setup:

fluid ai setup --source snowflake --name snowflake-prod

Forge from the configured source:

fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB \
  --schema SEEDED \
  --tables CUSTOMER ORDER_LINE PRODUCT \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml \
  --emit-osi-sidecar

If running on cloud infrastructure with workload identity, opt in explicitly:

fluid forge data-model from-source \
  --source bigquery \
  --database analytics-prod \
  --schema sales_mart \
  --allow-metadata-service \
  -o sales_mart.fluid.yaml

Catalog credentials are separate from LLM provider credentials. --credential-id refers to the source credential created by fluid ai setup --source ....

Flow 7: Review, Diff, Learn

Review the logical sidecar before finalizing:

EDITOR=vim fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --review

Compare two forged sidecars:

fluid forge data-model diff old.model.json new.model.json

Teach memory from a human-edited version:

fluid forge data-model learn \
  --before customer_orders.fluid.yaml.model.json \
  --after customer_orders.reviewed.fluid.yaml.model.json

Use memory intentionally:

fluid forge --save-memory
fluid forge --no-memory
FLUID_COPILOT_SEMANTIC_MEMORY=1 \
  fluid forge data-model from-intent retail.intent.yaml -o customer_orders.fluid.yaml

Memory should store preferences and summaries, not raw data or credentials.

Flow 8: Add Scheduling Only When Needed

dbt generation and scheduling are separate:

fluid generate transformation customer_orders.fluid.yaml \
  -o ./dbt_customer_orders \
  --dbt-validate

If the contract includes a scheduler, or you choose one, generate the schedule explicitly:

fluid generate schedule customer_orders.fluid.yaml \
  --scheduler airflow \
  -o ./dags \
  --overwrite

Choose None as the scheduler during the interview when the team already has its own scheduler or only wants model and dbt artifacts.

What To Commit

Usually commit:

  • The intent file when it is part of the design record.
  • *.fluid.yaml.
  • *.model.json.
  • *.model.md.
  • *.semantics.osi.yaml when semantic sidecars are part of your review flow.
  • Generated dbt SQL when this repo owns transformation code.

Do not commit:

  • API keys, tokens, passwords, or private keys.
  • ~/.fluid/ contents.
  • .fluid/store/ memory/cache data unless your team has explicitly decided to version a sanitized team memory file.
  • dbt logs, target/, or local profile secrets.
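Part of the "do not commit" list can be enforced with ignore rules. The sketch below is a hypothetical helper that appends rules to a .gitignore file without duplicating existing lines. The .fluid/store/ path comes from the list above; target/ and logs/ assume dbt's default output and log directories. ~/.fluid/ lives outside the repository and needs no rule, and secrets inside tracked files still need a separate check.

```shell
# Hypothetical helper, not a Fluid Forge command.
# Appends ignore rules to a .gitignore file, skipping rules already present.
add_ignore_rules() {
  # Usage: add_ignore_rules [GITIGNORE_PATH]
  local file="${1:-.gitignore}"
  local rule
  touch "$file"
  # Assumption: dbt writes to its default target/ and logs/ directories.
  for rule in '.fluid/store/' 'target/' 'logs/'; do
    # -x matches whole lines, -F treats the rule as a fixed string.
    grep -qxF -- "$rule" "$file" || echo "$rule" >> "$file"
  done
}
```

Running it twice leaves the file unchanged the second time. An existing file is assumed to end with a newline.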

Troubleshooting

| Symptom | What to do |
| --- | --- |
| --require-llm fails | Check provider env var, fluid ai status, model name, network, and quota. |
| Prompt asks about scheduling after you said no | Treat it as a UX bug and report the exact transcript; transformation and scheduler are separate decisions. |
| No .model.md written | Ensure --no-emit-model-doc was not passed. The default is to emit it. |
| dbt output is empty | This should fail. Re-run with --dbt-validate and inspect the model sidecar. |
| dbt source not found | Add source hints in intent or forge from a catalog source so models/sources.yml can be generated correctly. |
| You want CI without AI | Use deterministic checked-in artifacts: validate, generate, plan, apply. Do not require live LLM calls in production CI. |
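The last row can be sketched as a CI step that touches only checked-in artifacts. The function name and the FLUID_BIN override are hypothetical. The commands are the ones shown earlier on this page, chained so that the first failure stops the run. The apply step is left out because its arguments are not shown here.

```shell
# Hypothetical CI helper, not a Fluid Forge command.
# Runs the deterministic steps in order and stops at the first failure.
ci_check() {
  # Usage: ci_check CONTRACT [DBT_DIR] [PLAN_PATH]
  local contract="$1"
  local dbt_dir="${2:-./dbt_out}"
  local plan_path="${3:-runtime/plan.json}"
  # Assumption: FLUID_BIN lets a test or wrapper substitute the CLI binary.
  local fluid_bin="${FLUID_BIN:-fluid}"

  "$fluid_bin" validate "$contract" &&
    "$fluid_bin" forge data-model validate "$contract" &&
    "$fluid_bin" generate transformation "$contract" -o "$dbt_dir" --dbt-validate --overwrite &&
    "$fluid_bin" plan "$contract" --out "$plan_path"
}
```

None of these steps passes --llm-provider or --require-llm, so the job does not depend on a provider key being present in the CI environment.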
Last Updated: 4/26/26, 10:42 PM
Contributors: fas89, Claude Opus 4.7