AI Forge And Data-Model Journeys

This walkthrough shows the main fluid forge and fluid forge data-model paths a new user can take. AI helps with discovery, interview, semantic modeling, and review. Contract writing, validation, and dbt SQL generation stay deterministic from the forged logical model.

Credential Safety First

Never put an API key in an intent file, contract, docs page, shell history snippet, or Git commit. Use environment variables or fluid ai setup.

# Pick one hosted provider.
export GOOGLE_API_KEY="<your-gemini-key>"
# or
export OPENAI_API_KEY="<your-openai-key>"
# or
export ANTHROPIC_API_KEY="<your-anthropic-key>"

For local-only testing, use Ollama instead:

export OLLAMA_HOST=http://localhost:11434
export FLUID_OLLAMA_MODEL=gemma4:latest

fluid ai setup stores provider/model preferences under ~/.fluid/. API keys go to the OS keyring when available. Plaintext key persistence requires explicit opt-in with FLUID_ALLOW_PLAINTEXT_AI_SECRETS=1.

Choose The Right Entry Point

Goal	Command	AI needed
Create a blank contract scaffold	`fluid forge --blank`	No
Let the CLI interview you and scaffold a project	`fluid forge`	Optional but recommended
Forge a model from YAML/JSON business intent	`fluid forge data-model from-intent`	Optional
Reverse-engineer existing SQL DDL	`fluid forge data-model from-ddl`	Optional
Forge directly from a metadata catalog	`fluid forge data-model from-source`	Optional for modeling, catalog credentials required
Validate a forged artifact	`fluid forge data-model validate`	No
Compare two sidecars	`fluid forge data-model diff`	No
Teach memory from operator edits	`fluid forge data-model learn`	No
Generate dbt SQL	`fluid generate transformation`	No

Provider Setup And Model Plan

Inspect the configured provider defaults and tier routing:

fluid ai models
fluid ai models --provider gemini
fluid ai models --provider openai --json

The current tier plan is provider-local:

Stage	Typical mode
Interview / clarification	Fast routing model
Logical modeler	Deep model
Contract forge	Deterministic
Transformation/dbt	Deterministic from `.model.json`
Validator	Deterministic
Self-evaluation	Fast routing model

For a hosted provider run, either complete interactive setup:

fluid ai setup
fluid ai status

or use environment variables for a single shell session:

export GOOGLE_API_KEY="<your-gemini-key>"
fluid forge data-model from-intent intent.yaml \
  -o customer_orders.fluid.yaml \
  --llm-provider gemini \
  --tiered \
  --require-llm

Use --require-llm when you are validating provider setup. Without it, normal UX may fall back to deterministic heuristics if the hosted provider is unavailable.

Flow 1: Blank Scaffold, No AI

Use this when you want a contract skeleton and prefer to fill it by hand.

fluid forge --blank --target-dir ./customer-orders --non-interactive
cd customer-orders
fluid validate contract.fluid.yaml
fluid plan contract.fluid.yaml --out runtime/plan.json

This path writes a contract scaffold only. It does not create a logical model sidecar or dbt project.

Flow 2: Interactive AI Scaffold

Use this when you want the CLI to discover local context, ask a short interview, and scaffold the first product draft.

fluid forge

Useful variants:

fluid forge --domain retail
fluid forge --provider gcp --domain finance
fluid forge --discovery-path ./warehouse-ddl --discovery-path ./sample-data
fluid forge --no-discover
fluid forge --no-memory
fluid forge --save-memory
fluid forge --llm-provider openai --llm-model gpt-4.1-mini
fluid forge --llm-provider gemini --tiered --require-llm

The interview keeps three concepts separate:

Interview concept	Meaning
Data model	Dimensional or Data Vault 2.0
Transformation	dbt, SQL, Spark, Python, or custom code generation
Scheduler	None, Airflow, Dagster, or Prefect

Choosing dbt as the transformation engine does not imply scheduling. Add a scheduler only when you want generated orchestration artifacts.

Flow 3: Intent To Model Doc To dbt

An intent file is the business request in YAML or JSON. For example:

data_product:
  name: customer_orders
  domain: retail
  description: Customer order analytics for revenue, basket, and store performance.

business_context: >
  The team needs a trusted customer order model that can support sales reporting,
  product performance analysis, and store operations.

grain:
  entity: order_line
  time_dimension: order_date

dimensions:
  entities:
    - customer
    - product
    - store
    - promotion

metrics:
  - name: total_revenue
    description: Sum of order line revenue after discounts.
  - name: order_count
    description: Count of unique customer orders.
  - name: average_order_value
    description: Total revenue divided by order count.

data_sources:
  - name: raw_orders
    system: snowflake
    table: RAW_RETAIL.ORDERS
  - name: raw_order_lines
    system: snowflake
    table: RAW_RETAIL.ORDER_LINES

business_rules:
  - Exclude cancelled orders from revenue metrics.
  - Treat returned items as negative revenue.

modeling:
  technique: dimensional

Or ask the CLI for a parseable example and schema:

fluid forge data-model from-intent --example retail > retail.intent.yaml
fluid forge data-model from-intent --schema > business-intent.schema.json
fluid forge data-model from-intent --validate retail.intent.yaml

Forge the contract and model artifacts:

fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --technique dimensional \
  --emit-osi-sidecar

For a deterministic local smoke test, add:

fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --technique dimensional \
  --deterministic \
  --emit-osi-sidecar

Expected artifacts:

customer_orders.fluid.yaml
customer_orders.fluid.yaml.model.json
customer_orders.fluid.yaml.model.md
customer_orders.fluid.yaml.semantics.osi.yaml

The .model.md file is the human review layer. It contains a Mermaid diagram, facts/dimensions or hubs/links/satellites, grain, metrics, source hints, and assumptions. The .model.json file is the machine source of truth for generation.

Generate dbt from the forged sidecar:

fluid generate transformation customer_orders.fluid.yaml \
  -o ./dbt_customer_orders \
  --dbt-validate \
  --overwrite

For dbt output, zero generated models/**/*.sql files is a hard failure. A normal output directory includes dbt_project.yml, profiles.yml, models/sources.yml when source hints exist, and non-empty SQL model files.

Flow 4: Strict Hosted Provider Smoke

Use this when you want to prove the run really used a provider and did not fall back.

Gemini:

export GOOGLE_API_KEY="<your-gemini-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.gemini.fluid.yaml \
  --llm-provider gemini \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

OpenAI:

export OPENAI_API_KEY="<your-openai-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.openai.fluid.yaml \
  --llm-provider openai \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

Anthropic:

export ANTHROPIC_API_KEY="<your-anthropic-key>"
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.anthropic.fluid.yaml \
  --llm-provider anthropic \
  --tiered \
  --require-llm \
  --emit-osi-sidecar

Ollama:

export OLLAMA_HOST=http://localhost:11434
export FLUID_OLLAMA_MODEL=gemma4:latest
fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.ollama.fluid.yaml \
  --llm-provider ollama \
  --llm-model gemma4:latest \
  --require-llm \
  --emit-osi-sidecar

After any provider run, validate and generate:

fluid forge data-model validate customer_orders.gemini.fluid.yaml
fluid generate transformation customer_orders.gemini.fluid.yaml \
  -o ./dbt_customer_orders_gemini \
  --dbt-validate \
  --overwrite

Flow 5: DDL To Model

Use DDL when you already have warehouse table definitions.

fluid forge data-model from-ddl \
  --ddl warehouse/orders.sql warehouse/customers.sql \
  --source-type snowflake \
  --technique dimensional \
  -o customer_orders_ddl.fluid.yaml \
  --emit-osi-sidecar

For live Snowflake schemas, dump first, then forge:

fluid forge data-model dump-ddl \
  --database BIZ_LAB \
  --schema SEEDED \
  -o biz_lab.sql

fluid forge data-model from-ddl \
  --ddl biz_lab.sql \
  --source-type snowflake \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml

DDL is excellent for table and column evidence. If the generated dbt project needs exact physical source mappings, prefer from-source or enrich the intent with data_sources so models/sources.yml can be generated correctly.

Flow 6: Metadata Catalog To Model

Use from-source when metadata already lives in Snowflake, Unity Catalog, BigQuery, Dataplex, Glue, DataHub, or Data Mesh Manager.

One-time source credential setup:

fluid ai setup --source snowflake --name snowflake-prod

Forge from the configured source:

fluid forge data-model from-source \
  --source snowflake \
  --credential-id snowflake-prod \
  --database BIZ_LAB \
  --schema SEEDED \
  --tables CUSTOMER ORDER_LINE PRODUCT \
  --technique data-vault-2 \
  -o biz_lab.fluid.yaml \
  --emit-osi-sidecar

If running on cloud infrastructure with workload identity, opt in explicitly:

fluid forge data-model from-source \
  --source bigquery \
  --database analytics-prod \
  --schema sales_mart \
  --allow-metadata-service \
  -o sales_mart.fluid.yaml

Catalog credentials are separate from LLM provider credentials. --credential-id refers to the source credential created by fluid ai setup --source ....

Flow 7: Review, Diff, Learn

Review the logical sidecar before finalizing:

EDITOR=vim fluid forge data-model from-intent retail.intent.yaml \
  -o customer_orders.fluid.yaml \
  --review

Compare two forged sidecars:

fluid forge data-model diff old.model.json new.model.json

Teach memory from a human-edited version:

fluid forge data-model learn \
  --before customer_orders.fluid.yaml.model.json \
  --after customer_orders.reviewed.fluid.yaml.model.json

Use memory intentionally:

fluid forge --save-memory
fluid forge --no-memory
FLUID_COPILOT_SEMANTIC_MEMORY=1 \
  fluid forge data-model from-intent retail.intent.yaml -o customer_orders.fluid.yaml

Memory should store preferences and summaries, not raw data or credentials.

Flow 8: Add Scheduling Only When Needed

dbt generation and scheduling are separate:

fluid generate transformation customer_orders.fluid.yaml \
  -o ./dbt_customer_orders \
  --dbt-validate

If the contract includes or you choose a scheduler, generate it explicitly:

fluid generate schedule customer_orders.fluid.yaml \
  --scheduler airflow \
  -o ./dags \
  --overwrite

Use none during interviews when the team already has its own scheduler or only wants model/dbt artifacts.

What To Commit

Usually commit:

The intent file when it is part of the design record.
*.fluid.yaml.
*.model.json.
*.model.md.
*.semantics.osi.yaml when semantic sidecars are part of your review flow.
Generated dbt SQL when this repo owns transformation code.

Do not commit:

API keys, tokens, passwords, or private keys.
~/.fluid/ contents.
.fluid/store/ memory/cache data unless your team has explicitly decided to version a sanitized team memory file.
dbt logs, target/, or local profile secrets.

Troubleshooting

Symptom	What to do
`--require-llm` fails	Check provider env var, `fluid ai status`, model name, network, and quota.
Prompt asks about scheduling after you said no	Treat it as a UX bug and report the exact transcript; transformation and scheduler are separate decisions.
No `.model.md` written	Ensure `--no-emit-model-doc` was not passed. The default is to emit it.
dbt output is empty	This should fail. Re-run with `--dbt-validate` and inspect the model sidecar.
dbt source not found	Add source hints in intent or forge from a catalog source so `models/sources.yml` can be generated correctly.
You want CI without AI	Use deterministic checked-in artifacts: `validate`, `generate`, `plan`, `apply`. Do not require live LLM calls in production CI.