Fluid Forge

Task: Add quality rules to your data product

Forge's dq.rules block declares what correct means for your data product. Rules are evaluated at three points: at validate (schema-level), at test (pre-deploy quality gate), and at verify (post-deploy drift detection). Severity decides whether a violation blocks the deploy or just warns.

Time: ~10 minutes for the basic shape, longer if you're fitting rules to existing production data.

Where rules live

Rules live at exposes[].contract.dq.rules:

exposes:
  - exposeId: bitcoin_prices
    contract:
      schema:
        - name: price_usd
          type: NUMERIC
          required: true
      dq:
        rules:
          - id: price_not_null
            type: completeness
            selector: price_usd
            threshold: 1.0
            operator: ">="
            severity: error

Each rule has id (unique, used in error messages), type (one of 8 allowed types), selector (which column/table), threshold + operator (the gate), and severity (info / warn / error / critical).

Step 1 — pick a rule type

The 8 supported types in v0.7.2:

| Type | What it checks | Typical use |
| --- | --- | --- |
| completeness | Non-null ratio of a column | Required IDs, mandatory metrics |
| uniqueness | No duplicates within a column or column-set | Primary keys, business keys |
| freshness | Time since last successful update | SLA-bound products |
| valid_values | All values in column appear in an allowed set | ISO codes, status enums |
| accuracy | Column compared against a reference | Daily totals matching upstream system |
| schema | No silent schema changes (added/removed/retyped columns) | Stability gate |
| anomaly_detection | Statistical outliers in a column | Revenue spikes, click anomalies |
| drift_detection | Distribution shift vs a baseline window | Model input drift, customer behaviour |

Most production contracts use 3-5 rules. A typical minimum is schema, completeness on key fields, and freshness on the SLA window.

Step 2 — add a completeness rule

The simplest rule. "This column must not be null."

dq:
  rules:
    - id: customer_id_required
      type: completeness
      selector: customer_id
      threshold: 1.0                # 100% of rows
      operator: ">="
      severity: error               # blocks deploy if violated

For columns that are required for mature rows but optional for young ones (e.g., 30-day rolling metrics), the schema doesn't carry a where: clause on the rule itself — handle the lifecycle in the SQL build, then check completeness on the populated column. See Concepts → Quality, SLAs & Lineage → Common rule patterns for the full pattern, including the production code that fixed the 3am incident in the day2-ops demo.

The shorter version: emit NULL from the SQL when the row isn't ready, set the rule's threshold below 1.0, and the rule passes for partial-window data without a fake where: field.
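The effect of lowering the threshold can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not Forge's evaluator: the function names, the operator table (only ">=" appears in this guide), and the sample column are all assumptions made for the example.

```python
import operator
from typing import Callable, Optional, Sequence

# Comparison operators a rule might declare. Only ">=" appears in this guide;
# the others are assumptions added for illustration.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Share of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def passes(ratio: float, threshold: float, op: str = ">=") -> bool:
    """Apply the rule's operator to the measured ratio and the threshold."""
    return OPERATORS[op](ratio, threshold)


# Hypothetical column: 10 rows, the 2 youngest emit NULL for a 30-day metric.
rolling_metric = [4.2, 3.9, 5.1, 4.8, 4.4, 4.0, 3.7, 4.9, None, None]
ratio = completeness_ratio(rolling_metric)

print(ratio)                 # 0.8
print(passes(ratio, 1.0))    # False: a 100% gate rejects partial-window data
print(passes(ratio, 0.75))   # True: a relaxed gate tolerates the young rows
```

The threshold you pick should reflect the share of rows you expect to be inside the warm-up window; too low a value would also hide genuine gaps.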

Step 3 — add a freshness rule

dq:
  rules:
    - id: hourly_freshness
      type: freshness
      window: PT1H                  # ISO-8601 duration: max 1h stale
      severity: warn

Freshness is evaluated against the deployed table's last write timestamp. The schema doesn't carry a grace: field — for a two-tier severity (warn at 1h, critical at 1h15m), declare two rules:

dq:
  rules:
    - id: freshness_warn_1h
      type: freshness
      window: PT1H
      severity: warn

    - id: freshness_critical_75min
      type: freshness
      window: PT75M
      severity: critical

Wire scheduled fluid verify runs (every 15 minutes via your CI / orchestrator) so both rules evaluate against the actual deployed-table state.
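To see how the two windows interact, here is a small Python sketch of the comparison. It assumes freshness means "age of the last write exceeds the window" and parses only the time-only subset of ISO 8601 durations used above; parse_window and breached_rules are hypothetical helper names, not part of the fluid CLI.

```python
import re
from datetime import datetime, timedelta, timezone

# Time-only subset of ISO 8601 durations: hours, minutes, seconds.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_window(text: str) -> timedelta:
    """Parse durations such as PT1H, PT75M or PT1H15M into a timedelta."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def breached_rules(rules, last_write: datetime, now: datetime) -> list[str]:
    """Return the ids of freshness rules whose window has been exceeded."""
    age = now - last_write
    return [r["id"] for r in rules if age > parse_window(r["window"])]


# The two rules declared above, as plain dicts.
RULES = [
    {"id": "freshness_warn_1h", "window": "PT1H", "severity": "warn"},
    {"id": "freshness_critical_75min", "window": "PT75M", "severity": "critical"},
]

now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(breached_rules(RULES, now - timedelta(minutes=50), now))  # []
print(breached_rules(RULES, now - timedelta(minutes=64), now))  # ['freshness_warn_1h']
print(breached_rules(RULES, now - timedelta(minutes=80), now))  # both rule ids
```

With a 15-minute verify schedule, a table that stops updating would be expected to trip the warn rule on one run and the critical rule on the next or the one after.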

Step 4 — add a schema-stability rule

dq:
  rules:
    - id: schema_stability
      type: schema
      severity: critical

This rule fails the deploy if a column was added, removed, or retyped without an explicit exposes[].version bump.
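Conceptually, the check is a diff between the declared schema and the deployed one. The Python sketch below shows that diff under simple assumptions: schemas are plain column-to-type mappings, and a boolean stands in for "the version was bumped". The function names are hypothetical and this is not how Forge necessarily implements the rule.

```python
def schema_changes(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify column differences between two {column: type} mappings."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "retyped": sorted(c for c in shared if baseline[c] != current[c]),
    }


def stability_ok(changes: dict[str, list[str]], version_bumped: bool) -> bool:
    """Changes are acceptable only when accompanied by an explicit version bump."""
    return version_bumped or not any(changes.values())


# Hypothetical schemas for illustration.
declared = {"price_usd": "NUMERIC", "symbol": "STRING"}
deployed = {"price_usd": "FLOAT64", "symbol": "STRING", "volume": "NUMERIC"}

changes = schema_changes(declared, deployed)
print(changes)  # {'added': ['volume'], 'removed': [], 'retyped': ['price_usd']}
print(stability_ok(changes, version_bumped=False))  # False
print(stability_ok(changes, version_bumped=True))   # True
```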

Step 5 — add valid_values for enums

dq:
  rules:
    - id: country_valid_iso
      type: valid_values
      selector: country
      threshold: 1.0
      operator: ">="
      severity: error
      description: "country must be in ISO 3166 alpha-2 (US, CA, GB, ...)"

For richer enum enforcement, filter in the SQL build's WHERE clause and route non-conforming rows to a quarantine table. The contract's dq.rules entry then verifies that valid_values holds against the cleaned product.
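The quarantine-then-verify pattern can be illustrated in Python. This sketch assumes nulls are left out of the valid_values ratio (the Step 7 sample output reports null rows separately, which suggests this) and uses a deliberately tiny allowed set; the helper names are hypothetical.

```python
from typing import Optional, Sequence

# Illustrative subset only; a real allowed set would hold every ISO 3166 alpha-2 code.
ALLOWED_COUNTRIES = frozenset({"US", "CA", "GB", "DE"})


def valid_values_counts(values: Sequence[Optional[str]], allowed: frozenset) -> tuple[int, int]:
    """Return (valid, non_null) counts; nulls are excluded from both."""
    non_null = [v for v in values if v is not None]
    valid = sum(1 for v in non_null if v in allowed)
    return valid, len(non_null)


def split_quarantine(values: Sequence[Optional[str]], allowed: frozenset):
    """Separate conforming (or null) values from values to quarantine."""
    clean = [v for v in values if v is None or v in allowed]
    quarantined = [v for v in values if v is not None and v not in allowed]
    return clean, quarantined


countries = ["US", "CA", None, "GB", "UK", "DE"]  # "UK" is not in the allowed set

print(valid_values_counts(countries, ALLOWED_COUNTRIES))  # (4, 5): raw data fails a 1.0 gate
clean, quarantined = split_quarantine(countries, ALLOWED_COUNTRIES)
print(quarantined)                                         # ['UK']
print(valid_values_counts(clean, ALLOWED_COUNTRIES))       # (4, 4): cleaned data passes
```

In a real build the split would happen in SQL; the point here is only that the rule is evaluated after the filter, so a threshold of 1.0 stays achievable.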

Step 6 — validate that the rules are well-formed

fluid validate contract.fluid.yaml --strict
# ✓ Schema 0.7.2 — passed
# ✓ dq.rules — 4 rules, all reference real schema fields
# ✓ Severity enum values valid
# ✓ Contract validation passed (strict)

validate --strict catches malformed rules (typos in selector, unsupported operator, conflicting thresholds) before they reach a real deploy.
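As a rough mental model of what such a well-formedness pass looks for, consider the Python sketch below. It is not the validator's actual code: the type and severity sets are taken from this page, the operator set is an assumption, and lint_rules is a hypothetical name.

```python
ALLOWED_TYPES = {
    "completeness", "uniqueness", "freshness", "valid_values",
    "accuracy", "schema", "anomaly_detection", "drift_detection",
}
ALLOWED_SEVERITIES = {"info", "warn", "error", "critical"}
ALLOWED_OPERATORS = {">=", ">", "<=", "<", "=="}  # assumption: only ">=" is shown in this guide


def lint_rules(schema_fields: set[str], rules: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the rules look well-formed."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for rule in rules:
        rule_id = rule.get("id", "<missing id>")
        if rule_id in seen_ids:
            problems.append(f"{rule_id}: duplicate id")
        seen_ids.add(rule_id)
        if rule.get("type") not in ALLOWED_TYPES:
            problems.append(f"{rule_id}: unknown type {rule.get('type')!r}")
        if rule.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"{rule_id}: unknown severity {rule.get('severity')!r}")
        # Freshness and schema rules in this guide carry no selector, so check only if present.
        if "selector" in rule and rule["selector"] not in schema_fields:
            problems.append(f"{rule_id}: selector {rule['selector']!r} is not a schema field")
        if "operator" in rule and rule["operator"] not in ALLOWED_OPERATORS:
            problems.append(f"{rule_id}: unsupported operator {rule['operator']!r}")
    return problems


rules = [
    {"id": "price_not_null", "type": "completeness", "selector": "price_usdd",  # typo
     "threshold": 1.0, "operator": ">=", "severity": "error"},
    {"id": "schema_stability", "type": "schema", "severity": "fatal"},           # bad enum
]
for problem in lint_rules({"price_usd"}, rules):
    print(problem)
```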

Step 7 — test against actual data

fluid test runs the rules against the current state of the deployed product (or a sample if you pass --sample):

fluid test contract.fluid.yaml --sample
# ⏳ Loading 10,000-row sample from runtime/out/bitcoin_prices.parquet...
# ✓ price_not_null: 10,000 / 10,000 (100.0%) — pass
# ✓ schema_stability: no changes detected — pass
# ⚠ hourly_freshness: 1h 4m since last update — warn
# ✓ country_valid_iso: 9,847 / 9,847 (100.0%) — pass (153 rows null)

test is the pre-deploy gate. Severity controls its exit code: error and critical violations exit non-zero, which blocks CI; warn and info violations are logged and exit zero.

Step 8 — wire verify for runtime drift detection

fluid verify contract.fluid.yaml --strict

verify runs against the deployed state (not a sample). It's the post-deploy gate: confirm that the live table actually has the schema, freshness, and quality the contract promised.

For continuous monitoring, declare your SLA targets on the expose's qos block:

exposes:
  - exposeId: customer_360_table
    qos:
      availability: 99.5
      freshnessSLO: PT1H              # ISO 8601 duration
      completenessTarget: 0.99
      latencyP95: PT500MS
      errorBudget: 0.01

Then schedule fluid verify via your CI / orchestrator (the contract declares the target; scheduling lives in the runtime layer):

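# .github/workflows/verify-fast.yml
on:
  schedule:
    - cron: "*/15 * * * *"            # every 15 min (GitHub evaluates cron in UTC)
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4     # the runner needs contract.fluid.yaml
      # Install the fluid CLI and provide prod credentials here (steps omitted).
      - run: fluid verify contract.fluid.yaml --strict --env prod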

Wire alerting to whatever your CI / orchestrator emits on a non-zero exit (PagerDuty webhook, Slack notification, etc.) — fluid verify exits non-zero on breach.
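If your scheduler has no built-in failure notifications, a thin wrapper around the command is one way to get them. The Python sketch below relies only on the non-zero exit code described above; run_and_alert and the notify callback are hypothetical, and the demo uses a stand-in command so it runs without the fluid CLI installed.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_alert(cmd: Sequence[str], notify: Callable[[str], None]) -> int:
    """Run a command; on a non-zero exit, pass a short summary to notify()."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep only the last few output lines so the alert stays readable.
        tail = (result.stdout + result.stderr).strip().splitlines()[-5:]
        notify(f"{' '.join(cmd)} exited {result.returncode}\n" + "\n".join(tail))
    return result.returncode


# Stand-in for: ["fluid", "verify", "contract.fluid.yaml", "--strict", "--env", "prod"]
fake_verify = [sys.executable, "-c", "import sys; print('freshness breach'); sys.exit(2)"]

alerts: list[str] = []
code = run_and_alert(fake_verify, alerts.append)  # swap alerts.append for a webhook call
print(code)         # 2
print(len(alerts))  # 1
```

Returning the original exit code lets the wrapper sit inside a CI step without hiding the failure from the pipeline itself.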

Severity → CI behaviour

| Severity | validate | test | verify --strict |
| --- | --- | --- | --- |
| info | recorded | exit 0 | exit 0 |
| warn | recorded | exit 0 | exit 0 (warning only) |
| error | exit 0 (it's about runtime) | exit non-zero (blocks CI) | exit non-zero |
| critical | exit 0 | exit non-zero + emit incident | exit non-zero + emit incident |
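The table reduces to a small decision function. The Python sketch below encodes it for data-rule violations only; the concrete exit code 1 is an assumption (the page says only "non-zero"), and the function names are hypothetical.

```python
BLOCKING = {"error", "critical"}


def exit_code(stage: str, violated: list[str]) -> int:
    """Exit code for a stage, given the severities of the rules that were violated."""
    if stage == "validate":
        return 0  # per the table, rule violations never fail validate
    return 1 if BLOCKING & set(violated) else 0  # 1 stands in for "non-zero"


def emits_incident(stage: str, violated: list[str]) -> bool:
    """Per the table, only critical violations at test or verify emit an incident."""
    return stage in {"test", "verify"} and "critical" in violated


print(exit_code("test", ["warn", "info"]))        # 0
print(exit_code("test", ["warn", "error"]))       # 1
print(exit_code("validate", ["critical"]))        # 0
print(emits_incident("verify", ["critical"]))     # True
```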

What you DIDN'T have to do

  • Hand-roll dbt tests (tests: not_null) for each column; dq.rules is declared per product, not per warehouse syntax
  • Wire a separate Great Expectations / Soda Core layer
  • Maintain a separate "data quality monitoring" repo
  • Translate rules between cloud-specific systems (BigQuery's column-level constraints, Snowflake's quality rules) — Forge translates them for you at policy-apply

See also

  • Quality, SLAs & Lineage — full conceptual treatment
  • Recipe: Add a quality rule — the 1-page copy-paste version
  • fluid test — the pre-deploy gate command
  • fluid verify — runtime drift detection
Last Updated: 5/17/26, 6:51 PM
Contributors: fas89, Claude Opus 4.7 (1M context)