Fluid Forge

Task: Add AI / agent access governance to a data product

Your data product is being read by AI agents: for analysis, for summarization, and sometimes for training that you didn't authorize. agentPolicy makes the access boundaries declarative, validated at deploy time, and enforced at read time.

Time: ~10 minutes for the basic shape, longer if you're integrating with an existing MCP server or side-car interceptor.

What you're going to add

A top-level agentPolicy block to your contract:

agentPolicy:
  allowedModels: ["gpt-4", "claude-3-opus", "gemini-2.5-flash"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  canStore: false
  auditRequired: true

What this declaration does:

  • Allow reads from gpt-4, claude-3-opus, or gemini-2.5-flash for analysis, summarization, or qa
  • Deny any read tagged as training / fine_tuning — even from an allowed model
  • Cap tokens per request at 4,000 (limits how much data a single call can exfiltrate)
  • Forbid storage / caching (canStore: false = ephemeral reads only)
  • Log every read (auditRequired: true)
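The rules above can be read as one decision function. The sketch below is a hypothetical model of those semantics, not Forge's enforcement code. The names `AgentRead` and `evaluate` are invented for illustration. The sketch assumes the policy is a plain dict keyed by the field names above, that an omitted allow-list means "no restriction", and that canStore defaults to false when absent. auditRequired does not change the decision; it only controls logging.

```python
from dataclasses import dataclass


@dataclass
class AgentRead:
    """One agent read request (hypothetical shape, for illustration)."""
    model: str
    use_case: str
    tokens: int
    wants_to_store: bool = False


def evaluate(policy: dict, read: AgentRead) -> tuple[bool, str]:
    """Return (allowed, reason) for one agent read under an agentPolicy dict."""
    # Explicit denials win, even when the model is allowed.
    if read.use_case in policy.get("deniedUseCases", []):
        return False, f"use case {read.use_case!r} is denied"
    # Assumption: an omitted allow-list means "no restriction".
    allowed_models = policy.get("allowedModels")
    if allowed_models is not None and read.model not in allowed_models:
        return False, f"model {read.model!r} is not allowed"
    allowed_uses = policy.get("allowedUseCases")
    if allowed_uses is not None and read.use_case not in allowed_uses:
        return False, f"use case {read.use_case!r} is not allowed"
    cap = policy.get("maxTokensPerRequest")
    if cap is not None and read.tokens > cap:
        return False, f"{read.tokens} tokens exceeds the cap of {cap}"
    # Assumption: canStore defaults to False (ephemeral reads only).
    if read.wants_to_store and not policy.get("canStore", False):
        return False, "storage/caching is not permitted"
    return True, "ok"


policy = {
    "allowedModels": ["gpt-4", "claude-3-opus", "gemini-2.5-flash"],
    "allowedUseCases": ["analysis", "summarization", "qa"],
    "deniedUseCases": ["training", "fine_tuning"],
    "maxTokensPerRequest": 4000,
    "canStore": False,
    "auditRequired": True,
}

print(evaluate(policy, AgentRead("gpt-4", "analysis", 1200)))  # (True, 'ok')
print(evaluate(policy, AgentRead("gpt-4", "training", 1200)))  # denied: use case
print(evaluate(policy, AgentRead("llama-3", "qa", 100)))       # denied: model
```

Checking deniedUseCases first is what makes "even from an allowed model" hold: no allow-list can override an explicit denial.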

Step 1 — add the block

Open contract.fluid.yaml. Add agentPolicy at the top level (sibling to accessPolicy, not nested):

fluidVersion: "0.7.2"
kind: DataProduct
id: gold.finance.customer_360_v1
# ...
metadata:
  # ...
exposes:
  # ...

accessPolicy:                          # human/service grants
  grants:
    - principal: "group:analysts@company.com"
      permissions: ["read"]

agentPolicy:                           # AI/LLM grants — separate
  allowedModels: ["gpt-4", "claude-3-opus", "gemini-2.5-flash"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  canStore: false
  auditRequired: true
  purposeLimitation: "Customer-support analytics only. No marketing use."

Step 2 — validate the policy shape

fluid validate contract.fluid.yaml --strict
# ✓ Schema 0.7.2 — passed
# ✓ agentPolicy.allowedModels — 3 enum values recognized
# ✓ agentPolicy.deniedUseCases — 2 values, no contradictions
# ✓ agentPolicy.maxTokensPerRequest — within int range
# ✓ Contract validation passed (strict)

validate --strict catches contradictions (e.g., a model in both allowedModels and deniedModels), unknown enum values, and missing auditRequired on regulated products.
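Here is a rough illustration of the checks just listed, written as a hypothetical lint over a contract that has already been parsed into a dict. This is not the fluid validate implementation, and `lint_agent_policy` is an invented name. This page does not say how a product is marked as regulated, so the sketch takes that as a parameter. The placement check mirrors the "sibling, not nested" rule from Step 1.

```python
def lint_agent_policy(contract: dict, regulated: bool = False) -> list[str]:
    """Return human-readable problems found in a contract's agentPolicy."""
    problems: list[str] = []
    # agentPolicy must be a top-level sibling of accessPolicy, not nested in it.
    if "agentPolicy" in (contract.get("accessPolicy") or {}):
        problems.append("agentPolicy is nested under accessPolicy; move it to the top level")
    policy = contract.get("agentPolicy") or {}
    # A value that is both allowed and denied is a contradiction.
    pairs = [("allowedModels", "deniedModels"), ("allowedUseCases", "deniedUseCases")]
    for allow_key, deny_key in pairs:
        overlap = set(policy.get(allow_key, [])) & set(policy.get(deny_key, []))
        for value in sorted(overlap):
            problems.append(f"{value!r} appears in both {allow_key} and {deny_key}")
    # 'regulated' is supplied by the caller; how Forge derives it is not shown here.
    if regulated and "auditRequired" not in policy:
        problems.append("auditRequired must be set on a regulated product")
    return problems


contract = {
    "accessPolicy": {"grants": []},
    "agentPolicy": {
        "allowedUseCases": ["analysis", "training"],
        "deniedUseCases": ["training", "fine_tuning"],
    },
}
for problem in lint_agent_policy(contract, regulated=True):
    print(problem)
```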

Step 3 — preview enforcement

fluid policy check contract.fluid.yaml --category sensitivity

This runs the schema-driven policy engine. Its enforcement summary shows which models and use cases are allowed, what is denied, and what is audited:

🛡  agentPolicy enforcement summary
─────────────────────────────────────────────────────
Models     3 allowed, all others denied
Use cases  3 allowed, 2 explicitly denied
Storage    no caching — every read is fresh
Audit      every read logged (auditRequired=true)
Limits     maxTokensPerRequest=4000
─────────────────────────────────────────────────────
✓ All 11 schema fields covered by agentPolicy gates
✓ PII-tagged columns (email, phone, ssn) auto-masked at read
✓ agentPolicy ready to enforce

Run this in CI on every contract change. It's the equivalent of fluid validate for the AI-access surface specifically.
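Running policy check in CI catches invalid policies. Reviewers may also want to know when a change makes access looser. The following is a hypothetical extra check that is not part of the fluid CLI, and `loosened` is an invented name. It compares the agentPolicy before and after a change and again assumes that an omitted allow-list means "no restriction".

```python
def loosened(old: dict, new: dict) -> list[str]:
    """List the ways the `new` agentPolicy grants more access than `old`."""
    findings: list[str] = []
    for key in ("allowedModels", "allowedUseCases"):
        if key in old and key not in new:
            # Assumption: dropping an allow-list entirely means "anything goes".
            findings.append(f"{key} removed, so any value is now allowed")
        elif key in old:
            added = set(new[key]) - set(old[key])
            if added:
                findings.append(f"{key} gained {sorted(added)}")
    dropped = set(old.get("deniedUseCases", [])) - set(new.get("deniedUseCases", []))
    if dropped:
        findings.append(f"deniedUseCases lost {sorted(dropped)}")
    old_cap, new_cap = old.get("maxTokensPerRequest"), new.get("maxTokensPerRequest")
    if old_cap is not None and (new_cap is None or new_cap > old_cap):
        findings.append(f"maxTokensPerRequest raised from {old_cap} to {new_cap}")
    if new.get("canStore") and not old.get("canStore"):
        findings.append("canStore switched on")
    if old.get("auditRequired") and not new.get("auditRequired"):
        findings.append("auditRequired switched off")
    return findings


before = {"allowedModels": ["gpt-4"], "deniedUseCases": ["training", "fine_tuning"],
          "maxTokensPerRequest": 4000, "auditRequired": True}
after = {"allowedModels": ["gpt-4", "gemini-2.5-flash"], "deniedUseCases": ["training"],
         "maxTokensPerRequest": 8000, "auditRequired": True}
for finding in loosened(before, after):
    print("LOOSENED:", finding)
```

In a CI job you would load both versions of the contract (for example, from the base branch and the PR head). When the list is non-empty, you could fail the build or require an extra approval.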

Step 4 — compile, then apply the policy

fluid policy apply does not read the contract directly; it deploys a compiled bindings file. Compile first, then apply:

# Compile the contract (with the prod overlay) into provider-specific bindings
fluid policy compile contract.fluid.yaml --env prod --out runtime/policy/bindings.json

# Apply the compiled bindings — --mode enforce actually deploys the IAM changes
fluid policy apply runtime/policy/bindings.json --mode enforce
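The compile step turns the contract into a bindings file before anything touches the cloud, so it can be reasoned about as ordinary deterministic code. The sketch below is a hypothetical stand-in for that step. `compile_bindings` is an invented name. The bindings layout (product, env, agent) is also invented for illustration and is not the real bindings.json schema. Overlay handling for --env is reduced to recording the environment name.

```python
import json
from typing import Optional


def _sorted_or_none(values: Optional[list]) -> Optional[list]:
    # Keep "omitted" distinct from "empty" so an absent allow-list stays absent.
    return sorted(values) if values is not None else None


def compile_bindings(contract: dict, env: str) -> str:
    """Pure function: contract dict in, bindings JSON out. No I/O, no cloud calls."""
    policy = contract.get("agentPolicy", {})
    bindings = {
        "product": contract["id"],
        "env": env,  # hypothetical: a real overlay would also merge env-specific values
        "agent": {
            # Sorting makes the output independent of list order in the YAML.
            "allowedModels": _sorted_or_none(policy.get("allowedModels")),
            "allowedUseCases": _sorted_or_none(policy.get("allowedUseCases")),
            "deniedUseCases": sorted(policy.get("deniedUseCases", [])),
            "maxTokensPerRequest": policy.get("maxTokensPerRequest"),
            "canStore": policy.get("canStore", False),
            "auditRequired": policy.get("auditRequired", False),
        },
    }
    # sort_keys plus fixed indentation gives byte-identical output for equal input.
    return json.dumps(bindings, indent=2, sort_keys=True)


contract = {
    "id": "gold.finance.customer_360_v1",
    "agentPolicy": {
        "allowedModels": ["gpt-4", "claude-3-opus", "gemini-2.5-flash"],
        "deniedUseCases": ["training", "fine_tuning"],
        "maxTokensPerRequest": 4000,
    },
}
print(compile_bindings(contract, "prod"))
```

Determinism is what makes a compiled file reviewable. The same contract always yields the same JSON, so diffing two compiled files shows only real policy changes.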

policy compile is a pure function (contract in, JSON out — no cloud calls). policy apply defaults to --mode check (dry-run); pass --mode enforce to deploy.

policy compile emits the cloud-specific enforcement primitives into the bindings file, and policy apply deploys them. What gets emitted depends on the platform:

| Platform | What policy apply emits |
|---|---|
| GCP / BigQuery | Row-level access policies on the product's tables, keyed on agent_id and model_id extracted from the agent's JWT |
| Snowflake | Masking policy on the table that calls a Snowflake function checking agent_id against the contract's allowedModels |
| AWS / Athena | Lake Formation cell-level filters keyed on the same identity claims |
| Local (DuckDB) | No-op (single-user, no IAM model), but policy check still validates the rules for correctness |

Enforcement happens at the platform layer. Even if an application bypasses the MCP server, the platform-level policy (row filter, masking policy, or cell filter) still applies.
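To make the Snowflake row of the table concrete, here is a hypothetical sketch that renders allowedModels into masking-policy DDL. CREATE MASKING POLICY and ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY are standard Snowflake SQL. However, CURRENT_AGENT_MODEL() is an invented placeholder for whatever function resolves the calling agent's model. It is not a Snowflake built-in, and this is not the SQL Forge actually emits. The helper names `masking_policy_ddl` and `attach_ddl` are also invented.

```python
def masking_policy_ddl(policy_name: str, allowed_models: list[str]) -> str:
    """Render a Snowflake masking policy that reveals values only to allowed models."""
    if not allowed_models:
        raise ValueError("allowed_models must not be empty")
    # Double any single quotes so each model name is a safe SQL string literal.
    literals = ", ".join("'" + m.replace("'", "''") + "'" for m in allowed_models)
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy_name} AS (val STRING) RETURNS STRING ->\n"
        # CURRENT_AGENT_MODEL() is a hypothetical UDF, not a Snowflake built-in.
        f"  CASE WHEN CURRENT_AGENT_MODEL() IN ({literals}) THEN val\n"
        "       ELSE '***MASKED***' END;"
    )


def attach_ddl(table: str, column: str, policy_name: str) -> str:
    """Attach the masking policy to one STRING column of the product's table."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy_name};"


print(masking_policy_ddl("agent_model_mask", ["gpt-4", "claude-3-opus", "gemini-2.5-flash"]))
print(attach_ddl("PROD.GOLD.CUSTOMER_360_V1", "email", "agent_model_mask"))
```

Because the policy is declared as (val STRING) RETURNS STRING, it can only be attached to STRING columns. Other column types would need a policy with a matching signature.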

Step 5 — pick an enforcement mode

There are three ways to route agent reads through the gate. Pick one:

Option A — Forge MCP server (recommended for new agents)

fluid mcp serve

Exposes the data product as an MCP resource. Every MCP read passes through the agentPolicy gate. Audit records ship to the platform's native audit log automatically.

This is the cleanest mode. Use it whenever your agent infrastructure can speak MCP.
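Conceptually, "every read passes through the gate" and "audit records ship automatically" amount to wrapping each read handler. The sketch below illustrates this in plain Python and is not the fluid mcp serve implementation. The names `agent_gate` and `read_customers` and the audit record fields are assumptions. A list stands in for the platform's audit log. Logging denied attempts as well as allowed ones is a design choice of the sketch.

```python
import functools
import json
import time
from typing import Callable


def agent_gate(policy: dict, audit_sink: Callable[[str], None]):
    """Decorator factory: check agentPolicy before a read and emit an audit record."""
    def decorate(read_fn):
        @functools.wraps(read_fn)
        def wrapper(*, agent_id: str, model: str, use_case: str, **kwargs):
            allowed_models = policy.get("allowedModels")
            allowed_uses = policy.get("allowedUseCases")
            # Denials first; an omitted allow-list is treated as "no restriction".
            allowed = (
                use_case not in policy.get("deniedUseCases", [])
                and (allowed_models is None or model in allowed_models)
                and (allowed_uses is None or use_case in allowed_uses)
            )
            if policy.get("auditRequired", False):
                # One JSON line per attempt, allowed or denied.
                audit_sink(json.dumps({
                    "ts": time.time(), "agent_id": agent_id, "model": model,
                    "use_case": use_case, "decision": "allow" if allowed else "deny",
                }))
            if not allowed:
                raise PermissionError(f"agentPolicy denies {model!r} for {use_case!r}")
            return read_fn(**kwargs)
        return wrapper
    return decorate


policy = {"allowedModels": ["gpt-4", "claude-3-opus"], "deniedUseCases": ["training"],
          "auditRequired": True}
audit_log: list[str] = []


@agent_gate(policy, audit_sink=audit_log.append)
def read_customers(limit: int) -> list[dict]:
    # Stand-in for the real product read.
    return [{"customer_id": i} for i in range(limit)]


rows = read_customers(agent_id="bi-dashboard", model="gpt-4", use_case="analysis", limit=2)
print(rows)
print(audit_log[-1])
```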

Option B — Side-car interceptor (for existing agents)

If your agents read directly via SQL/HTTP (not MCP), the side-car pattern intercepts at the platform layer. The bindings compiled by policy compile and deployed by policy apply (the BigQuery row-level rule, the Snowflake masking policy, etc.) already enforce the policy. No further setup is needed beyond passing the agent identity in the connection string.

Example (BigQuery):

-- The agent's connection identifies as: user@analytics-svc.iam (a service account)
-- with custom JWT claims: agent_id="bi-dashboard", model="gpt-4", use_case="analysis"
SELECT * FROM gold.finance.customer_360_v1
WHERE event_date >= '2026-01-01';
-- → BigQuery checks: model in allowedModels? ✓
-- →                  use_case in allowedUseCases? ✓
-- →                  rows returned with audit log entry written
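The checks in the comments above reduce to reading a few claims from the agent's identity token and comparing them with the policy. The sketch below is hypothetical. It uses only the standard library to decode a JWT payload so you can see which claims matter (agent_id, model, and use_case, as in the example above). It does **not** verify the token signature, which any real enforcement point must do, and it is not how BigQuery evaluates the policy internally. The names `jwt_claims` and `claims_allowed` are invented.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (illustration only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs use unpadded base64url
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_allowed(claims: dict, policy: dict) -> bool:
    """Apply the model and use-case checks to the token's claims."""
    model, use_case = claims.get("model"), claims.get("use_case")
    if use_case in policy.get("deniedUseCases", []):
        return False
    allowed_models = policy.get("allowedModels")
    allowed_uses = policy.get("allowedUseCases")
    return ((allowed_models is None or model in allowed_models)
            and (allowed_uses is None or use_case in allowed_uses))


def _b64url(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token carrying the claims from the example above.
token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"agent_id": "bi-dashboard", "model": "gpt-4", "use_case": "analysis"}),
    "",
])
policy = {"allowedModels": ["gpt-4", "claude-3-opus", "gemini-2.5-flash"],
          "allowedUseCases": ["analysis", "summarization", "qa"],
          "deniedUseCases": ["training", "fine_tuning"]}

claims = jwt_claims(token)
print(claims, claims_allowed(claims, policy))
```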

Option C — Application-level (last resort)

For agents that read directly via SQL/HTTP and can't migrate to MCP or use platform-level enforcement, the application owns the gate. Load the contract via the FLUID Python SDK and inspect contract.agentPolicy in your own code path:

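from fluid_build.contract import load_contract

contract = load_contract("contract.fluid.yaml")
policy = contract.agentPolicy

def check_agent_read(model: str, use_case: str) -> None:
    """Raise PermissionError if this read violates the contract's agentPolicy."""
    # Explicit denials win, even when the model is allowed
    if use_case in (policy.deniedUseCases or []):
        raise PermissionError(f"use case {use_case!r} is in agentPolicy.deniedUseCases")
    # Assumes an omitted allow-list comes back empty/None and means "no restriction"
    if policy.allowedUseCases and use_case not in policy.allowedUseCases:
        raise PermissionError(f"use case {use_case!r} not in agentPolicy.allowedUseCases")
    if policy.allowedModels and model not in policy.allowedModels:
        raise PermissionError(f"model {model!r} not in agentPolicy.allowedModels")

# model and use_case come from the calling agent's request
check_agent_read(model="gpt-4", use_case="analysis")
# ... proceed with the read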

The application is the trust boundary in this mode (the weakest gate). Use it only when neither MCP nor platform-level enforcement is feasible.

Step 6 — replay agent reads from audit log

Once auditRequired: true is in effect, every read produces a record in the platform's native audit channel:

| Platform | Where audit records land |
|---|---|
| GCP / BigQuery | BigQuery audit log (cloudaudit.googleapis.com/data_access); query via Cloud Logging or export to a BigQuery sink |
| Snowflake | SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view; query directly with SQL |
| AWS / Athena | CloudTrail data event records; query via CloudTrail Lake or Athena over the trail's S3 export |

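Example query against Snowflake's ACCESS_HISTORY to find all reads of this product in the last 24 hours. ACCESS_HISTORY records which objects a query touched but not its SQL text, so the query flattens base_objects_accessed and joins QUERY_HISTORY on query_id to pick up query_text:

WITH product_reads AS (
  SELECT DISTINCT ah.query_id, ah.query_start_time, ah.user_name
  FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY ah,
       LATERAL FLATTEN(input => ah.base_objects_accessed) obj
  WHERE ah.query_start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
    AND obj.value:"objectName"::string = 'PROD.GOLD.CUSTOMER_360_V1'
)
SELECT r.query_start_time, r.user_name, qh.query_text
FROM product_reads r
JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY qh
  ON qh.query_id = r.query_id
ORDER BY r.query_start_time DESC;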

The MCP server (Option A) tags each read with the agent identity, model, and use-case in the query_text so you can filter further. The platform's native audit format is the authoritative record — Forge does not duplicate it.
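Once rows come back from the query above, the agent tags in query_text can be used to narrow them down. The tag format used here (a leading SQL block comment with key=value pairs) is an assumption for illustration only. Check what your MCP server actually writes before relying on it. The names `agent_tags` and `filter_reads` are invented.

```python
import re

# Hypothetical tag format:
#   "/* agent_id=bi-dashboard model=gpt-4 use_case=analysis */ SELECT ..."
TAG_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)
PAIR = re.compile(r"(\w+)=([\w.\-]+)")


def agent_tags(query_text: str) -> dict[str, str]:
    """Extract key=value pairs from the first SQL block comment, if any."""
    match = TAG_COMMENT.search(query_text)
    return dict(PAIR.findall(match.group(1))) if match else {}


def filter_reads(rows: list[dict], **wanted: str) -> list[dict]:
    """Keep audit rows whose tags match every wanted key=value."""
    return [r for r in rows
            if all(agent_tags(r["query_text"]).get(k) == v for k, v in wanted.items())]


rows = [
    {"user_name": "ANALYTICS_SVC",
     "query_text": "/* agent_id=bi-dashboard model=gpt-4 use_case=analysis */ SELECT 1"},
    {"user_name": "ANALYTICS_SVC",
     "query_text": "/* agent_id=support-bot model=claude-3-opus use_case=qa */ SELECT 2"},
    {"user_name": "JANE", "query_text": "SELECT 3"},  # untagged: not an MCP read
]
for row in filter_reads(rows, model="gpt-4"):
    print(row["user_name"], row["query_text"])
```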

Common patterns

"No training, ever" (most regulated data)

agentPolicy:
  deniedUseCases: ["training", "fine_tuning", "embedding"]
  canStore: false
  auditRequired: true
  purposeLimitation: "Read-only inference for analysis. Data may not leave the runtime context."

"Internal vetted models only" (default for production)

agentPolicy:
  allowedModels: ["gpt-4", "claude-3-opus"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  maxTokensPerDay: 1000000
  canStore: false
  auditRequired: true

"Open to any agent for QA" (low-sensitivity)

agentPolicy:
  allowedUseCases: ["qa"]            # any model, but only QA
  deniedUseCases: ["training"]
  maxTokensPerDay: 100000
  canStore: false
  auditRequired: false               # public-grade data; no audit overhead
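maxTokensPerDay appears in the last two patterns, but this page does not specify how it is counted. Here is a minimal sketch that assumes the budget is counted per agent per UTC calendar day. Forge may scope it differently, for example per product. `DailyTokenBudget` is an invented name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Optional


class DailyTokenBudget:
    """Track tokens per (agent, UTC day) against a maxTokensPerDay limit."""

    def __init__(self, max_tokens_per_day: int):
        self.limit = max_tokens_per_day
        self.used = defaultdict(int)  # (agent_id, "YYYY-MM-DD") -> tokens spent

    def try_spend(self, agent_id: str, tokens: int, now: Optional[datetime] = None) -> bool:
        """Record the spend and return True if it fits today's budget; else return False."""
        now = now or datetime.now(timezone.utc)
        key = (agent_id, now.astimezone(timezone.utc).date().isoformat())
        if self.used[key] + tokens > self.limit:
            return False  # over budget: nothing is recorded
        self.used[key] += tokens
        return True


budget = DailyTokenBudget(max_tokens_per_day=100_000)  # "Open to any agent for QA" limit
noon = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(budget.try_spend("support-bot", 90_000, noon))                      # True
print(budget.try_spend("support-bot", 20_000, noon))                      # False
print(budget.try_spend("support-bot", 20_000, noon + timedelta(days=1)))  # True
```

A rejected request leaves the counter unchanged, so an agent that is told "no" can still make smaller reads that fit the remaining budget.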

What you DIDN'T have to do

  • Build a custom proxy / gateway between your agents and your data
  • Maintain a separate "AI access list" repo
  • Translate the policy across cloud-specific RLS/masking systems (Forge does this)
  • Wire audit logging into a separate observability platform

See also

  • Agent Policy concept — full conceptual treatment + audit event schema
  • Agent policy demo — frame-perfect cast of validate → policy check → audit replay
  • fluid mcp serve — the MCP server
  • fluid policy apply — deploy the compiled bindings that back the side-car interceptors
  • Governance & Policy — accessPolicy for human/service principals (the complementary gate)
Last Updated: 5/17/26, 6:51 PM
Contributors: fas89, Claude Opus 4.7 (1M context)