Task: Add AI / agent access governance to a data product

Your data product is being read by AI agents — for analysis, for summarization, sometimes for training that you didn't authorize. agentPolicy makes the access boundaries declarative, validated at deploy, and enforced at read-time.

Time: ~10 minutes for the basic shape, longer if you're integrating with an existing MCP server or side-car interceptor.

What you're going to add

A top-level agentPolicy block to your contract:

agentPolicy:
  allowedModels: ["gpt-4", "claude-3-opus", "gemini-2.5-flash"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  canStore: false
  auditRequired: true

What this declaration does:

Allow reads from gpt-4, claude-3-opus, or gemini-2.5-flash for analysis, summarization, or qa
Deny any read tagged as training / fine_tuning — even from an allowed model
Cap tokens per request at 4,000 (prevents excessive data exfiltration in one call)
Forbid storage / caching (canStore: false = ephemeral reads only)
Log every read (auditRequired: true)

Step 1 — add the block

Open contract.fluid.yaml. Add agentPolicy at the top level (sibling to accessPolicy, not nested):

fluidVersion: "0.7.2"
kind: DataProduct
id: gold.finance.customer_360_v1
# ...
metadata:
  # ...
exposes:
  # ...

accessPolicy:                          # human/service grants
  grants:
    - principal: "group:analysts@company.com"
      permissions: ["read"]

agentPolicy:                           # AI/LLM grants — separate
  allowedModels: ["gpt-4", "claude-3-opus", "gemini-2.5-flash"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  canStore: false
  auditRequired: true
  purposeLimitation: "Customer-support analytics only. No marketing use."

Step 2 — validate the policy shape

fluid validate contract.fluid.yaml --strict
# ✓ Schema 0.7.2 — passed
# ✓ agentPolicy.allowedModels — 3 enum values recognized
# ✓ agentPolicy.deniedUseCases — 2 values, no contradictions
# ✓ agentPolicy.maxTokensPerRequest — within int range
# ✓ Contract validation passed (strict)

validate --strict catches contradictions (e.g., a model in both allowedModels and deniedModels), unknown enum values, and missing auditRequired on regulated products.

Step 3 — preview enforcement

fluid policy-check contract.fluid.yaml --category sensitivity

This runs the schema-driven policy engine. The enforcement summary shows who/what is allowed, what's denied, what's audited:

🛡  agentPolicy enforcement summary
─────────────────────────────────────────────────────
Models     3 allowed, all others denied
Use cases  3 allowed, 2 explicitly denied
Storage    no caching — every read is fresh
Audit      every read logged (auditRequired=true)
Limits     maxTokensPerRequest=4000
─────────────────────────────────────────────────────
✓ All 11 schema fields covered by agentPolicy gates
✓ PII-tagged columns (email, phone, ssn) auto-masked at read
✓ agentPolicy ready to enforce

Run this in CI on every contract change. It's the equivalent of fluid validate for the AI-access surface specifically.

Step 4 — compile, then apply the policy

policy-apply does not read the contract directly — it deploys a compiled bindings file. Compile first, then apply:

# Compile the contract (with the prod overlay) into provider-specific bindings
fluid policy compile contract.fluid.yaml --env prod --out runtime/policy/bindings.json

# Apply the compiled bindings — --mode enforce actually deploys the IAM changes
fluid policy apply runtime/policy/bindings.json --mode enforce

policy compile is a pure function (contract in, JSON out — no cloud calls). policy apply defaults to --mode check (dry-run); pass --mode enforce to deploy.

This emits the cloud-specific enforcement primitives and applies them. What gets emitted depends on the platform:

Platform	What `policy-apply` emits
GCP / BigQuery	Row-level security policy on the dataset, keyed on `agent_id` and `model_id` extracted from the agent's JWT
Snowflake	Masking policy on the table that calls a Snowflake function checking `agent_id` against the contract's `allowedModels`
AWS / Athena	Lake Formation cell-level filters keyed on the same identity claims
Local (DuckDB)	No-op (single-user, no IAM model) — but `policy-check` still validates the rules for correctness

The enforcement is at the platform layer. Even if your application bypasses the MCP server, the platform's row-level filter still applies.

Step 5 — pick an enforcement mode

You have three options for how agents actually hit the gate. Pick one:

Option A — Forge MCP server (recommended for new agents)

fluid mcp serve

Exposes the data product as an MCP resource. Every MCP read passes through the agentPolicy gate. Audit records ship to the platform's native audit log automatically.

This is the cleanest mode. Use it whenever your agent infrastructure can speak MCP.

Option B — Side-car interceptor (for existing agents)

If your agents read directly via SQL/HTTP (not MCP), the side-car pattern intercepts at the platform layer. The bindings compiled by policy compile and deployed by policy apply (the BigQuery RLS rule, the Snowflake masking policy, etc.) already enforce the policy. No further setup needed beyond passing the agent identity in the connection string.

Example (BigQuery):

-- The agent's connection identifies as: user@analytics-svc.iam (a service account)
-- with custom JWT claims: agent_id="bi-dashboard", model="gpt-4", use_case="analysis"
SELECT * FROM gold.finance.customer_360_v1
WHERE event_date >= '2026-01-01';
-- → BigQuery checks: agent in allowedModels? ✓
-- →                  use_case in allowedUseCases? ✓
-- →                  rows returned with audit log entry written

Option C — Application-level (last resort)

For agents that read directly via SQL/HTTP and can't migrate to MCP or use platform-level enforcement, the application owns the gate. Load the contract via the FLUID Python SDK and inspect contract.agentPolicy in your own code path:

from fluid_build.contract import load_contract

contract = load_contract("contract.fluid.yaml")
policy = contract.agentPolicy

if "training" in policy.deniedUseCases and use_case == "training":
    raise PermissionError("agentPolicy.deniedUseCases includes 'training'")

if model not in policy.allowedModels:
    raise PermissionError(f"model {model!r} not in agentPolicy.allowedModels")

# ... proceed with the read

The application is the trust boundary in this mode (the weakest gate). Use it only when neither MCP nor platform-level enforcement is feasible.

Step 6 — replay agent reads from audit log

Once auditRequired: true is in effect, every read produces a record in the platform's native audit channel:

Platform	Where audit records land
GCP / BigQuery	BigQuery audit log (`cloudaudit.googleapis.com/data_access`) — query via Cloud Logging or export to a BigQuery sink
Snowflake	`SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY` view — query directly with SQL
AWS / Athena	CloudTrail data event records — query via CloudTrail Lake or Athena over the trail S3 export

Example query against Snowflake's ACCESS_HISTORY to find all agent reads of this product in the last 24h:

SELECT
  query_start_time,
  user_name,
  query_text,
  base_objects_accessed
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE query_start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
  AND ARRAY_CONTAINS(
    'PROD.GOLD.CUSTOMER_360_V1'::variant,
    ARRAY_AGG(base_objects_accessed:objectName::string)
  )
ORDER BY query_start_time DESC;

The MCP server (Option A) tags each read with the agent identity, model, and use-case in the query_text so you can filter further. The platform's native audit format is the authoritative record — Forge does not duplicate it.

Common patterns

"No training, ever" (most regulated data)

agentPolicy:
  deniedUseCases: ["training", "fine_tuning", "embedding"]
  canStore: false
  auditRequired: true
  purposeLimitation: "Read-only inference for analysis. Data may not leave the runtime context."

"Internal vetted models only" (default for production)

agentPolicy:
  allowedModels: ["gpt-4", "claude-3-opus"]
  allowedUseCases: ["analysis", "summarization", "qa"]
  deniedUseCases: ["training", "fine_tuning"]
  maxTokensPerRequest: 4000
  maxTokensPerDay: 1000000
  canStore: false
  auditRequired: true

"Open to any agent for QA" (low-sensitivity)

agentPolicy:
  allowedUseCases: ["qa"]            # any model, but only QA
  deniedUseCases: ["training"]
  maxTokensPerDay: 100000
  canStore: false
  auditRequired: false               # public-grade data; no audit overhead

What you DIDN'T have to do

Build a custom proxy / gateway between your agents and your data
Maintain a separate "AI access list" repo
Translate the policy across cloud-specific RLS/masking systems (Forge does this)
Wire audit logging into a separate observability platform