Builds, Exposes, Bindings
Every contract maps three questions onto three YAML blocks:
| Question | Block | Example |
|---|---|---|
| How is the data produced? | builds[] | Embedded SQL, a dbt project, a Python script, a Spark job. |
| What does the product expose to consumers? | exposes[] | A table, a view, a file, a Kafka topic. |
| Where does it physically land? | binding (inside each expose) | gcp/bigquery_table, aws/s3_file, local/parquet, etc. |
You can have many of each. Every expose must declare exactly one binding.
builds[] — production logic
```yaml
builds:
  - id: bitcoin_price_ingestion
    pattern: embedded-logic   # or hybrid-reference (dbt/python repo)
    engine: sql               # or python, dbt, spark
    properties:
      sql: |
        SELECT CURRENT_TIMESTAMP AS price_timestamp,
               price AS price_usd
        FROM raw_btc_feed
```
Patterns supported in v0.7.3 (verified against fluid-schema-0.7.3.json):
- `embedded-logic`: SQL or code inline in the contract. `language` enum: `sql`, `flink_sql`, `pyspark`, `scala`, `python`, `r`.
- `hybrid-reference`: dbt-style; points at an external repo with a `model:` field and optional `vars:`.
- `multi-stage`: orchestration pattern with a `stages[]` array of named build steps. Schema description: "Multi-stage orchestration pattern" (introduced in v0.5.5).
- `acquisition`: source-aligned ingestion pattern (added in v0.7.3) for landing raw external data.
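For comparison with the embedded SQL build above, here is a minimal sketch of a `hybrid-reference` build. It assumes that `model:` and `vars:` sit under `properties`, the same way `sql:` does for `embedded-logic`; confirm the exact nesting against fluid-schema-0.7.3.json. This page does not name the key that points at the external repo, so the sketch leaves it out instead of guessing. The build id, model name, and variable are hypothetical.

```yaml
builds:
  - id: daily_orders_dbt            # hypothetical build id
    pattern: hybrid-reference       # logic lives in an external dbt repo
    engine: dbt
    properties:                     # assumption: same nesting as properties.sql above
      model: fct_daily_orders       # hypothetical dbt model name
      vars:                         # optional; passed through to the model
        start_date: "2024-01-01"    # hypothetical variable
      # The field that points at the external repo is omitted because
      # its name is not documented on this page.
```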
exposes[] — the consumer-facing API
```yaml
exposes:
  - exposeId: bitcoin_prices
    title: Bitcoin Hourly Prices
    kind: table                 # see expose.kind enum below
    binding:
      platform: local
      format: parquet
      location:
        path: ./runtime/out/bitcoin_prices.parquet
    contract:
      schema:
        - name: price_timestamp
          type: TIMESTAMP
          required: true
        - name: price_usd
          type: NUMERIC
          required: true
```
The schema lives at exposes[].contract.schema. Quality rules live one level deeper at exposes[].contract.dq.rules (see Quality, SLAs & Lineage).
expose.kind enum (verified against fluid-schema-0.7.3.json): table · view · api · file · stream · topic · feature_store · model · vector · graph · time_series · other
binding — the physical landing target
binding.platform enum (v0.7.3): gcp · aws · azure · snowflake · databricks · kafka · local · kubernetes · other
binding.format enum (v0.7.3): bigquery_table · snowflake_table · gcs_file · s3_file · http_api · grpc_api · pubsub_topic · kafka_topic · delta_table · iceberg · parquet · csv · json · other
binding.location shape varies per format:
| Format | Required location keys |
|---|---|
| bigquery_table | project, dataset, table (region optional) |
| snowflake_table | database, schema, table |
| s3_file | bucket, prefix (region optional) |
| parquet / csv | path (relative or absolute) |
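To illustrate the table, here are two bindings that use only the location keys it lists: a Snowflake table and an S3 file. The platform values come from the `binding.platform` enum above. The expose IDs, database, bucket, and prefix values are hypothetical placeholders, and each expose's `contract` block is omitted for brevity.

```yaml
exposes:
  - exposeId: orders_snowflake          # hypothetical expose
    kind: table
    binding:
      platform: snowflake
      format: snowflake_table
      location:
        database: ANALYTICS             # hypothetical names
        schema: RETAIL                  # the Snowflake schema, not contract.schema
        table: ORDERS
  - exposeId: orders_export             # hypothetical expose
    kind: file
    binding:
      platform: aws
      format: s3_file
      location:
        bucket: my-export-bucket        # hypothetical bucket
        prefix: exports/orders/
        region: us-east-1               # optional for s3_file
```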
The "swap one line" trick
The whole point of bindings is that the binding is the only block you need to edit to redeploy the same product elsewhere. Moving from local Parquet to BigQuery means switching platform: local → platform: gcp and updating the format and location keys to match the new platform's vocabulary. Everything else (schema, DQ rules, governance) stays identical.
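Here is the swap applied to the `bitcoin_prices` expose from earlier. Only `binding` changes; `exposeId`, `title`, `kind`, and `contract.schema` carry over untouched. The project and dataset names are hypothetical, and `region` is left out because the location table marks it as optional for `bigquery_table`.

```yaml
exposes:
  - exposeId: bitcoin_prices
    title: Bitcoin Hourly Prices
    kind: table
    binding:
      platform: gcp                     # was: local
      format: bigquery_table            # was: parquet
      location:                         # was: path: ./runtime/out/bitcoin_prices.parquet
        project: my-gcp-project         # hypothetical
        dataset: crypto                 # hypothetical
        table: bitcoin_prices
    contract:                           # unchanged from the local version
      schema:
        - name: price_timestamp
          type: TIMESTAMP
          required: true
        - name: price_usd
          type: NUMERIC
          required: true
```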
Multi-expose products: one product, many surfaces
Most data products produce one output. Some produce several: a Gold table for analysts, a feature_store surface for the ML team, and a Pub/Sub or Kafka topic for downstream consumers. Add one exposes[] entry per surface:
```yaml
exposes:
  - exposeId: customer_360_table       # for analysts
    kind: table
    binding:
      platform: gcp
      format: bigquery_table
      location: { project: prod, dataset: analytics, table: customer_360 }
    policy:
      authz:
        readers: [group:analysts@company.com]
  - exposeId: customer_360_features    # for ML
    kind: feature_store
    binding:
      platform: gcp
      format: bigquery_table
      location: { project: prod, dataset: features, table: customer_360_v1 }
    policy:
      authz:
        readers: [group:ml-team@company.com, serviceAccount:training@…]
  - exposeId: customer_changes         # for downstream
    kind: stream
    binding:
      platform: gcp
      format: pubsub_topic
      location: { project: prod, topic: customer-changes }
```
The builds[] are shared: the compute runs once, and each surface is exposed independently. Each surface can have its own audience via policy.authz.
consumes[] — declaring dependencies
When your product depends on another product (Silver consuming Bronze, Gold consuming Silver), declare it in consumes[]:
```yaml
consumes:
  - consumeId: bronze_orders
    productId: bronze.retail.orders_v1
    contract: { exposeId: orders_table }
```
consumes[] references compile to:
- Lineage edges in `fluid generate artifacts` (OPDS / ODCS / DataMesh Manager output)
- Read grants in `policy-apply` (the consumer's service principal gets read access on the producer's expose)
- Build-time validation: `fluid validate` confirms that the upstream product exists and that the cited `exposeId` matches one of its exposes
You don't usually wire this by hand: `fluid forge` infers consumes[] from the refs in your SQL and dbt code. Declare it explicitly only when a dependency crosses system boundaries.
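A Gold product that reads from two Silver products lists one entry per upstream product. The product IDs and expose IDs below are hypothetical and follow the naming style of `bronze.retail.orders_v1`. Each `contract.exposeId` must name an expose that the upstream product actually declares, because that is what `fluid validate` checks.

```yaml
consumes:
  - consumeId: silver_orders
    productId: silver.retail.orders_v1        # hypothetical upstream product
    contract: { exposeId: orders_clean }      # must exist in that product's exposes[]
  - consumeId: silver_customers
    productId: silver.retail.customers_v1     # hypothetical upstream product
    contract: { exposeId: customers_clean }   # must exist in that product's exposes[]
```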
Build execution: where SQL/Python actually runs
builds[].execution controls the runtime environment:
```yaml
builds:
  - id: bitcoin_price_ingestion
    pattern: embedded-logic
    engine: python
    properties:
      script: ./ingest.py
    execution:
      runtime:
        image: python:3.11-slim
        environment:                  # map of NAME: value, injected into the container
          AWS_REGION: us-east-1
          S3_BUCKET: my-ingest-bucket
      retries:                        # → retryPolicy schema
        maxAttempts: 3
        backoffStrategy: exponential  # fixed | exponential | linear
      trigger:
        type: scheduled
        cron: "0 * * * *"             # hourly
```
For SQL builds (engine: sql), the runtime is the warehouse itself (BigQuery, Snowflake, DuckDB) — execution.runtime is unused. For Python and Spark, the runtime is a container image; the chosen orchestrator (Airflow, Dagster, etc.) provisions it.
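For contrast, here is a minimal sketch of an `engine: sql` build that omits `execution.runtime`, as the paragraph above describes. It assumes that `retries` and `trigger` apply to SQL builds the same way they do in the Python example. The build id, the query, and the daily schedule are hypothetical.

```yaml
builds:
  - id: daily_price_rollup              # hypothetical build id
    pattern: embedded-logic
    engine: sql
    properties:
      sql: |
        SELECT DATE(price_timestamp) AS price_date,
               AVG(price_usd)        AS avg_price_usd
        FROM bitcoin_prices
        GROUP BY DATE(price_timestamp)
    execution:
      # No runtime block: the warehouse executes the SQL directly.
      retries:                          # assumption: same retryPolicy as Python builds
        maxAttempts: 2
        backoffStrategy: linear         # fixed | exponential | linear
      trigger:
        type: scheduled
        cron: "0 2 * * *"               # daily at 02:00
```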
Where to look next
- Providers vs platforms: how `binding.platform` resolves to actual cloud SDKs
- Quality, SLAs & Lineage: the `dq.rules`, `qos`, and `lineage` blocks
- Governance & Policy: the `accessPolicy` and `agentPolicy` blocks
- `fluid plan` walkthrough: what the planner emits per binding