Providers vs Platforms
Two related but distinct ideas:
- Platform — a value in your contract (
binding.platform: gcp) describing where the data lands. - Provider — a Python plugin that knows how to make it land there. Each provider implements two required methods,
plan()andapply(), against a specific cloud.
fluid providers lists everything installed in your environment.
Cloud providers shipping in data-product-forge 0.8.0
These are the cloud-platform providers that implement plan/apply against a target cloud:
| Provider | Status | Install extra |
|---|---|---|
local | ✅ Production (DuckDB, runs anywhere) | pip install "data-product-forge[local]" |
gcp | ✅ Production (BigQuery + GCS + IAM) | pip install "data-product-forge[gcp]" |
aws | ✅ Production (S3 + Glue + Athena) | pip install "data-product-forge[aws]" |
snowflake | ✅ Production (Snowflake + Snowpark) | pip install "data-product-forge[snowflake]" |
azure | 🔜 Roadmap (Synapse + ADLS) | — |
databricks | 🔜 Roadmap (Unity Catalog) | — |
Other valid binding.platform values
The schema enum also includes engines and runtime targets that aren't cloud providers in the same sense — they describe how the data lands, not which cloud it lands on:
| Value | Kind | Notes |
|---|---|---|
kafka | Streaming engine | Use with format: kafka_topic. Topic creation handled by your existing Kafka cluster, not by Fluid Forge — it just emits the binding contract. |
kubernetes | Runtime target | For long-running services / consumers, not for table-backed products. |
other | Escape hatch | Lets you bind a contract to a custom provider you've registered via the Provider SDK. |
The provider plugin contract
Building a custom provider for an unsupported platform is supported — see Custom Providers. BaseProvider declares exactly two abstract methods the plugin must implement:
class MyProvider(BaseProvider):
name = "my-cloud"
def plan(self, contract): ... # required (@abstractmethod)
def apply(self, actions): ... # required (@abstractmethod)
Two more methods are optional — BaseProvider ships working defaults you only override when you need them:
def capabilities(self): ... # optional — defaults to ProviderCapabilities()
def render(self, src, *, out=None, fmt=None): ... # optional — default raises ProviderError
capabilities() advertises which features the provider supports (planning, apply, render, graph, auth); render() exports a contract to an external format and is unsupported unless overridden.
Register via Python entry points in your pyproject.toml:
[project.entry-points."fluid_build.providers"]
my-cloud = "my_provider:MyProvider"
After pip install my-fluid-provider, fluid providers will list it automatically and contracts can use platform: my-cloud.
The provider lifecycle
The two required methods are each called at a specific point in the canonical 11-stage pipeline:
| Method | Called by | Pipeline stage | What it must do |
|---|---|---|---|
plan(contract) | fluid plan | Stage 6 — Plan | Return a list of Action objects describing what would change. Must be deterministic — the same contract + same deployed state always emit the same actions. The CLI's plan binding (stage 6 ↔ stage 7) refuses to apply if the plan was tampered with. |
apply(actions) | fluid apply | Stage 7 — Apply | Execute the actions against the target cloud. Idempotent. Returns success/failure per action. |
plan() makes no network calls and has no side effects — it's pure contract-in, action-list-out. That's how the canonical pipeline runs pre-flight checks without touching production.
Verification and policy compilation are engine-level pipeline stages, not provider abstract methods — the CLI drives them around the provider's plan/apply rather than calling extra methods on BaseProvider.
Action semantics
plan() returns Action objects in three categories:
| Category | Examples | Apply behaviour |
|---|---|---|
| Create | + create table foo, + create dataset bar, + grant role/dataViewer to group:x | Idempotent — re-applying a create that already happened is a no-op |
| Modify | ~ alter table foo add column bar, ~ update grants for table foo | Best-effort idempotent — providers may need to detect drift and reconcile |
| Destructive | - drop table foo, - revoke grant from group:x | Gated by --allow-destroy. The plan emits these but apply refuses unless the operator opts in explicitly. |
The destructive gate is the single most important safety property of the planner. Schema migrations that would drop a column require the operator to acknowledge the loss.
Error translation
Every provider translates cloud-specific errors into typed CLI errors so the operator gets a useful message rather than a stack trace. Examples from the GCP provider:
| Cloud error | Translated to | Exit code |
|---|---|---|
403 Forbidden: bigquery.datasets.create | FluidIAMError: "Service principal lacks BigQuery Data Editor role on project prod. Grant via …" | 64 (configuration) |
409 Conflict: dataset already exists | (translated to a no-op create — no error) | 0 |
400 Bad Request: invalid schema | FluidSchemaError: "Field customer.id declared as STRING in contract but BigQuery has it as INT64. Migration needed via …" | 65 (data) |
Quota exceeded: query bytes | FluidQuotaError: "Project prod exceeded daily query bytes quota. See GCP custom cost controls." | 66 (resource) |
See Typed CLI Errors for the full taxonomy.
Version compatibility
Each provider declares the contract schema versions it can handle:
class MyProvider(BaseProvider):
name = "my-cloud"
supported_schemas = ["0.7.1", "0.7.2", "0.7.3"]
fluid validate cross-checks the contract's fluidVersion against every installed provider's supported_schemas. Mismatch is a hard failure at validate time — the CLI refuses to load a contract that no installed provider can plan.
Where to look next
- Custom Providers walkthrough — full step-by-step for shipping your own provider
- Provider Architecture — interface details, action types, error categories
- Universal pipeline — where each provider method lands in the 11-stage flow
- Builds, Exposes, Bindings — the contract surface providers consume