Fluid Forge
DataHub Catalog

Source-side catalog adapter for DataHub (Acryl Data / open-source DataHub). Reads datasets, schemas, lineage, business glossary, ownership, tags, domains, and business attributes via the DataHubGraph client.

Recommended for: open-source-first teams running their own DataHub instance, or Acryl Cloud customers. DataHub is the most portable governance layer (works across Snowflake, Databricks, BigQuery, Redshift, Postgres, Kafka, dbt) — and forge-cli reads all of it through one adapter.

Install

pip install "data-product-forge[datahub]"

Adds acryl-datahub. Default install ships without it.
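
To confirm the extra is present before running a DataHub forge, you can probe for the importable module. This is a minimal sketch, assuming acryl-datahub is imported under the top-level package name `datahub`; the helper `has_module` is hypothetical and not part of forge-cli.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the acryl-datahub distribution installs the `datahub` package.
    if has_module("datahub"):
        print("acryl-datahub is installed")
    else:
        print('missing: run pip install "data-product-forge[datahub]"')
```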

Privileges to grant

The adapter is read-only on metadata. DataHub's permission model is policy-based:

  1. Open the DataHub UI as an admin → Permissions → Policies.
  2. Create or assign a policy that grants the user/group:
    • View Entity Page on every dataset/glossary you want forge-cli to see.
    • View Dataset Profile (optional — needed if you want statistical metadata, not yet consumed by V1.5).
    • View Lineage (recommended — without it, lineage reads return empty and DV2 link inference falls back to FK only).

The pre-built Reader role policy is the simplest fit: assign it to the forge-cli user/group.

Authentication methods

| Method | When to use | Setup |
| --- | --- | --- |
| pat ★ | Default for production / CI | Personal Access Token from the DataHub UI (Profile → Generate token). |
| none | Self-hosted dev DataHub | No auth; for sandbox instances only. The adapter logs a warning at construction time so production users don't accidentally pick this. |

★ pat is the recommended path. The wizard pre-fills it.

Setup

fluid ai setup --source datahub --name datahub-corp
# ? Catalog: datahub
# ? Server URL: https://datahub.corp.example.com
# ? Auth method:
#   ★ pat (recommended)
#     none (sandbox only)
# ? Token: ******                    (stored in OS keyring)
# ✓ Saved to ~/.fluid/sources.yaml

Or env vars:

export DATAHUB_SERVER=https://datahub.corp.example.com
export DATAHUB_TOKEN=eyJhbGc...     # PAT from the DataHub UI
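
The two variables above could be resolved into a connection config roughly as follows. This is a sketch, not the adapter's actual code: the function name `resolve_datahub_env`, the returned dict shape, and the rule that a missing token maps to the `none` method are all assumptions made for illustration.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger("datahub_env_sketch")


def resolve_datahub_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a connection config from DATAHUB_SERVER / DATAHUB_TOKEN."""
    env = os.environ if env is None else env
    server = env.get("DATAHUB_SERVER", "").strip()
    if not server:
        raise ValueError("DATAHUB_SERVER is not set")
    token = env.get("DATAHUB_TOKEN", "").strip()
    if token:
        return {"server": server.rstrip("/"), "auth": "pat", "token": token}
    # Assumption: no token means the sandbox-only `none` method, so warn loudly.
    logger.warning("DATAHUB_TOKEN is unset; using auth method 'none' (sandbox only)")
    return {"server": server.rstrip("/"), "auth": "none", "token": None}
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.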

End-to-end demo

fluid ai setup --source datahub --name datahub-corp

# Forge from a DataHub container scope (database.schema syntax).
fluid forge data-model from-source \
  --source datahub \
  --credential-id datahub-corp \
  --database snowflake_db \
  --schema  analytics \
  --technique data-vault-2 \
  -o analytics.fluid.yaml

# Or pass DataHub URNs directly:
fluid forge data-model from-source \
  --source datahub \
  --credential-id datahub-corp \
  --tables 'urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD)' \
           'urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.customers,PROD)' \
  -o orders.fluid.yaml

URN normalisation: type the short form

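Operators don't have to type DataHub's verbose URNs. The adapter recognises three input forms: it passes full URNs through unchanged, expands platform-prefixed short names, and rejects names with no platform prefix unless --platform supplies a default.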

| You type | Adapter expands to |
| --- | --- |
| urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD) | unchanged (full URN) |
| snowflake.db.orders | urn:li:dataset:(urn:li:dataPlatform:snowflake,db.orders,PROD) |
| db.schema.orders (no platform prefix) | rejected: needs a platform; use --platform snowflake to set a default |

The normalisation is a pure function: see DataHubCatalogAdapter._normalise_urn for the exact mapping.
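
The three cases in the table can be sketched as a small pure function. This is an illustration only, not the body of DataHubCatalogAdapter._normalise_urn: the `KNOWN_PLATFORMS` set, the function signature, and the fixed `PROD` environment default are assumptions.

```python
from typing import Optional

# Assumption: a platform prefix is recognised by matching against a known list.
KNOWN_PLATFORMS = frozenset(
    {"snowflake", "databricks", "bigquery", "redshift", "postgres", "kafka", "dbt"}
)


def normalise_urn(ref: str, platform: Optional[str] = None, env: str = "PROD") -> str:
    """Expand a short dataset reference into a DataHub dataset URN."""
    if ref.startswith("urn:li:dataset:"):
        return ref  # full URN: pass through unchanged
    head, _, rest = ref.partition(".")
    if head in KNOWN_PLATFORMS and rest:
        platform, name = head, rest  # explicit prefix wins over the default
    elif platform:
        name = ref  # no prefix, but a default platform was supplied
    else:
        raise ValueError(f"{ref!r} has no platform prefix; pass a default platform")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"
```

With these assumptions, `normalise_urn("db.schema.orders")` raises `ValueError`, while `normalise_urn("db.schema.orders", platform="snowflake")` expands to a Snowflake URN.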

What lands where

| DataHub source | Forge output |
| --- | --- |
| Dataset description | OSIDataset.fields[].expression.description |
| Schema column descriptions | OSIDataset.fields[].expression.description |
| Primary key constraint | OSIDataset.primary_key[] |
| Upstream / downstream lineage | metadata.lineage.upstream[] + DV2 link inference |
| Business glossary terms | OSI.ai_context.synonyms + examples |
| Ownership (technical / business) | metadata.owner.team (technical) + metadata.steward (business) |
| Tags | metadata.labels.tags[] |
| Domains | metadata.domain + industry hint |
| Business attributes | OSIDataset.fields[].expression.description (appended) |
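
As a rough illustration of the ownership, tag, and domain rows, the sketch below builds the corresponding `metadata` keys from plain Python values. The input shape (owner tuples of name and kind), the function name `build_metadata`, and the choice to de-duplicate and sort tags are hypothetical; only the output key paths come from the table.

```python
from typing import Iterable, Optional, Tuple


def build_metadata(
    owners: Iterable[Tuple[str, str]],
    tags: Iterable[str],
    domain: Optional[str] = None,
) -> dict:
    """Map (name, kind) owners, tags, and a domain onto contract metadata keys."""
    # Tags land under metadata.labels.tags[] (de-duplicated and sorted here).
    metadata: dict = {"labels": {"tags": sorted(set(tags))}}
    for name, kind in owners:
        if kind == "technical":
            metadata.setdefault("owner", {})["team"] = name  # metadata.owner.team
        elif kind == "business":
            metadata["steward"] = name  # metadata.steward
    if domain:
        metadata["domain"] = domain  # metadata.domain
    return metadata
```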

Common errors

CatalogConfigError: acryl-datahub missing

Run pip install "data-product-forge[datahub]".

CatalogPermissionError: 401 Unauthorized: token invalid

Suggestion list:

  • Generate a new PAT from the DataHub UI (Profile → Generate token).
  • Verify the policy assigned to your user includes View Entity Page for the datasets you want to forge.

CatalogConnectionError: 404 Not Found

Verify the DataHub server URL is reachable AND the path you pass (database / schema / URN) actually exists in DataHub. The adapter distinguishes 401 (permission) from 404 (not found) so you don't go hunting for IAM grants when the issue is a typo'd URN.
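
The 401 versus 404 split can be pictured as a simple status-to-exception mapping. The classes below are local stand-ins that reuse the error names shown on this page; the helper `raise_for_status` and its messages are hypothetical, not the adapter's real implementation.

```python
class CatalogPermissionError(Exception):
    """Stand-in for the adapter's permission error (HTTP 401)."""


class CatalogConnectionError(Exception):
    """Stand-in for the adapter's connection / not-found error (HTTP 404)."""


def raise_for_status(status: int, detail: str = "") -> None:
    """Raise a distinct error for 401 and 404; do nothing for other codes."""
    if status == 401:
        # Permission problem: regenerate the PAT or check the assigned policy.
        raise CatalogPermissionError(f"401 Unauthorized: {detail or 'token invalid'}")
    if status == 404:
        # Not found: check the server URL and the database / schema / URN.
        raise CatalogConnectionError(f"404 Not Found: {detail or 'check server URL and URN'}")
```

Keeping the two cases as separate exception types lets callers show a token hint for one and a path hint for the other.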

none auth warning at startup

You picked the none auth method. The adapter logs a warning so production users don't ship to prod with no auth. Switch to pat for any non-sandbox deployment.

Lineage tab empty in the forged contract

The policy assigned to your token most likely lacks the View Lineage privilege. DV2 link inference then falls back to FK constraints only, so forge still works.

See also

  • Catalog index
  • DataHub upstream docs — for installing / configuring DataHub itself.
Last Updated: 4/26/26, 10:42 PM
Contributors: fas89, Claude Opus 4.7