Jenkins CI/CD for FLUID Data Products
This walkthrough demonstrates how to implement end-to-end CI/CD for declarative data products using Jenkins and the FLUID framework.
Compatibility note
The contracts and pipeline output shown on this page reference FLUID schema 0.7.1. The CLI validates each contract against its own declared version, so these examples remain valid. For new contracts, prefer the current 0.7.2 shape: run fluid init my-project --quickstart to scaffold one, and use fluid generate ci --system jenkins to regenerate a matching Jenkinsfile.
Overview
The Jenkins pipeline automates the complete lifecycle of a data product:
- Contract Validation - Validate FLUID contract schema
- Static Analysis - Check governance policies and best practices
- Deployment Planning - Generate execution plan without applying changes
- Testing - Run dbt tests and data quality checks
- Deployment - Apply contract and create/update resources
- Verification - Validate deployment and data quality
Benefits of Declarative CI/CD
Traditional Approach (Imperative)
stage('Deploy') {
    sh 'bq mk dataset'
    sh 'bq load ...'
    sh 'dbt run'
    sh 'python update_metadata.py'
}
Problems:
- ❌ Manual resource creation
- ❌ No validation before deployment
- ❌ Hard to roll back
- ❌ No infrastructure drift detection
FLUID Approach (Declarative)
stage('Deploy') {
    sh 'fluid validate contract.fluid.yaml'
    sh 'fluid plan contract.fluid.yaml'
    sh 'fluid apply contract.fluid.yaml'
    sh 'fluid verify contract.fluid.yaml'
}
Benefits:
- ✅ Single source of truth (contract)
- ✅ Automatic validation
- ✅ Plan before apply (like Terraform)
- ✅ Drift detection via verify
Pipeline Architecture
┌──────────────────────────────────────────────────────────────┐
│                     Source Control (Git)                     │
│               contract.fluid.yaml + dbt models               │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                       Jenkins Pipeline                       │
├──────────────────────────────────────────────────────────────┤
│ 1. Setup            → Install dependencies, verify env       │
│ 2. Validate         → fluid validate contract.fluid.yaml     │
│ 3. Static Analysis  → Check policies, governance metadata    │
│ 4. Plan             → fluid plan (preview changes)           │
│ 5. Test             → dbt test + data quality checks         │
│ 6. Deploy           → dbt run + bq updates (labels)          │
│ 7. Verify           → fluid verify (check compliance)        │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│               Google Cloud Platform (BigQuery)               │
│   • Tables: bitcoin_prices                                   │
│   • Views: daily_price_summary, price_trends                 │
│   • Labels: cost-center, environment, sla-tier               │
└──────────────────────────────────────────────────────────────┘
Project Structure
bitcoin-tracker/
├── Jenkinsfile                  # Pipeline definition
├── contract.fluid.yaml          # Data product contract
├── load_bitcoin_price_batch.py  # Data ingestion script
├── dbt/
│   ├── dbt_project.yml
│   ├── profiles.yml
│   └── models/
│       ├── daily_price_summary.sql
│       ├── price_trends.sql
│       └── schema.yml
└── runtime/                     # Generated artifacts
    ├── plan.json
    ├── validation-report.json
    └── test-results.json
Getting Started
1. Jenkins Setup
Prerequisites:
- Jenkins 2.x or later
- Plugins:
  - Pipeline
  - Git
  - Google Cloud SDK
  - Credentials Binding
Configure Jenkins:
// In Jenkins System Configuration
// 1. Add GCP credentials
Credentials → Add Credentials
  Kind: Google Service Account from private key
  Project Name: dust-labs-485011
  ID: gcp-data-product-deployer
// 2. Configure tools
Global Tool Configuration
  Python: Python 3.9+
  Git: Latest
2. Create Jenkins Pipeline
Option A: Pipeline from SCM
// In Jenkins Job Configuration
Pipeline → Definition → Pipeline script from SCM
SCM: Git
Repository URL: https://github.com/your-org/fluid-mono.git
Script Path: forge_docs/examples/bitcoin-tracker/Jenkinsfile
Option B: Inline Pipeline
// Copy Jenkinsfile content directly into Jenkins job
3. Configure GCP Authentication
Service Account Method (Recommended for Production):
# 1. Create service account
gcloud iam service-accounts create fluid-cicd \
--display-name="FLUID CI/CD Service Account" \
--project=dust-labs-485011
# 2. Grant permissions
gcloud projects add-iam-policy-binding dust-labs-485011 \
--member="serviceAccount:fluid-cicd@dust-labs-485011.iam.gserviceaccount.com" \
--role="roles/bigquery.admin"
# 3. Create key
gcloud iam service-accounts keys create ~/fluid-cicd-key.json \
--iam-account=fluid-cicd@dust-labs-485011.iam.gserviceaccount.com
# 4. Add to Jenkins credentials
# Upload fluid-cicd-key.json as "Secret file" credential
OAuth Method (Development):
# In Jenkins agent, authenticate with:
gcloud auth application-default login
4. Set Environment Variables
In Jenkins job configuration:
environment {
    GCP_PROJECT_ID = 'dust-labs-485011'
    GOOGLE_APPLICATION_CREDENTIALS = credentials('gcp-data-product-deployer')
}
Pipeline Stages Explained
Stage 1: Setup Environment
Purpose: Install dependencies and verify prerequisites
stage('Setup Environment') {
    steps {
        sh '''
            # Install Python packages
            pip install dbt-core dbt-bigquery google-cloud-bigquery

            # Verify contract file exists
            test -f contract.fluid.yaml

            # Test GCP access
            bq ls --project_id=${GCP_PROJECT_ID}
        '''
    }
}
Key Checks:
- ✅ Python dependencies installed
- ✅ Contract file exists
- ✅ GCP credentials valid
- ✅ BigQuery access confirmed
Stage 2: Validate Contract
Purpose: Ensure contract is syntactically correct and schema-compliant
stage('Validate Contract') {
    steps {
        sh '''
            python3 -m fluid_build.cli validate contract.fluid.yaml
        '''
    }
}
What's Validated:
- YAML syntax correctness
- FLUID 0.7.1 schema compliance
- Required fields present
- Data types valid
- References consistent
Sample Output:
Starting validate_contract
Metric: validation_duration=0.042seconds
Metric: validation_errors=0count
Metric: validation_warnings=0count
✅ Valid FLUID contract (schema v0.7.1)
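As a rough sketch of the required-fields check above (the real CLI validates the full declared schema; the field list and the id key here are assumptions drawn from the contract excerpts on this page):

```python
# Illustrative required-field check, NOT the fluid_build implementation.
# Assumes the contract has already been parsed from YAML into a dict.
REQUIRED_FIELDS = ("fluidVersion", "kind", "id")

def validate_contract(contract: dict) -> list:
    """Return a list of human-readable validation errors (empty = valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in contract:
            errors.append("Missing required field '%s'" % field)
    version = contract.get("fluidVersion")
    if version is not None and not isinstance(version, str):
        errors.append("'fluidVersion' must be a quoted string")
    return errors

good = {"fluidVersion": "0.7.1", "kind": "DataProduct",
        "id": "crypto.bitcoin_prices_gcp"}
bad = {"kind": "DataProduct"}
```

Running validate_contract(good) returns an empty list, while the bad contract reports its missing fields, mirroring the failure shown in the Troubleshooting section.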
Stage 3: Static Analysis
Purpose: Check best practices and governance policies
stage('Static Analysis') {
    steps {
        sh '''
            # Check governance metadata
            python3 check_governance.py

            # Verify dbt models exist
            test -f dbt/models/daily_price_summary.sql
        '''
    }
}
Checks Performed:
Governance Metadata:
- Product-level labels present
- Owner information defined
- Cost allocation tags set
Data Policies:
- Classification defined (Public/Internal/Confidential)
- Access control policies (readers/writers)
- Sensitivity markers for PII fields
Dependencies:
- dbt models exist for declared builds
- Python scripts exist for ingestion
- Source tables referenced exist
Example Policy Check:
# Contract validation rules
required_labels = ['cost-center', 'data-classification', 'billing-tag']
required_policies = ['classification', 'authz']
for expose in contract['exposes']:
if not all(label in expose['labels'] for label in required_labels):
raise ValidationError(f"Missing required labels in {expose['exposeId']}")
Stage 4: Plan Deployment
Purpose: Preview changes before applying (like terraform plan)
stage('Plan Deployment') {
    steps {
        sh '''
            python3 -m fluid_build.cli plan contract.fluid.yaml
        '''
    }
}
Output:
============================================================
FLUID Execution Plan
============================================================
Total Actions: 6
1. provision_bitcoin_prices_table (provisionDataset)
2. schedule_build_1 (scheduleTask)
3. provision_daily_price_summary (provisionDataset)
...
Benefits:
- See what will change before deployment
- Catch configuration errors early
- Review resource costs
- Validate access permissions
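Because the pipeline archives the plan as runtime/plan.json, the review step itself can be scripted. The actions list and the per-action id/type keys below are assumptions modeled on the sample output above, not a documented plan format:

```python
import json

def summarize_plan(plan_text: str) -> str:
    """Render an archived plan.json as a numbered action summary.

    Assumes a top-level "actions" list with "id" and "type" per action;
    adapt the keys to whatever your fluid plan actually emits.
    """
    plan = json.loads(plan_text)
    actions = plan.get("actions", [])
    lines = ["Total Actions: %d" % len(actions)]
    for i, action in enumerate(actions, start=1):
        lines.append("%d. %s (%s)" % (i, action["id"], action["type"]))
    return "\n".join(lines)

# Hypothetical excerpt matching the sample output above
sample = json.dumps({"actions": [
    {"id": "provision_bitcoin_prices_table", "type": "provisionDataset"},
    {"id": "schedule_build_1", "type": "scheduleTask"},
]})
```

A reviewer (or an approval gate) can then diff this summary between builds instead of reading raw JSON.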
Stage 5: Run Tests
Purpose: Validate data quality and transformations
stage('Run Tests') {
    when {
        expression { !params.SKIP_TESTS }
    }
    steps {
        sh '''
            # Run dbt tests
            cd dbt && dbt test

            # Custom data quality checks
            python3 data_quality_checks.py
        '''
    }
}
dbt Tests:
# dbt/models/schema.yml
models:
  - name: daily_price_summary
    columns:
      - name: date
        tests:
          - not_null
          - unique
      - name: open_price_usd
        tests:
          - not_null
          - positive_value
Custom Quality Checks:
# data_quality_checks.py
checks = [
    {
        'name': 'No NULL prices',
        'query': 'SELECT COUNT(*) FROM bitcoin_prices WHERE price_usd IS NULL',
        'expected': 0
    },
    {
        'name': 'Data freshness',
        'query': '''
            SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(price_timestamp), HOUR)
            FROM bitcoin_prices
        ''',
        'threshold': 24  # Data should be < 24 hours old
    }
]
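A small runner can execute a checks list like this inside the pipeline; run_query below is a stand-in for a real BigQuery client call (e.g. via google-cloud-bigquery), stubbed here with canned results for illustration:

```python
def run_checks(checks, run_query):
    """Run each check via run_query(sql) -> scalar; return failure messages."""
    failures = []
    for check in checks:
        value = run_query(check["query"])
        if "expected" in check:
            passed = value == check["expected"]
        else:  # freshness-style checks compare against a threshold
            passed = value <= check["threshold"]
        if not passed:
            failures.append("%s: got %r" % (check["name"], value))
    return failures

checks = [
    {"name": "No NULL prices",
     "query": "SELECT COUNT(*) FROM bitcoin_prices WHERE price_usd IS NULL",
     "expected": 0},
    {"name": "Data freshness",
     "query": "SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(price_timestamp), HOUR) FROM bitcoin_prices",
     "threshold": 24},
]

# Stub simulating query results; the real script would execute the SQL
stub = {checks[0]["query"]: 0, checks[1]["query"]: 3}
failures = run_checks(checks, stub.get)
```

Exiting non-zero when failures is non-empty is enough to fail the Jenkins stage.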
Stage 6: Deploy
Purpose: Apply contract and create/update resources
stage('Deploy') {
    when {
        expression { !params.DRY_RUN }
    }
    steps {
        // Production requires manual approval
        input message: 'Deploy to Production?'
        sh '''
            # Run dbt models
            cd dbt && dbt run

            # Apply labels
            bq update --set_label environment:${ENVIRONMENT} \\
                --set_label cost-center:engineering \\
                ${GCP_PROJECT_ID}:crypto_data.bitcoin_prices

            # Load fresh data
            python3 load_bitcoin_price_batch.py
        '''
    }
}
Key Actions:
- Manual Approval (for production)
- dbt Execution - Create/update views
- Label Application - FinOps tracking
- Data Loading - Ingest fresh data
Deployment Safety:
// Require approval for production
if (params.ENVIRONMENT == 'production') {
    input message: 'Deploy to Production?',
          ok: 'Deploy',
          submitter: 'admin,release-manager'
}
Stage 7: Verify Deployment
Purpose: Validate resources match contract specification
stage('Verify Deployment') {
    steps {
        sh '''
            # Run FLUID verify
            python3 -m fluid_build.cli verify contract.fluid.yaml

            # Check labels applied
            bq show --format=json bitcoin_prices | jq '.labels'

            # Query sample data
            bq query "SELECT * FROM daily_price_summary LIMIT 1"
        '''
    }
}
Verification Steps:
- Schema Compliance - Fields match contract
- Data Types - Correct types (FLOAT64, TIMESTAMP, etc.)
- Labels - Cost tracking labels applied
- Data Quality - Sample queries return valid data
- Freshness - Latest data within SLA
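The Labels step in particular is easy to script against the JSON that bq show --format=json emits; the required-label set below mirrors the labels used elsewhere on this page and would be adapted to your contract:

```python
import json

# Labels this walkthrough applies for FinOps tracking
REQUIRED_LABELS = {"cost-center", "environment", "sla-tier"}

def missing_labels(bq_show_output: str) -> set:
    """Return required labels absent from a `bq show --format=json` dump."""
    table = json.loads(bq_show_output)
    present = set(table.get("labels") or {})  # "labels" may be null
    return REQUIRED_LABELS - present

sample = '{"labels": {"cost-center": "engineering", "environment": "staging"}}'
```

Here missing_labels(sample) flags the absent sla-tier label, which is exactly the "Labels: null" failure described under Troubleshooting.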
Sample Verification Output:
================================================================================
FLUID Verify - Multi-Dimensional Contract Validation
================================================================================
Verifying: bitcoin_prices_table

Dimension 1: Schema Structure
  ✅ PASS - All 8 column names match specification
Dimension 2: Data Types
  ✅ PASS - All types match
Dimension 3: Constraints
  ✅ PASS - All field constraints match
Dimension 4: Location
  ✅ PASS - Region: us-central1
Dimension 5: Labels
  ✅ PASS - All required labels present
Pipeline Parameters
Configure deployment behavior with parameters:
parameters {
    choice(
        name: 'ENVIRONMENT',
        choices: ['staging', 'production'],
        description: 'Deployment environment'
    )
    booleanParam(
        name: 'DRY_RUN',
        defaultValue: false,
        description: 'Plan only, do not apply'
    )
    booleanParam(
        name: 'SKIP_TESTS',
        defaultValue: false,
        description: 'Skip test execution'
    )
}
Usage:
- Development: ENVIRONMENT=staging, DRY_RUN=false, SKIP_TESTS=false
- Production: ENVIRONMENT=production, DRY_RUN=false, SKIP_TESTS=false
- Preview: any environment, DRY_RUN=true
Monitoring & Logging
Console Output
The pipeline provides detailed, color-coded output:
───────────────────────────────────────────────────────────
Stage 2: FLUID Contract Validation
───────────────────────────────────────────────────────────
Validating FLUID contract...
✅ Contract validation PASSED
Contract is compliant with FLUID 0.7.1 schema

Contract Metadata:
  ID: crypto.bitcoin_prices_gcp
  Version: 0.7.1
  Exposes: 3 datasets
  Builds: 3 transformations
Artifacts
The pipeline archives important artifacts:
- validation-report.json - Contract validation results
- plan.json - Deployment plan
- dbt-test-output.log - Test results
- verify-output.log - Post-deployment verification
Access artifacts:
Jenkins Job → Build #123 → Artifacts → runtime/plan.json
Metrics
Key metrics tracked:
- Build Duration - Total pipeline execution time
- Validation Duration - Contract validation time
- Test Pass Rate - Percentage of tests passing
- Deployment Success Rate - % of successful deployments
Security Best Practices
1. Credentials Management
Don't:
// ❌ Never hardcode credentials
environment {
    GCP_KEY = 'AKIA...'  // BAD!
}
Do:
// ✅ Use Jenkins credentials
environment {
    GOOGLE_APPLICATION_CREDENTIALS = credentials('gcp-service-account')
}
2. Least Privilege
# Grant only required permissions
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/bigquery.dataEditor" # Not bigquery.admin
3. Approval Gates
// Require approval for sensitive operations
if (params.ENVIRONMENT == 'production') {
    input message: 'Proceed with production deployment?',
          submitter: 'release-managers'
}
4. Audit Logging
post {
    always {
        // Log all deployments
        sh '''
            echo "$(date): Deployment to ${ENVIRONMENT} by ${BUILD_USER}" \
                >> /var/log/fluid-deployments.log
        '''
    }
}
Troubleshooting
Common Issues
1. Contract Validation Fails
Error:
❌ Contract validation FAILED
Error: Missing required field 'fluidVersion'
Solution:
# Add at top of contract.fluid.yaml
fluidVersion: "0.7.2"
kind: DataProduct
2. GCP Authentication Fails
Error:
ERROR: (gcloud.auth.application-default.login)
Unable to find Application Default Credentials
Solution:
# In Jenkins, set credential binding
withCredentials([file(credentialsId: 'gcp-key', variable: 'GOOGLE_APPLICATION_CREDENTIALS')]) {
    sh 'gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}'
}
3. dbt Tests Fail
Error:
Failure in test not_null_bitcoin_prices_price_usd
Got 5 results, expected 0
Solution:
-- Fix NULL values in source data
DELETE FROM bitcoin_prices WHERE price_usd IS NULL;
4. Labels Not Applied
Error:
⚠️ Labels: null
Solution:
# fluid apply does not yet set labels reliably; apply them manually with bq update:
bq update --set_label cost-center:engineering table_name
Advanced Patterns
Multi-Environment Deployment
pipeline {
    stages {
        stage('Deploy to Staging') {
            environment {
                GCP_PROJECT_ID = 'project-staging'
            }
            steps {
                sh 'fluid apply contract.fluid.yaml'
            }
        }
        stage('Smoke Test Staging') {
            steps {
                sh 'python3 smoke_tests.py --env=staging'
            }
        }
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            environment {
                GCP_PROJECT_ID = 'project-production'
            }
            steps {
                input 'Deploy to production?'
                sh 'fluid apply contract.fluid.yaml'
            }
        }
    }
}
Parallel Testing
stage('Tests') {
    parallel {
        stage('Unit Tests') {
            steps {
                sh 'pytest tests/unit'
            }
        }
        stage('dbt Tests') {
            steps {
                sh 'dbt test'
            }
        }
        stage('Data Quality') {
            steps {
                sh 'python3 dq_checks.py'
            }
        }
    }
}
Rollback Support
stage('Deploy') {
    steps {
        script {
            try {
                sh 'fluid apply contract.fluid.yaml'
            } catch (Exception e) {
                echo "Deployment failed, rolling back..."
                sh 'git checkout HEAD~1 contract.fluid.yaml'
                sh 'fluid apply contract.fluid.yaml'
                throw e
            }
        }
    }
}
Related Documentation
Next Steps
- Set Up Jenkins Job - Create pipeline using provided Jenkinsfile
- Configure GCP Access - Set up service account credentials
- Test in Staging - Run pipeline with ENVIRONMENT=staging
- Monitor First Deploy - Review logs and artifacts
- Automate Triggers - Set up Git webhook for automatic builds
- Add Notifications - Configure Slack/email alerts for failures
Best Practices Summary
✅ DO:
- Use declarative FLUID contracts as single source of truth
- Validate before deploying (fluid validate → fluid plan → fluid apply)
- Run tests in CI/CD pipeline
- Require manual approval for production
- Archive deployment artifacts
- Monitor data quality post-deployment
❌ DON'T:
- Hardcode credentials in pipeline
- Skip validation or tests for "quick fixes"
- Deploy directly to production without staging
- Ignore failed tests in production pipeline
- Deploy without reviewing the plan
Questions or issues? Open an issue on GitHub.
