Deploy Phase#

Overview#

The Deploy phase implements automated deployment pipelines using the dc-release framework to ensure reliable, compliant, and traceable deployments to production environments. This phase establishes CI/CD workflows that automatically promote code through the dev, tst, val, and prd environments, with testing and approval gates at each stage.

What You Will Learn#

After completing this chapter, you will understand how to:

Configure and customize the dc-release framework for deploying your data products
Set up automated CI/CD pipelines using GitHub Actions workflows
Configure deployment secrets and federated authentication
Monitor deployments and troubleshoot common issues
Follow GxP-compliant release processes with proper documentation
Set up automated testing and quality gates at each deployment stage

Key Personas & Stakeholders - RACI Matrix#

Activity	Data Product Owner	Data Engineer	IT PM/SM/IM	Solution Architect	Release Manager
CI/CD Configuration	A	R	C	C	I
Environment Setup	C	R	C	C	I
Deployment Validation	A	C	C	C	R
Release Approval	A	C	R	C	R
Production Deployment	C	R	A	C	R
Documentation Review	R	C	A	C	R

R = Responsible, A = Accountable, C = Consulted, I = Informed

Prerequisites#

Before proceeding with deployment setup, ensure you have the following:

Technical Prerequisites#

Repositories: You need at least three repositories:
- your code repository, set up using Templates as described here
- a *-requirements repository, which the dc-release framework uses to store your business solution requirements
- a *-release-log repository, which the dc-release framework uses to store your solution's release notes
Completed Build Phase: Data product successfully built and verified as described here
Azure Resources: Databricks workspace and Unity Catalog provisioned for all environments (dev, tst, val, prd)

Access and Permissions#

GitHub Organization: Admin access to configure secrets and variables in your repositories
Azure Subscriptions: Contributor access to target environments
Service Principals: Federated credentials configured for GitHub Actions

Documentation Requirements#

Functional Specification: Complete with intended use and public API definition
Risk Assessment: Solution-specific risk analysis completed
Test Strategy: Comprehensive testing approach documented
Operations Manual: Deployment and maintenance procedures defined
Recovery Procedure: Rollback and recovery strategies defined

Understanding the dc-release Framework#

The dc-release framework provides a standardized approach for deploying GxP solutions with:

Automated Environment Promotion: Code flows through DEV → TST → VAL → PRD
Quality Gates: Automated testing and manual approvals at each stage
Documentation Generation: Automatic release notes and compliance documentation
Audit Trail: Complete deployment history with approvals and test results
Rollback Capability: Safe rollback procedures for production issues

dc-release Process Flow#

The following diagram illustrates the complete dc-release process from development to production deployment:

Process Breakdown#

Phase	Environment	Key Activities
DEV	Development	Code development, Unit tests
TST	Test	Unit tests, Document tests, Installation verification (IV), Integration tests (OV)
VAL	Validation	Same as TST + Acceptance verification, QC review, SME approval
PRD	Production	Installation verification, Integration tests (pOV), Final approvals

Key Approval Gates#

SME (Subject Matter Expert): Reviews and approves at multiple stages
QC (Quality Control): Reviews and approves before and after production deployment
Acceptance Verification: Manual validation in VAL environment
Change Records: Draft and final change documentation published at each stage
Release with Tag: Pushing a release tag triggers the production deployment

Step-by-Step Deployment Process#

Step 1: Configure GitHub Apps, Secrets and Variables#

Once you have completed the prerequisites and set up your code repository, you need to set up the required authentication and configuration for your deployment pipeline.

We recommend using GitOps (Configuration as Code) to manage secrets and variables, ensuring that changes are peer-reviewed and version-controlled.

Refer to Step 6 on the Set Up Repository Using Templates page for detailed instructions on configuring the GitHub Apps, Secrets and Variables needed to deploy and run your pipelines.

Step 2: Configure Azure Federated Authentication#

Enable GitHub Actions to deploy to Azure without storing long-lived credentials.

Set Up Service Principal Access#

Navigate to Azure Portal:
- Go to Azure Portal → App Registrations
- Select your Databricks service principal
Add Federated Credentials:
- Navigate to Certificates & secrets → Federated Credentials
- Click Add Credential
Configure Credential:

Federated credential scenario: Other issuer
Issuer: https://token.actions.githubusercontent.com
Subject identifier: repo:YourOrg/YourRepo:ref:refs/heads/*
Name: github-actions-main
Audience: api://AzureADTokenExchange

Add Additional Credentials for Tags:

Subject identifier: repo:YourOrg/YourRepo:ref:refs/tags/*
Name: github-actions-tags

Multiple Environment Credentials

Set up separate federated credentials for each environment if using different service principals for dev, tst, val, and prd.

Expected Outcome: GitHub Actions can authenticate to Azure for all target environments.
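
The subject identifier has to match the `sub` claim that GitHub places in the OIDC token. As a rough sketch of GitHub's documented default subject formats, the claim depends on whether the job targets a GitHub environment, runs for a pull request, or runs for a branch or tag. The function name `oidc_subject` is hypothetical, and customized subject templates are not covered:

```python
def oidc_subject(org, repo, ref, event_name="push", environment=None):
    """Return the default OIDC subject GitHub issues for a workflow job.

    Follows GitHub's documented default claim formats; customized subject
    templates in an organization are out of scope for this sketch.
    """
    prefix = f"repo:{org}/{repo}"
    if environment:
        # A job that declares `environment:` gets an environment-based subject
        return f"{prefix}:environment:{environment}"
    if event_name == "pull_request":
        return f"{prefix}:pull_request"
    # Branch and tag pushes carry the full git ref
    return f"{prefix}:ref:{ref}"


print(oidc_subject("YourOrg", "YourRepo", "refs/heads/main"))
print(oidc_subject("YourOrg", "YourRepo", "refs/tags/v1.2.0"))
print(oidc_subject("YourOrg", "YourRepo", "refs/heads/main", environment="prd"))
```

If a deploy job declares `environment:`, as the deploy_and_test.yaml example in Step 3 does, it may be worth checking whether your credentials also need an environment-based subject.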

Step 3: Customize Deployment Workflows#

The dc-template-core provides placeholder GitHub Actions workflows. Customize them for your specific deployment needs.

Understanding the Main Workflow#

The main workflow (/.github/workflows/main.yaml) orchestrates the entire deployment process:

main.yaml

name: Main
on:
  push:
    branches: [main]
    tags: ["*"]
  pull_request:
    branches: ["**"]

jobs:
  code_quality:
    name: TEST-your-project-1
    uses: ./.github/workflows/check_code_quality.yaml

  dev:
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: dev
    secrets: inherit

  test_docs:
    uses: NovoNordisk-DataCore/dc-release/.github/workflows/publish.yaml@main
    with:
      environment: dev
    secrets: inherit

  tst:
    needs: [code_quality, dev, test_docs]
    if: ${{ github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') }}
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: tst
    secrets: inherit

  pre_val:
    needs: [tst]
    if: ${{ startsWith(github.ref, 'refs/tags/') }}
    uses: NovoNordisk-DataCore/dc-release/.github/workflows/publish.yaml@main
    with:
      environment: tst
    secrets: inherit

  val:
    needs: [pre_val]
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: val
    secrets: inherit

  pre_prd:
    needs: [val]
    uses: NovoNordisk-DataCore/dc-release/.github/workflows/approve_to_prd.yaml@main
    secrets: inherit

  prd:
    needs: [pre_prd]
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: prd
    secrets: inherit

  for_use:
    needs: [prd]
    uses: NovoNordisk-DataCore/dc-release/.github/workflows/approve_for_use.yaml@main
    secrets: inherit
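
To make the gating in main.yaml easier to reason about, the sketch below lists which jobs are expected to run for a given `github.ref`. It is only a model of the `if` and `needs` settings shown above: it assumes every job succeeds and every approval is granted, and the function name `jobs_for_ref` is hypothetical.

```python
ALWAYS = ["code_quality", "dev", "test_docs"]
TAG_ONLY = ["pre_val", "val", "pre_prd", "prd", "for_use"]


def jobs_for_ref(ref):
    """List the jobs from main.yaml expected to run for a given github.ref."""
    is_tag = ref.startswith("refs/tags/")
    jobs = list(ALWAYS)
    if ref == "refs/heads/main" or is_tag:
        jobs.append("tst")
    if is_tag:
        # pre_val carries the tag condition; the later jobs depend on it
        # through `needs`, so they are skipped whenever pre_val is skipped
        jobs.extend(TAG_ONLY)
    return jobs


print(jobs_for_ref("refs/pull/42/merge"))  # pull request: checks and dev only
print(jobs_for_ref("refs/heads/main"))     # merge to main: adds tst
print(jobs_for_ref("refs/tags/v1.2.0"))    # release tag: full promotion
```

Reading the workflow this way shows that a merge to main stops after tst, and only a tag push continues to val and prd.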

[Optional] Customize Main workflow for non-GxP needs#

The documentation, val and pre-val checks can be removed for a non-GxP solution. Below is an example:

Example non-GxP Main Workflow

name: Main
on:
  push:
    branches:
      - main
    tags:
      - "*"

  pull_request:
    branches:
      - "**"

jobs:
  code_quality:
    name: TEST-${{ github.event.repository.name }}-1
    uses: ./.github/workflows/check_code_quality.yaml

  dev:
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: dev
    secrets: inherit

  tst:
    needs: [code_quality, dev] # removed test_docs
    if: ${{ github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') }}
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: tst
    secrets: inherit

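  pre_prd:
    needs: [tst]
    # pre_val carried the tag condition in the GxP workflow; keep it here
    # if production deployments should still start only from a release tag
    if: ${{ startsWith(github.ref, 'refs/tags/') }}
    uses: NovoNordisk-DataCore/dc-release/.github/workflows/approve_to_prd.yaml@main
    secrets: inherit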

  prd:
    needs: [pre_prd]
    uses: ./.github/workflows/deploy_and_test.yaml
    with:
      environment: prd
    secrets: inherit

Customize deploy_and_test.yaml#

Modify /.github/workflows/deploy_and_test.yaml to implement your specific deployment logic:

deploy_and_test.yaml

on:
  workflow_call:
    inputs:
      environment:
        description: Target environment
        required: true
        type: string

jobs:
  deploy:
    name: Deploy to ${{ inputs.environment }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

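      - name: Install the Databricks CLI
        # `databricks bundle` needs the current Databricks CLI; the legacy
        # `databricks-cli` pip package does not provide bundle commands
        uses: databricks/setup-cli@main

      - name: Install dependencies
        run: |
          pip install poetry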

      - name: Build Python wheel
        run: |
          poetry install
          poetry build

      - name: Deploy to Databricks
        env:
          DATABRICKS_HOST: ${{ vars.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          databricks bundle validate --target ${{ inputs.environment }}
          databricks bundle deploy --target ${{ inputs.environment }}

      - name: Run integration tests
        if: inputs.environment != 'prd'
        run: |
          # Run your integration tests here
          python -m pytest tests/integration/ --env=${{ inputs.environment }}

      - name: Run verification tests
        if: inputs.environment == 'prd'
        run: |
          # Run only non-destructive verification tests in production
          python -m pytest tests/verification/ --env=prd

Expected Outcome: Customized deployment workflows that handle your specific pipeline requirements.

Step 4: Set Up Environment-Specific Configuration#

Configure the Databricks Asset Bundle for each environment, starting from the bundle configuration generated by the templates.

Databricks Bundle Configuration#

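The dc-template-core generates a databricks.yml file. Customize it for your environments, and make sure that the names under targets match the values passed to --target by deploy_and_test.yaml (dev, tst, val, and prd in the workflow shown in Step 3); the generated example below defines development and main: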

databricks.yml

bundle:
  name: corp-procurement

# The following variables are passed to the job during deployment
variables:
  environment:
    description: the environment to use
  run_as:
    description: user to run as
  webhook_on_failure:
    description: The webhook used to notify on failure
  manage_group_name:
    description: Name of the group that can manage the job

artifacts:
  default:
    type: whl
    build: hatch run databricks:build "${workspace.file_path}"
    path: .

sync:
  include:
    - dist/*.whl
    - requirements-databricks-wheels.txt

resources:
  jobs:
    main:
      # This ensures other developers can see,
      # manage and debug the job in the UI
      permissions:
        - group_name: ${var.manage_group_name}
          level: CAN_MANAGE
      name: ${bundle.name}/${var.environment}/main

      # schedule the job
      schedule:
        quartz_cron_expression: "0 0 0/3 * * ?"
        timezone_id: Europe/Amsterdam

      # set tags as you see fit
      tags:
        specification: FS-${bundle.name}-1
        git-origin: ${bundle.git.origin_url}
        git-commit: ${bundle.git.commit}
      environments:
        - environment_key: default
          spec:
            client: "2"
            dependencies:
              - --requirement ${workspace.file_path}/locks/databricks/requirements.txt
              - --requirement ${workspace.file_path}/requirements-databricks-wheels.txt

      # Define the tasks that are part of this job
      # e.g. the two tasks - source_to_landing
      # and landing_to_bronze
      tasks:
        - task_key: source_to_landing
          python_wheel_task:
            entry_point: source_to_landing
            package_name: etl
            parameters:
              - ${var.environment}
          environment_key: default
          # configure webhook notifications
          webhook_notifications:
            on_failure:
              - id: ${var.webhook_on_failure}

        - task_key: landing_to_bronze
          # ALL_DONE starts this task once source_to_landing has finished,
          # whether it succeeded or failed; use ALL_SUCCESS to require success
          run_if: ALL_DONE
          depends_on:
            - task_key: source_to_landing
          python_wheel_task:
            entry_point: landing_to_bronze
            package_name: etl
            parameters:
              - ${var.environment}
          environment_key: default
          webhook_notifications:
            on_failure:
              - id: ${var.webhook_on_failure}

workspace:
  # This is the root path for both the development and production targets
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

targets:
  # This target is used for development
  development:
    mode: development

  # This target is used when deploying to production
  # and ensures that the job is run as the service principal
  main:
    default: true
    mode: production
    run_as:
      service_principal_name: ${var.run_as}

Expected Outcome: A Databricks Asset Bundle that is configured for your environments and ready to deploy jobs and other resources to Databricks.
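
The schedule in the bundle uses a Quartz cron expression, whose fields are seconds, minutes, hours, day of month, month, and day of week. As a small illustration of how the hours field `0/3` is read, the sketch below expands it into the matching hours. The helper name `expand_quartz_increment` is hypothetical and only handles the increment form and plain integers.

```python
def expand_quartz_increment(field, upper):
    """Expand a Quartz 'start/step' field into the values it matches.

    Only the increment form (for example '0/3') and plain integers are
    handled; ranges, lists and wildcards are left out of this sketch.
    """
    if "/" in field:
        start, step = (int(part) for part in field.split("/"))
        return list(range(start, upper, step))
    return [int(field)]


expression = "0 0 0/3 * * ?"
seconds, minutes, hours, day_of_month, month, day_of_week = expression.split()

# Hours at which the job starts, in the configured timezone
print(expand_quartz_increment(hours, 24))  # [0, 3, 6, 9, 12, 15, 18, 21]
```

In other words, the example job starts every three hours on the hour, in the Europe/Amsterdam timezone.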

Step 5: Implement Release Process#

Follow the dc-release framework process for creating production releases.

Create Release Notes#

Before tagging a release, create release notes in documentation/releases/X.Y.Z.md. Read documentation/releases/README.md and start from documentation/releases/TEMPLATE_RELEASE_NOTE.md. Below is an example:

# Release note

This release contains the following changes:

* [#123 - Add customer lifecycle segmentation](https://github.com/org/repo/pull/123)
* [#124 - Improve data quality validation](https://github.com/org/repo/pull/124)
* [#125 - Update documentation for new API endpoints](https://github.com/org/repo/pull/125)

## Impact assessment

This is a minor release adding new segmentation capabilities. No breaking changes to existing APIs. New endpoints are additive and backward compatible.

## Risk assessment

- **Low Risk**: New functionality is isolated and well-tested
- **Data Quality**: Enhanced validation reduces risk of data issues
- **Performance**: No impact on existing pipeline performance
- **Dependencies**: No changes to external dependencies

Tag and Deploy Release#

Create and merge release notes PR:

git checkout -b release/1.2.0
# Create release notes file
git add documentation/releases/1.2.0.md
git commit -m "Add release notes for v1.2.0"
git push origin release/1.2.0
# Create PR and get it approved/merged

Create release tag:

git checkout main
git pull origin main
git tag v1.2.0
git push origin v1.2.0
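
Before pushing the tag, it can help to confirm that a release notes file exists for the version being tagged. The sketch below maps a tag to the expected file path. It assumes that a leading `v` in the tag is dropped to obtain the X.Y.Z file name, which this page implies but does not state, and the function name `release_notes_path` is hypothetical.

```python
import re
from pathlib import Path

# Accepts X.Y.Z with an optional leading "v" (assumption, see lead-in)
VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def release_notes_path(tag, root="documentation/releases"):
    """Map a release tag to the release notes file expected for it."""
    match = VERSION.match(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not an X.Y.Z version tag")
    version = ".".join(match.groups())
    return Path(root) / f"{version}.md"


path = release_notes_path("v1.2.0")
print(path.as_posix())  # documentation/releases/1.2.0.md
print("found" if path.exists() else "missing: add release notes before tagging")
```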

Monitor deployment pipeline:

Watch GitHub Actions for automated progression through environments
Verify deployments succeed in each environment
Review test results and quality gates

Success

The release automatically progresses through the environments with the proper approval gates. The release notes are published to a folder named after the deployed version number, and the same is reflected in the *-release-log repository created as a prerequisite.

Success Metrics & Checkpoints#

Common Challenges & Solutions#

Challenge: GitHub Actions Authentication Failures#

Symptoms:

Error: Error: Could not authenticate to Azure
Failed federated credential validation

Solution:

Verify federated credential configuration in Azure Portal
Check subject identifier format: repo:org/repo:ref:refs/heads/*
Ensure service principal has correct permissions
Validate GitHub secrets are correctly set

Prevention: Test authentication in a simple workflow before full deployment

Challenge: Databricks Bundle Validation Errors#

Symptoms:

databricks bundle validate fails
Resource configuration errors

Solution:

Check databricks.yml syntax and indentation
Verify workspace URLs and service principal names
Test bundle validation locally: databricks bundle validate --target dev
Review Databricks CLI authentication

Prevention: Validate bundle configuration in development environment first

Challenge: Environment-Specific Configuration Issues#

Symptoms:

Wrong catalog or workspace used in deployment
Jobs deployed with incorrect configuration

Solution:

Review target-specific variables in databricks.yml
Verify environment variables passed to workflows
Check bundle target selection in deployment scripts
Test deployment with verbose logging

Prevention: Use clear naming conventions for environment-specific resources
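
One quick check for this class of problem is to compare the environment names the workflows pass to `--target` with the target names defined in databricks.yml. The sketch below does this with plain lists taken from the examples on this page. The function name `missing_targets` is hypothetical, and in practice the two lists would come from your own files.

```python
def missing_targets(workflow_environments, bundle_targets):
    """Return workflow environments that have no matching bundle target."""
    defined = set(bundle_targets)
    return [env for env in workflow_environments if env not in defined]


# Values taken from the main.yaml and databricks.yml examples on this page
workflow_environments = ["dev", "tst", "val", "prd"]
bundle_targets = ["development", "main"]

print(missing_targets(workflow_environments, bundle_targets))
# ['dev', 'tst', 'val', 'prd']
```

An empty list means every environment used by the workflows has a bundle target of the same name.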

Next Steps#

After successful deployment setup:

Operations and Maintenance - Learn about operations and maintenance procedures
Monitor Production Jobs - Set up comprehensive monitoring and alerting
Establish Support Procedures - Create runbooks and escalation procedures
