Data Pipeline Templates#

Overview#

The Data Pipeline Templates provide a standardized starting point for building data pipelines and data products. They use the copier Python library to generate a project structure with all the necessary configuration files, CI/CD pipelines, and documentation and testing frameworks already in place.
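As a rough sketch of what generating a project could look like, the snippet below assembles a `copier copy` command line in Python. The template location (`gh:example-org/dc-template-core`), the destination folder, and the `project_name` answer are placeholders for illustration, not values taken from these templates; the questions a template actually asks are defined in the template itself.

```python
import shlex


def build_copier_command(template_src, destination, answers=None):
    """Assemble the argv list for a `copier copy` invocation."""
    command = ["copier", "copy"]
    # Each pre-filled answer is passed as `--data KEY=VALUE`.
    for key, value in (answers or {}).items():
        command += ["--data", f"{key}={value}"]
    command += [template_src, destination]
    return command


# Placeholder values: replace with the real template location and answers.
command = build_copier_command(
    "gh:example-org/dc-template-core",
    "my-data-pipeline",
    {"project_name": "my-data-pipeline"},
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided copier is installed and the template location is reachable.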

These templates accelerate data product development by removing repetitive setup work, and they keep projects consistent by encoding organizational best practices for development, deployment, and compliance.

What you will learn#

  • What data pipeline templates are and how they work
  • The different types of templates available in the ecosystem
  • What each template provides and when to use it
  • How to choose the right combination of templates for your project
  • How to get started with the templates

Key personas and stakeholders - RACI matrix#

Activity                          | Data Engineer | Data Architect | Product Owner | IT QA/IT QC
Template selection and generation | R             | C              | I             | I
Template customization            | R             | C              | I             | I
CI/CD configuration               | R             | C              | I             | C
Documentation setup               | R             | I              | A             | C

R = Responsible, A = Accountable, C = Consulted, I = Informed

Template ecosystem#

The template ecosystem consists of three templates:

  • Core Template
  • Python Project Template
  • Data Product Template

Template Ecosystem Diagram

Core Template#

Repository: dc-template-core

The Core Template is the foundation for all projects and provides the essential project scaffolding.

What it provides:

  • Project scaffolding - Standard directory structure and configuration files
  • CI/CD pipeline - GitHub Actions workflows for automated deployment and testing
  • Databricks Asset Bundle - Multi-environment deployment configuration
  • Documentation - Documentation templates based on the dc-release framework

When to use:

  • Starting any new data engineering project
  • Required as the foundation for all projects
  • Sufficient by itself for simple data pipelines and notebook-based workflows

Python Project Template#

Repository: dc-template-python-project

The Python Project Template adds Python development and code quality tooling on top of the Core Template.

What it provides:

  • Project structure - Organized Python project layout
  • Python project management - hatch for build and dependency management
  • Testing framework - pytest setup with test structure
  • Code quality tools - Linting, formatting, and type checking with ruff and mypy

When to use:

  • Projects requiring complex Python transformations
  • Building reusable Python libraries
  • Teams prioritizing code quality and testing
  • Production-grade data pipelines

Data Product Template#

Repository: dc-template-data-product

The Data Product Template adds capabilities for publishing data products to the NN Data Marketplace (NNDM) and includes data quality validation features.

What it provides:

  • NNDM integration - Configuration for marketplace publishing
  • Sample data contract - A sample data contract specification file
  • Sample product config - A sample data product configuration file
  • Data Quality Framework - Soda library for generating and executing data quality rules
  • Data Contract Based Checks - Automated validation based on data contract specifications
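
To make the idea of contract-based checks concrete, here is a minimal sketch in plain Python. It does not use Soda and does not reflect the contract format shipped with the template; the `CONTRACT` shape, the column names, and the `validate_rows` helper are all hypothetical and only illustrate deriving checks from a declared specification.

```python
# Hypothetical contract: column name -> expected Python type and nullability.
CONTRACT = {
    "order_id": {"type": int, "required": True},
    "amount": {"type": float, "required": True},
    "coupon": {"type": str, "required": False},
}


def validate_rows(rows, contract):
    """Return a list of human-readable violations; empty means all rows pass."""
    violations = []
    for index, row in enumerate(rows):
        for column, rule in contract.items():
            value = row.get(column)
            if value is None:
                # Missing values only violate the contract for required columns.
                if rule["required"]:
                    violations.append(f"row {index}: {column} is missing")
                continue
            if not isinstance(value, rule["type"]):
                violations.append(
                    f"row {index}: {column} should be {rule['type'].__name__}"
                )
    return violations


# Illustrative sample rows, not real data.
rows = [
    {"order_id": 1, "amount": 9.5, "coupon": None},
    {"order_id": "2", "amount": None},
]
violations = validate_rows(rows, CONTRACT)
for violation in violations:
    print(violation)
```

The point of the design is that the checks are generated from the contract rather than written by hand, so changing the contract changes what is validated.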

When to use:

  • Creating data products for cross-team sharing
  • Publishing curated datasets to the organization
  • Projects requiring formal data contracts
  • Projects requiring automated data quality checks
  • High-assurance data products with quality requirements

Quick Recap

The templates are designed to stack like LEGO blocks: you start with the Core Template as the base and add further templates on top as needed.

  1. Core Template - provides base infrastructure
  2. Python Project Template - adds Python development capabilities
  3. Data Product Template - adds marketplace publishing and data quality capabilities

Choosing the right templates#

Select templates based on your project requirements:

Your project needs                                          | Templates to use
End-to-end data engineering project                         | Core + Python Project + Data Product
Standard Python pipelines, POC projects                     | Core + Python Project
Highly complex or customised solutions that need flexibility | Core Template only
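
The selection logic above can be sketched as a small helper. This is an illustration only: the function name and its two flags are hypothetical, and it assumes the Data Product Template is always layered on top of the Python Project Template, as in the combinations listed above.

```python
# Template layers in the order they stack, named after their repositories.
TEMPLATE_LAYERS = [
    "dc-template-core",
    "dc-template-python-project",
    "dc-template-data-product",
]


def select_templates(needs_python_tooling, needs_data_product):
    """Return the template stack for a project, base layer first.

    Assumption: the Data Product Template is always used on top of the
    Python Project Template, matching the listed combinations.
    """
    if needs_data_product:
        return TEMPLATE_LAYERS[:3]
    if needs_python_tooling:
        return TEMPLATE_LAYERS[:2]
    # The Core Template is the base of every project.
    return TEMPLATE_LAYERS[:1]


print(select_templates(needs_python_tooling=True, needs_data_product=False))
```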

Get started#

For detailed setup instructions, see Set up repo using Templates.

For working with templates locally and customizing them, see Working with templates locally.

Additional resources#