Data Pipeline Templates
Overview
The Data Pipeline Templates provide a standardized starting point for building data pipelines and data products. They use the Copier Python library to generate a project structure that already includes the required configuration files, CI/CD pipelines, and documentation and testing frameworks.
The templates speed up data product development in two ways. They supply the boilerplate setup, so teams skip repetitive setup work. They also encode organizational best practices for development, deployment, and compliance, so projects stay consistent with one another.
What you will learn
- What data pipeline templates are and how they work
- The different types of templates available in the ecosystem
- What each template provides and when to use them
- How to choose the right combination of templates for your project
- How to get started with the templates
Key personas and stakeholders - RACI matrix
| Activity | Data Engineer | Data Architect | Product Owner | IT QA/IT QC |
|---|---|---|---|---|
| Template selection and generation | R | C | I | I |
| Template customization | R | C | I | I |
| CI/CD configuration | R | C | I | C |
| Documentation setup | R | I | A | C |
R = Responsible, A = Accountable, C = Consulted, I = Informed
Template ecosystem
The template ecosystem consists of three templates:
- Core Template
- Python Project Template
- Data Product Template
Core Template
Repository: dc-template-core
The Core Template is the foundation for all projects and provides the essential project scaffolding.
What it provides:
- Project scaffolding - Standard directory structure and configuration files
- CI/CD pipeline - GitHub Actions workflows for automated deployment and testing
- Databricks Asset Bundle - Multi-environment deployment configuration
- Documentation - Documentation templates based on the dc-release framework
When to use:
- Starting any new data engineering project
- Required as the foundation for all projects
- Sufficient by itself for simple data pipelines and notebook-based workflows
Python Project Template
Repository: dc-template-python-project
The Python Project Template adds Python development and code quality tooling on top of the Core Template.
What it provides:
- Project structure - Organized Python project layout
- Python project management - hatch for build and dependency management
- Testing framework - pytest setup with test structure
- Code quality tools - Linting, formatting and type checking with ruff and mypy
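The following sketch shows the kind of code the pytest setup is meant to host: a typed transformation function and its unit tests. The function `deduplicate_latest` and its record shape are hypothetical examples and are not part of the template. In a generated project the function and its tests would normally live in separate modules. They are combined here so the sketch is self-contained.

```python
from datetime import date
from typing import TypedDict


class Record(TypedDict):
    """Hypothetical input row: a business key, an update date, and a payload field."""

    customer_id: str
    updated_on: date
    email: str


def deduplicate_latest(records: list[Record]) -> list[Record]:
    """Keep only the most recently updated record per customer_id.

    The output is sorted by customer_id so results are deterministic.
    """
    latest: dict[str, Record] = {}
    for record in records:
        current = latest.get(record["customer_id"])
        # Replace the stored record only if this one is strictly newer.
        if current is None or record["updated_on"] > current["updated_on"]:
            latest[record["customer_id"]] = record
    return [latest[key] for key in sorted(latest)]


# pytest collects functions whose names start with "test_".
def test_keeps_latest_record_per_key() -> None:
    records: list[Record] = [
        {"customer_id": "c1", "updated_on": date(2024, 1, 1), "email": "old@example.com"},
        {"customer_id": "c1", "updated_on": date(2024, 3, 1), "email": "new@example.com"},
        {"customer_id": "c2", "updated_on": date(2024, 2, 1), "email": "c2@example.com"},
    ]
    result = deduplicate_latest(records)
    assert [r["email"] for r in result] == ["new@example.com", "c2@example.com"]


def test_empty_input_returns_empty_list() -> None:
    assert deduplicate_latest([]) == []
```

Running `pytest` from the project root would collect both `test_` functions, and `mypy` would check the `TypedDict` and return-type annotations.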
When to use:
- Projects requiring complex Python transformations
- Building reusable Python libraries
- Teams prioritizing code quality and testing
- Production-grade data pipelines
Data Product Template
Repository: dc-template-data-product
The Data Product Template adds capabilities for publishing data products to the NN Data Marketplace (NNDM) and includes data quality validation features.
What it provides:
- NNDM integration - Configuration for marketplace publishing
- Sample data contract - A sample data contract specification file
- Sample product config - A sample data product configuration file
- Data Quality Framework - Soda library for generating and executing data quality rules
- Data Contract Based Checks - Automated validation based on data contract specifications
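Contract-based checks work by letting the contract declare what valid data looks like and deriving the checks from it, so nobody has to write each check by hand. The sketch below illustrates that idea in plain Python rather than with Soda. The contract format (`fields`, `type`, `required`) and the function `check_against_contract` are hypothetical simplifications. They are neither the template's data contract specification nor Soda's API.

```python
from typing import Any

# Hypothetical, simplified contract: each field declares an expected Python
# type and whether a value must be present.
CONTRACT: dict[str, Any] = {
    "fields": {
        "order_id": {"type": str, "required": True},
        "amount": {"type": float, "required": True},
        "coupon_code": {"type": str, "required": False},
    }
}


def check_against_contract(rows: list[dict[str, Any]], contract: dict[str, Any]) -> list[str]:
    """Return one message per contract violation; an empty list means all rows pass."""
    violations: list[str] = []
    for index, row in enumerate(rows):
        for name, spec in contract["fields"].items():
            value = row.get(name)
            if value is None:
                # Missing or null values only violate the contract for required fields.
                if spec["required"]:
                    violations.append(f"row {index}: missing required field '{name}'")
            elif not isinstance(value, spec["type"]):
                violations.append(
                    f"row {index}: field '{name}' expected {spec['type'].__name__}, "
                    f"got {type(value).__name__}"
                )
        # Columns that the contract does not declare are reported as well.
        for name in sorted(row.keys() - contract["fields"].keys()):
            violations.append(f"row {index}: unexpected field '{name}'")
    return violations


rows = [
    {"order_id": "o-1", "amount": 19.99},
    {"order_id": "o-2", "amount": "20"},
    {"amount": 5.0, "channel": "web"},
]
for message in check_against_contract(rows, CONTRACT):
    print(message)
# row 1: field 'amount' expected float, got str
# row 2: missing required field 'order_id'
# row 2: unexpected field 'channel'
```

Because the checks are derived from the contract, a change to the contract changes the checks automatically. That is the main benefit of generating rules from contracts instead of maintaining them separately.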
When to use:
- Creating data products for cross-team sharing
- Publishing curated datasets to the organization
- Projects requiring formal data contracts
- Projects requiring automated data quality checks
- High-assurance data products with quality requirements
Quick Recap
The templates are designed to work together like LEGO blocks: you start with the Core Template as the base and add further templates on top as needed.
- Core Template - provides base infrastructure
- Python Project Template - adds Python development capabilities
- Data Product Template - adds marketplace publishing and data quality capabilities
Choosing the right templates
Select templates based on your project requirements:
| Your project needs | Templates to use |
|---|---|
| End-to-end data engineering project | Core + Python Project + Data Product |
| Standard Python pipelines, POC projects | Core + Python Project |
| Highly complex or customized solutions that need flexibility | Core Template only |
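The selection table, together with the rule that the Core Template is always the base, can be captured as a small lookup. This sketch is illustrative only. The `ProjectType` names are hypothetical labels for the three table rows, and the repository names are the ones listed on this page, with no organization or URL prefix because the page does not give one. The recap only fixes the Core Template as the base, so the order of the Python Project and Data Product templates in this sketch is an assumption.

```python
from enum import Enum, auto


class ProjectType(Enum):
    """Hypothetical labels for the three rows of the selection table."""

    END_TO_END = auto()       # End-to-end data engineering project
    STANDARD_PYTHON = auto()  # Standard Python pipelines, POC projects
    CUSTOM = auto()           # Highly complex or customized solutions

CORE = "dc-template-core"
PYTHON_PROJECT = "dc-template-python-project"
DATA_PRODUCT = "dc-template-data-product"

# Core comes first in every combination; the order after it is assumed.
_TEMPLATES: dict[ProjectType, list[str]] = {
    ProjectType.END_TO_END: [CORE, PYTHON_PROJECT, DATA_PRODUCT],
    ProjectType.STANDARD_PYTHON: [CORE, PYTHON_PROJECT],
    ProjectType.CUSTOM: [CORE],
}


def templates_for(project_type: ProjectType) -> list[str]:
    """Return the templates for a project type, starting with the Core Template."""
    # Return a copy so callers cannot mutate the shared lookup table.
    return list(_TEMPLATES[project_type])


print(templates_for(ProjectType.STANDARD_PYTHON))
# ['dc-template-core', 'dc-template-python-project']
```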
Get started
For detailed setup instructions, see Set up repo using Templates.
For working with templates locally and customizing them, see Working with templates locally.