Architecture
- Pipeline Architecture
- Core Components
- Detailed Flow Architecture
- Modularization and Error handling
- Implemented Data Flows (POC project)
Pipeline Architecture#
The logical architecture gives a high-level overview of the Delta Live Tables (DLT) pipeline: the core data flow and the main processing stages. This simplified view shows the key transformation phases and how they relate to each other, and it highlights the monitoring and quality control that run continuously alongside the pipeline.
Delta Live Tables Pipeline
+--------------------------------------------------------------------------------------+
|                                                                                      |
|  +--------------+      +--------------+      +--------------+      +--------------+  |
|  |    Source    |      |    Bronze    |      |  Validated   |      |    Silver    |  |
|  |     Data     |  >>  |    Layer     |  >>  |    Bronze    |  >>  |    Layer     |  |
|  +--------------+      +--------------+      +--------------+      +--------------+  |
|         ^                     ^                     ^                     ^          |
|         |                     |                     |                     |          |
|     JSON Files           Raw Storage           Data Quality         Business Logic   |
|                           & Metadata           & Reference           & Historical    |
|                                                    Data                Tracking      |
|                                                                                      |
+--------------------------------------------------------------------------------------+
                                           |
                                    Quality Metrics
                                     Error Handling
                                       Monitoring
Core Components#
- Landing Zone
  - Initial entry point for all raw JSON data files
  - Implements schema validation on ingestion
  - Captures file metadata (name, timestamp, source)
  - Handles bad records with dedicated error paths
- Bronze Layer
  - Preserves raw data in Delta format
  - Enriches with system metadata (sys_surrogate_pk, sys_job_id)
  - Implements streaming reads for real-time processing
  - Maintains data lineage from source
- Validated Bronze Layer
  - Applies data quality rules and constraints
  - Harmonizes attributes using reference data
  - Standardizes field names and formats
  - Handles invalid data through explicit rules
- Silver Layer
  - Implements slowly changing dimensions (Types 1 & 2)
  - Ensures data consistency and deduplication
  - Optimizes for analytical queries
  - Maintains complete historical tracking
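To make the Bronze and Validated Bronze responsibilities more concrete, here is a minimal sketch in plain Python. It does not use the DLT API and is not the project's implementation. The columns `sys_surrogate_pk` and `sys_job_id` come from the list above; everything else (the hash-based surrogate key, `sys_file_name`, `sys_ingested_at`, the reference mapping, and the sample field names) is a hypothetical choice made for illustration.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical reference data: raw spellings mapped to harmonized values.
DIABETES_TYPE_REFERENCE = {
    "t1": "TYPE_1",
    "type 1": "TYPE_1",
    "t2": "TYPE_2",
    "type 2": "TYPE_2",
}


def to_bronze(raw: dict, file_name: str, job_id: str) -> dict:
    """Keep the raw values untouched and append system and file metadata."""
    # Assumption: the surrogate key is a hash of the file name and the raw payload.
    key_material = f"{file_name}|{sorted(raw.items())}"
    return {
        **raw,
        "sys_surrogate_pk": hashlib.sha256(key_material.encode("utf-8")).hexdigest(),
        "sys_job_id": job_id,
        "sys_file_name": file_name,
        "sys_ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def to_validated_bronze(bronze: dict) -> dict:
    """Standardize field names and harmonize values against reference data."""
    row = {name.strip().lower().replace(" ", "_"): value for name, value in bronze.items()}
    raw_type = str(row.get("diabetes_type", "")).strip().lower()
    # Unknown spellings become None so that a later quality rule can reject them.
    row["diabetes_type"] = DIABETES_TYPE_REFERENCE.get(raw_type)
    return row


bronze_row = to_bronze({"User ID": "u-1", "Diabetes Type": "T1"}, "profiles_001.json", "job-42")
validated_row = to_validated_bronze(bronze_row)
print(validated_row["user_id"], validated_row["diabetes_type"])  # u-1 TYPE_1
```

The Bronze row still carries the original field names and values, which is what allows lineage to be traced back to the source file. Renaming and harmonization only happen in the second step.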
Detailed Flow Architecture#
The detailed flow architecture provides a comprehensive view of the data pipeline's physical implementation. It illustrates the specific components at each layer, from raw JSON file ingestion through to analytics-ready tables. This diagram emphasizes the transformation steps, quality controls, and business logic implementation at each stage, showing how data evolves from raw format to business-ready state.
+--------------------+     +--------------------+     +--------------------+     +--------------------+
|    Landing Zone    |     |    Bronze Layer    |     |  Validated Bronze  |     |    Silver Layer    |
|                    |     |                    |     |                    |     |                    |
| +----------------+ |     | +----------------+ |     | +----------------+ |     | +----------------+ |
| | JSON Files     | |     | | Raw Tables     | |     | | Quality-Checked| |     | | SCD Tables     | |
| | - Profiles     |-+---->| | - Metadata     |-+---->| | - Validated    |-+---->| | - Type 1/2     | |
| | - Insulin      | |     | | - Lineage      | |     | | - Harmonized   | |     | | - Analytics    | |
| | - Consent      | |     | | - Streaming    | |     | | - Transformed  | |     | | - History      | |
| +----------------+ |     | +----------------+ |     | +----------------+ |     | +----------------+ |
|                    |     |                    |     |                    |     |                    |
+--------------------+     +--------------------+     +--------------------+     +--------------------+
          |                          |                          |                          |
          |                          |                          |                          |
  File Ingestion             Data Preservation          Quality Control            Business Logic
  - Schema Check             - System Metadata          - Data Validation          - Historization
  - Batch/Streaming          - Original Values          - Harmonization            - Deduplication
  - Error Handling           - Lineage Tracking         - Business Rules           - Analytics Ready
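The file ingestion step (schema check plus error handling) can be sketched as follows. This is an illustrative plain-Python example, not the pipeline's actual ingestion code: the schema, the field names (`user_id`, `dose_units`, `administered_at`), and the shape of the bad-record entries are assumptions.

```python
import json

# Hypothetical expected schema for an insulin event: field name -> required type.
INSULIN_SCHEMA = {"user_id": str, "dose_units": float, "administered_at": str}


def check_schema(record: dict, schema: dict) -> list:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems


def ingest_lines(lines: list, schema: dict, file_name: str) -> tuple:
    """Split JSON lines into accepted records and bad records with a reason."""
    accepted, bad_records = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            bad_records.append(
                {"file": file_name, "line": line_number, "reason": f"invalid JSON: {exc.msg}"}
            )
            continue
        if isinstance(record, dict):
            problems = check_schema(record, schema)
        else:
            problems = ["not a JSON object"]
        if problems:
            bad_records.append(
                {"file": file_name, "line": line_number, "reason": "; ".join(problems)}
            )
        else:
            accepted.append(record)
    return accepted, bad_records


sample_lines = [
    '{"user_id": "u-1", "dose_units": 4.5, "administered_at": "2024-01-01T08:00:00Z"}',
    '{"user_id": "u-2", "dose_units": "four"}',
    "{not json",
]
accepted, bad_records = ingest_lines(sample_lines, INSULIN_SCHEMA, "insulin_001.json")
print(len(accepted), len(bad_records))  # 1 2
```

Keeping the file name and line number with each bad record is one way to give the error path enough context to trace a rejected record back to its source.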
Modularization and Error handling#
One of the challenging aspects of DLT pipelines is deciding how to modularize the code and where to store error records that do not pass the DLT expectation checks. It was also important to structure the code so that it can be reused across many table pipelines.
The error handling and code modularity and reusability article explains how we overcame these challenges.
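As a rough illustration of the two concerns above (reusable rule definitions and a place for failing records), the following plain-Python sketch keeps one rule set per table and routes rows that fail any rule to a quarantine list. It only mimics the idea of an expectation check and does not call the DLT API. It is not necessarily the approach described in the linked article, and the table names, rule names, and fields are hypothetical.

```python
# Hypothetical per-table rule sets: rule name -> predicate over one row.
RULES_BY_TABLE = {
    "user_profile": {
        "user_id_present": lambda row: bool(row.get("user_id")),
        "valid_diabetes_type": lambda row: row.get("diabetes_type") in {"TYPE_1", "TYPE_2"},
    },
    "insulin": {
        "user_id_present": lambda row: bool(row.get("user_id")),
        "positive_dose": lambda row: isinstance(row.get("dose_units"), (int, float))
        and row["dose_units"] > 0,
    },
}


def split_valid_and_quarantine(table: str, rows: list) -> tuple:
    """Apply the table's rules; failing rows are kept, annotated with the failed rule names."""
    rules = RULES_BY_TABLE[table]
    valid, quarantine = [], []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            quarantine.append({**row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantine


insulin_rows = [
    {"user_id": "u-1", "dose_units": 4.5},
    {"user_id": "", "dose_units": -1},
]
valid, quarantine = split_valid_and_quarantine("insulin", insulin_rows)
print(len(valid), quarantine[0]["failed_rules"])  # 1 ['user_id_present', 'positive_dose']
```

Because the splitting function takes the table name as a parameter, adding a new table pipeline in this sketch only means adding an entry to the rule dictionary.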
Implemented Data Flows (POC project)#
- User Profile Pipeline
  - Manages patient demographic and health information
  - Tracks changes in diabetes type and app usage
  - Implements SCD Type 2 for full history preservation
  - Includes app version and OS tracking
  - Code Repository
- Insulin Data Pipeline
  - Records insulin administration events
  - Tracks dosage and timing information
  - Uses SCD Type 1 for latest state only
  - Maintains device association data
  - Code Repository
- Additional SCD Type 1 Implementations:
  - Smart Pen Pipeline
    - Tracks smart pen device data
    - Maintains device status and configurations
    - Implements SCD Type 1 for current state
  -
    - Records meal information
    - Tracks nutritional data
    - Uses SCD Type 1 for latest state
  -
    - Manages medication records
    - Tracks medication details
    - Implements SCD Type 1 for current state
All SCD Type 1 implementations follow similar patterns for maintaining current state while ensuring data quality and compliance requirements. Complete implementation details can be found in the respective code repositories.
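The difference between the two SCD types used above can be shown with a small plain-Python sketch. It is a conceptual illustration only, not the DLT implementation used in the repositories. The key names (`pen_id`, `user_id`) and the `valid_from` / `valid_to` columns are hypothetical.

```python
def apply_scd1(current: dict, change: dict, key: str) -> None:
    """SCD Type 1: overwrite the stored row so only the latest state remains."""
    current[change[key]] = dict(change)


def apply_scd2(history: list, change: dict, key: str, changed_at: str) -> None:
    """SCD Type 2: close the open version for the key, then append a new version."""
    for version in history:
        if version[key] == change[key] and version["valid_to"] is None:
            if all(version.get(name) == value for name, value in change.items()):
                return  # nothing changed, keep the open version as it is
            version["valid_to"] = changed_at
    history.append({**change, "valid_from": changed_at, "valid_to": None})


# Type 1: the smart pen table keeps a single row per pen.
pens = {}
apply_scd1(pens, {"pen_id": "p-1", "status": "active"}, "pen_id")
apply_scd1(pens, {"pen_id": "p-1", "status": "retired"}, "pen_id")

# Type 2: the user profile table keeps every version with its validity range.
profiles = []
apply_scd2(profiles, {"user_id": "u-1", "diabetes_type": "TYPE_1"}, "user_id", "2024-01-01")
apply_scd2(profiles, {"user_id": "u-1", "diabetes_type": "TYPE_2"}, "user_id", "2024-06-01")

print(pens["p-1"]["status"], len(profiles))  # retired 2
```

After these calls the Type 1 table holds only the latest pen status, while the Type 2 history holds a closed version (`valid_to` set to `2024-06-01`) and one open version (`valid_to` is `None`).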