Source Aligned Data Product - Example Implementation#

Disclaimer

This page is being revamped with live code and steps will be modified in near future.

Overview#

This implementation reads from two kafka topics and writes into respective delta tables. It utilizes the template scaffolding generated using steps mentioned in set-up-repo-using-copier.

For details about the code, please refer the project repo here

Modifying the templates - Step by Step Process#

Once the project scaffolding was setup in the repo, below steps were followed:

Modified the src directory for actual code
- The src folder above contains the actual code within the mdm_streaming_refdata folder.
- Under mdm_streaming_refdata folder, all the required libraries, configuration and main code blocks were added
Hold "Alt" / "Option" to enable Pan & Zoom

Note

Multiple directories can be created under the src directory, the templates don't restrict this
Added relevant contracts and config for NNDM in data_products folder

For this project one config file and 3 data contracts have been added in the contracts sub-folder.

Hold "Alt" / "Option" to enable Pan & Zoom
Added unit and acceptance tests under the tests folder

Multiple units test were added under unit sub-folder and acceptance feature was added in the integration/features subfolder

Hold "Alt" / "Option" to enable Pan & Zoom
Changed relevant documentation

Upcoming as project is still under development
Changed the databricks.yml as per the code
- Added relevant whl file
- Added relevant libraries
- Added relevant job cluster
- Added task keys
Hold "Alt" / "Option" to enable Pan & Zoom
Changed the pyproject.toml
- Added the dependent libraries
- Added the project entry points
- Added the data product folder contents
- Added the behave configuration for acceptance tests
Hold "Alt" / "Option" to enable Pan & Zoom