Set up your project and code repository#
This section offers guidance on choosing between Azure DevOps (ADO) and GitHub for repository and project management, and on setting up the environment you choose.
What You Will Learn#
- Setting up project management tools: GitHub, Azure DevOps (ADO)
- Setting up your code repository: GitHub repo, ADO repo
- Establishing your compute environment on Databricks
- Setting up DORA metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Recovery
Key Personas & Stakeholders - RACI Matrix#
| Activity | Product Owner | Business Analyst | Data Engineer | Solution Architect | Scrum Master/Project Manager |
|---|---|---|---|---|---|
| ADO/GitHub Kanban Board Creation | A | I | I | I | R |
| Code Repository Creation | C | I | R | A | I |
| Choice Of Tools | C | I | R | R | I |
Prerequisites#
- Completed feasibility assessment
- Approved business case
- Platform selected
- Budget approved
- Team assembled
Step by Step Process#
The steps below can happen in parallel:
- The choice of project management and code management tool, GitHub or Azure DevOps, is up to each team.
  AI Foundation Recommendation: We recommend GitHub for both project and repository management because of its superior collaboration and coding features and its widespread adoption.
  To set up GitHub and your developer environment, follow this guide for Datacore developers. For a quick start with Git, GitHub, and VS Code, see our guide here.
- The storage and compute platform for your data product is typically determined during the feasibility assessment. In AI Foundation we recommend Databricks (Datacore), Snowflake, or AWS Datahub as your platform for storage and compute. To learn more about the different platforms, visit our page.
  To request an environment on Datacore, follow the Datacore environment guide here.
- Set up your code repository for building your data product. Data Engineering Enablement uses data pipeline templates with Copier. Click here to learn how to set up your repository with Copier. TODO: add link to pipeline template.
- Decide on a CI/CD framework. For data pipelines, we recommend the dc-release framework (click here), which is the framework used for our Data Engineering pipeline templates, but you can also consider the QMS pipeline framework for your use case.
- DORA metrics measure how quickly you deploy your solution to production and how often you encounter issues. Our pipeline template helps you establish DORA metrics. TODO: add guide for establishing pipeline using template and set up DORA metrics.
- Your pipeline will require certain documentation, especially if you are building a GxP data product. Our pipeline template contains documentation examples. TODO: add guide for documentation for data product creation.
- You will also need a first draft of a Data Contract and metadata for your Data Product. If you use our template, you will find these artifacts ready to be edited. TODO: explain how to utilize these.
- Register a business application for your data product in Service-Now, if you do not have one already. This is typically done by the System Manager in your area. The form requires details of the IT Solution Manager/System Manager and the IT Solution Owner/IT System Owner (usually from LoB).
- If you are building a Silver or Gold data product, you will typically have to do some data modelling. In AI Foundation we recommend Erwin as the tool for data modelling. See the Enterprise Data Model Hub for instructions to install and access Erwin. Find more on data modelling best practices here.
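For the repository step, generating a project from a Copier template can be sketched roughly as follows. This is a minimal illustration, not the Data Engineering Enablement procedure: the template location is left as a placeholder because the link is not yet published, and the `project_name` answer is a hypothetical question that the real template may not define. The helper only assembles a `copier copy` command (using Copier's `--data` and `--vcs-ref` options) so you can inspect it before running it.

```python
import shlex


def copier_copy_command(template_src, destination, answers=None, vcs_ref=None):
    """Build a `copier copy` command line as a list of arguments.

    template_src: path or Git URL of the template (placeholder in this sketch)
    destination:  folder where the new repository content is generated
    answers:      optional dict of template answers, passed as --data KEY=VALUE
    vcs_ref:      optional Git tag or branch of the template to check out
    """
    cmd = ["copier", "copy"]
    if vcs_ref:
        cmd += ["--vcs-ref", vcs_ref]
    for key, value in (answers or {}).items():
        cmd += ["--data", f"{key}={value}"]
    cmd += [template_src, destination]
    return cmd


# Hypothetical values: replace the placeholder with the real template link
# once it is available, and use the questions the template actually asks.
cmd = copier_copy_command(
    template_src="<pipeline-template-url>",
    destination="my-data-product",
    answers={"project_name": "my-data-product"},
)

# Print the command for review; to execute it, pass `cmd` to
# subprocess.run(cmd, check=True) on a machine where Copier is installed.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when answers contain spaces.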
After you complete these steps, you are ready to proceed with the Requirements Specification process, where you define the technical and functional requirements for your data product.
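To make the four DORA metrics from the steps above concrete, here is a minimal sketch of how they could be computed from a list of deployment records. It does not describe how the pipeline template collects them: the `Deployment` record shape, its field names, and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics over a reporting period of `period_days`."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]

    return {
        # Deployment Frequency: deployments per day in the period
        "deployment_frequency_per_day": n / period_days,
        # Lead Time for Changes: mean time from commit to production
        "lead_time_for_changes": sum(lead_times, timedelta()) / n,
        # Change Failure Rate: share of deployments that caused a failure
        "change_failure_rate": len(failures) / n,
        # Mean Time to Recovery: mean time from failed deployment to restore
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Sample data, invented for this example.
example = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9),
               failed=True, restored_at=datetime(2024, 1, 3, 11)),
    Deployment(datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 14)),
    Deployment(datetime(2024, 1, 8, 9), datetime(2024, 1, 8, 13)),
]

metrics = dora_metrics(example, period_days=10)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

For this sample data the sketch yields 0.4 deployments per day, a mean lead time of 10 hours, a change failure rate of 25%, and a mean time to recovery of 2 hours. In practice the records would come from your CI/CD and incident tooling rather than a hand-written list.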