Engineering Gaps - Orchestrate#
Version Control#
| Version | Date | Owner | Change Description |
|---|---|---|---|
| 0.1 | 18 March 2025 | Gareth Stretch | Initial Framework created |
| 0.2 | 23 March 2025 | Gareth Stretch | Moved Gaps to its own page |
Orchestration Gaps#
The diagram and section below give a high-level overview of the Azure Databricks data platform described in the to-be deliverable, with the relevant area highlighted for the gaps and recommendations.
Table: Orchestration Gaps#
| Area | Gap | Recommendation |
|---|---|---|
| Cross-stack orchestration | Orchestration is limited to workflows, which prevents end-to-end orchestration of pipelines that span multiple technologies, e.g. Databricks and Snowflake. | Use orchestration tools such as Databricks Jobs/Workflows, Apache Airflow, or Dagster to manage and automate pipeline tasks. |
| Automation | Make use of the Databricks API to improve automation and orchestration. | - Workflow automation: The Databricks API lets you automate complex workflows, reducing the need for manual intervention. This can include tasks such as starting and stopping clusters, running jobs, and managing libraries. - Integration: The API enables integration with other tools and services in your data ecosystem, including CI/CD pipelines, monitoring tools, and data ingestion frameworks, allowing for a more cohesive and streamlined data processing environment. - Real-time monitoring: The API provides endpoints for monitoring the status of clusters, jobs, and other resources in real time. This allows for proactive management and a quick response to any issues that arise. |
| Automation | Automatic publishing of the Data Marketplace | |
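
The automation and monitoring recommendations above can be illustrated with a minimal Python sketch, offered as an assumption-laden example rather than the platform's actual implementation. It triggers a job through the Databricks Jobs REST API (`POST /api/2.1/jobs/run-now`) and polls its status (`GET /api/2.1/jobs/runs/get`) until the run finishes. The helper names `make_http_transport` and `run_job_and_wait`, the polling interval, and the use of a personal access token are all hypothetical choices made for illustration; the workspace host, token, and job ID must be supplied by the caller.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Life-cycle states after which a run will not change any further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

# A transport takes (HTTP method, API path, optional JSON payload) and returns parsed JSON.
Transport = Callable[[str, str, Optional[dict]], dict]


def make_http_transport(host: str, token: str) -> Transport:
    """Build a function that sends authenticated JSON requests to a workspace.

    `host` is the workspace URL and `token` a bearer token (both assumptions
    supplied by the caller; nothing is hard-coded here).
    """

    def send(method: str, path: str, payload: Optional[dict] = None) -> dict:
        data = json.dumps(payload).encode() if payload is not None else None
        request = urllib.request.Request(
            f"{host}{path}",
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode())

    return send


def run_job_and_wait(
    send: Transport,
    job_id: int,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> dict:
    """Trigger a job run, then poll until it reaches a terminal state.

    Returns the run's `state` object, which includes `life_cycle_state`
    and, once finished, `result_state`.
    """
    run_id = send("POST", "/api/2.1/jobs/run-now", {"job_id": job_id})["run_id"]
    for _ in range(max_polls):
        run = send("GET", f"/api/2.1/jobs/runs/get?run_id={run_id}", None)
        state = run.get("state", {})
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A caller would build the transport once, for example `send = make_http_transport(host, token)`, and then call `run_job_and_wait(send, job_id)`. Passing the transport in as a parameter keeps the polling logic independent of the HTTP layer, so an external orchestrator such as Apache Airflow or Dagster could wrap the same function as one task in a pipeline that also includes non-Databricks steps.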