Try the DE Templates (Your quickstart guide)
This guide provides a quick start for using the Data Pipeline Templates. This is intended for sample/training purposes only, not for production use. For production deployments, please refer to the complete setup guide.
Prerequisites
Request the following access in NovoAccess:
- "AD Group : NN-DATABRICKS-TOYHUB"
- "Github Access: NN Github Users"
Windows
Hatch:
pipx install "hatch<1.16.0"
pipx install hatch-pip-compile
Azure CLI:
pip install azure-cli
Databricks CLI:
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
Copier:
pipx install copier=="9.*"
pipx inject copier copier-templates-extensions
Mac
Hatch:
pipx install "hatch<1.16.0"
pipx install hatch-pip-compile
Azure CLI:
brew install azure-cli
Databricks CLI:
brew tap databricks/tap
brew install databricks
Copier:
pipx install copier=="9.*"
pipx inject copier copier-templates-extensions
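Before moving on, it can help to confirm that the command-line tools installed above are actually on your PATH. The following is a minimal sketch, assuming a POSIX shell (on Windows, for example Git Bash or WSL); `check_tools` is a hypothetical helper name, not part of the templates. It only checks executables, so plugins such as hatch-pip-compile are not covered.

```shell
# Hypothetical helper: print one status line per tool and
# return the number of tools that are missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# CLIs from the Prerequisites section; "az" is the Azure CLI executable.
check_tools hatch az databricks copier || echo "Install the missing tools before continuing."
```

A non-zero return value equals the number of missing tools, so the helper can also be used as a guard in a larger setup script.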
Set up the project with Copier
Use copier for scaffolding.
mkdir sample_de_template && cd sample_de_template
copier copy --trust git@github.com:innersource-nn/dc-template-core.git .
If you don't have SSH keys set up, use the HTTPS URL instead:
copier copy --trust https://github.com/innersource-nn/dc-template-core .
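If you are unsure whether your SSH keys work, the choice between the two URLs can be probed with `git ls-remote`. This is a rough sketch, not part of the templates: `pick_template_url` is a hypothetical helper, and it assumes that a non-interactive SSH check (`BatchMode=yes`) is a good enough signal. A passphrase-protected key without a running agent will fail this check and fall back to HTTPS.

```shell
# Hypothetical helper: echo the SSH URL ($1) if the repository is reachable
# over SSH without prompting, otherwise echo the HTTPS URL ($2).
pick_template_url() {
  if git -c core.sshCommand="ssh -o BatchMode=yes" ls-remote "$1" >/dev/null 2>&1; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Usage (run inside the empty project directory):
#   url=$(pick_template_url \
#     git@github.com:innersource-nn/dc-template-core.git \
#     https://github.com/innersource-nn/dc-template-core)
#   copier copy --trust "$url" .
```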
When prompted, use the following values:
Example Prompts
🎤 Enter the project and git repository name.
my-first-databricks-deployment
🎤 Enter the databricks host url used for development.
https://adb-2611332145766578.18.azuredatabricks.net
🎤 Does the solution require classic Databricks compute?
No
🎤 Enter the deployment workspace from DATABRICKS_DATACORE variable.
main
🎤 Enter the service principal from DATABRICKS_DATACORE variable.
ingest
🎤 Select the Databricks runtime.
Serverless environment version 4
🎤 Azure AD group name that should have CAN_MANAGE permissions for development bundles.
NN-DATABRICKS-TOYHUB
🎤 Would you like to receive Databricks job status notifications on Microsoft Teams?
No
🎤 Would you like to apply additional templates?
Yes
🎤 Select the sub-templates to apply (use <space> to select).
No
Important
The latest version of the dc-template-python-project requires an Azure PAT to access the Azure package registry.
For this quickstart and for training, you can answer N at the "Select the sub-templates to apply" prompt so that the latest version of the Python template is not applied automatically.
You can then apply the previous version manually by running the following command:
copier copy --trust -r 1.4.0 https://github.com/innersource-nn/dc-template-python-project .
Common issues
ADO Authentication Error
error: Failed to fetch: https://pkgs.dev.azure.com/novonordiskit/_packaging/datacraft/pypi/simple/databricks-sdk/
Caused by: Missing credentials for ...
Solution
- Create a Personal Access Token (PAT) with the Packaging: Read scope (see instructions).
- Then export your token:
export AZURE_DEVOPS_EXT_PAT="your Token"
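To catch a missing token before the package fetch fails, you could check the variable first. This is a minimal sketch for a POSIX shell; `require_env` is a hypothetical helper, and it only checks that the variable is non-empty, not that the token is valid or has the right scope.

```shell
# Hypothetical helper: return 0 if the named environment variable is set
# and non-empty, otherwise print an error to stderr and return 1.
require_env() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "error: $1 is not set; export it before retrying" >&2
    return 1
  fi
}

require_env AZURE_DEVOPS_EXT_PAT || echo "Create a PAT and export it first."
```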
Sometimes an additional step is required. If the following appears:
> datacraft:
password_env_var: AZURE_DEVOPS_EXT_PAT
url: https://pkgs.dev.azure.com/novonordiskit/_packaging/datacraft/pypi/simple/
username_for_keyring: VssSessionToken
Solution
Press Esc and then Enter.
Deploy sample jobs
- Authenticate to Azure via the Azure CLI using SSO.
az login
- Run an entry point and deploy the bundle via hatch.
hatch run databricks:deploy
- Visit my databricks jobs directly, or:
Navigate manually
- Visit Development datacore
- Choose "Jobs & Pipelines" from the left menu
- Choose "Owned by me" from the top filter
- You will find the jobs named:
my-first-databricks-deployment/dev/example and my-first-databricks-deployment/dev/main.
- Click either of them, then click Run now.
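The login and deploy steps above can be combined into one small function that skips `az login` when a session already exists. This is a sketch under assumptions: `deploy_bundle` is a hypothetical name, and it treats a successful `az account show` as "already logged in", which may not catch every expired-token case.

```shell
# Hypothetical wrapper around the two commands from this section.
deploy_bundle() {
  # "az account show" exits non-zero when no account is logged in.
  if ! az account show >/dev/null 2>&1; then
    az login || return 1
  fi
  # Deploy the bundle through the project's hatch entry point.
  hatch run databricks:deploy
}
```

Run `deploy_bundle` from the root of the generated project, since hatch resolves the `databricks:deploy` script from the project configuration in the current directory.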