Try the DE Templates (your quickstart guide)

This guide is a quick start for the Data Pipeline Templates. It is intended for sample and training purposes only, not for production use. For production deployments, refer to the complete setup guide.

Prerequisites

Request the following access in NovoAccess:

  • "AD Group : NN-DATABRICKS-TOYHUB"
  • "Github Access: NN Github Users"

Windows

Hatch:

pipx install "hatch<1.16.0"
pipx install hatch-pip-compile

Azure CLI:

pip install azure-cli

Databricks CLI:

curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

Copier:

pipx install "copier==9.*"
pipx inject copier copier-templates-extensions

Mac

Hatch:

pipx install "hatch<1.16.0"
pipx install hatch-pip-compile

Azure CLI:

brew install azure-cli

Databricks CLI:

brew tap databricks/tap
brew install databricks

Copier:

pipx install "copier==9.*"
pipx inject copier copier-templates-extensions
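
After installing, it can help to confirm that all four tools are on your PATH before continuing. The following is a minimal sketch for a POSIX shell (macOS, WSL, or Git Bash); the helper name `missing_tools` is illustrative and not part of the templates.

```shell
# Print the name of every given command that is not found on PATH.
# (Hypothetical helper, not shipped with the templates.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the four prerequisites named in this guide.
missing=$(missing_tools hatch az databricks copier)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools found."
fi
```

If a tool is reported as missing, rerun its install command from the section for your operating system and open a new terminal so that PATH changes take effect.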

Set up the project with Copier

Use Copier to scaffold the project:

mkdir sample_de_template && cd sample_de_template
copier copy --trust git@github.com:innersource-nn/dc-template-core.git .

If you don't have SSH keys set up, use the HTTPS URL instead:

copier copy --trust https://github.com/innersource-nn/dc-template-core .
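
The SSH and HTTPS forms above point at the same repository. As a rough sketch, a small shell helper can derive the HTTPS form from the SSH one; `ssh_to_https` is a hypothetical name introduced here for illustration, and the conversion assumes the usual `git@host:owner/repo.git` layout.

```shell
# Convert git@host:owner/repo.git into https://host/owner/repo.
# (Hypothetical helper; assumes the standard SSH remote layout.)
ssh_to_https() {
  printf '%s\n' "$1" | sed -E 's#^git@([^:]+):(.+)$#https://\1/\2#; s#\.git$##'
}

ssh_to_https "git@github.com:innersource-nn/dc-template-core.git"
```

The same helper can be reused for the sub-template repositories if you only have their SSH URLs at hand.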

When prompted, use the following values:

Example Prompts

🎤 Enter the project and git repository name.
my-first-databricks-deployment
🎤 Enter the databricks host url used for development.
https://adb-2611332145766578.18.azuredatabricks.net
🎤 Does the solution require classic Databricks compute?
No
🎤 Enter the deployment workspace from DATABRICKS_DATACORE variable.
main
🎤 Enter the service principal from DATABRICKS_DATACORE variable.
ingest
🎤 Select the Databricks runtime.
Serverless environment version 4
🎤 Azure AD group name that should have CAN_MANAGE permissions for development bundles.
NN-DATABRICKS-TOYHUB
🎤 Would you like to receive Databricks job status notifications on Microsoft Teams?
No
🎤 Would you like to apply additional templates?
Yes
🎤 Select the sub-templates to apply (use <space> to select).
No

Important

The latest version of dc-template-python-project requires an Azure DevOps Personal Access Token (PAT) to access the Azure package registry.

For this quickstart and for training, you can answer N at the Select the sub-templates to apply prompt so that the latest version of the Python template is not applied automatically.

You can then apply the previous version manually by running:

copier copy --trust -r 1.4.0 https://github.com/innersource-nn/dc-template-python-project .

Common issues

ADO Authentication Error

error: Failed to fetch: https://pkgs.dev.azure.com/novonordiskit/_packaging/datacraft/pypi/simple/databricks-sdk/
Caused by: Missing credentials for ...

Solution

  1. Create a Personal Access Token (PAT) with the Packaging: Read scope (see instructions).

  2. Then export your token:

    export AZURE_DEVOPS_EXT_PAT="<your token>"
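
Because the export only applies to the current terminal session, it can be worth checking that the variable is actually visible before rerunning the command that failed. This is a minimal sketch; `check_pat` is a hypothetical helper name, and it only tests that the variable is non-empty, not that the token is valid.

```shell
# Report whether AZURE_DEVOPS_EXT_PAT is set to a non-empty value.
# (Hypothetical helper; it does not validate the token itself.)
check_pat() {
  if [ -z "${AZURE_DEVOPS_EXT_PAT:-}" ]; then
    echo "AZURE_DEVOPS_EXT_PAT is not set"
    return 1
  fi
  echo "AZURE_DEVOPS_EXT_PAT is set"
}

check_pat || echo "Export the token in this terminal, then retry."
```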

Sometimes an additional step is required. If a prompt like the following appears:

> datacraft:
    password_env_var: AZURE_DEVOPS_EXT_PAT
    url: https://pkgs.dev.azure.com/novonordiskit/_packaging/datacraft/pypi/simple/
    username_for_keyring: VssSessionToken

Solution

Press Esc and then Enter.

Deploy sample jobs

  1. Authenticate to Azure via the Azure CLI using SSO.

    az login
    
  2. Deploy the bundle by running the Hatch entry point.

    hatch run databricks:deploy
    
  3. Open your Databricks jobs directly, or navigate manually:

    1. Visit Development datacore.
    2. Choose "Jobs & Pipelines" from the left menu.
    3. Choose "Owned by me" from the top filter.

  4. Find the jobs named my-first-databricks-deployment/dev/example and my-first-databricks-deployment/dev/main.

  5. Click either job, then click Run now.

Now you can explore the job output. You should also explore the repository, make changes to the jobs, and deploy again.
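
When you redeploy after each change, steps 1 and 2 above can be wrapped in a small script. The sketch below is one possible arrangement, not part of the templates: the `run` and `deploy_sample` helpers and the `DRY_RUN` variable are names introduced here for illustration. With `DRY_RUN=1` the commands are only printed, so you can preview them safely.

```shell
# Print a command, then execute it unless DRY_RUN is 1.
# (Hypothetical helper introduced for this sketch.)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Authenticate to Azure, then deploy the bundle; stop at the first failure.
deploy_sample() {
  run az login &&
  run hatch run databricks:deploy
}

# Preview the commands first; nothing is executed while DRY_RUN is 1.
DRY_RUN=1
deploy_sample
```

To perform the deployment for real, set `DRY_RUN=0` and call `deploy_sample` again from the project directory.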