
Finalize data product registration#

Overview#

Data contracts serve as formal agreements that define the structure, semantics, and quality expectations for data shared between systems and teams. This chapter covers the creation, review, and approval process for data contracts within the data product lifecycle.

What you will learn#

After completing this chapter, you will understand how to:

  • Create comprehensive data contract specifications using YAML format
  • Configure data product settings through config.toml files
  • Navigate the review and approval process with stakeholders
  • Implement validation checks to ensure contract integrity
  • Link provider and consumer data products effectively

Key Personas & Stakeholders - RACI Matrix#

| Activity | Product Owner | Data Engineer | Solution Architect | Platform Architect | Data Owner | Business Owner |
|---|---|---|---|---|---|---|
| Data Contract Creation | A | R | C | C | C | I |
| Technical Review | C | R | A | C | I | I |
| Business Validation | A | C | I | I | R | R |
| Final Approval | A | I | I | I | R | C |

R = Responsible, A = Accountable, C = Consulted, I = Informed

Prerequisites#

Tip

Use dc-template-data-product, which sets up pre-configured data contract templates, folder structures, and NNDM publishing workflows.

Step-by-Step Process#

Step 1: Create Data Contract YAML Specification#

Define your data product's schema, data quality expectations and usage terms through a standardized YAML specification.

How to Create Your Data Contract:

  1. Set up the folder structure: Data Contract Specifications should be anchored in the data_products/contracts/ folder as outlined here
  2. Define your contract using the fields and attributes described below. There are two ways to do this:
    1. Manual creation: Use the template structure below and customize it for your data product
    2. Template-based creation: Use the dc-template-data-product template, which scaffolds the folder structure and an example data contract YAML file that you can customize for your data product

Data Contract YAML - Key Fields and Attributes#

Required Fields#

| Field | Description | Example |
|---|---|---|
| ID | Organization-wide unique technical identifier | my-data-contract |
| Version | Version of the data contract document | 0.0.1 |
| Description | Description of the Data Contract | "A description of the data contract" |
| Owner | Team responsible for managing the data contract | urn:team:0x44bb06dc(domain:commercial) |
| Status | Contract status: proposed, in-development, active, deprecated, retired | active |
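As an illustration of the table above, here is a minimal Python sketch that checks the required fields on a contract that has already been parsed from YAML into a dictionary. The function name `check_required_fields` is hypothetical, and the placement of version, description, owner, and status under `info` is an assumption based on the sample template in this chapter. It is not the check the NNDM tooling performs.

```python
# Statuses listed in the Required Fields table.
ALLOWED_STATUSES = {"proposed", "in-development", "active", "deprecated", "retired"}


def check_required_fields(contract: dict) -> list[str]:
    """Return a list of problems; an empty list means all required fields are set."""
    problems = []
    info = contract.get("info") or {}
    if not contract.get("id"):
        problems.append("missing id")
    # Assumption: these fields live under `info`, as in the sample template.
    for key in ("version", "description", "owner", "status"):
        if not info.get(key):
            problems.append(f"missing info.{key}")
    status = info.get("status")
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems


contract = {
    "id": "my-data-contract",
    "info": {
        "version": "0.0.1",
        "description": "A description of the data contract.",
        "owner": "urn:team:0x44bb06dc(domain:commercial)",
        "status": "active",
    },
}
print(check_required_fields(contract))
```

Returning a list of problems rather than raising on the first one lets a reviewer see every missing field in a single pass.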

Contact Information#

| Field | Description | Format |
|---|---|---|
| Contact Name | Name of contact person/organization | String |
| Contact URL | URL to contact information | Valid URL format |
| Contact Email | Email address of contact | Valid email format |

Server Configuration#

| Field | Description | Options |
|---|---|---|
| Server Type | Type of data server | S3, Snowflake, etc. |
| Environment | Deployment environment | prod, dev, test |
| Location | Physical location/path to data | S3 bucket path, database connection |

Terms and Conditions#

| Field | Description | Purpose |
|---|---|---|
| Usage | Usage terms for the data contract | Define allowed use cases |
| Limitations | Usage limitations | Specify restrictions |
| Billing | Cost information | Financial implications |
| Notice Period | Consumer notice period | ISO 8601 format (e.g., P3M) |
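The notice period (and the retention period described later) is written as an ISO 8601 duration such as P3M or P1Y. The sketch below shows one way to sanity-check such a value before review. The helper name `is_iso8601_duration` is hypothetical, and the pattern is a simplification that accepts whole-number components only and ignores fractional values.

```python
import re

# Simplified ISO 8601 duration pattern: P[nY][nM][nW][nD][T[nH][nM][nS]].
# The lookaheads reject a bare "P" and a trailing "T" with no time component.
_DURATION = re.compile(
    r"P(?=\d|T\d)"
    r"(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?"
)


def is_iso8601_duration(value: str) -> bool:
    """Return True if value looks like an ISO 8601 duration (integer components only)."""
    return _DURATION.fullmatch(value) is not None


for candidate in ("P3M", "P1Y", "PT12H", "3 months", "P"):
    print(candidate, is_iso8601_duration(candidate))
```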

Service Level Agreements#

Tip

Configure appropriate SLAs based on your data product's criticality and consumer expectations.

Availability#

  • Description: Uptime guarantee description
  • Percentage: Guaranteed uptime (e.g., 99.9%)

Retention#

  • Description: Data retention policy description
  • Period: How long data is available (e.g., P1Y for 1 year)
  • Unlimited: Boolean for permanent retention

Latency#

  • Description: Maximum processing time description
  • Threshold: Maximum duration from source to destination
  • Source/Processed Timestamp Fields: Reference fields for timing

Freshness#

  • Description: Data freshness requirements
  • Threshold: Maximum age of youngest entry
  • Timestamp Field: Reference field for freshness calculation
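To make the freshness definition concrete, the following sketch compares the age of the youngest (most recent) entry against a threshold. The function name `is_fresh` and the sample timestamps are illustrative assumptions. In practice the timestamps would come from the field named as the freshness timestamp field.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(timestamps, threshold, now=None):
    """Return True if the youngest entry is no older than the threshold."""
    if not timestamps:
        return False  # no data at all cannot satisfy a freshness guarantee
    now = now or datetime.now(timezone.utc)
    youngest = max(timestamps)  # the most recent timestamp
    return now - youngest <= threshold


# Hypothetical values: two rows, checked at midnight UTC on 2021-01-02.
check_time = datetime(2021, 1, 2, 0, 0, tzinfo=timezone.utc)
rows = [
    datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2021, 1, 1, 18, 0, tzinfo=timezone.utc),
]
print(is_fresh(rows, timedelta(hours=25), check_time))
print(is_fresh(rows, timedelta(hours=1), check_time))
```

A latency check follows the same shape, except that it compares the source timestamp field with the processed timestamp field of each row instead of comparing against the current time.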

Frequency#

  • Description: Update frequency description
  • Type: Processing type (batch, streaming, manual)
  • Interval/Cron: Schedule information

Example

Create your data contract using the Data Contract Specifications standard. Below is the template structure:

Sample Data Contract YAML Template

dataContractSpecification: 0.9.3 # CI tool supports versions up to 1.2.0
id: my-data-contract # For NNEDH it should be "DatasetURI_ContractName"; otherwise use the data contract name in lower case separated by '-'
info:
  title: MyDataContract
  version: 0.0.1
  owner: urn:team:0x44bb06dc(domain:commercial)
  description: A description of the data contract.
  status: active
  contact:
    name: Anurag Daipuriya
    email: GDIY@novonordisk.com # Mandatory field for NNEDL: the Data Steward's email ID
terms:
  usage: Data can be used for analytical purposes.
  limitations: Not suitable for real-time use cases.
models:
  my_table:
    type: table
    description: description of table.
    fields:
      my_column_1:
        description: The technical ID
        type: string
        format: uuid
        primaryKey: true
        examples:
          - 123e4567-e89b-12d3-a456-426614174000
      my_column_2:
        description: The business timestamp of the transaction
        type: timestamp
        examples:
          - 2021-01-01T00:00:00Z
      my_column_3:
        description: The amount of the transaction
        type: long
        examples:
          - 123.45
servers:
  production:
    type: s3
    environment: prod
    location: s3://dhqcglimsrawzonekeoryseucentral1/LABVANTAGE/S_SAMPLE/_symlink_format_manifest
    format: parquet
    delimiter: new_line
    database: nnedl
    dataset: "dhlprdglobal,glookodh" # Mandatory for NNEDL source system contracts
    description: S3 server details.
definitions:
  order_id:
    title: Order ID
    type: text
    format: uuid
    description: An internal ID that identifies an order in the online shop.
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders
# Custom fields for extra information, mandatory for the NNEDL source system
ADGroupNames: "NNEDL:GLOOKODH_Developer,NNEDL:GLOOKODH_Reader"
DatasetNames: "dhlprdglobal,glookodh"
servicelevels:
  retention:
    description: Data is retained for one year
    period: P1Y
    unlimited: false
  frequency:
    description: Data is delivered once a day at midnight UTC
    type: batch
    cron: 0 0 * * *

Step 2: Configure Data Product Settings#

Configure your data product metadata and NNDM integration for publishing.

Create a config.toml file with your data product configuration:

Configuration Options:

  • Manual creation: Create the file using the template structure below, customizing each field for your product
  • Template-based creation: Use dc-template-data-product, which includes a working config.toml example with NNDM integration and workflows for publishing the data product.

Example

Sample Config.toml Template

# ------MANDATORY
# sourceId is the ServiceNow ID of the source system associated with the products defined here. This is an integer. The source system must already be onboarded by the NNDM team.
# The source system should be the one where the data is located, e.g. NNEDL or Datahub (NNEDH).
sourceId=13849
# ------MANDATORY
# teamId is the internal ID provided by the NNDM team for this product. This is a literal.
teamId='urn:team:platform:NNDM(domain:test)'
[[product]]
# ------MANDATORY
# productId can be a literal. This should be the internal ID that is assigned to the product.
# When sourceId is 11593 (NNEDH), use the URI ID of the dataset from Datahub; otherwise use the product name in lower case separated by '-'.
productId='NewCITEST'
# ------MANDATORY
# productName can be a literal. This should be the name of the data product that is being created/updated in CMDB.
productName='newci-test-1'
# ------MANDATORY
# productDescription can be a multi-line string. This should be the internal description of the data product. HTML tags are supported.
productDescription="""This is a test data product.
Can be multiline"""
# ------MANDATORY
# productStatus is the status of the data product. Must be one of: proposed, in-development, active, or retired.
productStatus="retired"
# ------MANDATORY
# productArchetype is the archetype of the data product. Must be one of: consumer-aligned, aggregate, source-aligned.
productArchetype="aggregate"
# ------NON MANDATORY
# productMaturity is the maturity of the data product: raw, defined, or managed.
productMaturity="raw"
# ------MANDATORY
# rootContract is a literal that contains the relative path to all the contract YAML or JSON files associated with this product.
rootContract='./p1contract'

# ------NON MANDATORY
# Links to add to the data product.
productLinks=[ "link1: https://test1.org", "link2: https://test2.com", "link3: https://test3.in" ]

# ------NON MANDATORY
# Managed tags to add to the data product.
# Managed tags are those that are approved and present in the tag list of the NNDM UI.
productTags= ["tag1","tag2"]

# -------CONDITIONAL MANDATORY
# Mandatory custom fields for the data product:
# For the NNEDL source system (10689):
# "AD Group Names" (can be left blank)
# "Dataset Names" (must have a value)
# "Dataset Steward Email" (must have a value)
# For the NNEDH source system (11593):
# "Dataset URI" (must have a value)

customFields= [ "AD Group Names : NNEDL : GLOOKODH_Developer", "Dataset Names : ABC ", "Dataset Steward Email : ABC@novonordisk.com" ]

# ---------CONDITIONAL MANDATORY
# The attributes below are mandatory if you need to link provider data products with a consumer data product.

accessIds=["link-testprod1-testprod123", "link-dataproduct-test123"] # Request access IDs chosen by the user; any name that follows the naming convention
providerTeamIds=["test_domain_team", "test_domain_team"] # NNDM team IDs of the source (provider) data products
providerDataProductIds=["urn:data:source:0014395(product:0x241f0512)","369a6dd1-e217-460c-8987-1ba5f8155b11"] # IDs of the source data products
providerOutputPortIds=["urn:data:source:0014395(product:0x241f0512.ops:0x94530872)","my-output-port"] # Output port IDs of the source data products
consumerTeamId="test_domain_team" # Team ID of the consumer data product
consumerDataProductId="urn:data:source:0014395(product:0x2cf705a2)" # Data product ID of the consumer data product

You can replace the default.yaml contract generated by the dc-template-data-product template with your own data contract file. If you do, make sure the rootContract parameter in data_products/config.toml (for example, rootContract='./data_products/contracts') points to it. The other settings are populated with the values you provide when initialising the template for your project.

Data Product Config File - Key Attributes#

Mandatory Fields#

| Field | Description | Format | Example |
|---|---|---|---|
| sourceId | ServiceNow ID for source system | Integer (up to 7 digits) | 13849 |
| teamId | NN Data Marketplace Team ID | String literal | 'urn:team:platform:NNDM(domain:test)' |
| productId | Data Product ID | String literal | 'NewCITEST' |
| productName | Data Product Name | String literal | 'newci-test-1' |
| productDescription | Product description (HTML supported) | Multi-line string | See template above |
| productStatus | Product lifecycle status | proposed, in-development, active, retired | active |
| productArchetype | Product type classification | source-aligned, aggregate, consumer-aligned | aggregate |
| rootContract | Path to contract files | Relative path | './p1contract' |

Product Archetypes - Choosing the Right Archetype

Source-aligned - Minimal transformation from operational systems. Ideal for initial data product creation.
Aggregate - Combined data from multiple sources. Perfect for corporate-level KPIs.
Consumer-aligned - Transformed for specific use cases. Optimized for BI and analytics.

Optional Fields#

| Field | Description | Options | Example |
|---|---|---|---|
| productMaturity | Data product maturity level | raw, defined, managed | defined |
| productTags | Managed tags from NNDM UI | Array of strings | ["tag1","tag2"] |
| productLinks | Additional reference links | Array of name:URL pairs | ["link1: https://test1.org"] |

Linking Data Products#

Prerequisites for Linking

  • Provider and consumer data products must already exist
  • Provider data products must have output ports
  • All team IDs and product IDs must be valid

Linking Configuration#

| Field | Description | Example |
|---|---|---|
| accessIds | Request Access IDs | ["link-testprod1-testprod123"] |
| providerTeamIds | NNDM Team IDs of provider products | ["test_domain_team"] |
| providerDataProductIds | Source Data Product IDs | ["urn:data:source:0014395(product:0x241f0512)"] |
| providerOutputPortIds | Output port IDs | ["urn:data:source:0014395(product:0x241f0512.ops:0x94530872)"] |
| consumerTeamId | Consumer team ID | "test_domain_team" |
| consumerDataProductId | Consumer data product ID | "urn:data:source:0014395(product:0x2cf705a2)" |
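In the sample config.toml, each of the four provider arrays has the same number of entries, which suggests that entry i of each array describes the same link. Under that assumption (it is not stated explicitly in this chapter), a quick consistency check could look like the sketch below. The function name `check_linking` is hypothetical.

```python
PROVIDER_LISTS = (
    "accessIds", "providerTeamIds", "providerDataProductIds", "providerOutputPortIds",
)
CONSUMER_KEYS = ("consumerTeamId", "consumerDataProductId")


def check_linking(product: dict) -> list[str]:
    """Return problems with the linking attributes of one [[product]] entry."""
    all_keys = PROVIDER_LISTS + CONSUMER_KEYS
    if not any(key in product for key in all_keys):
        return []  # linking was not requested for this product
    problems = [f"missing {key}" for key in all_keys if key not in product]
    # Assumption: provider arrays are parallel, so their lengths must match.
    lengths = {key: len(product[key]) for key in PROVIDER_LISTS if key in product}
    if len(set(lengths.values())) > 1:
        problems.append(f"provider lists differ in length: {lengths}")
    return problems


product = {
    "accessIds": ["link-testprod1-testprod123", "link-dataproduct-test123"],
    "providerTeamIds": ["test_domain_team", "test_domain_team"],
    "providerDataProductIds": [
        "urn:data:source:0014395(product:0x241f0512)",
        "369a6dd1-e217-460c-8987-1ba5f8155b11",
    ],
    "providerOutputPortIds": [
        "urn:data:source:0014395(product:0x241f0512.ops:0x94530872)",
        "my-output-port",
    ],
    "consumerTeamId": "test_domain_team",
    "consumerDataProductId": "urn:data:source:0014395(product:0x2cf705a2)",
}
print(check_linking(product))
```

This only checks the shape of the configuration. Whether the referenced products, teams, and output ports actually exist can only be confirmed against NNDM itself.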

Validation Checks#

Duplicate Prevention#

Validation Rules

The system enforces several validation checks to maintain data integrity.

Contract Files#

  • Unique Model Requirement: Two YAML contract files within the same contract folder cannot have the same model
  • Each contract file must define a unique model to ensure distinct identification

Data Products and Contracts#

  1. Data Product: No two products in config.toml can have the same productId or productName
  2. Data Contract: No two contracts in a repository can have identical model details
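The duplicate rules above can be approximated locally. The sketch below works on already-parsed data: a list of product dictionaries from config.toml and a mapping from contract file name to parsed contract. The function names are hypothetical, and treating "identical model details" as "same model name" is a simplifying assumption.

```python
from collections import Counter


def duplicates(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


def check_unique_products(products: list[dict]) -> list[str]:
    """Flag productId or productName values shared by more than one product."""
    problems = []
    for key in ("productId", "productName"):
        present = [product[key] for product in products if key in product]
        problems.extend(f"duplicate {key}: {value}" for value in duplicates(present))
    return problems


def check_unique_models(contracts: dict[str, dict]) -> list[str]:
    """Flag model names that are defined in more than one contract file."""
    first_seen = {}
    problems = []
    for filename, contract in sorted(contracts.items()):
        for model in contract.get("models", {}):
            if model in first_seen:
                problems.append(
                    f"model {model} defined in both {first_seen[model]} and {filename}"
                )
            else:
                first_seen[model] = filename
    return problems


products = [
    {"productId": "a", "productName": "n1"},
    {"productId": "a", "productName": "n2"},
]
contracts = {
    "a.yaml": {"models": {"my_table": {}}},
    "b.yaml": {"models": {"my_table": {}}},
}
print(check_unique_products(products))
print(check_unique_models(contracts))
```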

Source-Specific Requirements#

NNEDL Source System (10689)#

Custom Fields for Data Product:

  • AD Group Names (can be left blank)
  • Dataset Names (must have a value)
  • Dataset Steward Email (must have a value)

Custom Fields for Data Contract:

  • AD Group Names (can be left blank)
  • Dataset Names (must have a value)
  • Dataset Steward Email (must have a value)
  • Contact Email (must contain the Data Steward's email address)

NNEDH Source System (11593)#

Custom Fields:

  • Dataset URI (must have a value)
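The source-specific rules for data product custom fields can be expressed as a small lookup table. The sketch below assumes the "Name : value" string format used by customFields in the sample config.toml and splits each entry on its first colon. The function names are hypothetical, and the check covers only the data product fields listed above.

```python
NNEDL, NNEDH = 10689, 11593

# Field name -> whether a non-blank value is required.
REQUIRED_CUSTOM_FIELDS = {
    NNEDL: {
        "AD Group Names": False,  # must be present, may be blank
        "Dataset Names": True,
        "Dataset Steward Email": True,
    },
    NNEDH: {"Dataset URI": True},
}


def parse_custom_fields(entries: list[str]) -> dict[str, str]:
    """Split 'Name : value' entries on the first colon only."""
    fields = {}
    for entry in entries:
        name, _, value = entry.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def check_custom_fields(source_id: int, entries: list[str]) -> list[str]:
    """Return problems for the custom fields required by the given source system."""
    fields = parse_custom_fields(entries)
    problems = []
    for name, needs_value in REQUIRED_CUSTOM_FIELDS.get(source_id, {}).items():
        if name not in fields:
            problems.append(f"missing custom field: {name}")
        elif needs_value and not fields[name]:
            problems.append(f"custom field has no value: {name}")
    return problems


entries = [
    "AD Group Names : NNEDL : GLOOKODH_Developer",
    "Dataset Names : ABC ",
    "Dataset Steward Email : ABC@novonordisk.com",
]
print(check_custom_fields(NNEDL, entries))
```

Splitting on the first colon only keeps values that themselves contain colons, such as AD group names or URIs, intact.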

Step 3: Review with Product Owner and Data Owner#

Schedule review sessions with key stakeholders to validate:

  • Technical Accuracy: Ensure all field definitions, data types, and constraints are correct
  • Business Alignment: Confirm the contract meets business requirements and use cases
  • Compliance Requirements: Verify adherence to data governance and regulatory standards

Step 4: Obtain Consumer Approval#

Present the finalized data contract to data consumers for approval:

  • Usage Terms: Clearly communicate access limitations and usage guidelines
  • Service Level Agreements: Confirm availability, latency, and freshness requirements
  • Support Procedures: Establish contact points and escalation processes

Step 5: Publish to NNDM (later in the Deploy Phase)#

Make your data contract available through the NNDM platform for discovery and consumption.

Publishing Process:

  1. Validate Configuration: Ensure your config.toml and YAML files pass all validation checks
  2. Set up CI/CD Pipeline: Use the NNDM GitHub Actions for automated NNDM publishing
  3. Verify Registration: Confirm your data product appears in NNDM catalog with correct metadata

Template-Accelerated Publishing

The dc-template-data-product template provides pre-configured CI/CD workflows for NNDM integration, as well as the example YAML and config files required by the NNDM platform.

Success Metrics & Checkpoints#

  • Contract Creation: Data contract YAML file created with all required fields
  • Configuration Setup: Config.toml file properly configured with mandatory fields
  • Technical Review: Data Engineer and Solution Architect have validated technical specifications
  • Business Validation: Product Owner and Data Owner have approved business requirements
  • Consumer Approval: Data consumers have formally approved the contract terms
  • Validation Checks: All duplicate checks and source-specific requirements pass
  • Documentation: Contract specifications anchored in correct folder structure

Common Challenges & Solutions#

  • Challenge: Schema drift concerns affecting downstream consumers
  • Solution: Set flagSchemaDrift to true for critical data products
  • Prevention: Implement robust testing and communication processes for schema changes

  • Challenge: Confusion about product archetype selection
  • Solution: Use the archetype decision tree: raw operational data → source-aligned, multiple source aggregation → aggregate, business-specific transformation → consumer-aligned
  • Prevention: Document archetype rationale in product description

  • Challenge: Missing mandatory custom fields for specific source systems
  • Solution: Reference the source-specific requirements section and ensure all fields are populated
  • Prevention: Create validation checklists for each supported source system

Next Steps#

After completing data contract creation and approval:

  1. Proceed to implementation phase using approved contracts
  2. Set up monitoring and alerting for contract compliance
  3. Establish regular review cycles for contract updates
  4. Begin consumer onboarding and access provisioning

Additional Resources#