Finalize data product registration#
Overview#
Data contracts serve as formal agreements that define the structure, semantics, and quality expectations for data shared between systems and teams. This chapter covers the creation, review, and approval process for data contracts within the data product lifecycle.
What you will learn#
After completing this chapter, you will understand how to:
- Create comprehensive data contract specifications using YAML format
- Configure data product settings through config.toml files
- Navigate the review and approval process with stakeholders
- Implement validation checks to ensure contract integrity
- Link provider and consumer data products effectively
Key Personas & Stakeholders - RACI Matrix#
| Activity | Product Owner | Data Engineer | Solution Architect | Platform Architect | Data Owner | Business Owner |
|---|---|---|---|---|---|---|
| Data Contract Creation | A | R | C | C | C | I |
| Technical Review | C | R | A | C | I | I |
| Business Validation | A | C | I | I | R | R |
| Final Approval | A | I | I | I | R | C |
R = Responsible, A = Accountable, C = Consulted, I = Informed
Prerequisites#
- Completed Requirements Specification
- Finalized Risk Assessment draft in IRM
- Gained access to NNDM platform with necessary permissions
- Knowledge of YAML syntax and Data Modeling concepts
- Repository set up with proper folder structure for data contracts
Tip
Use dc-template-data-product, which sets up pre-configured data contract templates, folder structures, and NNDM publishing workflows.
Step-by-Step Process#
Step 1: Create Data Contract YAML Specification#
Define your data product's schema, data quality expectations and usage terms through a standardized YAML specification.
How to Create Your Data Contract:
- Set up the folder structure: Data Contract Specifications should be anchored in the data_products/contracts/ folder as outlined here
- Define your contract specifications as per the fields and attributes described below. You can do this in two ways:
  - Manual creation: Use the template structure below and customize for your data product
  - Template-based creation: You can use the dc-template-data-product template that quickly scaffolds the folder structure and an example data contract YAML file that you can customize according to your data product.
Data Contract YAML - Key Fields and Attributes#
Required Fields#
| Field | Description | Example |
|---|---|---|
| ID | Organization-wide unique technical identifier | my-data-contract |
| Version | Version of the data contract document | 0.0.1 |
| Description | Description of the Data Contract | "A description of the data contract" |
| Owner | Team responsible for managing the data contract | urn:team:0x44bb06dc(domain:commercial) |
| Status | Contract status: proposed, in-development, active, deprecated, retired | active |
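The required fields above lend themselves to a quick pre-review check. The following is a minimal sketch, not part of the NNDM tooling: it assumes the contract YAML has already been parsed into a Python dictionary, and that `id` sits at the top level while the other required fields sit under `info`, as in the sample template in this chapter. The function name `check_required_fields` is hypothetical.

```python
# Allowed status values, taken from the Required Fields table.
ALLOWED_STATUSES = {"proposed", "in-development", "active", "deprecated", "retired"}

# Assumed location of each required field in the parsed YAML
# (based on the sample template: `id` at top level, the rest under `info`).
REQUIRED_PATHS = {
    "ID": ("id",),
    "Version": ("info", "version"),
    "Description": ("info", "description"),
    "Owner": ("info", "owner"),
    "Status": ("info", "status"),
}


def check_required_fields(contract):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for label, path in REQUIRED_PATHS.items():
        node = contract
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if node is None or str(node).strip() == "":
            problems.append(f"missing required field: {label}")

    info = contract.get("info")
    status = info.get("status") if isinstance(info, dict) else None
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"invalid status: {status}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every gap in a single pass.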
Contact Information#
| Field | Description | Format |
|---|---|---|
| Contact Name | Name of contact person/organization | String |
| Contact URL | URL to contact information | Valid URL format |
| Contact Email | Email address of contact | Valid email format |
Server Configuration#
| Field | Description | Options |
|---|---|---|
| Server Type | Type of data server | S3, Snowflake, etc. |
| Environment | Deployment environment | prod, dev, test |
| Location | Physical location/path to data | S3 bucket path, database connection |
Terms and Conditions#
| Field | Description | Purpose |
|---|---|---|
| Usage | Usage terms for the data contract | Define allowed use cases |
| Limitations | Usage limitations | Specify restrictions |
| Billing | Cost information | Financial implications |
| Notice Period | Consumer notice period | ISO 8601 format (e.g., P3M) |
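The Notice Period (and the retention period further down) uses ISO 8601 duration notation, where `P3M` means three months and `P1Y` one year. As an illustration only, the sketch below parses the common date-and-time form with a regular expression. The helper name `parse_duration` is hypothetical, and the week form (`P2W`) and fractional values are deliberately left out.

```python
import re

# Matches durations such as P3M, P1Y, P1DT12H, PT30M.
# Before the "T" separator, M means months; after it, M means minutes.
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Return the duration components as a dict of ints, or None if invalid."""
    match = _DURATION.match(text)
    if match is None:
        return None
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    # "P" and "PT" match the pattern but carry no component, so reject them.
    return parts or None
```

For example, `parse_duration("P3M")` yields `{"months": 3}`, while a free-text value such as `"3 months"` yields `None`.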
Service Level Agreements#
Tip
Configure appropriate SLAs based on your data product's criticality and consumer expectations.
Availability#
- Description: Uptime guarantee description
- Percentage: Guaranteed uptime (e.g., 99.9%)
Retention#
- Description: Data retention policy description
- Period: How long data is available (e.g., P1Y for 1 year)
- Unlimited: Boolean for permanent retention
Latency#
- Description: Maximum processing time description
- Threshold: Maximum duration from source to destination
- Source/Processed Timestamp Fields: Reference fields for timing
Freshness#
- Description: Data freshness requirements
- Threshold: Maximum age of youngest entry
- Timestamp Field: Reference field for freshness calculation
Frequency#
- Description: Update frequency description
- Type: Processing type (batch, streaming, manual)
- Interval/Cron: Schedule information
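A frequency block is easy to get subtly wrong, for instance by writing a cron expression without the spaces between its fields. The sketch below shows one way to catch that early. It assumes the block has been parsed into a dictionary with the keys `type` and `cron` (the names used in the sample template) and that schedules follow the conventional five-field cron layout; if your scheduler uses a different layout, adjust the count. `check_frequency` is a hypothetical helper.

```python
# Processing types listed in the Frequency section above.
ALLOWED_TYPES = {"batch", "streaming", "manual"}


def check_frequency(frequency):
    """Return a list of problems found in a parsed `frequency` block."""
    problems = []
    if frequency.get("type") not in ALLOWED_TYPES:
        problems.append("type must be one of: batch, manual, streaming")

    cron = frequency.get("cron")
    # Conventional cron: minute, hour, day of month, month, day of week.
    if cron is not None and len(str(cron).split()) != 5:
        problems.append("cron must have five space-separated fields")
    return problems
```

This only checks the shape of the expression, not whether each field holds a legal value.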
Example
Create your data contract using the Data Contract Specifications standard. Below is the template structure:
Sample Data Contract YAML Template
dataContractSpecification: 0.9.3 # CI tool supported till 1.2.0 version
id: my-data-contract # For NNEDH it should be "DatasetURI_ContractName"; otherwise use the data contract name in lower case separated by '-'
info:
  title: MyDataContract
  version: 0.0.1
  owner: urn:team:0x44bb06dc(domain:commercial)
  description: A description of the data contract.
  status: active
  contact:
    name: Anurag Daipuriya
    email: GDIY@novonordisk.com # Mandatory field for NNEDL for Data Steward email ID.
terms:
  usage: Data can be used for analytical purposes.
  limitations: Not suitable for real-time use cases.
models:
  my_table:
    type: table
    description: description of table.
    fields:
      my_column_1:
        description: The technical ID
        type: string
        format: uuid
        primaryKey: true
        examples:
          - 123e4567-e89b-12d3-a456-426614174000
      my_column_2:
        description: The business timestamp of the transaction
        type: timestamp
        examples:
          - 2021-01-01T00:00:00Z
      my_column_3:
        description: The amount of the transaction
        type: long
        examples:
          - 123.45
servers:
  production:
    type: s3
    environment: prod
    location: s3://dhqcglimsrawzonekeoryseucentral1/LABVANTAGE/S_SAMPLE/_symlink_format_manifest
    format: parquet
    delimiter: new_line
    database: nnedl
    dataset: "dhlprdglobal,glookodh" # Mandatory for NNEDL source system contract
    description: S3 server details.
definitions:
  order_id:
    title: Order ID
    type: text
    format: uuid
    description: An internal ID that identifies an order in the online shop.
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders
# Custom fields for extra information, mandatory for NNEDL source system
ADGroupNames: "NNEDL:GLOOKODH_Developer,NNEDL:GLOOKODH_Reader"
DatasetNames: "dhlprdglobal,glookodh"
servicelevels:
  retention:
    description: Data is retained for one year
    period: P1Y
    unlimited: false
  frequency:
    description: Data is delivered once a day at midnight UTC
    type: batch
    cron: 0 0 * * *
Step 2: Configure Data Product Settings#
Configure your data product metadata and NNDM integration for publishing.
Create a config.toml file with your data product configuration:
Configuration Options:
- Manual creation: Create the file using the template structure below, customizing each field for your product
- Template-based creation: You can use dc-template-data-product, which includes a working config.toml example with NNDM integration and workflows for publishing the data product.
Example
Sample Config.toml Template
# ------MANDATORY
# sourceId is the ServiceNow ID of the source system associated with the products defined here. This is an integer. The source system should already be onboarded by the NNDM team.
# The source system should be the one where the data is located, e.g. NNEDL or Datahub (NNEDH).
sourceId=13849
# ------MANDATORY
# teamId is the internal ID provided by the NNDM team for this product. This is a literal.
teamId='urn:team:platform:NNDM(domain:test)'
[[product]]
# ------MANDATORY
# productId can be a literal. This should be the internal ID that is assigned for the product
# When sourceId is 11593 (NNEDH), the URI ID of the dataset from Datahub must be used in this attribute; otherwise use the product name in lower case separated by '-'.
productId='NewCITEST'
# ------MANDATORY
# productName can be a literal. This should be the name of the data product that is being created/updated in CMDB.
productName='newci-test-1'
# ------MANDATORY
# productDescription can be a multi line string. This should be the internal description of the data product. HTML tags are supported.
productDescription="""This is a test data product.
Can be multiline"""
# ------MANDATORY
# productStatus is the status for the data product. Should be one of : proposed,in-development,active or retired.
productStatus="retired"
# ------MANDATORY
# productArchetype is the Archetype for the data product. Should be one of : consumer-aligned, aggregate, source-aligned
productArchetype="aggregate"
# ------NON MANDATORY
# productMaturity is the maturity for the data product. raw, defined or managed
productMaturity="raw"
# ------MANDATORY
# rootContract is a literal that contains the relative path to all the contract YAML or JSON files associated with this product.
rootContract='./p1contract'
# ------NON MANDATORY
# To add the links in the data product.
productLinks=[ "link1: https://test1.org", "link2: https://test2.com", "link3: https://test3.in" ]
# ------NON MANDATORY
# To add the managed tags in the data product.
# Managed Tags are ones which are approved and present in the Tag list of NNDM UI.
productTags= ["tag1","tag2"]
# -------CONDITIONAL MANDATORY
#Mandatory Custom Fields for Data Product:
#For NNEDL source system(10689):
#"AD Group Names" (can be left blank)
#"Dataset Names" (must have a value)
#"Dataset Steward Email" (must have a value)
#For NNEDH source system (11593):
#"Dataset URI" (must have a value)
customFields= [ "AD Group Names : NNEDL : GLOOKODH_Developer", "Dataset Names : ABC ", "Dataset Steward Email : ABC@novonordisk.com" ]
# ---------CONDITIONAL MANDATORY
# The attributes below are mandatory if you need to link provider data products with a consumer data product.
accessIds=["link-testprod1-testprod123", "link-dataproduct-test123"] # Request Access IDs; user-defined names that follow the naming convention
providerTeamIds=["test_domain_team", "test_domain_team"] # NNDM Team Ids of Source(provider) Data products
providerDataProductIds=["urn:data:source:0014395(product:0x241f0512)","369a6dd1-e217-460c-8987-1ba5f8155b11"] # Source Data Products' IDs
providerOutputPortIds=["urn:data:source:0014395(product:0x241f0512.ops:0x94530872)","my-output-port"] # Output port id of Source data products
consumerTeamId="test_domain_team" # TeamId of consumer data product
consumerDataProductId="urn:data:source:0014395(product:0x2cf705a2)" # Data product ID of the consumer data product
You can replace the default.yaml contract generated by the dc-template-data-product template with your own data contract file. If you do, make sure the rootContract parameter in data_products/config.toml still points to the folder that holds it (rootContract='./data_products/contracts'). The remaining settings are populated with the values you provide when initialising the template for your project.
Data Product Config File - Key Attributes#
Mandatory Fields#
| Field | Description | Format | Example |
|---|---|---|---|
| sourceId | ServiceNow ID for source system | Integer (up to 7 digits) | 13849 |
| teamId | NN Data Marketplace Team ID | String literal | 'urn:team:platform:NNDM(domain:test)' |
| productId | Data Product ID | String literal | 'NewCITEST' |
| productName | Data Product Name | String literal | 'newci-test-1' |
| productDescription | Product description (HTML supported) | Multi-line string | See template above |
| productStatus | Product lifecycle status | proposed, in-development, active, retired | active |
| productArchetype | Product type classification | source-aligned, aggregate, consumer-aligned | aggregate |
| rootContract | Path to contract files | Relative path | './p1contract' |
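The mandatory fields in this table can be checked before a publishing run is triggered. The sketch below is illustrative only and does not reproduce the checks the NNDM pipeline performs. It assumes config.toml has been parsed into a dictionary, with `sourceId` and `teamId` at the top level and each `[[product]]` table appearing as an entry in a list under the key `product`, which is how a TOML parser represents the sample above. `check_config` is a hypothetical name.

```python
ALLOWED_STATUSES = {"proposed", "in-development", "active", "retired"}
ALLOWED_ARCHETYPES = {"source-aligned", "aggregate", "consumer-aligned"}
TOP_LEVEL_KEYS = ("sourceId", "teamId")
PRODUCT_KEYS = (
    "productId", "productName", "productDescription",
    "productStatus", "productArchetype", "rootContract",
)


def check_config(config):
    """Return a list of problems found in a parsed config.toml dictionary."""
    problems = [f"missing {key}" for key in TOP_LEVEL_KEYS if key not in config]

    source_id = config.get("sourceId")
    if source_id is not None:
        # bool is a subclass of int in Python, so rule it out explicitly.
        is_int = isinstance(source_id, int) and not isinstance(source_id, bool)
        if not is_int or not 0 <= source_id < 10_000_000:
            problems.append("sourceId must be an integer of up to 7 digits")

    # [[product]] in TOML parses to a list of tables under the key "product".
    for index, product in enumerate(config.get("product", [])):
        for key in PRODUCT_KEYS:
            if key not in product:
                problems.append(f"product[{index}]: missing {key}")
        status = product.get("productStatus")
        if status is not None and status not in ALLOWED_STATUSES:
            problems.append(f"product[{index}]: invalid productStatus {status!r}")
        archetype = product.get("productArchetype")
        if archetype is not None and archetype not in ALLOWED_ARCHETYPES:
            problems.append(f"product[{index}]: invalid productArchetype {archetype!r}")
    return problems
```

With Python 3.11 or later, the dictionary can be obtained by passing the file, opened in binary mode, to `tomllib.load` from the standard library.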
Product Archetypes - Choosing the Right Archetype
Source-aligned - Minimal transformation from operational systems. Ideal for initial data product creation.
Aggregate - Combined data from multiple sources. Perfect for corporate-level KPIs.
Consumer-aligned - Transformed for specific use cases. Optimized for BI and analytics.
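The three archetype descriptions can be read as a small decision rule. The sketch below encodes one possible reading. The inputs `source_count` and `use_case_specific`, the function name, and the order in which the rules are applied are all assumptions made for illustration; the source does not say which archetype wins when a product both combines several sources and targets a specific use case.

```python
def choose_archetype(source_count, use_case_specific):
    """Suggest a productArchetype value from two coarse properties.

    source_count: number of source systems feeding the product.
    use_case_specific: True if the data is transformed for a specific use case.
    """
    # Assumption: a use-case-specific transformation takes precedence.
    if use_case_specific:
        return "consumer-aligned"
    # Combined data from multiple sources.
    if source_count > 1:
        return "aggregate"
    # Minimal transformation from a single operational system.
    return "source-aligned"
```

Treat the result as a starting point for the discussion with the Product Owner rather than a final classification.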
Optional Fields#
| Field | Description | Options | Example |
|---|---|---|---|
| productMaturity | Data product maturity level | raw, defined, managed | defined |
| productTags | Managed tags from NNDM UI | Array of strings | ["tag1","tag2"] |
| productLinks | Additional reference links | Array of name:URL pairs | ["link1: https://test1.org"] |
Linking Data Products#
Prerequisites for Linking
- Provider and consumer data products must already exist
- Provider data products must have output ports
- All team IDs and product IDs must be valid
Linking Configuration#
| Field | Description | Example |
|---|---|---|
| accessIds | Request Access IDs | ["link-testprod1-testprod123"] |
| providerTeamIds | NNDM Team IDs of provider products | ["test_domain_team"] |
| providerDataProductIds | Source Data Product IDs | ["urn:data:source:0014395(product:0x241f0512)"] |
| providerOutputPortIds | Output port IDs | ["urn:data:source:0014395(product:0x241f0512.ops:0x94530872)"] |
| consumerTeamId | Consumer team ID | "test_domain_team" |
| consumerDataProductId | Consumer data product ID | "urn:data:source:0014395(product:0x2cf705a2)" |
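In the sample config.toml, the four provider-side lists all have the same number of entries, which suggests that entry i of each list describes the same link. That alignment is an assumption here, not something this chapter states, and the sketch below checks it under that assumption. It also assumes that the linking attributes are either all present or all absent. `check_linking` is a hypothetical helper operating on one parsed `[[product]]` table.

```python
PROVIDER_KEYS = (
    "accessIds", "providerTeamIds",
    "providerDataProductIds", "providerOutputPortIds",
)
CONSUMER_KEYS = ("consumerTeamId", "consumerDataProductId")
LINK_KEYS = PROVIDER_KEYS + CONSUMER_KEYS


def check_linking(product):
    """Return a list of problems in the linking attributes of one product."""
    if not any(key in product for key in LINK_KEYS):
        return []  # no linking requested, nothing to check

    problems = [f"missing {key}" for key in LINK_KEYS if key not in product]

    # Assumption: the provider lists are index-aligned, so lengths must match.
    lengths = {key: len(product[key]) for key in PROVIDER_KEYS if key in product}
    if len(set(lengths.values())) > 1:
        problems.append(f"provider lists differ in length: {lengths}")
    return problems
```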
Validation Checks#
Duplicate Prevention#
Validation Rules
The system enforces several validation checks to maintain data integrity.
Contract Files#
- Unique Model Requirement: Two YAML contract files within the same contract folder cannot have the same model
- Each contract file must define a unique model to ensure distinct identification
Data Products and Contracts#
- Data Product: No two products in config.toml can have the same productId or productName
- Data Contract: No two contracts in a repository can have identical model details
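The duplicate rules above can be approximated locally before pushing changes. The following is a rough sketch under two assumptions: products are available as a list of dictionaries (one per `[[product]]` table), and "the same model" is interpreted as the same model name under the `models` key of a contract. The platform may compare more than the name. The function names are hypothetical.

```python
from collections import Counter


def duplicate_values(values):
    """Return the values that occur more than once, sorted."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


def check_products(products):
    """Flag productId or productName values shared by several products."""
    problems = []
    for key in ("productId", "productName"):
        for dup in duplicate_values(p[key] for p in products if key in p):
            problems.append(f"duplicate {key}: {dup}")
    return problems


def check_models(contracts):
    """Flag model names defined in more than one contract file.

    `contracts` maps a file name to its parsed YAML dictionary.
    """
    first_seen = {}
    problems = []
    for file_name, contract in contracts.items():
        for model in contract.get("models", {}):
            if model in first_seen:
                problems.append(
                    f"model {model} defined in both {first_seen[model]} and {file_name}"
                )
            else:
                first_seen[model] = file_name
    return problems
```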
Source-Specific Requirements#
NNEDL Source System (10689)#
Custom Fields for Data Product:
- AD Group Names (can be left blank)
- Dataset Names (must have a value)
- Dataset Steward Email (must have a value)
Custom Fields for Data Contract:
- AD Group Names (can be left blank)
- Dataset Names (must have a value)
- Dataset Steward Email (must have a value)
- Contact Email (must contain the Data Steward's email address)
NNEDH Source System (11593)#
Custom Fields:
- Dataset URI (must have a value)
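The data product custom fields listed for each source system can be checked against the `customFields` entries in config.toml. The sketch below assumes each entry has the form "Name : value", as in the sample, and splits on the first colon so that values containing colons survive. It also assumes that a field that "can be left blank" must still be present. `check_custom_fields` and the lookup table are hypothetical.

```python
# Source system ID -> {custom field name: must it have a value?}
REQUIRED_CUSTOM_FIELDS = {
    10689: {  # NNEDL
        "AD Group Names": False,
        "Dataset Names": True,
        "Dataset Steward Email": True,
    },
    11593: {  # NNEDH
        "Dataset URI": True,
    },
}


def parse_custom_fields(entries):
    """Turn ["Name : value", ...] into {"Name": "value", ...}."""
    fields = {}
    for entry in entries:
        # Split on the first colon only; the value may contain colons itself.
        name, _, value = entry.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def check_custom_fields(source_id, entries):
    """Return problems for the custom fields required by `source_id`."""
    fields = parse_custom_fields(entries)
    problems = []
    for name, needs_value in REQUIRED_CUSTOM_FIELDS.get(source_id, {}).items():
        if name not in fields:
            problems.append(f"missing custom field: {name}")
        elif needs_value and not fields[name]:
            problems.append(f"custom field has no value: {name}")
    return problems
```

Source systems that are not in the lookup table produce no findings, so new systems need an entry before the check is meaningful for them.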
Step 3: Review with Product Owner and Data Owner#
Schedule review sessions with key stakeholders to validate:
- Technical Accuracy: Ensure all field definitions, data types, and constraints are correct
- Business Alignment: Confirm the contract meets business requirements and use cases
- Compliance Requirements: Verify adherence to data governance and regulatory standards
Step 4: Obtain Consumer Approval#
Present the finalized data contract to data consumers for approval:
- Usage Terms: Clearly communicate access limitations and usage guidelines
- Service Level Agreements: Confirm availability, latency, and freshness requirements
- Support Procedures: Establish contact points and escalation processes
Step 5: Publish to NNDM (later in the Deploy Phase)#
Make your data contract available through the NNDM platform for discovery and consumption.
Publishing Process:
- Validate Configuration: Ensure your config.toml and YAML files pass all validation checks
- Set up CI/CD Pipeline: Use the NNDM GitHub Actions for automated NNDM publishing
- Verify Registration: Confirm your data product appears in NNDM catalog with correct metadata
Template-Accelerated Publishing
The dc-template-data-product template provides pre-configured CI/CD workflows for NNDM integration as well as the example YAML and config files required by the NNDM platform.
Success Metrics & Checkpoints#
- Contract Creation: Data contract YAML file created with all required fields
- Configuration Setup: Config.toml file properly configured with mandatory fields
- Technical Review: Data Engineer and Solution Architect have validated technical specifications
- Business Validation: Product Owner and Data Owner have approved business requirements
- Consumer Approval: Data consumers have formally approved the contract terms
- Validation Checks: All duplicate checks and source-specific requirements pass
- Documentation: Contract specifications anchored in correct folder structure
Common Challenges & Solutions#
- Challenge: Schema drift concerns affecting downstream consumers
  - Solution: Set flagSchemaDrift to true for critical data products
  - Prevention: Implement robust testing and communication processes for schema changes
- Challenge: Confusion about product archetype selection
  - Solution: Use the archetype decision tree: raw operational data → source-aligned, multiple source aggregation → aggregate, business-specific transformation → consumer-aligned
  - Prevention: Document archetype rationale in product description
- Challenge: Missing mandatory custom fields for specific source systems
  - Solution: Reference the source-specific requirements section and ensure all fields are populated
  - Prevention: Create validation checklists for each supported source system
Next Steps#
After completing data contract creation and approval:
- Proceed to implementation phase using approved contracts
- Set up monitoring and alerting for contract compliance
- Establish regular review cycles for contract updates
- Begin consumer onboarding and access provisioning
Additional Resources#
- Data Contract Specifications Official Documentation
- Project Structure Reference
- NNDM Platform User Guide
- Data Governance Policy Documentation
- Repository Setup Guide on how to initialize projects with templates