Data Product config file
Data Product Config file fields and attributes
Sample template of config.toml:
# ------MANDATORY
# sourceId is the ServiceNow ID of the source system associated with the products defined here. This is an integer. The source system should already be onboarded by the NNDM team.
# The source system should be the one where the data is located, e.g. NNEDL or Datahub (NNEDH).
sourceId=13849
# ------MANDATORY
# teamId is the internal ID provided by the NNDM team for this product. This is a literal.
teamId='urn:team:platform:NNDM(domain:test)'
[[product]]
# ------MANDATORY
# productId can be a literal. This should be the internal ID that is assigned to the product.
# When sourceId is 11593 (NNEDH), use the URI ID of the dataset from Datahub in this attribute; otherwise use the product name in lower case, separated by '-'.
productId='NewCITEST'
# ------MANDATORY
# productName can be a literal. This should be the name of the data product that is being created/updated in CMDB.
productName='newci-test-1'
# ------MANDATORY
# productDescription can be a multi line string. This should be the internal description of the data product. HTML tags are supported.
productDescription="""This is a test data product.
Can be multiline"""
# ------MANDATORY
# productStatus is the status of the data product. Should be one of: proposed, in-development, active, or retired.
# #### proposed : Represents a data product in planning state
# #### in-development : Represents a data product actively being developed
# #### active : Represents a data product which is active and available
# #### retired : Represents a data product which was active but currently has been retired and is no longer available
productStatus="retired"
# ------MANDATORY
# productArchetype is the archetype of the data product. Should be one of: consumer-aligned, aggregate, or source-aligned.
# #### source-aligned: Represents the data as it is in the operational system, with minimal transformation. Organizations often use these as a first step toward creating more valuable data products.
# #### aggregate: Represents data that has been aggregated from multiple sources. These are often used to create more complex data products by combining data from different sources. In short, aggregate data products are built at a corporate level to drive global KPIs.
# #### consumer-aligned: Represents data that has been transformed into a format that is useful for consumers of the data. These are often used in BI and analytics solutions. When 'data products' are referred to generically, these are the ones people usually have in mind.
productArchetype="aggregate"
# ------NON MANDATORY
# productMaturity is the maturity of the data product. Should be one of: raw, defined, or managed.
# #### raw: Represents a data product that is still at the data-exploration stage and does not yet capitalize on its full potential.
# #### defined: Represents a data product that has been refined, cleaned, and validated to be ready for consumption by consumers of the data.
# #### managed: Represents a data product that is managed and whose data has been transformed.
productMaturity="raw"
# ------MANDATORY
# rootContract is a literal that contains the relative path to the folder holding all the contract YAML or JSON files associated with this product.
rootContract='./p1contract'
# ------NON MANDATORY
# Links to add to the data product.
productLinks=[ "link1: https://test1.org", "link2: https://test2.com", "link3: https://test3.in" ]
# ------NON MANDATORY
# Managed tags to add to the data product.
# Managed tags are those that are approved and present in the tag list of the NNDM UI.
productTags= ["tag1","tag2"]
# -------CONDITIONAL MANDATORY
# Mandatory custom fields for the data product:
# For the NNEDL source system (10689):
#   "AD Group Names" (can be left blank)
#   "Dataset Names" (must have a value)
#   "Dataset Steward Email" (must have a value)
# For the NNEDH source system (11593):
#   "Dataset URI" (must have a value)
customFields= [ "AD Group Names : NNEDL : GLOOKODH_Developer", "Dataset Names : ABC ", "Dataset Steward Email : ABC@novonordisk.com" ]
# ---------CONDITIONAL MANDATORY
# The attributes below are mandatory when linking provider data products to a consumer data product.
accessIds=["link-testprod1-testprod123", "link-dataproduct-test123"] # Request access IDs; the user chooses the names, following the naming convention
providerTeamIds=["test_domain_team", "test_domain_team"] # NNDM team IDs of the source (provider) data products
providerDataProductIds=["urn:data:source:0014395(product:0x241f0512)","369a6dd1-e217-460c-8987-1ba5f8155b11"] # IDs of the source (provider) data products
providerOutputPortIds=["urn:data:source:0014395(product:0x241f0512.ops:0x94530872)","my-output-port"] # Output port IDs of the source (provider) data products
consumerTeamId="test_domain_team" # Team ID of the consumer data product
consumerDataProductId="urn:data:source:0014395(product:0x2cf705a2)" # Data product ID of the consumer data product
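The MANDATORY and CONDITIONAL MANDATORY markers in the template can be turned into a simple pre-flight check. The following is a minimal Python sketch, not the CI tool's actual validation: it assumes the config has already been parsed into a dictionary (for example by a TOML parser), that each customFields entry is split on its first colon, and that a "must have a value" field counts as missing when it is absent or empty. The function names `parse_custom_fields` and `find_problems` are hypothetical.

```python
# Keys tagged MANDATORY in the template; the split between top level and
# [[product]] follows the template layout.
MANDATORY_TOP_LEVEL = ("sourceId", "teamId")
MANDATORY_PRODUCT = (
    "productId", "productName", "productDescription",
    "productStatus", "productArchetype", "rootContract",
)

# Custom fields that must carry a value, keyed by the source IDs named in
# the template comments.
REQUIRED_CUSTOM_FIELDS = {
    10689: ("Dataset Names", "Dataset Steward Email"),  # NNEDL
    11593: ("Dataset URI",),                            # NNEDH
}


def parse_custom_fields(entries):
    """Turn ["Name : Value", ...] into {"Name": "Value"}.

    Assumption: the field name ends at the first colon of each entry.
    """
    fields = {}
    for entry in entries:
        name, _, value = entry.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def find_problems(config):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [
        f"missing top-level key: {key}"
        for key in MANDATORY_TOP_LEVEL
        if key not in config
    ]
    required_custom = REQUIRED_CUSTOM_FIELDS.get(config.get("sourceId"), ())
    for index, product in enumerate(config.get("product", [])):
        for key in MANDATORY_PRODUCT:
            if key not in product:
                problems.append(f"product[{index}]: missing key {key}")
        custom = parse_custom_fields(product.get("customFields", []))
        for name in required_custom:
            if not custom.get(name):
                problems.append(
                    f"product[{index}]: custom field '{name}' needs a value"
                )
    return problems


# Usage: an NNEDL product whose steward email was forgotten.
config = {
    "sourceId": 10689,
    "teamId": "urn:team:platform:NNDM(domain:test)",
    "product": [{
        "productId": "newci-test-1",
        "productName": "newci-test-1",
        "productDescription": "This is a test data product.",
        "productStatus": "proposed",
        "productArchetype": "aggregate",
        "rootContract": "./p1contract",
        "customFields": ["AD Group Names : ", "Dataset Names : ABC"],
    }],
}
print(find_problems(config))
# prints ["product[0]: custom field 'Dataset Steward Email' needs a value"]
```

Returning a list of messages rather than raising on the first problem lets a user fix everything in one pass.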
- sourceId (required) - ServiceNow ID of the source system associated with the Data Product(s). This is an integer of up to 7 digits. Ex. 'sourceId=123456'. The source system should already be onboarded by the NNDM team. It should be the system where the data is located, e.g. NNEDL or Datahub (NNEDH).
- teamId (required) - NN Data Marketplace Team ID of the Data Product Team associated with the Data Product(s). This is a literal. Ex. 'urn:team:platform:NNDM(domain:test)'
- productId (required) - Data Product ID that is created on the NN Data Marketplace when using the CI Tool. This should be the internal ID that is assigned for the product. Ex. 'urn:data:product:001234'
- productName (required) - Data Product Name. Can be a literal. Ex. "OpenPKai"
- productDescription (required) - Data Product Description. HTML tags are supported. Can be a multi-line string. Ex. """The Data Product for PK data aims to facilitate:
Better understanding of features important for PK for faster decision making in research projects
Reduced number of compound design cycles based on animal studies (#500 days aspiration)
Enable machine learning in research projects
Reduced #animal studies
Faster extraction of PK information & deliverables of PK analysis results to research projects."""
- productStatus (required) - Data Product Status. Should be one of proposed, in-development, active, or retired.
  - proposed: Represents a data product in planning state
  - in-development: Represents a data product actively being developed
  - active: Represents a data product which is active and available
  - retired: Represents a data product which has been retired and is no longer available.
- productArchetype (required) - Data Product Archetype. Should be one of consumer-aligned, aggregate, or source-aligned.
  - source-aligned: Represents the data as it is in the operational system, with minimal transformation. Organizations often use these as a first step toward creating more valuable data products.
  - aggregate: Represents data that has been aggregated from multiple sources. These are often used to create more complex data products by combining data from different sources. In short, aggregate data products are built at a corporate level to drive global KPIs.
  - consumer-aligned: Represents data that has been transformed into a format that is useful for consumers of the data. These are often used in BI and analytics solutions. When 'data products' are referred to generically, these are the ones people usually have in mind.
- productMaturity (optional) - Data Product Maturity. Should be one of raw, defined, or managed.
  - raw: Represents a data product that is still at the data-exploration stage and does not yet capitalize on its full potential.
  - defined: Represents a data product that has been refined, cleaned, and validated to be ready for consumption by consumers of the data.
  - managed: Represents a data product that is managed and whose data has been transformed.
- rootContract (required) - Relative path to the folder that contains all Data Contracts (YAML or JSON) associated with this Data Product. This is a literal. Ex. './p1contracts'
- contractRequiredFields - Required fields, in addition to those of ODCSv3.0.0, that should be enforced for this Data Product. This is an array. Ex. ['name1', 'name2']
- flagSchemaDrift - Boolean that indicates whether schema drift should be monitored for this data product. This is typically set to false unless there are specific concerns about schema changes affecting consumers of the data. Ex. false
  - If set to false, the pipeline run does not raise a warning when the schema of any associated contract within the data product changes.
  - If set to true, the pipeline run raises a warning when the schema of any associated contract within the data product changes (Not yet implemented in CI tool).
- rootInfra - Literal that contains the relative path to the infrastructure configuration file that is used by the product ETL. Ex. './infra' (Not yet implemented in CI tool).
- rootAgreement - Absolute or relative path to the folder containing all usage agreements (IA) related to the data contracts in this data product. Ex. './agreements' (Not yet implemented in CI tool).
- productTags - Data Product Tags. Array of strings. Only managed tags should be entered, i.e. tags that are approved and present in the tag list of the NNDM UI. Users can look up the available tags in the NNDM UI and copy them from there. Currently this is not connected to tags-as-a-service. Ex. ['tag1','tag2'] (Not yet implemented in CI tool).
- productLinks - Additional links related to the Data Product, such as an Erwin data model link and other references. Ex. productLinks=['link1: https://test1.org', 'link2: https://test2.com', 'link3: https://test3.in']
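The value constraints listed above (the allowed values for productStatus, productArchetype, and productMaturity, and the 7-digit limit on sourceId) can be checked mechanically. The following Python sketch is illustrative only and does not describe how the CI tool validates anything; the function names `check_source_id` and `check_product_values` are hypothetical, and treating sourceId as non-negative is an assumption.

```python
# Allowed values as listed in the field descriptions.
ALLOWED_VALUES = {
    "productStatus": {"proposed", "in-development", "active", "retired"},
    "productArchetype": {"consumer-aligned", "aggregate", "source-aligned"},
    "productMaturity": {"raw", "defined", "managed"},
}
# productMaturity is the only one of the three marked optional.
OPTIONAL_FIELDS = {"productMaturity"}


def check_source_id(source_id):
    """Return True when source_id is an integer of at most 7 digits.

    Assumption: the ID is non-negative. bool is excluded explicitly
    because it is a subclass of int in Python.
    """
    return (
        isinstance(source_id, int)
        and not isinstance(source_id, bool)
        and 0 <= source_id <= 9_999_999
    )


def check_product_values(product):
    """Return error messages for enumerated fields with missing or unknown values."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = product.get(field)
        if value is None:
            if field not in OPTIONAL_FIELDS:
                errors.append(f"{field} is required")
            continue
        if value not in allowed:
            errors.append(f"{field}={value!r} is not one of {sorted(allowed)}")
    return errors


# Usage: 'live' is not an allowed status, and productMaturity may be omitted.
print(check_source_id(13849))
print(check_product_values({"productStatus": "live", "productArchetype": "aggregate"}))
```

Keeping the allowed values in one table means a new status or archetype only has to be added in a single place.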
Linking of Data Products
Note: If a user adds or updates Data Products and Data Contracts in their repository, the changes should be approved by the Data Owner or Data Steward. The approval workflow is not implemented in the CI tool.
Provider Data Products and a Consumer Data Product can be linked using the CI tool by providing the attributes and their values in the config.toml file.
Prerequisites:
1. The Provider Data Products and the Consumer Data Product must already exist.
2. Each Provider Data Product must have an output port, which will be linked to the input port of the Consumer Data Product.
Example:
Your team's (consumerTeamId) product (consumerDataProductId) consumes from three data contracts (providerOutputPortIds), which means you are creating three links (accessIds). For the sake of the example, the three contracts belong to two data products (providerDataProductIds) and one team (providerTeamIds).
accessIds=["accessID1", "accessID2", "accessID3"]
providerTeamIds=["TeamID", "TeamID", "TeamID"]
providerDataProductIds=["DataProductID1", "DataProductID1", "DataProductID2"]
providerOutputPortIds=["OutputPortID1", "OutputPortID2", "OutputPortID3"]
consumerTeamId="ConsumerTeamID"
consumerDataProductId="consumerDataProductID"
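The example suggests that the four provider arrays are read position by position, so that entry i of each array describes link i. That reading is an assumption drawn from the example, not a documented rule of the CI tool. Under that assumption, a minimal Python sketch (with a hypothetical `build_links` helper) can expand the arrays into one record per link and reject arrays of unequal length:

```python
LINK_ARRAY_KEYS = (
    "accessIds",
    "providerTeamIds",
    "providerDataProductIds",
    "providerOutputPortIds",
)


def build_links(product):
    """Expand the parallel link arrays into one dictionary per link.

    Assumption: the arrays are positional, so all four must have the
    same length.
    """
    columns = [product[key] for key in LINK_ARRAY_KEYS]
    lengths = {key: len(product[key]) for key in LINK_ARRAY_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"link arrays differ in length: {lengths}")
    return [
        {
            "accessId": access_id,
            "providerTeamId": team_id,
            "providerDataProductId": product_id,
            "providerOutputPortId": port_id,
            "consumerTeamId": product["consumerTeamId"],
            "consumerDataProductId": product["consumerDataProductId"],
        }
        for access_id, team_id, product_id, port_id in zip(*columns)
    ]


# Usage with the values from the example above.
example = {
    "accessIds": ["accessID1", "accessID2", "accessID3"],
    "providerTeamIds": ["TeamID", "TeamID", "TeamID"],
    "providerDataProductIds": ["DataProductID1", "DataProductID1", "DataProductID2"],
    "providerOutputPortIds": ["OutputPortID1", "OutputPortID2", "OutputPortID3"],
    "consumerTeamId": "ConsumerTeamID",
    "consumerDataProductId": "consumerDataProductID",
}
for link in build_links(example):
    print(link["accessId"], "->", link["providerDataProductId"], link["providerOutputPortId"])
```

Checking the lengths before zipping matters because `zip` silently truncates to the shortest array, which would drop a link without any warning.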