Data Contract YAML
Data Contract YAML - Fields and Attributes
Sample template of the contract YAML:
dataContractSpecification: 0.9.3 # The CI tool supports versions up to 1.2.0
id: my-data-contract # For NNEDH it should be "DatasetURI_ContractName"; otherwise use the data contract name in lower case, separated by '-'
info:
  title: My Data Contract
  version: 0.0.1
  owner: urn:team:0x44bb06dc(domain:commercial)
  description: A description of the data contract.
  status: active
  contact:
    name: Anurag Daipuriya
    email: GDIY@novonordisk.com # Mandatory field for NNEDL: the data steward's email ID
terms:
  usage: Data can be used for analytical purposes.
  limitations: Not suitable for real-time use cases.
models:
  my_table:
    type: table
    description: Description of the table.
    fields:
      my_column_1:
        description: The technical ID
        type: string
        format: uuid
        primaryKey: true
        examples:
          - 123e4567-e89b-12d3-a456-426614174000
      my_column_2:
        description: The business timestamp of the transaction
        type: timestamp
        examples:
          - "2021-01-01T00:00:00Z"
      my_column_3:
        description: The amount of the transaction
        type: decimal
        examples:
          - 123.45
servers:
  production:
    type: s3
    environment: prod
    location: s3://dhqcglimsrawzonekeoryseucentral1/LABVANTAGE/S_SAMPLE/_symlink_format_manifest
    format: parquet
    delimiter: new_line
    database: nnedl
    dataset: "dhlprdglobal,glookodh" # Mandatory for NNEDL source system contracts
    description: S3 server details.
definitions:
  order_id:
    title: Order ID
    type: text
    format: uuid
    description: An internal ID that identifies an order in the online shop.
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders
# Custom fields for extra information; mandatory for NNEDL source systems
ADGroupNames: "NNEDL:GLOOKODH_Developer,NNEDL:GLOOKODH_Reader"
DatasetNames: "dhlprdglobal,glookodh"
servicelevels:
  retention:
    description: Data is retained for one year
    period: P1Y
    unlimited: false
  frequency:
    description: Data is delivered once a day at midnight UTC
    type: batch
    cron: 0 0 * * *
- ID - An organization-wide unique technical identifier for this data contract, such as a UUID, URN, slug, or number. (Required Field)
- Version - The version of the data contract document (which is distinct from the Data Contract Specification version). (Required Field)
- Status - One of Proposed, In Development, Active, Deprecated, or Retired.
- Description - Description of the data contract. (Required Field)
- Owner - The team responsible for managing the data contract. Only your teams are displayed. (Required Field)
- Contact Name - The identifying name of the contact person/organization.
- Contact URL - The URL pointing to the contact information. This MUST be in the form of a URL.
- Contact Email - The email address of the contact person/organization. This MUST be in the form of an email address.
Server: Information about the servers.
- Server - Server name or URL.
- Server Type - Type of server, such as S3 or Snowflake.
Terms: Terms and conditions for access.
- Usage - The usage terms for this data contract.
- Limitations - Limitations on the usage of this data contract.
- Billing - Costs associated with the usage of the data contract.
- Notice Period - The notice period for consumers in ISO 8601 period format, e.g., P3M for 3 months.
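As a rough sketch of how the Terms attributes might look in the contract YAML, the fragment below extends the `terms` block from the sample template with `billing` and `noticePeriod`. The key names are assumed to follow the Data Contract Specification; the billing text and the notice period are hypothetical values for illustration only.

```yaml
terms:
  usage: Data can be used for analytical purposes.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month  # hypothetical value, not an actual cost
  noticePeriod: P3M            # ISO 8601 period: 3 months
```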
Model: The logical data model.
- Model - The name of the model, e.g., the table name.
- Title - The title, e.g., the business name.
- Type - One of Table, Message, Object, or Graph.
- Description - Description of the model.
- Fields - The field name.
- Type - The data type of the field, such as string, int, varchar, or text.
- Semantic Description - A semantic description of the field.
- Example - Example data, described by its type, model, description, and data.
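The Model attributes might map to YAML roughly as follows. This is a sketch that reuses `my_table` from the sample template; the `title` value is hypothetical, and it assumes the specification version in use accepts `title` on a model.

```yaml
models:
  my_table:                 # Model: the table name
    title: My Table         # Title: hypothetical business name
    type: table             # Type: table, message, object, or graph
    description: Description of the table.
    fields:
      my_column_1:          # Fields: the field name
        type: string        # Type: the data type of the field
        description: The technical ID
```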
Service Levels: A service level is an agreed, measurable level of performance for providing the data.
- Availability - The promise or guarantee by the service provider about the uptime of the system that provides the data.
  - Description - An optional string describing the availability service level.
  - Percentage - An optional string describing the guaranteed uptime in percent (e.g., 99.9%).
- Retention - How long data will be available.
  - Description - An optional string describing the retention service level.
  - Period - An optional period of time for which data is available. Supported formats: simple duration (e.g., 1 year, 30d) and ISO 8601 duration (e.g., P1Y).
  - Unlimited - An optional indicator that data is kept forever.
  - Timestamp Field - An optional reference to the field that contains the timestamp that the period refers to.
- Latency - The maximum amount of time from the source to its destination.
  - Description - An optional string describing the latency service level.
  - Threshold - An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
  - Source Timestamp Field - An optional reference to the field that contains the timestamp when the data was provided at the source.
  - Processed Timestamp Field - An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.
- Freshness - The maximum age of the youngest entry.
  - Description - An optional string describing the freshness service level.
  - Threshold - An optional maximum age of the youngest entry. Supported formats: simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
  - Timestamp Field - An optional reference to the field that contains the timestamp that the threshold refers to.
- Frequency - How often data is updated.
  - Description - An optional string describing the frequency service level.
  - Type - An optional type of data processing. Typical values are batch, micro-batching, streaming, and manual.
  - Interval - Optional. Only for batch: how often the pipeline is triggered, e.g., daily.
  - Cron - Optional. Only for batch: a cron expression for when the pipeline is triggered, e.g., 0 0 * * *.
- Support - The times when support is provided.
  - Description - An optional string describing the support service level.
  - Time - An optional string describing the times when support is available for contact, such as 24/7 or business hours only.
  - Response Time - An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.
- Backup - Details about data backup procedures.
  - Description - An optional string describing the backup service level.
  - Interval - An optional interval that defines how often data will be backed up, e.g., daily.
  - Cron - An optional cron expression for when data will be backed up, e.g., 0 0 * * *.
  - Recovery Time - An optional Recovery Time Objective (RTO), which specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).
  - Recovery Point - An optional Recovery Point Objective (RPO), which defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).
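The sample template only shows `retention` and `frequency`. As a sketch of the remaining service levels, the fragment below shows how they might be written, assuming the key names of the Data Contract Specification. All values are illustrative, and `my_table.processed_at` is a hypothetical field that does not exist in the sample template.

```yaml
servicelevels:
  availability:
    description: The server is available during support hours
    percentage: 99.9%
  latency:
    description: Data is available within 25 hours after the transaction
    threshold: 25h
    sourceTimestampField: my_table.my_column_2
    processedTimestampField: my_table.processed_at  # hypothetical field
  freshness:
    description: The age of the youngest row in the table
    threshold: 25h
    timestampField: my_table.my_column_2
  support:
    description: Support is available during business hours
    time: 9am to 5pm on business days
    responseTime: 1h
  backup:
    description: Data is backed up once a week, every Sunday at 0:00 UTC
    interval: weekly
    cron: 0 0 * * 0
    recoveryTime: 24 hours
    recoveryPoint: 1 week
```

Every key under each service level is optional, so a contract can include only the levels and attributes that are actually agreed with consumers.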
Custom Fields: Additional custom fields, each defined as a key-value pair (name and value).