Data Contract YAML

Data Contract YAML - Fields and Attributes

Sample template of a contract YAML:

dataContractSpecification: 0.9.3 # The CI tool supports specification versions up to 1.2.0

id: my-data-contract # For NNEDH it should be "DatasetURI_ContractName"; otherwise use the data contract name in lower case, separated by '-'

info:

 title: MyDataContract
 version: 0.0.1
 owner: urn:team:0x44bb06dc(domain:commercial)
 description: A description of the data contract.
 status: active
 contact:
  name: Anurag Daipuriya
  email: GDIY@novonordisk.com # Mandatory field for NNEDL: data steward email ID.

terms:

 usage: Data can be used for analytical purposes.
 limitations: Not suitable for real-time use cases.

models:

  my_table:
    type: table
    description: Description of the table.

    fields:

      my_column_1:
        description: The technical ID
        type: string
        format: uuid
        primaryKey: true
        examples:
          - 123e4567-e89b-12d3-a456-426614174000

      my_column_2:
        description: The business timestamp of the transaction
        type: timestamp
        examples:
          - 2021-01-01T00:00:00Z

      my_column_3:
        description: The amount of the transaction
        type: decimal
        examples:
          - 123.45

servers:
  production:
   type: s3
   environment: prod
   location: s3://dhqcglimsrawzonekeoryseucentral1/LABVANTAGE/S_SAMPLE/_symlink_format_manifest
   format: parquet
   delimiter: new_line
   database: nnedl
   dataset: "dhlprdglobal,glookodh" # Mandatory for NNEDL source system contract
   description: S3 server details.

definitions:
  order_id:
    title: Order ID
    type: text
    format: uuid
    description: An internal ID that identifies an order in the online shop.
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders

# Custom fields for extra information, mandatory for NNEDL source system

ADGroupNames: "NNEDL:GLOOKODH_Developer,NNEDL:GLOOKODH_Reader"
DatasetNames: "dhlprdglobal,glookodh"

servicelevels:
  retention:
    description: Data is retained for one year
    period: P1Y
    unlimited: false

  frequency:
    description: Data is delivered once a day at midnight UTC
    type: batch
    cron: 0 0 * * *

  • ID - An organization-wide unique technical identifier for this data contract, such as a UUID, URN, slug, or number. (Required Field)
  • Version - The version of the data contract document (which is distinct from the Data Contract Specification version). (Required Field)
  • Status - One of Proposed, In development, Active, Deprecated, or Retired.
  • Description - Description of the data contract. (Required Field)
  • Owner - The team responsible for managing the data contract. (Required Field)
  • Contact Name - The identifying name of the contact person/organization.
  • Contact URL - The URL pointing to the contact information. This MUST be in the form of a URL.
  • Contact Email - The email address of the contact person/organization. This MUST be in the form of an email address.

  • Server: Information about the servers.

    • Server - Server name or URL
    • Server Type - Type of server, such as S3 or Snowflake
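
As a rough illustration of a server entry that is not S3, the fragment below sketches a Snowflake server. The key names account, database, and schema are assumed from the public Data Contract Specification, and every value is a hypothetical placeholder rather than a real system.

```yaml
servers:
  production:
    type: snowflake          # the server type
    environment: prod
    account: my_account      # hypothetical Snowflake account identifier
    database: MY_DATABASE    # hypothetical database name
    schema: MY_SCHEMA        # hypothetical schema name
    description: Snowflake server details.
```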

  • Terms: Terms and conditions for access.

    • Usage - The usage terms for this data contract.
    • Limitations - Limitations of the usage of this data contract.
    • Billing - Costs associated with using the data contract.
    • Notice Period - The notice period for consumers in ISO 8601 period format, e.g., P3M for 3 months.
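
The template above only fills in usage and limitations. A terms block that also carries billing and a notice period might look like the sketch below. The key names billing and noticePeriod are assumed from the public Data Contract Specification, and the cost shown is a made-up value.

```yaml
terms:
  usage: Data can be used for analytical purposes.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month    # hypothetical cost, free-text string
  noticePeriod: P3M              # ISO 8601 period: 3 months
```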

  • Model: The logical data model.

    • Model - The name of the model, e.g., the table name
    • Title - The title, e.g., the business name
    • Type - Table, Message, Object, or Graph
    • Description - Description of the model
    • Fields - Field name
      • Type - Data type, such as string, int, varchar, or text
      • Semantic Description - Description of the field's business meaning
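
To show how these attributes fit together, here is a minimal sketch of a model with a title and two fields. The model name orders and the field order_total are hypothetical. The $ref syntax for reusing the order_id entry from the definitions section is assumed from the public Data Contract Specification and may need checking against the CI tool.

```yaml
models:
  orders:                                  # hypothetical model (table) name
    title: Orders                          # business name
    type: table
    description: One record per order.
    fields:
      order_id:
        $ref: '#/definitions/order_id'     # reuse the shared definition
      order_total:                         # hypothetical field
        description: Total amount of the order.   # semantic description
        type: long
```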

  • Example: Example data, described by type, model, description, and data.
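
A possible shape for such an entry is sketched below, using the columns of my_table from the template. The top-level key examples and its sub-keys are assumed from the public Data Contract Specification. The data rows simply repeat the per-field examples already given in the template.

```yaml
examples:
  - type: csv                    # format of the example data
    model: my_table              # the model this example belongs to
    description: An example list of records.
    data: |                      # inline data as a multi-line string
      my_column_1,my_column_2,my_column_3
      123e4567-e89b-12d3-a456-426614174000,2021-01-01T00:00:00Z,123.45
```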

  • Service Levels: A service level is an agreed, measurable level of performance for providing the data.

    • Availability - The promise or guarantee by the service provider about the uptime of the system that provides the data.

      • Description - An optional string describing the availability service level.
      • Percentage - An optional string describing the guaranteed uptime in percent (e.g., 99.9%).

    • Retention - How long data will be available.

      • Description - An optional string describing the retention service level.
      • Period - An optional period of time, how long data is available. Supported formats: Simple duration (e.g., 1 year, 30d) and ISO 8601 duration (e.g., P1Y).
      • Unlimited - An optional indicator that data is kept forever.
      • Timestamp Field - An optional reference to the field that contains the timestamp that the period refers to.

    • Latency - The maximum amount of time from the source to its destination.

      • Description - An optional string describing the latency service level.
      • Threshold - An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
      • Source Timestamp Field - An optional reference to the field that contains the timestamp when the data was provided at the source.
      • Processed Timestamp Field - An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.

    • Freshness - The maximum age of the youngest entry.

      • Description - An optional string describing the freshness service level.
      • Threshold - An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
      • Timestamp Field - An optional reference to the field that contains the timestamp that the threshold refers to.

    • Frequency - How often data is updated.

      • Description - An optional string describing the frequency service level.
      • Type - An optional type of data processing. Typical values are batch, micro-batching, streaming, manual.
      • Interval - Optional. Only for batch: How often the pipeline is triggered, e.g., daily.
      • Cron - Optional. Only for batch: A cron expression when the pipeline is triggered. E.g., 0 0 * * *.

    • Support - The times when support is provided.

      • Description - An optional string describing the support service level.
      • Time - An optional string describing the times when support will be available for contact such as 24/7 or business hours only.
      • Response Time - An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.

    • Backup - Details about data backup procedures.

      • Description - An optional string describing the backup service level.
      • Interval - An optional interval that defines how often data will be backed up, e.g., daily.
      • Cron - An optional cron expression when data will be backed up, e.g., 0 0 * * *.
      • Recovery Time - An optional Recovery Time Objective (RTO) specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).
      • Recovery Point - An optional Recovery Point Objective (RPO) defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).
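
The template only shows retention and frequency. The remaining service levels could be written roughly as in the sketch below. The camelCase key names (sourceTimestampField, processedTimestampField, timestampField, responseTime, recoveryTime, recoveryPoint) are assumed from the public Data Contract Specification. The field my_table.processed_timestamp and all thresholds are hypothetical values chosen for illustration.

```yaml
servicelevels:
  availability:
    description: The server is available during support hours
    percentage: 99.9%
  latency:
    description: Data is available within 25 hours after the transaction
    threshold: PT25H                                        # ISO 8601 duration
    sourceTimestampField: my_table.my_column_2
    processedTimestampField: my_table.processed_timestamp   # hypothetical field
  freshness:
    description: The youngest row is at most 25 hours old
    threshold: PT25H
    timestampField: my_table.my_column_2
  support:
    description: Support is available during business hours
    time: business hours only
    responseTime: 1h                                        # time to acknowledge a request
  backup:
    description: Data is backed up once a week, every Sunday at 0:00 UTC
    interval: weekly
    cron: 0 0 * * 0                                         # Sundays at midnight
    recoveryTime: 24 hours                                  # RTO
    recoveryPoint: 1 week                                   # RPO
```

Each block is optional, so a contract only needs to list the service levels that have actually been agreed.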

  • Custom Fields: Additional custom fields, given as key-value pairs (name and value).