Data publish from mdm


Data Publish from MDM#


Key Ideas

Always publish golden or mastered data — not raw or intermediate — to avoid confusion and inconsistencies downstream.
The canonical view should reflect the current best representation of the entity with resolved duplicates, cleaned attributes, and enriched information.
Push messages to Kafka or an API when changes or new records become available.
Adopt a schema-first approach: define your data structure in a schema and validate messages against it to avoid schema drift and maintain compatibility across consumers.
Design your messages to be backward compatible by adding fields with defaults, not removing or renaming existing fields immediately, and providing fallback strategies for older consumers.
Provide a unique identifier or a composite key in messages.
Consumers should be able to safely apply messages multiple times without side effects (important for retry scenarios).
Include metadata fields in messages: timestamp of change, source of change, version number, and trace or message IDs for diagnostics.
Ideally, messages should reflect a transactionally consistent view of the record — all related fields should reflect the state at a single point in time.
Push messages at the level of an entity or master record, not an undifferentiated “bulk” event.
Clearly reflect CRUD operations in messages (such as CREATE, UPDATE, DELETE) if applicable.
Filter and mask sensitive fields (like PII) if not required by all consumers.
Handle authorization and authentication at the delivery layer (API, Kafka).
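
Several of the ideas above (unique key, metadata fields, explicit operation type, idempotent consumption, masking) can be sketched together. This is a minimal illustration, not the MDM product's message format: the `PublishMessage` envelope, its field names, `mask_sensitive`, and `IdempotentStore` are all hypothetical names chosen for the example, and the store assumes the publisher increases `version` on every change to a record.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class PublishMessage:
    """Hypothetical envelope for one change to one master record."""
    entity_id: str    # unique identifier of the master record
    operation: str    # "CREATE", "UPDATE" or "DELETE"
    version: int      # assumed to increase with every change to the record
    source: str       # system that caused the change
    payload: dict     # record attributes as of a single point in time
    changed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def mask_sensitive(payload: dict, sensitive_fields: set) -> dict:
    """Return a copy of payload with the listed fields masked."""
    return {k: ("***" if k in sensitive_fields else v)
            for k, v in payload.items()}


class IdempotentStore:
    """Consumer-side store that ignores duplicate and stale messages."""

    def __init__(self):
        self.records = {}  # entity_id -> (version, payload or None)

    def apply(self, msg: PublishMessage) -> bool:
        current = self.records.get(msg.entity_id)
        if current is not None and current[0] >= msg.version:
            return False  # already applied (retry) or older than stored state
        if msg.operation == "DELETE":
            self.records[msg.entity_id] = (msg.version, None)  # tombstone
        else:
            self.records[msg.entity_id] = (msg.version, msg.payload)
        return True
```

Because `apply` compares versions before writing, delivering the same message twice leaves the store unchanged, which is the property retries rely on.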

Why Publish?#

Integrate with other systems to provide them with master data.
Notify source systems about any improvements or corrections made to their submitted data during MDM processing.

When to Publish?#

A new record is created in MDM
An existing record in MDM is updated
A record is merged or unmerged
A record is soft deleted
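
One way to translate these triggers into per-record operations is sketched below. The mapping is an assumed convention, not something prescribed by MDM: here a merge updates the surviving record and deletes the records merged into it, and an unmerge updates the survivor and re-creates the restored records. `MdmEvent` and `messages_for_event` are hypothetical names.

```python
from enum import Enum


class MdmEvent(Enum):
    CREATED = "created"
    UPDATED = "updated"
    MERGED = "merged"
    UNMERGED = "unmerged"
    SOFT_DELETED = "soft_deleted"


def messages_for_event(event, record_id, related_ids=()):
    """Return (record_id, operation) pairs to publish for one MDM event.

    related_ids holds the records merged into record_id (MERGED) or
    split back out of it (UNMERGED); it is ignored for other events.
    """
    if event is MdmEvent.CREATED:
        return [(record_id, "CREATE")]
    if event is MdmEvent.UPDATED:
        return [(record_id, "UPDATE")]
    if event is MdmEvent.SOFT_DELETED:
        return [(record_id, "DELETE")]
    if event is MdmEvent.MERGED:
        return [(record_id, "UPDATE")] + [(r, "DELETE") for r in related_ids]
    if event is MdmEvent.UNMERGED:
        return [(record_id, "UPDATE")] + [(r, "CREATE") for r in related_ids]
    raise ValueError(f"unknown event: {event!r}")
```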

Publish Patterns#

Data from MDM is generally converted to a publish format defined in data contracts, Avro schemas, API specifications, or similar artifacts, and made available to consumers in two modes: batch and real-time. The appropriate integration method depends on the requirements of the consuming system. When developing a product, however, it is best to design both batch and real-time interfaces and let consumers choose the integration that fits their use cases.

Batch Mode
Batch integration is suitable when:
- Real-time data updates are not critical and some latency or delayed synchronization is acceptable between updates in the MDM system and the consuming applications.
- Volume is high. MDM supports both full extract and incremental load strategies, depending on system capabilities and integration frequency.
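
The difference between a full extract and an incremental load can be illustrated with a watermark filter. This is a sketch under assumptions: `select_for_extract` is a hypothetical helper, the records are plain dictionaries, and the timestamps are illustrative values; only the field names `substanceGlobalID` and `latestUpdateDateTime` are borrowed from the substance examples on this page.

```python
from datetime import datetime, timezone


def select_for_extract(records, last_successful_run=None):
    """Pick the records to export in one batch run.

    With no watermark, everything is exported (full extract). With a
    watermark, only records changed after it are exported (incremental).
    """
    if last_successful_run is None:
        return list(records)
    return [r for r in records
            if r["latestUpdateDateTime"] > last_successful_run]


# Illustrative data only.
records = [
    {"substanceGlobalID": "SUB-0000684",
     "latestUpdateDateTime": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"substanceGlobalID": "SUB-0000687",
     "latestUpdateDateTime": datetime(2024, 2, 21, tzinfo=timezone.utc)},
]
watermark = datetime(2024, 2, 1, tzinfo=timezone.utc)

full = select_for_extract(records)              # both records
delta = select_for_extract(records, watermark)  # only the later change
```

In practice the watermark would be stored after each successful run so that the next incremental load starts where the previous one ended.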

Real-Time / Near-Real-Time Mode
For use cases that require:
- 'Zero' latency
- Trigger- or event-based delivery
- Queuing or orchestration

Publication Method	Best Use Case	Data Delivery	Advantages	Challenges	How is it achieved
Via REST API	- Real-time or on-demand data retrieval. - Interactive system integrations (e.g., CRM).	On-demand via Mulesoft API	- Real-time access to master data. - Supports REST APIs for flexible integrations. - Fine-grained control over data consumption.	- Requires API design and management. - Higher initial setup complexity. - Dependency on API consumers for performance.	- REST APIs of MDM are onboarded to Mulesoft MAP to allow consumers to read data in real time. - CAI workflows can be used to transform the MDM data into the target API specification format.
Via Events	- Real-time, event-driven data delivery. - Distributed systems or microservices requiring synchronized updates. - Integration with Analytics Platform.	Real-time messaging	- High throughput with near-zero latency. - Decouples producer (MDM) and consumers. - Supports scalable, event-driven architectures. - Suitable for handling high-frequency updates.	- Requires Kafka infrastructure setup and maintenance. - Message schema requires careful definition. - Event management can become complex in distributed systems.	- MDM business events are configured to publish creates and updates of MDM data to Kafka topics. - CAI workflows can be used to transform the MDM data into the target publish schema format.
Via ODS	- Batch-oriented data integration. - Suitable for ETL, data warehousing, or reporting workflows.	Scheduled batch	- Simple, widely used approach. - Compatible with ETL workflows and BI tools. - Easy integration with legacy systems. - Data can be directly consumed from tables.	- No real-time capability. - Requires monitoring for large dataset loads. - Table schema changes can impact downstream consumers.	- Data from the MDM base object collection is exported via an egress job to Postgres tables. Materialized views are created on the tables, and consumers can read the data either via the materialized views or via APIs. - CDI jobs can be used to transform the MDM data into the target publish schema format.
Via Files	- Batch-oriented data integration. - Suitable for external systems.	Scheduled batch	- Simple, widely used approach. - Compatible with ETL workflows and BI tools. - Easy integration with legacy systems. - Data can be directly consumed from files.	- No real-time capability. - Requires monitoring for large dataset loads. - Files need to be maintained and archived regularly.	- Data from the MDM base object collection is exported via an egress job to files in the publish format in an S3 bucket or ADLS. - CDI jobs can be used to transform the MDM data into the target publish schema format.

Data Contract#

A Data Contract is a critical component that must be defined to enable seamless integration between MDM and downstream systems while supporting compliance, reducing errors, and enabling effective change management.

A data contract is a formal, reusable definition that specifies:

The schema,
The constraints,
The rules,
The types of messages that will be exchanged.

It guarantees:

All producers and consumers conform to this format.
There’s a clear, documented "agreement" between the MDM team (publisher) and the consuming services (subscribers or API clients).

As a technical document, the Data Contract plays a key role in sustaining data quality. Adhering to the recommended specification contributes significantly to achieving FAIR principles.
The specified format, currently adopted for Data Marketplace purposes, was developed by the Data Mesh Architecture organization (datacontract.com).

The Data Contract Specification enables extensive documentation of various aspects of a Data Product, including:

Definitions
Background and Context
Ownership
Terms of Use
Servers
Data Models, including:
  Attribute definitions
  Data quality requirements
  Lineage
  Sensitivity
  Examples
  And more
SLAs
And more

* Data Contract Example
* Data Contract Specification

Example of Data Contract for NNDM (country)#

dataContractSpecification: 0.9.3
id: urn:data:source:0012293(product:0x1e5e04b1.contract:0x3e7f1374)
info:
  title: API - Country List
  version: 0.0.1
  status: active
  owner: urn:team:source:0012293(domain:root)
  contact:
    name: Søren Skibelund
    email: OSKI@novonordisk.com
Data Quality Description: "In country data source, we prioritize quality by verifying country records while inserting them into the Informatica Reference 360. Only verified details will be approved and others will be rejected before inserting into Reference 360 by the designated approver. This helps to have the consistent and valid country data as per Novo standards.  Manual Quality Assurance: Designated approver will check and approve the data before inserting or updating in Reference 360 based on the set workflow"
terms:
  usage: |
    Unlimited number of queries per day
  limitations: |
    Not suitable for real-time use cases.
models:
  countries:
    description: One record per country. ISO 3166-1 alpha-2.
    type: table
    fields:
      Code:
        type: string
        format: ISO 3166-1 alpha-2
        required: true
        unique: true
        primaryKey: true
      fields:
        type: object
        fields:
          Code:
            type: string
            required: true
          status:
            type: string
            required: true
          Name:
            type: string
            required: true
          Description:
            type: string
          CodeTertiary1:
            type: string
          CodeSecondary1:
            type: string
servicelevels:
  availability:
    description: Mulesoft availability
    percentage: 99.9%
  frequency:
    description: "Real-time Updates: Inclusion of countries and update on an existing country fields (Description, Name, Secondary Code, Tertiary Code etc.,) will be updated in Reference 360 based on the set workflow."
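
A consumer or publisher can check records against the field rules of such a contract before accepting or sending them. The sketch below is an assumption-laden illustration: the `CONTRACT_FIELDS` dictionary is hand-copied from the nested `fields` of the country model above rather than parsed from the YAML, `contract_violations` is a hypothetical helper, and only the `required` and `type: string` rules are checked.

```python
# Field rules hand-copied from the country contract above (illustrative).
CONTRACT_FIELDS = {
    "Code": {"type": "string", "required": True},
    "status": {"type": "string", "required": True},
    "Name": {"type": "string", "required": True},
    "Description": {"type": "string"},
    "CodeTertiary1": {"type": "string"},
    "CodeSecondary1": {"type": "string"},
}

# Only the contract types used in this example are mapped.
PYTHON_TYPES = {"string": str}


def contract_violations(record, fields=CONTRACT_FIELDS):
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    for name, rules in fields.items():
        value = record.get(name)
        if value is None:
            if rules.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(value, PYTHON_TYPES[rules["type"]]):
            problems.append(f"{name}: expected {rules['type']}")
    return problems
```

For example, `contract_violations({"Code": "DK", "status": "Active"})` reports that `Name` is missing, while a record with all three required fields passes.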

Avro Schema#

When MDM publishes a data set (say, substance data to Kafka), it typically emits messages serialized in Avro format.

Avro is a lightweight, schema-centric format designed for fast, compact messages.
The Avro Schema explicitly specifies:
The type of record (like "Substance"),
The fields it contains (such as substanceGlobalID, stage, origin, nnSubstanceName),
The type of each field (string, int, array, enum, etc.).

This lets Kafka consumers know exactly how to parse the messages and, provided schema changes follow the compatibility rules (for example, new fields carry defaults), keeps the messages forward- and backward-compatible.
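
A simplified check for one of the compatibility rules can be sketched in plain Python: a field added in a new schema version needs a default, otherwise a reader using the new schema cannot decode messages written with the old one. This covers only top-level fields of a single record and ignores type changes, aliases, and nested records; `added_fields_without_default` is a hypothetical helper, and the two schemas are cut-down illustrations rather than the real substance schema.

```python
def added_fields_without_default(old_schema, new_schema):
    """List fields that are new in new_schema and declare no default.

    An empty result means old messages remain readable with the new
    schema as far as added fields are concerned.
    """
    old_names = {f["name"] for f in old_schema["fields"]}
    return [f["name"] for f in new_schema["fields"]
            if f["name"] not in old_names and "default" not in f]


old = {"type": "record", "name": "substance", "fields": [
    {"name": "substanceGlobalID", "type": "string"},
]}

# Adding an optional field with a default keeps old messages readable.
new_ok = {"type": "record", "name": "substance", "fields": [
    {"name": "substanceGlobalID", "type": "string"},
    {"name": "inn", "type": ["null", "string"], "default": None},
]}

# The same field without a default breaks readers of old messages.
new_bad = {"type": "record", "name": "substance", "fields": [
    {"name": "substanceGlobalID", "type": "string"},
    {"name": "inn", "type": ["null", "string"]},
]}
```

Note that the check looks for the presence of the `default` key, because a default of `null` is a valid declared default for a union whose first branch is `"null"`.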

Example (substance)#

Substance 


{ 
  "fields": [ 
    { 
      "name": "substance", 
      "type": { 
        "fields": [ 
          { 
            "name": "substanceGlobalID", 
            "type": "string" 
          }, 
          { 
            "name": "stage", 
            "type": "string" 
          }, 
          { 
            "name": "origin", 
            "type": "string" 
          },
          { 
            "name": "inNNCD", 
            "type": "string" 
          }, 
          { 
            "name": "nnSubstanceID",
            "type": "string" 
          }, 
          { 
            "name": "nnSubstanceName",
            "type": "string"
          }, 
          { 
            "name": "inn",
            "type": [ 
              "null", 
              "string"
            ] 
          }, 
          { 
            "name": "analyteNumber", 
            "type": [ 
              "null", 
              "string" 
            ] 
          }, 
          { 
            "name": "developmentSubstanceEVCode", 
            "type": [ 
              "null", 
              "string" 
            ]
          }, 
          {
            "name": "innSubstanceEVCode",
            "type": [ 
              "null",
              "string" 
            ] 
          }, 
          { 
            "name": "createDateTime", 
            "type": { 
              "type": "long", 
              "logicalType": "timestamp-millis" 
            } 
          }, 
          { 
            "name": "latestUpdateDateTime", 
            "type": { 
              "type": "long", 
              "logicalType": "timestamp-millis" 
            } 
          }, 
          { 
            "default": null, 
            "doc": "substanceAlternateNames Details",
            "name": "substanceAlternateNames", 
            "type": [
              "null", 
              { 
                "fields": [ 
                  { 
                    "default": null, 
                    "name": "substanceNameType", 
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  }, 
                  { 
                    "default": null,
                    "name": "substanceName",
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  }, 
                  { 
                    "default": null,
                    "name": "effectiveFromDateTime",
                    "type": [
                      "null", 
                      { 
                        "type": "long", 
                        "logicalType": "timestamp-millis" 
                      } 
                    ]
                  } 
                ], 
                "name": "substanceAlternateNames", 
                "type": "record" 
              } 
            ] 
          }, 
          { 
            "doc": "referenceSubstance Details", 
            "name": "referenceSubstance", 
            "type": [ 
              "null", 
              {
                "fields": [ 
                  { 
                    "name": "substanceGlobalID",
                    "type": "string" 
                  }, 
                  { 
                    "name": "stage", 
                    "type": "string" 
                  }, 
                  { 
                    "name": "origin",
                    "type": "string" 
                  }, 
                  { 
                    "name": "inNNCD", 
                    "type": "string" 
                  }, 
                  { 
                    "name": "nnSubstanceID", 
                    "type": "string" 
                  }, 
                  { 
                    "name": "nnSubstanceName", 
                    "type": "string" 
                  }, 
                  { 
                    "name": "inn",
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  }, 
                  { 
                    "name": "analyteNumber", 
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  }, 
                  { 
                    "name": "developmentSubstanceEVCode", 
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  }, 
                  { 
                    "name": "innSubstanceEVCode", 
                    "type": [ 
                      "null", 
                      "string" 
                    ] 
                  } 
                ], 
                "name": "referenceSubstance", 
                "type": "record" 
              } 
            ] 
          } 
        ], 
        "name": "substance", 
        "type": "record" 
      } 
    } 
  ], 
  "name": "substanceMessage", 
  "namespace": "com.novonordisk.prodex.operations.substance", 
  "type": "record" 
}

API Specification#

Your API specification (typically defined in OpenAPI/Swagger) describes how clients can retrieve or manipulate this master data.

This covers:

The API endpoint URL, e.g.:

GET /v1/substance/{id}

The path parameters, like id.
The query parameters, if applicable.
The API response format, which often uses JSON Schema to describe the payload (analogous to the Avro schema, but tailored for a REST API context).

Example (OpenAPI)#

openapi: 3.0.0
info:
  title: Substance API
  version: "1.0"

paths:
  /v1/substance/{id}:
    get:
      summary: Retrieve a Substance by its ID
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Substance"

components:
  schemas:
    Substance:
      type: object
      properties:
        substanceGlobalID:
          type: string
        nnSubstanceID:
          type: string
        origin:
          type: string
        inNNCD:
          type: string

Sample Response

Header

client_id
client_secret

Body 

[ 
    { 
        "substanceGlobalID": "SUB-0000687", 
        "NNSubstanceId": "NNC0098-0000-1179", 
        "origin": "External", 
        "inNNCD": "Yes", 
        "stage": "Released", 
        "nnSubstanceName": "NNC0098-1179", 
        "inn": "IN1179", 
        "analyteNumber": "AN000079", 
        "developmentSubstanceEVCode": "SUB001179", 
        "innSubstanceEVCode": "SUB210224", 
        "createDateTime": "2023-12-11T08:45:38.608Z", 
        "latestUpdateDateTime": "2024-02-21T10:07:27.488Z", 
        "referenceSubstance": [ 
            { 
                "substanceGlobalID": "SUB-0000684", 
                "nnSubstanceID": "NNC0098-0000-1174", 
                "origin": "Internal",
                "inNNCD": "Yes", 
                "stage": "Released", 
                "nnSubstanceName": "NNC0098-1174", 
                "inn": null, 
                "analyteNumber": "AN000074", 
                "developmentSubstanceEVCode": "SUB000174", 
                "innSubstanceEVCode": "SUB000174" 
            } 
        ] 
    }
]
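
A client call for the endpoint above might be assembled as follows. This is a sketch only: the base URL `https://api.example.com` and the credential values are placeholders, `build_substance_request` is a hypothetical helper, and it assumes that `client_id` and `client_secret` are sent as request headers, as the sample suggests. The request object is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_substance_request(base_url, substance_id, client_id, client_secret):
    """Build (but do not send) a GET request for /v1/substance/{id}."""
    # Percent-encode the ID so it is safe to use as a single path segment.
    path_id = quote(substance_id, safe="")
    url = f"{base_url.rstrip('/')}/v1/substance/{path_id}"
    return Request(url, method="GET", headers={
        "client_id": client_id,
        "client_secret": client_secret,
        "Accept": "application/json",
    })


# Placeholder host and credentials, for illustration only.
request = build_substance_request(
    "https://api.example.com/", "SUB-0000687", "my-id", "my-secret")
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; a 4xx answer is raised as `urllib.error.HTTPError`, whose body carries the error envelope.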

Sample Response - 400

{
    "success": false,
    "apiName": "mdm-prodex-substance-prc",
    "version": "v1",
    "correlationId": "961101b4-d8c6-453e-81aa-0c0b47e7011e",
    "timestamp": "2025-05-08T09:25:31.686339702Z",
    "errorDetails": "The searched field has no data in MDM"
}

Sample Response - 401

{
  "success": false,
  "apiName": null,
  "version": null,
  "correlationId": "8aa2cb70-d7d6-11eb-b4d1-02018dd03ba2",
  "timestamp": "2021-06-28T02:09:22.436-04:00",
  "errorDetails": [
    {
      "code": 401,
      "reason": "HTTP:UNAUTHORIZED",
      "message": "HTTP POST on resource 'https://..........' failed: unauthorized (401)"
    }
  ]
}

Sample Response - 422

{
    "success": false,
    "apiName": "mdm-prodex-substance-prc",
    "version": "v1",
    "correlationId": "ee006b70-c6b6-4981-b0a5-3ef3166490aa",
    "timestamp": "2025-05-08T09:26:20.83062079Z",
    "errorDetails": "The Search Criteria is incorrect"
}
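
In the samples above, `errorDetails` is a plain string in the 400 and 422 responses but a list of objects in the 401 response, so a client has to accept both shapes. The sketch below normalizes them into a list of strings; `error_messages` is a hypothetical helper, and the envelope layout is assumed to match the samples shown here.

```python
import json


def error_messages(body: str) -> list:
    """Extract error messages from a response body as a list of strings.

    Returns an empty list for successful responses, including the
    array-shaped success payload.
    """
    envelope = json.loads(body)
    if not isinstance(envelope, dict) or envelope.get("success", True):
        return []
    details = envelope.get("errorDetails")
    if isinstance(details, str):
        return [details]
    if isinstance(details, list):
        return [f"{d.get('code')} {d.get('reason')}: {d.get('message')}"
                for d in details]
    return []
```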
