Key Ideas
- Don't make people think: The model should be intuitive and easy to understand from a business perspective
- Presentation matters: Ensure the diagram is well-organized, visually clear, and reasonably sized. Use concise and clear names for objects and attributes
- Once defined, stick to it: Use consistent naming conventions, leveraging the Novo Nordisk business glossary where appropriate
- Seek feedback: Actively gather input to improve the model iteratively
- Collaborate early: Work with both business and technical teams from the start to achieve better results
Data Modelling#
Data modelling provides a visual representation of a business by analyzing, clarifying, and documenting data requirements and how they support key business processes. It helps uncover relationships between data elements, offering insights into how data flows and where improvements may be possible from a data design perspective.
Having a clear data model significantly simplifies the development of the MDM physical data model and supports the accurate design of data publishing layers.
Initial data models may be created manually, for example, using Miro boards or Excel sheets, as part of a Solution Architecture document. However, at Novo Nordisk, the standard tool for documenting the Data Model is Erwin.
Roles Involved#
Accountable: Product Owner
Responsible: Data Architect for Conceptual and Logical Models; Solution Architect for Physical Data Model
Consulted: Business Analyst
Process Flow#
The data modeling process follows a straightforward, structured approach: analyze the requirements, then build the Data Model. It is crucial to adhere to the data modeling best practices that have been developed and adopted within Novo Nordisk (the basic principles are summarized under Key Ideas).
Creating a data model involves several sequential steps: starting with the conceptual data model, then developing the logical data model, proceeding to the physical data model, and finally implementing the actual MDM schema.
Info
While the example was created manually, Novo Nordisk has a standard tool for this purpose: Erwin.
Start with Conceptual Design:#
This model defines what the system contains. It is a high-level view of the data domain, capturing entities and their relationships without details of attributes or subtypes. This model is typically created by business stakeholders and data architects. Its purpose is to organize, scope, and define business concepts and rules. Aligning entity definitions with industry standards like IDMP is recommended where possible.
Sample party conceptual model for representation
Create a Logical Data Model:#
This model describes how the system will be implemented. Starting from the conceptual data model, attributes for each entity are listed. If multiple values are possible for an attribute, a separate child entity should be created. Attributes and potential subtypes are continuously examined and added until the model satisfies Third Normal Form principles. The logical data model defines entities, attributes, primary keys, foreign keys, relationships, cardinalities, and fields marked as reference data or controlled attributes.
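As a rough illustration of the "separate child entity" rule, the sketch below models a party that can have several phone numbers. The entity and attribute names (`Party`, `PartyPhone`, `party_id`, and so on) are hypothetical and are not taken from any Novo Nordisk model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Parent entity: holds single-valued attributes only."""
    party_id: str  # primary key
    name: str


@dataclass(frozen=True)
class PartyPhone:
    """Child entity: one record per phone number of a party."""
    party_id: str  # foreign key to Party.party_id
    phone_type: str  # for example "office" or "mobile"
    phone_number: str


def phones_for(party: Party, phones: list[PartyPhone]) -> list[str]:
    """Follow the foreign key to collect all phone numbers of one party."""
    return [p.phone_number for p in phones if p.party_id == party.party_id]


party = Party(party_id="P-1", name="Example Clinic")
phones = [
    PartyPhone("P-1", "office", "+45 0000 0001"),
    PartyPhone("P-1", "mobile", "+45 0000 0002"),
    PartyPhone("P-2", "office", "+45 0000 0003"),
]
print(phones_for(party, phones))
```

Keeping phone numbers in their own entity gives a one-to-many relationship with an explicit foreign key, which is the kind of structure Third Normal Form expects instead of a repeating `phone1`, `phone2` column group.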
Plan the Physical Data Model:#
This is the actual model implemented in the MDM system. Depending on the specific Informatica MDM product used (HMDM or MDM SaaS), the approach to physical data modeling differs slightly.
- MDM SaaS takes a slightly different approach to data modelling: there is no explicit relational data model to build, because it uses a document-style data model implemented on MongoDB.
- The process typically starts with defining the business entities, followed by the definition of attributes or fields. Repeating attributes or composite fields can also be added. Once entities and their fields are defined, relationships between entities are established.
- In this model, all data for a specific entity is stored in a single JSON-style document. These documents can be deeply nested, allowing for child entities, grandchildren, and even further levels of hierarchy as needed.
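To make the nesting concrete, here is a minimal sketch of a single document with child and grandchild entities, plus a small helper that measures how deep the nesting goes. The field names and values are invented for illustration and do not reflect an actual MDM SaaS business entity.

```python
from typing import Any

# Hypothetical business entity stored as one JSON-style document.
party_document = {
    "partyId": "P-1",
    "name": "Example Clinic",
    "addresses": [  # child entities
        {
            "type": "billing",
            "city": "Springfield",
            "contacts": [  # grandchild entities
                {"role": "accounts", "email": "accounts@example.com"},
            ],
        },
    ],
}


def nesting_depth(value: Any) -> int:
    """Count nested object levels; lists do not add a level of their own."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((nesting_depth(v) for v in value), default=0)
    return 0


print(nesting_depth(party_document))  # 3: party, address, contact
```

A helper like this can serve as a quick review check when a model starts growing additional levels of hierarchy.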
Key Principles for Document-Style Data Modeling#
Some important guidelines for designing an effective document data model:
- Design based on your application's query patterns. Model your documents around how the data will be accessed and updated most frequently. Prioritize real-world access patterns rather than the schema-first, normalized approach used in relational databases.
- Think queries-first, not schema-first. Focus on optimizing for the most common queries to improve performance and maintainability.
Example:
If your application frequently retrieves a user's profile and their activity together, it's more efficient to store both in the same document.
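The profile-and-activity case can be sketched with plain Python dictionaries standing in for collections. Everything here (the collection names, the fields, and the lookup counter) is a hypothetical illustration, not a real database call.

```python
# Hypothetical in-memory "collections" keyed by user ID; no real database is used.
PROFILE = {"displayName": "A. User", "country": "DK"}
ACTIVITY = [
    {"action": "login", "at": "2023-10-01T12:00:00Z"},
    {"action": "order", "at": "2023-10-01T12:34:56Z"},
]

# Queries-first design: profile and activity live in one document.
users_embedded = {
    "5001": {"userId": "5001", "profile": PROFILE, "recentActivity": ACTIVITY}
}

# Schema-first design: the same data split across two collections.
users_split = {"5001": {"userId": "5001", "profile": PROFILE}}
activity_split = {"5001": ACTIVITY}


def load_embedded(user_id):
    """Return (profile, activity, lookups) using a single document read."""
    document = users_embedded[user_id]
    return document["profile"], document["recentActivity"], 1


def load_split(user_id):
    """Return (profile, activity, lookups) using one read per collection."""
    profile = users_split[user_id]["profile"]
    activity = activity_split[user_id]
    return profile, activity, 2


print(load_embedded("5001")[2], load_split("5001")[2])
```

Both functions return the same data; the embedded layout simply needs one read instead of two for this particular access pattern.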
- Embed vs Reference
Deciding between embedding and referencing depends on data structure, access patterns, and performance needs:
Embed documents when:
- The related data is tightly coupled and frequently accessed together.
- The related data is relatively small and unlikely to grow indefinitely.
Example: Store a user profile and their contact information in the same document.
Reference documents when:
- The related data is large, loosely coupled, or reused across multiple documents.
- Keeping the data separate helps avoid duplication and simplifies updates.
Example: Reference a supplier document when multiple products share the same supplier.
- Hybrid approach:
Use embedding for frequently accessed, denormalized data and references for less critical or shared data.
Note: MongoDB has a 16 MB document size limit. Avoid storing unbounded arrays or deeply nested structures that may exceed this.
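The supplier example and the size note can be sketched together as follows. The field names are hypothetical, and the size check is only a rough approximation: it measures the UTF-8 JSON encoding, whereas MongoDB applies its limit to the BSON encoding, which will differ.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB limit mentioned above

# Referenced data: one supplier document shared by several products.
suppliers = {"S-1": {"supplierId": "S-1", "name": "Example Supplier"}}

# Each product embeds its small, tightly coupled dimensions
# and references the shared supplier by ID.
products = [
    {"productId": "2001", "name": "Laptop", "supplierId": "S-1",
     "dimensions": {"widthCm": 32, "depthCm": 22}},
    {"productId": "2010", "name": "Mouse", "supplierId": "S-1",
     "dimensions": {"widthCm": 6, "depthCm": 10}},
]


def resolve_supplier(product):
    """Follow the reference from a product to its supplier document."""
    return suppliers[product["supplierId"]]


def approximate_size_bytes(document):
    """Rough size estimate via UTF-8 JSON; the real BSON size will differ."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document, limit=MAX_DOCUMENT_BYTES):
    """Return True when the estimated size stays within the limit."""
    return approximate_size_bytes(document) <= limit


# Updating the shared supplier once is visible from every product.
suppliers["S-1"]["name"] = "Renamed Supplier"
print([resolve_supplier(p)["name"] for p in products])
print(fits_in_one_document(products[0]))
```

Because the supplier is referenced, the rename touches one document; had it been embedded, every product document would need the same update.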
Best Practices:
- Balance normalization and denormalization based on access frequency and update patterns.
- Denormalize (embed) when read performance is critical.
- Normalize when the data is large or infrequently accessed.
- Avoid excessive nesting that leads to large document rewrites or performance issues.
Document Modeling Example#
The following example illustrates how to apply best practices in MongoDB document modeling. It demonstrates how to balance embedding and referencing to optimize read performance, reduce duplication, and keep documents manageable in size.
{
  "orderId": "1001",
  "userId": "5001",
  "items": [
    {"productId": "2001", "name": "Laptop", "price": 1200, "quantity": 1},
    {"productId": "2010", "name": "Mouse", "price": 25, "quantity": 2}
  ],
  "status": "Pending",
  "orderDate": "2023-10-01T12:34:56Z",
  "shippingAddress": {
    "street": "123 Elm St",
    "city": "Springfield",
    "state": "IL",
    "zipCode": "62701"
  }
}
Schema definition for the order document (excerpt):
{
  "name": "Order",
  "namespace": "order",
  "fields": [
    {
      "name": "orderId",
      "type": "string",
      "doc": "Primary ID for the order in format NN-XXXX"
    },
    {
      "name": "userId",
      "type": "string",
      "doc": "Primary ID for the user"
    },
    {
      "name": "items",
      "type": "record",
      "doc": "Items in the order",
      "namespace": "order.items",
      "fields": [
        {
          "name": "productId",
          "type": "string",
          "doc": "Primary ID for the product"
        },
        {
          "name": "name",
          "type": "string",
          "doc": "Name of the product"
        },
        {
          "name": "quantity",
          "type": "integer",
          "doc": "Quantity of the product ordered",
          "default": 1
        }
      ]
    }
  ]
}
- Embed items and shippingAddress for quick access.
- Reference other entities like userId if they are needed across multiple collections.
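One way to keep documents and their schema in step is a small validation helper. The sketch below assumes a schema in the same general shape as the one shown above (a list of `fields` with `name`, `type`, and an optional `default`), and it assumes that `record` may describe either a nested object or a list of such objects. The type mapping and the `validate` function are hypothetical and are not part of any Informatica or MongoDB API.

```python
# Assumption: schema type names map to Python types as follows.
TYPE_MAP = {"string": str, "integer": int}

order_fields = [
    {"name": "orderId", "type": "string"},
    {"name": "userId", "type": "string"},
    {"name": "items", "type": "record", "fields": [
        {"name": "productId", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "quantity", "type": "integer", "default": 1},
    ]},
]


def validate(document, fields, path=""):
    """Return a list of problems; an empty list means the document is valid."""
    problems = []
    for field in fields:
        name = field["name"]
        location = f"{path}{name}"
        if name not in document:
            # A missing field is acceptable only when the schema has a default.
            if "default" not in field:
                problems.append(f"missing field: {location}")
            continue
        value = document[name]
        if field["type"] == "record":
            records = value if isinstance(value, list) else [value]
            for index, record in enumerate(records):
                prefix = f"{location}[{index}]"
                if not isinstance(record, dict):
                    problems.append(f"not an object: {prefix}")
                    continue
                problems += validate(record, field.get("fields", []), f"{prefix}.")
        elif not isinstance(value, TYPE_MAP[field["type"]]):
            problems.append(f"wrong type for {location}: expected {field['type']}")
    return problems


order = {
    "orderId": "1001",
    "userId": "5001",
    "items": [
        {"productId": "2001", "name": "Laptop", "price": 1200, "quantity": 1},
        {"productId": "2010", "name": "Mouse", "price": 25},  # quantity defaults
    ],
}
print(validate(order, order_fields))
print(validate({"orderId": 1001, "items": [{"productId": "2001"}]}, order_fields))
```

Fields that are not listed in the schema, such as `price` here, are ignored by this sketch; a stricter check could report them as well.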
Naming and Indexing Best Practices:
- Use Consistent Naming Conventions
  - Use clear, descriptive, and consistent field names throughout your collections.
  - Avoid special characters or overly long names.
  Example: Use lastName instead of surname_of_person.
- Indexing Best Practices
  - Index your data on fields that are queried frequently.
  - Avoid over-indexing, as it increases memory usage and slows down write operations.
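The trade-off behind these indexing guidelines can be shown with a toy in-memory collection. This is a conceptual sketch only: `IndexedCollection` is a hypothetical class, not a MongoDB or driver API, and real database indexes are implemented very differently.

```python
from collections import defaultdict


class IndexedCollection:
    """Toy in-memory collection with optional single-field indexes."""

    def __init__(self, indexed_fields=()):
        self.documents = []
        self.indexes = {field: defaultdict(list) for field in indexed_fields}
        self.index_writes = 0  # extra work performed on insert

    def insert(self, document):
        """Store the document and update every index that covers it."""
        self.documents.append(document)
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].append(document)
                self.index_writes += 1

    def find(self, field, value):
        """Use the index when one exists, otherwise scan every document."""
        if field in self.indexes:
            return list(self.indexes[field].get(value, []))
        return [d for d in self.documents if d.get(field) == value]


orders = IndexedCollection(indexed_fields=("userId",))
orders.insert({"orderId": "1001", "userId": "5001", "status": "Pending"})
orders.insert({"orderId": "1002", "userId": "5002", "status": "Pending"})
orders.insert({"orderId": "1003", "userId": "5001", "status": "Shipped"})

print(len(orders.find("userId", "5001")))   # indexed lookup
print(len(orders.find("status", "Pending")))  # full scan
print(orders.index_writes)
```

Each additional indexed field adds one more index update per insert, which is the write cost that over-indexing incurs.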
What Happens Next?#
Once a data model is finalized or modified, or an update to an existing one is designed, it must be submitted as an Architectural Proposal to Design Authority to ensure alignment with the Data and Technical strategy.
Upon approval, the data model becomes part of the Requirements Specification for design and implementation.