Data Modelling#
Overview: What is Data Modelling and Why It Matters#
Data modelling is the structured process of defining and organizing data elements and their relationships to represent the real-world entities relevant to a data product. It transforms business concepts into data assets that can be consistently interpreted, shared, and reused across systems. In the context of a data product lifecycle, data modelling forms the bridge between business understanding and technical implementation — ensuring that the right data is captured, structured, and made interoperable from the start.
Data modelling spans multiple phases of the lifecycle: conceptual models emerge in the Feasibility phase to shape business understanding, logical models evolve during the Design phase to define the product-specific structure, and physical models (when applicable) are implemented in Build & Deploy to bring the model to life within the technical environment. When building a data product, the modelling effort concentrates on the conceptual and logical stages, the layers that define the 'what' and 'how' of data before engineering execution begins.
Feasibility Phase: Conceptual Data Modelling#
Overview#
During the Feasibility phase, data modelling focuses on building the conceptual model — a high-level representation of the data domain that defines key business entities and relationships. This model establishes a shared understanding of data meaning and scope across business and technical teams.
Learning Objectives#
- Understand why conceptual data modelling is essential in the feasibility phase
- Learn how to represent key entities and relationships within a data domain
- Identify dependencies, overlaps, and boundaries for future data products
Key Personas & Stakeholders (RACI Matrix)#
| Activity | Data Product Manager | Data Architect | Data Steward | Business Stakeholder | Governance Team |
|---|---|---|---|---|---|
| Define domain scope | A/R | C | I | R | C |
| Develop conceptual model | A | R | C | C | I |
| Validate and approve model | R | A | I | C | R |
R = Responsible, A = Accountable, C = Consulted, I = Informed
Prerequisites#
- Business understanding of the domain
- Availability of subject matter experts
- Access to enterprise reference models and the Erwin tool. Refer to the Enterprise Data Model Hub for instructions to install and access Erwin.
Step-by-Step Process#
- Identify and define key business entities within the data domain
- Capture their high-level relationships and dependencies
- Represent these entities and associations in a diagramming tool of your choice (for example, Miro or Whiteboard). Optionally start in Erwin, which preserves metadata and eases extension into the logical model
- Validate the model with domain experts and stakeholders for accuracy and completeness
- Review alignment with enterprise data standards and the EDM framework
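The steps above can be sketched in code. This is a minimal illustration only, not how Erwin or any diagramming tool stores a model: the `ConceptualModel` class, its method names, and the Sales example entities are all hypothetical. It shows the idea of recording entities with a business definition, linking them with named relationships, and flagging relationships that point at entities nobody has defined yet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    description: str  # business verb, e.g. "places"


@dataclass
class ConceptualModel:
    """Hypothetical container for a conceptual model (illustration only)."""
    domain: str
    entities: dict[str, str] = field(default_factory=dict)  # name -> definition
    relationships: list[Relationship] = field(default_factory=list)

    def add_entity(self, name: str, definition: str) -> None:
        self.entities[name] = definition

    def relate(self, source: str, target: str, description: str) -> None:
        self.relationships.append(Relationship(source, target, description))

    def undefined_references(self) -> set[str]:
        """Entity names used in relationships but never defined."""
        used = {r.source for r in self.relationships}
        used |= {r.target for r in self.relationships}
        return used - set(self.entities)


# Example entities are assumptions chosen for illustration.
model = ConceptualModel(domain="Sales")
model.add_entity("Customer", "A person or organisation that buys products")
model.add_entity("Order", "A confirmed request to purchase products")
model.relate("Customer", "Order", "places")
model.relate("Order", "Shipment", "is fulfilled by")  # Shipment not defined yet

print(model.undefined_references())  # {'Shipment'}
```

A check like `undefined_references` is one way to surface gaps before the validation session with domain experts; the actual review still happens with stakeholders.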
Success Metrics & Checkpoints#
- Conceptual model approved by both business and data architecture teams
- Key entities and relationships documented in Erwin
- Domain boundaries clearly defined and overlaps resolved
Common Challenges & Solutions#
- Challenge: Misalignment between business and technical perspectives
- Solution: Conduct joint modelling sessions and validate terminology early
- Prevention: Maintain a shared glossary for consistent interpretation
Next Steps#
Once the conceptual model is validated and approved, transition to the Design phase to create the logical data model specific to the data product within the defined domain.
Design Phase: Logical Data Modelling#
Overview#
In the Design phase, the conceptual model evolves into a logical data model. This model defines the structure, attributes, and relationships of data specific to the data product. Logical models serve as the blueprint for schema and pipeline design, enabling consistency across ingestion, transformation, and presentation layers.
Learning Objectives#
- Learn how to refine conceptual models into logical representations
- Understand how logical models guide schema design and data pipelines
- Recognize where modelling fits within the Medallion architecture (Silver and Gold layers)
Key Personas & Stakeholders (RACI Matrix)#
| Activity | Data Product Manager | Data Architect | Data Steward | Business Stakeholder | Governance Team |
|---|---|---|---|---|---|
| Refine conceptual model | C | R/A | C | I | C |
| Develop logical data model | A | R | R | C | I |
| Flag design decisions and ambiguities | R | A | C | C | I |
| Drive schema and pipeline design | I | R | A | C | I |
R = Responsible, A = Accountable, C = Consulted, I = Informed
Prerequisites#
- Approved conceptual data model
- Stakeholder sign-off on data domain boundaries
- Access to the modelling tool (Erwin) and data platform specifications. Refer to the Enterprise Data Model Hub for instructions to install and access Erwin
Step-by-Step Process#
- Import the approved conceptual model into Erwin
- Expand entities with attributes, data types, and business rules
- Define primary and foreign key relationships and cardinality
- Document assumptions and design decisions for traceability
- Validate the model for alignment with Medallion architecture — Silver for cleaned, conformed data and Gold for curated, analytical structures
- Review with data owners and ensure readiness for schema and pipeline generation
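As a rough sketch of what the steps above produce, a logical model can be thought of as entities with typed attributes, keys, and a target Medallion layer. The classes and the `validate` function below are hypothetical and are not an Erwin export format or a platform API; the table names and data types are assumptions for illustration. The sketch checks three things a review would typically look for: every entity has a primary key, every foreign key points at an existing entity and column, and every entity is assigned to the Silver or Gold layer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    nullable: bool = True
    primary_key: bool = False


@dataclass(frozen=True)
class ForeignKey:
    column: str
    ref_entity: str
    ref_column: str
    cardinality: str = "many-to-one"


@dataclass
class Entity:
    name: str
    layer: str  # expected to be "silver" or "gold"
    attributes: list[Attribute]
    foreign_keys: list[ForeignKey] = field(default_factory=list)


def validate(entities: list[Entity]) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    by_name = {e.name: e for e in entities}
    problems: list[str] = []
    for e in entities:
        columns = {a.name for a in e.attributes}
        if e.layer not in {"silver", "gold"}:
            problems.append(f"{e.name}: unknown layer '{e.layer}'")
        if not any(a.primary_key for a in e.attributes):
            problems.append(f"{e.name}: no primary key defined")
        for fk in e.foreign_keys:
            if fk.column not in columns:
                problems.append(f"{e.name}.{fk.column}: not an attribute of the entity")
            target = by_name.get(fk.ref_entity)
            if target is None:
                problems.append(f"{e.name}.{fk.column}: unknown entity '{fk.ref_entity}'")
            elif fk.ref_column not in {a.name for a in target.attributes}:
                problems.append(
                    f"{e.name}.{fk.column}: unknown column "
                    f"'{fk.ref_entity}.{fk.ref_column}'"
                )
    return problems


customer = Entity("customer", "silver", [
    Attribute("customer_id", "bigint", nullable=False, primary_key=True),
    Attribute("full_name", "string"),
])
order = Entity("order", "silver", [
    Attribute("order_id", "bigint", nullable=False, primary_key=True),
    Attribute("customer_id", "bigint", nullable=False),
], [ForeignKey("customer_id", "customer", "customer_id")])

print(validate([customer, order]))  # []
```

Keeping the checks as a plain function that returns messages, rather than raising on the first error, makes it easier to list every open issue in a single review.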
Success Metrics & Checkpoints#
- Logical model validated by data architects and owners
- Model used to define the schema for the Silver and Gold layers
- All key design decisions documented and approved
Common Challenges & Solutions#
- Challenge: Overly complex logical models that are not aligned with business needs
- Solution: Prioritize simplicity and purpose alignment
- Prevention: Conduct early reviews with business stakeholders
Next Steps#
After finalizing the logical data model, transition into the Build & Deploy phase where data engineers use it to create actual schemas, data pipelines, and validation frameworks.
Additional Resources#
- EDM Learning & Support Page – Refer to this page for foundational knowledge on data modelling concepts and step-by-step guidance on navigating the Erwin tool
- Data Modelling Best Practices Guide – Explore this guide to understand the core principles of effective data modelling and learn how to design models aligned with industry-leading standards
- Data Modelling Standard Operating Procedure.docx – Consult this SOP to ensure consistency in modelling practices, governance alignment, and adherence to enterprise data standards across all data products
Build & Deploy: Physical Modelling#
The logical data model serves as the foundation for the Build & Deploy phase, providing the technical blueprint needed to construct schemas, transformation logic, and integration pipelines. At this stage, the focus shifts from modelling to implementation — translating the logical structures into real data flows and physical assets within the enterprise data platform.
In traditional database development, physical modelling involves writing Data Definition Language (DDL) that defines tables, keys, and indexes at the database level. Within the data product context, however, a separate physical modelling step is not required, because the logical model already defines the structure and relationships that guide automated schema generation within the platform. The physical layer is therefore abstracted and handled through data engineering workflows and deployment pipelines.
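To make the idea of schema generation concrete, the sketch below renders a generic `CREATE TABLE` statement from a logical description of one table. It is an illustration under stated assumptions, not the platform's actual generator: the function `render_create_table`, the table name `silver_customer`, and the column types are hypothetical, and real DDL syntax varies by engine.

```python
def render_create_table(
    table: str,
    columns: list[tuple[str, str, bool]],  # (name, data type, nullable)
    primary_key: list[str],
) -> str:
    """Render a generic CREATE TABLE statement from a logical definition."""
    lines = []
    for name, data_type, nullable in columns:
        null_sql = "" if nullable else " NOT NULL"
        lines.append(f"    {name} {data_type}{null_sql}")
    if primary_key:
        lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table} (\n{body}\n);"


# Hypothetical Silver-layer table, used only for illustration.
ddl = render_create_table(
    "silver_customer",
    [("customer_id", "BIGINT", False), ("full_name", "VARCHAR(200)", True)],
    ["customer_id"],
)
print(ddl)
```

Because the statement is derived entirely from the logical definition, a change to the model flows through to the schema without a hand-maintained physical model, which is the property the platform's automated generation relies on.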
Key Takeaway: Success in this transition depends on close collaboration between data architects and data engineers to ensure that the logical model's intent — data integrity, reusability, and interoperability — is fully preserved during implementation and deployment.