Requirements Specification#
Overview#
The Requirements Specification phase translates business needs into detailed technical and functional requirements for your data product. This critical step ensures all stakeholders have a shared understanding of what will be delivered and establishes clear boundaries for the implementation team.
What you will learn#
After completing this chapter, you will understand how to:
- Transform business requirements into actionable technical specifications
- Conduct stakeholder interviews to gather comprehensive requirements
- Create essential documentation including data contracts, business glossary, and gap analyses
- Define clear scope boundaries and success criteria for your data product
- Establish proper governance, security, and compliance requirements
Key Personas & Stakeholders - RACI Matrix#
| Activity | Product Owner | Business Analyst | Data Engineer | Solution Architect | Business Stakeholders | Governance Team |
|---|---|---|---|---|---|---|
| Requirements Gathering | A | R | C | C | C | C |
| Technical Analysis | C | R | R | A | I | C |
| Gap Analysis | A | R | C | R | C | C |
| Scope Definition | A | R | C | C | I | I |
| Business Glossary Creation/Maintenance | A | R | I | C | C | R |
| Data Contract Development/Maintenance | A | C | R | C | I | I |
| Security Requirements | I | C | C | R | C | A |
R = Responsible, A = Accountable, C = Consulted, I = Informed
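A matrix like the one above can be checked mechanically. The following is a minimal sketch, assuming the common RACI convention (not stated in this chapter) that each activity has exactly one Accountable role and at least one Responsible role. The names `ROLES`, `RACI`, `check_raci`, and `accountable_for` are illustrative, not part of any existing tool.

```python
# Role order matches the columns of the RACI table above.
ROLES = [
    "Product Owner", "Business Analyst", "Data Engineer",
    "Solution Architect", "Business Stakeholders", "Governance Team",
]

# One letter per role, copied row by row from the table above.
RACI = {
    "Requirements Gathering": "ARCCCC",
    "Technical Analysis": "CRRAIC",
    "Gap Analysis": "ARCRCC",
    "Scope Definition": "ARCCII",
    "Business Glossary Creation/Maintenance": "ARICCR",
    "Data Contract Development/Maintenance": "ACRCII",
    "Security Requirements": "ICCRCA",
}


def check_raci(matrix: dict) -> list:
    """Return a list of problems; an empty list means the matrix is consistent."""
    problems = []
    for activity, codes in matrix.items():
        if len(codes) != len(ROLES):
            problems.append(f"{activity}: expected {len(ROLES)} codes, got {len(codes)}")
            continue
        if set(codes) - set("RACI"):
            problems.append(f"{activity}: contains a code other than R, A, C, I")
        # Assumed convention: exactly one Accountable, at least one Responsible.
        if codes.count("A") != 1:
            problems.append(f"{activity}: needs exactly one Accountable role")
        if "R" not in codes:
            problems.append(f"{activity}: needs at least one Responsible role")
    return problems


def accountable_for(activity: str) -> str:
    """Look up the single Accountable role for an activity."""
    return ROLES[RACI[activity].index("A")]


print(check_raci(RACI))
print(accountable_for("Security Requirements"))
```

Running a check like this whenever the matrix changes helps catch an activity that ends up with no accountable owner.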
Prerequisites#
- Completed: Business Requirements Document
- Access: Stakeholder contact information and availability for interviews
- Tools: Access to ADO/TIMS for requirement management and dc-release framework setup
- Knowledge: Understanding of your organization's data governance framework and compliance requirements
Before You Begin
Consider using TIMS/ADO managed projects for capturing final requirements. If your team follows the dc-release Operating Model, requirements should be captured in feature files.
Step-by-Step Process#
Step 1: Stakeholder Interview Planning#
Objective: Identify all relevant parties and plan comprehensive requirement gathering sessions.
- Create a stakeholder matrix:
  - Primary business stakeholders (Product Owner, Business Owner)
  - Technical stakeholders (Data Engineers, Solution Architects)
  - Governance parties (Compliance, Security, Data Governance)
  - End users and consumers of the data product
- Schedule interview sessions with each stakeholder group
- Prepare interview templates focusing on their specific domain expertise
Interview Best Practices
Structure interviews around the stakeholder's expertise area. Ask governance teams about data definitions, security teams about access requirements, and business users about expected outcomes.
Expected Outcome: Comprehensive stakeholder engagement plan with scheduled interviews
Step 2: Requirements Gathering and Documentation#
Objective: Capture detailed functional and non-functional requirements from all stakeholders. Use the Business Requirements Document as the baseline.
Functional Requirements#
Document what the data product must do:
- Data transformations and calculations
- Expected outputs and formats
- User interface requirements (for dashboards/reports)
- Integration requirements with existing systems
Non-Functional Requirements#
Define how the data product should perform:
- Performance requirements (latency, throughput)
- Security requirements (encryption, access controls)
- Compliance requirements (GDPR, industry regulations)
- Scalability and availability needs
Use this Requirements Template to document your requirements
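Whatever tool holds the requirements, it can help to capture each one as a structured record. The sketch below is one possible shape, not the format of the Requirements Template or of ADO/TIMS. The field names, IDs, and example requirements are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    FUNCTIONAL = "functional"          # what the data product must do
    NON_FUNCTIONAL = "non-functional"  # how the data product should perform


@dataclass
class Requirement:
    req_id: str
    kind: Kind
    statement: str
    source_stakeholder: str
    acceptance_criteria: list = field(default_factory=list)

    def is_testable(self) -> bool:
        """A requirement without acceptance criteria cannot be verified later."""
        return bool(self.acceptance_criteria)


# Hypothetical examples, for illustration only.
requirements = [
    Requirement(
        "FR-001", Kind.FUNCTIONAL,
        "Aggregate daily sales by region",
        "Product Owner",
        ["Output contains one row per region per day"],
    ),
    Requirement(
        "NFR-001", Kind.NON_FUNCTIONAL,
        "Data is available within 60 minutes of a source update",
        "Business Stakeholders",
    ),
]

# Requirements that still need acceptance criteria before sign-off.
untestable = [r.req_id for r in requirements if not r.is_testable()]
print(untestable)  # ['NFR-001']
```

Recording the source stakeholder with each requirement keeps it traceable back to the interview where it was raised.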
Metadata Requirements for FAIR Data Products#
To ensure Data Products are Findable, Accessible, Interoperable, and Reusable, follow these guidelines:
- Governance: Assign a Data Owner and Data Steward for every Data Product.
- Risk Assessment: Conduct and document risk and classification assessments (e.g., Data Classification, PII, GxP). Currently, use the standard IT risk assessment format. A simplified risk assessment for data products, linked to the entity in ServiceNOW, is planned for November.
- Data Product Attributes: Maintain complete metadata, including source and product IDs, descriptions, etc.
- Source Tagging: Tag source data with classification and domain tags from enterprise data guardrails.
- Lineage: Ensure data lineage from source to product is captured in NNDM, where all input and output ports are present, e.g., with linked sources, applications, and data contracts or consumer data products.
- Semantically Consistent: Use defined business terms from the business glossary; link data models to the enterprise model where relevant.
- Data Quality: Define and monitor data quality metrics (completeness, accuracy, timeliness). For more information and support, check the services and support from data quality here.
- Usage Map: Map producing and consuming systems; document interface agreements. (To be defined further.)
- Implementation SOP: Ensure the implementation of data products is in alignment with the applicable SOPs, e.g., that data products are correctly classified according to Protecting and Handling Information SOP.
- Retention Policy: Define and document retention and disposal procedures aligned with corporate policies.
| Data Product Attributes | Description | Required/ Optional | Rationale |
|---|---|---|---|
| 1. Governance | |||
| Data Owner | Owner of Data Products | Required | To establish ownership and stewardship for Data Products critical for governance, maintenance, and enhancements of Data Products. LoBs can further customize the roles to required granularity, e.g., Data (product) Owner, Data (integrity) Owner. Key consideration is that Owner is ultimately accountable, while steward is responsible for operations and adherence to the requirement of CRUD operations. SME roles cover extended roles specific to an LoB (Data Modelers, ontologists, scientific stewards, researchers, etc.). |
| Data Steward | Data Steward responsible for maintaining the data product content and quality | Required | |
| Subject Matter Expert | Responsible for questions regarding the data asset or data product | Optional | |
| 2. Risk Assessment | |||
| Business Impact | Risk Classification (High, Medium, Low) in alignment with IT Security Risk assessment Framework | Required | Helps assess and manage the risks associated with a data product to protect NOVO NORDISK data and ensure compliance with regulations. |
| Data Sensitivity Classification | Data sensitivity classification (e.g., public, Internal Use, Confidential, Strictly Confidential) | Required | |
| Personally Identifiable Information (PII) | Specify if the data source contains personally identifiable information (PII) (e.g., Yes or No) | Required | |
| GxP | Specify if the data source contains GxP data as part of a GxP validated process | Required | |
| 3. Attributes | |||
| Data Product ID | Global Unique and persistent identifier from Purview and NOVO NORDISK DM | Required | Identification and brief description of Data Product for quick understanding. |
| Data Product Name | A user-oriented name of the data product. Use plain English and avoid abbreviations | Required | |
| Data Product Description | A description of the data product in non-technical language. Use plain English and avoid abbreviations other than commonly used ones, e.g., PC, ATM. | Required | |
| Source ID | Unique ID of the source system | Required | Identification and brief description of the source system where actual data is hosted and its owner. |
| Source System Name | Name of the software used to perform a function | Required | |
| IT-system owner | The owner of an IT system is a Line Manager who is responsible for one of the business processes supported by the IT system | Required | |
| Source Type | Type of source (e.g., Oracle, Azure SQL, SQL Server, ADLS Gen2, Databricks, PostgreSQL, Snowflake, Power BI) | Optional | Identifies the origin of the data product, helping consumers understand the underlying technology. |
| Source Connection Details | Connection details like S3 Bucket URL | Optional | Defines how data consumers can technically connect to the data source, e.g., PostgreSQL: Server, Port, DB Name. |
| Service Account | Azure AD identity used for accessing the source (if Azure-hosted) | Optional | To enforce secure authentication and ensure compliance with Novo Nordisk's security guidelines. |
| Access Credentials | Credentials (username/password) for accessing the data source (if not using a service principal) | Optional | |
| 4. Source Tagging | |||
| Tag Name | Name of tags in Data Sources (E.g., Data Quality tags, Business or Domain tags such as Financial Data, Clinical Data, etc.) | Optional | Helps categorize the Data Products. |
| 5. Lineage | |||
| Data Lineage | Lineage from source to data product in the data marketplace | Optional | Helps with auditing, troubleshooting, and understanding data flows. |
| 6. Semantically Consistent | |||
| Business Glossary | Definition of business terms used in Data Product in the glossary | Optional | Ensures common understanding of business terminology, enhancing communication and reducing ambiguity across teams. |
| Data Model | Data Model for Data Assets & Products linked to enterprise data model | Required | Helps understand the underlying structure of a Data Product. |
| 7. Data Quality | |||
| Data Quality Measures | Data should have data quality metrics and monitoring defined | Required | Ensures Data Products meet business and user needs. |
| 8. Usage Map | |||
| Interface Agreements | Data map over producing and consuming systems | Optional | Interface Agreements act as the foundation for standardized, secure, and governed data exchanges, enabling seamless interoperability between systems while ensuring compliance, trust, and usability of Data Products. |
| 9. Implementation SOP | Ensure the implementation of the data product is in alignment with the applicable SOPs, e.g., that data products are correctly classified according to the Protecting and Handling Information SOP. | Optional | Ensure all compliance requirements are met. |
| 10. Retention Policy | |||
| Retention Time | Retention time for the content of the data | Required | Ensure compliance with legal and regulatory requirements and optimal data storage management. |
Expected Outcomes:
- Complete requirements documentation in your chosen format (ADO stories, TIMS requirements, or markdown).
- Risk assessments completed and documented.
- Metadata validated against FAIR Data Product requirements
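The metadata validation can be partly automated. The sketch below checks a metadata record against the attributes marked Required in the table above. The snake_case keys are hypothetical field names, and the allowed values are taken from the examples in the table, so the authoritative lists may differ.

```python
# Attributes marked "Required" in the table above (keys are illustrative).
REQUIRED_ATTRIBUTES = (
    "data_owner", "data_steward",
    "business_impact", "data_sensitivity_classification",
    "contains_pii", "contains_gxp",
    "data_product_id", "data_product_name", "data_product_description",
    "source_id", "source_system_name", "it_system_owner",
    "data_model", "data_quality_measures", "retention_time",
)

# Assumption: value sets copied from the examples in the table.
ALLOWED_VALUES = {
    "business_impact": {"High", "Medium", "Low"},
    "data_sensitivity_classification": {
        "Public", "Internal Use", "Confidential", "Strictly Confidential",
    },
}


def validate_metadata(metadata: dict) -> list:
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for name in REQUIRED_ATTRIBUTES:
        value = metadata.get(name)
        # False is a valid answer (e.g. contains_pii = False), so only
        # None and the empty string count as missing.
        if value is None or value == "":
            errors.append(f"missing required attribute: {name}")
    for name, allowed in ALLOWED_VALUES.items():
        value = metadata.get(name)
        if value not in (None, "") and value not in allowed:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors
```

A check like this only confirms that the fields are filled in. Whether the content is correct still needs review by the Data Owner and Data Steward.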
Step 3: Data Modelling#
Objective: Capture data modelling requirements for conceptual and logical data models. Physical data models are purely a design choice, determined by the underlying storage formats and data loading logic.
- Conceptual Data Modelling (can be reused if one already exists for the data domain)
  - Identify and define key business entities within the data domain
  - Capture their high-level relationships and dependencies
  - Represent these entities and associations using any diagramming tool of choice, such as Miro, Whiteboard, etc. Optionally start in Erwin, as it preserves metadata and eases extension into the logical model
- Logical Data Modelling (mandatory; outlines the schematic representation of the critical data elements and their relations)
  - Refer to the conceptual model and expand entities with attributes, data types, and business rules
  - Define primary and foreign key relationships and cardinality
  - Document assumptions and design decisions for traceability
  - Represent the above in a diagramming tool of choice, such as Miro, Whiteboard, etc. Optionally start in Erwin, as it preserves metadata
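A logical model can also be written down as data and sanity-checked before it is drawn. This is a minimal sketch, not an Erwin export format. The `Customer` and `Order` entities and all attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    name: str
    data_type: str
    primary_key: bool = False
    references: Optional[str] = None  # "Entity.attribute" when this is a foreign key


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


def check_model(entities: list) -> list:
    """Flag entities without a primary key and foreign keys with no target."""
    problems = []
    known_keys = {
        f"{e.name}.{a.name}"
        for e in entities for a in e.attributes if a.primary_key
    }
    for e in entities:
        if not any(a.primary_key for a in e.attributes):
            problems.append(f"{e.name}: no primary key")
        for a in e.attributes:
            if a.references and a.references not in known_keys:
                problems.append(f"{e.name}.{a.name}: references unknown key {a.references}")
    return problems


# Hypothetical one-to-many relationship: one Customer has many Orders.
model = [
    Entity("Customer", [
        Attribute("customer_id", "integer", primary_key=True),
        Attribute("name", "string"),
    ]),
    Entity("Order", [
        Attribute("order_id", "integer", primary_key=True),
        Attribute("customer_id", "integer", references="Customer.customer_id"),
    ]),
]

print(check_model(model))  # []
```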
Step 4: Business Glossary and Data Contracts#
Objective: Establish standardized definitions and data agreements.
- Create Business Glossary:
  - Define all business terms and KPIs
  - Establish calculation methods for metrics
  - Document data lineage for key attributes
  - Create managed tags for NNDM compliance
- Develop Data Contracts:
  - Ingestion contracts: Define source data expectations
  - Processing contracts: Specify transformation rules
  - Output contracts: Define target data product structure
Critical Success Factor
Ensure all stakeholders agree on business definitions before proceeding. Misaligned definitions are a leading cause of data product failures.
Expected Outcome: Approved business glossary and comprehensive data contracts
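To make the idea of an output contract concrete, the sketch below expresses one as a plain dictionary and checks a record against it. This is an illustration only and not the contract format used by NNDM or dc-release. The column names and rules are hypothetical.

```python
# Hypothetical output contract: expected columns, types, and nullability.
OUTPUT_CONTRACT = {
    "order_id": {"type": int, "nullable": False},
    "region": {"type": str, "nullable": False},
    "net_amount": {"type": float, "nullable": True},
}


def violations(record: dict, contract: dict) -> list:
    """Return contract violations for one record; an empty list means it conforms."""
    found = []
    for column, rule in contract.items():
        if column not in record:
            found.append(f"{column}: missing column")
            continue
        value = record[column]
        if value is None:
            if not rule["nullable"]:
                found.append(f"{column}: null not allowed")
        elif not isinstance(value, rule["type"]):
            found.append(
                f"{column}: expected {rule['type'].__name__}, got {type(value).__name__}"
            )
    # Columns the contract does not know about are reported as well.
    for column in sorted(record.keys() - contract.keys()):
        found.append(f"{column}: not in contract")
    return found


good = {"order_id": 7, "region": "EMEA", "net_amount": 12.5}
bad = {"order_id": "7", "region": None, "net_amount": None}
print(violations(good, OUTPUT_CONTRACT))  # []
print(violations(bad, OUTPUT_CONTRACT))
```

The same structure could describe an ingestion contract, with the check applied to source records instead of product output.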
Step 5: Gap Analysis and Technical Assessment#
Objective: Identify potential implementation challenges and dependencies that arise after the feasibility phase.
Note
These requirements go beyond the high-level feasibility phase. They cover the detailed specifications teams need when designing, modifying, or expanding the solution.
Evaluate the following areas:
- Data Availability, Volume, and Sensitivity:
  - Are the required data sources and assets accessible? Could any part of the required data assets (columns or tables) change the data classification?
  - Do source systems have the necessary APIs or export capabilities?
  - Has the data volume changed since the initial feasibility assessment?
  - Is historical data available for the required timeframe?
- Infrastructure Readiness:
  - Are compute resources sufficient (e.g., if the data volume has changed)?
  - Do you need additional storage capacity?
  - Are networking connections established (e.g., when new sources are added, or new services that can act as sources are discovered)?
- Organizational Capabilities:
  - Are governance processes established? (E.g., if the data classification has changed because sensitive attributes were added that were not identified during feasibility, an appropriate governance process is required.)
  - Is a support structure in place? (E.g., if your product requires 24x5 support but no team exists, either lower the criticality or establish support processes first.)
Gap Analysis Template
For each identified gap:
- Gap Description: What is missing?
- Impact: How does this affect delivery?
- Resolution: What needs to be done?
- Owner: Who is responsible for resolution?
- Timeline: When must this be resolved?
Expected Outcome: Documented gaps with resolution plans and an updated project scope in the project management tool of your choice, e.g., ADO.
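The five fields of the Gap Analysis Template map naturally onto a small record type. The sketch below is one way to track gaps outside a project management tool. The class name, the helper functions, and the example gaps are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Gap:
    description: str      # What is missing?
    impact: str           # How does this affect delivery?
    resolution: str       # What needs to be done?
    owner: str            # Who is responsible for resolution?
    due: Optional[date]   # When must this be resolved?
    resolved: bool = False


def missing_fields(gap: Gap) -> list:
    """List the template fields that are still empty for a gap."""
    names = ("description", "impact", "resolution", "owner", "due")
    return [name for name in names if not getattr(gap, name)]


def overdue(gaps: list, today: date) -> list:
    """Unresolved gaps whose due date has passed."""
    return [g for g in gaps if not g.resolved and g.due is not None and g.due < today]


# Hypothetical examples, for illustration only.
gaps = [
    Gap("No historical data before 2020", "Trend KPIs cannot be backfilled",
        "Request an archive extract", "Data Engineer", date(2025, 3, 1)),
    Gap("No support team", "Cannot offer 24x5 support", "", "", None),
]

print(missing_fields(gaps[1]))  # ['resolution', 'owner', 'due']
```

A gap with empty fields is not yet ready for the gap resolution checkpoint, since it has no approved plan or assigned owner.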
Step 6: Scope Definition and Finalization#
Objective: Establish clear project boundaries and deliverable definitions.
- Define what's included:
  - Specific data sources to be integrated
  - Transformations to be implemented
  - Outputs to be delivered
  - Support and maintenance responsibilities
- Define what's excluded:
  - Out-of-scope data sources or requirements
  - Future enhancements not part of initial delivery
  - Dependencies resolved by other teams
- Create acceptance criteria for each major deliverable
- Establish success metrics for the overall data product
Expected Outcome: Signed-off requirements specification with clear scope boundaries
Step 7: dc-release/CI-CD Feature File Creation#
Objective: Translate requirements into dc-release-compatible feature files for CI/CD implementation.
- Structure feature files following dc-release standards
- Define scenarios for each major requirement
- Link to supporting documentation
Expected Outcome: Complete feature files ready for dc-release pipeline integration
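The dc-release feature file standards are not reproduced in this chapter. As a rough illustration, the sketch below assumes feature files follow the widely used Gherkin layout (Feature, Scenario, Given/When/Then) and renders one from a requirement. The function name `render_feature` and the example content are hypothetical, so check the result against the actual dc-release conventions.

```python
def render_feature(name: str, description: str, scenarios: list) -> str:
    """Render a Gherkin-style feature file as text (assumed layout)."""
    lines = [f"Feature: {name}", f"  {description}", ""]
    for scenario in scenarios:
        lines.append(f"  Scenario: {scenario['title']}")
        for keyword in ("given", "when", "then"):
            lines.append(f"    {keyword.capitalize()} {scenario[keyword]}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical requirement turned into a single scenario.
feature_text = render_feature(
    "Regional sales output",
    "Daily sales are aggregated by region for reporting.",
    [{
        "title": "One row per region per day",
        "given": "the daily sales source has been loaded",
        "when": "the aggregation job completes",
        "then": "the output contains exactly one row per region per day",
    }],
)
print(feature_text)
```

Writing one scenario per acceptance criterion keeps each requirement individually traceable in the pipeline.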
Success Metrics & Checkpoints#
- Stakeholder Sign-off: All key stakeholders have approved their respective requirement areas
- Complete Documentation: Business glossary, data contracts, and technical specifications are documented
- Gap Resolution: All identified gaps have approved resolution plans with assigned owners
- Scope Clarity: Project boundaries are clearly defined and accepted by all parties
- dc-release Integration: Feature files are created and validated in the dc-release pipeline
- Compliance Review: Security and governance requirements have been reviewed and approved
Next Steps#
After completing requirements specification:
- Begin technical design based on approved requirements
- Start data marketplace registration process
- Initialize dc-release pipeline with your feature files
The next chapter will guide you through Data Marketplace Registration to ensure your data product is properly catalogued and discoverable.