PoC1 Data Quality Checks and Policies#
1. Checks before landing data to the Landing Zone(Source-to-Landing Data Quality Gate)#
- Incoming json structure verification against TODO: Provide a document specifying the structure to check against, must be approved by Data Provider
Policies#
If the check fails, partner gets error message from the API, the data is not landed.
Call status is saved to a log.
The content of erroneous request stored in error file in the Landing Zone (to /<data provider>/failed directory according and named by convention defined in Landing Zone Design).
2. Checks before Bronze layer (Landing-to-Bronze Data Quality Gate)#
- Validation of data types against data contract
- Validation of values of the attributes against reference tables
consent_status.statusagainstconsent_status_names.status_nameinsulin.insulinTypeagainstinsulin_types.insulin_type_namemedication.medicationTypeagainstmedication_types.medication_type_nameuser_profile.diabetesTypeagainstdiabetes_types.diabetes_type_nameworkout.typeagainstworkout_types.workout_type_name
- Primary Keys uniqueness
consent_status:userId, sys_received_atwalking:userId, date, sys_received_atworkout:id ,sys_received_atmeal:id, sys_received_atuser_profile:userId, sys_received_atsmart_pen:id, sys_received_atmedication:id, sys_received_atinsulin:id, sys_received_atsmart_pen_error_events:id, sys_received_at- Business Keys uniqueness
consent_status:userId, createdAtworkout:userId, type, startedAtmeal:userId, startedAtsmart_pen:userId, deviceInstance_id, serialNumbermedication:userId, medication_type, intakeAtinsulin:userId, insulinType, injectedAtsmart_pen_error_events:userId, smartPenId, createAt
Policies#
If any of the checks fails, a message is recorded to sys_dq_messages attribute of the record (for checks 3 and 4 - to all the records with duplicated keys):
1. Type violation: {Field} - {Type} expected, {Type} provided
* example: "Type violation: total_step - int expected, str provided"
2. Unexpected {Field} value: {Value}
* example: "Unexpected status value: cancelled"
3. Primary key duplication - the record with PK {...PK=values} already exists.
* example: "Primary key duplication - the record with PK (userId=13, sys_received_at=1747997372) already exists."
4. Business key duplication - the record with business key {...BK=values} already exists.
* example: "Business key duplication - the record with business key (userId=13, deviceInstanceId=31, serialNumber=13_31) already exists."
The same message is recorded to the job log.
sys_dq_status to be written warning value.
The job assigned "Ended with warning" status, an email with issues listed is sent to Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).
3. Checks before Silver layer (Bronze-to-Silver Data Quality Gate)#
sys_dq_statusattribute of the record must beok(Landing-to-Bronze Gate passed)- Foreign keys matching (reference check) - all the foreign keys must have matching key in the table referred.
- consent_status.user_id <- user_profile.user_id
- consent_status.consent_status_name_id <- common.consent_status_names.consent_status_name_id
- walking.user_id <- user_profile.user_id
- workout.user_id <- user_profile.user_id
- workout.workout_type_id <- common.workout_types.workout_type_id
- meal.user_id <- user_profile.user_id
- user_profile.diabetes_type_id <- common.diabetes_types.diabetes_type_id
- smart_pen.user_id <- user_profile.user_id
- medication.user_id <- user_profile.user_id
- medication.medication_type_id <- common.medication_types.medication_type_id
- insulin.user_id <- user_profile.user_id
- insulin.smart_pen_id <- smart_pen.smart_pen_id
- insulin.insulin_type_id <- common.insulin_types.insulin_type_id
- smart_pen_error_events.user_id <- user_profile.user_id
- smart_pen_error_events.smart_pen_id <- smart_pen.smart_pen_id
Policies#
- The record is not being saved to Silver. The job assigned "Ended with errors" status, an email with issues listed is sent to Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).
- A message is recorded to
sys_dq_messagesattribute of the record:Reference failure: {FK}={value} has no match in {table_name} table. Example:Reference failure: user_id=13 has no match in user_profile tablesys_dq_statusto be writtenwarningvalue.- If no other errors, the job assigned "Ended with warning" status, an email with issues listed is sent to Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).