
PoC1 Data Quality Checks and Policies

1. Checks before landing data in the Landing Zone (Source-to-Landing Data Quality Gate)

  1. Verification of the incoming JSON structure against the approved structure definition (TODO: provide a document specifying the structure to check against; it must be approved by the Data Provider).

Policies

If the check fails, the partner receives an error message from the API and the data is not landed. The call status is saved to a log. The content of the erroneous request is stored in an error file in the Landing Zone, in the /<data provider>/failed directory, named according to the convention defined in the Landing Zone Design.
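The gate can be sketched as follows, assuming the approved structure document can be reduced to a mapping of required field names to Python types. `EXAMPLE_SCHEMA`, the function names, the logger name, the HTTP status codes and the random error-file name are hypothetical placeholders: the actual structure is still a TODO above, and the real file naming convention is the one defined in the Landing Zone Design.

```python
import json
import logging
import uuid
from pathlib import Path
from typing import Any

log = logging.getLogger("source_to_landing")  # hypothetical logger name

# Hypothetical schema: required field -> expected Python type.
# The real structure must come from the document approved by the Data Provider.
EXAMPLE_SCHEMA: dict[str, type] = {"userId": int, "status": str, "createdAt": int}


def verify_structure(payload: Any, schema: dict[str, type]) -> list[str]:
    """Return the structural problems found; an empty list means the payload passes."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    for field, expected in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        # Exact type match: json.loads yields plain built-ins, and this
        # also rejects true/false where an int is expected.
        elif type(payload[field]) is not expected:
            problems.append(f"{field}: {expected.__name__} expected, "
                            f"{type(payload[field]).__name__} provided")
    return problems


def land_or_reject(raw_body: str, provider: str, landing_root: Path) -> tuple[int, list[str]]:
    """Return (HTTP status, problems); rejected bodies are written to <provider>/failed."""
    try:
        problems = verify_structure(json.loads(raw_body), EXAMPLE_SCHEMA)
    except json.JSONDecodeError as exc:
        problems = [f"invalid JSON: {exc.msg}"]
    status = 400 if problems else 200
    # The call status is saved to the log in every case.
    log.info("provider=%s status=%d problems=%s", provider, status, problems)
    if problems:
        failed_dir = landing_root / provider / "failed"
        failed_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder file name; the real one follows the Landing Zone Design convention.
        (failed_dir / f"{uuid.uuid4().hex}.json").write_text(raw_body)
    # Landing of valid payloads is outside this sketch.
    return status, problems
```

Returning the problems alongside the status lets the API layer put them into the error message the partner receives.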


2. Checks before the Bronze layer (Landing-to-Bronze Data Quality Gate)

  1. Validation of data types against the data contract
  2. Validation of attribute values against reference tables
    • consent_status.status against consent_status_names.status_name
    • insulin.insulinType against insulin_types.insulin_type_name
    • medication.medicationType against medication_types.medication_type_name
    • user_profile.diabetesType against diabetes_types.diabetes_type_name
    • workout.type against workout_types.workout_type_name
  3. Primary key uniqueness
    • consent_status: userId, sys_received_at
    • walking: userId, date, sys_received_at
    • workout: id, sys_received_at
    • meal: id, sys_received_at
    • user_profile: userId, sys_received_at
    • smart_pen: id, sys_received_at
    • medication: id, sys_received_at
    • insulin: id, sys_received_at
    • smart_pen_error_events: id, sys_received_at
  4. Business key uniqueness
    • consent_status: userId, createdAt
    • workout: userId, type, startedAt
    • meal: userId, startedAt
    • smart_pen: userId, deviceInstance_id, serialNumber
    • medication: userId, medication_type, intakeAt
    • insulin: userId, insulinType, injectedAt
    • smart_pen_error_events: userId, smartPenId, createAt
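The key uniqueness checks reduce to grouping the records by the key columns and flagging every record whose key occurs more than once, or whose key already exists among previously loaded records. A minimal sketch over Python dictionaries, using a subset of the key definitions above; `PRIMARY_KEYS`, `BUSINESS_KEYS`, `find_duplicate_keys` and the `existing_keys` parameter are hypothetical, and a production job would more likely express the same grouping in its processing engine.

```python
from collections import defaultdict
from typing import Any, Iterable

Record = dict[str, Any]

# Subset of the key definitions listed above (entity -> key columns).
PRIMARY_KEYS = {
    "consent_status": ("userId", "sys_received_at"),
    "walking": ("userId", "date", "sys_received_at"),
    "insulin": ("id", "sys_received_at"),
}
BUSINESS_KEYS = {
    "consent_status": ("userId", "createdAt"),
    "insulin": ("userId", "insulinType", "injectedAt"),
}


def find_duplicate_keys(records: list[Record], key_columns: tuple[str, ...],
                        existing_keys: Iterable[tuple] = ()) -> dict[tuple, list[int]]:
    """Map each duplicated key to the indexes of ALL batch records that carry it.

    A key counts as duplicated if it occurs more than once in the batch or is
    already present in existing_keys (keys loaded to Bronze by earlier runs).
    """
    positions: dict[tuple, list[int]] = defaultdict(list)
    for i, record in enumerate(records):
        positions[tuple(record.get(col) for col in key_columns)].append(i)
    existing = set(existing_keys)
    return {key: idx for key, idx in positions.items()
            if len(idx) > 1 or key in existing}
```

Returning every index that shares a key, rather than only the later occurrences, matches the policy of flagging all records with duplicated keys.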

Policies

If any of the checks fails, a message is recorded in the sys_dq_messages attribute of the record (for checks 3 and 4, in every record that shares the duplicated key). The message templates are:
  1. Type violation: {Field} - {ExpectedType} expected, {ProvidedType} provided
    • Example: "Type violation: total_step - int expected, str provided"
  2. Unexpected {Field} value: {Value}
    • Example: "Unexpected status value: cancelled"
  3. Primary key duplication - the record with PK {...PK=values} already exists.
    • Example: "Primary key duplication - the record with PK (userId=13, sys_received_at=1747997372) already exists."
  4. Business key duplication - the record with business key {...BK=values} already exists.
    • Example: "Business key duplication - the record with business key (userId=13, deviceInstanceId=31, serialNumber=13_31) already exists."

The same message is recorded in the job log.
The sys_dq_status attribute of the record is set to warning.
The job is assigned the "Ended with warning" status, and an email listing the issues is sent to the Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).
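A sketch of how this policy could be wired up, assuming records are Python dictionaries, expected types come from the data contract as Python types, and reference tables are loaded into sets of allowed values. The helpers reproduce the four message templates; `apply_policy` writes the findings to sys_dq_messages, sets sys_dq_status, logs each message and returns the job status. All function names, the list representation of sys_dq_messages, the `notify` callback standing in for the support-team email, and the "Ended successfully" status are hypothetical.

```python
import logging
from typing import Any, Callable, Sequence

log = logging.getLogger("landing_to_bronze")  # hypothetical job logger name
Record = dict[str, Any]


# Message templates from the policy above.
def type_violation(field: str, expected: type, value: Any) -> str:
    return (f"Type violation: {field} - {expected.__name__} expected, "
            f"{type(value).__name__} provided")


def unexpected_value(field: str, value: Any) -> str:
    return f"Unexpected {field} value: {value}"


def key_duplication(kind: str, columns: Sequence[str], values: Sequence[Any]) -> str:
    """kind is "PK" or "business key"; the key is formatted as (col=value, ...)."""
    key = ", ".join(f"{c}={v}" for c, v in zip(columns, values))
    label = "Primary" if kind == "PK" else "Business"
    return f"{label} key duplication - the record with {kind} ({key}) already exists."


def check_types_and_values(record: Record, types: dict[str, type],
                           allowed: dict[str, set]) -> list[str]:
    """Checks 1 and 2 for one record; missing or null fields are skipped (assumption)."""
    messages = []
    for field, expected in types.items():
        value = record.get(field)
        if value is not None and type(value) is not expected:
            messages.append(type_violation(field, expected, value))
    for field, values in allowed.items():
        if field in record and record[field] not in values:
            messages.append(unexpected_value(field, record[field]))
    return messages


def apply_policy(records: list[Record], findings: dict[int, list[str]],
                 notify: Callable[[list[str]], None]) -> str:
    """Attach findings (record index -> messages) to records and return the job status."""
    issues = []
    for i, record in enumerate(records):
        messages = findings.get(i, [])
        record["sys_dq_messages"] = list(messages)
        record["sys_dq_status"] = "warning" if messages else "ok"
        for message in messages:
            log.warning(message)  # the same message goes to the job log
        issues.extend(messages)
    if issues:
        notify(issues)  # stands in for the email to the support team
        return "Ended with warning"
    return "Ended successfully"  # hypothetical name for a clean run
```

The exact type comparison is deliberately strict: a JSON integer supplied for a float field would also be reported as a type violation.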


3. Checks before the Silver layer (Bronze-to-Silver Data Quality Gate)

  1. The sys_dq_status attribute of the record must be ok (the record passed the Landing-to-Bronze gate).
  2. Foreign key matching (reference check): every foreign key must have a matching key in the referenced table.
    • consent_status.user_id <- user_profile.user_id
    • consent_status.consent_status_name_id <- common.consent_status_names.consent_status_name_id
    • walking.user_id <- user_profile.user_id
    • workout.user_id <- user_profile.user_id
    • workout.workout_type_id <- common.workout_types.workout_type_id
    • meal.user_id <- user_profile.user_id
    • user_profile.diabetes_type_id <- common.diabetes_types.diabetes_type_id
    • smart_pen.user_id <- user_profile.user_id
    • medication.user_id <- user_profile.user_id
    • medication.medication_type_id <- common.medication_types.medication_type_id
    • insulin.user_id <- user_profile.user_id
    • insulin.smart_pen_id <- smart_pen.smart_pen_id
    • insulin.insulin_type_id <- common.insulin_types.insulin_type_id
    • smart_pen_error_events.user_id <- user_profile.user_id
    • smart_pen_error_events.smart_pen_id <- smart_pen.smart_pen_id
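The reference check is an anti-join: every foreign-key value in the child table must occur among the key values of the referenced table. A minimal sketch over in-memory rows, using a few of the relationships listed above; `FOREIGN_KEYS` and `reference_failures` are hypothetical names, and in the actual pipeline the referenced keys would be read from the corresponding tables. Null foreign keys are skipped here, which is an assumption the data contract should confirm.

```python
from typing import Any

Row = dict[str, Any]

# (child table, FK column) -> (referenced table, referenced key column); subset of the list above.
FOREIGN_KEYS = {
    ("consent_status", "user_id"): ("user_profile", "user_id"),
    ("insulin", "user_id"): ("user_profile", "user_id"),
    ("insulin", "smart_pen_id"): ("smart_pen", "smart_pen_id"),
    ("insulin", "insulin_type_id"): ("common.insulin_types", "insulin_type_id"),
}


def reference_failures(table: str, rows: list[Row],
                       tables: dict[str, list[Row]]) -> dict[int, list[str]]:
    """Return row index -> "Reference failure" messages for one child table."""
    failures: dict[int, list[str]] = {}
    for (child, fk), (parent, pk) in FOREIGN_KEYS.items():
        if child != table:
            continue
        known = {r[pk] for r in tables.get(parent, [])}  # keys present in the parent
        for i, row in enumerate(rows):
            value = row.get(fk)
            if value is not None and value not in known:
                failures.setdefault(i, []).append(
                    f"Reference failure: {fk}={value} has no match in {parent} table")
    return failures
```

Building the parent key set once per relationship keeps the check linear in the number of rows.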

Policies

  1. If check 1 fails, the record is not saved to Silver. The job is assigned the "Ended with errors" status, and an email listing the issues is sent to the Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).
  2. If check 2 fails, a message is recorded in the sys_dq_messages attribute of the record: Reference failure: {FK}={value} has no match in {table_name} table. Example: "Reference failure: user_id=13 has no match in user_profile table"
    • The sys_dq_status attribute of the record is set to warning.
    • If there are no other errors, the job is assigned the "Ended with warning" status, and an email listing the issues is sent to the Data Product support team (nlmm@novonordisk.com, nlmh@novonordisk.com, osyj@novonordisk.com).
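The two policies can be combined into a single routing step. This sketch reads policy 1 as applying to check 1 and policy 2 to check 2; the document does not say whether records with reference failures still reach Silver, so the sketch keeps them, and that choice is an assumption. `route_to_silver`, its `fk_messages` parameter (row index -> reference-failure messages) and the "Ended successfully" status are hypothetical, and the email notification is left out.

```python
from typing import Any

Row = dict[str, Any]


def route_to_silver(rows: list[Row], fk_messages: dict[int, list[str]]) -> tuple[list[Row], str]:
    """Return (rows to write to Silver, job status) for the Bronze-to-Silver gate."""
    to_silver: list[Row] = []
    errors = warnings = False
    for i, row in enumerate(rows):
        if row.get("sys_dq_status") != "ok":
            errors = True  # check 1 failed: the record is not saved to Silver
            continue
        messages = fk_messages.get(i, [])
        if messages:
            warnings = True  # check 2 failed: the record is flagged as warning
            row = {**row,
                   "sys_dq_messages": list(row.get("sys_dq_messages", [])) + messages,
                   "sys_dq_status": "warning"}
        to_silver.append(row)  # assumption: flagged records still reach Silver
    if errors:
        return to_silver, "Ended with errors"
    if warnings:
        return to_silver, "Ended with warning"
    return to_silver, "Ended successfully"  # hypothetical status name
```

Errors take precedence over warnings, which mirrors the "if there are no other errors" condition in policy 2.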