Data quality checks

Data quality is an integral yet often overlooked part of the data cleaning process. The following diagram shows how data quality checks fit into the data engineering automation and access control lifecycle.

Data quality diagram

The following table provides an overview of different data quality solutions based on use case. Code sketches for several of these solutions follow the table.

| Use case | Solution | Example |
| --- | --- | --- |
| No-code solution to add column-level or table-level quality conditions | AWS Glue DataBrew | Checks whether all column values are between 1 and 12, or whether a table or column is empty |
| Custom code added to an AWS Glue job, or a no-code solution (in preview), to add column-level or table-level quality conditions | AWS Glue Data Quality | Checks whether the column first_name is not null, whether the column phone_number contains only numbers or a "+" symbol, or evaluates statistical functions, such as average or sum |
| Custom checks | ETL service of choice, such as AWS Lambda, AWS Glue, or Amazon EMR | Checks whether the value of column A is always greater than the corresponding values of columns B and C, or whether the value of the continent column is geographically consistent with the value of the city column |
| Sophisticated solution with a metrics report, constraint validation, and constraint suggestions | Deequ | Checks whether the CompletenessConstraint for the Completeness metric of the review_id column is equal to 1 |
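For the AWS Glue Data Quality use case, rules are written in the Data Quality Definition Language (DQDL). The following is a minimal sketch that registers a ruleset against an AWS Glue Data Catalog table by using the boto3 Glue client; the ruleset, database, and table names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# DQDL rules that mirror the table examples: first_name must not be null,
# and all values in the month column must fall between 1 and 12.
ruleset = """
Rules = [
    IsComplete "first_name",
    ColumnValues "month" between 1 and 12
]
"""

glue.create_data_quality_ruleset(
    Name="example-ruleset",                 # hypothetical ruleset name
    Ruleset=ruleset,
    TargetTable={
        "DatabaseName": "example_db",       # hypothetical Data Catalog database
        "TableName": "example_table",       # hypothetical Data Catalog table
    },
)
```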
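For custom checks, you can implement the rule directly in the ETL code. The following PySpark sketch, which could run in an AWS Glue job or on Amazon EMR, checks whether column a is always greater than columns b and c; the input path and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input dataset that contains columns a, b, and c.
df = spark.read.parquet("s3://example-bucket/orders/")

# Rows that violate the rule: a must be greater than both b and c.
violations = df.filter(~((F.col("a") > F.col("b")) & (F.col("a") > F.col("c"))))

violation_count = violations.count()
if violation_count > 0:
    # Fail the job so that downstream steps do not consume bad data.
    raise ValueError(f"{violation_count} rows violate the rule a > b and a > c")
```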
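For the Deequ use case, the completeness check on review_id can be expressed as a verification suite. The following sketch uses PyDeequ, the Python wrapper for Deequ, and assumes a hypothetical S3 input path.

```python
from pyspark.sql import SparkSession

import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

# Deequ runs on Apache Spark, so attach the Deequ package to the session.
spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

# Hypothetical reviews dataset that contains a review_id column.
df = spark.read.parquet("s3://example-bucket/reviews/")

# The check fails unless the completeness of review_id is exactly 1.0.
check = (Check(spark, CheckLevel.Error, "Review data quality")
         .hasCompleteness("review_id", lambda completeness: completeness == 1.0))

result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(check)
          .run())

# Inspect the per-constraint results, including the CompletenessConstraint.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```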