Available checks
The following table lists references for all available conditions that can be used in your rules. Note that aggregated conditions cannot be combined with non-aggregated conditions in the same rule.
Note
For SDK users, to apply the same rule to multiple columns use the ColumnSelectors
attribute of a Rule and specify validated columns using either their names or a regular expression.
In this case, you should use implicit CheckExpression. For example, “> :val”
to compare values in each of the selected columns
with the provided value. DataBrew uses implicit syntax for defining
FilterExpression in dynamic datasets. If you want to specify column(s)
for each check individually, don't set the ColumnSelectors attribute. Instead, provide an explicit expression. For example,
“:col > :val”
as a CheckExpression in a Rule.
Condition type | Data quality check | Additional parameters | Comparison type | SDK syntax example |
---|---|---|---|---|
Aggregate dataset conditions | Number of rows | Numeric comparison against custom value |
|
|
Number of columns | Numeric comparison against custom value |
| ||
Duplicate rows |
Numeric comparison against custom value |
or
|
||
Aggregate column statistics conditions | Missing values | Numeric comparison against custom value |
or
|
|
Duplicate values | Numeric comparison against custom value |
or
| ||
Valid values |
Numeric comparison against custom value |
or
|
||
Distinct values | Numeric comparison against custom value |
or
| ||
Unique values |
Numeric comparison against custom value |
or
|
||
Outliers |
Z-score threshold | Numeric comparison against custom value |
or
|
|
Value distribution statistics |
Statistics name (see next table) | Numeric comparison against custom value |
or
Note See next table for possible |
|
Numerical statistics |
Statistics name (see next table) | Numeric comparison against custom value |
or
Note See next table for possible |
|
Non aggregate (accepts threshold) | Value is exactly | Exact comparison against a list of values |
|
|
Value is not exactly | Value shouldn't exactly match any value from a list |
| ||
String values | String comparison against custom value or other string column |
or
| ||
Numeric values | Numeric comparison against custom value or other numeric column |
or
| ||
Value string length | Numeric comparison against custom value or other numeric column |
or
|
Numeric comparisons
DataBrew supports the following operations for numeric comparison: Is equals (==), Is not equals (!=), Less than (<), Less than equals (<=), Greater than (>), Greater than equals (>=) and Is between (is_between :val1 and :val2).
String comparisons
The following string comparisons are supported: Starts with, Doesn’t start with, Ends with, Doesn’t end with, Contains, Doesn’t contain, Is equals, Is not equals, Matches, Doesn’t match.
The following table displays available statistics that you can use for Value distribution statistics and Numerical statistics:
Data quality check | Statistics name | Additional parameters | SDK syntax |
---|---|---|---|
Value distribution statistics | Min | "CheckExpression": "AGG(MAX) < :val",
"SubstitutionMap": {":val", "100"}
|
|
Max | "CheckExpression": "AGG(MIN) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Median | "CheckExpression": "AGG(MEDIAN) >= :val",
"SubstitutionMap": {":val", "50"}
|
||
Mean | "CheckExpression": "AGG(MEAN) <= :val",
"SubstitutionMap": {":val", "10"}
|
||
Mode | "CheckExpression": "AGG(MODE) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Standard deviation | "CheckExpression": "AGG(STANDARD_DEVIATION) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Entropy | "CheckExpression": "AGG(ENTROPY) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Numerical statistics | Sum | "CheckExpression": "AGG(SUM) > :val",
"SubstitutionMap": {":val", "0"}
|
|
Kurtosis | "CheckExpression": "AGG(KURTOSIS) > :val",
"SubstitutionMap": {":val", "0"}
| ||
Skewness | "CheckExpression": "AGG(SKEWNESS) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Variance | "CheckExpression": "AGG(VARIANCE) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Absolute deviation | "CheckExpression": "AGG(MEDIAN_ABSOLUTE_DEVIATION) > :val",
"SubstitutionMap": {":val", "0"}
|
||
Quantile | Quantile: one of '0.25', '0.5', '0.75' | "CheckExpression": "AGG(QUANTILE, :pct) > :val",
"SubstitutionMap": {":pct": "0.25", ":val", "0"}
|