DistinctValuesCount - AWS Glue

DistinctValuesCount

Checks the number of distinct values in a column against a given expression.

Syntax

DistinctValuesCount <COL_NAME> <EXPRESSION>
  • COL_NAME – The name of the column that you want to evaluate the data quality rule against.

    Supported column types: Any column type

  • EXPRESSION – An expression to run against the rule type response in order to produce a Boolean value. For more information, see Expressions.

Example: Distinct column value count

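The following example rules check that the column named State contains more than 3 distinct values, and that the rows where Customer_ID is less than 10 contain fewer than 6 distinct Customer_ID values.

DistinctValuesCount "State" > 3
DistinctValuesCount "Customer_ID" < 6 where "Customer_ID < 10"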
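
To make the semantics of these rules concrete, the following is a minimal Python sketch of the kind of check they describe, run against a small in-memory dataset. It is an illustration only, not how AWS Glue evaluates the rules. The helper distinct_values_count, the sample rows, and the decision to skip null (None) entries are all assumptions made for this sketch; the text above does not state how nulls are counted.

```python
def distinct_values_count(rows, column, where=None):
    """Count distinct values of `column` over the rows that pass `where`.

    Hypothetical helper for illustration; `where` is an optional predicate
    standing in for the rule's where clause.
    """
    selected = rows if where is None else [r for r in rows if where(r)]
    # Assumption: None (null) entries are skipped when counting.
    return len({r[column] for r in selected if r[column] is not None})


# Made-up sample data.
rows = [
    {"Customer_ID": 1, "State": "WA"},
    {"Customer_ID": 2, "State": "OR"},
    {"Customer_ID": 3, "State": "CA"},
    {"Customer_ID": 4, "State": "NY"},
    {"Customer_ID": 12, "State": "WA"},
    {"Customer_ID": 15, "State": "TX"},
]

# DistinctValuesCount "State" > 3
state_rule_passes = distinct_values_count(rows, "State") > 3

# DistinctValuesCount "Customer_ID" < 6 where "Customer_ID < 10"
customer_rule_passes = distinct_values_count(
    rows, "Customer_ID", where=lambda r: r["Customer_ID"] < 10
) < 6

print(state_rule_passes, customer_rule_passes)
```

In this sample, State has five distinct values and the filtered rows contain four distinct Customer_ID values, so both checks pass.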

Sample dynamic rules

  • DistinctValuesCount "colA" between avg(last(10))-1 and avg(last(10))+1

  • DistinctValuesCount "colA" <= index(last(10),2) + std(last(5))
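
The first dynamic rule above compares the current distinct count with the average of the counts from recent runs. The following Python sketch illustrates that comparison under stated assumptions: the history list is made-up data standing in for results the service would have recorded from earlier runs, the helpers last and passes_between_avg are hypothetical names, and between is treated as excluding both endpoints, which should be checked against the Expressions reference.

```python
from statistics import mean


def last(history, k):
    """Return up to the k most recent values; `history` is ordered oldest to newest."""
    return history[-k:]


def passes_between_avg(current, history, k=10, margin=1):
    """Sketch of: DistinctValuesCount "colA" between avg(last(k))-margin and avg(last(k))+margin."""
    baseline = mean(last(history, k))
    # Assumption: both bounds are exclusive.
    return baseline - margin < current < baseline + margin


# Made-up distinct counts from ten earlier runs (their average is 50).
history = [48, 50, 52, 50, 49, 51, 50, 50, 49, 51]

print(passes_between_avg(50, history))  # current count equals the recent average
print(passes_between_avg(53, history))  # current count is well above the range
```

A current count of 50 falls inside the range around the average and passes, while a count of 53 falls outside it and fails.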