AWS::DataBrew::Ruleset Rule
Represents a single data quality requirement that should be validated in the scope of this dataset.
Syntax
To declare this entity in your AWS CloudFormation template, use the following syntax:
JSON
{
  "CheckExpression" : String,
  "ColumnSelectors" : [ ColumnSelector, ... ],
  "Disabled" : Boolean,
  "Name" : String,
  "SubstitutionMap" : [ SubstitutionValue, ... ],
  "Threshold" : Threshold
}
YAML
CheckExpression: String
ColumnSelectors:
  - ColumnSelector
Disabled: Boolean
Name: String
SubstitutionMap:
  - SubstitutionValue
Threshold: Threshold
Properties
CheckExpression
An expression that includes column references and condition names followed by variable references, possibly grouped and combined with other conditions. For example:
(:col1 starts_with :prefix1 or :col1 starts_with :prefix2) and (:col1 ends_with :suffix1 or :col1 ends_with :suffix2)
Column and value references are substitution variables that should start with the ':' symbol. Depending on the context, a substitution variable's value can be either an actual value or a column name. These values are defined in the SubstitutionMap. If a CheckExpression starts with a column reference, then ColumnSelectors in the rule should be null. If ColumnSelectors has been defined, then there should be no column reference on the left side of a condition, for example:
is_between :val1 and :val2
Required: Yes
Type: String
Pattern: ^[><0-9A-Za-z_.,:)(!= ]+$
Minimum: 4
Maximum: 1024
Update requires: No interruption
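As a rough illustration of the description above, a rule whose expression starts with a column reference might look like the following sketch. The rule name, the column name, and the numeric bounds are placeholders rather than values from this reference, and the is_between condition is taken from the example in the text. ColumnSelectors is omitted because the expression already names a column. The ValueReference and Value keys are assumed from the SubstitutionValue property type.

```yaml
# Hypothetical rule: the name, column, and bounds are illustrative only.
Name: PriceWithinRange
CheckExpression: ":col1 is_between :val1 and :val2"
SubstitutionMap:
  - ValueReference: ":col1"
    Value: "`price`"   # backticks mark this value as a column name
  - ValueReference: ":val1"
    Value: "0"
  - ValueReference: ":val2"
    Value: "1000"
```

The strings are quoted because a YAML plain scalar cannot begin with a backtick, and quoting the ':' prefixed names avoids any ambiguity with mapping syntax.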
ColumnSelectors
A list of column selectors. Selectors can be used to select columns from the dataset by name or by regular expression. The rule is applied to the selected columns.
Required: No
Type: Array of ColumnSelector
Minimum: 1
Update requires: No interruption
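A minimal sketch of a rule that relies on ColumnSelectors is shown here. Because the selectors choose the columns, the check expression has no column reference on its left side. The selector values, rule name, and bounds are hypothetical, and the Name and Regex keys are assumed from the ColumnSelector property type.

```yaml
# Hypothetical rule: selector values and bounds are illustrative only.
Name: ScoresWithinRange
ColumnSelectors:
  - Name: "score"        # select one column by its exact name
  - Regex: "^score_.*"   # select every column whose name matches the pattern
CheckExpression: "is_between :val1 and :val2"
SubstitutionMap:
  - ValueReference: ":val1"
    Value: "0"
  - ValueReference: ":val2"
    Value: "100"
```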
Disabled
A value that specifies whether the rule is disabled. Once a rule is disabled, a profile job will not validate it during a job run. Default value is false.
Required: No
Type: Boolean
Update requires: No interruption
Name
The name of the rule.
Required: Yes
Type: String
Minimum: 1
Maximum: 128
Update requires: No interruption
SubstitutionMap
The map of substitution variable names to the values they take in a check expression. Variable names should start with a ':' (colon). Variable values can be either actual values or column names. To differentiate between the two, column names should be enclosed in backticks, for example, ":col1": "`Column A`".
Required: No
Type: Array of SubstitutionValue
Update requires: No interruption
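In a template, the mapping described above is written as a list of SubstitutionValue entries. The fragment below is a sketch that assumes the ValueReference and Value keys of that property type; the column name comes from the example in the text, and the numeric value is a placeholder.

```yaml
SubstitutionMap:
  - ValueReference: ":col1"
    Value: "`Column A`"   # column name, enclosed in backticks
  - ValueReference: ":val1"
    Value: "10"           # actual value, no backticks
```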
Threshold
The threshold used with a non-aggregate check expression. Non-aggregate check expressions will be applied to each row in a specific column, and the threshold will be used to determine whether the validation succeeds.
Required: No
Type: Threshold
Update requires: No interruption
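As a sketch of how a threshold might be attached to a non-aggregate rule, the fragment below assumes the Type, Unit, and Value keys of the Threshold property type. Read this way, the rule would presumably succeed when at least 90 percent of the rows in the referenced column satisfy the expression. The rule name, column name, and all numbers are hypothetical.

```yaml
# Hypothetical rule: the name, column, and numbers are illustrative only.
Name: MostlyWithinRange
CheckExpression: ":col1 is_between :val1 and :val2"
SubstitutionMap:
  - ValueReference: ":col1"
    Value: "`price`"
  - ValueReference: ":val1"
    Value: "0"
  - ValueReference: ":val2"
    Value: "1000"
Threshold:
  Type: GREATER_THAN_OR_EQUAL   # how the share of passing rows is compared to Value
  Unit: PERCENTAGE              # Value is a percentage of rows, not a row count
  Value: 90
```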