Schema for Constraints (constraints.json file)
A constraints.json file expresses the constraints that a dataset must satisfy. Amazon SageMaker Model Monitor containers can use the constraints.json file to evaluate datasets. Prebuilt containers can generate the constraints.json file automatically from a baseline dataset. If you bring your own container, you can give it similar abilities or create the constraints.json file in some other way. The following is the schema for the constraints file that the prebuilt container uses. Containers that you bring yourself can adopt the same format or extend it as required.
```
{
    "version": 0,
    "features": [
        {
            "name": "string",
            "inferred_type": "Integral" | "Fractional" | "String" | "Unknown",
            "completeness": number,
            "num_constraints": {
                "is_non_negative": boolean
            },
            "string_constraints": {
                "domains": [
                    "list of",
                    "observed values",
                    "for small cardinality"
                ]
            },
            "monitoringConfigOverrides": {}
        }
    ],
    "monitoring_config": {
        "evaluate_constraints": "Enabled",
        "emit_metrics": "Enabled",
        "datatype_check_threshold": 0.1,
        "domain_content_threshold": 0.1,
        "distribution_constraints": {
            "perform_comparison": "Enabled",
            "comparison_threshold": 0.1,
            "comparison_method": "Simple" | "Robust",
            "categorical_comparison_threshold": 0.1,
            "categorical_drift_method": "LInfinity" | "ChiSquared"
        }
    }
}
```
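As an illustration of the schema, the following minimal Python sketch assembles a constraints document by hand and serializes it. The helper `make_feature`, the feature names `age` and `state`, and the sample values are hypothetical and are not part of Model Monitor. The `monitoring_config` values are the ones shown in the schema, with one of the listed alternatives picked where the schema offers a choice.

```python
import json

# Values the schema lists for "inferred_type".
INFERRED_TYPES = {"Integral", "Fractional", "String", "Unknown"}


def make_feature(name, inferred_type, completeness,
                 is_non_negative=None, domains=None):
    """Build one entry of the "features" list (hypothetical helper)."""
    if inferred_type not in INFERRED_TYPES:
        raise ValueError(f"unsupported inferred_type: {inferred_type!r}")
    feature = {
        "name": name,
        "inferred_type": inferred_type,
        "completeness": completeness,
    }
    # Only attach the constraint blocks that apply to this feature.
    if is_non_negative is not None:
        feature["num_constraints"] = {"is_non_negative": is_non_negative}
    if domains is not None:
        feature["string_constraints"] = {"domains": list(domains)}
    return feature


constraints = {
    "version": 0,
    "features": [
        # Hypothetical features, for illustration only.
        make_feature("age", "Integral", 1.0, is_non_negative=True),
        make_feature("state", "String", 1.0, domains=["CA", "NY", "WA"]),
    ],
    "monitoring_config": {
        "evaluate_constraints": "Enabled",
        "emit_metrics": "Enabled",
        "datatype_check_threshold": 0.1,
        "domain_content_threshold": 0.1,
        "distribution_constraints": {
            "perform_comparison": "Enabled",
            "comparison_threshold": 0.1,
            "comparison_method": "Robust",
            "categorical_comparison_threshold": 0.1,
            "categorical_drift_method": "LInfinity",
        },
    },
}

# Serialize; the resulting text could be saved as constraints.json.
constraints_text = json.dumps(constraints, indent=2)
```

Building the document from a dictionary and serializing it with `json.dumps` keeps the output valid JSON, unlike the schema listing above, which uses `|` to show alternative values.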
The monitoring_config object contains options for the monitoring job. The following table describes each option.
Monitoring Constraints
Constraint | Description |
---|---|
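evaluate_constraints | When Enabled, evaluates whether the current dataset being analyzed satisfies the constraints specified in the constraints.json file taken as a baseline. Valid values: Enabled or Disabled. Default: Enabled |
emit_metrics | When Enabled, emits CloudWatch metrics for the data contained in the file. Valid values: Enabled or Disabled. Default: Enabled |
datatype_check_threshold | If the mismatch in data types between the current dataset and the baseline exceeds the specified datatype_check_threshold, the failure is treated as a violation in the violation report. During the baseline step, the generated constraints suggest the inferred data type for each column. The datatype_check_threshold parameter can be tuned to adjust when a mismatch is flagged as a violation. Valid values: float. Default: 0.1 |
domain_content_threshold | If there are more unknown values for a String field in the current dataset than in the baseline dataset, this threshold can be used to dictate whether it needs to be flagged as a violation. Valid values: float. Default: 0.1 |
distribution_constraints | Groups the distribution comparison options described in the following rows. |
perform_comparison | When Enabled, instructs the monitoring job to compare the baseline distribution with the distribution observed for the current dataset. Valid values: Enabled or Disabled. Default: Enabled |
comparison_threshold | If the distance between the baseline and current distributions exceeds the value set for comparison_threshold, the failure is treated as a violation in the violation report. The distance is the maximum absolute difference between the cumulative distribution functions of the two distributions. Valid values: float. Default: 0.1 |
comparison_method | Whether to calculate the distance with the simple or the robust method. The simple method uses the maximum absolute difference between the cumulative distribution functions of the two distributions. The robust method builds on the simple method for cases where there are not enough samples, and is based on the two-sample Kolmogorov-Smirnov test. Valid values: Simple or Robust |
categorical_comparison_threshold | Optional. Sets a threshold for categorical features. If the value in the dataset exceeds the threshold that you set, a violation is recorded in the violation report. Valid values: float. Default: The value assigned to the comparison_threshold parameter |
categorical_drift_method | Optional. For categorical features, specifies the computation method used to detect distribution drift. If you don't set this parameter, the K-S (LInfinity) test is used. Valid values: LInfinity or ChiSquared. Default: LInfinity |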
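
To make the categorical threshold options more concrete, the following Python sketch shows one way a custom container might compare a categorical feature against its baseline. It makes two assumptions: the L-infinity distance is taken as the largest absolute difference between per-category proportions, and the categorical threshold falls back to comparison_threshold when it is not set. The functions `linf_distance` and `is_violation` are hypothetical, and the prebuilt container's exact computation may differ.

```python
from collections import Counter


def linf_distance(baseline_values, current_values):
    """Largest absolute gap between per-category proportions.

    Assumption: this is one common definition of an L-infinity
    distance for categorical data, not the container's exact formula.
    """
    if not baseline_values or not current_values:
        raise ValueError("both datasets must contain at least one value")
    base_counts = Counter(baseline_values)
    curr_counts = Counter(current_values)
    base_total = len(baseline_values)
    curr_total = len(current_values)
    # Categories missing from one side count as proportion 0 there.
    categories = set(base_counts) | set(curr_counts)
    return max(
        abs(base_counts[c] / base_total - curr_counts[c] / curr_total)
        for c in categories
    )


def is_violation(distance, distribution_constraints):
    """Compare a distance with the configured categorical threshold.

    Assumption: fall back to comparison_threshold, then to 0.1,
    when categorical_comparison_threshold is not set.
    """
    threshold = distribution_constraints.get(
        "categorical_comparison_threshold",
        distribution_constraints.get("comparison_threshold", 0.1),
    )
    return distance > threshold


# Hypothetical data: category "a" grows from 50% to 70% of the values.
baseline = ["a"] * 50 + ["b"] * 50
current = ["a"] * 70 + ["b"] * 30

distance = linf_distance(baseline, current)
flagged = is_violation(distance, {"comparison_threshold": 0.1})
```

In this example the proportions shift by about 0.2, which exceeds the threshold of 0.1, so the feature would be flagged. Raising categorical_comparison_threshold above the observed distance would suppress the violation for categorical features without changing the threshold used for numerical ones.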