Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Schema for Statistics (statistics.json file)

Focus mode
Schema for Statistics (statistics.json file) - Amazon SageMaker AI

The schema defined in the statistics.json file specifies the statistical parameters to be calculated for the baseline and data that is captured. It also configures the bucket to be used by KLL, a very compact quantiles sketch with lazy compaction scheme.

{ "version": 0, # dataset level stats "dataset": { "item_count": number }, # feature level stats "features": [ { "name": "feature-name", "inferred_type": "Fractional" | "Integral", "numerical_statistics": { "common": { "num_present": number, "num_missing": number }, "mean": number, "sum": number, "std_dev": number, "min": number, "max": number, "distribution": { "kll": { "buckets": [ { "lower_bound": number, "upper_bound": number, "count": number } ], "sketch": { "parameters": { "c": number, "k": number }, "data": [ [ num, num, num, num ], [ num, num ][ num, num ] ] }#sketch }#KLL }#distribution }#num_stats }, { "name": "feature-name", "inferred_type": "String", "string_statistics": { "common": { "num_present": number, "num_missing": number }, "distinct_count": number, "distribution": { "categorical": { "buckets": [ { "value": "string", "count": number } ] } } }, #provision for custom stats } ] }
Notes
  • The specified metrics are recognized by SageMaker AI in later visualization changes. The container can emit more metrics if required.

  • KLL sketch is the recognized sketch. Custom containers can write their own representation, but it won’t be recognized by SageMaker AI in visualizations.

  • By default, the distribution is materialized in 10 buckets. You can't change this.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.