Data Format Compatibility Guide

This guide describes the data format types that are compatible with SageMaker Clarify processing jobs. The supported data format types include the file extensions, data structure, and specific requirements or restrictions for tabular, image, and time series datasets. This guide also shows how to check if your dataset conforms to these requirements.

At a high level, the SageMaker Clarify processing job follows the input–process–output model to compute bias metrics and feature attributions. The following paragraphs describe each stage.

The input to the SageMaker Clarify processing job consists of the following:

  • The dataset to be analyzed.

  • The analysis configuration. For more information about how to configure an analysis, see Analysis Configuration Files.

During the processing stage, SageMaker Clarify computes bias metrics and feature attributions. The SageMaker Clarify processing job completes the following steps in the backend:

  • The SageMaker Clarify processing job parses your analysis configuration and loads your dataset.

  • To compute post-training bias metrics and feature attributions, the job requires model predictions from your model. The SageMaker Clarify processing job serializes your data and sends it as a request to your model that is deployed on a SageMaker AI real-time inference endpoint. After that, the SageMaker Clarify processing job extracts predictions from the response.

  • The SageMaker Clarify processing job performs the bias and explainability analysis, and then it outputs the results.

For more information, see How SageMaker Clarify Processing Jobs Work.
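
The request and response handling described in the steps above can be illustrated with a minimal sketch. It is not the SageMaker Clarify implementation: `fake_endpoint` is a hypothetical stand-in for a model deployed on a real-time inference endpoint, and the helper names, the `score` key, and the sample rows are assumptions made for illustration only. The sketch serializes rows as a CSV request and extracts predictions from a JSON Lines response.

```python
import csv
import io
import json


def serialize_csv(rows):
    """Serialize feature rows into a headerless CSV request payload."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def fake_endpoint(payload):
    """Hypothetical stand-in for an endpoint (assumption, not a real API).

    Returns one JSON Lines record per input row; the "score" is simply
    the mean of the row's features.
    """
    lines = []
    for row in csv.reader(io.StringIO(payload)):
        score = sum(float(v) for v in row) / len(row)
        lines.append(json.dumps({"score": score}))
    return "\n".join(lines)


def extract_predictions(response, key="score"):
    """Pull one prediction out of each JSON Lines record in the response."""
    return [json.loads(line)[key] for line in response.splitlines() if line.strip()]


rows = [[1.0, 3.0], [2.0, 6.0]]            # illustrative dataset rows
payload = serialize_csv(rows)              # request body, CSV format
response = fake_endpoint(payload)          # response body, JSON Lines format
predictions = extract_predictions(response)
print(predictions)
```

In a real job, the bias and explainability analysis would then run on these predictions together with the dataset.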

The parameter that you use to specify the data format depends on where the data is used in the processing flow, as follows:

  • For an input dataset, use the dataset_type parameter to specify the format or MIME type.

  • For a request to an endpoint, use the content_type parameter to specify the format.

  • For a response from an endpoint, use the accept_type parameter to specify the format.

The input dataset, the request to the endpoint, and the response from the endpoint don't need to use the same format. For example, you can use a Parquet dataset with a CSV request payload and a JSON Lines response payload, provided that the following conditions are met:

  • Your analysis is configured correctly.

  • Your model supports the request and response formats.
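
As a rough sketch of the mixed-format example above, the three parameters might appear together in an analysis configuration as follows. The nesting of `content_type` and `accept_type` under a `predictor` section, the MIME type strings, and the `model_name` value are assumptions that are not stated in this guide; check them against Analysis Configuration Files before use.

```python
import json

# Minimal, incomplete analysis configuration fragment (assumed layout).
analysis_config = {
    # Format of the input dataset: Parquet.
    "dataset_type": "application/x-parquet",
    "predictor": {
        "model_name": "my-model",                 # hypothetical model name
        # Format of the request sent to the endpoint: CSV.
        "content_type": "text/csv",
        # Format of the response expected from the endpoint: JSON Lines.
        "accept_type": "application/jsonlines",
    },
}

config_json = json.dumps(analysis_config, indent=2)
print(config_json)
```

A complete configuration needs additional fields, such as the label and the analysis methods, which are omitted here.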

Note

If content_type or accept_type is not provided, the SageMaker Clarify container infers the missing value.
