Configure data input mode using the SageMaker Python SDK

The SageMaker Python SDK provides the generic Estimator class and its framework-specific variations for launching training jobs. You can specify a data input mode either when you configure the Estimator class or when you call the Estimator.fit method. The following code templates show both ways to specify the input mode.

To specify the input mode using the Estimator class


from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    checkpoint_s3_uri='s3://amzn-s3-demo-bucket/checkpoint-destination/',
    output_path='s3://amzn-s3-demo-bucket/output-path/',
    base_job_name='job-name',
    input_mode='File',  # Available options: File | Pipe | FastFile
    # ... other Estimator arguments elided
)

# Run the training job
estimator.fit(
    inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data/train")
)

For more information, see the sagemaker.estimator.Estimator class in the SageMaker Python SDK documentation.

To specify the input mode through the estimator.fit() method


from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    checkpoint_s3_uri='s3://amzn-s3-demo-bucket/checkpoint-destination/',
    output_path='s3://amzn-s3-demo-bucket/output-path/',
    base_job_name='job-name',
    # ... other Estimator arguments elided
)

# Run the training job
estimator.fit(
    inputs=TrainingInput(
        s3_data="s3://amzn-s3-demo-bucket/my-data/train",
        input_mode='File'  # Available options: File | Pipe | FastFile
    )
)

For more information, see the sagemaker.estimator.Estimator.fit method and the sagemaker.inputs.TrainingInput class in the SageMaker Python SDK documentation.

Tip

To learn how to configure Amazon FSx for Lustre or Amazon EFS with your VPC configuration using the SageMaker Python SDK estimators, see Use File Systems as Training Inputs in the SageMaker Python SDK documentation.

Tip

The data input modes that integrate with Amazon S3, Amazon EFS, and FSx for Lustre are the recommended way to configure your data source. Using the SageMaker AI managed storage options and input modes can improve data loading performance, but you are not restricted to them. You can also write your own data reading logic directly in your training container. For example, you can read from a different data source, write your own S3 data loader class, or use a third-party framework's data loading functions in your training script. In that case, make sure that you specify paths that SageMaker AI can recognize.
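As a rough illustration of reading data from inside a training script, the following sketch resolves the local directory of an input channel and lists the files in it. It assumes File or FastFile mode, where channel data appears as files under /opt/ml/input/data/<channel>, and it assumes the SM_CHANNEL_<NAME> environment variable is present when the SageMaker training toolkit is installed. The helper names channel_dir and list_data_files are hypothetical and are not part of the SageMaker Python SDK.

```python
import os
from pathlib import Path

# Default root under which input channels appear inside the training container
# (File and FastFile modes). Pipe mode uses named pipes and is not covered here.
DEFAULT_INPUT_ROOT = "/opt/ml/input/data"


def channel_dir(channel="train", environ=None):
    """Return the local directory for an input channel (hypothetical helper).

    Prefers the SM_CHANNEL_<NAME> environment variable and falls back to
    the default path under /opt/ml/input/data when the variable is not set.
    """
    env = os.environ if environ is None else environ
    path = env.get(f"SM_CHANNEL_{channel.upper()}")
    if path is None:
        path = f"{DEFAULT_INPUT_ROOT}/{channel}"
    return Path(path)


def list_data_files(channel="train", environ=None):
    """Return a sorted list of all regular files under the channel directory."""
    root = channel_dir(channel, environ)
    return sorted(p for p in root.rglob("*") if p.is_file())
```

A training script could call list_data_files("train") and pass the resulting paths to whichever data loader it uses. The environ parameter exists only so that the lookup can be exercised outside a container.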

Tip

If you use a custom training container, make sure that you install the SageMaker training toolkit, which helps set up the environment for SageMaker training jobs. Otherwise, you must specify the environment variables explicitly in your Dockerfile. For more information, see Create a container with your own algorithms and models.

For more information about how to set the data input modes using the low-level SageMaker APIs, see How Amazon SageMaker AI Provides Training Information, the CreateTrainingJob API, and the TrainingInputMode in AlgorithmSpecification.
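For orientation, the following sketch builds only the part of a CreateTrainingJob request that carries the input mode: TrainingInputMode in AlgorithmSpecification for the job-level setting, and the optional InputMode field on a channel in InputDataConfig. It is a partial fragment under stated assumptions, not a complete request. The other fields that the API needs are omitted, the function name build_input_mode_fragment is hypothetical, and the image URI in the usage note is a placeholder.

```python
VALID_INPUT_MODES = ("File", "Pipe", "FastFile")


def build_input_mode_fragment(training_image, s3_uri, job_mode="File",
                              channel_name="train", channel_mode=None):
    """Build the input-mode portion of a CreateTrainingJob request.

    Hypothetical helper: returns a dict containing only
    AlgorithmSpecification and InputDataConfig.
    """
    for mode in (job_mode, channel_mode):
        if mode is not None and mode not in VALID_INPUT_MODES:
            raise ValueError(f"Unsupported input mode: {mode!r}")

    channel = {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    # Set a per-channel mode only when one is requested explicitly.
    if channel_mode is not None:
        channel["InputMode"] = channel_mode

    return {
        "AlgorithmSpecification": {
            "TrainingImage": training_image,
            "TrainingInputMode": job_mode,
        },
        "InputDataConfig": [channel],
    }
```

For example, build_input_mode_fragment("<training-image-uri>", "s3://amzn-s3-demo-bucket/my-data/train", job_mode="FastFile") returns a dictionary that you could merge with the remaining request fields before passing them to the CreateTrainingJob API.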
