Create a batch inference job

After you've set up an Amazon S3 bucket with files for running model inference, you can create a batch inference job.

Note

To submit a batch inference job using a VPC, you must use the API. Select the API tab to learn how to include the VPC configuration.

To learn how to create a batch inference job, select the tab corresponding to your method of choice and follow the steps:

Console
To create a batch inference job
  1. Sign in to the AWS Management Console using an IAM role with Amazon Bedrock permissions, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.

  2. From the left navigation pane, select Batch inference.

  3. In the Batch inference jobs section, choose Create job.

  4. In the Job details section, enter a Job name for the batch inference job and choose Select model to select the model to use for the job.

  5. In the Input data section, choose Browse S3 and select the S3 location containing the files for your batch inference job. Check that the files conform to the format described in Format and upload your batch inference data.

    Note

    If the input data is in an S3 bucket that belongs to a different account from the one from which you're submitting the job, you must use the API to submit the batch inference job. To learn how to do this, select the API tab above.

  6. In the Output data section, choose Browse S3 and select an S3 location to store the output files from your batch inference job. By default, the output data is encrypted with an AWS managed key. To choose a custom KMS key, select Customize encryption settings (advanced) and choose a key. For more information about encryption of Amazon Bedrock resources and setting up a custom KMS key, see Data encryption.

    Note

    If you plan to write the output data to an S3 bucket that belongs to a different account from the one from which you're submitting the job, you must use the API to submit the batch inference job. To learn how to do this, select the API tab above.

  7. In the Service access section, select one of the following options:

    • Use an existing service role – Select a service role from the drop-down list. For more information on setting up a custom role with the appropriate permissions, see Required permissions for batch inference.

    • Create and use a new service role – Enter a name for the service role.

  8. (Optional) To associate tags with the batch inference job, expand the Tags section and add a key and optional value for each tag. For more information, see Tagging Amazon Bedrock resources.

  9. Choose Create batch inference job.

API

To create a batch inference job, send a CreateModelInvocationJob request (see link for request and response formats and field details) with an Amazon Bedrock control plane endpoint.

The following fields are required:

• jobName – To specify a name for the job.

• roleArn – To specify the Amazon Resource Name (ARN) of the service role with permissions to create and manage the job. For more information, see Create a service role for batch inference.

• modelId – To specify the ID or ARN of the model to use in inference.

• inputDataConfig – To specify the S3 location containing the prompts and configurations to submit to the job. For more information, see Format and upload your batch inference data.

• outputDataConfig – To specify the S3 location to write the model responses to.
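
The following is a minimal sketch of such a request using the AWS SDK for Python (Boto3), covering only the required fields. The role ARN, model ID, and S3 URIs are placeholder values, and the nested S3 configuration shapes should be confirmed against the CreateModelInvocationJob API reference.

```python
import boto3

# Minimal sketch: create a batch inference job with only the required fields.
# The role ARN, model ID, and bucket names are placeholder values.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job",
    roleArn="arn:aws:iam::111122223333:role/my-batch-inference-role",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://amzn-s3-demo-bucket-input/input-data/"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://amzn-s3-demo-bucket-output/output-data/"
        }
    },
)

print(response["jobArn"])
```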

The following fields are optional:

• timeoutDurationInHours – To specify the duration in hours after which the job will time out.

• tags – To specify any tags to associate with the job. For more information, see Tagging Amazon Bedrock resources.

• vpcConfig – To specify the VPC configuration to use to protect your data during the job. For more information, see Protect batch inference jobs using a VPC.

• clientRequestToken – To ensure the API request completes only once. For more information, see Ensuring idempotency.
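
The optional fields are passed in the same request. As a sketch (not an exhaustive reference), the call below adds a timeout, tags, an idempotency token, and a VPC configuration. The subnet IDs, security group IDs, and other identifiers are placeholders, and the nested vpcConfig and tags shapes are assumptions to verify against the CreateModelInvocationJob API reference.

```python
import uuid

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Sketch: the same request with the optional fields added.
# Subnet IDs, security group IDs, and other identifiers are placeholders.
response = bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job-vpc",
    roleArn="arn:aws:iam::111122223333:role/my-batch-inference-role",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket-input/input-data/"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket-output/output-data/"}
    },
    timeoutDurationInHours=72,
    tags=[{"key": "project", "value": "batch-inference-demo"}],
    clientRequestToken=str(uuid.uuid4()),  # makes retries of this request idempotent
    vpcConfig={
        "subnetIds": ["subnet-0abc1234de5678f90"],
        "securityGroupIds": ["sg-0abc1234de5678f90"],
    },
)
```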

The response returns a jobArn that you can use to refer to the job when carrying out other batch inference-related API calls.
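
For example, one way to track the job is to pass that jobArn as the jobIdentifier in a GetModelInvocationJob request and poll the returned status. This is a sketch; the job ARN is a placeholder, and the terminal status values listed in the code are assumptions to check against the GetModelInvocationJob API reference.

```python
import time

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder: use the jobArn returned by create_model_invocation_job.
job_arn = "arn:aws:bedrock:us-east-1:111122223333:model-invocation-job/abc123"

# Poll the job until it reaches a terminal state.
while True:
    job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    status = job["status"]
    print(f"Job status: {status}")
    if status in ("Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"):
        break
    time.sleep(60)
```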