Process multiple prompts with batch inference

With batch inference, you can submit multiple prompts in a single request and generate responses asynchronously. Batch inference helps you process a large number of requests efficiently: you define model inputs in files that you create, upload the files to an Amazon S3 bucket, and then submit a batch inference job that specifies the bucket. After the job completes, you retrieve the output files from S3. Use batch inference to run model inference efficiently over large datasets.
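As a concrete illustration, the sketch below uses the AWS SDK for Python (Boto3) to upload a JSONL input file and start a batch inference job with the CreateModelInvocationJob API. The bucket names, IAM role ARN, model ID, and the shape of the modelInput payload are placeholder assumptions; substitute values for your own account and the model you plan to invoke.

```python
import json

import boto3

# Placeholder values -- substitute your own account's resources.
INPUT_BUCKET = "amzn-s3-demo-input-bucket"    # assumed bucket name
OUTPUT_BUCKET = "amzn-s3-demo-output-bucket"  # assumed bucket name
ROLE_ARN = "arn:aws:iam::123456789012:role/BatchInferenceRole"  # assumed role
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"          # example model

# Each line of the input file is one JSON record: a recordId plus the
# modelInput body you would normally pass to InvokeModel for this model.
# (A real job must meet the service's minimum record count; two records
# are shown here only to illustrate the format.)
records = [
    {
        "recordId": f"CALL{i:07d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["What is batch inference?", "Summarize S3."])
]

# Upload the JSONL input file to the input bucket.
s3 = boto3.client("s3")
s3.put_object(
    Bucket=INPUT_BUCKET,
    Key="batch-input/prompts.jsonl",
    Body="\n".join(json.dumps(r) for r in records).encode("utf-8"),
)

# Submit the batch inference job, pointing it at the S3 input and
# output locations. The role must allow Bedrock to read the input
# bucket and write to the output bucket.
bedrock = boto3.client("bedrock")
response = bedrock.create_model_invocation_job(
    jobName="my-batch-job",
    roleArn=ROLE_ARN,
    modelId=MODEL_ID,
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": f"s3://{INPUT_BUCKET}/batch-input/"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": f"s3://{OUTPUT_BUCKET}/batch-output/"}
    },
)
print("Job ARN:", response["jobArn"])
```

After submission, you can poll the job with get_model_invocation_job(jobIdentifier=job_arn) until its status reaches Completed, then download the generated output files from the S3 output location you specified.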

Note

Batch inference isn't supported for provisioned models.

Refer to the following resources for general information about batch inference: