

# AI & machine learning
<a name="machinelearning-pattern-list"></a>

**Topics**
+ [Associate an AWS CodeCommit repository in one AWS account with Amazon SageMaker AI Studio Classic in another account](associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.md)
+ [Automatically extract content from PDF files using Amazon Textract](automatically-extract-content-from-pdf-files-using-amazon-textract.md)
+ [Build a cold start forecasting model by using DeepAR for time series in Amazon SageMaker AI Studio Lab](build-a-cold-start-forecasting-model-by-using-deepar.md)
+ [Build an MLOps workflow by using Amazon SageMaker AI and Azure DevOps](build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops.md)
+ [Configure model invocation logging in Amazon Bedrock by using AWS CloudFormation](configure-bedrock-invocation-logging-cloudformation.md)
+ [Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions](create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions.md)
+ [Use Amazon Bedrock agents to automate creation of access entry controls in Amazon EKS through text-based prompts](using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.md)
+ [Deploy a RAG use case on AWS by using Terraform and Amazon Bedrock](deploy-rag-use-case-on-aws.md)
+ [Deploy preprocessing logic into an ML model in a single endpoint using an inference pipeline in Amazon SageMaker](deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker.md)
+ [Deploy real-time coding security validation by using an MCP server with Kiro and other coding assistants](deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.md)
+ [Develop advanced generative AI chat-based assistants by using RAG and ReAct prompting](develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.md)
+ [Develop a fully automated chat-based assistant by using Amazon Bedrock agents and knowledge bases](develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases.md)
+ [Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe](document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.md)
+ [Generate personalized and re-ranked recommendations using Amazon Personalize](generate-personalized-and-re-ranked-recommendations-using-amazon-personalize.md)
+ [Streamline machine learning workflows from local development to scalable experiments by using SageMaker AI and Hydra](streamline-machine-learning-workflows-by-using-amazon-sagemaker.md)
+ [Translate natural language into query DSL for OpenSearch and Elasticsearch queries](translate-natural-language-query-dsl-opensearch-elasticsearch.md)
+ [Use Amazon Q Developer as a coding assistant to increase your productivity](use-q-developer-as-coding-assistant-to-increase-productivity.md)
+ [Use SageMaker Processing for distributed feature engineering of terabyte-scale ML datasets](use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets.md)
+ [Visualize AI/ML model results using Flask and AWS Elastic Beanstalk](visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk.md)
+ [More patterns](machinelearning-more-patterns-pattern-list.md)

# Associate an AWS CodeCommit repository in one AWS account with Amazon SageMaker AI Studio Classic in another account
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account"></a>

*Laurens van der Maas and Aubrey Oosthuizen, Amazon Web Services*

## Summary
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-summary"></a>

This pattern provides instructions and code on how to associate an AWS CodeCommit repository in one AWS account (Account A) with Amazon SageMaker AI Studio Classic in another AWS account (Account B). To set up the association, you must create an AWS Identity and Access Management (IAM) policy and role in Account A and an IAM inline policy in Account B. Then, you use a shell script to clone the CodeCommit repository from Account A to SageMaker AI Studio Classic in Account B.

## Prerequisites and limitations
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-prereqs"></a>

**Prerequisites**
+ Two [AWS accounts](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/), one containing the CodeCommit repository and the other containing a SageMaker AI Domain with a user
+ Provisioned [SageMaker AI Domain and user](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html), with internet access or access to CodeCommit and AWS Security Token Service (AWS STS) through virtual private network (VPC) endpoints
+ A basic understanding of [IAM](https://docs.aws.amazon.com/iam/?id=docs_gateway)
+ A basic understanding of [SageMaker AI Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html)
+ A basic understanding of [Git](https://git-scm.com/) and [CodeCommit](https://docs.aws.amazon.com/codecommit/index.html)

**Limitations**

This pattern applies to SageMaker AI Studio Classic only, not to RStudio on Amazon SageMaker AI.

## Architecture
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-architecture"></a>

**Technology stack**
+ Amazon SageMaker AI
+ Amazon SageMaker AI Studio Classic
+ AWS CodeCommit
+ AWS Identity and Access Management (IAM) 
+ Git

**Target architecture**

The following diagram shows an architecture that associates a CodeCommit repository in Account A with SageMaker AI Studio Classic in Account B.

![\[Architecture diagram for cross-account association\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/d40df9eb-6ee2-4cb8-8257-051fa624e52a/images/abb89a66-fc8f-4e72-8f45-f0f44c2ec6ce.png)


The diagram shows the following workflow:

1. A user assumes the `MyCrossAccountRepositoryContributorRole` role in Account A through an `sts:AssumeRole` call, while using the SageMaker AI execution role in SageMaker AI Studio Classic in Account B. The assumed role includes the CodeCommit permissions to clone and interact with the specified repository.

1. The user performs Git commands from the system terminal in SageMaker AI Studio Classic.

**Automation and scale**

This pattern consists of manual steps that can be automated by using the [AWS Cloud Development Kit (AWS CDK)](https://docs.aws.amazon.com/cdk/?id=docs_gateway), [AWS CloudFormation](https://docs.aws.amazon.com/cloudformation/?id=docs_gateway), or [Terraform](https://www.terraform.io/).

## Tools
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-tools"></a>

**AWS tools**
+ [Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/?id=docs_gateway) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker AI Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models.
+ [AWS CodeCommit](https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html) is a version control service that helps you privately store and manage Git repositories, without needing to manage your own source control system.

  **Notice**: AWS CodeCommit is no longer available to new customers. Existing customers of AWS CodeCommit can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/devops/how-to-migrate-your-aws-codecommit-repository-to-another-git-provider/)
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.

**Other tools**
+ [Git](https://git-scm.com/) is a distributed version-control system for tracking changes in source code during software development.

## Epics
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-epics"></a>

### Create an IAM policy and IAM role in Account A
<a name="create-an-iam-policy-and-iam-role-in-account-a"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create an IAM policy for repository access in Account A. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.html) It's a best practice to restrict the scope of your IAM policies to the minimum required permissions for your use case. | AWS DevOps | 
| Create an IAM role for repository access in Account A. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.html) | AWS DevOps | 

### Create an IAM inline policy in Account B
<a name="create-an-iam-inline-policy-in-account-b"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Attach an inline policy to the execution role that's attached to your SageMaker Domain user in Account B. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.html) | AWS DevOps | 

### Clone the repository in SageMaker AI Studio Classic for Account B
<a name="clone-the-repository-in-sm-studio-classic-for-account-b"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create the shell script in SageMaker AI Studio Classic in Account B. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.html) | AWS DevOps | 
| Invoke the shell script from the system terminal. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account.html) You have now cloned your cross-account CodeCommit repository in SageMaker AI Studio Classic and can perform all Git commands from the system terminal. | AWS DevOps | 

## Additional information
<a name="associate-an-aws-codecommit-repository-in-one-aws-account-with-sagemaker-studio-in-another-account-additional"></a>

**Example IAM policy**

If you use this example policy, do the following:
+ Replace `<CodeCommit_Repository_Region>` with the AWS Region for the repository.
+ Replace `<Account_A_ID>` with the account ID for Account A.
+ Replace `<CodeCommit_Repository_Name>` with the name of your CodeCommit repository in Account A.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "codecommit:BatchGet*",
                "codecommit:Create*",
                "codecommit:DeleteBranch",
                "codecommit:Get*",
                "codecommit:List*",
                "codecommit:Describe*",
                "codecommit:Put*",
                "codecommit:Post*",
                "codecommit:Merge*",
                "codecommit:Test*",
                "codecommit:Update*",
                "codecommit:GitPull",
                "codecommit:GitPush"
            ],
            "Resource": [
                "arn:aws:codecommit:<CodeCommit_Repository_Region>:<Account_A_ID>:<CodeCommit_Repository_Name>"
            ]
        }
    ]
}
```
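For the cross-account association to work, the IAM role in Account A must also trust the SageMaker AI execution role in Account B so that it can be assumed. The following trust policy is an illustrative sketch, not part of the pattern's listed steps; `<Account_B_ID>` and `<SageMaker_Execution_Role_Name>` are placeholders for your own values:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<Account_B_ID>:role/<SageMaker_Execution_Role_Name>"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```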

**Example SageMaker AI shell script**

If you use this example script, do the following:
+ Replace `<Account_A_ID>` with the account ID for Account A.
+ Replace `<Account_A_Role_Name>` with the name of the IAM role that you created earlier.
+ Replace `<CodeCommit_Repository_Region>` with the AWS Region for the repository.
+ Replace `<CodeCommit_Repository_Name>` with the name of your CodeCommit repository in Account A.

```
#!/usr/bin/env bash
#Launch from system terminal
pip install --quiet git-remote-codecommit

mkdir -p ~/.aws
touch ~/.aws/config

echo "[profile CrossAccountAccessProfile]
region = <CodeCommit_Repository_Region>
credential_source=EcsContainer
role_arn = arn:aws:iam::<Account_A_ID>:role/<Account_A_Role_Name>
output = json" > ~/.aws/config

echo '[credential "https://git-codecommit.<CodeCommit_Repository_Region>.amazonaws.com"]
        helper = !aws codecommit credential-helper $@ --profile CrossAccountAccessProfile
        UseHttpPath = true' > ~/.gitconfig
        
git clone codecommit::<CodeCommit_Repository_Region>://CrossAccountAccessProfile@<CodeCommit_Repository_Name>
```

# Automatically extract content from PDF files using Amazon Textract
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract"></a>

*Tianxia Jia, Amazon Web Services*

## Summary
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-summary"></a>

Many organizations need to extract information from PDF files that are uploaded to their business applications. For example, an organization could need to accurately extract information from tax or medical PDF files for tax analysis or medical claim processing.

On the Amazon Web Services (AWS) Cloud, Amazon Textract automatically extracts information (for example, printed text, forms, and tables) from PDF files and produces a JSON-formatted file that contains information from the original PDF file. You can use Amazon Textract in the AWS Management Console or by implementing API calls. We recommend that you use [programmatic API calls](https://aws.amazon.com/textract/faqs/) to scale and automatically process large numbers of PDF files.

When Amazon Textract processes a file, it creates the following list of `Block` objects: pages, lines and words of text, forms (key-value pairs), tables and cells, and selection elements. Other object information is also included, for example, [bounding boxes](https://docs.aws.amazon.com/textract/latest/dg/API_BoundingBox.html), confidence intervals, IDs, and relationships. Amazon Textract extracts all content as strings, so the values must be correctly identified and converted to the appropriate data types before your downstream applications can use them.
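As a sketch of what this `Block` structure looks like in practice, the following snippet walks a hand-made, trimmed stand-in for an `analyze_document` response and collects the `LINE` blocks; the sample values are illustrative:

```
# A hand-made stand-in for a real textract.analyze_document() response,
# trimmed to only the fields used below.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Invoice total: 42.50",
         "Confidence": 99.1},
        {"BlockType": "WORD", "Id": "w1", "Text": "Invoice", "Confidence": 99.3},
    ]
}

def lines_from_response(response):
    """Collect the text of every LINE block with its confidence score."""
    return [
        (block["Text"], block["Confidence"])
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]

print(lines_from_response(sample_response))
# [('Invoice total: 42.50', 99.1)]
```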

This pattern describes a step-by-step workflow for using Amazon Textract to automatically extract content from PDF files and process it into a clean output. The pattern uses a template matching technique to correctly identify the required fields, key names, and tables, and then applies post-processing corrections to each data type. You can use this pattern to process different types of PDF files, and you can then scale and automate this workflow to process PDF files that have an identical format.

## Prerequisites and limitations
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-prereqs"></a>

**Prerequisites**
+ An active AWS account.
+ An existing Amazon Simple Storage Service (Amazon S3) bucket to store the PDF files after they are converted to JPEG format for processing by Amazon Textract. For more information about S3 buckets, see [Buckets overview](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html) in the Amazon S3 documentation.
+ The `Textract_PostProcessing.ipynb` Jupyter notebook (attached), installed and configured. For more information about Jupyter notebooks, see [Create a Jupyter notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html) in the Amazon SageMaker documentation.
+ Existing PDF files that have an identical format.
+ An understanding of Python.

**Limitations**
+ Your PDF files must be of good quality and clearly readable. Native PDF files are recommended, but you can use scanned documents that are converted to a PDF format if all the individual words are clear. For more information about this, see [PDF document preprocessing with Amazon Textract: Visuals detection and removal](https://aws.amazon.com/blogs/machine-learning/process-text-and-images-in-pdf-documents-with-amazon-textract/) on the AWS Machine Learning Blog.
+ For multipage files, you can use an asynchronous operation or split the PDF files into a single page and use a synchronous operation. For more information about these two options, see [Detecting and analyzing text in multipage documents](https://docs.aws.amazon.com/textract/latest/dg/async.html) and [Detecting and analyzing text in single-page documents](https://docs.aws.amazon.com/textract/latest/dg/sync.html) in the Amazon Textract documentation.

## Architecture
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-architecture"></a>

This pattern’s workflow first runs Amazon Textract on a sample PDF file (*First-time run*) and then runs it on PDF files that have an identical format to the first PDF (*Repeat run*). The following diagram shows the combined *First-time run* and *Repeat run* workflow that automatically and repeatedly extracts content from PDF files with identical formats.

![\[Using Amazon Textract to extract content from PDF files\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/2d724523-2cab-42c9-a773-65857014d9ec/images/9e20070f-3e0c-46aa-aa98-a8b1eb3395dc.png)

The diagram shows the following workflow for this pattern:

1. Convert a PDF file into JPEG format and store it in an S3 bucket. 

1. Call the Amazon Textract API and parse the Amazon Textract response JSON file. 

1. Edit the JSON file by adding the correct `KeyName:DataType` pair for each required field. Create a `TemplateJSON` file for the *Repeat run* stage.

1. Define the post-processing correction functions for each data type (for example, float, integer, and date).

1. Prepare the PDF files that have an identical format to your first PDF file.

1. Call the Amazon Textract API and parse the Amazon Textract response JSON.

1. Match the parsed JSON file with the `TemplateJSON` file.

1. Implement post-processing corrections.

The final JSON output file has the correct `KeyName` and `Value` for each required field.
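The matching and correction steps above can be sketched as follows. The `template_json` contents and the `apply_corrections` helper are illustrative stand-ins for the helpers defined in the attached notebook:

```
from datetime import datetime

# Illustrative TemplateJSON: each required field mapped to its data type.
template_json = {
    "InvoiceDate": "date",
    "TotalAmount": "float",
    "ItemCount": "integer",
}

def apply_corrections(parsed_json, template):
    """Convert the raw string values from Amazon Textract into typed values."""
    converters = {
        "float": lambda v: float(v.replace("$", "").replace(",", "")),
        "integer": lambda v: int(v),
        "date": lambda v: datetime.strptime(v, "%m/%d/%Y").date().isoformat(),
    }
    return {
        key: converters[data_type](parsed_json[key])
        for key, data_type in template.items()
        if key in parsed_json
    }

# Raw strings as parsed from the Amazon Textract response.
parsed = {"InvoiceDate": "03/15/2022", "TotalAmount": "$1,234.50", "ItemCount": "3"}
print(apply_corrections(parsed, template_json))
# {'InvoiceDate': '2022-03-15', 'TotalAmount': 1234.5, 'ItemCount': 3}
```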

**Target technology stack**
+ Amazon SageMaker 
+ Amazon S3 
+ Amazon Textract

**Automation and scale**

You can automate the *Repeat run* workflow by using an AWS Lambda function that initiates Amazon Textract when a new PDF file is added to Amazon S3. Amazon Textract then runs the processing scripts and the final output can be saved to a storage location. For more information about this, see [Using an Amazon S3 trigger to invoke a Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html) in the Lambda documentation.
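As a sketch, such a Lambda function could extract the new object's location from the S3 event and pass it to Amazon Textract. The handler below is illustrative and assumes your parsing and template matching run downstream of the `analyze_document` call:

```
def bucket_and_key_from_event(event):
    """Extract the bucket name and object key from a standard S3 put event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def lambda_handler(event, context):
    import boto3  # available by default in the Lambda Python runtime

    bucket, key = bucket_and_key_from_event(event)
    textract = boto3.client("textract")
    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )
    # Parse the response, match it against the TemplateJSON file, apply the
    # post-processing corrections, and save the final output here.
    return {"blocks": len(response.get("Blocks", []))}
```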

## Tools
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-tools"></a>
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a fully managed ML service that helps you to quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
+ [Amazon Textract](https://docs.aws.amazon.com/textract/latest/dg/what-is.html) makes it easy to add document text detection and analysis to your applications.

## Epics
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-epics"></a>

### First-time run
<a name="first-time-run"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Convert the PDF file. | Prepare the PDF file for your first-time run by splitting it into single pages and converting them into JPEG format for the Amazon Textract [synchronous operation](https://docs.aws.amazon.com/textract/latest/dg/sync.html) (sync API). You can also use the Amazon Textract [asynchronous operation](https://docs.aws.amazon.com/textract/latest/dg/async.html) (async API) for multipage PDF files. | Data scientist, Developer | 
| Parse the Amazon Textract response JSON. | Open the `Textract_PostProcessing.ipynb` Jupyter notebook (attached) and call the Amazon Textract API by using the following code:<pre>response = textract.analyze_document(<br />Document={<br />        'S3Object': {<br />            'Bucket': BUCKET,<br />            'Name': '{}'.format(filename)<br />                    }<br />                },<br />        FeatureTypes=["TABLES", "FORMS"])</pre>Parse the response JSON into a form and table by using the following code:<pre>parseformKV=form_kv_from_JSON(response)<br />parseformTables=get_tables_fromJSON(response)</pre> | Data scientist, Developer | 
| Edit the TemplateJSON file. | Edit the parsed JSON for each `KeyName` and corresponding `DataType` (for example, string, float, integer, or date), and table headers (for example, `ColumnNames` and `RowNames`).This template is used for each individual PDF file type, which means that the template can be reused for PDF files that have an identical format. | Data scientist, Developer | 
| Define the post-processing correction functions. | The values in Amazon Textract's response for the `TemplateJSON` file are strings. There is no differentiation for date, float, integer, or currency. These values must be converted to the correct data type for your downstream use case. Correct each data type according to the `TemplateJSON` file by using the following code:<pre>finalJSON=postprocessingCorrection(parsedJSON,templateJSON)</pre> | Data scientist, Developer | 

### Repeat run
<a name="repeat-run"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Prepare the PDF files. | Prepare the PDF files by splitting them into single pages and converting them into JPEG format for the Amazon Textract [synchronous operation](https://docs.aws.amazon.com/textract/latest/dg/sync.html) (sync API). You can also use the Amazon Textract [asynchronous operation](https://docs.aws.amazon.com/textract/latest/dg/async.html) (async API) for multipage PDF files. | Data scientist, Developer | 
| Call the Amazon Textract API. | Call the Amazon Textract API by using the following code:<pre>response = textract.analyze_document(<br />        Document={<br />        'S3Object': {<br />            'Bucket': BUCKET,<br />            'Name': '{}'.format(filename)<br />                    }<br />                },<br />        FeatureTypes=["TABLES", "FORMS"])</pre> | Data scientist, Developer | 
| Parse the Amazon Textract response JSON. | Parse the response JSON into a form and table by using the following code:<pre>parseformKV=form_kv_from_JSON(response)<br />parseformTables=get_tables_fromJSON(response)</pre> | Data scientist, Developer | 
| Load the TemplateJSON file and match it with the parsed JSON. | Use the `TemplateJSON` file to extract the correct key-value pairs and table by using the following commands:<pre>form_kv_corrected=form_kv_correction(parseformKV,templateJSON)<br />form_table_corrected=form_Table_correction(parseformTables, templateJSON)<br />form_kv_table_corrected_final={**form_kv_corrected , **form_table_corrected}</pre> | Data scientist, Developer | 
| Apply post-processing corrections. | Use `DataType` in the `TemplateJSON` file and post-processing functions to correct data by using the following code: <pre>finalJSON=postprocessingCorrection(form_kv_table_corrected_final,templateJSON)</pre> | Data scientist, Developer | 

## Related resources
<a name="automatically-extract-content-from-pdf-files-using-amazon-textract-resources"></a>
+ [Automatically extract text and structured data from documents with Amazon Textract](https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/)
+ [Extract text and structured data with Amazon Textract ](https://aws.amazon.com/getting-started/hands-on/extract-text-with-amazon-textract/)
+ [Amazon Textract resources](https://aws.amazon.com/textract/resources/?blog-posts-cards.sort-by=item.additionalFields.createdDate&blog-posts-cards.sort-order=desc)

## Attachments
<a name="attachments-2d724523-2cab-42c9-a773-65857014d9ec"></a>

To access additional content that is associated with this document, unzip the following file: [attachment.zip](samples/p-attach/2d724523-2cab-42c9-a773-65857014d9ec/attachments/attachment.zip)

# Build a cold start forecasting model by using DeepAR for time series in Amazon SageMaker AI Studio Lab
<a name="build-a-cold-start-forecasting-model-by-using-deepar"></a>

*Ivan Cui and Eyal Shacham, Amazon Web Services*

## Summary
<a name="build-a-cold-start-forecasting-model-by-using-deepar-summary"></a>

Whether you’re allocating resources more efficiently for web traffic, forecasting patient demand for staffing needs, or anticipating sales of a company’s products, forecasting is an essential tool. Cold start forecasting builds forecasts for a time series that has little historical data, such as a new product that just entered the retail market. This pattern uses the Amazon SageMaker AI DeepAR forecasting algorithm to train a cold start forecasting model and demonstrates how to perform forecasting on cold start items.

 

[DeepAR](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html) is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). DeepAR trains a single model jointly over all of the related products’ time series.

Traditional time series forecasting methods such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS) rely heavily on historical time series of each individual product. Therefore, those methods aren’t effective for cold start forecasting. When your dataset contains hundreds of related time series, DeepAR outperforms the standard ARIMA and ETS methods. You can also use the trained model to generate forecasts for new time series that are similar to the time series that it has been trained on.

## Prerequisites and limitations
<a name="build-a-cold-start-forecasting-model-by-using-deepar-prereqs"></a>

**Prerequisites**
+ An active AWS account.
+ An Amazon SageMaker AI [domain.](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html)
+ An [Amazon SageMaker AI Studio Lab](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-lab.html) or JupyterLab application.
+ An Amazon Simple Storage Service (Amazon S3) bucket with read and write permissions.
+ Knowledge of programming in Python.
+ Knowledge of using a Jupyter notebook.

**Limitations**
+ Invoking the forecast model without any historical data points will return an error. Invoking the model with minimal historical data points will return inaccurate predictions with high confidence. This pattern suggests an approach to resolving these known limitations of cold start forecasting.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html), and choose the link for the service.

**Product versions**
+ Python version 3.10 or later.
+ The pattern’s notebook was tested in Amazon SageMaker AI Studio on an ml.t3.medium instance with the Python 3 (Data Science) kernel.

## Architecture
<a name="build-a-cold-start-forecasting-model-by-using-deepar-architecture"></a>

The following diagram shows the workflow and architecture components for this pattern.

![\[Workflow to build a cold start forecasting model using SageMaker and Amazon S3.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/98d021d3-96d2-40a9-b0ce-717934652173/images/d97d66a0-8eef-4d30-ac5f-4c6c79cf6c9f.png)


The workflow performs the following tasks:

1. Input files of training and testing data are synthesized and then uploaded to an Amazon S3 bucket. This data includes multiple time series with categorical and dynamic features, along with target values (to be predicted). The Jupyter notebook visualizes the data to better understand the requirements of the training data and the expected predicted values.

1. A hyperparameter tuning job is created to train the model and select the best one based on predefined metrics.

1. The input files are downloaded from the Amazon S3 bucket to each training instance of the hyperparameter tuning job.

1. After the tuner job selects the best model based on the tuner’s predefined threshold, the model is deployed as a SageMaker AI endpoint.

1. The deployed model is then ready to be invoked, and its predictions are validated against the test data.

The notebook demonstrates how well the model predicts the target values when an adequate number of historical data points is available. However, when you invoke the model with fewer historical data points (which represent a cold product), the model’s predictions do not match the original testing data, even within the model’s confidence levels. In this pattern, a new model is built for cold products whose initial context length (the number of predicted points) is defined as the number of available historical points, and the model is retrained iteratively as new data points are acquired. The notebook shows that the model makes accurate predictions as long as the number of historical data points is close to its context length.
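When the deployed endpoint is invoked, DeepAR expects a JSON request body that contains the time series instances and an inference configuration. The following is a minimal sketch of building that body for a cold start item; the feature values and the `build_deepar_request` helper are illustrative:

```
import json

def build_deepar_request(start, target, cat, num_samples=50):
    """Build a DeepAR inference request for a single time series."""
    return {
        "instances": [{"start": start, "target": target, "cat": cat}],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["quantiles"],
            "quantiles": ["0.1", "0.5", "0.9"],
        },
    }

# A cold start item: only two hours of history for a product in category 3.
payload = build_deepar_request("2024-01-01 00:00:00", [12.0, 15.5], [3])
body = json.dumps(payload)
# The serialized body would then be sent to the deployed SageMaker AI endpoint,
# for example through the SageMaker Python SDK's Predictor.predict().
```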

## Tools
<a name="build-a-cold-start-forecasting-model-by-using-deepar-tools"></a>

**AWS services**
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
+ [Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/?id=docs_gateway) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker AI Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Other tools**
+ [Python](https://www.python.org/) is a general-purpose computer programming language.

**Code repository**

The code for this pattern is available in the GitHub [DeepAR-ColdProduct-Pattern](https://github.com/aws-samples/DeepAR-ColdProduct-Pattern) repository.

## Best practices
<a name="build-a-cold-start-forecasting-model-by-using-deepar-best-practices"></a>
+ Train your model in a virtual environment, and always use version control for maximum reproducibility.
+ Include as many high-quality categorical features as you can to maximize the model’s predictive power.
+ Make sure that the metadata contains similar categorical items so that the model can adequately infer predictions for cold start products.
+ Run a hyperparameter tuning job to maximize the model’s predictive performance.
+ In this pattern, the model you build has a context length of 24 hours and predicts the next 24 hours. If you try to predict the next 24 hours when you have less than 24 hours of historical data, the model’s prediction accuracy degrades linearly with the amount of available historical data. To mitigate this issue, create a new model for each set of historical data points until this number reaches the desired prediction (context) length. For example, start with a model that has a context length of 2 hours, then progressively increase it to 4, 8, 16, and 24 hours.
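The progressive retraining schedule from the last best practice can be sketched as a simple doubling sequence. This is a sketch only; the 2-hour starting point and 24-hour target follow the example above.

```python
def context_length_schedule(start_hours=2, target_hours=24):
    """Return the sequence of context lengths (in hours) to train,
    doubling at each step until the desired context length is reached."""
    schedule = []
    length = start_hours
    while length < target_hours:
        schedule.append(length)
        length *= 2
    schedule.append(target_hours)
    return schedule

print(context_length_schedule())  # [2, 4, 8, 16, 24]
```

As new historical data points are acquired for a cold product, you would train the model for the next context length in the schedule until the full 24-hour model is available.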

## Epics
<a name="build-a-cold-start-forecasting-model-by-using-deepar-epics"></a>

### Start your SageMaker AI Studio Classic application
<a name="start-your-sm-studio-classic-application"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Start your notebook environment. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-a-cold-start-forecasting-model-by-using-deepar.html)For more information, see [Launch Amazon SageMaker AI Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) in the SageMaker AI documentation. | Data scientist | 

### Create and activate the notebook
<a name="create-and-activate-the-notebook"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up your virtual environment for model training. | To set up your virtual environment for model training, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-a-cold-start-forecasting-model-by-using-deepar.html)For more information, see [Upload Files to SageMaker AI Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks-files.html) in the SageMaker AI documentation. | Data scientist | 
| Create and validate a forecasting model. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-a-cold-start-forecasting-model-by-using-deepar.html) | Data scientist | 

## Related resources
<a name="build-a-cold-start-forecasting-model-by-using-deepar-resources"></a>
+ [DeepAR Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html)
+ [Forecasting demand for new product introductions by using AWS machine learning services](https://docs.aws.amazon.com/prescriptive-guidance/latest/forecast-demand-new-product/introduction.html)
+ [Launch Amazon SageMaker AI Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)
+ [Use the SageMaker AI DeepAR forecasting algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)

# Build an MLOps workflow by using Amazon SageMaker AI and Azure DevOps
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops"></a>

*Deepika Kumar, Sara van de Moosdijk, and Philips Kokoh Prasetyo, Amazon Web Services*

## Summary
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-summary"></a>

Machine learning operations (MLOps) is a set of practices that automate and simplify machine learning (ML) workflows and deployments. MLOps focuses on automating the ML lifecycle. It helps ensure that models are not just developed but also deployed, monitored, and retrained systematically and repeatedly. It brings DevOps principles to ML. MLOps results in faster deployment of ML models, better accuracy over time, and stronger assurance that they provide real business value.

Organizations often have existing DevOps tools and data storage solutions before starting their MLOps journey. This pattern showcases how to harness the strengths of both Microsoft Azure and AWS. It helps you integrate Azure DevOps with Amazon SageMaker AI to create an MLOps workflow.

The solution simplifies working between Azure and AWS: you can use Azure for development and AWS for machine learning. It promotes an effective end-to-end process for building machine learning models, including data handling, training, and deployment on AWS. For efficiency, you manage these processes through Azure DevOps pipelines. The solution is applicable to foundation model operations (FMOps) and large language model operations (LLMOps) in generative AI, which include fine-tuning, vector databases, and prompt management.

## Prerequisites and limitations
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-prereqs"></a>

**Prerequisites**
+ **Azure subscription** – Access to Azure services, such as Azure DevOps, for setting up the continuous integration and continuous deployment (CI/CD) pipelines.
+ **Active AWS account** – Permissions to use the AWS services used in this pattern.
+ **Data** – Access to historical data for training the machine learning model.
+ **Familiarity with ML concepts** – Understanding of Python, Jupyter Notebooks, and machine learning model development.
+ **Security configuration** – Proper configuration of roles, policies, and permissions across both Azure and AWS to ensure secure data transfer and access.
+ **(Optional) Vector database** – If you're using a Retrieval Augmented Generation (RAG) approach and a third-party service for the vector database, you need access to the external vector database.

**Limitations**
+ This guidance does not discuss secure cross-cloud data transfers. For more information about cross-cloud data transfers, see [AWS Solutions for Hybrid and Multicloud](https://aws.amazon.com/hybrid-multicloud/).
+ Multicloud solutions may increase latency for real-time data processing and model inference.
+ This guidance provides one example of a multi-account MLOps architecture. Adjustments are necessary based on your machine learning and AWS strategy.
+ This guidance does not describe the use of AI/ML services other than Amazon SageMaker AI.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see the [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html) page, and choose the link for the service.

## Architecture
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-architecture"></a>

**Target architecture**

The target architecture integrates Azure DevOps with Amazon SageMaker AI, creating a cross-cloud ML workflow. It uses Azure for CI/CD processes and SageMaker AI for ML model training and deployment. It outlines the process of obtaining data (from sources such as Amazon S3, Snowflake, and Azure Data Lake) through model building and deployment. Key components include CI/CD pipelines for model building and deployment, data preparation, infrastructure management, and Amazon SageMaker AI for training and fine-tuning, evaluation, and deployment of ML models. This architecture is designed to provide efficient, automated, and scalable ML workflows across cloud platforms.

![\[Architecture diagram of an MLOps workflow that uses Azure Devops and SageMaker.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/95fdf414-e561-4a93-9628-b41db39a577e/images/84ddcc36-54ef-473e-875f-154fae18cb13.png)


The architecture consists of the following components:

1. Data scientists perform ML experiments in the development account to explore different approaches for ML use cases by using various data sources. Data scientists perform unit tests and trials, and to track their experiments, they can use [Amazon SageMaker AI with MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html). In generative AI model development, data scientists fine-tune foundation models from the Amazon SageMaker AI JumpStart model hub. Following model evaluation, data scientists push and merge the code to the Model Build repository, which is hosted on Azure DevOps. This repository contains code for a multi-step model building pipeline.

1. On Azure DevOps, the Model Build pipeline, which provides continuous integration (CI), can be activated automatically or manually upon code merge to the main branch. In the Automation account, this activates the SageMaker AI pipeline for data preprocessing, model training and fine-tuning, model evaluation, and conditional model registration based on accuracy.

1. The Automation account is a central account across ML platforms that hosts ML environments (Amazon ECR), models (Amazon S3), model metadata (SageMaker AI Model Registry), features (SageMaker AI Feature Store), automated pipelines (SageMaker AI Pipelines), and ML log insights (CloudWatch). For a generative AI workload, you might require additional evaluations for prompts in the downstream applications. A prompt management application helps to streamline and automate the process. This account enables reuse of ML assets and enforces best practices to accelerate the delivery of ML use cases.

1. The latest model version is added to SageMaker AI Model Registry for review. The registry tracks model versions and their respective artifacts (lineage and metadata), manages the status of the model (approved, rejected, or pending), and manages the version for downstream deployment.

1. After a trained model in Model Registry is approved through the SageMaker AI Studio interface or an API call, an event can be dispatched to Amazon EventBridge. EventBridge starts the Model Deploy pipeline on Azure DevOps.

1. The Model Deploy pipeline, which provides continuous deployment (CD), checks out the source from the Model Deploy repository. The source contains code, the configuration for the model deployment, and test scripts for quality benchmarks. The Model Deploy pipeline can be tailored to your inference type.

1. After quality control checks, the Model Deploy pipeline deploys the model to the Staging account. The Staging account is a copy of the Production account, and it is used for integration testing and evaluation. For a batch transformation, the Model Deploy pipeline can automatically update the batch inference process to use the latest approved model version. For a real-time, serverless, or asynchronous inference, it sets up or updates the respective model endpoint.

1. After successful testing in the Staging account, a model can be deployed to the Production account by manual approval through the Model Deploy pipeline. This pipeline provisions a production endpoint in the **Deploy to production** step, including model monitoring and a data feedback mechanism.

1. After the model is in production, use tools such as SageMaker AI Model Monitor and SageMaker AI Clarify to identify bias, detect drift, and continuously monitor the model's performance.
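Step 5 in the flow above depends on an EventBridge rule that matches model approval events from the SageMaker AI Model Registry. The following is a minimal sketch of that event pattern; the rule name and the downstream target (a Lambda function or API destination that calls the Azure DevOps REST API) are assumptions, not part of this pattern's code.

```python
import json

# Event pattern that matches a model package being approved in the
# SageMaker AI Model Registry.
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}

pattern_json = json.dumps(event_pattern)
```

You would attach this pattern to a rule, for example with `events_client.put_rule(Name="model-approved", EventPattern=pattern_json)`, and add a target (such as a Lambda function or an EventBridge API destination) that starts the Model Deploy pipeline in Azure DevOps.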

**Automation and scale**

Use infrastructure as code (IaC) to automatically deploy to multiple accounts and environments. By automating the process of setting up an MLOps workflow, it is possible to separate the environments used by ML teams working on different projects. [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) helps you model, provision, and manage AWS resources by treating infrastructure as code.
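As a minimal illustration of the IaC approach, the following sketch assembles a small CloudFormation template body in Python for two of the shared resources that the Automation account hosts. The bucket and repository names are hypothetical.

```python
import json

# Minimal CloudFormation template covering a model artifact bucket and
# an ECR repository for approved ML environments (names are examples).
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: shared MLOps resources for the Automation account",
    "Resources": {
        "ModelArtifactBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-mlops-model-artifacts"},
        },
        "MLEnvironmentRepository": {
            "Type": "AWS::ECR::Repository",
            "Properties": {"RepositoryName": "example-ml-environments"},
        },
    },
}

template_body = json.dumps(template, indent=2)
```

You would deploy this with `cloudformation_client.create_stack(StackName=..., TemplateBody=template_body)`, or check the template into version control and deploy it from an infrastructure pipeline so that each ML team's environment is reproducible.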

## Tools
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-tools"></a>

**AWS services**
+ [Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a managed ML service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html) is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data. In this pattern, Amazon S3 is used for data storage and integrated with SageMaker AI for model training and model objects.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use. In this pattern, Lambda is used for data pre-processing and post-processing tasks.
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable. In this pattern, it stores Docker containers that SageMaker AI uses as training and deployment environments.
+ [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html) is a serverless event bus service that helps you connect your applications with real-time data from a variety of sources. In this pattern, EventBridge orchestrates event-driven or time-based workflows that initiate automatic model retraining or deployment.
+ [Amazon API Gateway](https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html) helps you create, publish, maintain, monitor, and secure REST, HTTP, and WebSocket APIs at any scale.  In this pattern, it is used to create an external-facing, single point of entry for SageMaker AI endpoints.
+ For RAG applications, you can use AWS services, such as [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) and [Amazon RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html), to store the vector embeddings that provide the LLM with your internal data.

**Other tools**
+ [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/user-guide/what-is-azure-devops) helps you manage CI/CD pipelines and facilitate code builds, tests, and deployment.
+ [Azure Data Lake Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) or [Snowflake](https://docs.snowflake.com/en/) are possible third-party sources of training data for ML models.
+ [Pinecone](https://docs.pinecone.io/guides/get-started/overview), [Milvus](https://milvus.io/docs/overview.md), or [ChromaDB](https://docs.trychroma.com/) are possible third-party vector databases to store vector embeddings.

## Best practices
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-best-practices"></a>

Before implementing any component of this multicloud MLOps workflow, complete the following activities:
+ Define and understand the machine learning workflow and the tools required to support it. Different use cases require different workflows and components. For example, a feature store might be required for feature reuse and low latency inference in a personalization use case, but it may not be required for other use cases. Understanding the target workflow, use case requirements, and preferred collaboration methods of the data science team is needed to successfully customize the architecture.
+ Create a clear separation of responsibility for each component of the architecture. Spreading data storage across Azure Data Lake Storage, Snowflake, and Amazon S3 can increase complexity and cost. If possible, choose a consistent storage mechanism. Similarly, avoid using a combination of Azure and AWS DevOps services, or a combination of Azure and AWS ML services.
+ Choose one or more existing models and datasets to perform end-to-end testing of the MLOps workflow. The test artifacts should reflect real use cases that the data science teams develop when the platform enters production.

## Epics
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-epics"></a>

### Design your MLOps architecture
<a name="design-your-mlops-architecture"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Identify data sources. | Based on current and future use cases, available data sources, and types of data (such as confidential data), document the data sources that need to be integrated with the MLOps platform. Data can be stored in Amazon S3, Azure Data Lake Storage, Snowflake, or other sources. For generative AI workloads, data might also include a knowledge base that grounds the generated response. This data is stored as vector embeddings in vector databases. Create a plan for integrating these sources with your platform and securing access to the correct resources. | Data engineer, Data scientist, Cloud architect | 
| Choose applicable services. | Customize the architecture by adding or removing services based on the desired workflow of the data science team, applicable data sources, and existing cloud architecture. For example, data engineers and data scientists may perform data preprocessing and feature engineering in SageMaker AI, AWS Glue, or Amazon EMR. It is unlikely that all three services would be required. | AWS administrator, Data engineer, Data scientist, ML engineer | 
| Analyze security requirements. | Gather and document security requirements. This includes determining:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops.html)For more information about securing generative AI workloads, see [Securing generative AI: An introduction to the Generative AI Security Scoping Matrix](https://aws.amazon.com/blogs/security/securing-generative-ai-an-introduction-to-the-generative-ai-security-scoping-matrix/) (AWS blog post). | AWS administrator, Cloud architect | 

### Set up AWS Organizations
<a name="set-up-aolong"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up AWS Organizations. | Set up AWS Organizations on the root AWS account. This helps you manage the subsequent accounts that you create as part of a multi-account MLOps strategy. For more information, see the [AWS Organizations documentation](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_tutorials_basic.html). | AWS administrator | 

### Set up the development environment and versioning
<a name="set-up-the-development-environment-and-versioning"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create an AWS development account. | Create an AWS account where data engineers and data scientists have permissions to experiment and create ML models. For instructions, see [Creating a member account in your organization](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_create.html) in the AWS Organizations documentation. | AWS administrator | 
| Create a Model Build repository. | Create a Git repository in Azure where data scientists can push their model build and deployment code after the experimentation phase is complete. For instructions, see [Set up a Git repository](https://learn.microsoft.com/en-us/devops/develop/git/set-up-a-git-repository) in the Azure DevOps documentation. | DevOps engineer, ML engineer | 
| Create a Model Deploy repository. | Create a Git repository in Azure that stores standard deployment code and templates. It should include code for every deployment option that the organization uses, as identified in the design phase. For example, it should include real-time endpoints, asynchronous endpoints, serverless inference, or batch transforms. For instructions, see [Set up a Git repository](https://learn.microsoft.com/en-us/devops/develop/git/set-up-a-git-repository) in the Azure DevOps documentation. | DevOps engineer, ML engineer | 
| Create an Amazon ECR repository. | Set up an Amazon ECR repository that stores the approved ML environments as Docker images. Allow data scientists and ML engineers to define new environments. For instructions, see [Creating a private repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) in the Amazon ECR documentation. | ML engineer | 
| Set up SageMaker AI Studio. | Set up SageMaker AI Studio on the development account according to the previously defined security requirements, preferred data science tools (such as MLflow), and preferred integrated development environment (IDE). Use lifecycle configurations to automate the installation of key functionality and create a uniform development environment for data scientists. For more information, see [Amazon SageMaker AI Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated.html) and [MLflow tracking server](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html) in the SageMaker AI documentation. | Data scientist, ML engineer, Prompt engineer | 

### Integrate CI/CD pipelines
<a name="integrate-ci-cd-pipelines"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create an Automation account. | Create an AWS account where automated pipelines and jobs run. You can give data science teams read access to this account. For instructions, see [Creating a member account in your organization](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_create.html) in the AWS Organizations documentation. | AWS administrator | 
| Set up a model registry. | Set up SageMaker AI Model Registry in the Automation account. This registry stores the metadata for ML models and helps certain data scientists or team leads to approve or reject models. For more information, see [Register and deploy models with Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) in the SageMaker AI documentation. | ML engineer | 
| Create a Model Build pipeline. | Create a CI/CD pipeline in Azure that starts manually or automatically when code is pushed to the Model Build repository. The pipeline should check out the source code and create or update a SageMaker AI pipeline in the Automation account. The pipeline should add a new model to the model registry. For more information about creating a pipeline, see the [Azure Pipelines documentation](https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/what-is-azure-pipelines). | DevOps engineer, ML engineer | 
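The final step of the Model Build pipeline, adding a new model version to the registry, can be sketched as follows. The request shape follows the SageMaker AI `create_model_package` API; the model package group name, container image URI, and model artifact path are hypothetical examples.

```python
# Request for registering a new model version, pending manual review.
request = {
    "ModelPackageGroupName": "example-model-group",
    "ModelPackageDescription": "Model version produced by the Model Build pipeline",
    "ModelApprovalStatus": "PendingManualApproval",  # reviewed later in the registry
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-ml-environments:latest",
                "ModelDataUrl": "s3://example-mlops-model-artifacts/model.tar.gz",
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
}
```

You would submit this with `sagemaker_client.create_model_package(**request)`; approving the resulting model package in the registry is what later dispatches the EventBridge event that starts the Model Deploy pipeline.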

### Build the deployment stack
<a name="build-the-deployment-stack"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create AWS staging and deployment accounts. | Create AWS accounts for staging and deployment of ML models. These accounts should be identical to allow for accurate testing of the models in staging before moving to production. You can give data science teams read access to the staging account. For instructions, see [Creating a member account in your organization](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_create.html) in the AWS Organizations documentation. | AWS administrator | 
| Set up S3 buckets for model monitoring. | Complete this step if you want to enable model monitoring for the deployed models that are created by the Model Deploy pipeline. Create Amazon S3 buckets for storing the input and output data. For more information about creating S3 buckets, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the Amazon S3 documentation. Set up cross-account permissions so that the automated model monitoring jobs run in the Automation account. For more information, see [Monitor data and model quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) in the SageMaker AI documentation. | ML engineer | 
| Create a Model Deploy pipeline. | Create a CI/CD pipeline in Azure that starts when a model is approved in the model registry. The pipeline should check out the source code and model artifact, build the infrastructure templates for deploying the model in the staging and production accounts, deploy the model in the staging account, run automated tests, wait for manual approval, and deploy the approved model into the production account. For more information about creating a pipeline, see the [Azure Pipelines documentation](https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/what-is-azure-pipelines). | DevOps engineer, ML engineer | 

### (Optional) Automate ML environment infrastructure
<a name="optional-automate-ml-environment-infrastructure"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Build AWS CDK or CloudFormation templates. | Define AWS Cloud Development Kit (AWS CDK) or AWS CloudFormation templates for all environments that need to be deployed automatically. This might include the development environment, automation environment, and staging and deployment environments. For more information, see the [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/home.html) and [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) documentation. | AWS DevOps | 
| Create an Infrastructure pipeline. | Create a CI/CD pipeline in Azure for infrastructure deployment. An administrator can initiate this pipeline to create new AWS accounts and set up the environments that the ML team requires. | DevOps engineer | 

## Troubleshooting
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| **Insufficient monitoring and drift detection** – Inadequate monitoring can lead to missed detection of model performance issues or data drift. | Strengthen monitoring frameworks with tools such as Amazon CloudWatch, SageMaker AI Model Monitor, and SageMaker AI Clarify. Configure alerts for immediate action on identified issues. | 
| **CI pipeline trigger errors** – The CI pipeline in Azure DevOps might not be triggered upon code merge due to misconfiguration. | Check the Azure DevOps project settings to ensure that the pipeline triggers and webhooks are properly configured for the Model Build repository and target branch. | 
| **Governance** – The central Automation account might not enforce best practices across ML platforms, leading to inconsistent workflows. | Audit the Automation account settings, ensuring that all ML environments and models conform to predefined best practices and policies. | 
| **Model registry approval delays** – This happens when there's a delay in checking and approving the model, either because people take time to review it or because of technical issues. | Implement a notification system to alert stakeholders of models that are pending approval, and streamline the review process. | 
| **Model deployment event failures** – Events dispatched to start model deployment pipelines might fail, causing deployment delays. | Confirm that Amazon EventBridge has the correct permissions and event patterns to invoke Azure DevOps pipelines successfully. | 
| **Production deployment bottlenecks** – Manual approval processes can create bottlenecks, delaying the production deployment of models. | Optimize the approval workflow within the Model Deploy pipeline, promoting timely reviews and clear communication channels. | 

## Related resources
<a name="build-an-mlops-workflow-by-using-amazon-sagemaker-and-azure-devops-resources"></a>

**AWS documentation**
+ [Amazon SageMaker AI documentation](https://docs.aws.amazon.com/sagemaker/)
+ [Machine Learning Lens](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/machine-learning-lens.html) (AWS Well Architected Framework)
+ [Planning for successful MLOps](https://docs.aws.amazon.com/prescriptive-guidance/latest/ml-operations-planning/welcome.html) (AWS Prescriptive Guidance)

**Other AWS resources**
+ [MLOps foundation roadmap for enterprises with Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/mlops-foundation-roadmap-for-enterprises-with-amazon-sagemaker/) (AWS blog post)
+ [AWS Summit ANZ 2022 - End-to-end MLOps for architects](https://www.youtube.com/watch?v=UnAN35gu3Rw) (YouTube video)
+ [FMOps/LLMOps: Operationalize generative AI and differences with MLOps](https://aws.amazon.com/blogs/machine-learning/fmops-llmops-operationalize-generative-ai-and-differences-with-mlops/) (AWS blog post)
+ [Operationalize LLM Evaluation at Scale using Amazon SageMaker AI Clarify and MLOps services](https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services/) (AWS blog post)
+ [The role of vector databases in generative AI applications](https://aws.amazon.com/blogs/database/the-role-of-vector-datastores-in-generative-ai-applications/) (AWS blog post)

**Azure documentation**
+ [Azure DevOps documentation](https://learn.microsoft.com/en-us/azure/devops/user-guide/what-is-azure-devops)
+ [Azure Pipelines documentation](https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/what-is-azure-pipelines)

# Configure model invocation logging in Amazon Bedrock by using AWS CloudFormation
<a name="configure-bedrock-invocation-logging-cloudformation"></a>

*Vikramaditya Bhatnagar, Amazon Web Services*

## Summary
<a name="configure-bedrock-invocation-logging-cloudformation-summary"></a>

You can configure Amazon Bedrock to collect invocation logs, model input data, and model output data for all model invocations in your AWS account. This is a [best practice](https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-2/) for building robust generative AI applications with Amazon Bedrock. You can store model invocation logs in an Amazon CloudWatch Logs log group, in an Amazon Simple Storage Service (Amazon S3) bucket, or in both. Having log data in CloudWatch Logs helps you create custom metric filters, alarms, and dashboards. Amazon S3 is ideal for replicating data across AWS Regions or for long-term storage, as governed by your organization's policies.

This pattern provides a sample AWS CloudFormation template that uses an infrastructure as code (IaC) approach to configure model invocation logging for Amazon Bedrock. The template configures log storage in both CloudWatch Logs and Amazon S3.
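The configuration that the template applies can also be expressed directly against the Amazon Bedrock API, which is useful for understanding what the template's Lambda function does. The following is a minimal sketch whose dict shape follows the `PutModelInvocationLoggingConfiguration` API; the log group, bucket, and role names are hypothetical.

```python
# Logging configuration for Amazon Bedrock model invocations, writing
# to both CloudWatch Logs and Amazon S3 (names are example values).
logging_config = {
    "cloudWatchConfig": {
        "logGroupName": "/example/bedrock/invocation-logs",
        "roleArn": "arn:aws:iam::123456789012:role/example-bedrock-logging-role",
    },
    "s3Config": {
        "bucketName": "example-bedrock-invocation-logs",
        "keyPrefix": "invocations",
    },
    "textDataDeliveryEnabled": True,
    "imageDataDeliveryEnabled": True,
    "embeddingDataDeliveryEnabled": True,
}
```

You would apply this with `bedrock_client.put_model_invocation_logging_configuration(loggingConfig=logging_config)`; in this pattern, the CloudFormation template performs the equivalent call through its Lambda-backed custom resource.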

## Prerequisites and limitations
<a name="configure-bedrock-invocation-logging-cloudformation-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ The following permissions:
  + [Permissions](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-iam-template.html) to create CloudFormation stacks
  + [Permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/security-iam-awsmanpol.html#security-iam-awsmanpol-updates) to access Amazon Bedrock
  + [Permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-policies-s3.html) to create and access Amazon S3 buckets
  + [Permissions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/iam-identity-based-access-control-cwl.html) to create and access CloudWatch Logs log groups
  + [Permissions](https://docs.aws.amazon.com/lambda/latest/dg/security-iam-awsmanpol.html) to create and access AWS Lambda functions
  + [Permissions](https://docs.aws.amazon.com/kms/latest/developerguide/customer-managed-policies.html) to create and access AWS Key Management Service (AWS KMS) keys

**Limitations**

This pattern logs model invocations to both CloudWatch Logs and Amazon S3. It does not support choosing only one of these two services.

## Architecture
<a name="configure-bedrock-invocation-logging-cloudformation-architecture"></a>

**Target architecture**

The CloudFormation template provisions the following resources in your target AWS account:
+ A CloudWatch Logs log group for storing model invocation logs
+ An Amazon S3 bucket for storing model invocation logs and a corresponding bucket policy
+ An Amazon S3 bucket for storing server-side access logs and a corresponding bucket policy
+ An AWS Lambda function that configures logging settings in Amazon Bedrock
+ An AWS KMS key and a corresponding key alias
+ An AWS Identity and Access Management (IAM) service role for Amazon Bedrock
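
The Lambda function in this list applies the logging settings by calling the Amazon Bedrock control-plane API. The following sketch shows what that call looks like in Python with boto3; the log group name, role ARN, and bucket name are illustrative placeholders, not the values that the template generates:

```python
def build_logging_config(log_group_name, role_arn, bucket_name, key_prefix="bedrock-logs"):
    """Assemble the loggingConfig payload for Bedrock model invocation logging.

    Resource names are illustrative placeholders; in this pattern, the
    CloudFormation template passes in the resources that it creates."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group_name,
            # Service role that Amazon Bedrock assumes to write to CloudWatch Logs
            "roleArn": role_arn,
        },
        "s3Config": {
            "bucketName": bucket_name,
            "keyPrefix": key_prefix,
        },
        # Deliver text, image, and embedding data along with the invocation logs
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }


if __name__ == "__main__":
    import boto3

    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(
            log_group_name="/bedrock/invocation-logs",
            role_arn="arn:aws:iam::111122223333:role/BedrockLoggingRole",
            bucket_name="amzn-s3-demo-logging-bucket",
        )
    )
```

Because the configuration is account-wide per Region, running this call (or deploying the stack) replaces any existing model invocation logging settings in that Region.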

The following diagram shows how invocation logs are stored after you deploy the CloudFormation stack associated with this pattern. Amazon Bedrock publishes log data when the foundation model delivers text, an image, a video, or embedding data. As shown in the diagram, the Amazon S3 buckets and the CloudWatch Logs log group are encrypted with an AWS KMS key.

![\[Workflow for logging invocations of an Amazon Bedrock foundation model.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/a55e7495-ec84-4d41-886e-5c37b37aac67/images/a958d52f-9072-40af-80cb-360f6c1c7fd5.png)


The diagram shows the following workflow:

1. A user submits a query to a foundation model in Amazon Bedrock.

1. Amazon Bedrock assumes the IAM service role.

1. Amazon Bedrock generates log data and stores it in a CloudWatch Logs log group and in an Amazon S3 bucket.

1. If a user reads, uploads, or deletes any files in the Amazon S3 bucket that contains the model invocation logs, those activities are logged in another Amazon S3 bucket for server-side access logs.

**Automation and scale**

To scale this solution, you can deploy the CloudFormation template as a stack set to multiple AWS Regions and AWS accounts. For more information, see [Managing stacks across accounts and Regions with StackSets](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/what-is-cfnstacksets.html) in the CloudFormation documentation.
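
As a sketch of that multi-account rollout, the stack set can also be created with boto3 (the stack set name, account IDs, and Regions below are placeholders):

```python
def stack_set_request(stack_set_name, template_body):
    """Request payload for CreateStackSet. CAPABILITY_NAMED_IAM is required
    because the template creates an IAM service role."""
    return {
        "StackSetName": stack_set_name,
        "TemplateBody": template_body,
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def stack_instances_request(stack_set_name, accounts, regions):
    """Request payload for CreateStackInstances in the target accounts and Regions."""
    return {
        "StackSetName": stack_set_name,
        "Accounts": accounts,
        "Regions": regions,
    }


if __name__ == "__main__":
    import boto3

    cfn = boto3.client("cloudformation")
    with open("enable-bedrock-logging-using-cloudformation.yaml") as f:
        template = f.read()

    # Create the stack set, then deploy stack instances to each account/Region pair
    cfn.create_stack_set(**stack_set_request("bedrock-invocation-logging", template))
    cfn.create_stack_instances(
        **stack_instances_request(
            "bedrock-invocation-logging",
            accounts=["111122223333", "444455556666"],
            regions=["us-east-1", "eu-west-1"],
        )
    )
```

This sketch assumes self-managed stack set permissions; for deployments across an AWS Organizations structure, use service-managed permissions and deployment targets instead.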

## Tools
<a name="configure-bedrock-invocation-logging-cloudformation-tools"></a>

**AWS services**
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies and Amazon available for your use through a unified API.
+ [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and AWS Regions.
+ [Amazon CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) helps you centralize the logs from all of your systems, applications, and AWS services so you can monitor them and archive them securely.
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [AWS Key Management Service (AWS KMS)](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) helps you create and control cryptographic keys to help protect your data.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

**Other tools**
+ [Git](https://git-scm.com/docs) is an open source, distributed version control system.

**Code repository**

The code for this pattern is available in the GitHub [enable-bedrock-logging-using-cloudformation](https://github.com/aws-samples/enable-bedrock-logging-using-cloudformation) repository.

## Epics
<a name="configure-bedrock-invocation-logging-cloudformation-epics"></a>

### Create the CloudFormation stack
<a name="create-the-cfnshort-stack"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Download the CloudFormation template. | Download the [CloudFormation template](https://github.com/aws-samples/enable-bedrock-logging-using-cloudformation/blob/main/enable-bedrock-logging-using-cloudformation.yaml) from the GitHub repository. | Cloud architect | 
| Deploy the template. | Create a stack in your target account and Region. In the **Parameters** section, specify values for the parameters that are defined in the template. For instructions, see [Creating a stack](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) in the CloudFormation documentation. | Cloud architect | 

### Test the solution
<a name="test-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Enable model access. | In Amazon Bedrock, add access to the foundation model. For instructions, see [Add or remove access to Amazon Bedrock foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) in the Amazon Bedrock documentation. | Cloud architect | 
| Run a sample prompt. | In Amazon Bedrock playgrounds, run a sample prompt. For instructions, see [Generate responses in the console using playgrounds](https://docs.aws.amazon.com/bedrock/latest/userguide/playgrounds.html) in the Amazon Bedrock documentation. | Cloud architect | 
| Review the logging configuration. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/configure-bedrock-invocation-logging-cloudformation.html) | Cloud architect | 
| Review the Amazon S3 bucket. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/configure-bedrock-invocation-logging-cloudformation.html) | Cloud architect | 
| Review the log group. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/configure-bedrock-invocation-logging-cloudformation.html) | Cloud architect | 
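
To check the result programmatically instead of in the console, you can read the settings back with the `GetModelInvocationLoggingConfiguration` API. A minimal sketch (the parsing assumes the same `loggingConfig` shape that the stack writes):

```python
def summarize_logging_config(response):
    """Flatten a GetModelInvocationLoggingConfiguration response into the
    fields that this pattern configures."""
    cfg = response.get("loggingConfig", {})
    return {
        "log_group": cfg.get("cloudWatchConfig", {}).get("logGroupName"),
        "s3_bucket": cfg.get("s3Config", {}).get("bucketName"),
        "text_logging": cfg.get("textDataDeliveryEnabled", False),
    }


if __name__ == "__main__":
    import boto3

    bedrock = boto3.client("bedrock")
    response = bedrock.get_model_invocation_logging_configuration()
    print(summarize_logging_config(response))
```

If logging was configured successfully, the summary shows the log group and bucket that the CloudFormation stack created.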

## Related resources
<a name="configure-bedrock-invocation-logging-cloudformation-resources"></a>

**AWS documentation**
+ [Accessing an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html) (Amazon S3 documentation)
+ [Creating and managing stacks](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/stacks.html) (CloudFormation documentation)
+ [Monitor model invocation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html) (Amazon Bedrock documentation)
+ [Working with log groups and log streams](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html) (CloudWatch Logs documentation)

**AWS blog posts**
+ [Monitoring Generative AI applications using Amazon Bedrock and Amazon CloudWatch integration](https://aws.amazon.com/blogs/mt/monitoring-generative-ai-applications-using-amazon-bedrock-and-amazon-cloudwatch-integration/)
+ [Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1](https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-1/)
+ [Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2](https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-2/)

# Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions"></a>

*Julia Bluszcz, Aubrey Oosthuizen, Mohan Gowda Purushothama, Neha Sharma, and Mateusz Zaremba, Amazon Web Services*

## Summary
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-summary"></a>

This pattern shows how to create a Docker container image for [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) and use it for model training in [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html). By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.

In the example [SageMaker notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html) provided, the custom Docker container image is stored in [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html). Step Functions then uses the container that’s stored in Amazon ECR to run a Python processing script for SageMaker. Finally, the container exports the model to [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html).

## Prerequisites and limitations
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ An [AWS Identity and Access Management (IAM) role for SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with Amazon S3 permissions
+ An [IAM role for Step Functions](https://sagemaker-examples.readthedocs.io/en/latest/step-functions-data-science-sdk/step_functions_mlworkflow_processing/step_functions_mlworkflow_scikit_learn_data_processing_and_model_evaluation.html#Create-an-Execution-Role-for-Step-Functions)
+ Familiarity with Python
+ Familiarity with the Amazon SageMaker Python SDK
+ Familiarity with the AWS Command Line Interface (AWS CLI)
+ Familiarity with AWS SDK for Python (Boto3)
+ Familiarity with Amazon ECR
+ Familiarity with Docker

**Product versions**
+ AWS Step Functions Data Science SDK version 2.3.0
+ Amazon SageMaker Python SDK version 2.78.0

## Architecture
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-architecture"></a>

The following diagram shows an example workflow for creating a Docker container image for SageMaker and then using it for model training in Step Functions:

![\[Workflow to create Docker container image for SageMaker to use as a Step Functions training model.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/7857d57f-3077-4b06-8971-fb5846387693/images/37755e38-0bc4-4dd0-90c7-135d95b00053.png)


The diagram shows the following workflow:

1. A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.

1. A data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that’s in a private registry.

1. A data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.

**Automation and scale**

The example SageMaker notebook in this pattern uses an `ml.m5.xlarge` notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

## Tools
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-tools"></a>
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable.
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) is an open source library for training and deploying machine-learning models on SageMaker.
+ [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.
+ [AWS Step Functions Data Science Python SDK](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/index.html) is an open source library that helps you create Step Functions workflows that process and publish machine learning models.

## Epics
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-epics"></a>

### Create a custom Docker container image and store it in Amazon ECR
<a name="create-a-custom-docker-container-image-and-store-it-in-amazon-ecr"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up Amazon ECR. | If you haven’t already, set up Amazon ECR by following the instructions in [Setting up with Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/get-set-up-for-amazon-ecr.html) in the *Amazon ECR User Guide*. Each AWS account is provided with a default private Amazon ECR registry. | DevOps engineer | 
| Create an Amazon ECR private repository. | Follow the instructions in [Creating a private repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) in the *Amazon ECR User Guide*. The repository that you create is where you’ll store your custom Docker container images. | DevOps engineer | 
| Create a Dockerfile that includes the specifications needed to run your SageMaker processing job.  | Create a Dockerfile that includes the specifications needed to run your SageMaker processing job. For instructions, see [Adapting your own training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html) in the *Amazon SageMaker Developer Guide*. For more information about Dockerfiles, see the [Dockerfile reference](https://docs.docker.com/engine/reference/builder/) in the Docker documentation. **Example Jupyter notebook code cells to create a Dockerfile** *Cell 1*<pre># Make docker folder<br />!mkdir -p docker</pre>*Cell 2*<pre>%%writefile docker/Dockerfile<br /><br />FROM python:3.7-slim-buster<br /><br />RUN pip3 install pandas==0.25.3 scikit-learn==0.21.3<br />ENV PYTHONUNBUFFERED=TRUE<br /><br />ENTRYPOINT ["python3"]</pre> | DevOps engineer | 
| Build your Docker container image and push it to Amazon ECR. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions.html) For more information, see [Building and registering the container](https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.html#Building-and-registering-the-container) in *Building your own algorithm container* on GitHub. **Example Jupyter notebook code cells to build and register a Docker image** Before running the following cells, make sure that you’ve created a Dockerfile and stored it in the directory called `docker`. Also, make sure that you’ve created an Amazon ECR repository, and that you replace the `ecr_repository` value in the first cell with your repository’s name.*Cell 1*<pre>import boto3<br />tag = ':latest'<br />account_id = boto3.client('sts').get_caller_identity().get('Account')<br />region = boto3.Session().region_name<br />ecr_repository = 'byoc'<br /><br />image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)</pre>*Cell 2*<pre># Build docker image<br />!docker build -t $image_uri docker</pre>*Cell 3*<pre># Authenticate to ECR<br />!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com</pre>*Cell 4*<pre># Push docker image<br />!docker push $image_uri</pre>You must [authenticate your Docker client to your private registry](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html) so that you can use the `docker push` and `docker pull` commands. These commands push and pull images to and from the repositories in your registry. | DevOps engineer | 

### Create a Step Functions workflow that uses your custom Docker container image
<a name="create-a-step-functions-workflow-that-uses-your-custom-docker-container-image"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a Python script that includes your custom processing and model training logic. | Write custom processing logic to run in your data processing script. Then, save it as a Python script named `training.py`. For more information, see [Bring your own model with SageMaker Script Mode](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/sagemaker-script-mode.html) on GitHub. **Example Python script that includes custom processing and model training logic**<pre>%%writefile training.py<br />from numpy import empty<br />import pandas as pd<br />import os<br />from sklearn import datasets, svm<br />from joblib import dump, load<br /><br /><br />if __name__ == '__main__':<br />    digits = datasets.load_digits()<br />    #create classifier object<br />    clf = svm.SVC(gamma=0.001, C=100.)<br />    <br />    #fit the model<br />    clf.fit(digits.data[:-1], digits.target[:-1])<br />    <br />    #model output in binary format<br />    output_path = os.path.join('/opt/ml/processing/model', "model.joblib")<br />    dump(clf, output_path)</pre> | Data scientist | 
| Create a Step Functions workflow that includes your SageMaker Processing job as one of the steps.  | Install and import the [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/readmelink.html) and upload the **training.py** file to Amazon S3. Then, use the [Amazon SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to define a processing step in Step Functions. Make sure that you’ve [created an IAM execution role for Step Functions](https://sagemaker-examples.readthedocs.io/en/latest/step-functions-data-science-sdk/step_functions_mlworkflow_processing/step_functions_mlworkflow_scikit_learn_data_processing_and_model_evaluation.html#Create-an-Execution-Role-for-Step-Functions) in your AWS account. **Example environment setup and custom training script to upload to Amazon S3**<pre>!pip install stepfunctions<br /><br />import boto3<br />import stepfunctions<br />import sagemaker<br />import datetime<br /><br />from stepfunctions import steps<br />from stepfunctions.inputs import ExecutionInput<br />from stepfunctions.steps import (<br />    Chain<br />)<br />from stepfunctions.workflow import Workflow<br />from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput<br /><br />sagemaker_session = sagemaker.Session()<br />bucket = sagemaker_session.default_bucket() <br />role = sagemaker.get_execution_role()<br />prefix = 'byoc-training-model'<br />account_id = boto3.client('sts').get_caller_identity().get('Account')<br /><br /># See prerequisites section to create this role<br />workflow_execution_role = f"arn:aws:iam::{account_id}:role/AmazonSageMaker-StepFunctionsWorkflowExecutionRole"<br /><br />execution_input = ExecutionInput(<br />    schema={<br />        "PreprocessingJobName": str})<br /><br /><br />input_code = sagemaker_session.upload_data(<br />    "training.py",<br />    bucket=bucket,<br />    key_prefix="preprocessing.py",<br />)</pre>**Example SageMaker processing step definition that uses a custom Amazon ECR image and Python script** Make sure that you use the `execution_input` parameter to specify the job name. The parameter’s value must be unique each time the job runs. Also, the **training.py** file’s code is passed as an `input` parameter to the `ProcessingStep`, which means that it will be copied inside the container. The destination for the `ProcessingInput` code is the same as the second argument inside the `container_entrypoint`.<pre>script_processor = ScriptProcessor(command=['python3'],<br />                image_uri=image_uri,<br />                role=role,<br />                instance_count=1,<br />                instance_type='ml.m5.xlarge')<br /><br /><br />processing_step = steps.ProcessingStep(<br />    "training-step",<br />    processor=script_processor,<br />    job_name=execution_input["PreprocessingJobName"],<br />    inputs=[<br />        ProcessingInput(<br />            source=input_code,<br />            destination="/opt/ml/processing/input/code",<br />            input_name="code",<br />        ),<br />    ],<br />    outputs=[<br />        ProcessingOutput(<br />            source='/opt/ml/processing/model', <br />            destination="s3://{}/{}".format(bucket, prefix), <br />            output_name='byoc-example')<br />    ],<br />    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],<br />)</pre>**Example Step Functions workflow that runs a SageMaker processing job** This example workflow includes the SageMaker processing job step only, not a complete Step Functions workflow. For a full example workflow, see [Example notebooks in SageMaker](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/readmelink.html#example-notebooks-in-sagemaker) in the AWS Step Functions Data Science SDK documentation.<pre>workflow_graph = Chain([processing_step])<br /><br />workflow = Workflow(<br />    name="ProcessingWorkflow",<br />    definition=workflow_graph,<br />    role=workflow_execution_role<br />)<br /><br />workflow.create()<br /># Execute workflow<br />execution = workflow.execute(<br />    inputs={<br />        "PreprocessingJobName": str(datetime.datetime.now().strftime("%Y%m%d%H%M-%SS")),  # Each pre processing job (SageMaker processing job) requires a unique name,<br />    }<br />)<br />execution_output = execution.get_output(wait=True)</pre> | Data scientist | 

## Related resources
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-resources"></a>
+ [Process data](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) (*Amazon SageMaker Developer Guide*)
+ [Adapting your own training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html) (*Amazon SageMaker Developer Guide*)

# Use Amazon Bedrock agents to automate creation of access entry controls in Amazon EKS through text-based prompts
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks"></a>

*Keshav Ganesh and Sudhanshu Saurav, Amazon Web Services*

## Summary
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-summary"></a>

Organizations face challenges in managing access controls and resource provisioning when multiple teams need to work with a shared Amazon Elastic Kubernetes Service (Amazon EKS) cluster. A managed Kubernetes service such as Amazon EKS has simplified cluster operations. However, the administrative overhead of managing team access and resource permissions remains complex and time-consuming. 

This pattern shows how Amazon Bedrock agents can help you automate Amazon EKS cluster access management. This automation allows development teams to focus on their core application development rather than dealing with access control setup and management. You can customize an Amazon Bedrock agent to perform actions for a wide variety of tasks through simple natural language prompts.

By using AWS Lambda functions as action groups, an Amazon Bedrock agent can handle tasks such as creating user access entries and managing access policies. In addition, an Amazon Bedrock agent can configure pod identity associations that allow access to AWS Identity and Access Management (IAM) resources for the pods running in the cluster. Using this solution, organizations can streamline their Amazon EKS cluster administration with simple text-based prompts, reduce manual overhead, and improve overall development efficiency.

## Prerequisites and limitations
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-prereqs"></a>

**Prerequisites**
+ An active AWS account.
+ Established IAM [roles and permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html) for the deployment process. This includes permissions to access Amazon Bedrock foundation models (FMs), create Lambda functions, and create any other required resources across the target AWS accounts.
+ [Access enabled](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) in the active AWS account to these Amazon Bedrock FMs: Amazon Titan Text Embeddings V2 and Anthropic Claude 3 Haiku.
+ AWS Command Line Interface (AWS CLI) version 2.9.11 or later, [installed](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) and [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
+ eksctl 0.194.0 or later, [installed](https://eksctl.io/installation/).

**Limitations**
+ Training and documentation might be required to help ensure smooth adoption and effective use of these techniques. Using Amazon Bedrock, Amazon EKS, Lambda, Amazon OpenSearch Service, and [OpenAPI](https://www.openapis.org/what-is-openapi) involves a significant learning curve for developers and DevOps teams.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html), and choose the link for the service.

## Architecture
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-architecture"></a>

The following diagram shows the workflow and architecture components for this pattern.

![\[Workflow and components to create access controls in Amazon EKS with Amazon Bedrock agents.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/2c52b1ba-bbad-4a46-ab1e-10e69a0a66e7/images/c7981a86-f734-4c07-a2f7-63ad38b66ab6.png)


This solution performs the following steps:

1. The user interacts with the Amazon Bedrock agent by submitting a prompt or query that serves as input for the agent to process and take action.

1. Based on the prompt, the Amazon Bedrock agent checks the OpenAPI schema to identify the correct API operation to call. If the agent finds a match, it routes the request to the associated action group, which invokes the Lambda function that implements the action.

1. If a relevant API isn’t found, the Amazon Bedrock agent queries the OpenSearch collection. The OpenSearch collection uses indexed knowledge base content that is sourced from the Amazon S3 bucket that contains the *Amazon EKS User Guide*.

1. The OpenSearch collection returns relevant contextual information to the Amazon Bedrock agent.

1. For actionable requests (those that match an API operation), the Amazon Bedrock agent invokes the Lambda function, which runs within a virtual private cloud (VPC).

1. The Lambda function performs an action that’s based on the user’s input inside the Amazon EKS cluster.

1. The Amazon S3 bucket for the Lambda code stores the artifact that has the code and logic written for the Lambda function.
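
As a sketch of steps 2, 5, and 6 above, the action-group Lambda function receives an event that carries the matched API path and parameters, performs the Amazon EKS action, and returns a response envelope to the agent. The event and response shapes below follow the Bedrock agent action-group contract, but the `/create-access-entry` path and parameter names are illustrative; your OpenAPI schema defines the real ones:

```python
import json


def get_param(event, name):
    """Pull a named parameter out of a Bedrock agent action-group event."""
    for param in event.get("parameters", []):
        if param.get("name") == name:
            return param.get("value")
    return None


def build_agent_response(event, status_code, body):
    """Wrap a result in the response envelope that the agent expects back."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": status_code,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


def lambda_handler(event, context):
    if event.get("apiPath") == "/create-access-entry":
        import boto3

        # Create an EKS access entry for the IAM principal named in the prompt
        eks = boto3.client("eks")
        eks.create_access_entry(
            clusterName=get_param(event, "clusterName"),
            principalArn=get_param(event, "principalArn"),
        )
        return build_agent_response(event, 200, {"message": "access entry created"})
    return build_agent_response(event, 404, {"message": "unknown API path"})
```

The Lambda execution role needs the matching `eks:CreateAccessEntry` permission on the target cluster for the boto3 call to succeed.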

## Tools
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-tools"></a>

**AWS services**
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
+ [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and AWS Regions.
+ [Amazon Elastic Kubernetes Service (Amazon EKS)](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html) helps you run Kubernetes on AWS without needing to install or maintain your own Kubernetes control plane or nodes.
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) is a managed service that helps you deploy, operate, and scale OpenSearch clusters in the AWS Cloud. Its collections feature helps you to organize your data and build comprehensive knowledge bases that AI assistants such as Amazon Bedrock agents can use.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Other tools**
+ [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) is a command-line utility for creating and managing Kubernetes clusters on Amazon EKS.

**Code repository**

The code for this pattern is available in the GitHub [eks-access-controls-bedrock-agent](https://github.com/aws-samples/eks-access-controls-bedrock-agent.git) repository.

## Best practices
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-best-practices"></a>
+ Maintain the highest possible security when implementing this pattern. Make sure that the Amazon EKS cluster is private, that access permissions are limited, and that all resources are inside a virtual private cloud (VPC). For additional information, see [Best practices for security](https://docs.aws.amazon.com/eks/latest/best-practices/security.html) in the Amazon EKS documentation.
+ Use AWS KMS [customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html) wherever possible, and grant limited access permissions to them.
+ Follow the principle of least privilege and grant the minimum permissions required to perform a task. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#grant-least-priv) and [Security best practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the IAM documentation.

## Epics
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-epics"></a>

### Set up the environment
<a name="set-up-the-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the repository. | To clone this pattern’s repository, run the following command in your local workstation:<pre>git clone https://github.com/aws-samples/eks-access-controls-bedrock-agent.git</pre> | AWS DevOps | 
| Get the AWS account ID. | To get the AWS account ID, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)This command stores your AWS account ID in the `AWS_ACCOUNT` variable. | AWS DevOps | 
| Create the S3 bucket for Lambda code. | To implement this solution, you must create three Amazon S3 buckets that serve different purposes, as shown in the [architecture](#using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-architecture) diagram. The S3 buckets store the Lambda code, the knowledge base, and the OpenAPI schema. To create the Lambda code bucket, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)The package command creates a new CloudFormation template (`eks-access-controls-template.yaml`) that contains the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 
| Create the S3 bucket for the knowledge base. | To create the Amazon S3 bucket for the knowledge base, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 
| Create the S3 bucket for the OpenAPI schema. | To create the Amazon S3 bucket for the OpenAPI schema, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 
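
The account-ID step above can be sketched with the AWS SDK for Python (Boto3). This is an illustrative equivalent of the CLI step, with a hypothetical placeholder fallback for environments where no AWS credentials are available:

```python
def get_account_id() -> str:
    """Return the current AWS account ID, mirroring the AWS_ACCOUNT variable
    set in the epic above. Falls back to an illustrative placeholder when
    boto3 or credentials aren't available."""
    try:
        import boto3
        # STS returns the account of the caller's credentials.
        return boto3.client("sts").get_caller_identity()["Account"]
    except Exception:
        return "123456789012"  # placeholder account ID (illustrative only)

account_id = get_account_id()
print(account_id)
```

The account ID can then be used to construct unique, account-scoped names for the three S3 buckets.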

### Deploy the CloudFormation stack
<a name="deploy-the-cfnshort-stack"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Deploy the CloudFormation stack. | To deploy the CloudFormation stack, use the CloudFormation template file `eks-access-controls-template.yaml` that you created earlier. For detailed instructions, see [Create a stack from the CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) in the CloudFormation documentation. Provisioning the OpenSearch index with the CloudFormation template takes about 10 minutes. After the stack is created, make a note of the `VPC_ID` and `PRIVATE_SUBNET_ID` values. | AWS DevOps | 
| Create the Amazon EKS cluster.  | To create the Amazon EKS cluster inside the VPC, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)The expected results are as follows:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 

### Connect the Lambda function and the Amazon EKS cluster
<a name="connect-the-lam-function-and-the-eks-cluster"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a connection between the Amazon EKS cluster and the Lambda function. | To set up network and IAM permissions to allow the Lambda function to communicate with the Amazon EKS cluster, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 

### Test the solution
<a name="test-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Test the Amazon Bedrock agent. | Before testing the Amazon Bedrock agent, make sure that you do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)To access the Amazon Bedrock agent, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html)You can also ask the agent to perform actions for EKS Pod Identity associations. For more details, see [Learn how EKS Pod Identity grants pods access to AWS services](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html) in the Amazon EKS documentation. | AWS DevOps | 

### Clean up
<a name="clean-up"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clean up resources. | To clean up the resources that this pattern created, use the following procedure. Wait for each deletion step to complete before proceeding to the next step.This procedure will permanently delete all resources created by these stacks. Make sure that you've backed up any important data before proceeding.[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks.html) | AWS DevOps | 

## Troubleshooting
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| A non-zero error code is returned during environment setup. | Verify that you’re using the correct folder when running any command to deploy this solution. For more information, see the [FIRST\_DEPLOY.md](https://github.com/aws-samples/eks-access-controls-bedrock-agent/blob/main/FIRST_DEPLOY.md) file in this pattern’s repository. | 
| The Lambda function isn’t able to do the task. | Make sure that connectivity is set up correctly from the Lambda function to the Amazon EKS cluster. | 
| The agent prompts don’t recognize the APIs. | Redeploy the solution. For more information, see the [RE\_DEPLOY.md](https://github.com/aws-samples/eks-access-controls-bedrock-agent/blob/main/RE_DEPLOY.md) file in this pattern’s repository. | 
| The stack fails to delete. | An initial attempt to delete the stack might fail. This failure can occur because of dependency issues with the custom resource that was created for the OpenSearch collection, which performs the indexing for the knowledge base. To delete the stack, retry the delete operation and retain the custom resource. | 
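
For the stack-deletion issue in the last row, the retry can be sketched with Boto3. The `RetainResources` parameter accepts logical resource IDs and is valid only for stacks that are already in the `DELETE_FAILED` state; the stack name and logical ID below are hypothetical:

```python
def retry_delete_stack(stack_name: str, retain_logical_ids: list[str]) -> None:
    """Retry deleting a stack in DELETE_FAILED state while retaining the
    resources whose logical IDs are listed (for example, the custom
    resource for the OpenSearch collection)."""
    import boto3  # imported inside so the sketch can be defined without boto3
    cfn = boto3.client("cloudformation")
    # RetainResources keeps the listed resources instead of deleting them.
    cfn.delete_stack(StackName=stack_name, RetainResources=retain_logical_ids)

# Example call (hypothetical names; not executed here):
# retry_delete_stack("eks-access-controls", ["OpenSearchIndexCustomResource"])
```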

## Related resources
<a name="using-amazon-bedrock-agents-to-automate-creation-of-access-entry-controls-in-amazon-eks-resources"></a>

**AWS Blog**
+ [A deep dive into simplified Amazon EKS access management controls](https://aws.amazon.com/blogs/containers/a-deep-dive-into-simplified-amazon-eks-access-management-controls/) 

**Amazon Bedrock documentation**
+ [Automate tasks in your application using AI agents](https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html) 
+ [How Amazon Bedrock Agents works](https://docs.aws.amazon.com/bedrock/latest/userguide/agents-how.html)
+ [Test and troubleshoot agent behavior](https://docs.aws.amazon.com/bedrock/latest/userguide/agents-test.html)
+ [Use action groups to define actions for your agent to perform](https://docs.aws.amazon.com/bedrock/latest/userguide/agents-action-create.html) 

**Amazon EKS documentation**
+ [Learn how access control works in Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/cluster-auth.html)

# Deploy a RAG use case on AWS by using Terraform and Amazon Bedrock
<a name="deploy-rag-use-case-on-aws"></a>

*Martin Maritsch, Nicolas Jacob Baer, Olivier Brique, Julian Ferdinand Grueber, Alice Morano, and Nicola D'Orazio, Amazon Web Services*

## Summary
<a name="deploy-rag-use-case-on-aws-summary"></a>

AWS provides various options to build your [Retrieval Augmented Generation (RAG)](https://aws.amazon.com/what-is/retrieval-augmented-generation/)-enabled generative AI use cases. This pattern provides a solution for a RAG-based application that uses LangChain and Amazon Aurora PostgreSQL-Compatible as a vector store. You can deploy this solution directly into your AWS account with Terraform and implement the following simple RAG use case:

1. The user manually uploads a file to an Amazon Simple Storage Service (Amazon S3) bucket, such as a Microsoft Excel file or a PDF document. (For more information about supported file types, see the [Unstructured](https://docs.unstructured.io/open-source/core-functionality/partitioning) documentation.)

1. The content of the file is extracted and embedded into a knowledge database that’s based on serverless Aurora PostgreSQL-Compatible, which supports near real-time ingestion of documents into the vector store. This approach enables the RAG model to access and retrieve relevant information for use cases where low latencies matter.

1. When the user interacts with the text generation model, the model enhances the interaction through retrieval augmentation of relevant content from the previously uploaded files.

The pattern uses [Amazon Titan Text Embeddings v2](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) as the embedding model and [Anthropic Claude 3 Sonnet](https://aws.amazon.com/bedrock/claude/) as the text generation model, both available on Amazon Bedrock.

## Prerequisites and limitations
<a name="deploy-rag-use-case-on-aws-prereqs"></a>

**Prerequisites**
+ An active AWS account.
+ AWS Command Line Interface (AWS CLI) installed and configured with your AWS account. For installation instructions, see [Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) in the AWS CLI documentation. To review your AWS credentials and your access to your account, see [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) in the AWS CLI documentation.
+ Model access that’s enabled for the required large language models (LLMs) in the Amazon Bedrock console of your AWS account. This pattern requires the following LLMs:
  + `amazon.titan-embed-text-v2:0`
  + `anthropic.claude-3-sonnet-20240229-v1:0`

**Limitations**
+ This sample architecture doesn't include an interface for programmatic question answering with the vector database. If your use case requires an API, consider adding [Amazon API Gateway](https://docs.aws.amazon.com/apigateway/latest/developerguide) with an AWS Lambda function that runs retrieval and question-answering tasks. 
+ This sample architecture doesn't include monitoring features for the deployed infrastructure. If your use case requires monitoring, consider adding [AWS monitoring services](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/welcome.html).
+ If you upload a lot of documents in a short time frame to the Amazon S3 bucket, the Lambda function might encounter rate limits. As a solution, you can decouple the Lambda function by using an Amazon Simple Queue Service (Amazon SQS) queue, which lets you control the rate of Lambda invocations.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html), and choose the link for the service.

**Product versions**
+ [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) version 2 or later
+ [Docker](https://docs.docker.com/get-started/) version 26.0.0 or later
+ [Poetry](https://pypi.org/project/poetry/) version 1.7.1 or later
+ [Python](https://www.python.org/downloads/) version 3.10 or later
+ [Terraform](https://developer.hashicorp.com/terraform/install) version 1.8.4 or later

## Architecture
<a name="deploy-rag-use-case-on-aws-architecture"></a>

The following diagram shows the workflow and architecture components for this pattern.

![\[Workflow to create a RAG-based application using Aurora PostgreSQL and LLMs on Amazon Bedrock.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/8f184945-7f17-4760-8806-6d0eaeef372a/images/3771b7a0-05bd-4eb3-ad5b-199e22f86184.png)


This diagram illustrates the following:

1. When an object is created in the Amazon S3 bucket `bedrock-rag-template-<account_id>`, an [Amazon S3 notification](https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html) invokes the Lambda function `data-ingestion-processor`.

1. The Lambda function `data-ingestion-processor` is based on a Docker image stored in the Amazon Elastic Container Registry (Amazon ECR) repository `bedrock-rag-template`.

   The function uses the [LangChain S3FileLoader](https://python.langchain.com/v0.1/docs/integrations/document_loaders/aws_s3_file/) to read the file as a [LangChain Document](https://api.python.langchain.com/en/v0.0.339/schema/langchain.schema.document.Document.html). Then, the [LangChain RecursiveCharacterTextSplitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/) chunks each document, given a `CHUNK_SIZE` and a `CHUNK_OVERLAP` that depend on the maximum token size of the Amazon Titan Text Embeddings V2 embedding model. Next, the Lambda function invokes the embedding model on Amazon Bedrock to embed the chunks into numerical vector representations. Lastly, these vectors are stored in the Aurora PostgreSQL database. To access the database, the Lambda function first retrieves the username and password from AWS Secrets Manager.

1. On the Amazon SageMaker AI [notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html) `aws-sample-bedrock-rag-template`, the user can write a question prompt. The code invokes Claude 3 on Amazon Bedrock and adds the knowledge base information to the context of the prompt. As a result, Claude 3 provides responses using the information in the documents.
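
The chunking behavior described in step 2 can be illustrated with a simplified splitter. The real Lambda function uses LangChain's `RecursiveCharacterTextSplitter`, and the sizes below are toy values, not the ones derived from the Titan embedding model's token limit:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk is then passed to the embedding model, and the resulting vectors are stored in the Aurora PostgreSQL database.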

This pattern’s approach to networking and security is as follows:
+ The Lambda function `data-ingestion-processor` is in a private subnet within the virtual private cloud (VPC). The Lambda function isn’t allowed to send traffic to the public internet because of its security group. As a result, the traffic to Amazon S3 and Amazon Bedrock is routed through the VPC endpoints only. Consequently, the traffic doesn’t traverse the public internet, which reduces latency and adds an additional layer of security at the networking level.
+ All the resources and data are encrypted whenever applicable by using the AWS Key Management Service (AWS KMS) key with the alias `aws-sample/bedrock-rag-template`.

**Automation and scale**

This pattern uses Terraform to deploy the infrastructure from the code repository into an AWS account.

## Tools
<a name="deploy-rag-use-case-on-aws-tools"></a>

**AWS services**
+ [Amazon Aurora PostgreSQL-Compatible Edition](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html) is a fully managed, ACID-compliant relational database engine that helps you set up, operate, and scale PostgreSQL deployments. In this pattern, Aurora PostgreSQL-Compatible uses the pgvector plugin as the vector database.
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open source tool that helps you interact with AWS services through commands in your command line shell.
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable. In this pattern, Amazon ECR hosts the Docker image for the `data-ingestion-processor` Lambda function.
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
+ [AWS Key Management Service (AWS KMS)](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) helps you create and control cryptographic keys to help protect your data.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use. In this pattern, Lambda ingests data into the vector store.
+ [Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/?id=docs_gateway) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html) helps you replace hardcoded credentials in your code, including passwords, with an API call to Secrets Manager to retrieve the secret programmatically.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
+ [Amazon Virtual Private Cloud (Amazon VPC)](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html) helps you launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS. The VPC includes subnets and routing tables to control traffic flow.

**Other tools**
+ [Docker](https://docs.docker.com/manuals/) is a set of platform as a service (PaaS) products that use virtualization at the operating-system level to deliver software in containers.
+ [HashiCorp Terraform](https://www.terraform.io/docs) is an infrastructure as code (IaC) tool that helps you use code to provision and manage cloud infrastructure and resources.
+ [Poetry](https://pypi.org/project/poetry/) is a tool for dependency management and packaging in Python.
+ [Python](https://www.python.org/) is a general-purpose computer programming language.

**Code repository**

The code for this pattern is available in the GitHub [terraform-rag-template-using-amazon-bedrock](https://github.com/aws-samples/terraform-rag-template-using-amazon-bedrock) repository.

## Best practices
<a name="deploy-rag-use-case-on-aws-best-practices"></a>
+ Although this code sample can be deployed into any AWS Region, we recommend that you use US East (N. Virginia) – `us-east-1` or US West (N. California) – `us-west-1`. This recommendation is based on the availability of foundation and embedding models in Amazon Bedrock at the time of this pattern’s publication. For an up-to-date list of Amazon Bedrock foundation model support in AWS Regions, see [Model support by AWS Region](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) in the Amazon Bedrock documentation. For information about deploying this code sample to other Regions, see [Additional information](#deploy-rag-use-case-on-aws-additional).
+ This pattern provides a proof-of-concept (PoC) or pilot demo only. If you want to take the code to production, be sure to use the following best practices:
  + Enable server access logging for Amazon S3.
  + Set up [monitoring and alerting](https://docs.aws.amazon.com/lambda/latest/dg/lambda-monitoring.html) for the Lambda function.
  + If your use case requires an API, consider adding Amazon API Gateway with a Lambda function that runs retrieval and question-answering tasks.
+ Follow the principle of least privilege and grant the minimum permissions required to perform a task. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#grant-least-priv) and [Security best practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/IAMBestPracticesAndUseCases.html) in the IAM documentation.

## Epics
<a name="deploy-rag-use-case-on-aws-epics"></a>

### Deploy the solution in an AWS account
<a name="deploy-the-solution-in-an-aws-account"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the repository. | To clone the GitHub repository provided with this pattern, use the following command:<pre>git clone https://github.com/aws-samples/terraform-rag-template-using-amazon-bedrock</pre> | AWS DevOps | 
| Configure the variables. | To configure the parameters for this pattern, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-rag-use-case-on-aws.html) | AWS DevOps | 
| Deploy the solution. | To deploy the solution, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-rag-use-case-on-aws.html)The infrastructure deployment provisions a SageMaker AI notebook instance inside the VPC with permissions to access the Aurora PostgreSQL database. | AWS DevOps | 

### Test the solution
<a name="test-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Run the demo. | After the previous infrastructure deployment has succeeded, use the following steps to run the demo in a Jupyter notebook:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-rag-use-case-on-aws.html)The Jupyter notebook guides you through the following process:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-rag-use-case-on-aws.html) | General AWS | 

### Clean up infrastructure
<a name="clean-up-infrastucture"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clean up the infrastructure. | When the resources are no longer required, remove them by using the following command:<pre>terraform destroy -var-file=commons.tfvars</pre> | AWS DevOps | 

## Related resources
<a name="deploy-rag-use-case-on-aws-resources"></a>

**AWS resources**
+ [Building Lambda functions with Python](https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html)
+ [Inference parameters for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)
+ [Access to Amazon Bedrock foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html)
+ [The role of vector databases in generative AI applications](https://aws.amazon.com/blogs/database/the-role-of-vector-datastores-in-generative-ai-applications/) (AWS Database Blog)
+ [Working with Amazon Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html)

**Other resources**
+ [pgvector documentation](https://github.com/pgvector/pgvector)

## Additional information
<a name="deploy-rag-use-case-on-aws-additional"></a>

**Implementing a vector database**

This pattern uses Aurora PostgreSQL-Compatible to implement a vector database for RAG. As alternatives to Aurora PostgreSQL, AWS provides other capabilities and services for RAG, such as Amazon Bedrock Knowledge Bases and Amazon OpenSearch Service. You can choose the solution that best fits your specific requirements:
+ [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) provides distributed search and analytics engines that you can use to store and query large volumes of data.
+ [Amazon Bedrock Knowledge Bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html) is designed for building and deploying knowledge bases as an additional abstraction to simplify the RAG ingestion and retrieval process. Amazon Bedrock Knowledge Bases can work with both Aurora PostgreSQL and Amazon OpenSearch Service.

**Deploying to other AWS Regions**

As described in [Architecture](#deploy-rag-use-case-on-aws-architecture), we recommend that you use either the Region US East (N. Virginia) – `us-east-1` or US West (N. California) – `us-west-1` to deploy this code sample. However, there are two possible ways to deploy this code sample to Regions other than `us-east-1` and `us-west-1`. You can configure the deployment Region in the `commons.tfvars` file. For cross-Region foundation model access, consider the following options:
+ **Traversing the public internet** – If the traffic can traverse the public internet, add internet gateways to the VPC. Then, adjust the security group assigned to the Lambda function `data-ingestion-processor` and the SageMaker AI notebook instance to allow outbound traffic to the public internet.
+ **Not traversing the public internet** – To deploy this sample to any Region other than `us-east-1` or `us-west-1`, do the following:

1. In either the `us-east-1` or `us-west-1` Region, create an additional VPC including a VPC endpoint for `bedrock-runtime`. 

1. Create a peering connection by using [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) or a [transit gateway](https://docs.aws.amazon.com/vpc/latest/tgw/tgw-peering.html) to the application VPC.

1. When configuring the `bedrock-runtime` boto3 client in any Lambda function outside of `us-east-1` or `us-west-1`, pass the private DNS name of the VPC endpoint for `bedrock-runtime` in `us-east-1` or `us-west-1` as the `endpoint_url` to the boto3 client.
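
The last step can be sketched as follows. The endpoint DNS name is a hypothetical example of the private DNS name that the `bedrock-runtime` VPC endpoint exposes:

```python
# Hypothetical private DNS name of the bedrock-runtime interface VPC endpoint
# created in us-east-1 (step 1 above).
ENDPOINT_URL = "https://vpce-0123456789abcdef0.bedrock-runtime.us-east-1.vpce.amazonaws.com"

def bedrock_runtime_client(region: str = "us-east-1"):
    """Create a bedrock-runtime client that sends requests to the VPC
    endpoint instead of the public service endpoint."""
    import boto3  # imported inside so the sketch can be defined without boto3
    return boto3.client("bedrock-runtime", region_name=region,
                        endpoint_url=ENDPOINT_URL)
```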

# Deploy preprocessing logic into an ML model in a single endpoint using an inference pipeline in Amazon SageMaker
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker"></a>

*Mohan Gowda Purushothama, Gabriel Rodriguez Garcia, and Mateusz Zaremba, Amazon Web Services*

## Summary
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-summary"></a>

This pattern explains how to deploy multiple pipeline model objects in a single endpoint by using an [inference pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html) in Amazon SageMaker. The pipeline model object represents different machine learning (ML) workflow stages, such as preprocessing, model inference, and postprocessing. To illustrate the deployment of serially connected pipeline model objects, this pattern shows you how to deploy a preprocessing [Scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html) container and a regression model based on the [linear learner algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) built into SageMaker. The deployment is hosted behind a single endpoint in SageMaker.

**Note**  
The deployment in this pattern uses the ml.m4.2xlarge instance type. We recommend using an instance type that aligns with your data size requirements and the complexity of your workflow. For more information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/). This pattern uses [prebuilt Docker images for Scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-docker-containers-scikit-learn-spark.html), but you can use your own Docker containers and integrate them into your workflow.
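
The serial connection of the preprocessing container and the linear learner model can be sketched with the SageMaker Python SDK's `PipelineModel`. The model variables, pipeline name, and endpoint name here are illustrative, not values from the pattern:

```python
def deploy_inference_pipeline(role_arn: str, sklearn_model, linear_learner_model):
    """Chain a preprocessing model and a regression model behind one endpoint.
    The two model objects are assumed to be SageMaker Model instances built
    elsewhere (for example, an SKLearnModel and a linear learner Model)."""
    from sagemaker.pipeline import PipelineModel  # SageMaker Python SDK
    pipeline = PipelineModel(
        name="inference-pipeline",                     # illustrative name
        role=role_arn,
        models=[sklearn_model, linear_learner_model],  # invoked in order
    )
    # Hosts all containers behind a single real-time endpoint.
    return pipeline.deploy(initial_instance_count=1,
                           instance_type="ml.m4.2xlarge",
                           endpoint_name="inference-pipeline-endpoint")
```

At inference time, a request to the single endpoint flows through the Scikit-learn container first, and its output becomes the input of the linear learner container.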

## Prerequisites and limitations
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ [Python 3.9](https://www.python.org/downloads/release/python-390/)
+ [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) and [Boto3 library](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
+ AWS Identity and Access Management (AWS IAM) [role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) with basic SageMaker [permissions](https://docs.aws.amazon.com/sagemaker/latest/dg/api-permissions-reference.html) and Amazon Simple Storage Service (Amazon S3) [permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html)

**Product versions**
+ [Amazon SageMaker Python SDK 2.49.2](https://sagemaker.readthedocs.io/en/v2.49.2/)

## Architecture
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-architecture"></a>

**Target technology stack**
+ Amazon Elastic Container Registry (Amazon ECR)
+ Amazon SageMaker
+ Amazon SageMaker Studio
+ Amazon Simple Storage Service (Amazon S3)
+ [Real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) endpoint for Amazon SageMaker

**Target architecture**

The following diagram shows the architecture for the deployment of an Amazon SageMaker pipeline model object.

![\[Architecture for deployment of SageMaker pipeline model object\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/1105d51b-752f-46d7-962c-acef1fb3399f/images/12f06715-b1c2-4de0-b277-99ce87308152.png)


The diagram shows the following workflow:

1. A SageMaker notebook deploys a pipeline model.

1. An S3 bucket stores the model artifacts.

1. Amazon ECR provides the source container images for the pipeline model.

## Tools
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-tools"></a>

**AWS tools**
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable.
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a managed ML service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Code**

The code for this pattern is available in the GitHub [Inference Pipeline with Scikit-learn and Linear Learner](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb) repository.

## Epics
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-epics"></a>

### Prepare the dataset
<a name="prepare-the-dataset"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Prepare the dataset for your regression task. | [Open a notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-create-open.html#notebooks-open) in Amazon SageMaker Studio. To import all necessary libraries and initialize your working environment, use the following example code in your notebook:<pre>import sagemaker<br />from sagemaker import get_execution_role<br /><br />sagemaker_session = sagemaker.Session()<br /><br /># Get a SageMaker-compatible role used by this Notebook Instance.<br />role = get_execution_role()<br /><br /># S3 prefix<br />bucket = sagemaker_session.default_bucket()<br />prefix = "Scikit-LinearLearner-pipeline-abalone-example"</pre>To download a sample dataset, add the following code to your notebook:<pre>! mkdir abalone_data<br />! aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv ./abalone_data</pre>The example in this pattern uses the [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/abalone) from the UCI Machine Learning Repository. | Data scientist | 
| Upload the dataset to an S3 bucket. | In the notebook where you prepared your dataset earlier, add the following code to upload your sample data to an S3 bucket:<pre>WORK_DIRECTORY = "abalone_data"<br /><br />train_input = sagemaker_session.upload_data(<br />    path="{}/{}".format(WORK_DIRECTORY, "abalone.csv"),<br />    bucket=bucket,<br />    key_prefix="{}/{}".format(prefix, "train"),<br />)</pre> | Data scientist | 

### Create the data preprocessor using SKLearn
<a name="create-the-data-preprocessor-using-sklearn"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Prepare the preprocessor.py script. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker.html) | Data scientist | 
| Create the SKLearn preprocessor object. | To create an SKLearn preprocessor object (called SKLearn Estimator) that you can incorporate into your final inference pipeline, run the following code in your SageMaker notebook:<pre>from sagemaker.sklearn.estimator import SKLearn<br /><br />FRAMEWORK_VERSION = "0.23-1"<br />script_path = "sklearn_abalone_featurizer.py"<br /><br />sklearn_preprocessor = SKLearn(<br />    entry_point=script_path,<br />    role=role,<br />    framework_version=FRAMEWORK_VERSION,<br />    instance_type="ml.c4.xlarge",<br />    sagemaker_session=sagemaker_session,<br />)<br />sklearn_preprocessor.fit({"train": train_input})</pre> | Data scientist | 
| Test the preprocessor's inference. | To confirm that your preprocessor is defined correctly, launch a [batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) by entering the following code in your SageMaker notebook:<pre># Define a SKLearn Transformer from the trained SKLearn Estimator<br />transformer = sklearn_preprocessor.transformer(<br />    instance_count=1, instance_type="ml.m5.xlarge", assemble_with="Line", accept="text/csv"<br />)<br /><br /># Preprocess training input<br />transformer.transform(train_input, content_type="text/csv")<br />print("Waiting for transform job: " + transformer.latest_transform_job.job_name)<br />transformer.wait()<br />preprocessed_train = transformer.output_path</pre> | Data scientist | 

### Create a machine learning model
<a name="create-a-machine-learning-model"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a model object. | To create a model object based on the linear learner algorithm, enter the following code in your SageMaker notebook:<pre>import boto3<br />from sagemaker.image_uris import retrieve<br /><br />ll_image = retrieve("linear-learner", boto3.Session().region_name)<br />s3_ll_output_key_prefix = "ll_training_output"<br />s3_ll_output_location = "s3://{}/{}/{}/{}".format(<br />    bucket, prefix, s3_ll_output_key_prefix, "ll_model"<br />)<br /><br />ll_estimator = sagemaker.estimator.Estimator(<br />    ll_image,<br />    role,<br />    instance_count=1,<br />    instance_type="ml.m4.2xlarge",<br />    volume_size=20,<br />    max_run=3600,<br />    input_mode="File",<br />    output_path=s3_ll_output_location,<br />    sagemaker_session=sagemaker_session,<br />)<br /><br />ll_estimator.set_hyperparameters(feature_dim=10, predictor_type="regressor", mini_batch_size=32)<br /><br />ll_train_data = sagemaker.inputs.TrainingInput(<br />    preprocessed_train,<br />    distribution="FullyReplicated",<br />    content_type="text/csv",<br />    s3_data_type="S3Prefix",<br />)<br /><br />data_channels = {"train": ll_train_data}<br />ll_estimator.fit(inputs=data_channels, logs=True)</pre>The preceding code retrieves the relevant Amazon ECR Docker image from the public Amazon ECR Registry for the model, creates an estimator object, and then uses that object to train the regression model. | Data scientist | 

### Deploy the final pipeline
<a name="deploy-the-final-pipeline"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Deploy the pipeline model. | To create a pipeline model object that chains the preprocessor and the linear learner model, and then deploy it behind a single endpoint, enter the following code in your SageMaker notebook:<pre>from sagemaker.model import Model<br />from sagemaker.pipeline import PipelineModel<br />import boto3<br />from time import gmtime, strftime<br /><br />timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())<br /><br />scikit_learn_inference_model = sklearn_preprocessor.create_model()<br />linear_learner_model = ll_estimator.create_model()<br /><br />model_name = "inference-pipeline-" + timestamp_prefix<br />endpoint_name = "inference-pipeline-ep-" + timestamp_prefix<br />sm_model = PipelineModel(<br />    name=model_name, role=role, models=[scikit_learn_inference_model, linear_learner_model]<br />)<br /><br />sm_model.deploy(initial_instance_count=1, instance_type="ml.c4.xlarge", endpoint_name=endpoint_name)</pre>You can adjust the instance type used in the model object to meet your needs. | Data scientist | 
| Test the inference. | To confirm the endpoint is working correctly, run the following sample inference code in your SageMaker notebook:<pre>from sagemaker.predictor import Predictor<br />from sagemaker.serializers import CSVSerializer<br /><br />payload = "M, 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155"<br />actual_rings = 10<br />predictor = Predictor(<br />    endpoint_name=endpoint_name, sagemaker_session=sagemaker_session, serializer=CSVSerializer()<br />)<br /><br />print(predictor.predict(payload))</pre> | Data scientist | 

## Related resources
<a name="deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker-resources"></a>
+ [Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn](https://aws.amazon.com/blogs/machine-learning/preprocess-input-data-before-making-predictions-using-amazon-sagemaker-inference-pipelines-and-scikit-learn/) (AWS Machine Learning Blog)
+ [End to end Machine Learning with Amazon SageMaker](https://github.com/aws-samples/amazon-sagemaker-build-train-deploy) (GitHub)

# Deploy real-time coding security validation by using an MCP server with Kiro and other coding assistants
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants"></a>

*Ivan Girardi and Iker Reina Fuente, Amazon Web Services*

## Summary
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-summary"></a>

This pattern describes how to implement a Model Context Protocol (MCP) server that integrates three industry-standard security scanning tools to provide comprehensive code security analysis. The server enables AI coding assistants (such as Kiro, Amazon Q Developer, and Cline) to automatically scan code snippets and infrastructure as code (IaC) configurations. With these scans, the coding assistants can help identify security vulnerabilities, misconfigurations, and compliance violations.

AI code generators trained on millions of code snippets create a security blind spot—how secure was that training data? This pattern provides real-time security validation during code generation, helping developers identify and understand potential security issues as they code. This approach helps developers address both direct vulnerabilities and inherited risks from dependencies. By bridging the gap between AI efficiency and security compliance, this pattern helps to enable safe adoption of AI-powered development tools.

This pattern helps organizations enhance their development security practices through AI-assisted coding tools, providing continuous security scanning capabilities across multiple programming languages and infrastructure definitions. The solution combines the capabilities of the following tools:
+ Checkov for scanning IaC files, including Terraform, AWS CloudFormation, and Kubernetes manifests
+ Semgrep for analyzing multiple programming languages such as Python, JavaScript, Java, and others
+ Bandit for specialized Python security scanning 

Key features of this solution include the following:
+ Delta scanning of new code segments, reducing computational overhead
+ Isolated security tool environments, preventing cross-tool contamination
+ Seamless integration with AI coding assistants (Kiro, Amazon Q Developer, Cline, and others)
+ Real-time security feedback during code generation
+ Customizable scanning rules for organizational compliance
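The delta-scanning feature above can be approximated with a line-level diff: only lines added since the previously scanned version are passed to the scanner. The following sketch uses Python's standard `difflib`; the `scanner` callable is a hypothetical stand-in for any of the integrated tools, not the server's actual interface:

```python
import difflib

def new_lines(previous: str, current: str) -> list[str]:
    """Return only the lines added since the last scanned version."""
    diff = difflib.unified_diff(
        previous.splitlines(), current.splitlines(), lineterm=""
    )
    # keep '+' additions, drop the '+++' file header
    return [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]

def delta_scan(previous: str, current: str, scanner) -> list[str]:
    """Scan only the added segment instead of re-scanning the whole file."""
    added = new_lines(previous, current)
    return scanner("\n".join(added)) if added else []
```

If nothing changed, no scan runs at all, which is where the computational savings come from.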

The pattern provides a unified interface for security scanning with standardized response formats, making it easier to integrate security checks into development workflows. The pattern uses Python and the MCP framework to deliver automated security feedback. This approach helps developers identify and address security issues early in the development process while learning about security best practices through detailed findings.
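The exact response schema is defined in the repository; as an illustration of what normalizing heterogeneous scanner output to one shape can look like, consider the sketch below. The `Finding` field names are assumptions for this example, not the server's actual schema; the Bandit input keys (`test_id`, `issue_severity`, `line_number`, `issue_text`) match Bandit's JSON report format:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One normalized security finding, regardless of which tool produced it."""
    tool: str
    rule_id: str
    severity: str
    line: int
    message: str

def from_bandit(result: dict) -> Finding:
    # Map one entry of Bandit's JSON "results" list to the common shape.
    return Finding(
        tool="bandit",
        rule_id=result["test_id"],
        severity=result["issue_severity"],
        line=result["line_number"],
        message=result["issue_text"],
    )
```

Analogous adapters for Checkov and Semgrep would let downstream tooling process all findings uniformly.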

## Prerequisites and limitations
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-prereqs"></a>

**Prerequisites**
+ An active AWS account with access to use Kiro or Amazon Q Developer, if you want to use either of those coding assistants
+ Python version 3.10 or later [installed](https://www.python.org/downloads/)
+ `uv` package manager [installed](https://docs.astral.sh/uv/getting-started/installation/)
+ Familiarity with security scanning tools and concepts
+ Basic understanding of IaC and application security

**Limitations**
+ Bandit scanning is limited to Python files only.
+ Real-time scanning might impact performance for large code bases.
+ Tool-specific limitations are based on supported file formats and languages.
+ Manual review is required to validate security findings.
+ Security scanning results require security expertise for proper interpretation.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS Services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html), and choose the link for the service.

**Product versions**
+ Python version 3.10 or later
+ Checkov version 3.0.0 or later
+ Semgrep version 1.45.0 or later
+ Bandit version 1.7.5 or later
+ MCP[cli] version 1.11.0 or later
+ Pydantic version 1.10.0 or later
+ Loguru version 0.6.0 or later

## Architecture
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-architecture"></a>

The following diagram shows the architecture for this solution.

![\[AI assistants send code to MCP security scanner server to route to specialized scanners; scan results sent to developer.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/fa623544-4d54-48af-a4e4-9b6a0624e776/images/9c881f95-76d0-40f6-983e-d987fd2097b8.png)


The diagram shows the following workflow:

1. The developer uses AI assistants (for example, Kiro, Cline, Amazon Q Developer, or Roo Code) to generate or analyze code. The AI assistant sends the code for security scanning.

1. The MCP security scanner server processes the request by routing it to the appropriate specialized scanner: Checkov for IaC files, Semgrep for multi-language source code analysis, or Bandit for Python-specific security scanning.

1. Scanner results with security findings, severity levels, detailed descriptions, and suggested fixes are sent back to the developer through the AI assistant.

1. A continuous feedback loop is established where the developer receives real-time security validation, enabling automated fixes through AI assistants and promoting security best practices during development.

The architecture mitigates the following common security risks: 
+ Command injection
+ Prompt injection
+ Path traversal
+ Dependency attacks
+ Resource exhaustion 

The architecture mitigates these common security risks by implementing the following best practices: 
+ All user and AI model inputs are written to temporary files.
+ No direct inputs are provided to command line interface (CLI) commands.
+ File system access is restricted to temporary directories and files only.
+ Temporary files are automatically cleaned up.
+ Scanning responses are sanitized.
+ Process isolation is enforced to restrict process capabilities.
+ All scanning activities are logged.
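The first two practices (temp-file inputs, no direct CLI input) can be sketched with the Python standard library. In this illustrative sketch, `py_compile` stands in for a real scanner such as Bandit, purely to keep the example self-contained; the real server's implementation differs:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def scan_snippet(code: str) -> tuple[int, str]:
    """Write untrusted input to a temporary file and pass only the file
    path to the scanner process, never the raw input itself."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "snippet.py"
        target.write_text(code)
        proc = subprocess.run(
            [sys.executable, "-m", "py_compile", str(target)],  # stand-in scanner
            capture_output=True,
            text=True,
            timeout=30,  # bounds resource usage
        )
        # the temporary directory (and the file in it) is removed
        # automatically when the with-block exits, even on error
        return proc.returncode, proc.stderr
```

Because the argument list is passed to `subprocess.run` without a shell, the untrusted code cannot inject additional commands.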

**Automation and scale**

The pattern supports automation through the following capabilities:
+ Integration with AI coding assistants for automatic code scanning
+ Standardized API responses for automated processing
+ Configuration through MCP configuration files
+ Support for batch processing of multiple files
+ Scalable scanning across multiple programming languages and IaC formats

The scanning process can be automated through the provided API endpoints:
+ `scan_with_checkov` for IaC scanning
+ `scan_with_semgrep` for multi-language code scanning
+ `scan_with_bandit` for Python-specific scanning
+ `get_supported_formats` for format validation

When extending the scanning tools, follow the design principles and best practices described earlier in this section. Also see [Best practices](#deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-best-practices). 

## Tools
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-tools"></a>

**AWS services**
+ [Kiro](https://aws.amazon.com/documentation-overview/kiro/) is an agentic coding service that works alongside developers to turn prompts into detailed specs, then into working code, docs, and tests. Kiro agents help developers solve challenging problems and automate tasks like generating documentation and unit tests.
+ [Amazon Q Developer](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html) is a generative AI-powered conversational assistant that can help you understand, build, extend, and operate AWS applications.

**Other tools**
+ [Bandit](https://bandit.readthedocs.io/en/latest/) is a specialized Python security scanner tool. It detects common Python security issues like insecure functions, hardcoded secrets, and injection vulnerabilities. Bandit provides detailed confidence and severity ratings.
+ [Checkov](https://github.com/bridgecrewio/checkov) is a static code-analysis tool that checks IaC for security and compliance misconfigurations. In addition, Checkov detects compliance violations and security best practices.
+ [Cline](https://cline.bot/) is an AI-powered coding assistant that runs in VS Code.
+ [Loguru](https://loguru.readthedocs.io/en/stable/) is a library that simplifies logging in Python.
+ [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro) is an open source framework for building AI-assisted development tools.
+ [Pydantic](https://docs.pydantic.dev/latest/) is a data validation library for Python.
+ [Semgrep](https://semgrep.dev/docs/introduction) analyzes source code for security vulnerabilities and bugs. It supports multiple programming languages. Semgrep uses security-focused rulesets for comprehensive analysis. It provides detailed confidence and severity ratings.

**Code repository**

The code for this pattern is available in the GitHub [MCP Security Scanner: Real-Time Protection for AI Code Assistants](https://github.com/aws-samples/sample-mcp-security-scanner) repository. The repository includes the MCP server implementation, details on the MCP configuration for Kiro, Amazon Q Developer, Cline, and other assistants, configuration examples, and testing utilities.

The repository structure includes:
+ `security_scanner_mcp_server/` - Main server implementation
+ `docs/` - Documentation and demo materials
+ `tests/` - Test files
+ `mcp-config-example.json` - Example MCP configuration
+ `requirements.txt` - Project dependencies

## Best practices
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-best-practices"></a>

**Security scanning implementation**
+ Review security findings to validate and prioritize issues.
+ Keep scanning tools (Checkov, Semgrep, and Bandit) updated to latest versions.
+ Use this pattern’s MCP security tool in conjunction with other security measures and tools.
+ Update security rule sets and policies regularly.

**Configuration management**
+ Store MCP configuration files in your version control system.
+ Document custom rules and configurations.

**Integration**
+ Integrate security scanning early in the development cycle.
+ Set up automated scanning in pre-commit hooks or continuous integration and continuous deployment (CI/CD) pipelines.
+ Configure appropriate severity thresholds for your environment.
+ Establish clear procedures for handling security findings.

**Operational considerations**
+ Monitor scanning performance and resource usage.
+ Implement proper error handling and logging.
+ Maintain documentation of custom configurations.
+ Establish a process for reviewing and updating security rules.

Also, keep in mind the following best practices:
+ Always validate security findings in your specific context.
+ Keep security tools and dependencies up to date.
+ Use multiple security tools for comprehensive coverage.
+ Follow security best practices in your development process.

## Epics
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-epics"></a>

### (Kiro users) Set up the MCP security scanner server
<a name="kiro-users-set-up-the-mcp-security-scanner-server"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Configure MCP settings. | You can edit the configuration files in Kiro either manually, by locating the configuration files (Option 1), or through the Kiro IDE (Option 2).[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html)<pre>{<br />  "mcpServers": {<br />    "security-scanner": {<br />      "command": "uvx",<br />      "args": [<br />        "--from",<br />        "git+https://github.com/aws-samples/sample-mcp-security-scanner.git@main",<br />        "security_scanner_mcp_server"<br />      ],<br />      "env": {<br />        "FASTMCP_LOG_LEVEL": "ERROR"<br />      },<br />      "disabled": false,<br />      "autoApprove": []<br />    }<br />  }<br />}</pre> | App developer | 

### (Amazon Q Developer users) Set up the MCP security scanner server
<a name="qdevlong-users-set-up-the-mcp-security-scanner-server"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Configure MCP settings. | To configure the MCP settings manually, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html)<pre>{<br />  "mcpServers": {<br />    "security-scanner": {<br />      "command": "uvx",<br />      "args": [<br />        "--from",<br />        "git+https://github.com/aws-samples/sample-mcp-security-scanner.git@main",<br />        "security_scanner_mcp_server"<br />      ],<br />      "env": {<br />        "FASTMCP_LOG_LEVEL": "ERROR"<br />      }<br />    }<br />  }<br />}</pre> | App developer | 

### (Cline users) Set up the MCP security scanner server
<a name="cline-users-set-up-the-mcp-security-scanner-server"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Configure MCP settings. | To configure the MCP settings manually, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html)<pre>{<br />  "mcpServers": {<br />    "security-scanner": {<br />      "command": "uvx",<br />      "args": [<br />        "--from",<br />        "git+https://github.com/aws-samples/sample-mcp-security-scanner.git@main",<br />        "security_scanner_mcp_server"<br />      ],<br />      "env": {<br />        "FASTMCP_LOG_LEVEL": "ERROR"<br />      },<br />      "disabled": false,<br />      "autoApprove": []<br />    }<br />  }<br />}</pre> | App developer | 

### Example of code analysis using Python and Bandit
<a name="example-of-code-analysis-using-python-and-bandit"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Perform code analysis. | To perform code analysis by using Python and Bandit, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | App developer | 

### Example of code analysis using Terraform and Checkov
<a name="example-of-code-analysis-using-terraform-and-checkov"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Perform code analysis. | To perform code analysis by using Terraform and Checkov, use the following steps:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | App developer | 

### Example of advanced scanning capabilities
<a name="example-of-advanced-scanning-capabilities"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Perform targeted scanning. | Following are examples of requests that you can use to perform a targeted scan:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | App developer | 
| Use security scanning with code generation. | To resolve security findings by using code generation loops, use the following steps (this example uses Kiro as the coding assistant):[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | App developer | 

## Troubleshooting
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| Environment setup issues | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | 
| Scanner issues | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | 
| Integration problems | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | 
| Additional support | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants.html) | 

## Related resources
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-resources"></a>

**AWS documentation**
+ [Infrastructure as code](https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/infrastructure-as-code.html) (AWS Whitepaper *Introduction to DevOps on AWS*)

**Other AWS resources**
+ [Best Practices for Security, Identity, & Compliance](https://aws.amazon.com/architecture/security-identity-compliance/)

**Other resources**
+ [Bandit documentation](https://bandit.readthedocs.io/en/latest/)
+ [Checkov documentation](https://github.com/bridgecrewio/checkov)
+ [Model Context Protocol (MCP) documentation](https://modelcontextprotocol.io/docs/getting-started/intro)
+ [OWASP Secure Coding Practices](https://owasp.org/www-project-secure-coding-practices-quick-reference-guide/) (OWASP Foundation website)
+ [Semgrep documentation](https://semgrep.dev/docs/introduction)

## Additional information
<a name="deploy-real-time-coding-security-validation-by-using-an-mcp-server-with-kiro-and-other-coding-assistants-additional"></a>

**Example MCP configuration with auto approved enabled**

Without `autoApprove` configured, the user must grant approval to send the code to the MCP security server for scanning. When `autoApprove` is configured, the code assistant is allowed to invoke the tools without user approval. These tools run locally on the machine; no data is sent out, and only a code scan is performed.

The following configuration enables automatic execution of all security scanning functions:

```
{
  "mcpServers": {
    "security-scanner": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/aws-samples/sample-mcp-security-scanner.git@main",
        "security_scanner_mcp_server"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": [
        "scan_with_checkov",
        "scan_with_semgrep", 
        "scan_with_bandit",
        "get_supported_formats"
      ]
    }
  }
}
```

To enable debug logging, set `"FASTMCP_LOG_LEVEL"` to `"DEBUG"`.

**File formats supported by security scanning tools**

Each security scanning tool in this solution supports the following file formats:

*Checkov (IaC)*
+ Terraform – .tf, .tfvars, .tfstate
+ CloudFormation – .yaml, .yml, .json, .template
+ Kubernetes – .yaml, .yml
+ Dockerfile – Dockerfile
+ ARM – .json (Azure Resource Manager)
+ Bicep – .bicep
+ Serverless – .yml, .yaml
+ Helm – .yaml, .yml, .tpl
+ GitHub Actions – .yml, .yaml
+ GitLab CI – .yml, .yaml
+ Ansible – .yml, .yaml

*Semgrep (Source code)*
+ Python – .py
+ JavaScript – .js
+ TypeScript – .ts
+ Java – .java
+ Go – .go
+ C – .c
+ C++ – .cpp
+ C# – .cs
+ Ruby – .rb
+ PHP – .php
+ Scala – .scala
+ Kotlin – .kt
+ Rust – .rs

*Bandit (Python only)*
+ Python – .py
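Routing a file to the right scanner by its extension (per the lists above) can be sketched as follows. This is illustrative only; the actual server's routing logic may also inspect file content, and ambiguous extensions such as `.yaml` can belong to several IaC formats:

```python
from pathlib import Path

# Illustrative subsets of the supported formats listed above
IAC_EXTENSIONS = {".tf", ".tfvars", ".tfstate", ".template", ".bicep"}
CODE_EXTENSIONS = {".js", ".ts", ".java", ".go", ".c", ".cpp", ".cs",
                   ".rb", ".php", ".scala", ".kt", ".rs"}

def route(filename: str) -> list[str]:
    """Pick the scanner(s) for a file based on its extension."""
    path = Path(filename)
    ext = path.suffix
    if path.name == "Dockerfile" or ext in IAC_EXTENSIONS:
        return ["checkov"]
    if ext == ".py":
        return ["semgrep", "bandit"]  # both tools support Python
    if ext in CODE_EXTENSIONS:
        return ["semgrep"]
    if ext in {".yaml", ".yml", ".json"}:
        return ["checkov"]  # ambiguous: could be CloudFormation, Kubernetes, etc.
    return []
```

A file with no matching scanner returns an empty list, which a caller could surface as "unsupported format" via `get_supported_formats`.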

**Demos**

For code scanning, try the following sample prompts with your AI assistant:
+ "Scan the current script and tell me the results."
+ "Scan lines 20–60 and tell me the results."
+ "Scan this Amazon DynamoDB table resource and tell me the result."

For more information, see this [code scanning demo](https://github.com/aws-samples/sample-mcp-security-scanner/blob/main/docs/demo_code_scan.gif) in this pattern’s GitHub repository.

To generate secure code, try the following sample prompts:
+ "Generate a Terraform configuration to create a DynamoDB table with encryption enabled and scan it for security issues."
+ "Create a Python Lambda function that writes to DynamoDB and scan it for vulnerabilities."
+ "Generate a CloudFormation template for an S3 bucket with proper security settings and verify it passes security checks."
+ "Write a Python script to query DynamoDB with pagination and scan for security best practices."
+ "Create a Kubernetes deployment manifest for a microservice with security hardening and validate it."

For more information, see this [code generation with security scanning](https://github.com/aws-samples/sample-mcp-security-scanner/blob/main/docs/demo_code_generation.gif) demo in this pattern’s GitHub repository.

# Develop advanced generative AI chat-based assistants by using RAG and ReAct prompting
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting"></a>

*Praveen Kumar Jeyarajan, Shuai Cao, Noah Hamilton, Kiowa Jackson, Jundong Qiao, and Kara Yang, Amazon Web Services*

## Summary
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-summary"></a>

A typical corporation has 70 percent of its data trapped in siloed systems. You can use generative AI-powered chat-based assistants to unlock insights and relationships between these data silos through natural language interactions. To get the most out of generative AI, the outputs must be trustworthy, accurate, and inclusive of the available corporate data. Successful chat-based assistants depend on the following:
+ Generative AI models (such as Anthropic Claude 2)
+ Data source vectorization
+ Advanced reasoning techniques, such as the [ReAct framework](https://www.promptingguide.ai/techniques/react), for prompting the model

This pattern provides data-retrieval approaches from data sources such as Amazon Simple Storage Service (Amazon S3) buckets, AWS Glue, and Amazon Relational Database Service (Amazon RDS). Value is gained from that data by interleaving [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html) with chain-of-thought methods. The results support complex chat-based assistant conversations that draw on the entirety of your corporation's stored data.
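As an illustration of the RAG half of that interleaving, retrieved documents are folded into the prompt before the LLM call. The following minimal Python sketch shows the idea; the function name and prompt wording are illustrative, not the pattern's actual code:

```python
def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Fold retrieved context into the prompt sent to the LLM."""
    context = "\n\n".join(documents)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```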

This pattern uses Amazon SageMaker manuals and pricing data tables as an example to explore the capabilities of a generative AI chat-based assistant. You will build a chat-based assistant that helps customers evaluate the SageMaker service by answering questions about pricing and the service's capabilities. The solution uses a Streamlit library for building the frontend application and the LangChain framework for developing the application backend powered by a large language model (LLM).

Inquiries to the chat-based assistant are met with an initial intent classification for routing to one of three possible workflows. The most sophisticated workflow combines general advisory guidance with complex pricing analysis. You can adapt the pattern to suit enterprise, corporate, and industrial use cases.

## Prerequisites and limitations
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-prereqs"></a>

**Prerequisites**
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) installed and configured
+ [AWS Cloud Development Kit (AWS CDK) Toolkit 2.114.1 or later](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) installed and configured
+ Basic familiarity with Python and AWS CDK
+ [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
+ [Docker](https://docs.docker.com/get-docker/) installed
+ [Python 3.11 or later](https://www.python.org/downloads/) installed and configured (for more information, see the [Tools](#develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-tools) section)
+ An [active AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html) bootstrapped by using [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html)
+ Amazon Titan and Anthropic Claude [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html#add-model-access) enabled in the Amazon Bedrock service
+ [AWS security credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html), including `AWS_ACCESS_KEY_ID`, correctly configured in your terminal environment

**Limitations**
+ LangChain doesn't support every LLM for streaming. The Anthropic Claude models are supported, but models from AI21 Labs are not.
+ This solution is deployed to a single AWS account.
+ This solution can be deployed only in AWS Regions where Amazon Bedrock and Amazon Kendra are available. For information about availability, see the documentation for [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html#bedrock-regions) and [Amazon Kendra](https://docs.aws.amazon.com/general/latest/gr/kendra.html).

**Product versions**
+ Python version 3.11 or later
+ Streamlit version 1.30.0 or later
+ Streamlit-chat version 0.1.1 or later
+ LangChain version 0.1.12 or later
+ AWS CDK version 2.132.1 or later

## Architecture
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-architecture"></a>

**Target technology stack**
+ Amazon Athena
+ Amazon Bedrock
+ Amazon Elastic Container Service (Amazon ECS)
+ AWS Glue
+ AWS Lambda
+ Amazon S3
+ Amazon Kendra
+ Elastic Load Balancing

**Target architecture**

The AWS CDK code deploys all the resources that are required to set up the chat-based assistant application in an AWS account. The chat-based assistant application shown in the following diagram is designed to answer SageMaker-related queries from users. Users connect through an Application Load Balancer to a VPC that contains an Amazon ECS cluster hosting the Streamlit application. An orchestration Lambda function connects to the application. S3 bucket data sources provide data to the Lambda function through Amazon Kendra and AWS Glue. The Lambda function connects to Amazon Bedrock to answer queries (questions) from chat-based assistant users.

![\[Architecture diagram.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/b4df6405-76ab-4493-a722-15ceca067254/images/4e5856cf-9489-41f8-a411-e3b8d8a50748.png)


1. The orchestration Lambda function sends the LLM prompt request to the Amazon Bedrock model (Claude 2).

1. Amazon Bedrock sends the LLM response back to the orchestration Lambda function.

**Logic flow within the orchestration Lambda function**

When users ask a question through the Streamlit application, it invokes the orchestration Lambda function directly. The following diagram shows the logic flow when the Lambda function is invoked.

![\[Architecture diagram.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/b4df6405-76ab-4493-a722-15ceca067254/images/70ae4736-06a6-4d3a-903a-edc5c10d78a0.png)

+ Step 1 – The input `query` (question) is classified into one of the three intents:
  + General SageMaker guidance questions
  + General SageMaker pricing (training/inference) questions
  + Complex questions related to SageMaker and pricing
+ Step 2 – The input `query` initiates one of the three services:
  + `RAG Retrieval service`, which retrieves relevant context from the [Amazon Kendra](https://aws.amazon.com/kendra/) vector database and calls the LLM through [Amazon Bedrock](https://aws.amazon.com/bedrock/) to summarize the retrieved context as the response.
  + `Database Query service`, which uses the LLM, database metadata, and sample rows from relevant tables to convert the input `query` into a SQL query. The `Database Query service` runs the SQL query against the SageMaker pricing database through [Amazon Athena](https://aws.amazon.com/athena/) and summarizes the query results as the response.
  + `In-context ReACT Agent service`, which breaks down the input `query` into multiple steps before providing a response. The agent uses `RAG Retrieval service` and `Database Query service` as tools to retrieve relevant information during the reasoning process. After the reasoning and action steps are complete, the agent generates the final answer as the response.
+ Step 3 – The response from the orchestration Lambda function is sent to the Streamlit application as output.
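The three-step flow can be sketched in plain Python. The keyword rules and service names below are illustrative assumptions for exposition, not the pattern's actual classifier (the solution uses the LLM itself for intent classification):

```python
def classify_intent(query: str) -> str:
    """Step 1: route a query to one of the three workflows."""
    q = query.lower()
    pricing = any(w in q for w in ("price", "pricing", "cost"))
    guidance = any(w in q for w in ("how", "what", "explain", "capab"))
    if pricing and guidance:
        return "react_agent"      # complex: needs both tools
    if pricing:
        return "database_query"   # text-to-SQL through Athena
    return "rag_retrieval"        # Kendra retrieval + LLM summary

def handle_query(query: str, services: dict) -> str:
    """Steps 2-3: invoke the chosen service and return its response."""
    return services[classify_intent(query)](query)
```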

## Tools
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-tools"></a>

**AWS services**
+ [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/what-is.html) is an interactive query service that helps you analyze data directly in Amazon Simple Storage Service (Amazon S3) by using standard SQL.
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
+ [AWS Cloud Development Kit (AWS CDK)](https://docs.aws.amazon.com/cdk/latest/guide/home.html) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.
+ [Amazon Elastic Container Service (Amazon ECS)](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html) is a fast and scalable container management service that helps you run, stop, and manage containers on a cluster.
+ [AWS Glue](https://docs.aws.amazon.com/glue/) is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams. This pattern uses an AWS Glue crawler and an AWS Glue Data Catalog table.
+ [Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html) is an intelligent search service that uses natural language processing and advanced machine learning algorithms to return specific answers to search questions from your data.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
+ [Elastic Load Balancing (ELB)](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html) distributes incoming application or network traffic across multiple targets. For example, you can distribute traffic across Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, and IP addresses in one or more Availability Zones.

**Code repository**

The code for this pattern is available in the GitHub [genai-bedrock-chatbot](https://github.com/awslabs/genai-bedrock-chatbot) repository.

The code repository contains the following files and folders:
+ `assets` folder – The static assets, such as the architecture diagram and the public dataset
+ `code/lambda-container` folder – The Python code that is run in the Lambda function
+ `code/streamlit-app` folder – The Python code that is run as the container image in Amazon ECS
+ `tests` folder – The Python files that are run to unit test the AWS CDK constructs
+ `code/code_stack.py` – The AWS CDK construct Python files used to create AWS resources
+ `app.py` – The AWS CDK stack Python files used to deploy AWS resources in the target AWS account
+ `requirements.txt` – The list of all Python dependencies that must be installed for AWS CDK
+ `requirements-dev.txt` – The list of all Python dependencies that must be installed for AWS CDK to run the unit-test suite
+ `cdk.json` – The input file to provide values required to spin up resources


**Note:** The AWS CDK code uses [L3 (layer 3) constructs](https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html) and [AWS Identity and Access Management (IAM) policies managed by AWS](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#aws-managed-policies) for deploying the solution.

## Best practices
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-best-practices"></a>
+ The code example provided here is for a proof-of-concept (PoC) or pilot demo only. Before you take the code to production, be sure to follow these best practices:
  + Enable [Amazon S3 server access logging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-server-access-logging.html).
  + Enable [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html).
  + Use the [Amazon Kendra Enterprise Edition index](https://docs.aws.amazon.com/whitepapers/latest/how-aws-pricing-works/amazon-kendra.html).
+ Set up monitoring and alerting for the Lambda function. For more information, see [Monitoring and troubleshooting Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/lambda-monitoring.html). For general best practices when working with Lambda functions, see the [AWS documentation](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html).

## Epics
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-epics"></a>

### Set up AWS credentials on your local machine
<a name="set-up-aws-credentials-on-your-local-machine"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Export variables for the account and AWS Region where the stack will be deployed. | To provide AWS credentials for AWS CDK by using environment variables, run the following commands.<pre>export CDK_DEFAULT_ACCOUNT=<12 Digit AWS Account Number><br />export CDK_DEFAULT_REGION=<region></pre> | DevOps engineer, AWS DevOps | 
| Set up the AWS CLI profile. | To set up the AWS CLI profile for the account, follow the instructions in the [AWS documentation](https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/keys-profiles-credentials.html). | DevOps engineer, AWS DevOps | 

### Set up your environment
<a name="set-up-your-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the repo on your local machine. | To clone the repository, run the following command in your terminal.<pre>git clone https://github.com/awslabs/genai-bedrock-chatbot.git</pre> | DevOps engineer, AWS DevOps | 
| Set up the Python virtual environment and install required dependencies. | To set up the Python virtual environment, run the following commands.<pre>cd genai-bedrock-chatbot<br />python3 -m venv .venv<br />source .venv/bin/activate</pre>To set up the required dependencies, run the following command.<pre>pip3 install -r requirements.txt</pre> | DevOps engineer, AWS DevOps | 
| Set up the AWS CDK environment and synthesize the AWS CDK code. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.html) | DevOps engineer, AWS DevOps | 

### Configure and deploy the chat-based assistant application
<a name="configure-and-deploy-the-chat-based-assistant-application"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Provision Claude model access. | To enable Anthropic Claude model access for your AWS account, follow the instructions in the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html#add-model-access). | AWS DevOps | 
| Deploy resources in the account. | To deploy resources in the AWS account by using the AWS CDK, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.html)Upon successful deployment, you can access the chat-based assistant application by using the URL provided in the CloudFormation **Outputs** section. | AWS DevOps, DevOps engineer | 
| Run the AWS Glue crawler and create the Data Catalog table. | An [AWS Glue crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) is used to keep the data schema dynamic. The solution creates and updates partitions in the [AWS Glue Data Catalog table](https://docs.aws.amazon.com/athena/latest/ug/querying-glue-catalog.html) by running the crawler on demand. After the CSV dataset files are copied into the S3 bucket, run the AWS Glue crawler and create the Data Catalog table schema for testing:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.html)The AWS CDK code configures the AWS Glue crawler to run on demand, but you can also [schedule](https://docs.aws.amazon.com/glue/latest/dg/schedule-crawler.html) it to run periodically. | DevOps engineer, AWS DevOps | 
| Initiate document indexing. | After the files are copied into the S3 bucket, use Amazon Kendra to crawl and index them:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.html)The AWS CDK code configures the Amazon Kendra index sync to run on demand, but you can also run it periodically by using the [Schedule parameter](https://docs.aws.amazon.com/kendra/latest/dg/data-source.html#cron). | AWS DevOps, DevOps engineer | 

### Clean up all AWS resources in the solution
<a name="clean-up-all-aws-resources-in-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Remove the AWS resources. | After you test the solution, clean up the resources:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting.html) | DevOps engineer, AWS DevOps | 

## Troubleshooting
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| AWS CDK returns errors. | For help with AWS CDK issues, see [Troubleshooting common AWS CDK issues](https://docs.aws.amazon.com/cdk/v2/guide/troubleshooting.html). | 

## Related resources
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-resources"></a>
+ Amazon Bedrock:
  + [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html)
  + [Inference parameters for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)
+ [Building Lambda functions with Python](https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html)
+ [Get started with the AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html)
+ [Working with the AWS CDK in Python](https://docs.aws.amazon.com/cdk/v2/guide/work-with-cdk-python.html)
+ [Generative AI Application Builder on AWS](https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/solution-overview.html)
+ [LangChain documentation](https://python.langchain.com/docs/get_started/introduction)
+ [Streamlit documentation](https://docs.streamlit.io/)

## Additional information
<a name="develop-advanced-generative-ai-chat-based-assistants-by-using-rag-and-react-prompting-additional"></a>

**AWS CDK commands**

When working with AWS CDK, keep in mind the following useful commands:
+ Lists all stacks in the app

  ```
  cdk ls
  ```
+ Emits the synthesized AWS CloudFormation template

  ```
  cdk synth
  ```
+ Deploys the stack to your default AWS account and Region

  ```
  cdk deploy
  ```
+ Compares the deployed stack with the current state

  ```
  cdk diff
  ```
+ Opens the AWS CDK documentation

  ```
  cdk docs
  ```
+ Deletes the CloudFormation stack and removes AWS deployed resources

  ```
  cdk destroy
  ```

# Develop a fully automated chat-based assistant by using Amazon Bedrock agents and knowledge bases
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases"></a>

*Jundong Qiao, Shuai Cao, Noah Hamilton, Kiowa Jackson, Praveen Kumar Jeyarajan, and Kara Yang, Amazon Web Services*

## Summary
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-summary"></a>

Many organizations face challenges when creating a chat-based assistant that is capable of orchestrating diverse data sources to offer comprehensive answers. This pattern presents a solution for developing a chat-based assistant that is capable of answering queries from both documentation and databases, with a straightforward deployment.

This solution is built on [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html), a fully managed generative artificial intelligence (AI) service that provides a wide array of advanced foundation models (FMs) and facilitates the efficient creation of generative AI applications with a strong focus on privacy and security. In the context of documentation retrieval, [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) is a pivotal feature. It uses [knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html) to augment FM prompts with contextually relevant information from external sources. An [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html) index serves as the vector database behind the knowledge bases for Amazon Bedrock. This integration is enhanced through careful prompt engineering to minimize inaccuracies and to make sure that responses are anchored in factual documentation. For database queries, the FMs of Amazon Bedrock transform textual inquiries into structured SQL queries that incorporate specific parameters. This enables the precise retrieval of data from [AWS Glue databases](https://docs.aws.amazon.com/glue/latest/dg/define-database.html). [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/what-is.html) is used for these queries.

More intricate queries demand comprehensive answers that draw on information from both documentation and databases. [Agents for Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html) is a generative AI feature that helps you build autonomous agents that can understand complex tasks and break them down into simpler tasks for orchestration. The combination of insights retrieved from the simplified tasks, facilitated by Amazon Bedrock autonomous agents, enhances the synthesis of information, leading to more thorough and exhaustive answers. This pattern demonstrates how to build a chat-based assistant by using Amazon Bedrock and the related generative AI services and features within an automated solution.

## Prerequisites and limitations
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ Docker, [installed](https://docs.docker.com/engine/install/)
+ AWS Cloud Development Kit (AWS CDK), [installed](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_tools) and [bootstrapped](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_bootstrap) to the `us-east-1` or `us-west-2` AWS Regions
+ AWS CDK Toolkit version 2.114.1 or later, [installed](https://docs.aws.amazon.com/cdk/v2/guide/cli.html)
+ AWS Command Line Interface (AWS CLI), [installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
+ Python version 3.11 or later, [installed](https://www.python.org/downloads/)
+ In Amazon Bedrock, [enable access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) to Claude 2, Claude 2.1, Claude Instant, and Titan Embeddings G1 – Text

**Limitations**
+ This solution is deployed to a single AWS account.
+ This solution can be deployed only in AWS Regions where Amazon Bedrock and Amazon OpenSearch Serverless are supported. For more information, see the documentation for [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html) and [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-regions).

**Product versions**
+ Llama-index version 0.10.6 or later
+ Sqlalchemy version 2.0.23 or later
+ Opensearch-py version 2.4.2 or later
+ Requests-aws4auth version 1.2.3 or later
+ AWS SDK for Python (Boto3) version 1.34.57 or later

## Architecture
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-architecture"></a>

**Target technology stack**

The [AWS Cloud Development Kit (AWS CDK)](https://docs.aws.amazon.com/cdk/v2/guide/home.html) is an open source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. The AWS CDK stack used in this pattern deploys the following AWS resources: 
+ AWS Key Management Service (AWS KMS)
+ Amazon Simple Storage Service (Amazon S3)
+ AWS Glue Data Catalog, for the AWS Glue database component
+ AWS Lambda
+ AWS Identity and Access Management (IAM)
+ Amazon OpenSearch Serverless
+ Amazon Elastic Container Registry (Amazon ECR) 
+ Amazon Elastic Container Service (Amazon ECS)
+ AWS Fargate
+ Amazon Virtual Private Cloud (Amazon VPC)
+ [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 

**Target architecture**

![\[Architecture diagram using an Amazon Bedrock knowledge base and agent\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/15372718-3a5d-4918-9cfa-422c455f288d/images/ff19152e-0bb6-4758-a6dd-4f6140e55113.png)


The diagram shows a comprehensive AWS cloud-native setup within a single AWS Region, using multiple AWS services. The primary interface for the chat-based assistant is a [Streamlit](https://docs.streamlit.io/) application hosted on an Amazon ECS cluster. An [Application Load Balancer](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/) manages accessibility. Queries made through this interface activate the `Invocation` Lambda function, which then interfaces with agents for Amazon Bedrock. This agent responds to user inquiries by either consulting the knowledge bases for Amazon Bedrock or by invoking an `Agent executor` Lambda function. This function triggers a set of actions associated with the agent, following a predefined API schema. The knowledge bases for Amazon Bedrock use an OpenSearch Serverless index as their vector database foundation. Additionally, the `Agent executor` function generates SQL queries that are executed against the AWS Glue database through Amazon Athena. 
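As a hedged sketch of the `Invocation` Lambda function's role, the following Python outline calls the agent through the `bedrock-agent-runtime` `InvokeAgent` API and assembles the streamed response. The agent and alias IDs are placeholders, and the handler and event shapes are illustrative assumptions, not the repository's actual code:

```python
def collect_completion(events) -> str:
    """Join the streamed "chunk" events from InvokeAgent into answer text."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

def handler(event, context):
    """Hypothetical Invocation Lambda: forward the user query to the agent."""
    import boto3  # imported lazily so the module loads without the AWS SDK

    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId="AGENT_ID",        # placeholder, not a real agent ID
        agentAliasId="ALIAS_ID",   # placeholder
        sessionId=event["session_id"],
        inputText=event["query"],
    )
    return {"answer": collect_completion(response["completion"])}
```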

## Tools
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-tools"></a>

**AWS services**
+ [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/what-is.html) is an interactive query service that helps you analyze data directly in Amazon Simple Storage Service (Amazon S3) by using standard SQL.
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
+ [AWS Cloud Development Kit (AWS CDK)](https://docs.aws.amazon.com/cdk/latest/guide/home.html) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open source tool that helps you interact with AWS services through commands in your command-line shell.
+ [Amazon Elastic Container Service (Amazon ECS)](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html) is a fast and scalable container management service that helps you run, stop, and manage containers on a cluster.
+ [Elastic Load Balancing](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html) distributes incoming application or network traffic across multiple targets. For example, you can distribute traffic across Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, and IP addresses in one or more Availability Zones.
+ [AWS Glue](https://docs.aws.amazon.com/glue/) is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams. This pattern uses an AWS Glue crawler and an AWS Glue Data Catalog table.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html) is an on-demand serverless configuration for Amazon OpenSearch Service. In this pattern, an OpenSearch Serverless index serves as a vector database for the knowledge bases for Amazon Bedrock.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Other tools**
+ [Streamlit](https://docs.streamlit.io/) is an open source Python framework for building data applications.

**Code repository**

The code for this pattern is available in the GitHub [genai-bedrock-agent-chatbot](https://github.com/awslabs/genai-bedrock-agent-chatbot/) repository. The code repository contains the following files and folders:
+ `assets` folder – The static assets, such as the architecture diagram and the public dataset.
+ `code/lambdas/action-lambda` folder – The Python code for the Lambda function that acts as an action for the Amazon Bedrock agent.
+ `code/lambdas/create-index-lambda` folder – The Python code for the Lambda function that creates the OpenSearch Serverless index.
+ `code/lambdas/invoke-lambda` folder – The Python code for the Lambda function that invokes the Amazon Bedrock agent, which is called directly from the Streamlit application.
+ `code/lambdas/update-lambda` folder – The Python code for the Lambda function that updates or deletes resources after the AWS resources are deployed through the AWS CDK.
+ `code/layers/boto3_layer` folder – The AWS CDK stack that creates a Boto3 layer that is shared across all Lambda functions.
+ `code/layers/opensearch_layer` folder – The AWS CDK stack that creates an OpenSearch Serverless layer that installs all dependencies to create the index.
+ `code/streamlit-app` folder – The Python code that is run as the container image in Amazon ECS.
+ `code/code_stack.py` – The AWS CDK construct Python files that create AWS resources.
+ `app.py` – The AWS CDK stack Python files that deploy AWS resources in the target AWS account.
+ `requirements.txt` – The list of all Python dependencies that must be installed for the AWS CDK.
+ `cdk.json` – The input file to provide the values that are required to create resources. You can also customize the solution through the `context/configure` fields. For more information about customization, see the [Additional information](#develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-additional) section.

## Best practices
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-best-practices"></a>
+ The code example provided here is for proof-of-concept (PoC) or pilot purposes only. If you want to take the code to production, be sure to use the following best practices:
  + Enable [Amazon S3 access logging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-server-access-logging.html)
  + Enable [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html)
+ Set up monitoring and alerting for the Lambda functions. For more information, see [Monitoring and troubleshooting Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/lambda-monitoring.html). For best practices, see the [Best practices for working with AWS Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html).

## Epics
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-epics"></a>

### Set up AWS credentials on your local workstation
<a name="set-up-aws-credentials-on-your-local-workstation"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Export variables for the account and Region. | To provide AWS credentials for the AWS CDK by using environment variables, run the following commands.<pre>export CDK_DEFAULT_ACCOUNT=<12-digit AWS account number><br />export CDK_DEFAULT_REGION=<Region></pre> | AWS DevOps, DevOps engineer | 
| Set up the AWS CLI named profile. | To set up the AWS CLI named profile for the account, follow the instructions in [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). | AWS DevOps, DevOps engineer | 

### Set up your environment
<a name="set-up-your-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the repo to your local workstation. | To clone the repository, run the following command in your terminal.<pre>git clone https://github.com/awslabs/genai-bedrock-agent-chatbot.git</pre> | DevOps engineer, AWS DevOps | 
| Set up the Python virtual environment. | To set up the Python virtual environment, run the following commands.<pre>cd genai-bedrock-agent-chatbot<br />python3 -m venv .venv<br />source .venv/bin/activate</pre>To set up the required dependencies, run the following command.<pre>pip3 install -r requirements.txt</pre> | DevOps engineer, AWS DevOps | 
| Set up the AWS CDK environment. | To convert the code to an AWS CloudFormation template, run the command `cdk synth`. | AWS DevOps, DevOps engineer | 

### Configure and deploy the application
<a name="configure-and-deploy-the-application"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Deploy resources in the account. | To deploy resources in the AWS account by using the AWS CDK, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases.html)After successful deployment, you can access the chat-based assistant application by using the URL provided on the **Outputs** tab in the CloudFormation console. | DevOps engineer, AWS DevOps | 

### Clean up all AWS resources in the solution
<a name="clean-up-all-aws-resources-in-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Remove the AWS resources. | After you test the solution, to clean up the resources, run the command `cdk destroy`. | AWS DevOps, DevOps engineer | 

## Related resources
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-resources"></a>

**AWS documentation**
+ Amazon Bedrock resources:
  + [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html)
  + [Inference parameters for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)
  + [Agents for Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html)
  + [Knowledge bases for Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html)
+ [Building Lambda functions with Python](https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html)
+ AWS CDK resources:
  + [Get started with the AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html)
  + [Troubleshooting common AWS CDK issues](https://docs.aws.amazon.com/cdk/v2/guide/troubleshooting.html)
  + [Working with the AWS CDK in Python](https://docs.aws.amazon.com/cdk/v2/guide/work-with-cdk-python.html)
+ [Generative AI Application Builder on AWS](https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/solution-overview.html)

**Other AWS resources**
+ [Vector Engine for Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/serverless-vector-engine/)

**Other resources**
+ [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/)
+ [Streamlit documentation](https://docs.streamlit.io/)

## Additional information
<a name="develop-a-fully-automated-chat-based-assistant-by-using-amazon-bedrock-agents-and-knowledge-bases-additional"></a>

**Customize the chat-based assistant with your own data**

To deploy the solution with your own data, follow these guidelines. They cover both knowledge base data and structured data integration.

*For knowledge base data integration*

**Data preparation**

1. Locate the `assets/knowledgebase_data_source/` directory.

1. Place your dataset within this folder.

**Configuration adjustments**

1. Open the `cdk.json` file.

1. Navigate to the `context/configure/paths/knowledgebase_file_name` field, and then update it with the file name of your dataset.

1. Navigate to the `bedrock_instructions/knowledgebase_instruction` field, and then update it to accurately reflect the nuances and context of your new dataset.

*For structural data integration*

**Data organization**

1. Within the `assets/data_query_data_source/` directory, create a subdirectory, such as `tabular_data`.

1. Put your structured dataset (acceptable formats include CSV, JSON, ORC, and Parquet) into this newly created subfolder.

1. If you are connecting to an existing database, update the function `create_sql_engine()` in `code/lambda/action-lambda/build_query_engine.py` to connect to your database.
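If you are connecting to an existing database, the change to `create_sql_engine()` typically reduces to supplying a SQLAlchemy-style connection URL. The following sketch shows how such a URL might be assembled; the helper name and all parameter values are hypothetical, and the SQLAlchemy call itself is left commented out:

```python
def build_connection_url(dialect: str, user: str, password: str,
                         host: str, port: int, database: str) -> str:
    # Hypothetical helper: assembles a SQLAlchemy-style connection URL
    # that create_sql_engine() could pass to create_engine().
    return f"{dialect}://{user}:{password}@{host}:{port}/{database}"

url = build_connection_url("postgresql+psycopg2", "app_user", "app_password",
                           "db.example.com", 5432, "pets")
# In code/lambda/action-lambda/build_query_engine.py, you would then use:
# from sqlalchemy import create_engine
# engine = create_engine(url)
```

The exact dialect prefix depends on your database and driver; consult the SQLAlchemy documentation for the URL format that matches your engine.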

**Configuration and code updates**

1. In the `cdk.json` file, update the `context/configure/paths/athena_table_data_prefix` field to align with the new data path.

1. Revise `code/lambda/action-lambda/dynamic_examples.csv` by incorporating new text-to-SQL examples that correspond with your dataset.

1. Revise `code/lambda/action-lambda/prompt_templates.py` to mirror the attributes of your structured dataset.

1. In the `cdk.json` file, update the `context/configure/bedrock_instructions/action_group_description` field to explain the purpose and functionality of the `Action group` Lambda function.

1. In the `assets/agent_api_schema/artifacts_schema.json` file, explain the new functionalities of your `Action group` Lambda function.

*General update*

In the `cdk.json` file, in the `context/configure/bedrock_instructions/agent_instruction` section, provide a comprehensive description of the Amazon Bedrock agent's intended functionality and design purpose, taking into account the newly integrated data.
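Taken together, the fields described in this section live under the `context` key of `cdk.json`. The following fragment is purely illustrative; the exact key names, nesting, and values in your clone of the repository are authoritative:

```json
{
  "context": {
    "configure": {
      "paths": {
        "knowledgebase_file_name": "your-dataset.pdf",
        "athena_table_data_prefix": "tabular_data"
      },
      "bedrock_instructions": {
        "agent_instruction": "Describe the agent's overall purpose here.",
        "knowledgebase_instruction": "Describe when to consult the knowledge base.",
        "action_group_description": "Describe what the action group Lambda function does."
      }
    }
  }
}
```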

# Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe"></a>

*Praveen Kumar Jeyarajan, Jundong Qiao, Rajiv Upadhyay, and Megan Wu, Amazon Web Services*

## Summary
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-summary"></a>

Capturing institutional knowledge is paramount for ensuring organizational success and resilience. Institutional knowledge represents the collective wisdom, insights, and experiences accumulated by employees over time, often tacit in nature and passed down informally. This wealth of information encompasses unique approaches, best practices, and solutions to intricate problems that might not be documented elsewhere. By formalizing and documenting this knowledge, companies can preserve institutional memory, foster innovation, enhance decision-making processes, and accelerate learning curves for new employees. Additionally, it promotes collaboration, empowers individuals, and cultivates a culture of continuous improvement. Ultimately, harnessing institutional knowledge helps companies use their most valuable asset—the collective intelligence of their workforce—to navigate challenges, drive growth, and maintain competitive advantage in dynamic business environments.

This pattern explains how to capture institutional knowledge through voice recordings from senior employees. It uses [Amazon Transcribe](https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html) and [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) for systematic documentation and verification. By documenting this informal knowledge, you can preserve it and share it with subsequent cohorts of employees. This endeavor supports operational excellence and improves the effectiveness of training programs through the incorporation of practical knowledge acquired through direct experience.

## Prerequisites and limitations
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ Docker, [installed](https://docs.docker.com/engine/install/)
+ AWS Cloud Development Kit (AWS CDK) version 2.114.1 or later, [installed](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_tools) and [bootstrapped](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_bootstrap) to the `us-east-1` or `us-west-2` AWS Regions
+ AWS CDK Toolkit version 2.114.1 or later, [installed](https://docs.aws.amazon.com/cdk/v2/guide/cli.html)
+ AWS Command Line Interface (AWS CLI), [installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
+ Python version 3.12 or later, [installed](https://www.python.org/downloads/)
+ Permissions to create Amazon Transcribe, Amazon Bedrock, Amazon Simple Storage Service (Amazon S3), and AWS Lambda resources

**Limitations**
+ This solution is deployed to a single AWS account.
+ This solution can be deployed only in AWS Regions where Amazon Bedrock and Amazon Transcribe are available. For information about availability, see the documentation for [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html) and [Amazon Transcribe](https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html#tsc-regions).
+ The audio files must be in a format that Amazon Transcribe supports. For a list of supported formats, see [Media formats](https://docs.aws.amazon.com/transcribe/latest/dg/how-input.html#how-input-audio) in the Transcribe documentation.

**Product versions**
+ AWS SDK for Python (Boto3) version 1.34.57 or later
+ LangChain version 0.1.12 or later

## Architecture
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-architecture"></a>

The architecture represents a serverless workflow on AWS. [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) orchestrates Lambda functions for audio processing, text analysis, and document generation. The following diagram shows the Step Functions workflow, also known as a *state machine*.

![\[Architecture diagram of Step Functions state machine generating a document\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/f1e0106d-b046-4adc-9718-c299efb7b436/images/e90298ca-1b7f-4c3e-97bd-311a9d5a4997.png)


Each step in the state machine is handled by a distinct Lambda function. The following are the steps in the document generation process:

1. The `preprocess` Lambda function validates the input passed to Step Functions and lists all of the audio files present in the provided Amazon S3 URI folder path. Downstream Lambda functions in the workflow use the file list to validate, summarize, and generate the document.

1. The `transcribe` Lambda function uses Amazon Transcribe to convert audio files into text transcripts. This Lambda function is responsible for initiating the transcription process and accurately transforming speech into text, which is then stored for subsequent processing.

1. The `validate` Lambda function analyzes the text transcripts, determining the relevance of the responses to the initial questions. By using a large language model (LLM) through Amazon Bedrock, it identifies and separates on-topic answers from off-topic responses.

1. The `summarize` Lambda function uses Amazon Bedrock to generate a coherent and concise summary of the on-topic answers.

1. The `generate` Lambda function assembles the summaries into a well-structured document. It can format the document according to predefined templates and include any additional necessary content or data.

1. If any of the Lambda functions fail, you receive an email notification through Amazon Simple Notification Service (Amazon SNS).

Throughout this process, AWS Step Functions makes sure that each Lambda function is initiated in the correct sequence. The state machine also supports parallel processing to improve efficiency. An Amazon S3 bucket acts as the central storage repository, supporting the workflow by managing the various media and document formats involved.
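As an illustration of the `transcribe` step, the following sketch assembles the parameters that a Lambda function might pass to the Amazon Transcribe `StartTranscriptionJob` API through Boto3. The bucket, key, and job names are hypothetical, and the actual service call is left commented out:

```python
def build_transcribe_request(bucket: str, key: str, job_name: str) -> dict:
    # Derive the media format from the file extension; Amazon Transcribe
    # accepts formats such as mp3, mp4, wav, and flac.
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": key.rsplit(".", 1)[-1].lower(),
        "LanguageCode": "en-US",
        "OutputBucketName": bucket,
    }

params = build_transcribe_request("my-audio-bucket", "interviews/q1.mp3", "q1-job")
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**params)
```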

## Tools
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-tools"></a>

**AWS services**
+ [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [Amazon Simple Notification Service (Amazon SNS)](https://docs.aws.amazon.com/sns/latest/dg/welcome.html) helps you coordinate and manage the exchange of messages between publishers and clients, including web servers and email addresses.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
+ [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.  
+ [Amazon Transcribe](https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html) is an automatic speech recognition service that uses machine learning models to convert audio to text.

**Other tools**
+ [LangChain](https://python.langchain.com/docs/get_started/introduction/) is a framework for developing applications that are powered by large language models (LLMs).

**Code repository**

The code for this pattern is available in the GitHub [genai-knowledge-capture](https://github.com/aws-samples/genai-knowledge-capture) repository.

The code repository contains the following files and folders:
+ `assets` folder – The static assets for the solution, such as the architecture diagram and the public dataset
+ `code/lambdas` folder – The Python code for all Lambda functions
  + `code/lambdas/generate` folder – The Python code that generates a document from the summarized data in the S3 bucket
  + `code/lambdas/preprocess` folder – The Python code that processes the inputs for the Step Functions state machine
  + `code/lambdas/summarize` folder – The Python code that summarizes the transcribed data by using Amazon Bedrock
  + `code/lambdas/transcribe` folder – The Python code that converts speech data (audio files) into text by using Amazon Transcribe
  + `code/lambdas/validate` folder – The Python code that validates whether all answers pertain to the same topic
+ `code/code_stack.py` – The AWS CDK construct Python file that is used to create AWS resources
+ `app.py` – The AWS CDK app Python file that is used to deploy AWS resources in the target AWS account
+ `requirements.txt` – The list of all Python dependencies that must be installed for the AWS CDK
+ `cdk.json` – The input file to provide values that are required to create resources

## Best practices
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-best-practices"></a>

The code example provided is for proof-of-concept (PoC) or pilot purposes only. If you want to take the solution to production, use the following best practices:
+ Enable [Amazon S3 access logging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-server-access-logging.html)
+ Enable [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html)

## Epics
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-epics"></a>

### Set up AWS credentials on your local workstation
<a name="set-up-aws-credentials-on-your-local-workstation"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Export variables for the account and AWS Region. | To provide AWS credentials for the AWS CDK by using environment variables, run the following commands.<pre>export CDK_DEFAULT_ACCOUNT=<12-digit AWS account number><br />export CDK_DEFAULT_REGION=<Region></pre> | AWS DevOps, DevOps engineer | 
| Set up the AWS CLI named profile. | To set up the AWS CLI named profile for the account, follow the instructions in [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). | AWS DevOps, DevOps engineer | 

### Set up your environment
<a name="set-up-your-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the repo to your local workstation. | To clone the [genai-knowledge-capture](https://github.com/aws-samples/genai-knowledge-capture) repository, run the following command in your terminal.<pre>git clone https://github.com/aws-samples/genai-knowledge-capture</pre> | AWS DevOps, DevOps engineer | 
| (Optional) Replace the audio files. | To customize the sample application to incorporate your own data, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.html) | AWS DevOps, DevOps engineer | 
| Set up the Python virtual environment. | To set up the Python virtual environment, run the following commands.<pre>cd genai-knowledge-capture<br />python3 -m venv .venv<br />source .venv/bin/activate<br />pip install -r requirements.txt</pre> | AWS DevOps, DevOps engineer | 
| Synthesize the AWS CDK code. | To convert the code to an AWS CloudFormation stack configuration, run the following command.<pre>cdk synth</pre> | AWS DevOps, DevOps engineer | 

### Configure and deploy the solution
<a name="configure-and-deploy-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Provision foundation model access. | Enable access to the Anthropic Claude 3 Sonnet model for your AWS account. For instructions, see [Add model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html#model-access-add) in the Bedrock documentation. | AWS DevOps | 
| Deploy resources in the account. | To deploy resources in the AWS account by using the AWS CDK, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.html) | AWS DevOps, DevOps engineer | 
| Subscribe to the Amazon SNS topic. | To subscribe to the Amazon SNS topic for notification, do the following:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.html) | General AWS | 

### Test the solution
<a name="test-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Run the state machine. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.html) | App developer, General AWS | 

### Clean up all AWS resources in the solution
<a name="clean-up-all-aws-resources-in-the-solution"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Remove the AWS resources. | After you test the solution, clean up the resources:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe.html) | AWS DevOps, DevOps engineer | 

## Related resources
<a name="document-institutional-knowledge-from-voice-inputs-by-using-amazon-bedrock-and-amazon-transcribe-resources"></a>

**AWS documentation**
+ Amazon Bedrock resources:
  + [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html)
  + [Inference parameters for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)
+ AWS CDK resources:
  + [Get started with the AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html)
  + [Working with the AWS CDK in Python](https://docs.aws.amazon.com/cdk/v2/guide/work-with-cdk-python.html)
  + [Troubleshooting common AWS CDK issues](https://docs.aws.amazon.com/cdk/v2/guide/troubleshooting.html)
  + [Toolkit commands](https://docs.aws.amazon.com/cdk/v2/guide/cli.html#cli-commands)
+ AWS Step Functions resources:
  + [Getting started with AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/getting-started-with-sfn.html)
  + [Troubleshooting](https://docs.aws.amazon.com/step-functions/latest/dg/troubleshooting.html)
+ [Building Lambda functions with Python](https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html)
+ [Generative AI Application Builder on AWS](https://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/solution-overview.html)

**Other resources**
+ [LangChain documentation](https://python.langchain.com/docs/get_started/introduction)

# Generate personalized and re-ranked recommendations using Amazon Personalize
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize"></a>

*Mason Cahill, Matthew Chasse, and Tayo Olajide, Amazon Web Services*

## Summary
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-summary"></a>

This pattern shows you how to use Amazon Personalize to generate personalized and re-ranked recommendations for your users based on real-time user-interaction data. The example scenario in this pattern is a pet adoption website that generates recommendations for its users based on their interactions (for example, which pets a user visits). In the example, you use Amazon Kinesis Data Streams to ingest interaction data, AWS Lambda to generate and re-rank recommendations, and Amazon Data Firehose to store the data in an Amazon Simple Storage Service (Amazon S3) bucket. You also use AWS Step Functions to build a state machine that manages the solution version (that is, the trained model) that generates your recommendations.

## Prerequisites and limitations
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-prereqs"></a>

**Prerequisites**
+ An active [AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) with a [bootstrapped](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html) AWS Cloud Development Kit (AWS CDK)
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) with configured credentials
+ [Python 3.9](https://www.python.org/downloads/release/python-390/)

**Product versions**
+ Python 3.9
+ AWS CDK 2.23.0 or later
+ AWS CLI 2.7.27 or later

## Architecture
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-architecture"></a>

**Technology stack**
+ Amazon Data Firehose
+ Amazon Kinesis Data Streams
+ Amazon Personalize
+ Amazon Simple Storage Service (Amazon S3)
+ AWS Cloud Development Kit (AWS CDK)
+ AWS Command Line Interface (AWS CLI)
+ AWS Lambda
+ AWS Step Functions

**Target architecture**

The following diagram illustrates a pipeline for ingesting real-time data into Amazon Personalize. The pipeline then uses that data to generate personalized and re-ranked recommendations for users.

![\[Data ingestion architecture for Amazon Personalize\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/42eb193b-2347-408a-8b25-46beeb3b29ca/images/786dbd56-7d7f-41bb-90f6-d4485d73fe15.png)


The diagram shows the following workflow:

1. Kinesis Data Streams ingests real-time user data (for example, pet-visit events) for processing by Lambda and Firehose.

1. A Lambda function processes the records from Kinesis Data Streams and makes an API call that adds the user interaction in each record to an event tracker in Amazon Personalize.

1. A time-based rule invokes a Step Functions state machine and generates new solution versions for the recommendation and re-ranking models by using the events from the event tracker in Amazon Personalize.

1. Amazon Personalize [campaigns](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html) are updated by the state machine to use the new [solution version](https://docs.aws.amazon.com/personalize/latest/dg/creating-a-solution-version.html).

1. Lambda re-ranks the list of recommended items by calling the Amazon Personalize re-ranking campaign.

1. Lambda retrieves the list of recommended items by calling the Amazon Personalize recommendations campaign.

1. Firehose saves the events to an S3 bucket where they can be accessed as historical data.
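To make steps 1 and 2 concrete, the following sketch decodes a Kinesis record and shapes it into a payload for the Amazon Personalize `PutEvents` API. The field names in the incoming payload (`userId`, `sessionId`, `itemId`, `eventType`) are assumptions about the event schema, and the Boto3 call itself is left commented out:

```python
import base64
import json
from datetime import datetime, timezone

def kinesis_record_to_personalize_event(record: dict) -> dict:
    # Kinesis delivers the payload base64-encoded; decode it, then map it
    # onto the put_events shape. The payload field names are illustrative.
    payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
    return {
        "userId": payload["userId"],
        "sessionId": payload["sessionId"],
        "eventList": [{
            "eventType": payload["eventType"],
            "sentAt": datetime.now(timezone.utc),
            # Amazon Personalize expects "properties" as a JSON string.
            "properties": json.dumps({"itemId": payload["itemId"]}),
        }],
    }

record = {"kinesis": {"data": base64.b64encode(json.dumps({
    "userId": "u1", "sessionId": "s1",
    "itemId": "pet-42", "eventType": "visited",
}).encode()).decode()}}

event = kinesis_record_to_personalize_event(record)
# import boto3
# personalize_events = boto3.client("personalize-events")
# personalize_events.put_events(trackingId=TRACKING_ID, **event)
```

The `TRACKING_ID` placeholder stands for the ID of the Amazon Personalize event tracker created by the stack.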

## Tools
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-tools"></a>

**AWS tools**
+ [AWS Cloud Development Kit (AWS CDK)](https://docs.aws.amazon.com/cdk/latest/guide/home.html) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.
+ [Amazon Data Firehose](https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html) helps you deliver real-time [streaming data](https://aws.amazon.com/streaming-data/) to other AWS services, custom HTTP endpoints, and HTTP endpoints owned by supported third-party service providers.
+ [Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/introduction.html) helps you collect and process large streams of data records in real time.
+ [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
+ [Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) is a fully managed machine learning (ML) service that helps you generate item recommendations for your users based on your data.
+ [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) is a serverless orchestration service that helps you combine Lambda functions and other AWS services to build business-critical applications.

**Other tools**
+ [pytest](https://docs.pytest.org/en/7.2.x/index.html) is a Python framework for writing small, readable tests.
+ [Python](https://www.python.org/) is a general-purpose computer programming language.

**Code**

The code for this pattern is available in the GitHub [Animal Recommender](https://github.com/aws-samples/personalize-pet-recommendations) repository. You can use the AWS CloudFormation template from this repository to deploy the resources for the example solution.

**Note**  
The Amazon Personalize solution versions, event tracker, and campaigns are backed by [custom resources](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html) (within the infrastructure) that expand on native CloudFormation resources.

## Epics
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-epics"></a>

### Create the infrastructure
<a name="create-the-infrastructure"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create an isolated Python environment. | **Mac/Linux setup:** [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/generate-personalized-and-re-ranked-recommendations-using-amazon-personalize.html) **Windows setup:** To activate the virtual environment manually, run the `.venv\Scripts\activate.bat` command from your terminal. | DevOps engineer | 
| Synthesize the CloudFormation template. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/generate-personalized-and-re-ranked-recommendations-using-amazon-personalize.html) In step 2, `CDK_ENVIRONMENT` refers to the `config/{env}.yml` file. | DevOps engineer | 
| Deploy resources and create infrastructure. | To deploy the solution resources, run the `./deploy.sh` command from your terminal. This command installs the required Python dependencies. A Python script creates an S3 bucket and an AWS Key Management Service (AWS KMS) key, and then adds the seed data for the initial model creations. Finally, the script runs `cdk deploy` to create the remaining infrastructure. The initial model training happens during stack creation, so it can take up to two hours for stack creation to complete. | DevOps engineer | 

## Related resources
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-resources"></a>
+ [Animal Recommender](https://github.com/aws-samples/personalize-pet-recommendations) (GitHub)
+ [AWS CDK Reference Documentation](https://docs.aws.amazon.com/cdk/api/v2/)
+ [Boto3 Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
+ [Optimize personalized recommendations for a business metric of your choice with Amazon Personalize](https://aws.amazon.com/blogs/machine-learning/optimize-personalized-recommendations-for-a-business-metric-of-your-choice-with-amazon-personalize/) (AWS Machine Learning Blog)

## Additional information
<a name="generate-personalized-and-re-ranked-recommendations-using-amazon-personalize-additional"></a>

**Example payloads and responses**

*Recommendation Lambda function*

To retrieve recommendations, submit a request to the recommendation Lambda function with a payload in the following format:

```
{
  "userId": "3578196281679609099",
  "limit": 6
}
```

The following example response contains a list of animal groups:

```
[{"id": "1-domestic short hair-1-1"},
{"id": "1-domestic short hair-3-3"},
{"id": "1-domestic short hair-3-2"},
{"id": "1-domestic short hair-1-2"},
{"id": "1-domestic short hair-3-1"},
{"id": "2-beagle-3-3"},
```

If you leave out the `userId` field, the function returns general recommendations.
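As a minimal illustration of the payload and response formats above (the helper function names are assumptions, and in practice you would invoke the Lambda function through the AWS SDK or the AWS CLI), the request can be built and the response parsed like this:

```python
import json

def build_recommendation_request(user_id=None, limit=6):
    """Build the payload for the recommendation Lambda function.

    Omitting user_id yields general (non-personalized) recommendations,
    as described above.
    """
    payload = {"limit": limit}
    if user_id is not None:
        payload["userId"] = user_id
    return json.dumps(payload)

def parse_recommended_ids(response_body):
    """The Lambda response is a JSON list of animal-group objects."""
    return [item["id"] for item in json.loads(response_body)]

request = build_recommendation_request(user_id="3578196281679609099")
sample_response = '[{"id": "1-domestic short hair-1-1"}, {"id": "2-beagle-3-3"}]'
ids = parse_recommended_ids(sample_response)
```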

*Re-ranking Lambda function*

To use re-ranking, submit a request to the re-ranking Lambda function. The payload contains the `userId`, the item IDs to be re-ranked, and their metadata. The following example data uses the Oxford Pets classes for `animal_species_id` (1=cat, 2=dog) and integers 1-5 for `animal_age_id` and `animal_size_id`:

```
{
   "userId":"12345",
   "itemMetadataList":[
      {
         "itemId":"1",
         "animalMetadata":{
            "animal_species_id":"2",
            "animal_primary_breed_id":"Saint_Bernard",
            "animal_size_id":"3",
            "animal_age_id":"2"
         }
      },
      {
         "itemId":"2",
         "animalMetadata":{
            "animal_species_id":"1",
            "animal_primary_breed_id":"Egyptian_Mau",
            "animal_size_id":"1",
            "animal_age_id":"1"
         }
      },
      {
         "itemId":"3",
         "animalMetadata":{
            "animal_species_id":"2",
            "animal_primary_breed_id":"Saint_Bernard",
            "animal_size_id":"3",
            "animal_age_id":"2"
         }
      }
   ]
}
```

The Lambda function re-ranks these items, and then returns an ordered list that includes the item IDs and the direct response from Amazon Personalize. The response is a ranked list of the animal groups that the items belong to, along with their scores. Amazon Personalize uses the [User-Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html) and [Personalized-Ranking](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-search.html) recipes to include a score for each item in the recommendations. These scores represent the relative certainty that Amazon Personalize has about which item the user will choose next. Higher scores represent greater certainty.

```
{
   "ranking":[
      "1",
      "3",
      "2"
   ],
   "personalizeResponse":{
      "ResponseMetadata":{
         "RequestId":"a2ec0417-9dcd-4986-8341-a3b3d26cd694",
         "HTTPStatusCode":200,
         "HTTPHeaders":{
            "date":"Thu, 16 Jun 2022 22:23:33 GMT",
            "content-type":"application/json",
            "content-length":"243",
            "connection":"keep-alive",
            "x-amzn-requestid":"a2ec0417-9dcd-4986-8341-a3b3d26cd694"
         },
         "RetryAttempts":0
      },
      "personalizedRanking":[
         {
            "itemId":"2-Saint_Bernard-3-2",
            "score":0.8947961
         },
         {
            "itemId":"1-Siamese-1-1",
            "score":0.105204
         }
      ],
      "recommendationId":"RID-d97c7a87-bd4e-47b5-a89b-ac1d19386aec"
   }
}
```
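To illustrate how the scores determine the ordering, the following minimal sketch (illustrative only, not code from the sample repository) sorts a `personalizedRanking` list by score, highest certainty first:

```python
def rank_by_score(personalized_ranking):
    """Order item IDs by Personalize score, highest certainty first."""
    ordered = sorted(personalized_ranking, key=lambda r: r["score"], reverse=True)
    return [r["itemId"] for r in ordered]

ranking = rank_by_score([
    {"itemId": "1-Siamese-1-1", "score": 0.105204},
    {"itemId": "2-Saint_Bernard-3-2", "score": 0.8947961},
])
```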

*Amazon Kinesis payload*

The payload to send to Amazon Kinesis has the following format:

```
{
    "Partitionkey": "randomstring",
    "Data": {
        "userId": "12345",
        "sessionId": "sessionId4545454",
        "eventType": "DetailView",
        "animalMetadata": {
            "animal_species_id": "1",
            "animal_primary_breed_id": "Russian_Blue",
            "animal_size_id": "1",
            "animal_age_id": "2"
        },
        "animal_id": "98765"
        
    }
}
```

**Note**  
The `userId` field is removed for an unauthenticated user.
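The payload above can be assembled as in the following sketch (the helper name and partition-key scheme are illustrative assumptions; the sample repository may construct the record differently). The `userId` field is included only for authenticated users:

```python
import json
import uuid

def build_kinesis_record(user_id, session_id, event_type,
                         animal_metadata, animal_id, authenticated=True):
    """Assemble the event payload in the format shown above."""
    data = {
        "sessionId": session_id,
        "eventType": event_type,
        "animalMetadata": animal_metadata,
        "animal_id": animal_id,
    }
    if authenticated:
        # userId is omitted for unauthenticated users
        data["userId"] = user_id
    return {"Partitionkey": uuid.uuid4().hex, "Data": data}

record = build_kinesis_record(
    "12345", "sessionId4545454", "DetailView",
    {"animal_species_id": "1", "animal_primary_breed_id": "Russian_Blue",
     "animal_size_id": "1", "animal_age_id": "2"},
    "98765",
)
# With the AWS SDK, you would then put the record on the stream, roughly:
# kinesis.put_record(StreamName=..., PartitionKey=record["Partitionkey"],
#                    Data=json.dumps(record["Data"]))
```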

# Streamline machine learning workflows from local development to scalable experiments by using SageMaker AI and Hydra
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker"></a>

*David Sauerwein, Marco Geiger, and Julian Ferdinand Grueber, Amazon Web Services*

## Summary
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-summary"></a>

This pattern provides a unified approach to configuring and running machine learning (ML) algorithms from local testing to production on Amazon SageMaker AI. ML algorithms are the focus of this pattern, but its approach extends to feature engineering, inference, and whole ML pipelines. This pattern demonstrates the transition from local script development to SageMaker AI training jobs through a sample use case.

A typical ML workflow is to develop and test solutions on a local machine, run large-scale experiments (for example, with different parameters) in the cloud, and deploy the approved solution in the cloud. Then, the deployed solution must be monitored and maintained. Without a unified approach to this workflow, developers often need to refactor their code at each stage. If the solution depends on a large number of parameters that might change at any stage of this workflow, it can become increasingly difficult to remain organized and consistent.

This pattern addresses these challenges. First, it eliminates the need for code refactoring between environments by providing a unified workflow that remains consistent whether running on local machines, in containers, or on SageMaker AI. Second, it simplifies parameter management through Hydra's configuration system, where parameters are defined in separate configuration files that can be easily modified and combined, with automatic logging of each run's configuration. For more details about how this pattern addresses these challenges, see [Additional information](#streamline-machine-learning-workflows-by-using-amazon-sagemaker-additional).

## Prerequisites and limitations
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ An AWS Identity and Access Management (IAM) [user role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html) for deploying and starting the SageMaker AI training jobs
+ AWS Command Line Interface (AWS CLI) version 2.0 or later [installed](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) and [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
+ [Poetry](https://python-poetry.org/) version 1.8 or later, but earlier than 2.0, installed
+ [Docker](https://www.docker.com/) installed
+ Python [version 3.10.x](https://www.python.org/downloads/release/python-31011/)

**Limitations**
+ The code currently targets only SageMaker AI training jobs. Extending it to processing jobs and whole SageMaker AI pipelines is straightforward.
+ For a fully productionized SageMaker AI setup, additional configuration is required, such as custom AWS Key Management Service (AWS KMS) keys for compute and storage, or networking configurations. You can configure these additional options by using Hydra in a dedicated subfolder of the `config` folder.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS Services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html), and choose the link for the service.

## Architecture
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-architecture"></a>

The following diagram depicts the architecture of the solution.

![\[Workflow to create and run SageMaker AI training or HPO jobs.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/1db57484-f85c-49a6-b870-471dade02b26/images/d80e7474-a975-4d92-8f66-2d34e33053fd.png)


The diagram shows the following workflow:

1. The data scientist can iterate over the algorithm at small scale in a local environment, adjust parameters, and test the training script rapidly without the need for Docker or SageMaker AI. (For more details, see the "Run locally for quick testing" task in [Epics](#streamline-machine-learning-workflows-by-using-amazon-sagemaker-epics).)

1. Once satisfied with the algorithm, the data scientist builds and pushes the Docker image to the Amazon Elastic Container Registry (Amazon ECR) repository named `hydra-sm-artifact`. (For more details, see "Run workflows on SageMaker AI" in [Epics](#streamline-machine-learning-workflows-by-using-amazon-sagemaker-epics).)

1. The data scientist initiates either SageMaker AI training jobs or hyperparameter optimization (HPO) jobs by using Python scripts. For regular training jobs, the adjusted configuration is written to the Amazon Simple Storage Service (Amazon S3) bucket named `hydra-sample-config`. For HPO jobs, the default configuration set located in the `config` folder is applied.

1. The SageMaker AI training job pulls the Docker image, reads the input data from the Amazon S3 bucket `hydra-sample-data`, and either fetches the configuration from the Amazon S3 bucket `hydra-sample-config` or uses the default configuration. After training, the job saves the output data to the Amazon S3 bucket `hydra-sample-data`.
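As a rough sketch of what the submission scripts assemble for step 3 (the bucket, repository, and role names come from the workflow above; the job name, instance type, and exact fields are placeholder assumptions, and the repository's scripts may set additional parameters), a training job request might look like this:

```python
# Sketch of a CreateTrainingJob request body. Account ID and Region are
# placeholders; the repository's scripts derive them from the environment.
account_id = "111122223333"
region = "us-east-1"

training_job_request = {
    "TrainingJobName": "hydra-sample-training-job",
    "AlgorithmSpecification": {
        # Image pushed to the hydra-sm-artifact Amazon ECR repository in step 2
        "TrainingImage": f"{account_id}.dkr.ecr.{region}.amazonaws.com/hydra-sm-artifact:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": f"arn:aws:iam::{account_id}:role/hydra-sample-sagemaker",
    "InputDataConfig": [{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": f"s3://hydra-sample-data-{account_id}/hydra-on-sm/input/",
        }},
    }],
    "OutputDataConfig": {
        "S3OutputPath": f"s3://hydra-sample-data-{account_id}/hydra-on-sm/output/",
    },
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge",
                       "InstanceCount": 1, "VolumeSizeInGB": 30},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
# With the AWS SDK for Python, the job would then be submitted roughly as:
# boto3.client("sagemaker").create_training_job(**training_job_request)
```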

**Automation and scale**
+ For automated training, retraining, or inference, you can integrate the AWS CLI code with services like [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html), [AWS CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/welcome.html), or [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html).
+ Scaling can be achieved by changing configurations for instance sizes or by adding configurations for distributed training.

## Tools
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-tools"></a>

**AWS services**
+ [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and AWS Regions.
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open source tool that helps you interact with AWS services through commands in your command-line shell. For this pattern, the AWS CLI is useful for both initial resource configuration and testing.
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable.
+ [Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/?id=docs_gateway) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment. SageMaker AI Training is a fully managed ML service within SageMaker AI that enables the training of ML models at scale. The tool can handle the computational demands of training models efficiently, making use of built-in scalability and integration with other AWS services. SageMaker AI Training also supports custom algorithms and containers, making it flexible for a wide range of ML workflows.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Other tools**
+ [Docker](https://www.docker.com/) is a set of platform as a service (PaaS) products that use virtualization at the operating-system level to deliver software in containers. It was used in this pattern to ensure consistent environments across various stages, from development to deployment, and to package dependencies and code reliably. Docker’s containerization allowed for easy scaling and version control across the workflow.
+ [Hydra](https://hydra.cc/) is a configuration management tool that provides flexibility for handling multiple configurations and dynamic resource management. It is instrumental in managing environment configurations, allowing seamless deployment across different environments. For more details about Hydra, see [Additional information](#streamline-machine-learning-workflows-by-using-amazon-sagemaker-additional).
+ [Python](https://www.python.org/) is a general-purpose computer programming language. Python was used to write the ML code and the deployment workflow.
+ [Poetry](https://python-poetry.org/) is a tool for dependency management and packaging in Python.

**Code repository**

The code for this pattern is available in the GitHub [configuring-sagemaker-training-jobs-with-hydra](https://github.com/aws-samples/configuring-sagemaker-training-jobs-with-hydra) repository.

## Best practices
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-best-practices"></a>
+ Choose an IAM role for deploying and starting the SageMaker AI training jobs that follows the principle of least privilege and grant the minimum permissions required to perform a task. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#grant-least-priv) and [Security best practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the IAM documentation.
+ Use temporary credentials to access the IAM role in the terminal.

## Epics
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-epics"></a>

### Set up the environment
<a name="set-up-the-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create and activate the virtual environment. | To create and activate the virtual environment, run the following commands in the root of the repository:<pre>poetry install <br />poetry shell</pre> | General AWS | 
| Deploy the infrastructure.  | To deploy the infrastructure using CloudFormation, run the following command:<pre>aws cloudformation deploy --template-file infra/hydra-sagemaker-setup.yaml --stack-name hydra-sagemaker-setup  --capabilities CAPABILITY_NAMED_IAM</pre> | General AWS, DevOps engineer | 
| Download the sample data.  | To download the input data from [openml](https://www.openml.org/) to your local machine, run the following command:<pre>python scripts/download_data.py</pre> | General AWS | 
| Run locally for quick testing. | To run the training code locally for testing, run the following command:<pre>python mypackage/train.py data.train_data_path=data/train.csv evaluation.base_dir_path=data</pre>The logs of all executions are stored by execution time in a folder called `outputs`. For more information, see the "Output" section in the [GitHub repository](https://github.com/aws-samples/configuring-sagemaker-training-jobs-with-hydra). You can also perform multiple trainings in parallel, with different parameters, by using the `--multirun` functionality. For more details, see the [Hydra documentation](https://hydra.cc/docs/tutorials/basic/running_your_app/multi-run/). | Data scientist | 

### Run workflows on SageMaker AI
<a name="run-workflows-on-sm"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set the environment variables. | To run your job on SageMaker AI, set the following environment variables, providing your AWS Region and your AWS account ID:<pre>export ECR_REPO_NAME=hydra-sm-artifact<br />export image_tag=latest<br />export AWS_REGION="<your_aws_region>" # for instance, us-east-1<br />export ACCOUNT_ID="<your_account_id>"<br />export BUCKET_NAME_DATA=hydra-sample-data-$ACCOUNT_ID<br />export BUCKET_NAME_CONFIG=hydra-sample-config-$ACCOUNT_ID<br />export AWS_DEFAULT_REGION=$AWS_REGION<br />export ROLE_ARN=arn:aws:iam::${ACCOUNT_ID}:role/hydra-sample-sagemaker<br />export INPUT_DATA_S3_PATH=s3://$BUCKET_NAME_DATA/hydra-on-sm/input/<br />export OUTPUT_DATA_S3_PATH=s3://$BUCKET_NAME_DATA/hydra-on-sm/output/</pre> | General AWS | 
| Create and push the Docker image. | To create the Docker image and push it to the Amazon ECR repository, run the following commands:<pre>chmod +x scripts/create_and_push_image.sh<br />scripts/create_and_push_image.sh $ECR_REPO_NAME $image_tag $AWS_REGION $ACCOUNT_ID</pre>This task assumes that you have valid credentials in your environment. The Docker image is pushed to the Amazon ECR repository specified in the environment variable in the previous task and is used to run the SageMaker AI container for the training job. | ML engineer, General AWS | 
| Copy input data to Amazon S3. | The SageMaker AI training job needs to pick up the input data. To copy the input data to the Amazon S3 bucket for data, run the following command: <pre>aws s3 cp data/train.csv "${INPUT_DATA_S3_PATH}train.csv" </pre> | Data engineer, General AWS | 
| Submit SageMaker AI training jobs. | To simplify the execution of your scripts, specify default configuration parameters in the `default.yaml` file. In addition to ensuring consistency across runs, this approach also offers the flexibility to easily override default settings as needed. See the following example:<pre>python scripts/start_sagemaker_training_job.py sagemaker.role_arn=$ROLE_ARN sagemaker.config_s3_bucket=$BUCKET_NAME_CONFIG sagemaker.input_data_s3_path=$INPUT_DATA_S3_PATH sagemaker.output_data_s3_path=$OUTPUT_DATA_S3_PATH</pre> | General AWS, ML engineer, Data scientist | 
| Run SageMaker AI hyperparameter tuning. | Running SageMaker AI hyperparameter tuning is similar to submitting a SageMaker AI training job. However, the execution script differs in some important ways, as you can see in the [start_sagemaker_hpo_job.py](https://github.com/aws-samples/configuring-sagemaker-training-jobs-with-hydra/blob/main/scripts/start_sagemaker_hpo_job.py) file. The hyperparameters to be tuned must be passed through the boto3 payload, not a channel to the training job. To start the hyperparameter optimization (HPO) job, run the following command:<pre>python scripts/start_sagemaker_hpo_job.py sagemaker.role_arn=$ROLE_ARN sagemaker.config_s3_bucket=$BUCKET_NAME_CONFIG sagemaker.input_data_s3_path=$INPUT_DATA_S3_PATH sagemaker.output_data_s3_path=$OUTPUT_DATA_S3_PATH</pre> | Data scientist | 

## Troubleshooting
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| Expired token | Export fresh AWS credentials. | 
| Lack of IAM permissions | Make sure that you export the credentials of an IAM role that has all the required IAM permissions to deploy the CloudFormation template and to start the SageMaker AI training jobs. | 

## Related resources
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-resources"></a>
+ [Train a model with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html) (AWS documentation)
+ [What is Hyperparameter Tuning?](https://aws.amazon.com/what-is/hyperparameter-tuning/#:~:text=Hyperparameter%20tuning%20allows%20data%20scientists,the%20model%20as%20a%20hyperparameter.)

## Additional information
<a name="streamline-machine-learning-workflows-by-using-amazon-sagemaker-additional"></a>

This pattern addresses the following challenges:

**Consistency from local development to at-scale deployment** – With this pattern, developers can use the same workflow, regardless of whether they’re using local Python scripts, running local Docker containers, conducting large experiments on SageMaker AI, or deploying in production on SageMaker AI. This consistency is important for the following reasons:
+ **Faster iteration** – It allows for fast, local experimentation without the need for major adjustments when scaling up.
+ **No refactoring** – Transitioning to larger experiments on SageMaker AI is seamless, requiring no overhaul of the existing setup.
+ **Continuous improvement** – Developing new features and continuously improving the algorithm is straightforward because the code remains the same across environments.

**Configuration management** – This pattern makes use of [Hydra](https://hydra.cc/), a configuration management tool, to provide the following benefits:
+ Parameters are defined in configuration files, separate from the code.
+ Different parameter sets can be swapped or combined easily.
+ Experiment tracking is simplified because each run's configuration is logged automatically.
+ Cloud experiments can use the same configuration structure as local runs, ensuring consistency.

With Hydra, you can manage configuration effectively, enabling the following features:
+ **Divide configurations** – Break your project configurations into smaller, manageable pieces that can be independently modified. This approach makes it easier to handle complex projects.
+ **Adjust defaults easily** – Change your baseline configurations quickly, making it simpler to test new ideas.
+ **Align CLI inputs and config files** – Combine command line inputs with your configuration files smoothly. This approach reduces clutter and confusion, making your project more manageable over time.
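Conceptually, the override mechanism described above can be reduced to a few lines. The following is a simplified stdlib illustration of the idea, not Hydra's actual implementation; the dotted-key syntax matches the CLI overrides used in the Epics (for example, `data.train_data_path=...`):

```python
import copy

def apply_overrides(defaults, overrides):
    """Merge Hydra-style dotted "key=value" overrides into a nested config."""
    config = copy.deepcopy(defaults)
    for override in overrides:
        dotted_key, value = override.split("=", 1)
        node = config
        *path, leaf = dotted_key.split(".")
        for key in path:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config

defaults = {"data": {"train_data_path": "data/train.csv"},
            "evaluation": {"base_dir_path": "data"}}
config = apply_overrides(defaults, ["data.train_data_path=s3://bucket/train.csv"])
```

Hydra layers this merging across multiple configuration files and logs the resolved configuration of every run, which is what makes local runs and cloud experiments share one structure.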

 

# Translate natural language into query DSL for OpenSearch and Elasticsearch queries
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch"></a>

*Tabby Ward, Nicholas Switzer, and Breanne Warner, Amazon Web Services*

## Summary
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-summary"></a>

This pattern demonstrates how to use large language models (LLMs) to convert natural language queries into query domain-specific language (query DSL), which makes it easier for users to interact with search services such as OpenSearch and Elasticsearch without extensive knowledge of the query language. This resource is particularly valuable for developers and data scientists who want to enhance search-based applications with natural language querying capabilities, ultimately improving user experience and search functionality.

The pattern illustrates techniques for prompt engineering, iterative refinement, and incorporation of specialized knowledge, all of which are crucial in synthetic data generation. Although this approach focuses primarily on query conversion, it implicitly demonstrates the potential for data augmentation and scalable synthetic data production. This foundation could be extended to more comprehensive synthetic data generation tasks, to highlight the power of LLMs in bridging unstructured natural language inputs with structured, application-specific outputs.

This solution doesn't involve migration or deployment tools in the traditional sense. Instead, it focuses on demonstrating a proof of concept (PoC) for converting natural language queries to query DSL by using LLMs.
+ The pattern uses a Jupyter notebook as a step-by-step guide for setting up the environment and implementing the text-to-query conversion.
+ It uses Amazon Bedrock to access LLMs, which are crucial for interpreting natural language and generating appropriate queries.
+ The solution is designed to work with Amazon OpenSearch Service. You can follow a similar process for Elasticsearch, and the generated queries could potentially be adapted for similar search engines.

[Query DSL](https://opensearch.org/docs/latest/query-dsl/) is a flexible, JSON-based search language that's used to construct complex queries in both Elasticsearch and OpenSearch. It enables you to specify queries in the query parameter of search operations, and supports various query types. A DSL query includes leaf queries and compound queries. Leaf queries search for specific values in certain fields and encompass full-text, term-level, geographic, joining, span, and specialized queries. Compound queries act as wrappers for multiple leaf or compound clauses, and combine their results or modify their behavior. Query DSL supports the creation of sophisticated searches, ranging from simple, match-all queries to complex, multi-clause queries that produce highly specific results. Query DSL is particularly valuable for projects that require advanced search capabilities, flexible query construction, and JSON-based query structures.
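For example, a compound `bool` query that wraps two leaf clauses (a full-text `match` query and a term-level `range` filter) might look like the following. The index fields are illustrative, not taken from the notebook; the dict is built in Python and serialized to the JSON that OpenSearch or Elasticsearch expects:

```python
import json

# A compound bool query wrapping two leaf queries: a full-text match
# clause and a term-level range filter.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "wireless headphones"}}
            ],
            "filter": [
                {"range": {"price": {"lte": 200}}}
            ],
        }
    },
    "size": 10,
}
body = json.dumps(query)
```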

This pattern uses techniques such as few-shot prompting, system prompts, structured output, prompt chaining, context provision, and task-specific prompts for text-to-query DSL conversion. For definitions and examples of these techniques, see the [Additional information](#translate-natural-language-query-dsl-opensearch-elasticsearch-additional) section.
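As a minimal sketch of the few-shot approach (the example pairs, field names, and prompt layout are illustrative assumptions, not the notebook's actual prompts), a prompt for the LLM can be assembled like this:

```python
import json

# System prompt constrains the model to structured output.
SYSTEM_PROMPT = (
    "You convert natural language search requests into OpenSearch query DSL. "
    "Respond with JSON only."
)

# Illustrative few-shot example pairs; a real notebook would curate these
# for the target index mapping.
FEW_SHOT_EXAMPLES = [
    ("Find laptops under 500 dollars",
     {"query": {"bool": {"must": [{"match": {"category": "laptop"}}],
                         "filter": [{"range": {"price": {"lt": 500}}}]}}}),
]

def build_prompt(user_query):
    """Combine the system prompt, few-shot examples, and the user query."""
    lines = [SYSTEM_PROMPT, ""]
    for text, dsl in FEW_SHOT_EXAMPLES:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {json.dumps(dsl)}")
    lines.append(f"Input: {user_query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_prompt("red shoes in size 10")
```

The resulting string would then be sent to a model through Amazon Bedrock; prompt chaining repeats this step with the model's output folded into the next prompt.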

## Prerequisites and limitations
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-prereqs"></a>

**Prerequisites**

To effectively use the Jupyter notebook for converting natural language queries into query DSL queries, you need:
+ **Familiarity with Jupyter notebooks**. Basic understanding of how to navigate and run code in a Jupyter notebook environment.
+ **Python environment**. A working Python environment, preferably Python 3.x, with the necessary libraries installed.
+ **Elasticsearch or OpenSearch knowledge**. Basic knowledge of Elasticsearch or OpenSearch, including its architecture and how to perform queries.
+ **AWS account**. An active AWS account to access Amazon Bedrock and other related services.
+ **Libraries and dependencies**. Installation of specific libraries mentioned in the notebook, such as `boto3` for AWS interaction, and any other dependencies required for LLM integration.
+ **Model access within Amazon Bedrock**. This pattern uses three Claude LLMs from Anthropic. Open the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/) and choose **Model access**. On the next screen, choose **Enable specific models** and select these three models:
  + Claude 3 Sonnet
  + Claude 3.5 Sonnet
  + Claude 3 Haiku
+ **Proper IAM policies and IAM role**. To run the notebook in an AWS account, your AWS Identity and Access Management (IAM) role requires the `AmazonSageMakerFullAccess` policy as well as the policy that’s provided in the [Additional information](#translate-natural-language-query-dsl-opensearch-elasticsearch-additional) section, which you can name `APGtext2querydslpolicy`. This policy includes permissions for subscribing to the three Claude models listed.

Having these prerequisites in place ensures a smooth experience when you work with the notebook and implement the text-to-query functionality.

**Limitations**
+ **Proof of concept status**. This project is primarily intended as a proof of concept (PoC). It demonstrates the potential of using LLMs to convert natural language queries into query DSL, but it might not be fully optimized or production-ready.
+ **Model limitations**:

  **Context window constraints**. When using the LLMs that are available on Amazon Bedrock, be aware of the context window limitations:

  Claude models (as of September 2024):
  + Claude 3 Opus: 200,000 tokens
  + Claude 3 Sonnet: 200,000 tokens
  + Claude 3 Haiku: 200,000 tokens

  Other models on Amazon Bedrock might have different context window sizes. Always check the documentation for the latest information.

  **Model availability**. The availability of specific models on Amazon Bedrock can vary. Make sure that you have access to the required models before you implement this solution.
+ **Additional limitations**
  + **Query complexity**. The effectiveness of the natural language to query DSL conversion might vary depending on the complexity of the input query.
  + **Version compatibility**. The generated query DSL might need adjustments based on the specific version of Elasticsearch or OpenSearch that you use.
  + **Performance**. This pattern provides a PoC implementation, so query generation speed and accuracy might not be optimal for large-scale production use.
  + **Cost**. Using LLMs in Amazon Bedrock incurs costs. Be aware of the pricing structure for your chosen model. For more information, see [Amazon Bedrock pricing](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-pricing.html).
  + **Maintenance**. Regular updates to the prompts and model selection might be necessary to keep up with advancements in LLM technology and changes in query DSL syntax.

**Product versions**

This solution was tested in Amazon OpenSearch Service. If you want to use Elasticsearch, you might have to make some changes to replicate the exact functionality of this pattern.
+ **OpenSearch version compatibility**. OpenSearch maintains backward compatibility within major versions. For example:
  + OpenSearch 1.x clients are generally compatible with OpenSearch 1.x clusters.
  + OpenSearch 2.x clients are generally compatible with OpenSearch 2.x clusters.

  However, it's always best to use the same minor version for both client and cluster when possible.
+ **OpenSearch API compatibility**. OpenSearch maintains API compatibility with Elasticsearch OSS 7.10.2 for most operations. However, some differences exist, especially in newer versions.
+ **OpenSearch upgrade considerations**:
  + Direct downgrades are not supported. Use snapshots for rollback if needed.
  + When you upgrade, check the [compatibility matrix and release notes](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-operations.html) for any breaking changes.

**Elasticsearch considerations**
+ **Elasticsearch version**. The major version of Elasticsearch you're using is crucial, because query syntax and features can change between major versions. Currently, the latest stable version is Elasticsearch 8.x. Make sure that your queries are compatible with your specific Elasticsearch version.
+ **Elasticsearch query DSL library version**. If you're using the Elasticsearch query DSL Python library, make sure that its version matches your Elasticsearch version. For example:
  + For Elasticsearch 8.x, use an `elasticsearch-dsl` version that's greater than or equal to 8.0.0 but less than 9.0.0.
  + For Elasticsearch 7.x, use an `elasticsearch-dsl` version that's greater than or equal to 7.0.0 but less than 8.0.0.
+ **Client library version**. Whether you're using the official Elasticsearch client or a language-specific client, make sure that it's compatible with your Elasticsearch version.
+ **Query DSL version**. Query DSL evolves with Elasticsearch versions. Some query types or parameters might be deprecated or introduced in different versions.
+ **Mapping version**. The way you define mappings for your indexes can change between versions. Always check the mapping documentation for your specific Elasticsearch version.
+ **Analysis tools versions**. If you're using analyzers, tokenizers, or other text analysis tools, their behavior or availability might change between versions.
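
The version constraints above can be captured in a `requirements.txt` file so that pip enforces the match. This is a sketch that assumes an Elasticsearch 8.x cluster; adjust the bounds for your cluster version:

```
elasticsearch>=8.0.0,<9.0.0
elasticsearch-dsl>=8.0.0,<9.0.0
```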

## Architecture
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-architecture"></a>

**Target architecture**

The following diagram illustrates the architecture for this pattern.

![\[Architecture for translating natural language to query DSL in Amazon Bedrock.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/75296405-2893-4328-9551-9bcc6ec7fd3e/images/ffb1b893-d23c-4e1c-b679-8063b4f85a8a.png)


The workflow consists of the following steps:

1. User input and system prompt with few-shot prompting examples. The process begins with a user who provides a natural language query or a request for schema generation.

1. Amazon Bedrock. The input is sent to Amazon Bedrock, which serves as the interface to access the Claude LLM.

1. Claude 3 Sonnet LLM. Amazon Bedrock uses Claude 3 Sonnet from the Claude 3 family of LLMs to process the input. It interprets and generates the appropriate Elasticsearch or OpenSearch query DSL. For schema requests, it generates synthetic Elasticsearch or OpenSearch mappings.

1. Query DSL generation. For natural language queries, the application takes the LLM's output and formats it into a valid Elasticsearch or OpenSearch Service query DSL.

1. Synthetic data generation. The application also takes schemas to create synthetic Elasticsearch or OpenSearch data to be loaded into an OpenSearch Serverless collection for testing.

1. OpenSearch or Elasticsearch. The generated query DSL is run against an OpenSearch Serverless collection across all indexes. The JSON output contains the relevant data and the number of *hits* from the data that resides in the OpenSearch Serverless collection.

**Automation and scale**

The code that's provided with this pattern is built strictly for PoC purposes. The following list provides a few suggestions for automating and scaling the solution further and moving the code to production. These enhancements are outside the scope of this pattern.
+ Containerization:
  + Dockerize the application to ensure consistency across different environments.
  + Use container orchestration platforms such as Amazon Elastic Container Service (Amazon ECS) or Kubernetes for scalable deployments.
+ Serverless architecture:
  + Convert the core functionality into AWS Lambda functions.
  + Use Amazon API Gateway to create RESTful endpoints for the natural language query input.
+ Asynchronous processing:
  + Implement Amazon Simple Queue Service (Amazon SQS) to queue incoming queries.
  + Use AWS Step Functions to orchestrate the workflow of processing queries and generating query DSL.
+ Caching:
  + Implement a mechanism to cache the prompts.
+ Monitoring and logging:
  + Use Amazon CloudWatch for monitoring and alerting.
  + Implement centralized logging with Amazon CloudWatch Logs or Amazon OpenSearch Service for log analytics.
+ Security enhancements:
  + Implement IAM roles for fine-grained access control.
  + Use AWS Secrets Manager to securely store and manage API keys and credentials.
+ Multi-Region deployment:
  + Consider deploying the solution across multiple AWS Regions for improved latency and disaster recovery.
  + Use Amazon Route 53 for intelligent request routing.
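
The caching suggestion can be prototyped with a small in-memory layer keyed on a hash of the full prompt, so that repeated questions skip the model invocation. This is a minimal sketch; `call_llm` is a stand-in for the Amazon Bedrock invocation, and in production you would back it with Amazon ElastiCache or Amazon DynamoDB (with a TTL) instead of a dict:

```python
import hashlib
import json

_cache = {}

def cached_generate(call_llm, system_prompt, message, **params):
    """Return a cached response when the exact prompt was seen before."""
    key_material = json.dumps(
        {"system": system_prompt, "message": message, "params": params},
        sort_keys=True,
    )
    key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
    if key not in _cache:
        # Only invoke the model on a cache miss
        _cache[key] = call_llm(system_prompt, message, **params)
    return _cache[key]
```

Note that caching on the exact prompt text only helps for repeated identical questions; semantically similar questions still miss the cache.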

By implementing these suggestions, you can transform this PoC into a robust, scalable, and production-ready solution. We recommend that you thoroughly test each component and the entire system before full deployment.

## Tools
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-tools"></a>

**Tools**
+ [Amazon SageMaker AI notebooks](https://aws.amazon.com/sagemaker/notebooks/) are fully managed Jupyter notebooks for machine learning development. This pattern uses notebooks as an interactive environment for data exploration, model development, and experimentation in Amazon SageMaker AI. Notebooks provide seamless integration with other SageMaker AI features and AWS services.
+ [Python](https://www.python.org/) is a general-purpose computer programming language. This pattern uses Python as the core language to implement the solution.
+ [Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API. Amazon Bedrock provides access to LLMs for natural language processing. This pattern uses Anthropic Claude 3 models.
+ [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) is a software development kit that helps you integrate your Python application, library, or script with AWS services, including Amazon Bedrock.
+ [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) is a managed service that helps you deploy, operate, and scale OpenSearch Service clusters in the AWS Cloud. This pattern uses OpenSearch Service as the target system for generating query DSL.

**Code repository**

The code for this pattern is available in the GitHub [Prompt Engineering Text-to-QueryDSL Using Claude 3 Models](https://github.com/aws-samples/text-to-queryDSL/blob/main/text2ES_prompting_guide.ipynb) repository. The example uses a health social media app that creates posts for users and user profiles associated with the health application.

## Best practices
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-best-practices"></a>

When working with this solution, consider the following:
+ The need for proper AWS credentials and permissions to access Amazon Bedrock
+ Potential costs associated with using AWS services and LLMs
+ The importance of understanding query DSL to validate and potentially modify the generated queries

## Epics
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-epics"></a>

### Set up the environment and prepare data
<a name="set-up-the-environment-and-prepare-data"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up the development environment. | For detailed instructions and code for this and the other steps in this pattern, see the comprehensive walkthrough in the [GitHub repository](https://github.com/aws-samples/text-to-queryDSL/blob/main/text2ES_prompting_guide.ipynb).[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/translate-natural-language-query-dsl-opensearch-elasticsearch.html) | Python, pip, AWS SDK | 
| Set up AWS access. | Set up the Amazon Bedrock client and SageMaker AI session. Retrieve the Amazon Resource Name (ARN) for the SageMaker AI execution role for later use in creating the OpenSearch Serverless collection. | IAM, AWS CLI, Amazon Bedrock, Amazon SageMaker | 
| Load health app schemas. | Read and parse JSON schemas for health posts and user profiles from predefined files. Convert schemas to strings for later use in prompts. | DevOps engineer, General AWS, Python, JSON | 

### Generate synthetic data
<a name="generate-synthetic-data"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create an LLM-based data generator. | Implement the **generate_data()** function to call the Amazon Bedrock Converse API with Claude 3 models. Set up model IDs for Sonnet, Sonnet 3.5, and Haiku:<pre>model_id_sonnet3_5 = "anthropic.claude-3-5-sonnet-20240620-v1:0" <br />model_id_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0" <br />model_id_haiku = "anthropic.claude-3-haiku-20240307-v1:0"</pre> | Python, Amazon Bedrock API, LLM prompting | 
| Create synthetic health posts. | Use the **generate_data()** function with a specific message prompt to create synthetic health post entries based on the provided schema. The function call looks like this: <pre>health_post_data = generate_data(bedrock_rt, model_id_sonnet, system_prompt, message_healthpost, inference_config)</pre> | Python, JSON | 
| Create synthetic user profiles. | Use the **generate_data()** function with a specific message prompt to create synthetic user profile entries based on the provided schema. This is similar to health posts generation, but uses a different prompt. | Python, JSON | 
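
The **generate_data()** calls in the table above can be sketched around the Amazon Bedrock Converse API. The request shape follows the boto3 `bedrock-runtime` `converse` operation; the prompts and inference settings shown are illustrative:

```python
model_id_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_converse_request(system_prompt, user_message, inference_config):
    """Assemble the keyword arguments expected by converse()."""
    return {
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_message}]}],
        "inferenceConfig": inference_config,
    }

def generate_data(client, model_id, system_prompt, user_message, inference_config):
    """Call a Claude 3 model through the Converse API and return the text reply."""
    request = build_converse_request(system_prompt, user_message, inference_config)
    response = client.converse(modelId=model_id, **request)
    return response["output"]["message"]["content"][0]["text"]

# Example (requires AWS credentials and Amazon Bedrock model access):
# import boto3
# bedrock_rt = boto3.client("bedrock-runtime")
# health_post_data = generate_data(
#     bedrock_rt, model_id_sonnet,
#     "You generate synthetic JSON documents that match the given schema.",
#     "Generate 5 health posts that conform to the health post schema.",
#     {"maxTokens": 2048, "temperature": 0.7},
# )
```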

### Set up OpenSearch and ingest data
<a name="set-up-opensearch-and-ingest-data"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up an OpenSearch Serverless collection. | Use Boto3 to create an OpenSearch Serverless collection with appropriate encryption, network, and access policies. The collection creation looks like this: <pre>collection = aoss_client.create_collection(name=es_name, type='SEARCH')</pre> For more information about OpenSearch Serverless, see the [AWS documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless.html). | OpenSearch Serverless, IAM | 
| Define OpenSearch indexes. | Create indexes for health posts and user profiles by using the OpenSearch client, based on the predefined schema mappings. The index creation looks like this:<pre>response_health = oss_client.indices.create(healthpost_index, body=healthpost_body)</pre> | OpenSearch, JSON | 
| Load data into OpenSearch. | Run the **ingest_data()** function to bulk insert the synthetic health posts and user profiles into their respective OpenSearch indexes. The function uses the bulk helper from `opensearch-py`:<pre>success, failed = bulk(oss_client, actions)</pre> | Python, OpenSearch API, bulk data operations | 
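
The bulk insert in **ingest_data()** relies on the `opensearch-py` bulk helper, which consumes a list of action dictionaries. A minimal sketch of building those actions (the index name is illustrative, following this pattern's health app example):

```python
def build_bulk_actions(index_name, documents):
    """Convert raw documents into the action dicts the bulk helper expects."""
    return [{"_index": index_name, "_source": doc} for doc in documents]

# With an OpenSearch client and opensearch-py installed:
# from opensearchpy.helpers import bulk
# actions = build_bulk_actions("healthpost-index", health_post_data)
# success, failed = bulk(oss_client, actions)
```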

### Generate and run queries
<a name="generate-and-run-queries"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Design few-shot prompt examples. | Generate example queries and corresponding natural language questions by using Claude 3 models to serve as few-shot examples for query generation. The system prompt includes these examples:<pre>system_prompt_query_generation = [{"text": f"""You are an expert query dsl generator. ... Examples: {example_prompt} ..."""}]</pre> | LLM prompting, query DSL | 
| Create a text-to-query DSL converter. | Implement the system prompt, which includes schemas, data, and few-shot examples, for query generation. Use the system prompt to convert natural language queries to query DSL. The function call looks like this:<pre>query_response = generate_data(bedrock_rt, model_id, system_prompt_query_generation, query, inference_config)</pre> | Python, Amazon Bedrock API, LLM prompting | 
| Test query DSL on OpenSearch. | Use the **query_oss()** function to run the generated query DSL against the OpenSearch Serverless collection and return results. The function uses the OpenSearch client's search method:<pre>response = oss_client.search(index="_all", body=temp)</pre> | Python, OpenSearch API, query DSL | 
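
Before the search call, **query_oss()** has to extract the JSON block from the model's reply. A minimal sketch of that extraction, assuming the model wraps its answer in `<json>` tags as the structured-output prompt requests (adjust the pattern if your prompt asks for fenced code blocks instead):

```python
import json
import re

def extract_query_dsl(llm_reply):
    """Pull the JSON query DSL out of <json>...</json> tags in the model reply."""
    match = re.search(r"<json>(.*?)</json>", llm_reply, re.DOTALL)
    if not match:
        raise ValueError("No <json> block found in the model output")
    return json.loads(match.group(1))

# def query_oss(llm_reply):
#     body = extract_query_dsl(llm_reply)
#     return oss_client.search(index="_all", body=body)
```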

### Test and evaluate
<a name="test-and-evaluate"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a test query set. | Use Claude 3 to generate a diverse set of test questions based on the synthetic data and schemas:<pre>test_queries = generate_data(bedrock_rt, model_id_sonnet, query_system_prompt, query_prompt, inference_config)</pre> | LLM prompting | 
| Assess the accuracy of the query DSL conversion. | Test the generated query DSL by running queries against OpenSearch and analyzing the returned results for relevance and accuracy. This involves running the query and inspecting the hits:<pre>output = query_oss(response1)<br />print("Response after running query against OpenSearch")<br />print(output)</pre> | Python, data analysis, query DSL | 
| Benchmark Claude 3 models. | Compare the performance of different Claude 3 models (Haiku, Sonnet, Sonnet 3.5) for query generation in terms of accuracy and latency. To compare, change the `model_id` when you call **generate_data()** and measure execution time. | Python, performance benchmarking | 
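
The model comparison in the last task can be automated with a small timing harness. This is a sketch; `generate_fn` stands in for a wrapper around **generate_data()** with the system prompt and inference configuration already bound:

```python
import time

def benchmark_models(generate_fn, model_ids, questions):
    """Return the mean query-generation latency (seconds) per model."""
    results = {}
    for model_id in model_ids:
        latencies = []
        for question in questions:
            start = time.perf_counter()
            generate_fn(model_id, question)  # latency includes the full round trip
            latencies.append(time.perf_counter() - start)
        results[model_id] = sum(latencies) / len(latencies)
    return results
```

Accuracy still has to be judged separately, for example by running each generated query through **query_oss()** and inspecting the hits.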

### Clean up and document
<a name="clean-up-and-document"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Develop a cleanup process. | Delete all indexes from the OpenSearch Serverless collection after use. | Python, AWS SDK, OpenSearch API | 

## Related resources
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-resources"></a>
+ [Query DSL](https://opensearch.org/docs/latest/query-dsl/) (OpenSearch documentation)
+ [Amazon OpenSearch Service documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html)
+ [OpenSearch Serverless collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-manage.html)
+ [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
+ [Amazon SageMaker AI documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html)
+ [AWS SDK for Python (Boto3) documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)

## Additional information
<a name="translate-natural-language-query-dsl-opensearch-elasticsearch-additional"></a>

**IAM policy**

Here’s the `APGtext2querydslpolicy` policy for the IAM role used in this pattern:

```
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", 
      "Action": [ 
        "bedrock:InvokeModel", 
        "bedrock:InvokeModelWithResponseStream"
      ], 
      "Resource": "*" 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "s3:GetObject", 
        "s3:PutObject", 
        "s3:ListBucket"
      ], 
      "Resource": [
        "arn:aws:s3:::sagemaker-*", 
        "arn:aws:s3:::sagemaker-*/*" 
      ] 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "logs:CreateLogGroup", 
        "logs:CreateLogStream", 
        "logs:PutLogEvents" 
      ], 
      "Resource": "arn:aws:logs:*:*:log-group:/aws/sagemaker/*" 
    }, 
    { "Effect": "Allow", 
      "Action": [
        "ec2:CreateNetworkInterface", 
        "ec2:DescribeNetworkInterfaces", 
        "ec2:DeleteNetworkInterface" 
      ], 
      "Resource": "*" 
    }, 
    { "Effect": "Allow", 
      "Action": [
        "aoss:*" 
      ], 
      "Resource": "*" 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "iam:PassRole", 
        "sagemaker:*" 
      ], 
      "Resource": [ 
        "arn:aws:iam::*:role/*", "*" 
      ], 
      "Condition": { 
        "StringEquals": { 
          "iam:PassedToService": "sagemaker.amazonaws.com" 
          } 
      } 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "codecommit:GetBranch", 
        "codecommit:GetCommit", 
        "codecommit:GetRepository", 
        "codecommit:ListBranches", 
        "codecommit:ListRepositories" 
      ], 
      "Resource": "*" 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "aws-marketplace:Subscribe" 
      ], 
      "Resource": "*", 
      "Condition": {
        "ForAnyValue:StringEquals": { 
          "aws-marketplace:ProductId": [ 
            "prod-6dw3qvchef7zy", 
            "prod-m5ilt4siql27k", 
            "prod-ozonys2hmmpeu" 
          ]
        } 
      } 
    }, 
    { "Effect": "Allow", 
      "Action": [ 
        "aws-marketplace:Unsubscribe", 
        "aws-marketplace:ViewSubscriptions" 
      ], 
      "Resource": "*" 
    }, 
    { "Effect": "Allow", 
      "Action": "iam:*", 
      "Resource": "*" 
    } 
  ] 
}
```

**Prompt techniques with Anthropic Claude 3 models**

This pattern demonstrates the following prompting techniques for text-to-query DSL conversion using Claude 3 models.
+ **Few-shot prompting:** Few-shot prompting is a powerful technique for improving the performance of Claude 3 models on Amazon Bedrock. This approach involves providing the model with a small number of examples that demonstrate the desired input/output behavior before asking it to perform a similar task. When you use Claude 3 models on Amazon Bedrock, few-shot prompting can be particularly effective for tasks that require specific formatting, reasoning patterns, or domain knowledge. To implement this technique, you typically structure your prompt with two main components: the example section and the actual query. The example section contains one or more input/output pairs that illustrate the task, and the query section presents the new input for which you want a response. This method helps Claude 3 understand the context and expected output format, and often results in a more accurate and consistent response.

  Example:

  ```
  "query": {
    "bool": {
      "must": [
        {"match": {"post_type": "recipe"}},
        {"range": {"likes_count": {"gte": 100}}},
        {"exists": {"field": "media_urls"}}
      ]
    }
  }
  Question: Find all recipe posts that have at least 100 likes and include media URLs.
  ```
+ **System prompts**: In addition to few-shot prompting, Claude 3 models on Amazon Bedrock also support the use of system prompts. System prompts are a way to provide overall context, instructions, or guidelines to the model before presenting it with specific user inputs. They are particularly useful for setting the tone, defining the model's role, or establishing constraints for the entire conversation. To use a system prompt with Claude 3 on Amazon Bedrock, you include it in the `system` parameter of your API request. This is separate from the user messages and applies to the entire interaction. Detailed system prompts are used to set context and provide guidelines for the model.

  Example:

  ```
  You are an expert query dsl generator. Your task is to take an input question and generate a query dsl to answer the question. Use the schemas and data below to generate the query.
  
  Schemas: [schema details]
  Data: [sample data]
  Guidelines: 
  - Ensure the generated query adheres to DSL query syntax
  - Do not create new mappings or other items that aren't included in the provided schemas.
  ```
+ **Structured output**: You can instruct the model to provide output in specific formats, such as JSON or within XML tags.

  Example:

  ```
  Put the query in json tags
  ```
+ **Prompt chaining**: The notebook uses the output of one LLM call as input for another, such as using generated synthetic data to create example questions.
+ **Context provision**: Relevant context, including schemas and sample data, is provided in the prompts.

  Example:

  ```
  Schemas: [schema details]
  Data: [sample data]
  ```
+ **Task-specific prompts**: Different prompts are crafted for specific tasks, such as generating synthetic data, creating example questions, and converting natural language queries to query DSL.

  Example for generating test questions:

  ```
  Your task is to generate 5 example questions users can ask the health app based on provided schemas and data. Only include the questions generated in the response.
  ```
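
The few-shot and context-provision techniques above lend themselves to programmatic prompt assembly, which keeps prompts consistent as examples are added. A minimal sketch (the pair structure is an assumption based on the few-shot example shown earlier):

```python
def build_few_shot_prompt(task_instruction, examples):
    """Format (query_dsl, question) pairs into a few-shot system prompt."""
    blocks = [f"Query DSL:\n{dsl}\nQuestion: {question}" for dsl, question in examples]
    return f"{task_instruction}\n\nExamples:\n\n" + "\n\n".join(blocks)
```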

# Use Amazon Q Developer as a coding assistant to increase your productivity
<a name="use-q-developer-as-coding-assistant-to-increase-productivity"></a>

*Ram Kandaswamy, Amazon Web Services*

## Summary
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-summary"></a>

This pattern uses a tic-tac-toe game to demonstrate how you can apply Amazon Q Developer across a range of development tasks. It generates code for a tic-tac-toe game as a single-page application (SPA), enhances its UI, and creates scripts to deploy the application on AWS.

Amazon Q Developer functions as a coding assistant to help accelerate software development workflows and enhance productivity for both developers and non-developers. Regardless of your technical expertise, it helps you create architectures and design solutions for business problems, bootstraps your working environment, helps you implement new features, and generates test cases for validation. It uses natural language instructions and AI capabilities to ensure consistent, high-quality code and to mitigate coding challenges regardless of your programming skills.

The key advantage of Amazon Q Developer is its ability to liberate you from repetitive coding tasks. When you use the `@workspace` annotation, Amazon Q Developer ingests and indexes all code files, configurations, and project structure in your integrated development environment (IDE), and provides tailored responses to help you focus on creative problem-solving. You can dedicate more time to designing innovative solutions and enhancing the user experience. If you aren't technical, you can use Amazon Q Developer to streamline workflows and collaborate more effectively with the development team. The Amazon Q Developer **Explain code** feature offers detailed instructions and summaries, so you can navigate complex code bases.

Furthermore, Amazon Q Developer provides a language-agnostic approach that helps junior and mid-level developers expand their skill sets. You can concentrate on core concepts and business logic instead of language-specific syntax. This reduces the learning curve when you switch technologies.

## Prerequisites and limitations
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-prereqs"></a>

**Prerequisites**
+ IDE (for example, WebStorm or Visual Studio Code) with the Amazon Q Developer plugin installed. For instructions, see [Installing the Amazon Q Developer extension or plugin in your IDE](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-in-IDE-setup.html) in the Amazon Q Developer documentation.
+ An active AWS account set up with Amazon Q Developer. For instructions, see [Getting started](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/getting-started-q-dev.html) in the Amazon Q Developer documentation.
+ **npm** installed. For instructions, see the [npm documentation](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm). This pattern was tested with npm version 10.8.
+ AWS Command Line Interface (AWS CLI) installed. For instructions, see the [AWS CLI documentation](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

**Limitations**
+ Amazon Q Developer can perform only one development task at a time.
+ Some AWS services aren’t available in all AWS Regions. For Region availability, see [AWS services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). For specific endpoints, see the [Service endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/aws-service-information.html) page, and choose the link for the service.

## Tools
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-tools"></a>
+ This pattern requires an IDE such as Visual Studio Code or WebStorm. For a list of supported IDEs, see the [Amazon Q Developer documentation](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-in-IDE.html#supported-ides-features).
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open source tool that helps you interact with AWS services through commands in your command-line shell.

## Best practices
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-best-practices"></a>

See [Best coding practices with Amazon Q Developer](https://docs.aws.amazon.com/prescriptive-guidance/latest/best-practices-code-generation/best-practices-coding.html) in AWS Prescriptive Guidance. In addition:
+ When you provide prompts to Amazon Q Developer, make sure that your instructions are clear and unambiguous. Add code snippets and annotations such as `@workspace` to the prompt to provide more context for your prompts.
+ Include relevant libraries and import them to avoid conflicts or incorrect guesses by the system.
+ If the code generated isn't accurate or as expected, use the **Provide feedback & regenerate** option. Try breaking the prompts into smaller instructions.

## Epics
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-epics"></a>

### Set up the working environment
<a name="set-up-the-working-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a new project. | To create a new project in your working environment, run the following command and accept the default settings for all questions:<pre>npx create-next-app@latest</pre> | App developer, Programmer, Software developer | 
| Test the base application. | Run the following command and confirm that the base application loads successfully in the browser:<pre>npm run dev </pre> | App developer, Programmer, Software developer | 
| Clean up the base code. | Navigate to the `page.tsx` file in the `src/app` folder and delete the default content to get a blank page. After deletion, the file should look like this:<pre>export default function Home() {<br />  return (<div></div><br />      );<br />}</pre> | App developer, Programmer, Software developer | 

### Use Amazon Q Developer to design a tic-tac-toe game project
<a name="use-qdevlong-to-design-a-tic-tac-toe-game-project"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Get an overview of steps. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-q-developer-as-coding-assistant-to-increase-productivity.html) | App developer, Programmer, Software developer | 
| Generate code for tic-tac-toe. | In the chat panel, start a development task by using the `/dev` command followed by the description of the task. For example:<pre>/dev Create a React-based single-page application  written in TypeScript for a tic-tac-toe game with the following specifications:<br />1. Design an aesthetically pleasing interface with the game grid centered vertically and <br />horizontally on the page. <br />2. Include a heading and clear instructions on how to play the game.<br />3. Implement color-coding for X and O marks to distinguish them easily. </pre>Amazon Q Developer generates code based on your instructions. | App developer, Programmer, Software developer | 
| Inspect and accept the generated code. | Visually inspect the code, and choose **Accept code** to automatically replace the `page.tsx` file. If you run into issues, choose **Provide feedback & regenerate** and describe the issue you encountered. | App developer, Programmer, Software developer | 
| Fix lint errors. | The example tic-tac-toe game includes a grid. The code that Amazon Q Developer generates might use the default type `any`. You can add type safety by prompting Amazon Q Developer as follows:<pre>/dev Ensure proper TypeScript typing for the onSquare Click event handler <br />to resolve any 'any' type issues.</pre> | App developer, Programmer, Software developer | 
| Add visual appeal. | You can break the original requirement into smaller fragments. For example, you can improve the game UI with the following prompts in dev tasks. These prompts enhance Cascading Style Sheets (CSS) styles and export the app for deployment.<pre>/dev Debug and fix any CSS issues to correctly display the game grid and overall layout. <br /><br />Simplify the code by removing game history functionality and related components. <br /><br />Implement static file export to an 'out' directory for easy deployment. The solution <br />should be fully functional, visually appealing, and free of typing errors or layout issues. </pre> | App developer, Programmer, Software developer | 
| Test again. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-q-developer-as-coding-assistant-to-increase-productivity.html) | App developer, Programmer, Software developer | 

### Deploy the application to the AWS Cloud
<a name="deploy-the-application-to-the-aws-cloud"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create folders and files for deployment. | In the project in your working environment, create a deployment folder and two files inside it: `pushtos3.sh` and `cloudformation.yml`:<pre>mkdir deployment && cd deployment<br />touch pushtos3.sh && chmod +x pushtos3.sh<br />touch cloudformation.yml</pre> | App developer, Programmer, Software developer | 
| Generate automation code. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-q-developer-as-coding-assistant-to-increase-productivity.html) | AWS administrator, AWS DevOps, App developer | 
| Generate script content. | To create a deployment script, use the following prompt:<pre>/dev Modify the pushtos3 shell script so that it can use AWS CLI commands to create a <br />CloudFormation stack named tictactoe-stack if it does not exist already, and use <br />cloudformation.yml as the source template. Wait for the stack to complete and sync the <br />contents from the out folder to the S3 bucket. Perform invalidation of the CloudFront <br />origin.</pre> | App developer, Programmer, Software developer | 
| Deploy the application to the AWS Cloud. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/use-q-developer-as-coding-assistant-to-increase-productivity.html) | AWS administrator, AWS DevOps, Cloud architect, App developer | 

## Troubleshooting
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| The build doesn't create a single-page application or export it to the output folder. | Look at the contents of the `next.config.mjs` file. If the code has the following default configuration:<pre>const nextConfig = {};</pre>modify it as follows:<pre>const nextConfig = {<br />  output: 'export',<br />  distDir: 'out',<br />};</pre> | 

## Related resources
<a name="use-q-developer-as-coding-assistant-to-increase-productivity-resources"></a>
+ [Creating a new React project](https://react.dev/learn/start-a-new-react-project) (React documentation)
+ [Amazon Q Developer overview](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html) (AWS documentation)
+ [Amazon Q Developer best practices](https://docs.aws.amazon.com/prescriptive-guidance/latest/best-practices-code-generation/introduction.html) (AWS Prescriptive Guidance)
+ [Installing, configuring, and using Amazon Q Developer with JetBrains IDEs](https://www.youtube.com/watch?v=-iQfIhTA4J0&pp=ygUSYW1hem9uIHEgZGV2ZWxvcGVy) (YouTube video)
+ [Installing Amazon Q for the command line](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-getting-started-installing.html) (AWS documentation)

# Use SageMaker Processing for distributed feature engineering of terabyte-scale ML datasets
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets"></a>

*Chris Boomhower, Amazon Web Services*

## Summary
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-summary"></a>

Many terabyte-scale or larger datasets have a hierarchical folder structure, and the files in the dataset sometimes share interdependencies. For this reason, machine learning (ML) engineers and data scientists must make thoughtful design decisions to prepare such data for model training and inference. This pattern demonstrates how you can use manual macrosharding and microsharding techniques in combination with Amazon SageMaker Processing and virtual CPU (vCPU) parallelization to efficiently scale feature engineering processes for complicated big data ML datasets. 

This pattern defines *macrosharding* as the splitting of data directories across multiple machines for processing, and *microsharding* as the splitting of data on each machine across multiple processing threads. The pattern demonstrates these techniques by using Amazon SageMaker with sample time-series waveform records from the [PhysioNet MIMIC-III](https://physionet.org/content/mimic3wdb/1.0/) dataset. By implementing the techniques in this pattern, you can minimize the processing time and costs for feature engineering while maximizing resource utilization and throughput efficiency. These optimizations rely on distributed SageMaker Processing across Amazon Elastic Compute Cloud (Amazon EC2) instances and vCPUs, and they apply to similar large datasets regardless of data type.

## Prerequisites and limitations
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-prereqs"></a>

**Prerequisites**
+ Access to SageMaker notebook instances or SageMaker Studio, if you want to implement this pattern for your own dataset. If you are using Amazon SageMaker for the first time, see [Get started with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html) in the AWS documentation.
+ SageMaker Studio, if you want to implement this pattern with the [PhysioNet MIMIC-III](https://physionet.org/content/mimic3wdb/1.0/) sample data. 
+ The pattern uses SageMaker Processing, but doesn’t require any experience running SageMaker Processing jobs.

**Limitations**
+ This pattern is well suited to ML datasets that include interdependent files. These interdependencies benefit the most from manual macrosharding and running multiple, single-instance SageMaker Processing jobs in parallel. For datasets where such interdependencies do not exist, the `ShardedByS3Key` feature in SageMaker Processing might be a better alternative to macrosharding, because it sends sharded data to multiple instances that are managed by the same Processing job. However, you can implement this pattern’s microsharding strategy in both scenarios to best utilize instance vCPUs.

**Product versions**
+ Amazon SageMaker Python SDK version 2

## Architecture
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-architecture"></a>

**Target technology stack**
+ Amazon Simple Storage Service (Amazon S3)
+ Amazon SageMaker

**Target architecture**

*Macrosharding and distributed EC2 instances*

The 10 parallel processes represented in this architecture reflect the structure of the MIMIC-III dataset. (Processes are represented by ellipses for diagram simplification.) A similar architecture applies to any dataset when you use manual macrosharding. In the case of MIMIC-III, you can use the dataset's raw structure to your advantage by processing each patient group folder separately, with minimal effort. In the following diagram, the record groups block appears on the left (1). Given the distributed nature of the data, it makes sense to shard by patient group.

![\[Architecture for microsharding and distributed EC2 instances\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/e7a90b31-de8f-41fd-bb3f-c7c6100fc306/images/c19a8f87-ac59-458e-89cb-50be17ca4a0c.png)


However, manually sharding by patient group means that a separate Processing job is required for each patient group folder, as you can see in the middle section of the diagram (2), instead of a single Processing job with multiple EC2 instances. Because MIMIC-III's data includes both binary waveform files and matching text-based header files, and there is a required dependency on the [wfdb library](https://wfdb.readthedocs.io/en/latest/) for binary data extraction, all the records for a specific patient must be made available on the same instance. The only way to be certain that each binary waveform file's associated header file is also present is to implement manual sharding to run each shard within its own Processing job, and to specify `s3_data_distribution_type='FullyReplicated'` when you define the Processing job input. Alternatively, if all data were available in a single directory and no dependencies existed between files, a more suitable option might be to launch a single Processing job with multiple EC2 instances and `s3_data_distribution_type='ShardedByS3Key'` specified. Specifying `ShardedByS3Key` as the Amazon S3 data distribution type directs SageMaker to manage data sharding automatically across instances. 
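Conceptually, the two distribution types differ in how a set of input object keys is assigned to instances. The following sketch is illustrative only (the key names are made up, and the round-robin assignment stands in for SageMaker's internal partitioning logic):

```python
# Illustrative comparison of the two S3 data distribution types.
keys = [f"s3://mybucket/data_inputs/patient_{i}.dat" for i in range(6)]
instance_count = 3

# FullyReplicated: every instance receives a copy of every object.
fully_replicated = {i: list(keys) for i in range(instance_count)}

# ShardedByS3Key: each object key goes to exactly one instance (shown
# here as round-robin; SageMaker manages the real assignment).
sharded = {i: keys[i::instance_count] for i in range(instance_count)}
```

With `FullyReplicated`, each of the three instances sees all six objects; with `ShardedByS3Key`, each instance sees only two, which is why interdependent files can end up on different instances.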

Launching a Processing job for each folder is a cost-efficient way to preprocess the data, because running multiple instances concurrently saves time. For additional cost and time savings, you can use microsharding within each Processing job. 

*Microsharding and parallel vCPUs*

Within each Processing job, the grouped data is further divided to maximize use of all available vCPUs on the SageMaker fully managed EC2 instance. The blocks in the middle section of the diagram (2) depict what happens within each primary Processing job. The contents of the patient record folders are flattened and divided evenly based on the number of available vCPUs on the instance. After the folder contents are divided, the evenly sized sets of files are distributed across all vCPUs for processing. When processing is complete, the results from each vCPU are combined into a single data file for each Processing job. 

In the attached code, these concepts are represented in the following section of the `src/feature-engineering-pass1/preprocessing.py` file.

```
def chunks(lst, n):
    """
    Yield successive n-sized chunks from lst.
    
    :param lst: list of elements to be divided
    :param n: number of elements per chunk
    :type lst: list
    :type n: int
    :return: generator comprising evenly sized chunks
    :rtype: class 'generator'
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
 
 
# Generate list of data files on machine
data_dir = input_dir
d_subs = next(os.walk(os.path.join(data_dir, '.')))[1]
file_list = []
for ds in d_subs:
    file_list.extend(os.listdir(os.path.join(data_dir, ds, '.')))
dat_list = [os.path.join(re.split(r'_|\.', f)[0].replace('n', ''), f[:-4]) for f in file_list if f[-4:] == '.dat']
 
# Split list of files into sub-lists
cpu_count = multiprocessing.cpu_count()
splits = int(len(dat_list) / cpu_count)
if splits == 0: splits = 1
dat_chunks = list(chunks(dat_list, splits))
 
# Parallelize processing of sub-lists across CPUs
ws_df_list = Parallel(n_jobs=-1, verbose=0)(delayed(run_process)(dc) for dc in dat_chunks)
 
# Compile and pickle patient group dataframe
ws_df_group = pd.concat(ws_df_list)
ws_df_group = ws_df_group.reset_index().rename(columns={'index': 'signal'})
ws_df_group.to_json(os.path.join(output_dir, group_data_out))
```

The `chunks` function is first defined to consume a given list by dividing it into evenly sized chunks of length `n` and returning these results as a generator. Next, the data is flattened across patient folders by compiling a list of all the binary waveform files that are present, and the number of vCPUs available on the EC2 instance is obtained. The list of binary waveform files is evenly divided across these vCPUs by calling `chunks`, and then each waveform sublist is processed on its own vCPU by using [joblib's Parallel class](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html). The results are automatically combined into a single list of dataframes, which the script then concatenates and writes to Amazon S3 before the Processing job completes. In this example, the primary Processing jobs write 10 files to Amazon S3 (one for each job).
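To make the division concrete, the following small, self-contained example (with made-up record names) shows what happens when 8 waveform files meet an instance with 4 vCPUs: `splits` is 2, so `chunks` yields four sublists of two files each, one per vCPU.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst (same helper as above)."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Hypothetical flattened file list: 8 waveform records, 4 vCPUs available
dat_list = [f"3000{i}/3000{i}_000{i}" for i in range(8)]
cpu_count = 4

splits = int(len(dat_list) / cpu_count)   # 2 files per chunk
if splits == 0:
    splits = 1
dat_chunks = list(chunks(dat_list, splits))
# dat_chunks now holds 4 sublists of 2 records each, one per vCPU
```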

When all the initial Processing jobs are complete, a secondary Processing job, which is shown in the block on the right of the diagram (3), combines the output files produced by each primary Processing job and writes the combined output to Amazon S3 (4).

## Tools
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-tools"></a>

**Tools**
+ [Python](https://www.python.org/) – The sample code used for this pattern is Python (version 3).
+ [SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) – Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models. You run SageMaker Processing jobs by using Jupyter notebooks inside SageMaker Studio.
+ [SageMaker Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) – Amazon SageMaker Processing provides a simplified way to run your data processing workloads. In this pattern, the feature engineering code is implemented at scale by using SageMaker Processing jobs.

**Code**

The attached .zip file provides the complete code for this pattern. The following section describes the steps to build the architecture for this pattern. Each step is illustrated by sample code from the attachment.

## Epics
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-epics"></a>

### Set up your SageMaker Studio environment
<a name="set-up-your-sagemaker-studio-environment"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Access Amazon SageMaker Studio. | Onboard to SageMaker Studio in your AWS account by following the directions provided in the [Amazon SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). | Data scientist, ML engineer | 
| Install the wget utility. | Install *wget* if you onboarded with a new SageMaker Studio configuration or if you've never used this utility in SageMaker Studio before. To install it, open a terminal window in the SageMaker Studio console and run the following command:<pre>sudo yum install wget</pre> | Data scientist, ML engineer | 
| Download and unzip the sample code. | Download the `attachment.zip` file in the *Attachments* section. In a terminal window, navigate to the folder where you downloaded the file and extract its contents:<pre>unzip attachment.zip</pre>Navigate to the folder where you extracted the .zip file, and extract the contents of the `Scaled-Processing.zip` file:<pre>unzip Scaled-Processing.zip</pre> | Data scientist, ML engineer | 
| Download the sample dataset from physionet.org and upload it to Amazon S3. | Run the `get_data.ipynb` Jupyter notebook within the folder that contains the `Scaled-Processing` files. This notebook downloads a sample MIMIC-III dataset from [physionet.org](https://physionet.org) and uploads it to your SageMaker Studio session bucket in Amazon S3. | Data scientist, ML engineer | 

### Configure the first preprocessing script
<a name="configure-the-first-preprocessing-script"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Flatten the file hierarchy across all subdirectories. | In large datasets such as MIMIC-III, files are often distributed across multiple subdirectories even within a logical parent group. Your script should be configured to flatten all group files across all subdirectories, as the following code demonstrates.<pre># Generate list of .dat files on machine<br />data_dir = input_dir<br />d_subs = next(os.walk(os.path.join(data_dir, '.')))[1]<br />file_list = []<br />for ds in d_subs:<br />    file_list.extend(os.listdir(os.path.join(data_dir, ds, '.')))<br />dat_list = [os.path.join(re.split(r'_|\.', f)[0].replace('n', ''), f[:-4]) for f in file_list if f[-4:] == '.dat']</pre>The example code snippets in this epic are from the `src/feature-engineering-pass1/preprocessing.py` file, which is provided in the attachment. | Data scientist, ML engineer | 
| Divide files into subgroups based on vCPU count. | Files should be divided into evenly sized subgroups, or chunks, depending on the number of vCPUs present on the instance that runs the script. For this step, you can implement code similar to the following.<pre># Split list of files into sub-lists<br />cpu_count = multiprocessing.cpu_count()<br />splits = int(len(dat_list) / cpu_count)<br />if splits == 0: splits = 1<br />dat_chunks = list(chunks(dat_list, splits))</pre> | Data scientist, ML engineer | 
| Parallelize processing of subgroups across vCPUs. | Script logic should be configured to process all subgroups in parallel. To do this, use the Joblib library's `Parallel` class and `delayed` method as follows. <pre># Parallelize processing of sub-lists across CPUs<br />ws_df_list = Parallel(n_jobs=-1, verbose=0)(delayed(run_process)(dc) for dc in dat_chunks)</pre> | Data scientist, ML engineer | 
| Save single file group output to Amazon S3. | When parallel vCPU processing is complete, the results from each vCPU should be combined and uploaded to the file group's S3 bucket path. For this step, you can use code similar to the following.<pre># Compile and pickle patient group dataframe<br />ws_df_group = pd.concat(ws_df_list)<br />ws_df_group = ws_df_group.reset_index().rename(columns={'index': 'signal'})<br />ws_df_group.to_json(os.path.join(output_dir, group_data_out))</pre> | Data scientist, ML engineer | 

### Configure the second preprocessing script
<a name="configure-the-second-preprocessing-script"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Combine data files produced across all Processing jobs that ran the first script. | The previous script outputs a single file for each SageMaker Processing job that processes a group of files from the dataset.  Next, you need to combine these output files into a single object and write a single output dataset to Amazon S3. This is demonstrated in the `src/feature-engineering-pass1p5/preprocessing.py` file, which is provided in the attachment, as follows.<pre>def write_parquet(wavs_df, path):<br />    """<br />    Write waveform summary dataframe to S3 in parquet format.<br />    <br />    :param wavs_df: waveform summary dataframe<br />    :param path: S3 directory prefix<br />    :type wavs_df: pandas dataframe<br />    :type path: str<br />    :return: None<br />    """<br />    extra_args = {"ServerSideEncryption": "aws:kms"}<br />    wr.s3.to_parquet(<br />        df=wavs_df,<br />        path=path,<br />        compression='snappy',<br />        s3_additional_kwargs=extra_args)<br /> <br /> <br />def combine_data():<br />    """<br />    Get combined data and write to parquet.<br />    <br />    :return: waveform summary dataframe<br />    :rtype: pandas dataframe<br />    """<br />    wavs_df = get_data()<br />    wavs_df = normalize_signal_names(wavs_df)<br />    write_parquet(wavs_df, "s3://{}/{}/{}".format(bucket_xform, dataset_prefix, pass1p5out_data))<br /> <br />    return wavs_df<br /> <br /> <br />wavs_df = combine_data()</pre> | Data scientist, ML engineer | 

### Run Processing jobs
<a name="run-processing-jobs"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Run the first Processing job. | To perform macrosharding, run a separate Processing job for each file group. Microsharding is performed inside each Processing job, because each job runs your first script. The following snippet (included in `notebooks/FeatExtract_Pass1.ipynb`) demonstrates how to launch a Processing job for each file group directory.<pre>pat_groups = list(range(30,40))<br />ts = str(int(time.time()))<br /> <br />for group in pat_groups:<br />    sklearn_processor = SKLearnProcessor(framework_version='0.20.0',<br />                                     role=role,<br />                                     instance_type='ml.m5.4xlarge',<br />                                     instance_count=1,<br />                                     volume_size_in_gb=5)<br />    sklearn_processor.run(<br />        code='../src/feature-engineering-pass1/preprocessing.py',<br />        job_name='-'.join(['scaled-processing-p1', str(group), ts]),<br />        arguments=[<br />            "input_path", "/opt/ml/processing/input",<br />            "output_path", "/opt/ml/processing/output",<br />            "group_data_out", "ws_df_group.json"<br />        ],<br />        inputs=<br />        [<br />            ProcessingInput(<br />                source=f's3://{sess.default_bucket()}/data_inputs/{group}',<br />                destination='/opt/ml/processing/input',<br />                s3_data_distribution_type='FullyReplicated'<br />            )<br />        ],<br />        outputs=<br />        [<br />            ProcessingOutput(<br />                source='/opt/ml/processing/output',<br />                destination=f's3://{sess.default_bucket()}/data_outputs/{group}'<br />            )<br />        ],<br />        wait=False<br />    )</pre> | Data scientist, ML engineer | 
| Run the second Processing job. | To combine the outputs generated by the first set of processing jobs and perform any additional computations for preprocessing, you run your second script by using a single SageMaker Processing job. The following code demonstrates this (included in `notebooks/FeatExtract_Pass1p5.ipynb`).<pre>ts = str(int(time.time()))<br />bucket = sess.default_bucket()<br />     <br />sklearn_processor = SKLearnProcessor(framework_version='0.20.0',<br />                                 role=role,<br />                                 instance_type='ml.t3.2xlarge',<br />                                 instance_count=1,<br />                                 volume_size_in_gb=5)<br />sklearn_processor.run(<br />    code='../src/feature-engineering-pass1p5/preprocessing.py',<br />    job_name='-'.join(['scaled-processing', 'p1p5', ts]),<br />    arguments=['bucket', bucket,<br />               'pass1out_prefix', 'data_outputs',<br />               'pass1out_data', 'ws_df_group.json',<br />               'pass1p5out_data', 'waveform_summary.parquet',<br />               'statsdata_name', 'signal_stats.csv'],<br />    wait=True<br />)</pre> | Data scientist, ML engineer | 

## Related resources
<a name="use-sagemaker-processing-for-distributed-feature-engineering-of-terabyte-scale-ml-datasets-resources"></a>
+ [Onboard to Amazon SageMaker Studio Using Quick Start](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html) (SageMaker documentation)
+ [Process Data](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) (SageMaker documentation) 
+ [Data Processing with scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/use-scikit-learn-processing-container.html) (SageMaker documentation) 
+ [joblib.Parallel documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html)
+ Moody, B., Moody, G., Villarroel, M., Clifford, G. D., & Silva, I. (2020). [MIMIC-III Waveform Database](https://doi.org/10.13026/c2607m) (version 1.0). *PhysioNet*.
+ Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). [MIMIC-III, a freely accessible critical care database](https://dx.doi.org/10.1038/sdata.2016.35). Scientific Data, 3, 160035.
+ [MIMIC-III Waveform Database license](https://physionet.org/content/mimic3wdb/1.0/LICENSE.txt)

## Attachments
<a name="attachments-e7a90b31-de8f-41fd-bb3f-c7c6100fc306"></a>

To access additional content that is associated with this document, unzip the following file: [attachment.zip](samples/p-attach/e7a90b31-de8f-41fd-bb3f-c7c6100fc306/attachments/attachment.zip)

# Visualize AI/ML model results using Flask and AWS Elastic Beanstalk
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk"></a>

*Chris Caudill and Durga Sury, Amazon Web Services*

## Summary
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-summary"></a>

Visualizing output from artificial intelligence and machine learning (AI/ML) services often requires complex API calls that must be customized by your developers and engineers. This can be a drawback if your analysts want to quickly explore a new dataset.

You can enhance the accessibility of your services and provide a more interactive form of data analysis by using a web-based user interface (UI) that enables users to upload their own data and visualize the model results in a dashboard.

This pattern uses [Flask](https://flask.palletsprojects.com/en/stable/) and [Plotly](https://plotly.com/) to integrate Amazon Comprehend with a custom web application and visualize sentiments and entities from user-provided data. The pattern also provides the steps to deploy an application by using AWS Elastic Beanstalk. You can adapt the application by using [AWS AI services](https://aws.amazon.com/machine-learning/ai-services/) or with a custom trained model hosted on an endpoint (for example, an [Amazon SageMaker endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html)).

## Prerequisites and limitations
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-prereqs"></a>

**Prerequisites**
+ An active AWS account. 
+ AWS Command Line Interface (AWS CLI), installed and configured on your local machine. For more information, see [Configuration basics](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) in the AWS CLI documentation. You can also use an AWS Cloud9 integrated development environment (IDE); for more information, see [Python tutorial for AWS Cloud9](https://docs.aws.amazon.com/cloud9/latest/user-guide/sample-python.html) and [Previewing running applications in the AWS Cloud9 IDE](https://docs.aws.amazon.com/cloud9/latest/user-guide/app-preview.html) in the AWS Cloud9 documentation.

  **Notice**: AWS Cloud9 is no longer available to new customers. Existing customers of AWS Cloud9 can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/devops/how-to-migrate-from-aws-cloud9-to-aws-ide-toolkits-or-aws-cloudshell/)
+ An understanding of Flask’s web application framework. For more information about Flask, see the [Quickstart](https://flask.palletsprojects.com/en/stable/quickstart/) in the Flask documentation.
+ Python version 3.6 or later, installed and configured. You can install Python by following the instructions from [Setting up your Python development environment](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/python-development-environment.html) in the AWS Elastic Beanstalk documentation.
+ Elastic Beanstalk Command Line Interface (EB CLI), installed and configured. For more information about this, see [Install the EB CLI](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-install.html) and [Configure the EB CLI](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-configuration.html) from the Elastic Beanstalk documentation.

**Limitations**
+ This pattern’s Flask application is designed to work with .csv files that use a single text column and are restricted to 200 rows. The application code can be adapted to handle other file types and data volumes.
+ The application doesn’t consider data retention and continues to aggregate uploaded user files until they are manually deleted. You can integrate the application with Amazon Simple Storage Service (Amazon S3) for persistent object storage or use a database such as Amazon DynamoDB for serverless key-value storage.
+ The application only considers documents in the English language. However, you can use Amazon Comprehend to detect a document’s primary language. For more information about the supported languages for each action, see [API reference](https://docs.aws.amazon.com/comprehend/latest/dg/API_Reference.html) in the Amazon Comprehend documentation.
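As a minimal sketch of that language-detection approach: in boto3, the call is `comprehend.detect_dominant_language(Text=...)`. The helper below is hypothetical and only parses the response shape, using a canned response so that the example runs without AWS credentials.

```python
# Hypothetical helper: pick the top language from a Comprehend
# DetectDominantLanguage response. In a live application, obtain the
# response with boto3.client('comprehend').detect_dominant_language(Text=text).
def primary_language(response):
    """Return the language code with the highest confidence score, or None."""
    languages = response.get('Languages', [])
    if not languages:
        return None
    return max(languages, key=lambda lang: lang['Score'])['LanguageCode']

# Canned response in the documented shape
sample_response = {
    'Languages': [
        {'LanguageCode': 'en', 'Score': 0.97},
        {'LanguageCode': 'es', 'Score': 0.03},
    ]
}
print(primary_language(sample_response))  # en
```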

## Architecture
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-architecture"></a>

**Flask application architecture**

Flask is a lightweight framework for developing web applications in Python. It is designed to combine Python’s powerful data processing with a rich web UI. The pattern’s Flask application shows you how to build a web application that enables users to upload data, sends the data to Amazon Comprehend for inference, and then visualizes the results. The application has the following structure:
+ `static` – Contains all the static files that support the web UI (for example, JavaScript, CSS, and images)
+ `templates` – Contains all of the application's HTML pages
+ `userData` – Stores uploaded user data
+ `application.py` – The Flask application file
+ `comprehend_helper.py` – Functions to make API calls to Amazon Comprehend
+ `config.py` – The application configuration file
+ `requirements.txt` – The Python dependencies required by the application

The `application.py` script contains the web application's core functionality, which consists of four Flask routes. The following diagram shows these Flask routes.

![\[The four Flask routes that make up the web application's core functionality.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/03d80cf1-ec97-43f7-adb5-2746a9ec70e6/images/9ca6bad1-26e2-4262-98d0-d54c172336bf.png)


 
+ `/` is the application's root and directs users to the `upload.html` page (stored in the `templates` directory).
+ `/saveFile` is a route that is invoked after a user uploads a file. This route receives a `POST` request via an HTML form, which contains the file uploaded by the user. The file is saved in the `userData` directory and the route redirects users to the `/dashboard` route.
+ `/dashboard` sends users to the `dashboard.html` page. Within this page's HTML, it runs the JavaScript code in `static/js/core.js` that reads data from the `/data` route and then builds visualizations for the page.
+ `/data` is a JSON API that presents the data to be visualized in the dashboard. This route reads the user-provided data and uses the functions in `comprehend_helper.py` to send the user data to Amazon Comprehend for sentiment analysis and named entity recognition (NER). The Amazon Comprehend response is formatted and returned as a JSON object.
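As a sketch of the kind of formatting the `/data` route performs (the helper name is hypothetical; in the application itself this work is done by the functions in `comprehend_helper.py`), the following example flattens the `ResultList` shape of a Comprehend `BatchDetectSentiment` response into JSON-serializable rows for the dashboard:

```python
# Hypothetical formatter for a Comprehend BatchDetectSentiment response.
def format_sentiments(batch_response):
    """Flatten sentiment results into rows that can be returned as JSON."""
    rows = []
    for result in batch_response.get('ResultList', []):
        scores = result['SentimentScore']
        rows.append({
            'index': result['Index'],
            'sentiment': result['Sentiment'],
            'positive': scores['Positive'],
            'negative': scores['Negative'],
        })
    return rows

# Canned response in the documented shape
sample = {
    'ResultList': [
        {'Index': 0, 'Sentiment': 'POSITIVE',
         'SentimentScore': {'Positive': 0.95, 'Negative': 0.01,
                            'Neutral': 0.03, 'Mixed': 0.01}},
    ],
    'ErrorList': [],
}
rows = format_sentiments(sample)
```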

**Deployment architecture**

![\[Architecture diagram for using Flask and Elastic Beanstalk to visualize AI/ML model results.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/03d80cf1-ec97-43f7-adb5-2746a9ec70e6/images/d691bfd2-e2ec-4830-8bff-ffa1e3a95c4a.png)


For more information about design considerations for applications that are deployed by using Elastic Beanstalk on the AWS Cloud, see [Design considerations](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts.concepts.design.html) in the AWS Elastic Beanstalk documentation.

**Technology stack**
+ Amazon Comprehend
+ Elastic Beanstalk
+ Flask 

**Automation and scale**

Elastic Beanstalk deployments are automatically set up with load balancers and auto scaling groups. For more configuration options, see [Configuring Elastic Beanstalk environments](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers.html) in the Elastic Beanstalk documentation.
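For example, a file in the `.ebextensions` directory of the application bundle can cap the Auto Scaling group size. The file name below is arbitrary and the values are illustrative; the option namespace comes from the Elastic Beanstalk configuration options.

```yaml
# .ebextensions/autoscaling.config (file name is arbitrary)
option_settings:
  aws:autoscaling:asg:
    MinSize: 1
    MaxSize: 4
```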

## Tools
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-tools"></a>
+ [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is a unified tool that provides a consistent interface for interacting with all parts of AWS.
+ [Amazon Comprehend](https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-general.html) uses natural language processing (NLP) to extract insights about the content of documents without requiring special preprocessing.
+ [AWS Elastic Beanstalk](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html) helps you quickly deploy and manage applications in the AWS Cloud without having to learn about the infrastructure that runs those applications.
+ [Elastic Beanstalk CLI (EB CLI)](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3.html) is a command line interface for AWS Elastic Beanstalk that provides interactive commands to simplify creating, updating, and monitoring environments from a local repository.
+ The [Flask](https://flask.palletsprojects.com/en/stable/) framework is used to perform the application's data processing and API calls in Python and to serve interactive Plotly visualizations in the web UI.

**Code repository**

The code for this pattern is available in the GitHub [Visualize AI/ML model results using Flask and AWS Elastic Beanstalk](https://github.com/aws-samples/aws-comprehend-elasticbeanstalk-for-flask) repository.

## Epics
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-epics"></a>

### Set up the Flask application
<a name="set-up-the-flask-application"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Clone the GitHub repository. | Pull the application code from the GitHub [Visualize AI/ML model results using Flask and AWS Elastic Beanstalk](https://github.com/aws-samples/aws-comprehend-elasticbeanstalk-for-flask) repository by running the following command: `git clone git@github.com:aws-samples/aws-comprehend-elasticbeanstalk-for-flask.git`. Make sure that you configure your SSH keys with GitHub. | Developer | 
| Install the Python modules. | After you clone the repository, a new local `aws-comprehend-elasticbeanstalk-for-flask` directory is created. In that directory, the `requirements.txt` file lists the Python modules and versions that the application needs. To install them, run the following commands: `cd aws-comprehend-elasticbeanstalk-for-flask`, and then `pip install -r requirements.txt`. | Python developer | 
| Test the application locally. | Start the Flask server by running the following command: `python application.py`. This returns information about the running server. You should be able to access the application by opening a browser and visiting `http://localhost:5000`. If you're running the application in an AWS Cloud9 IDE, replace the `application.run()` call in the `application.py` file with the following line: `application.run(host=os.getenv('IP', '0.0.0.0'), port=int(os.getenv('PORT', 8080)))`. You must revert this change before deployment. | Python developer | 
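
The Cloud9-specific `host`/`port` override in the table above boils down to reading two environment variables. The following is a minimal sketch of that resolution logic; the `resolve_bind_address` helper is hypothetical and not part of the sample repository:

```python
import os

def resolve_bind_address(default_host="127.0.0.1", default_port=5000):
    """Return the (host, port) pair to pass to application.run().

    Locally this falls back to Flask's defaults; in an AWS Cloud9 IDE,
    the IP and PORT environment variables take precedence, matching the
    application.run(host=..., port=...) replacement line described above.
    """
    host = os.getenv("IP", default_host)
    port = int(os.getenv("PORT", default_port))
    return host, port
```

With `IP=0.0.0.0` and `PORT=8080` set, as Cloud9 provides, this yields the same bind address as the replacement line in the table.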

### Deploy the application to Elastic Beanstalk
<a name="deploy-the-application-to-aeb"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Launch the Elastic Beanstalk application. | To launch your project as an Elastic Beanstalk application, run the following command from your application’s root directory: `eb init -p python-3.6 comprehend_flask --region us-east-1` [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk.html). Run the `eb init -i` command for more deployment configuration options. | Architect, Developer | 
| Deploy the Elastic Beanstalk environment. | Run the following command from the application's root directory: `eb create comprehend-flask-env`. Here, `comprehend-flask-env` is the name of the Elastic Beanstalk environment, and you can change it to meet your requirements. The name can contain only letters, numbers, and hyphens. | Architect, Developer | 
| Authorize your deployment to use Amazon Comprehend. | Even after your application deploys successfully, you must also grant the deployment access to Amazon Comprehend. `ComprehendFullAccess` is an AWS managed policy that provides the deployed application with permissions to make API calls to Amazon Comprehend. Attach the `ComprehendFullAccess` policy to `aws-elasticbeanstalk-ec2-role` (the role that is automatically created for your deployment’s Amazon Elastic Compute Cloud (Amazon EC2) instances) by running the following command: `aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/ComprehendFullAccess --role-name aws-elasticbeanstalk-ec2-role`. Because `aws-elasticbeanstalk-ec2-role` is created when your application deploys, you must complete the deployment process before you can attach the AWS Identity and Access Management (IAM) policy. | Developer, Security architect | 
| Visit your deployed application. | After your application successfully deploys, you can visit it by running the `eb open` command. You can also run the `eb status` command to get details about your deployment; the deployment URL is listed under `CNAME`. | Architect, Developer | 

### (Optional) Customize the application to your ML model
<a name="optional-customize-the-application-to-your-ml-model"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Authorize Elastic Beanstalk to access the new model. | Make sure that Elastic Beanstalk has the required access permissions for your new model endpoint. For example, if you use an Amazon SageMaker AI endpoint, your deployment needs to have permission to invoke the endpoint. For more information about this, see [InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) in the Amazon SageMaker AI documentation. | Developer, Security architect | 
| Send the user data to a new model. | To change the underlying ML model in this application, you must change the following files: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk.html) | Data scientist | 
| Update the dashboard visualizations. | Typically, incorporating a new ML model means that visualizations must be updated to reflect the new results. These changes are made in the following files: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk.html) | Web developer | 
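
As a rough sketch of the endpoint call referenced above, the following shows how a SageMaker AI endpoint is typically invoked with boto3. The helper names (`build_invoke_args`, `invoke_sagemaker_endpoint`) and the JSON content type are illustrative assumptions, not code from the sample repository; the actual payload format depends on your model.

```python
import json

def build_invoke_args(endpoint_name, payload):
    """Shape the keyword arguments for sagemaker-runtime invoke_endpoint.
    The JSON content type is an assumption; use whatever your model expects."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

def invoke_sagemaker_endpoint(endpoint_name, payload):
    """Call the endpoint; requires the InvokeEndpoint permission noted above."""
    import boto3  # deferred so build_invoke_args stays usable without AWS credentials
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(**build_invoke_args(endpoint_name, payload))
    return json.loads(response["Body"].read())
```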

### (Optional) Deploy the updated application
<a name="optional-deploy-the-updated-application"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Update your application's requirements file. | Before sending changes to Elastic Beanstalk, update the `requirements.txt` file to reflect any new Python modules by running the following command in your application's root directory:`pip freeze > requirements.txt` | Python developer | 
| Redeploy the Elastic Beanstalk environment. | To ensure that your application changes are reflected in your Elastic Beanstalk deployment, navigate to your application's root directory and run the following command:`eb deploy`This sends the most recent version of the application's code to your existing Elastic Beanstalk deployment. | Systems administrator, Architect | 

## Troubleshooting
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-troubleshooting"></a>


| Issue | Solution | 
| --- | --- | 
| `Unable to assume role "arn:aws:iam::xxxxxxxxxx:role/aws-elasticbeanstalk-ec2-role". Verify that the role exists and is configured correctly.` | If this error occurs when you run `eb create`, create a sample application on the Elastic Beanstalk console to create the default instance profile. For more information about this, see [Creating an Elastic Beanstalk environment](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.environments.html) in the AWS Elastic Beanstalk documentation. | 
| `Your WSGIPath refers to a file that does not exist.` | This error occurs in deployment logs because Elastic Beanstalk expects the Flask code to be named `application.py`. If you chose a different name, run `eb config` and edit the WSGIPath as shown in the following code sample:<pre>aws:elasticbeanstalk:container:python:<br />     NumProcesses: '1'<br />     NumThreads: '15'<br />     StaticFiles: /static/=static/<br />     WSGIPath: application.py</pre>Make sure that you replace `application.py` with your file name. You can also use Gunicorn and a Procfile. For more information about this approach, see [Configuring the WSGI server with a Procfile](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/python-configuration-procfile.html) in the AWS Elastic Beanstalk documentation. | 
| `Target WSGI script '/opt/python/current/app/application.py' does not contain WSGI application 'application'.` | Elastic Beanstalk expects the variable that represents your Flask application to be named `application`. Make sure that the `application.py` file uses `application` as the variable name:<pre>application = Flask(__name__)</pre> | 
| `The EB CLI cannot find your SSH key file for keyname` | Use the EB CLI to specify which key pair to use, or to create a key pair, for your deployment’s Amazon EC2 instances. To resolve the error, run `eb init -i`. One of the prompts asks:<pre>Do you want to set up SSH for your instances?</pre>Respond with `Y`, and then create a key pair or specify an existing one. | 
| I’ve updated my code and redeployed, but my deployment is not reflecting my changes. | If you’re using a Git repository with your deployment, make sure that you add and commit your changes before redeploying. | 
| You are previewing the Flask application from an AWS Cloud9 IDE and run into errors. | For more information about this, see [Previewing running applications in the AWS Cloud9 IDE](https://docs.aws.amazon.com/cloud9/latest/user-guide/app-preview.html) in the AWS Cloud9 documentation. | 

## Related resources
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-resources"></a>
+ [Call an Amazon SageMaker AI model endpoint using Amazon API Gateway and AWS Lambda](https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/)
+ [Deploying a Flask application to Elastic Beanstalk](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-flask.html)
+ [EB CLI command reference](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-cmd-commands.html)
+ [Setting up your Python development environment](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/python-development-environment.html)

## Additional information
<a name="visualize-ai-ml-model-results-using-flask-and-aws-elastic-beanstalk-additional"></a>

*Natural language processing using Amazon Comprehend*

With Amazon Comprehend, you can detect entities in individual text documents by running real-time analysis or asynchronous batch jobs. Amazon Comprehend also enables you to train custom entity recognition and text classification models, which you can use in real time by creating an endpoint.

This pattern uses asynchronous batch jobs to detect sentiments and entities from an input file that contains multiple documents. The sample application provided by this pattern is designed for users to upload a .csv file containing a single column with one text document per row. The `comprehend_helper.py` file in the GitHub [Visualize AI/ML model results using Flask and AWS Elastic Beanstalk](https://github.com/aws-samples/aws-comprehend-elasticbeanstalk-for-flask) repository reads the input file and sends the input to Amazon Comprehend for processing.

*BatchDetectEntities*

Amazon Comprehend inspects the text of a batch of documents for named entities and returns each detected entity, its location, its [entity type](https://docs.aws.amazon.com/comprehend/latest/dg/how-entities.html), and a score that indicates Amazon Comprehend's level of confidence. A maximum of 25 documents can be sent in one API call, and each document must be smaller than 5,000 bytes. You can filter the results to show only certain entities, based on your use case. For example, you could skip the `QUANTITY` entity type and set a threshold score (for example, 0.75) for detected entities. We recommend that you explore the results for your specific use case before choosing a threshold value. For more information, see [BatchDetectEntities](https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectEntities.html) in the Amazon Comprehend documentation.
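
The filtering described above can be sketched as a small post-processing step over the `BatchDetectEntities` response shape. The `filter_entities` helper and its default thresholds are illustrative, not code from the sample repository:

```python
def filter_entities(response, min_score=0.75, skip_types=("QUANTITY",)):
    """Keep confidently detected entities from a BatchDetectEntities
    response, skipping entity types that the use case doesn't need."""
    kept = []
    for doc in response.get("ResultList", []):
        for entity in doc.get("Entities", []):
            if entity["Type"] in skip_types or entity["Score"] < min_score:
                continue
            kept.append((doc["Index"], entity["Type"], entity["Text"]))
    return kept

def detect_entities(text_list):
    """Send up to 25 documents (each under 5,000 bytes) in one API call."""
    import boto3  # deferred so filter_entities is testable without AWS credentials
    client = boto3.client("comprehend")
    return client.batch_detect_entities(TextList=list(text_list), LanguageCode="en")
```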

*BatchDetectSentiment*

Amazon Comprehend inspects a batch of incoming documents and returns the prevailing sentiment for each document (`POSITIVE`, `NEUTRAL`, `MIXED`, or `NEGATIVE`). A maximum of 25 documents can be sent in one API call, and each document must be smaller than 5,000 bytes. Analyzing the sentiment is straightforward: the sentiment with the highest score is displayed in the final results. For more information, see [BatchDetectSentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectSentiment.html) in the Amazon Comprehend documentation.
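
Choosing the highest-scoring sentiment, as described above, amounts to one `max` over the `SentimentScore` map in each `BatchDetectSentiment` result entry. The helper name below is illustrative:

```python
def top_sentiment(result):
    """Pick the prevailing sentiment label from one result entry of a
    BatchDetectSentiment response. The SentimentScore map uses the keys
    Positive, Negative, Neutral, and Mixed."""
    scores = result["SentimentScore"]
    return max(scores, key=scores.get).upper()
```

Note that each result entry also carries a ready-made `Sentiment` field with the same label; computing the maximum yourself is mainly useful when you want to display the winning score alongside the label.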

*Flask configuration handling*

Flask servers use a series of [configuration variables](https://flask.palletsprojects.com/en/stable/config/) to control how the server runs. These variables can contain debug output, session tokens, or other application settings. You can also define custom variables that can be accessed while the application is running. There are multiple approaches for setting configuration variables.

In this pattern, the configuration is defined in `config.py` and inherited within `application.py`.
+ `config.py` contains the configuration variables that are set on the application's startup. In this application, a `DEBUG` variable is defined to tell the server to run in [debug mode](https://flask.palletsprojects.com/en/stable/config/#DEBUG), and `UPLOAD_FOLDER` is a custom variable that the application references later to determine where uploaded user data should be stored.
**Note**  
Debug mode should not be used when running an application in a production environment.
+ `application.py` initiates the Flask application and inherits the configuration settings defined in `config.py`. This is performed by the following code:

```python
application = Flask(__name__)
application.config.from_pyfile('config.py')
```
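
A minimal `config.py` consistent with the description above might look like the following; the exact values are illustrative, so check the repository for the real file:

```python
# config.py -- variables read by application.config.from_pyfile('config.py')
DEBUG = True                  # never leave debug mode enabled in production
UPLOAD_FOLDER = "uploads/"    # custom variable: where uploaded user data is stored
```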

# More patterns
<a name="machinelearning-more-patterns-pattern-list"></a>

**Topics**
+ [Accelerate MLOps with Backstage and self-service Amazon SageMaker AI templates](accelerate-mlops-with-backstage-and-sagemaker-templates.md)
+ [Automate AWS infrastructure operations by using Amazon Bedrock](automate-aws-infrastructure-operations-by-using-amazon-bedrock.md)
+ [Automate the setup of inter-Region peering with AWS Transit Gateway](automate-the-setup-of-inter-region-peering-with-aws-transit-gateway.md)
+ [Deploy agentic systems on Amazon Bedrock with the CrewAI framework by using Terraform](deploy-agentic-systems-on-amazon-bedrock-with-the-crewai-framework.md)
+ [Deploy a ChatOps solution to manage SAST scan results by using Amazon Q Developer in chat applications custom actions and CloudFormation](deploy-chatops-solution-to-manage-sast-scan-results.md)
+ [Generate data insights by using AWS Mainframe Modernization and Amazon Q in Quick Sight](generate-data-insights-by-using-aws-mainframe-modernization-and-amazon-q-in-quicksight.md)
+ [Generate Db2 z/OS data insights by using AWS Mainframe Modernization and Amazon Q in Quick Sight](generate-db2-zos-data-insights-aws-mainframe-modernization-amazon-q-in-quicksight.md)
+ [Give SageMaker notebook instances temporary access to a CodeCommit repository in another AWS account](give-sagemaker-notebook-instances-temporary-access-to-a-codecommit-repository-in-another-aws-account.md)
+ [Manage AWS Organizations policies as code by using AWS CodePipeline and Amazon Bedrock](manage-organizations-policies-as-code.md)
+ [Modernize the CardDemo mainframe application by using AWS Transform](modernize-carddemo-mainframe-app.md)
+ [Modernize and deploy mainframe applications using AWS Transform and Terraform](modernize-mainframe-app-transform-terraform.md)
+ [Perform advanced analytics using Amazon Redshift ML](perform-advanced-analytics-using-amazon-redshift-ml.md)
+ [Streamline Amazon EC2 compliance management with Amazon Bedrock agents and AWS Config](streamline-amazon-ec2-compliance-management-with-amazon-bedrock-agents-and-aws-config.md)
+ [Streamline Amazon Lex bot development and deployment by using an automated workflow](streamline-amazon-lex-bot-development-and-deployment-using-an-automated-workflow.md)
+ [Transform Easytrieve to modern languages by using AWS Transform custom](transform-easytrieve-modern-languages.md)
+ [Troubleshoot states in AWS Step Functions by using Amazon Bedrock](troubleshooting-states-in-aws-step-functions.md)