# Using Amazon Aurora machine learning
<a name="aurora-ml"></a>

By using Amazon Aurora machine learning, you can integrate your Aurora DB cluster with one of the following AWS machine learning services, depending on your needs. They each support specific machine learning use cases.

**Amazon Bedrock**  
Amazon Bedrock is a fully managed service that makes leading foundation models from AI companies available through an API, along with developer tooling to help build and scale generative AI applications. With Amazon Bedrock, you pay to run inference on any of the third-party foundation models. Pricing is based on the volume of input tokens and output tokens, and on whether you have purchased provisioned throughput for the model. For more information, see [What is Amazon Bedrock?](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html) in the *Amazon Bedrock User Guide*. 

**Amazon Comprehend**  
Amazon Comprehend is a managed natural language processing (NLP) service that's used to extract insights from documents. With Amazon Comprehend, you can deduce sentiment based on the content of documents, by analyzing entities, key phrases, language, and other features. To learn more, see [What is Amazon Comprehend?](https://docs.aws.amazon.com/comprehend/latest/dg/what-is.html) in the *Amazon Comprehend Developer Guide*. 

**SageMaker AI**  
Amazon SageMaker AI is a fully managed machine learning service. Data scientists and developers use Amazon SageMaker AI to build, train, and test machine learning models for a variety of inference tasks, such as fraud detection and product recommendation. When a machine learning model is ready for use in production, it can be deployed to the Amazon SageMaker AI hosted environment. For more information, see [What Is Amazon SageMaker AI?](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) in the *Amazon SageMaker AI Developer Guide*.   
Using Amazon Comprehend with your Aurora DB cluster has less preliminary setup than using SageMaker AI. If you're new to AWS machine learning, we recommend that you start by exploring Amazon Comprehend.

**Topics**
+ [Using Amazon Aurora machine learning with Aurora MySQL](mysql-ml.md)
+ [Using Amazon Aurora machine learning with Aurora PostgreSQL](postgresql-ml.md)

# Using Amazon Aurora machine learning with Aurora MySQL
<a name="mysql-ml"></a>

By using Amazon Aurora machine learning with your Aurora MySQL DB cluster, you can use Amazon Bedrock, Amazon Comprehend, or Amazon SageMaker AI, depending on your needs. They each support different machine learning use cases.

**Contents**
+ [Requirements for using Aurora machine learning with Aurora MySQL](#aurora-ml-prereqs)
+ [Region and version availability](#aurora-ml-availability)
+ [Supported features and limitations of Aurora machine learning with Aurora MySQL](#aurora-ml-limitations)
+ [Setting up your Aurora MySQL DB cluster to use Aurora machine learning](#aurora-ml-setting-up-access)
  + [Setting up your Aurora MySQL DB cluster to use Amazon Bedrock](#aurora-ml-setup-bedrock)
  + [Setting up your Aurora MySQL DB cluster to use Amazon Comprehend](#aurora-ml-setting-up-access-comprehend)
  + [Setting up your Aurora MySQL DB cluster to use SageMaker AI](#aurora-ml-setting-up-access-sagemaker)
    + [Setting up your Aurora MySQL DB cluster to use Amazon S3 for SageMaker AI (Optional)](#aurora-ml-setting-up-access-sagemaker-advanced-s3)
  + [Granting database users access to Aurora machine learning](#aurora-ml-sql-privileges)
    + [Granting access to Amazon Bedrock functions](#aurora-ml-sql-privileges.br)
    + [Granting access to Amazon Comprehend functions](#aurora-ml-sql-privileges.cmp)
    + [Granting access to SageMaker AI functions](#aurora-ml-sql-privileges.sm)
+ [Using Amazon Bedrock with your Aurora MySQL DB cluster](#using-amazon-bedrock)
+ [Using Amazon Comprehend with your Aurora MySQL DB cluster](#using-amazon-comprehend-for-sentiment-detection)
+ [Using SageMaker AI with your Aurora MySQL DB cluster](#using-amazon-sagemaker-to-run-your-own-ml-models)
  + [Character set requirement for SageMaker AI functions that return strings](#note-character-set-of-ml-functions-return-type)
  + [Exporting data to Amazon S3 for SageMaker AI model training (Advanced)](#exporting-data-to-s3-for-model-training)
+ [Performance considerations for using Aurora machine learning with Aurora MySQL](#aurora-ml-performance)
  + [Model and prompt](#ml-model-prompt)
  + [Query cache](#ml-query-cache)
  + [Batch optimization for Aurora machine learning function calls](#ml-batch-optimization)
+ [Monitoring Aurora machine learning](#aurora-ml-monitoring)

## Requirements for using Aurora machine learning with Aurora MySQL
<a name="aurora-ml-prereqs"></a>

AWS machine learning services are managed services that are set up and run in their own production environments. Aurora machine learning supports integration with Amazon Bedrock, Amazon Comprehend, and SageMaker AI. Before trying to set up your Aurora MySQL DB cluster to use Aurora machine learning, be sure you understand the following requirements and prerequisites.
+ The machine learning services must be running in the same AWS Region as your Aurora MySQL DB cluster. You can't use machine learning services from an Aurora MySQL DB cluster in a different Region.
+ If your Aurora MySQL DB cluster is in a different virtual private cloud (VPC) from your Amazon Bedrock, Amazon Comprehend, or SageMaker AI service, the VPC's security group must allow outbound connections to the target Aurora machine learning service. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html) in the *Amazon VPC User Guide*.
+ You can upgrade an Aurora cluster that's running a lower version of Aurora MySQL to a supported higher version if you want to use Aurora machine learning with that cluster. For more information, see [Database engine updates for Amazon Aurora MySQL](AuroraMySQL.Updates.md).
+ Your Aurora MySQL DB cluster must use a custom DB cluster parameter group. At the end of the setup process for each Aurora machine learning service that you want to use, you add the Amazon Resource Name (ARN) of the associated IAM role that was created for the service. We recommend that you create a custom DB cluster parameter group for your Aurora MySQL DB cluster in advance and configure your cluster to use it, so that it's ready for you to modify at the end of the setup process.
+ For SageMaker AI:
  + The machine learning components that you want to use for inferences must be set up and ready to use. During the configuration process for your Aurora MySQL DB cluster, make sure to have the ARN of the SageMaker AI endpoint available. The data scientists on your team are likely best able to handle working with SageMaker AI to prepare the models and handle the other such tasks. To get started with Amazon SageMaker AI, see [Get Started with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html). For more information about inferences and endpoints, see [Real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html). 
  + To use SageMaker AI with your own training data, you must set up an Amazon S3 bucket as part of your Aurora MySQL configuration for Aurora machine learning. To do so, you follow the same general process as for setting up the SageMaker AI integration. For a summary of this optional setup process, see [Setting up your Aurora MySQL DB cluster to use Amazon S3 for SageMaker AI (Optional)](#aurora-ml-setting-up-access-sagemaker-advanced-s3).
+ For Aurora global databases, you set up the Aurora machine learning services that you want to use in all AWS Regions that make up your Aurora global database. For example, if you want to use Aurora machine learning with SageMaker AI for your Aurora global database, you do the following for every Aurora MySQL DB cluster in every AWS Region:
  + Set up the Amazon SageMaker AI services with the same SageMaker AI training models and endpoints. These must also use the same names.
  + Create the IAM roles as detailed in [Setting up your Aurora MySQL DB cluster to use Aurora machine learning](#aurora-ml-setting-up-access).
  + Add the ARN of the IAM role to the custom DB cluster parameter group for each Aurora MySQL DB cluster in every AWS Region.

  These tasks require that Aurora machine learning is available for your version of Aurora MySQL in all AWS Regions that make up your Aurora global database.

## Region and version availability
<a name="aurora-ml-availability"></a>

Feature availability and support vary across specific versions of each Aurora database engine, and across AWS Regions.
+ For information on version and Region availability for Amazon Comprehend and Amazon SageMaker AI with Aurora MySQL, see [Aurora machine learning with Aurora MySQL](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Aurora_ML.md#Concepts.Aurora_Fea_Regions_DB-eng.Feature.Aurora_ML.amy).
+ Amazon Bedrock is supported only on Aurora MySQL version 3.06 and higher.

  For information on Region availability for Amazon Bedrock, see [Model support by AWS Region](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) in the *Amazon Bedrock User Guide*.

## Supported features and limitations of Aurora machine learning with Aurora MySQL
<a name="aurora-ml-limitations"></a>

When using Aurora MySQL with Aurora machine learning, the following limitations apply:
+ The Aurora machine learning extension doesn't support vector interfaces.
+ Aurora machine learning integrations aren't supported when used in a trigger.
+ Aurora machine learning functions aren't compatible with binary logging (binlog) replication.
  + The setting `--binlog-format=STATEMENT` throws an exception for calls to Aurora machine learning functions.
  + Aurora machine learning functions are nondeterministic, and nondeterministic stored functions aren't compatible with the binlog format.

  For more information, see [Binary Logging Formats](https://dev.mysql.com/doc/refman/5.7/en/binary-log-formats.html) in the MySQL documentation.
+ Stored functions that call tables with generated-always columns aren't supported. This applies to any Aurora MySQL stored function. To learn more about this column type, see [CREATE TABLE and Generated Columns](https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html) in the MySQL documentation.
+ Amazon Bedrock functions don't support `RETURNS JSON`. You can use `CONVERT` or `CAST` to convert from `TEXT` to `JSON` if needed.
+ Amazon Bedrock doesn't support batch requests.
+ Aurora MySQL supports any SageMaker AI endpoint that reads and writes the comma-separated value (CSV) format, through a `ContentType` of `text/csv`. This format is accepted by the following built-in SageMaker AI algorithms:
  + Linear Learner
  + Random Cut Forest
  + XGBoost

  To learn more about these algorithms, see [Choose an Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algorithms-choose.html) in the *Amazon SageMaker AI Developer Guide*.
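
Two of the limitations above interact in practice: Amazon Bedrock functions return `TEXT` rather than `JSON`, but you can cast the response before applying JSON functions to it. The following is a minimal sketch, assuming a Bedrock-backed function named `invoke_titan` has already been created (the function name and JSON path are illustrative, not part of the product):

```
-- invoke_titan is a hypothetical Bedrock-backed function that returns TEXT.
-- CAST converts the TEXT response to JSON so that MySQL JSON functions apply.
SELECT JSON_EXTRACT(
    CAST(invoke_titan('{"inputText": "Hello"}') AS JSON),
    '$.results[0].outputText'
) AS output_text;
```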

## Setting up your Aurora MySQL DB cluster to use Aurora machine learning
<a name="aurora-ml-setting-up-access"></a>

In the following topics, you can find separate setup procedures for each of these Aurora machine learning services.

**Contents**
+ [Setting up your Aurora MySQL DB cluster to use Amazon Bedrock](#aurora-ml-setup-bedrock)
+ [Setting up your Aurora MySQL DB cluster to use Amazon Comprehend](#aurora-ml-setting-up-access-comprehend)
+ [Setting up your Aurora MySQL DB cluster to use SageMaker AI](#aurora-ml-setting-up-access-sagemaker)
  + [Setting up your Aurora MySQL DB cluster to use Amazon S3 for SageMaker AI (Optional)](#aurora-ml-setting-up-access-sagemaker-advanced-s3)
+ [Granting database users access to Aurora machine learning](#aurora-ml-sql-privileges)
  + [Granting access to Amazon Bedrock functions](#aurora-ml-sql-privileges.br)
  + [Granting access to Amazon Comprehend functions](#aurora-ml-sql-privileges.cmp)
  + [Granting access to SageMaker AI functions](#aurora-ml-sql-privileges.sm)

### Setting up your Aurora MySQL DB cluster to use Amazon Bedrock
<a name="aurora-ml-setup-bedrock"></a>

Aurora machine learning relies on AWS Identity and Access Management (IAM) roles and policies to allow your Aurora MySQL DB cluster to access and use the Amazon Bedrock services. The following procedures create an IAM permission policy and role so that your DB cluster can integrate with Amazon Bedrock.

**To create the IAM policy**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Policies** in the navigation pane.

1. Choose **Create policy**.

1. On the **Specify permissions** page, for **Select a service**, choose **Bedrock**.

   The Amazon Bedrock permissions are displayed.

1. Expand **Read**, then select **InvokeModel**.

1. For **Resources**, select **All**.

   The **Specify permissions** page should resemble the following figure.  
![\[Amazon Bedrock IAM permission policy for the InvokeModel operation.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-br-iam-policy.png)

1. Choose **Next**.

1. On the **Review and create** page, enter a name for your policy, for example **BedrockInvokeModel**.

1. Review your policy, then choose **Create policy**.

Next you create the IAM role that uses the Amazon Bedrock permission policy.

**To create the IAM role**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Roles** in the navigation pane.

1. Choose **Create role**.

1. On the **Select trusted entity** page, for **Use case**, choose **RDS**.

1. Select **RDS - Add Role to Database**, then choose **Next**.

1. On the **Add permissions** page, for **Permissions policies**, select the IAM policy that you created, then choose **Next**.

1. On the **Name, review, and create** page, enter a name for your role, for example **ams-bedrock-invoke-model-role**.

   The role should resemble the following figure.  
![\[Amazon Bedrock IAM role for the InvokeModel operation.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-br-iam-role.png)

1. Review your role, then choose **Create role**.

Next you associate the Amazon Bedrock IAM role with your DB cluster.

**To associate the IAM role with your DB cluster**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Choose **Databases** from the navigation pane.

1. Choose the Aurora MySQL DB cluster that you want to connect to Amazon Bedrock services.

1. Choose the **Connectivity & security** tab.

1. In the **Manage IAM roles** section, choose **Select IAM roles to add to this cluster**.

1. Choose the IAM role that you created, and then choose **Add role**.

   The IAM role is associated with your DB cluster, first with the status **Pending**, then **Active**. When the process completes, you can find the role in the **Current IAM roles for this cluster** list.  
![\[IAM role associated with your DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-br-manage-iam-roles.png)

You must add the ARN of this IAM role to the `aws_default_bedrock_role` parameter of the custom DB cluster parameter group associated with your Aurora MySQL DB cluster. If your Aurora MySQL DB cluster doesn't use a custom DB cluster parameter group, you need to create one to use with your Aurora MySQL DB cluster to complete the integration. For more information, see [DB cluster parameter groups for Amazon Aurora DB clusters](USER_WorkingWithDBClusterParamGroups.md).

**To configure the DB cluster parameter**

1. In the Amazon RDS Console, open the **Configuration** tab of your Aurora MySQL DB cluster.

1. Locate the DB cluster parameter group configured for your cluster. Choose the link to open your custom DB cluster parameter group, then choose **Edit**.

1. Find the `aws_default_bedrock_role` parameter in your custom DB cluster parameter group.

1. In the **Value** field, enter the ARN of the IAM role.

1. Choose **Save changes** to save the setting.

1. Reboot the primary instance of your Aurora MySQL DB cluster so that this parameter setting takes effect.

The IAM integration for Amazon Bedrock is complete. Continue setting up your Aurora MySQL DB cluster to work with Amazon Bedrock by granting access to the appropriate database users, as described in [Granting database users access to Aurora machine learning](#aurora-ml-sql-privileges).

### Setting up your Aurora MySQL DB cluster to use Amazon Comprehend
<a name="aurora-ml-setting-up-access-comprehend"></a>

Aurora machine learning relies on AWS Identity and Access Management roles and policies to allow your Aurora MySQL DB cluster to access and use the Amazon Comprehend services. The following procedure automatically creates an IAM role and policy for your cluster so that it can use Amazon Comprehend.

**To set up your Aurora MySQL DB cluster to use Amazon Comprehend**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Choose **Databases** from the navigation pane.

1. Choose the Aurora MySQL DB cluster that you want to connect to Amazon Comprehend services.

1. Choose the **Connectivity & security** tab.

1. In the **Manage IAM roles** section, choose **Select a service to connect to this cluster**.

1. Choose **Amazon Comprehend** from the menu, and then choose **Connect service**.  
![\[Image showing Amazon Comprehend chosen for the Aurora MySQL DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-cmp-iam-role.png)

1. The **Connect cluster to Amazon Comprehend** dialog doesn't require any additional information. However, you might see a message notifying you that the integration between Aurora and Amazon Comprehend is currently in preview. Be sure to read the message before you continue. You can choose **Cancel** if you prefer not to proceed.

1. Choose **Connect service** to complete the integration process. 

   Aurora creates the IAM role. It also creates the policy that allows the Aurora MySQL DB cluster to use Amazon Comprehend services and attaches the policy to the role. When the process completes, you can find the role in the **Current IAM roles for this cluster** list as shown in the following image.   
![\[Completed IAM integration for using Amazon Comprehend with Aurora MySQL showing the Active role.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-cmp-iam-role-4.png)

   You need to add the ARN of this IAM role to the `aws_default_comprehend_role` parameter of the custom DB cluster parameter group associated with your Aurora MySQL DB cluster. If your Aurora MySQL DB cluster doesn't use a custom DB cluster parameter group, you need to create one to use with your Aurora MySQL DB cluster to complete the integration. For more information, see [DB cluster parameter groups for Amazon Aurora DB clusters](USER_WorkingWithDBClusterParamGroups.md).

   After creating your custom DB cluster parameter group and associating it with your Aurora MySQL DB cluster, you can continue following these steps.

   If your cluster uses a custom DB cluster parameter group, do the following.

   1. In the Amazon RDS Console, open the **Configuration** tab of your Aurora MySQL DB cluster.

   1. Locate the DB cluster parameter group configured for your cluster. Choose the link to open your custom DB cluster parameter group, then choose **Edit**.

   1. Find the `aws_default_comprehend_role` parameter in your custom DB cluster parameter group.

   1. In the **Value** field, enter the ARN of the IAM role.

   1. Choose **Save changes** to save the setting. In the following image, you can find an example.  
![\[Adding the ARN of the IAM role to the Aurora MySQL's custom DB cluster parameter group.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-cmp-add-arn-to-custom-db-cluster-params.png)

   Reboot the primary instance of your Aurora MySQL DB cluster so that this parameter setting takes effect.

The IAM integration for Amazon Comprehend is complete. Continue setting up your Aurora MySQL DB cluster to work with Amazon Comprehend by granting access to the appropriate database users.

### Setting up your Aurora MySQL DB cluster to use SageMaker AI
<a name="aurora-ml-setting-up-access-sagemaker"></a>

The following procedure automatically creates the IAM role and policy for your Aurora MySQL DB cluster so that it can use SageMaker AI. Before trying to follow this procedure, be sure that you have the SageMaker AI endpoint available so that you can enter it when needed. Typically, data scientists on your team would do the work to produce an endpoint that you can use from your Aurora MySQL DB cluster. You can find such endpoints in the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/home). In the navigation pane, open the **Inference** menu and choose **Endpoints**. In the following image, you can find an example.

![\[Image showing SageMaker AI chosen for the Aurora MySQL DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-sm-endpoint-arn.png)


**To set up your Aurora MySQL DB cluster to use SageMaker AI**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Choose **Databases** from the Amazon RDS navigation menu and then choose the Aurora MySQL DB cluster that you want to connect to SageMaker AI services. 

1. Choose the **Connectivity & security** tab.

1. Scroll to the **Manage IAM roles** section, and then choose **Select a service to connect to this cluster**. Choose **SageMaker AI** from the selector.  
![\[Image showing SageMaker AI chosen for the Aurora MySQL DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-sm-iam-role-1.png)

1. Choose **Connect service**.

1. In the **Connect cluster to SageMaker AI** dialog, enter the ARN of the SageMaker AI endpoint.  
![\[Image showing the Amazon Resource Name (ARN) for SageMaker AI endpoint entered during the configuration process.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-sm-iam-role-3.png)

1. Aurora creates the IAM role. It also creates the policy that allows the Aurora MySQL DB cluster to use SageMaker AI services and attaches the policy to the role. When the process completes, you can find the role in the **Current IAM roles for this cluster** list.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Roles** from the Access management section of the AWS Identity and Access Management navigation menu.

1. Find the role from among those listed. Its name uses the following pattern.

   ```
   rds-sagemaker-your-cluster-name-role-auto-generated-digits
   ```

1. Open the role's Summary page and locate the ARN. Note the ARN or copy it using the copy widget.

1. Open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Choose your Aurora MySQL DB cluster, and then choose its **Configuration** tab.

1. Locate the DB cluster parameter group, and choose the link to open your custom DB cluster parameter group. Find the `aws_default_sagemaker_role` parameter, enter the ARN of the IAM role in the **Value** field, and save the setting.

1. Reboot the primary instance of your Aurora MySQL DB cluster so that this parameter setting takes effect. 

The IAM setup is now complete. Continue setting up your Aurora MySQL DB cluster to work with SageMaker AI by granting access to the appropriate database users.

If you want to train your own SageMaker AI models rather than use the pre-built SageMaker AI components, you also need to add an Amazon S3 bucket to your Aurora MySQL DB cluster, as outlined in [Setting up your Aurora MySQL DB cluster to use Amazon S3 for SageMaker AI (Optional)](#aurora-ml-setting-up-access-sagemaker-advanced-s3), which follows.

#### Setting up your Aurora MySQL DB cluster to use Amazon S3 for SageMaker AI (Optional)
<a name="aurora-ml-setting-up-access-sagemaker-advanced-s3"></a>

To use SageMaker AI with your own models rather than using the pre-built components provided by SageMaker AI, you need to set up an Amazon S3 bucket for the Aurora MySQL DB cluster to use. For more information about creating an Amazon S3 bucket, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the *Amazon Simple Storage Service User Guide*. 

**To set up your Aurora MySQL DB cluster to use an Amazon S3 bucket for SageMaker AI**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Choose **Databases** from the Amazon RDS navigation menu and then choose the Aurora MySQL DB cluster that you want to connect to SageMaker AI services. 

1. Choose the **Connectivity & security** tab. 

1. Scroll to the **Manage IAM roles** section, and then choose **Select a service to connect to this cluster**. Choose **Amazon S3** from the selector.  
![\[Choosing Amazon S3 as the service to integrate with the Aurora MySQL DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-s3-1.png)

1. Choose **Connect service**.

1. In the **Connect cluster to Amazon S3** dialog, enter the ARN of the Amazon S3 bucket, as shown in the following image.  
![\[Image of ARN for Amazon S3 bucket specified for the Aurora MySQL DB cluster.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/ams-ml-s3-2.png)

1. Choose **Connect service** to complete this process.

For more information about using Amazon S3 buckets with SageMaker AI, see [Specify an Amazon S3 Bucket to Upload Training Datasets and Store Output Data](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex-bucket.html) in the *Amazon SageMaker AI Developer Guide*. To learn more about working with SageMaker AI, see [Get Started with Amazon SageMaker AI Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html) in the *Amazon SageMaker AI Developer Guide*.

### Granting database users access to Aurora machine learning
<a name="aurora-ml-sql-privileges"></a>

Database users must be granted permission to invoke Aurora machine learning functions. How you grant permission depends on the version of MySQL that your Aurora MySQL DB cluster uses.
+ For Aurora MySQL version 3 (MySQL 8.0 compatible), database users must be granted the appropriate *database role*. For more information, see [Using Roles](https://dev.mysql.com/doc/refman/8.0/en/roles.html) in the *MySQL 8.0 Reference Manual*.
+ For Aurora MySQL version 2 (MySQL 5.7 compatible), database users are granted *privileges*. For more information, see [Access Control and Account Management](https://dev.mysql.com/doc/refman/5.7/en/access-control.html) in the *MySQL 5.7 Reference Manual*.

The following table shows the roles and privileges that database users need to work with machine learning functions.


| Aurora MySQL version 3 (role) | Aurora MySQL version 2 (privilege) | 
| --- | --- | 
|  AWS\_BEDROCK\_ACCESS  |  –  | 
|  AWS\_COMPREHEND\_ACCESS  |  INVOKE COMPREHEND  | 
|  AWS\_SAGEMAKER\_ACCESS  |  INVOKE SAGEMAKER  | 
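
To confirm what a database user currently holds, you can inspect the grants directly with standard MySQL statements. A brief sketch (the user name is illustrative):

```
-- Show the roles (version 3) or privileges (version 2) granted to a user.
SHOW GRANTS FOR 'ml_user'@'%';

-- In Aurora MySQL version 3, check which roles are active in the current session.
SELECT CURRENT_ROLE();
```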

#### Granting access to Amazon Bedrock functions
<a name="aurora-ml-sql-privileges.br"></a>

To give database users access to Amazon Bedrock functions, use the following SQL statement:

```
GRANT AWS_BEDROCK_ACCESS TO user@domain-or-ip-address;
```

Database users also need to be granted `EXECUTE` permissions for the functions that you create for working with Amazon Bedrock:

```
GRANT EXECUTE ON FUNCTION database_name.function_name TO user@domain-or-ip-address;
```

Finally, database users must have their roles set to `AWS_BEDROCK_ACCESS`:

```
SET ROLE AWS_BEDROCK_ACCESS;
```

The Amazon Bedrock functions are now available for use.
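
Rather than running `SET ROLE` in every session, you can activate a user's role automatically at login. The following sketch uses the standard MySQL 8.0 default-role statement (the user name is illustrative):

```
-- Make AWS_BEDROCK_ACCESS active automatically whenever ml_user connects,
-- so an explicit SET ROLE isn't needed in each session.
SET DEFAULT ROLE AWS_BEDROCK_ACCESS TO 'ml_user'@'%';
```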

#### Granting access to Amazon Comprehend functions
<a name="aurora-ml-sql-privileges.cmp"></a>

To give database users access to Amazon Comprehend functions, use the appropriate statement for your Aurora MySQL version.
+ Aurora MySQL version 3 (MySQL 8.0 compatible)

  ```
  GRANT AWS_COMPREHEND_ACCESS TO user@domain-or-ip-address;
  ```
+ Aurora MySQL version 2 (MySQL 5.7 compatible)

  ```
  GRANT INVOKE COMPREHEND ON *.* TO user@domain-or-ip-address;
  ```

The Amazon Comprehend functions are now available for use. For usage examples, see [Using Amazon Comprehend with your Aurora MySQL DB cluster](#using-amazon-comprehend-for-sentiment-detection).
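
Once access is granted, the built-in Amazon Comprehend functions can be called directly from SQL. The following is a brief sketch of sentiment detection (the table and column names are illustrative):

```
-- Detect the sentiment (POSITIVE, NEGATIVE, NEUTRAL, or MIXED) and the model's
-- confidence for each comment in a hypothetical feedback table.
SELECT comment_text,
       aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
       aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
FROM feedback
LIMIT 10;
```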

#### Granting access to SageMaker AI functions
<a name="aurora-ml-sql-privileges.sm"></a>

To give database users access to SageMaker AI functions, use the appropriate statement for your Aurora MySQL version.
+ Aurora MySQL version 3 (MySQL 8.0 compatible)

  ```
  GRANT AWS_SAGEMAKER_ACCESS TO user@domain-or-ip-address;
  ```
+ Aurora MySQL version 2 (MySQL 5.7 compatible)

  ```
  GRANT INVOKE SAGEMAKER ON *.* TO user@domain-or-ip-address;
  ```

Database users also need to be granted `EXECUTE` permissions for the functions that you create for working with SageMaker AI. Suppose that you created two functions, `db1.anomaly_score` and `db2.company_forecasts`, to invoke the services of your SageMaker AI endpoint. You grant execute privileges as shown in the following example.

```
GRANT EXECUTE ON FUNCTION db1.anomaly_score TO user1@domain-or-ip-address1;
GRANT EXECUTE ON FUNCTION db2.company_forecasts TO user2@domain-or-ip-address2;
```

The SageMaker AI functions are now available for use. For usage examples, see [Using SageMaker AI with your Aurora MySQL DB cluster](#using-amazon-sagemaker-to-run-your-own-ml-models).
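
For context, a SageMaker AI function such as `db1.anomaly_score` is itself created with the `AWS_SAGEMAKER_INVOKE_ENDPOINT` alias. The following is a hedged sketch of such a definition; the endpoint name, argument, and batch size are illustrative assumptions, not values from this guide:

```
-- Bind a SQL function to a SageMaker AI endpoint. The endpoint must read and
-- write CSV (ContentType text/csv), per the limitations described earlier.
CREATE FUNCTION db1.anomaly_score (value DOUBLE)
    RETURNS DOUBLE
    ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT
    ENDPOINT NAME 'my-anomaly-endpoint'
    MAX_BATCH_SIZE 10;
```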

## Using Amazon Bedrock with your Aurora MySQL DB cluster
<a name="using-amazon-bedrock"></a>

To use Amazon Bedrock, you create a user-defined function (UDF) in your Aurora MySQL database that invokes a model. For more information, see [Supported models in Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html) in the *Amazon Bedrock User Guide*.

A UDF uses the following syntax:

```
CREATE FUNCTION function_name (argument type)
        [DEFINER = user]
    RETURNS mysql_data_type
    [SQL SECURITY {DEFINER | INVOKER}]
    ALIAS AWS_BEDROCK_INVOKE_MODEL
    MODEL ID 'model_id'
    [CONTENT_TYPE 'content_type']
    [ACCEPT 'content_type']
    [TIMEOUT_MS timeout_in_milliseconds];
```
+ Amazon Bedrock functions don't support `RETURNS JSON`. You can use `CONVERT` or `CAST` to convert from `TEXT` to `JSON` if needed.
+ If you don't specify `CONTENT_TYPE` or `ACCEPT`, the default is `application/json`.
+ If you don't specify `TIMEOUT_MS`, the value for `aurora_ml_inference_timeout` is used.
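Because Amazon Bedrock functions can't use `RETURNS JSON`, you can cast the returned `TEXT` value when you need JSON operations on the result. The following is a minimal sketch; the UDF name `invoke_model` and the table `my_prompts` are hypothetical placeholders for your own function and table:

```
-- invoke_model and my_prompts are hypothetical placeholders
SELECT CAST(invoke_model(request_body) AS JSON) AS response_json
FROM my_prompts;
```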

For example, the following UDF invokes the Amazon Titan Text Express model:

```
CREATE FUNCTION invoke_titan (request_body TEXT)
    RETURNS TEXT
    ALIAS AWS_BEDROCK_INVOKE_MODEL
    MODEL ID 'amazon.titan-text-express-v1'
    CONTENT_TYPE 'application/json'
    ACCEPT 'application/json';
```

To allow a DB user to use this function, use the following SQL command:

```
GRANT EXECUTE ON FUNCTION database_name.invoke_titan TO user@domain-or-ip-address;
```

Then the user can call `invoke_titan` like any other function, as shown in the following example. Make sure to format the request body as described in [Amazon Titan text models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-text.html).

````
CREATE TABLE prompts (request varchar(1024));
INSERT INTO prompts VALUES (
'{
    "inputText": "Generate synthetic data for daily product sales in various categories - include row number, product name, category, date of sale and price. Produce output in JSON format. Count records and ensure there are no more than 5.",
    "textGenerationConfig": {
        "maxTokenCount": 1024,
        "stopSequences": [],
        "temperature":0,
        "topP":1
    }
}');

SELECT invoke_titan(request) FROM prompts;

{"inputTextTokenCount":44,"results":[{"tokenCount":296,"outputText":"
```tabular-data-json
{
    "rows": [
        {
            "Row Number": "1",
            "Product Name": "T-Shirt",
            "Category": "Clothing",
            "Date of Sale": "2024-01-01",
            "Price": "$20"
        },
        {
            "Row Number": "2",
            "Product Name": "Jeans",
            "Category": "Clothing",
            "Date of Sale": "2024-01-02",
            "Price": "$30"
        },
        {
            "Row Number": "3",
            "Product Name": "Hat",
            "Category": "Accessories",
            "Date of Sale": "2024-01-03",
            "Price": "$15"
        },
        {
            "Row Number": "4",
            "Product Name": "Watch",
            "Category": "Accessories",
            "Date of Sale": "2024-01-04",
            "Price": "$40"
        },
        {
            "Row Number": "5",
            "Product Name": "Phone Case",
            "Category": "Accessories",
            "Date of Sale": "2024-01-05",
            "Price": "$25"
        }
    ]
}
```","completionReason":"FINISH"}]}
````

For other models that you use, make sure to format the request body appropriately for them. For more information, see [Inference parameters for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html) in the *Amazon Bedrock User Guide*.

## Using Amazon Comprehend with your Aurora MySQL DB cluster
<a name="using-amazon-comprehend-for-sentiment-detection"></a>

For Aurora MySQL, Aurora machine learning provides the following two built-in functions for working with Amazon Comprehend and your text data. You provide the text to analyze (`input_data`) and specify the language (`language_code`). 

**aws\_comprehend\_detect\_sentiment**  
This function identifies the text as having a positive, negative, neutral, or mixed emotional posture. This function's reference documentation is as follows.  

```
aws_comprehend_detect_sentiment(
  input_text,
  language_code
  [,max_batch_size]
)
```
To learn more, see [Sentiment](https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html) in the *Amazon Comprehend Developer Guide*.

**aws\_comprehend\_detect\_sentiment\_confidence**  
This function measures the confidence level of the sentiment detected for a given text. It returns a value (type `double`) that indicates the confidence of the sentiment assigned by the aws\_comprehend\_detect\_sentiment function to the text. Confidence is a statistical metric between 0 and 1. The higher the confidence level, the more weight you can give the result. A summary of the function's documentation is as follows.  

```
aws_comprehend_detect_sentiment_confidence(
  input_text,
  language_code
  [,max_batch_size]
)
```

In both functions (aws\_comprehend\_detect\_sentiment\_confidence, aws\_comprehend\_detect\_sentiment), `max_batch_size` uses a default value of 25 if none is specified. Batch size should always be greater than 0. You can use `max_batch_size` to tune the performance of the Amazon Comprehend function calls. A large batch size trades off faster performance for greater memory usage on the Aurora MySQL DB cluster. For more information, see [Performance considerations for using Aurora machine learning with Aurora MySQL](#aurora-ml-performance).
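For example, the following call caps each Amazon Comprehend batch at 10 input texts. The `support` table and its `feedback` column match the sample table used in the examples in this section:

```
SELECT aws_comprehend_detect_sentiment(feedback, 'en', 10) AS Sentiment
FROM support;
```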

For more information about parameters and return types for the sentiment detection functions in Amazon Comprehend, see [DetectSentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_DetectSentiment.html) in the *Amazon Comprehend Developer Guide*.

**Example: A simple query using Amazon Comprehend functions**  
Here's an example of a simple query that invokes these two functions to see how happy your customers are with your support team. Suppose that you have a database table (`support`) that stores customer feedback after each request for help. This example query applies both built-in functions to the text in the `feedback` column of the table and outputs the results. The confidence values returned by the function are doubles between 0.0 and 1.0. For more readable output, this query rounds the results to 6 decimal places. For easier comparison, this query also sorts the results in descending order of confidence.  

```
SELECT feedback AS 'Customer feedback',
    aws_comprehend_detect_sentiment(feedback, 'en') AS Sentiment,
    ROUND(aws_comprehend_detect_sentiment_confidence(feedback, 'en'), 6)
    AS Confidence FROM support
    ORDER BY Confidence DESC;
+----------------------------------------------------------+-----------+------------+
| Customer feedback                                        | Sentiment | Confidence |
+----------------------------------------------------------+-----------+------------+
| Thank you for the excellent customer support!            | POSITIVE  |   0.999771 |
| The latest version of this product stinks!               | NEGATIVE  |   0.999184 |
| Your support team is just awesome! I am blown away.      | POSITIVE  |   0.997774 |
| Your product is too complex, but your support is great.  | MIXED     |   0.957958 |
| Your support tech helped me in fifteen minutes.          | POSITIVE  |   0.949491 |
| My problem was never resolved!                           | NEGATIVE  |   0.920644 |
| When will the new version of this product be released?   | NEUTRAL   |   0.902706 |
| I cannot stand that chatbot.                             | NEGATIVE  |   0.895219 |
| Your support tech talked down to me.                     | NEGATIVE  |   0.868598 |
| It took me way too long to get a real person.            | NEGATIVE  |   0.481805 |
+----------------------------------------------------------+-----------+------------+
 10 rows in set (0.1898 sec)
```

**Example: Determining the average sentiment for text above a specific confidence level**  
A typical Amazon Comprehend query looks for rows where the sentiment is a certain value, with a confidence level greater than a certain number. For example, the following query shows how you can determine the average sentiment of documents in your database. The query considers only documents where the confidence of the assessment is at least 80%.   

```
SELECT AVG(CASE aws_comprehend_detect_sentiment(productTable.document, 'en')
  WHEN 'POSITIVE' THEN 1.0
  WHEN 'NEGATIVE' THEN -1.0
  ELSE 0.0 END) AS avg_sentiment, COUNT(*) AS total
FROM productTable
WHERE productTable.productCode = 1302 AND
    aws_comprehend_detect_sentiment_confidence(productTable.document, 'en') >= 0.80;
```

## Using SageMaker AI with your Aurora MySQL DB cluster
<a name="using-amazon-sagemaker-to-run-your-own-ml-models"></a>

To use SageMaker AI functionality from your Aurora MySQL DB cluster, you need to create stored functions that embed your calls to the SageMaker AI endpoint and its inference features. You do so by using MySQL's `CREATE FUNCTION` in generally the same way that you do for other processing tasks on your Aurora MySQL DB cluster.

To use models deployed in SageMaker AI for inference, you create user-defined functions using MySQL data definition language (DDL) statements for stored functions. Each stored function represents the SageMaker AI endpoint hosting the model. When you define such a function, you specify the input parameters to the model, the specific SageMaker AI endpoint to invoke, and the return type. The function returns the inference computed by the SageMaker AI endpoint after applying the model to the input parameters.

All Aurora machine learning stored functions return numeric types or `VARCHAR`. You can use any numeric type except `BIT`. Other types, such as `JSON`, `BLOB`, `TEXT`, and `DATE` aren't allowed.

The following example shows the `CREATE FUNCTION` syntax for working with SageMaker AI.

```
CREATE FUNCTION function_name (
     arg1 type1, 
     arg2 type2, ...)
     [DEFINER = user]
  RETURNS mysql_type
  [SQL SECURITY { DEFINER | INVOKER } ]
  ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT
  ENDPOINT NAME 'endpoint_name'
  [MAX_BATCH_SIZE max_batch_size];
```

This is an extension of the regular `CREATE FUNCTION` DDL statement. In the `CREATE FUNCTION` statement that defines the SageMaker AI function, you don't specify a function body. Instead, you specify the keyword `ALIAS` where the function body usually goes. Currently, Aurora machine learning only supports `aws_sagemaker_invoke_endpoint` for this extended syntax. You must specify the `endpoint_name` parameter. A SageMaker AI endpoint can have different characteristics for each model.

**Note**  
For more information about `CREATE FUNCTION`, see [CREATE PROCEDURE and CREATE FUNCTION Statements](https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html) in the MySQL 8.0 Reference Manual.

The `max_batch_size` parameter is optional. By default, maximum batch size is 10,000. You can use this parameter in your function to restrict the maximum number of inputs processed in a batched request to SageMaker AI. The `max_batch_size` parameter can help to avoid an error caused by inputs that are too large, or to make SageMaker AI return a response more quickly. This parameter affects the size of an internal buffer used for SageMaker AI request processing. Specifying too large a value for `max_batch_size` might cause substantial memory overhead on your DB instance.
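For example, the following function definition limits each request to at most 100 rows per batch. The function name and endpoint name are hypothetical placeholders for your own:

```
-- my_model_score and my-model-endpoint are hypothetical placeholders
CREATE FUNCTION my_model_score (value REAL)
  RETURNS REAL
  ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT
  ENDPOINT NAME 'my-model-endpoint'
  MAX_BATCH_SIZE 100;
```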

We recommend that you leave the `MANIFEST` setting at its default value of `OFF`. Although you can use the `MANIFEST ON` option, some SageMaker AI features can't directly use the CSV exported with this option. The manifest format exported with this option isn't compatible with the manifest format that SageMaker AI expects.

You create a separate stored function for each of your SageMaker AI models. This mapping of functions to models is required because an endpoint is associated with a specific model, and each model accepts different parameters. Using SQL types for the model inputs and the model output type helps to avoid type conversion errors when passing data back and forth between the AWS services. You can control who can apply the model. You can also control the runtime characteristics by specifying a parameter representing the maximum batch size.

Currently, all Aurora machine learning functions have the `NOT DETERMINISTIC` property. If you don't specify that property explicitly, Aurora sets `NOT DETERMINISTIC` automatically. This requirement is because the SageMaker AI model can be changed without any notification to the database. If that happens, calls to an Aurora machine learning function might return different results for the same input within a single transaction. 

You can't use the characteristics `CONTAINS SQL`, `NO SQL`, `READS SQL DATA`, or `MODIFIES SQL DATA` in your `CREATE FUNCTION` statement. 

The following example invokes a SageMaker AI endpoint to detect anomalies. Suppose that there is a SageMaker AI endpoint, `random-cut-forest-model-demo`, whose model is already trained by the `random-cut-forest` algorithm. For each input, the model returns an anomaly score. This example shows the data points whose score is greater than 3 standard deviations (approximately the 99.9th percentile) from the mean score.

```
CREATE FUNCTION anomaly_score(value real) returns real
  alias aws_sagemaker_invoke_endpoint endpoint name 'random-cut-forest-model-demo';

set @score_cutoff = (select avg(anomaly_score(value)) + 3 * std(anomaly_score(value)) from nyc_taxi);

select *, anomaly_score(value) score from nyc_taxi
  where anomaly_score(value) > @score_cutoff;
```

### Character set requirement for SageMaker AI functions that return strings
<a name="note-character-set-of-ml-functions-return-type"></a>

 We recommend specifying a character set of `utf8mb4` as the return type for your SageMaker AI functions that return string values. If that isn't practical, use a large enough string length for the return type to hold a value represented in the `utf8mb4` character set. The following example shows how to declare the `utf8mb4` character set for your function. 

```
CREATE FUNCTION my_ml_func(...) RETURNS VARCHAR(5) CHARSET utf8mb4 ALIAS ...
```

Currently, each SageMaker AI function that returns a string uses the character set `utf8mb4` for the return value. The return value uses this character set even if your SageMaker AI function declares a different character set for its return type implicitly or explicitly. If your SageMaker AI function declares a different character set for the return value, the returned data might be silently truncated if you store it in a table column that isn't long enough. For example, a query with a `DISTINCT` clause creates a temporary table. Thus, the SageMaker AI function result might be truncated due to the way strings are handled internally during a query. 

### Exporting data to Amazon S3 for SageMaker AI model training (Advanced)
<a name="exporting-data-to-s3-for-model-training"></a>

We recommend that you get started with Aurora machine learning and SageMaker AI by using some of the provided algorithms, and that the data scientists on your team provide you with the SageMaker AI endpoints that you can use with your SQL code. In the following, you can find minimal information about using your own Amazon S3 bucket with your own SageMaker AI models and your Aurora MySQL DB cluster.

Machine learning consists of two major steps: training and inference. To train SageMaker AI models, you export data to an Amazon S3 bucket. The Amazon S3 bucket is used by a SageMaker AI Jupyter notebook instance to train your model before it is deployed. You can use the `SELECT INTO OUTFILE S3` statement to query data from an Aurora MySQL DB cluster and save it directly into text files stored in an Amazon S3 bucket. Then the notebook instance consumes the data from the Amazon S3 bucket for training.

 Aurora machine learning extends the existing `SELECT INTO OUTFILE` syntax in Aurora MySQL to export data to CSV format. The generated CSV file can be directly consumed by models that need this format for training purposes. 

```
SELECT * INTO OUTFILE S3 's3_uri' [FORMAT {CSV|TEXT} [HEADER]] FROM table_name;
```

 The extension supports the standard CSV format. 
+  Format `TEXT` is the same as the existing MySQL export format. This is the default format. 
+  Format `CSV` is a newly introduced format that follows the specification in [RFC-4180](https://tools.ietf.org/html/rfc4180). 
+  If you specify the optional keyword `HEADER`, the output file contains one header line. The labels in the header line correspond to the column names from the `SELECT` statement. 
+  You can still use the keywords `CSV` and `HEADER` as identifiers. 
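For example, the following statement exports the `nyc_taxi` table in CSV format with a header line. The S3 URI shown is a hypothetical placeholder; substitute the URI for your own bucket and AWS Region:

```
-- The bucket name and prefix are placeholders
SELECT * INTO OUTFILE S3 's3-us-east-1://my-training-bucket/nyc_taxi/data'
    FORMAT CSV HEADER
    FROM nyc_taxi;
```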

 The extended syntax and grammar of `SELECT INTO` is now as follows: 

```
INTO OUTFILE S3 's3_uri'
[CHARACTER SET charset_name]
[FORMAT {CSV|TEXT} [HEADER]]
[{FIELDS | COLUMNS}
  [TERMINATED BY 'string']
  [[OPTIONALLY] ENCLOSED BY 'char']
  [ESCAPED BY 'char']
]
[LINES
  [STARTING BY 'string']
  [TERMINATED BY 'string']
]
```

## Performance considerations for using Aurora machine learning with Aurora MySQL
<a name="aurora-ml-performance"></a>

The Amazon Bedrock, Amazon Comprehend, and SageMaker AI services do most of the work when invoked by an Aurora machine learning function. That means that you can scale those resources as needed, independently. For your Aurora MySQL DB cluster, you can make your function calls as efficient as possible. Following, you can find some performance considerations to note when working with Aurora machine learning.

### Model and prompt
<a name="ml-model-prompt"></a>

Performance when using Amazon Bedrock is highly dependent on the model and prompt that you use. Choose a model and prompt that are optimal for your use case.

### Query cache
<a name="ml-query-cache"></a>

The Aurora MySQL query cache doesn't work for Aurora machine learning functions. Aurora MySQL doesn't store query results in the query cache for any SQL statements that call Aurora machine learning functions.

### Batch optimization for Aurora machine learning function calls
<a name="ml-batch-optimization"></a>

The main Aurora machine learning performance aspect that you can influence from your Aurora cluster is the batch mode setting for calls to the Aurora machine learning stored functions. Machine learning functions typically require substantial overhead, making it impractical to call an external service separately for each row. Aurora machine learning can minimize this overhead by combining the calls to the external Aurora machine learning service for many rows into a single batch. Aurora machine learning receives the responses for all the input rows, and delivers the responses, one row at a time, to the query as it runs. This optimization improves the throughput and latency of your Aurora queries without changing the results.

When you create an Aurora stored function that's connected to a SageMaker AI endpoint, you define the batch size parameter. This parameter influences how many rows are transferred for every underlying call to SageMaker AI. For queries that process large numbers of rows, the overhead to make a separate SageMaker AI call for each row can be substantial. The larger the data set processed by the stored function, the larger you can make the batch size. 

You can tell whether the batch mode optimization can be applied to a SageMaker AI function by checking the query plan produced by the `EXPLAIN` statement. In this case, the `Extra` column in the execution plan includes `Batched machine learning`. The following example shows a call to a SageMaker AI function that uses batch mode.

```
mysql> CREATE FUNCTION anomaly_score(val real) returns real alias aws_sagemaker_invoke_endpoint endpoint name 'my-rcf-model-20191126';
Query OK, 0 rows affected (0.01 sec)

mysql> explain select timestamp, value, anomaly_score(value) from nyc_taxi;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | nyc_taxi | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   48 |   100.00 | Batched machine learning |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
```

When you call one of the built-in Amazon Comprehend functions, you can control the batch size by specifying the optional `max_batch_size` parameter. This parameter restricts the maximum number of `input_text` values processed in each batch. By sending multiple items at once, it reduces the number of round trips between Aurora and Amazon Comprehend. Limiting the batch size is useful in situations such as a query with a `LIMIT` clause. By using a small value for `max_batch_size`, you can avoid invoking Amazon Comprehend more times than you have input texts.
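For example, the following query analyzes only five rows, so capping `max_batch_size` at 5 avoids sending more texts to Amazon Comprehend than the query needs. The `support` table is the sample table from the earlier sentiment examples:

```
SELECT aws_comprehend_detect_sentiment(feedback, 'en', 5) AS Sentiment
FROM support LIMIT 5;
```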

The batch optimization for evaluating Aurora machine learning functions applies in the following cases:
+ Function calls within the select list or the `WHERE` clause of `SELECT` statements
+ Function calls in the `VALUES` list of `INSERT` and `REPLACE` statements
+ SageMaker AI functions in `SET` values in `UPDATE` statements:

  ```
  INSERT INTO MY_TABLE (col1, col2, col3) VALUES
    (ML_FUNC(1), ML_FUNC(2), ML_FUNC(3)),
    (ML_FUNC(4), ML_FUNC(5), ML_FUNC(6));
  UPDATE MY_TABLE SET col1 = ML_FUNC(col2), col3 = ML_FUNC(col4) WHERE ...;
  ```

## Monitoring Aurora machine learning
<a name="aurora-ml-monitoring"></a>

You can monitor Aurora machine learning batch operations by querying several global variables, as shown in the following example.

```
show status like 'Aurora_ml%';
```

You can reset the status variables by using a `FLUSH STATUS` statement. Thus, all of the figures represent totals, averages, and so on, since the last time the variables were reset.
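For example, you might reset the counters, run a query that calls an Aurora machine learning function, and then compare the logical and actual request counts to see how effectively batching worked. This sketch reuses the `support` table from the Amazon Comprehend examples:

```
FLUSH STATUS;
SELECT aws_comprehend_detect_sentiment(feedback, 'en') FROM support;
SHOW STATUS LIKE 'Aurora_ml%request_cnt';
```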

`Aurora_ml_logical_request_cnt`  
The number of logical requests that the DB instance has evaluated to be sent to the Aurora machine learning services since the last status reset. Depending on whether batching has been used, this value can be higher than `Aurora_ml_actual_request_cnt`.

`Aurora_ml_logical_response_cnt`  
The aggregate response count that Aurora MySQL receives from the Aurora machine learning services across all queries run by users of the DB instance.

`Aurora_ml_actual_request_cnt`  
The aggregate request count that Aurora MySQL makes to the Aurora machine learning services across all queries run by users of the DB instance.

`Aurora_ml_actual_response_cnt`  
The aggregate response count that Aurora MySQL receives from the Aurora machine learning services across all queries run by users of the DB instance.

`Aurora_ml_cache_hit_cnt`  
The aggregate internal cache hit count that Aurora MySQL receives from the Aurora machine learning services across all queries run by users of the DB instance.

`Aurora_ml_retry_request_cnt`  
The number of retried requests that the DB instance has sent to the Aurora machine learning services since the last status reset.

`Aurora_ml_single_request_cnt`  
The aggregate count of Aurora machine learning functions that are evaluated by non-batch mode across all queries run by users of the DB instance. 

For information about monitoring the performance of the SageMaker AI operations called from Aurora machine learning functions, see [Monitor Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-overview.html). 

# Using Amazon Aurora machine learning with Aurora PostgreSQL
<a name="postgresql-ml"></a>

By using Amazon Aurora machine learning with your Aurora PostgreSQL DB cluster, you can use Amazon Bedrock, Amazon Comprehend, or SageMaker AI, depending on your needs. These services each support specific machine learning use cases.

Aurora machine learning is supported in certain AWS Regions and for specific versions of Aurora PostgreSQL only. Before trying to set up Aurora machine learning, check availability for your Aurora PostgreSQL version and your Region. For details, see [Aurora machine learning with Aurora PostgreSQL](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Aurora_ML.md#Concepts.Aurora_Fea_Regions_DB-eng.Feature.Aurora_ML.apg). 

**Topics**
+ [Requirements for using Aurora machine learning with Aurora PostgreSQL](#postgresql-ml-prereqs)
+ [Supported features and limitations of Aurora machine learning with Aurora PostgreSQL](#postgresql-ml-supported-fea-limitations)
+ [Setting up your Aurora PostgreSQL DB cluster to use Aurora machine learning](#postgresql-ml-setting-up)
+ [Using Amazon Bedrock with your Aurora PostgreSQL DB cluster](#postgresql-using-bedrock)
+ [Using Amazon Comprehend with your Aurora PostgreSQL DB cluster](#postgresql-using-comprehend)
+ [Using SageMaker AI with your Aurora PostgreSQL DB cluster](#postgresql-using-sagemaker)
+ [Exporting data to Amazon S3 for SageMaker AI model training (Advanced)](#postgresql-export-to-s3)
+ [Performance considerations for using Aurora machine learning with Aurora PostgreSQL](#postgresql-ml-best-practice)
+ [Monitoring Aurora machine learning](#postgresql-ml-monitoring)

## Requirements for using Aurora machine learning with Aurora PostgreSQL
<a name="postgresql-ml-prereqs"></a>

AWS machine learning services are managed services that are set up and run in their own production environments. Aurora machine learning supports integration with Amazon Comprehend, SageMaker AI, and Amazon Bedrock. Before trying to set up your Aurora PostgreSQL DB cluster to use Aurora machine learning, be sure you understand the following requirements and prerequisites.
+ The Amazon Comprehend, SageMaker AI, and Amazon Bedrock services must be running in the same AWS Region as your Aurora PostgreSQL DB cluster. You can't use Amazon Comprehend, SageMaker AI, or Amazon Bedrock services from an Aurora PostgreSQL DB cluster in a different Region.
+ If your Aurora PostgreSQL DB cluster is in a different virtual private cloud (VPC) based on the Amazon VPC service than your Amazon Comprehend and SageMaker AI services, the VPC's security group needs to allow outbound connections to the target Aurora machine learning service. For more information, see [Enabling network communication from Amazon Aurora to other AWS services](AuroraMySQL.Integrating.Authorizing.Network.md).
+ For SageMaker AI, the machine learning components that you want to use for inferences must be set up and ready to use. During the configuration process for your Aurora PostgreSQL DB cluster, you need to have the Amazon Resource Name (ARN) of the SageMaker AI endpoint available. The data scientists on your team are likely best able to handle working with SageMaker AI to prepare the models and handle other such tasks. To get started with Amazon SageMaker AI, see [Get Started with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html). For more information about inferences and endpoints, see [Real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html).
+ For Amazon Bedrock, you need to have the model ID of the Bedrock models that you want to use for inferences available during the configuration process of your Aurora PostgreSQL DB cluster. The data scientists on your team are likely best able to work with Bedrock to decide which models to use, fine-tune them if needed, and handle other such tasks. To get started with Amazon Bedrock, see [How to setup Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html).
+ Amazon Bedrock users need to request access to models before the models are available for use. To add models for text, chat, and image generation, you need to request access to the models in Amazon Bedrock. For more information, see [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html).

## Supported features and limitations of Aurora machine learning with Aurora PostgreSQL
<a name="postgresql-ml-supported-fea-limitations"></a>

Aurora machine learning supports any SageMaker AI endpoint that can read and write the comma-separated value (CSV) format through a `ContentType` value of `text/csv`. The built-in SageMaker AI algorithms that currently accept this format are the following.
+ Linear Learner
+ Random Cut Forest
+ XGBoost

  To learn more about these algorithms, see [Choose an Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algorithms-choose.html) in the *Amazon SageMaker AI Developer Guide*.

When using Amazon Bedrock with Aurora machine learning, the following limitations apply: 
+ The user-defined functions (UDFs) provide a native way to interact with Amazon Bedrock. The UDFs don't have specific request or response requirements, so they can use any model.
+ You can use UDFs to build any workflow desired. For example, you can combine basic primitives such as `pg_cron` to run a query, fetch data, generate inferences, and write to tables to serve queries directly.
+ UDFs don't support batched or parallel calls.
+ The Aurora machine learning extension doesn't support vector interfaces. As part of the extension, a function is available to output the embeddings of the model's response in the `float8[]` format to store those embeddings in Aurora. For more information on the usage of `float8[]`, see [Using Amazon Bedrock with your Aurora PostgreSQL DB cluster](#postgresql-using-bedrock).
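The following sketch shows one way to obtain embeddings as `float8[]`, assuming the `aws_bedrock.invoke_model_get_embeddings` function provided by the extension and the Amazon Titan embeddings model. Verify the function signature and model ID available in your extension version before use:

```
-- Sketch only: the model ID and json_key value are assumptions
-- based on the Amazon Titan embeddings model response format
SELECT aws_bedrock.invoke_model_get_embeddings(
    model_id     := 'amazon.titan-embed-text-v1',
    content_type := 'application/json',
    json_key     := 'embedding',
    model_input  := '{"inputText": "Hello world"}');
```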

## Setting up your Aurora PostgreSQL DB cluster to use Aurora machine learning
<a name="postgresql-ml-setting-up"></a>

For Aurora machine learning to work with your Aurora PostgreSQL DB cluster, you need to create an AWS Identity and Access Management (IAM) role for each of the services that you want to use. The IAM role allows your Aurora PostgreSQL DB cluster to use the Aurora machine learning service on the cluster's behalf. You also need to install the Aurora machine learning extension. In the following topics, you can find setup procedures for each of these Aurora machine learning services. 

**Contents**
+ [Setting up Aurora PostgreSQL to use Amazon Bedrock](#postgresql-ml-setting-up-apg-br)
+ [Setting up Aurora PostgreSQL to use Amazon Comprehend](#postgresql-ml-setting-up-apg-cmp)
+ [Setting up Aurora PostgreSQL to use Amazon SageMaker AI](#postgresql-ml-setting-up-apg-sagemaker)
  + [Setting up Aurora PostgreSQL to use Amazon S3 for SageMaker AI (Advanced)](#postgresql-ml-setting-up-apg-sagemaker-advanced-option-s3)
+ [Installing the Aurora machine learning extension](#postgresql-ml-aws_ml-install)

### Setting up Aurora PostgreSQL to use Amazon Bedrock
<a name="postgresql-ml-setting-up-apg-br"></a>

In the procedure following, you first create the IAM role and policy that gives your Aurora PostgreSQL permission to use Amazon Bedrock on the cluster's behalf. You then attach the policy to an IAM role that your Aurora PostgreSQL DB cluster uses to work with Amazon Bedrock. For simplicity's sake, this procedure uses the AWS Management Console to complete all tasks.

**To set up your Aurora PostgreSQL DB cluster to use Amazon Bedrock**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Policies** (under Access management) on the AWS Identity and Access Management (IAM) Console menu. 

   1. Choose **Create policy**. In the Visual editor page, choose **Service** and then enter **Bedrock** in the Select a service field. Expand the Read access level. Choose **InvokeModel** from the Amazon Bedrock read settings.

   1. Choose the Foundation/Provisioned model to which you want to grant read access via the policy.  
![\[Creating the IAM policy to use for Amazon Bedrock.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-bedrock-policy-1.png)

1. Choose **Next: Tags** and define any tags (this is optional). Choose **Next: Review**. Enter a name and description for the policy, as shown in the image.   
![\[Creating the policy for the IAM role to use for Amazon Bedrock.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-bedrock-policy-2.png)

1. Choose **Create policy**. The Console displays an alert when the policy has been saved. You can find it in the list of Policies.

1. Choose **Roles** (under Access management) on the IAM Console. 

1. Choose **Create role**. 

1. On the Select trusted entity page, choose the **AWS service** tile, and then choose **RDS** to open the selector. 

1. Choose **RDS – Add Role to Database**.   
![\[Creating a role for Aurora PostgreSQL to work with Amazon Bedrock.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-bedrock-policy-3.png)

1. Choose **Next**. On the Add permissions page, find the policy that you created in the previous step and choose it from among those listed. Choose **Next**.

1. Choose **Next: Review**. Enter a name for the IAM role and a description, and then choose **Create role**.

1. Open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Navigate to the AWS Region where your Aurora PostgreSQL DB cluster is located.

1. In the navigation pane, choose **Databases**, and then choose the Aurora PostgreSQL DB cluster that you want to use with Bedrock.

1. Choose the **Connectivity & security** tab and scroll to find the **Manage IAM roles** section of the page. From the **Add IAM roles to this cluster** selector, choose the role that you created in the previous steps. In the **Feature** selector, choose Bedrock, and then choose **Add role**. 

The role (with its policy) is now associated with the Aurora PostgreSQL DB cluster. When the process completes, the role is listed in the **Current IAM roles for this cluster** listing, as shown following.

![\[The role for Amazon Bedrock has been added to the Aurora PostgreSQL DB cluster and is now Active.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-bedrock-policy-4.png)


The IAM setup for Amazon Bedrock is complete. Continue setting up your Aurora PostgreSQL DB cluster to work with Aurora machine learning by installing the extension as detailed in [Installing the Aurora machine learning extension](#postgresql-ml-aws_ml-install).
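The policy that these steps create grants the `bedrock:InvokeModel` action on the model that you selected. As a rough sketch, the resulting policy document might look like the following (the model ARN shown is a placeholder, not a value from this procedure):

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-v2"
        }
    ]
}
```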

### Setting up Aurora PostgreSQL to use Amazon Comprehend
<a name="postgresql-ml-setting-up-apg-cmp"></a>

In the following procedure, you first create the IAM policy that gives your Aurora PostgreSQL DB cluster permission to use Amazon Comprehend on the cluster's behalf. You then attach the policy to an IAM role that your Aurora PostgreSQL DB cluster uses to work with Amazon Comprehend. For simplicity's sake, this procedure uses the AWS Management Console to complete all tasks.

**To set up your Aurora PostgreSQL DB cluster to use Amazon Comprehend**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Policies** (under Access management) on the AWS Identity and Access Management (IAM) Console menu.   
![\[Creating the IAM policy to use for Amazon Comprehend.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-iam-role-policy-1.png)

1. Choose **Create policy**. On the **Visual editor** page, choose **Service** and then enter **Comprehend** in the **Select a service** field. Expand the **Read** access level. Choose **BatchDetectSentiment** and **DetectSentiment** from the Amazon Comprehend read settings.

1. Choose **Next: Tags** and define any tags (optional). Choose **Next: Review**. Enter a name and description for the policy, as shown in the image.   
![\[Creating the policy for the IAM role to use for Amazon Comprehend.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-iam-policy-3.png)

1. Choose **Create policy**. The Console displays an alert when the policy has been saved. You can find it in the list of Policies.

1. Choose **Roles** (under Access management) on the IAM Console. 

1. Choose **Create role**. 

1. On the Select trusted entity page, choose the **AWS service** tile, and then choose **RDS** to open the selector. 

1. Choose **RDS – Add Role to Database**.   
![\[Creating the role for Aurora PostgreSQL to work with Amazon Comprehend.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-iam-role-policy-4.png)

1. Choose **Next**. On the Add permissions page, find the policy that you created in the previous step and choose it from among those listed. Choose **Next**.

1. Choose **Next: Review**. Enter a name for the IAM role and a description, and then choose **Create role**.

1. Open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Navigate to the AWS Region where your Aurora PostgreSQL DB cluster is located.

1. In the navigation pane, choose **Databases**, and then choose the Aurora PostgreSQL DB cluster that you want to use with Amazon Comprehend.

1. Choose the **Connectivity & security** tab and scroll to find the **Manage IAM roles** section of the page. From the **Add IAM roles to this cluster** selector, choose the role that you created in the previous steps. In the **Feature** selector, choose Comprehend, and then choose **Add role**. 

The role (with its policy) is now associated with the Aurora PostgreSQL DB cluster. When the process completes, the role is listed in the **Current IAM roles for this cluster** listing, as shown following.

![\[The role for Amazon Comprehend has been added to the Aurora PostgreSQL DB cluster and is now Active.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-completed-iam-setup-both.png)


The IAM setup for Amazon Comprehend is complete. Continue setting up your Aurora PostgreSQL DB cluster to work with Aurora machine learning by installing the extension as detailed in [Installing the Aurora machine learning extension](#postgresql-ml-aws_ml-install).

### Setting up Aurora PostgreSQL to use Amazon SageMaker AI
<a name="postgresql-ml-setting-up-apg-sagemaker"></a>

Before you can create the IAM policy and role for your Aurora PostgreSQL DB cluster, you need to have set up your SageMaker AI model and have your endpoint available.

**To set up your Aurora PostgreSQL DB cluster to use SageMaker AI**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Policies** (under Access management) on the AWS Identity and Access Management (IAM) Console menu, and then choose **Create policy**. In the Visual editor, choose **SageMaker** for the Service. For Actions, open the **Read** selector (under Access level) and choose **InvokeEndpoint**. When you do this, a warning icon is displayed. 

1. Open the Resources selector and choose the **Add ARN to restrict access** link under the Specify endpoint resource ARN for the InvokeEndpoint action. 

1. Enter the AWS Region of your SageMaker AI resources and the name of your endpoint. Your AWS account is prefilled.   
![\[Specifying the ARN for the endpoint for the IAM Policy.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/apg-ml-create-policy-sm-endpoint.png)

1. Choose **Add** to save. Choose **Next: Tags** and **Next: Review** to get to the last page of the policy creation process. 

1. Enter a Name and Description for this policy, and then choose **Create policy**. The policy is created and is added to the Policies list. You see an alert in the Console as this occurs.

1. On the IAM Console, choose **Roles**.

1. Choose **Create role**. 

1. On the Select trusted entity page, choose the **AWS service** tile, and then choose **RDS** to open the selector. 

1. Choose **RDS – Add Role to Database**. 

1. Choose **Next**. On the Add permissions page, find the policy that you created in the previous step and choose it from among those listed. Choose **Next**.

1. Choose **Next: Review**. Enter a name for the IAM role and a description, and then choose **Create role**.

1. Open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. Navigate to the AWS Region where your Aurora PostgreSQL DB cluster is located.

1. In the navigation pane, choose **Databases**, and then choose the Aurora PostgreSQL DB cluster that you want to use with SageMaker AI.

1. Choose the **Connectivity & security** tab and scroll to find the **Manage IAM roles** section of the page. From the **Add IAM roles to this cluster** selector, choose the role that you created in the previous steps. In the **Feature** selector, choose SageMaker AI, and then choose **Add role**. 

The role (with its policy) is now associated with the Aurora PostgreSQL DB cluster. When the process completes, the role is listed in the **Current IAM roles for this cluster** listing.

The IAM setup for SageMaker AI is complete. Continue setting up your Aurora PostgreSQL to work with Aurora machine learning by installing the extension as detailed in [Installing the Aurora machine learning extension](#postgresql-ml-aws_ml-install).

#### Setting up Aurora PostgreSQL to use Amazon S3 for SageMaker AI (Advanced)
<a name="postgresql-ml-setting-up-apg-sagemaker-advanced-option-s3"></a>

To use SageMaker AI with your own models rather than using the pre-built components provided by SageMaker AI, you need to set up an Amazon Simple Storage Service (Amazon S3) bucket for your Aurora PostgreSQL DB cluster to use. This is an advanced topic, and isn't fully documented in this *Amazon Aurora User Guide*. The general process is the same as for integrating support for SageMaker AI, as follows. 

1. Create the IAM policy and role for Amazon S3. 

1. Add the IAM role and the Amazon S3 import or export as a feature on the Connectivity & security tab of your Aurora PostgreSQL DB cluster.

1. Add the ARN of the role to your custom DB cluster parameter group for your Aurora DB cluster.

For basic usage information, see [Exporting data to Amazon S3 for SageMaker AI model training (Advanced)](#postgresql-export-to-s3). 

### Installing the Aurora machine learning extension
<a name="postgresql-ml-aws_ml-install"></a>

The Aurora machine learning extension `aws_ml` provides functions that you can use to invoke AWS services from your database. Version 1.0 of the extension provides two functions for invoking Amazon Comprehend and SageMaker AI services, and version 2.0 provides two additional functions for invoking Amazon Bedrock services. Installing this extension on your Aurora PostgreSQL DB cluster also creates an administrative role for the feature. 

**Note**  
Using these functions depends on having the IAM setup for the Aurora machine learning service (Amazon Comprehend, SageMaker AI, Amazon Bedrock) complete, as detailed in [Setting up your Aurora PostgreSQL DB cluster to use Aurora machine learning](#postgresql-ml-setting-up). 
+ **aws\_comprehend.detect\_sentiment** – You use this function to apply sentiment analysis to text stored in the database on your Aurora PostgreSQL DB cluster.
+ **aws\_sagemaker.invoke\_endpoint** – You use this function in your SQL code to communicate with the SageMaker AI endpoint from your cluster. 
+ **aws\_bedrock.invoke\_model** – You use this function in your SQL code to communicate with Bedrock models from your cluster. This function returns its response as `TEXT`; if a model responds with a JSON body, the output is relayed to the caller as a string.
+ **aws\_bedrock.invoke\_model\_get\_embeddings** – You use this function in your SQL code to invoke Bedrock models that return output embeddings within a JSON response. Use it when you want to extract the embeddings associated with a specific JSON key directly, to streamline the response for your own workflows. 

**To install the Aurora machine learning extension in your Aurora PostgreSQL DB cluster**
+ Use `psql` to connect to the writer instance of your Aurora PostgreSQL DB cluster. Connect to the specific database in which to install the `aws_ml` extension. 

  ```
  psql --host=cluster-instance-1.111122223333.aws-region.rds.amazonaws.com --port=5432 --username=postgres --password --dbname=labdb
  ```

  ```
  labdb=> CREATE EXTENSION IF NOT EXISTS aws_ml CASCADE;
  NOTICE:  installing required extension "aws_commons"
  CREATE EXTENSION
  labdb=>
  ```
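After installation, you can optionally confirm which versions of the extensions are present by querying the PostgreSQL system catalog:

```
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_ml', 'aws_commons');
```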

Installing the `aws_ml` extension also creates the `aws_ml` administrative role and three new schemas, as follows. 
+ `aws_comprehend` – Schema for the Amazon Comprehend service and source of the `detect_sentiment` function (`aws_comprehend.detect_sentiment`).
+ `aws_sagemaker` – Schema for the SageMaker AI service and source of the `invoke_endpoint` function (`aws_sagemaker.invoke_endpoint`).
+ `aws_bedrock` – Schema for the Amazon Bedrock service and source of the `invoke_model` (`aws_bedrock.invoke_model`) and `invoke_model_get_embeddings` (`aws_bedrock.invoke_model_get_embeddings`) functions.

The `rds_superuser` role is granted the `aws_ml` administrative role and is made the `OWNER` of these three Aurora machine learning schemas. To allow other database users to access the Aurora machine learning functions, the `rds_superuser` needs to grant `EXECUTE` privileges on those functions. By default, `EXECUTE` privileges are revoked from `PUBLIC` on the functions in the Aurora machine learning schemas. 

In a multi-tenant database configuration, you can prevent tenants from accessing Aurora machine learning functions by using `REVOKE USAGE` on the specific Aurora machine learning schema that you want to protect.
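For example, the following sketch (using hypothetical `app_user` and `tenant1` roles) grants a database user access to the Amazon Comprehend function while revoking a tenant's access to the Amazon Bedrock schema:

```
-- Run as rds_superuser: allow app_user to call detect_sentiment
GRANT USAGE ON SCHEMA aws_comprehend TO app_user;
GRANT EXECUTE ON FUNCTION aws_comprehend.detect_sentiment(varchar, varchar, int) TO app_user;

-- Keep a tenant role away from the Bedrock functions entirely
REVOKE USAGE ON SCHEMA aws_bedrock FROM tenant1;
```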

## Using Amazon Bedrock with your Aurora PostgreSQL DB cluster
<a name="postgresql-using-bedrock"></a>

For Aurora PostgreSQL, Aurora machine learning provides the following Amazon Bedrock function for working with your text data. This function is available only after you install the `aws_ml` 2.0 extension and complete all setup procedures. For more information, see [Setting up your Aurora PostgreSQL DB cluster to use Aurora machine learning](#postgresql-ml-setting-up). 

**aws\_bedrock.invoke\_model**  
This function takes text formatted in JSON as input and processes it for a variety of models hosted on Amazon Bedrock, then gets back the JSON text response from the model. The response can contain text, image data, or embeddings. A summary of the function's documentation is as follows.  

```
aws_bedrock.invoke_model(
   IN model_id      varchar,
   IN content_type  text,
   IN accept_type   text,
   IN model_input   text,
   OUT model_output varchar)
```

The inputs and outputs of this function are as follows.
+ `model_id` – Identifier of the model. 
+ `content_type` – The type of the request to the Bedrock model.
+ `accept_type` – The type of the response to expect from the Bedrock model. This is usually `application/json` for most models.
+ `model_input` – Prompts; a specific set of inputs to the model, in the format specified by `content_type`. For more information on the request format/structure that the model accepts, see [Inference parameters for foundation models](https://docs.aws.amazon.com//bedrock/latest/userguide/model-parameters.html).
+ `model_output` – The Bedrock model's output as text.

The following example shows how to invoke an Anthropic Claude 2 model on Amazon Bedrock by using `invoke_model`. 

**Example: A simple query using Amazon Bedrock functions**  

```
SELECT aws_bedrock.invoke_model (
    model_id    := 'anthropic.claude-v2',
    content_type:= 'application/json',
    accept_type := 'application/json',
    model_input := '{"prompt": "\n\nHuman: You are a helpful assistant that answers questions directly and only using the information provided in the context below.\nDescribe the answer 
    in detail.\n\nContext: %s \n\nQuestion: %s \n\nAssistant:","max_tokens_to_sample":4096,"temperature":0.5,"top_k":250,"top_p":0.5,"stop_sequences":[]}'
);
```

**aws\_bedrock.invoke\_model\_get\_embeddings**  
In some cases the model output consists of vector embeddings. Because the response varies per model, you can use the `invoke_model_get_embeddings` function, which works exactly like `invoke_model` but outputs the embeddings from the JSON key that you specify.  

```
aws_bedrock.invoke_model_get_embeddings(
   IN model_id      varchar,
   IN content_type  text,
   IN json_key      text,
   IN model_input   text,
   OUT model_output float8[])
```

The inputs and outputs of this function are as follows.
+ `model_id` – Identifier of the model.
+ `content_type` – The type of the request to the Bedrock model. Here, `accept_type` is set to the default value `application/json`.
+ `model_input` – Prompts; a specific set of inputs to the model, in the format specified by `content_type`. For more information on the request format/structure that the model accepts, see [Inference parameters for foundation models](https://docs.aws.amazon.com//bedrock/latest/userguide/model-parameters.html).
+ `json_key` – Reference to the field to extract the embedding from. This may vary if the embedding model changes.
+ `model_output` – The Bedrock model's output as an array of double-precision (`float8`) embeddings.

The following example shows how to generate an embedding for the phrase "PostgreSQL I/O monitoring views" by using the Titan Embeddings G1 – Text embedding model. 

**Example: A simple query using Amazon Bedrock functions**  

```
SELECT aws_bedrock.invoke_model_get_embeddings(
   model_id      := 'amazon.titan-embed-text-v1',
   content_type  := 'application/json',
   json_key      := 'embedding',
   model_input   := '{ "inputText": "PostgreSQL I/O monitoring views"}') AS embedding;
```
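If you plan to reuse the embeddings, you can materialize them into a `float8[]` column rather than invoking the model repeatedly. The following sketch assumes a hypothetical `documents` table with `content` (text) and `embedding` (`float8[]`) columns:

```
UPDATE documents
SET embedding = aws_bedrock.invoke_model_get_embeddings(
       model_id      := 'amazon.titan-embed-text-v1',
       content_type  := 'application/json',
       json_key      := 'embedding',
       model_input   := json_build_object('inputText', content)::text)
WHERE embedding IS NULL;
```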

## Using Amazon Comprehend with your Aurora PostgreSQL DB cluster
<a name="postgresql-using-comprehend"></a>

For Aurora PostgreSQL, Aurora machine learning provides the following Amazon Comprehend function for working with your text data. This function is available only after you install the `aws_ml` extension and complete all setup procedures. For more information, see [Setting up your Aurora PostgreSQL DB cluster to use Aurora machine learning](#postgresql-ml-setting-up). 

**aws\_comprehend.detect\_sentiment**  
This function takes text as input and evaluates whether the text has a positive, negative, neutral, or mixed emotional posture. It outputs this sentiment along with a confidence level for its evaluation. A summary of the function's documentation is as follows.  

```
aws_comprehend.detect_sentiment(
   IN input_text varchar,      
   IN language_code varchar,  
   IN max_rows_per_batch int, 
   OUT sentiment varchar, 
   OUT confidence real)
```

The inputs and outputs of this function are as follows.
+ `input_text` – The text to evaluate and to assign sentiment (negative, positive, neutral, mixed). 
+ `language_code` – The language of the `input_text` identified using the 2-letter ISO 639-1 identifier with regional subtag (as needed) or the ISO 639-2 three-letter code, as appropriate. For example, `en` is the code for English, `zh` is the code for simplified Chinese. For more information, see [Supported languages](https://docs.aws.amazon.com/comprehend/latest/dg/supported-languages.html#supported-languages-1) in the *Amazon Comprehend Developer Guide*.
+ `max_rows_per_batch` – The maximum number of rows per batch for batch-mode processing. For more information, see [Understanding batch mode and Aurora machine learning functions](#postgresql-ml-batch-mode).
+ `sentiment` – The sentiment of the input text, identified as POSITIVE, NEGATIVE, NEUTRAL, or MIXED. 
+ `confidence` – The degree of confidence in the accuracy of the specified `sentiment`. Values range from 0.0 to 1.0. 

In the following, you can find examples of how to use this function. 

**Example: A simple query using Amazon Comprehend functions**  
Here's an example of a simple query that invokes this function to assess customer satisfaction with your support team. Suppose you have a database table (`support`) that stores customer feedback after each request for help. This example query applies the `aws_comprehend.detect_sentiment` function to the text in the `feedback` column of the table and outputs the sentiment and the confidence level for that sentiment. This query also outputs results in descending order.   

```
SELECT feedback, s.sentiment,s.confidence 
    FROM support,aws_comprehend.detect_sentiment(feedback, 'en') s 
    ORDER BY s.confidence DESC;
 feedback                                                 | sentiment | confidence
 ----------------------------------------------------------+-----------+------------
 Thank you for the excellent customer support!            | POSITIVE  |   0.999771
 The latest version of this product stinks!               | NEGATIVE  |   0.999184
 Your support team is just awesome! I am blown away.      | POSITIVE  |   0.997774
 Your product is too complex, but your support is great.  | MIXED     |   0.957958
 Your support tech helped me in fifteen minutes.          | POSITIVE  |   0.949491
 My problem was never resolved!                           | NEGATIVE  |   0.920644
 When will the new version of this product be released?   | NEUTRAL   |   0.902706
 I cannot stand that chatbot.                             | NEGATIVE  |   0.895219
 Your support tech talked down to me.                     | NEGATIVE  |   0.868598
 It took me way too long to get a real person.            | NEGATIVE  |   0.481805
 
 (10 rows)
```

To avoid being charged for sentiment detection more than once per table row, you can materialize the results for just the rows of interest. For example, the following update applies the sentiment detection function only to clinician's notes that are written in French (`fr`) and that don't yet have a sentiment assigned.

```
UPDATE clinician_notes
SET sentiment = (aws_comprehend.detect_sentiment (french_notes, 'fr')).sentiment,
    confidence = (aws_comprehend.detect_sentiment (french_notes, 'fr')).confidence
WHERE
    clinician_notes.french_notes IS NOT NULL AND
    LENGTH(TRIM(clinician_notes.french_notes)) > 0 AND
    clinician_notes.sentiment IS NULL;
```

For more information on optimizing your function calls, see [Performance considerations for using Aurora machine learning with Aurora PostgreSQL](#postgresql-ml-best-practice). 

## Using SageMaker AI with your Aurora PostgreSQL DB cluster
<a name="postgresql-using-sagemaker"></a>

After setting up your SageMaker AI environment and integrating with Aurora PostgreSQL as outlined in [Setting up Aurora PostgreSQL to use Amazon SageMaker AI](#postgresql-ml-setting-up-apg-sagemaker), you can invoke operations by using the `aws_sagemaker.invoke_endpoint` function. The `aws_sagemaker.invoke_endpoint` function connects only to a model endpoint in the same AWS Region. If your database instance has replicas in multiple AWS Regions, be sure that you set up and deploy each SageMaker AI model in every one of those AWS Regions. 

Calls to `aws_sagemaker.invoke_endpoint` are authenticated using the IAM role that you set up to associate your Aurora PostgreSQL DB cluster with the SageMaker AI service and the endpoint that you provided during the setup process. SageMaker AI model endpoints are scoped to an individual account and are not public. The `endpoint_name` URL doesn't contain the account ID. SageMaker AI determines the account ID from the authentication token that is supplied by the SageMaker AI IAM role of the database instance. 

**aws\_sagemaker.invoke\_endpoint**  
This function takes the SageMaker AI endpoint name as input, along with the number of rows that should be processed as a batch. It also takes as input the various parameters expected by the SageMaker AI model endpoint. This function's reference documentation is as follows.  

```
aws_sagemaker.invoke_endpoint(
  IN endpoint_name varchar,   
  IN max_rows_per_batch int,
  VARIADIC model_input "any",
  OUT model_output varchar  
  )
```

The inputs and outputs of this function are as follows.
+ `endpoint_name` – An endpoint URL that is AWS Region–independent. 
+ `max_rows_per_batch` – The maximum number of rows per batch for batch-mode processing. For more information, see [Understanding batch mode and Aurora machine learning functions](#postgresql-ml-batch-mode). 
+ `model_input` – One or more input parameters for the model. These can be any data type needed by the SageMaker AI model. PostgreSQL allows you to specify up to 100 input parameters for a function. Array data types must be one-dimensional, but can contain as many elements as are expected by the SageMaker AI model. The number of inputs to a SageMaker AI model is limited only by the SageMaker AI 6 MB message size limit.
+ `model_output` – The SageMaker AI model's output as text.

### Creating a user-defined function to invoke a SageMaker AI model
<a name="postgresql-using-sagemaker-example1"></a>

Create a separate user-defined function to call `aws_sagemaker.invoke_endpoint` for each of your SageMaker AI models. Your user-defined function represents the SageMaker AI endpoint hosting the model. The `aws_sagemaker.invoke_endpoint` function runs within the user-defined function. User-defined functions provide many advantages:
+ You can give your SageMaker AI model its own name instead of only calling `aws_sagemaker.invoke_endpoint` for all of your SageMaker AI models. 
+ You can specify the model endpoint URL in just one place in your SQL application code.
+ You can control `EXECUTE` privileges to each Aurora machine learning function independently.
+ You can declare the model input and output types using SQL types. SQL enforces the number and type of arguments passed to your SageMaker AI model and performs type conversion if necessary. Using SQL types will also translate `SQL NULL` to the appropriate default value expected by your SageMaker AI model.
+ You can reduce the maximum batch size if you want to return the first few rows a little faster.

To specify a user-defined function, use the SQL data definition language (DDL) statement `CREATE FUNCTION`. When you define the function, you specify the following:
+ The input parameters to the model.
+ The specific SageMaker AI endpoint to invoke.
+ The return type. 

The user-defined function returns the inference computed by the SageMaker AI endpoint after running the model on the input parameters. The following example creates a user-defined function for a SageMaker AI model with two input parameters.

```
CREATE FUNCTION classify_event (IN arg1 INT, IN arg2 DATE, OUT category INT)
AS $$
    SELECT aws_sagemaker.invoke_endpoint (
        'sagemaker_model_endpoint_name', NULL,
        arg1, arg2                        -- model inputs are separate arguments
        )::INT                            -- cast the output to INT
$$ LANGUAGE SQL PARALLEL SAFE COST 5000;
```

Note the following:
+ The `aws_sagemaker.invoke_endpoint` function input can be one or more parameters of any data type. 
+ This example uses an INT output type. If you cast the output from a `varchar` type to a different type, then it must be cast to a PostgreSQL builtin scalar type such as `INTEGER`, `REAL`, `FLOAT`, or `NUMERIC`. For more information about these types, see [Data types](https://www.postgresql.org/docs/current/datatype.html) in the PostgreSQL documentation.
+ Specify `PARALLEL SAFE` to enable parallel query processing. For more information, see [Improving response times with parallel query processing](#postgresql-using-sagemaker-example-parallel).
+ Specify `COST 5000` to estimate the cost of running the function. Use a positive number giving the estimated run cost for the function, in units of `cpu_operator_cost`.
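After the function is created, you can call it like any other SQL function. For example, against a hypothetical `events` table whose columns match the declared parameter types:

```
SELECT event_id, classify_event(event_type, event_day) AS category
FROM events;
```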

### Passing an array as input to a SageMaker AI model
<a name="postgresql-using-sagemaker-example-array"></a>

The `aws_sagemaker.invoke_endpoint` function can have up to 100 input parameters, which is the limit for PostgreSQL functions. If the SageMaker AI model requires more than 100 parameters of the same type, pass the model parameters as an array. 

The following example defines a function that passes an array as input to the SageMaker AI regression model. The output is cast to a `REAL` value.

```
CREATE FUNCTION regression_model (params REAL[], OUT estimate REAL)
AS $$
    SELECT aws_sagemaker.invoke_endpoint (
      'sagemaker_model_endpoint_name',
      NULL,
      params                            
      )::REAL
$$ LANGUAGE SQL PARALLEL SAFE COST 5000;
```
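You can then invoke the function with an array literal, or with an array built from table columns. For example (the values shown are placeholders, not real model inputs):

```
SELECT regression_model(ARRAY[2.5, 3.1, 7.8, 0.4]::REAL[]) AS estimate;
```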

### Specifying batch size when invoking a SageMaker AI model
<a name="postgresql-using-sagemaker-example3"></a>

The following example creates a user-defined function for a SageMaker AI model that sets the batch size default to NULL. The function also allows you to provide a different batch size when you invoke it.

```
CREATE FUNCTION classify_event (
    IN event_type INT, IN event_day DATE, IN amount REAL, -- model inputs
    max_rows_per_batch INT DEFAULT NULL,  -- optional batch size limit
    OUT category INT)                     -- model output
AS $$
    SELECT aws_sagemaker.invoke_endpoint (
        'sagemaker_model_endpoint_name', max_rows_per_batch,
        event_type, event_day, COALESCE(amount, 0.0)
        )::INT              -- casts output to type INT
$$ LANGUAGE SQL PARALLEL SAFE COST 5000;
```
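Calls to this function can either rely on the default (`NULL`) batch size or supply one explicitly. For example, against a hypothetical `events` table:

```
-- Let the query optimizer choose the batch size
SELECT classify_event(event_type, event_day, amount) FROM events;

-- Cap batching at 10 rows per call to the SageMaker AI endpoint
SELECT classify_event(event_type, event_day, amount, max_rows_per_batch := 10) FROM events;
```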

Note the following:
+ Use the optional `max_rows_per_batch` parameter to provide control of the number of rows for a batch-mode function invocation. If you use a value of NULL, then the query optimizer automatically chooses the maximum batch size. For more information, see [Understanding batch mode and Aurora machine learning functions](#postgresql-ml-batch-mode).
+ By default, passing NULL as a parameter's value is translated to an empty string before passing to SageMaker AI. For this example the inputs have different types.
+ If you have a non-text input, or text input that needs to default to some value other than an empty string, use the `COALESCE` statement. Use `COALESCE` to translate NULL to the desired null replacement value in the call to `aws_sagemaker.invoke_endpoint`. For the `amount` parameter in this example, a NULL value is converted to 0.0. 

### Invoking a SageMaker AI model that has multiple outputs
<a name="postgresql-using-sagemaker-example4"></a>

The following example creates a user-defined function for a SageMaker AI model that returns multiple outputs. Your function needs to cast the output of the `aws_sagemaker.invoke_endpoint` function to a corresponding data type. For example, you could use the built-in PostgreSQL point type for (x,y) pairs or a user-defined composite type. 

This user-defined function returns values from a model that returns multiple outputs by using a composite type for the outputs.

```
CREATE TYPE company_forecasts AS ( 
    six_month_estimated_return real,
    one_year_bankruptcy_probability float);
CREATE FUNCTION analyze_company (
    IN free_cash_flow NUMERIC(18, 6),
    IN debt NUMERIC(18,6),
    IN max_rows_per_batch INT DEFAULT NULL,
    OUT prediction company_forecasts) 
AS $$
    SELECT (aws_sagemaker.invoke_endpoint('endpt_name',
       max_rows_per_batch,free_cash_flow, debt))::company_forecasts;                                                                                                                  
$$ LANGUAGE SQL PARALLEL SAFE COST 5000;
```

For the composite type, use fields in the same order as they appear in the model output, and cast the output of `aws_sagemaker.invoke_endpoint` to your composite type. The caller can extract the individual fields either by name or with PostgreSQL `.*` notation.
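For example, with a hypothetical `company_financials` table, the caller can pull out one field of the composite result by name or expand all of its fields at once:

```
-- Extract a single field from the composite output
SELECT (analyze_company(free_cash_flow, debt)).six_month_estimated_return
FROM company_financials;

-- Or expand every field of the composite type with .* notation
SELECT (analyze_company(free_cash_flow, debt)).*
FROM company_financials;
```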

## Exporting data to Amazon S3 for SageMaker AI model training (Advanced)
<a name="postgresql-export-to-s3"></a>

We recommend that you become familiar with Aurora machine learning and SageMaker AI by using the provided algorithms and examples rather than trying to train your own models. For more information, see [Get Started with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html).

To train SageMaker AI models, you export data to an Amazon S3 bucket. The Amazon S3 bucket is used by SageMaker AI to train your model before it is deployed. You can query data from an Aurora PostgreSQL DB cluster and save it directly into text files stored in an Amazon S3 bucket. Then SageMaker AI consumes the data from the Amazon S3 bucket for training. For more about SageMaker AI model training, see [ Train a model with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html).
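
For example, with the `aws_s3` extension installed and an IAM role that allows the cluster to write to the bucket, you can export a query result directly to Amazon S3. The bucket, path, Region, and table names below are assumptions for illustration.

```
-- Sketch only: the bucket, path, Region, and table are examples.
-- Exports the query result as CSV to the given Amazon S3 location.
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT label, feature1, feature2 FROM training_data',
    aws_commons.create_s3_uri(
        'amzn-s3-demo-sagemaker-bucket',   -- bucket name (include "sagemaker")
        'training/data.csv',               -- object key
        'us-east-1'),                      -- AWS Region
    options := 'format csv');
```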

**Note**  
When you create an Amazon S3 bucket for SageMaker AI model training or batch scoring, use `sagemaker` in the Amazon S3 bucket name. For more information, see [Specify an Amazon S3 Bucket to Upload Training Datasets and Store Output Data](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex-bucket.html) in the *Amazon SageMaker AI Developer Guide*.

For more information about exporting your data, see [Exporting data from an Aurora PostgreSQL DB cluster to Amazon S3](postgresql-s3-export.md).

## Performance considerations for using Aurora machine learning with Aurora PostgreSQL
<a name="postgresql-ml-best-practice"></a>

The Amazon Comprehend and SageMaker AI services do most of the work when invoked by an Aurora machine learning function. That means that you can scale those resources as needed, independently. For your Aurora PostgreSQL DB cluster, you can make your function calls as efficient as possible. Following, you can find some performance considerations to note when working with Aurora machine learning from Aurora PostgreSQL.

**Topics**
+ [Understanding batch mode and Aurora machine learning functions](#postgresql-ml-batch-mode)
+ [Improving response times with parallel query processing](#postgresql-using-sagemaker-example-parallel)
+ [Using materialized views and materialized columns](#postgresql-using-sagemaker-example-materialized)

### Understanding batch mode and Aurora machine learning functions
<a name="postgresql-ml-batch-mode"></a>

Typically, PostgreSQL runs functions one row at a time. Aurora machine learning can reduce this overhead by combining the calls to the external Aurora machine learning service for many rows into batches with an approach called *batch-mode execution*. In batch mode, Aurora machine learning receives the responses for a batch of input rows, and then delivers the responses back to the running query one row at a time. This optimization improves the throughput of your Aurora queries without limiting the PostgreSQL query optimizer. 

Aurora automatically uses batch mode if the function is referenced from the `SELECT` list, a `WHERE` clause, or a `HAVING` clause. Note that top-level simple `CASE` expressions are eligible for batch-mode execution. Top-level searched `CASE` expressions are also eligible for batch-mode execution provided that the first `WHEN` clause is a simple predicate with a batch-mode function call. 
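
For example, the following searched `CASE` expression remains eligible for batch-mode execution because its first `WHEN` clause is a simple predicate containing a batch-mode function call. The `anomaly_score` function and the `events` table are hypothetical.

```
-- Batch-mode eligible: the first WHEN clause is a simple predicate
-- with a batch-mode function call.
SELECT CASE WHEN anomaly_score(description) > 0.9 THEN 'anomaly'
            ELSE 'normal'
       END
FROM events;
```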

Your user-defined function must be a `LANGUAGE SQL` function and should specify `PARALLEL SAFE` and `COST 5000`. 

#### Function migration from the SELECT statement to the FROM clause
<a name="postgresql-ml-batch-mode-function-migration"></a>

Usually, an `aws_ml` function that is eligible for batch-mode execution is automatically migrated by Aurora to the `FROM` clause. 

You can examine the migration of eligible batch-mode functions to the `FROM` clause on a per-query basis. To do this, use `EXPLAIN` (with `ANALYZE` and `VERBOSE`) and look for the "Batch Processing" information below each batch-mode `Function Scan`. You can also use `EXPLAIN` (with `VERBOSE`) without running the query. In that case, observe whether calls to the function appear as a `Function Scan` under a nested loop join that was not specified in the original statement. 

In the following example, the nested loop join operator in the plan shows that Aurora migrated the `anomaly_score` function. It migrated this function from the SELECT list to the FROM clause, where it's eligible for batch-mode execution.

```
EXPLAIN (VERBOSE, COSTS false) 
SELECT anomaly_score(ts.R.description) from ts.R;
                         QUERY PLAN                          
-------------------------------------------------------------
 Nested Loop
   Output: anomaly_score((r.description)::text)
   ->  Seq Scan on ts.r
         Output: r.id, r.description, r.score
   ->  Function Scan on public.anomaly_score
         Output: anomaly_score.anomaly_score
         Function Call: anomaly_score((r.description)::text)
```

To disable batch-mode execution, set the `apg_enable_function_migration` parameter to `false`. This prevents the migration of `aws_ml` functions from the SELECT to the `FROM` clause. The following shows how.

```
SET apg_enable_function_migration = false;
```

The `apg_enable_function_migration` parameter is a Grand Unified Configuration (GUC) parameter that is recognized by the Aurora PostgreSQL `apg_plan_mgmt` extension for query plan management. To disable function migration in a session, use query plan management to save the resulting plan as an `approved` plan. At runtime, query plan management enforces the `approved` plan with its `apg_enable_function_migration` setting. This enforcement occurs regardless of the `apg_enable_function_migration` GUC parameter setting. For more information, see [Managing query execution plans for Aurora PostgreSQL](AuroraPostgreSQL.Optimize.md).

#### Using the max\_rows\_per\_batch parameter
<a name="postgresql-ml-batch-mode-max_rows_per_batch"></a>

Both the `aws_comprehend.detect_sentiment` and the `aws_sagemaker.invoke_endpoint` functions have a `max_rows_per_batch` parameter. This parameter specifies the number of rows that can be sent to the Aurora machine learning service. The larger the dataset processed by your function, the larger you can make the batch size.

Batch-mode functions improve efficiency by building batches of rows that spread the cost of the Aurora machine learning function calls over a large number of rows. However, if a `SELECT` statement finishes early due to a `LIMIT` clause, then the batch can be constructed over more rows than the query uses. This approach can result in additional charges to your AWS account. To gain the benefits of batch-mode execution but avoid building batches that are too large, use a smaller value for the `max_rows_per_batch` parameter in your function calls.
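
For example, if a query stops after a `LIMIT`, matching `max_rows_per_batch` to the limit avoids building (and paying for) a batch larger than the rows the query returns. The `reviews` table here is hypothetical.

```
-- Without a max_rows_per_batch value, the batch might include far more
-- rows than the LIMIT returns, and every batched row is billed by the
-- ML service. Matching the batch size to the LIMIT avoids this.
SELECT aws_comprehend.detect_sentiment(description, 'en', 100)
FROM reviews
LIMIT 100;
```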

If you do an `EXPLAIN` (`VERBOSE`, `ANALYZE`) of a query that uses batch-mode execution, you see a `Function Scan` operator below a nested loop join. The number of loops reported by `EXPLAIN` equals the number of times a row was fetched from the `Function Scan` operator. If a statement uses a `LIMIT` clause, the number of fetches is consistent and known in advance. In that case, set the `max_rows_per_batch` parameter to this value to optimize the size of the batch. However, if the batch-mode function is referenced in a predicate in the `WHERE` clause or `HAVING` clause, then you probably can't know the number of fetches in advance. In this case, use the loops as a guideline and experiment with `max_rows_per_batch` to find a setting that optimizes performance.

#### Verifying batch-mode execution
<a name="postgresql-ml-batch-mode-verify"></a>

To see if a function ran in batch mode, use `EXPLAIN ANALYZE`. If batch-mode execution was used, then the query plan will include the information in a "Batch Processing" section.

```
EXPLAIN ANALYZE SELECT user-defined-function();
 Batch Processing: num batches=1 avg/min/max batch size=3333.000/3333.000/3333.000
                                avg/min/max batch call time=146.273/146.273/146.273
```

In this example, there was 1 batch that contained 3,333 rows, which took 146.273 ms to process. The "Batch Processing" section shows the following: 
+ How many batches there were for this function scan operation
+ The batch size average, minimum, and maximum
+ The batch execution time average, minimum, and maximum

Typically the final batch is smaller than the rest, which often results in a minimum batch size that is much smaller than the average. 

To return the first few rows more quickly, set the `max_rows_per_batch` parameter to a smaller value. 

To reduce the number of batch mode calls to the ML service when you use a `LIMIT` in your user-defined function, set the `max_rows_per_batch` parameter to a smaller value. 

### Improving response times with parallel query processing
<a name="postgresql-using-sagemaker-example-parallel"></a>

To get results as fast as possible from a large number of rows, you can combine parallel query processing with batch mode processing. You can use parallel query processing for `SELECT`, `CREATE TABLE AS SELECT`, and `CREATE MATERIALIZED VIEW` statements.

**Note**  
PostgreSQL doesn't yet support parallel query for data manipulation language (DML) statements.

Parallel query processing occurs both within the database and within the ML service. The number of cores in the instance class of the database limits the degree of parallelism that can be used when running a query. The database server can construct a parallel query execution plan that partitions the task among a set of parallel workers. Then each of these workers can build batched requests containing tens of thousands of rows (or as many as are allowed by each service). 

The batched requests from all of the parallel workers are sent to the SageMaker AI endpoint. The degree of parallelism that the endpoint can support is constrained by the number and type of instances that support it. For *K* degrees of parallelism, you need a database instance class that has at least *K* cores. You also need to configure the SageMaker AI endpoint for your model to have *K* initial instances of a sufficiently high-performing instance class.

To use parallel query processing, set the `parallel_workers` storage parameter of the table that contains the data that you plan to pass to a batch-mode function, such as `aws_comprehend.detect_sentiment`. If the optimizer chooses a parallel query plan, the AWS ML services can be called both in batch and in parallel. 

You can use the following parameters with the `aws_comprehend.detect_sentiment` function to get a plan with four-way parallelism. If you change either of the first two parameters, you must restart the database instance for the changes to take effect.

```
-- SET max_worker_processes to 8;  -- default value is 8
-- SET max_parallel_workers to 8;  -- not greater than max_worker_processes
SET max_parallel_workers_per_gather to 4;  -- not greater than max_parallel_workers

-- You can set the parallel_workers storage parameter on the table that the data
-- for the Aurora machine learning function is coming from in order to manually override the degree of
-- parallelism that would otherwise be chosen by the query optimizer
--
ALTER TABLE yourTable SET (parallel_workers = 4);

-- Example query to exploit both batch-mode execution and parallel query
EXPLAIN (verbose, analyze, buffers, hashes) 
SELECT (aws_comprehend.detect_sentiment(description, 'en')).*
FROM yourTable
WHERE id < 100;
```

For more information about controlling parallel query, see [Parallel plans](https://www.postgresql.org/docs/current/parallel-plans.html) in the PostgreSQL documentation.

### Using materialized views and materialized columns
<a name="postgresql-using-sagemaker-example-materialized"></a>

When you invoke an AWS service such as SageMaker AI or Amazon Comprehend from your database, your account is charged according to the pricing policy of that service. To minimize charges to your account, you can materialize the result of calling the AWS service into a materialized column so that the AWS service is not called more than once per input row. If desired, you can add a `materializedAt` timestamp column to record the time at which the columns were materialized. 

The latency of an ordinary single-row `INSERT` statement is typically much less than the latency of calling a batch-mode function. Thus, you might not be able to meet your application's latency requirements if you invoke the batch-mode function for every single-row `INSERT` that your application performs. Instead, high-performance applications generally populate the materialized columns by periodically issuing an `UPDATE` statement that operates on a large batch of rows at a time.

`UPDATE` takes a row-level lock that can impact a running application. So you might need to use `SELECT ... FOR UPDATE SKIP LOCKED`, or use `MATERIALIZED VIEW`. 
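
For example, a background job might materialize sentiment for a batch of unscored rows while skipping rows that another session has locked. This is a sketch; the `reviews` table and its `sentiment` and `materializedAt` columns are assumptions.

```
-- Sketch only: the table and columns are assumptions.
-- Lock a batch of unscored rows, skipping rows locked by other sessions,
-- then materialize the sentiment for that batch in one UPDATE.
WITH batch AS (
    SELECT id
    FROM reviews
    WHERE sentiment IS NULL
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
UPDATE reviews r
SET sentiment      = (aws_comprehend.detect_sentiment(r.description, 'en')).sentiment,
    materializedAt = now()
FROM batch
WHERE r.id = batch.id;
```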

Analytic queries that operate on a large number of rows in real time can combine batch-mode materialization with real-time processing. To do this, these queries assemble a `UNION ALL` of the pre-materialized results with a query over the rows that don't yet have materialized results. In some cases, such a `UNION ALL` is needed in multiple places, or the query is generated by a third-party application. If so, you can create a `VIEW` to encapsulate the `UNION ALL` operation so this detail isn't exposed to the rest of the SQL application.
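
For example, a view can combine the pre-materialized rows with on-the-fly results for rows that haven't been materialized yet. The `reviews` table and its columns are hypothetical.

```
-- Sketch only: the table and columns are assumptions.
-- Pre-materialized rows are read directly; unscored rows call the
-- ML service at query time.
CREATE VIEW review_sentiments AS
    SELECT id, sentiment
    FROM reviews
    WHERE sentiment IS NOT NULL
    UNION ALL
    SELECT id, (aws_comprehend.detect_sentiment(description, 'en')).sentiment
    FROM reviews
    WHERE sentiment IS NULL;
```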

You can use a materialized view to materialize the results of an arbitrary `SELECT` statement at a snapshot in time. You can also use it to refresh the materialized view at any time in the future. Currently PostgreSQL doesn't support incremental refresh, so each time the materialized view is refreshed the materialized view is fully recomputed.

You can refresh materialized views with the `CONCURRENTLY` option, which updates the contents of the materialized view without taking an exclusive lock. Doing this allows a SQL application to read from the materialized view while it's being refreshed.
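
For example, the following sketch creates a materialized view over ML results and refreshes it concurrently. Note that `REFRESH MATERIALIZED VIEW CONCURRENTLY` requires a unique index on the materialized view. The `reviews` table is hypothetical.

```
-- Sketch only: the table and columns are assumptions.
CREATE MATERIALIZED VIEW review_sentiments_mv AS
    SELECT id, (aws_comprehend.detect_sentiment(description, 'en')).sentiment
    FROM reviews;

-- A unique index is required before CONCURRENTLY can be used.
CREATE UNIQUE INDEX ON review_sentiments_mv (id);

-- Recomputes the view without blocking readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY review_sentiments_mv;
```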

## Monitoring Aurora machine learning
<a name="postgresql-ml-monitoring"></a>

You can monitor the `aws_ml` functions by setting the `track_functions` parameter in your custom DB cluster parameter group to `all`. By default, this parameter is set to `pl`, which means that only procedural-language functions are tracked. By changing this to `all`, you track the `aws_ml` functions as well. For more information, see [Run-time Statistics](https://www.postgresql.org/docs/current/runtime-config-statistics.html) in the PostgreSQL documentation. 
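
You can confirm the effective setting from a session after the parameter group change has been applied.

```
-- Shows the current value; 'all' is required for aws_ml function tracking.
SHOW track_functions;
```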

For information about monitoring the performance of the SageMaker AI operations called from Aurora machine learning functions, see [Monitor Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-overview.html) in the *Amazon SageMaker AI Developer Guide*.

With `track_functions` set to `all`, you can query the `pg_stat_user_functions` view to get statistics about the functions that you define and use to invoke Aurora machine learning services. For each function, the view provides the number of `calls`, `total_time`, and `self_time`. 

To view the statistics for the `aws_sagemaker.invoke_endpoint` and the `aws_comprehend.detect_sentiment` functions, you can filter results by schema name using the following query.

```
SELECT * FROM pg_stat_user_functions 
    WHERE schemaname 
    LIKE 'aws_%';
```

To clear the statistics, do as follows.

```
SELECT pg_stat_reset();
```

You can get the names of your SQL functions that call the `aws_sagemaker.invoke_endpoint` function by querying the PostgreSQL `pg_proc` system catalog. This catalog stores information about functions, procedures, and more. For more information, see [pg\_proc](https://www.postgresql.org/docs/current/catalog-pg-proc.html) in the PostgreSQL documentation. Following is an example of querying the table to get the names of functions (`proname`) whose source (`prosrc`) includes the text *invoke\_endpoint*. 

```
SELECT proname FROM pg_proc WHERE prosrc LIKE '%invoke_endpoint%';
```