

# Perform advanced analytics using Amazon Redshift ML
<a name="perform-advanced-analytics-using-amazon-redshift-ml"></a>

*Po Hong and Chyanna Antonio, Amazon Web Services*

## Summary
<a name="perform-advanced-analytics-using-amazon-redshift-ml-summary"></a>

On the Amazon Web Services (AWS) Cloud, you can use Amazon Redshift machine learning (Amazon Redshift ML) to perform ML analytics on data stored in either an Amazon Redshift cluster or on Amazon Simple Storage Service (Amazon S3). Amazon Redshift ML supports supervised learning, which is typically used for advanced analytics. Use cases for Amazon Redshift ML include revenue forecasting, credit card fraud detection, and customer lifetime value (CLV) or customer churn predictions.

Amazon Redshift ML makes it easy for database users to create, train, and deploy ML models by using standard SQL commands. Amazon Redshift ML uses Amazon SageMaker Autopilot to automatically train and tune the best ML models for classification or regression based on your data, while you retain control and visibility.

All interactions between Amazon Redshift, Amazon S3, and Amazon SageMaker are abstracted away and automated. After the ML model is trained and deployed, it becomes available as a [user-defined function](https://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html) (UDF) in Amazon Redshift and can be used in SQL queries.  

This pattern complements the [Create, train, and deploy ML models in Amazon Redshift using SQL with Amazon Redshift ML](https://aws.amazon.com/blogs/big-data/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml/) from the AWS Blog, and the [Build, train, and deploy an ML model with Amazon SageMaker](https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/) tutorial from the [Getting Started Resource Center](https://aws.amazon.com/getting-started/).

## Prerequisites and limitations
<a name="perform-advanced-analytics-using-amazon-redshift-ml-prereqs"></a>

**Prerequisites **
+ An active AWS account
+ Existing data in an Amazon Redshift table

**Skills **
+ Familiarity with terms and concepts used by Amazon Redshift ML, including *machine learning*, *training*, and *prediction*. For more information about this, see [Training ML models ](https://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html)in the Amazon Machine Learning (Amazon ML) documentation.
+ Experience with Amazon Redshift user setup, access management, and standard SQL syntax. For more information about this, see [Getting started with Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html) in the Amazon Redshift documentation.
+ Knowledge and experience with Amazon S3 and AWS Identity and Access Management (IAM). 
+ Experience running commands in AWS Command Line Interface (AWS CLI) is also beneficial but not required. 

**Limitations **
+ The Amazon Redshift cluster and S3 bucket must be located in the same AWS Region.
+ This pattern’s approach only supports supervised learning models such as regression, binary classification, and multiclass classification. 

## Architecture
<a name="perform-advanced-analytics-using-amazon-redshift-ml-architecture"></a>

![\[Workflow shows how Amazon Redshift ML works with SageMaker to build, train, and deploy an ML model.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/3b6a5f49-c647-45f4-940e-b623f989326a/images/36cc39fe-c1f3-4d68-b89e-e636156001a6.png)


The following steps explain how Amazon Redshift ML works with SageMaker to build, train, and deploy an ML model: 

1. Amazon Redshift exports training data to an S3 bucket.

1. SageMaker Autopilot automatically preprocesses the training data.

1. After the `CREATE MODEL` statement is invoked, Amazon Redshift ML uses SageMaker for training.

1. SageMaker Autopilot searches for and recommends the ML algorithm and optimal hyper-parameters that optimize the evaluation metrics.

1. Amazon Redshift ML registers the output ML model as a SQL function in the Amazon Redshift cluster.

1. The ML model's function can be used in a SQL statement. 

**Technology stack**
+ Amazon Redshift
+ SageMaker
+ Amazon S3

## Tools
<a name="perform-advanced-analytics-using-amazon-redshift-ml-tools"></a>
+ [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/welcome.html) – Amazon Redshift is an enterprise-level, petabyte scale, fully managed data warehousing service.
+ [Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/machine_learning.html) – Amazon Redshift machine learning (Amazon Redshift ML) is a robust, cloud-based service that makes it easy for analysts and data scientists of all skill levels to use ML technology.
+ [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) – Amazon Simple Storage Service (Amazon S3) is storage for the internet. 
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) – SageMaker is a fully managed ML service. 
+ [Amazon SageMaker Autopilot](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html) – SageMaker Autopilot is a feature-set that automates key tasks of an automatic machine learning (AutoML) process.

 

**Code**

You can create a supervised ML model in Amazon Redshift by using the following code:

```
"CREATE MODEL customer_churn_auto_model
FROM (SELECT state,
             account_length,
             area_code,
             total_charge/account_length AS average_daily_spend, 
             cust_serv_calls/account_length AS average_daily_cases,
             churn 
      FROM customer_activity
      WHERE  record_date < '2020-01-01' 
     )
TARGET churn
FUNCTION ml_fn_customer_churn_auto
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/Redshift-ML'
SETTINGS (
  S3_BUCKET 'your-bucket'
);")
```

**Note**  
The `SELECT` state can refer to Amazon Redshift regular tables, Amazon Redshift Spectrum external tables, or both.

## Epics
<a name="perform-advanced-analytics-using-amazon-redshift-ml-epics"></a>

### Prepare a training and test dataset
<a name="prepare-a-training-and-test-dataset"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Prepare a training and test dataset. | Sign in to the AWS Management Console and open the Amazon SageMaker console. Follow the instructions from the [Build, train, and deploy a machine learning model](https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/) tutorial to create a .csv or Apache Parquet file that has a label column (*supervised training*) and no header. We recommend that you shuffle and split the raw dataset into a training set for the model’s training (70 percent) and a test set for the model’s performance evaluation (30 percent). | Data scientist | 

### Prepare and configure the technology stack
<a name="prepare-and-configure-the-technology-stack"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create and configure an Amazon Redshift cluster. | On the Amazon Redshift console, create a cluster according to your requirements. For more information about this, see [Create a cluster](https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data-launch-cluster.html) in the Amazon Redshift documentation.  Amazon Redshift clusters must be created with the `SQL_PREVIEW` maintenance track. For more information about preview tracks, see [Choosing cluster maintenance tracks](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-mgmt-maintenance-tracks) in the Amazon Redshift documentation. | DBA, Cloud architect | 
| Create an S3 bucket to store training data and model artifacts. | On the Amazon S3 console, create an S3 bucket for the training and test data. For more information about creating an S3 bucket, see [Create an S3 bucket](https://docs.aws.amazon.com/quickstarts/latest/s3backup/step-1-create-bucket.html) from AWS Quick Starts. Make sure that your Amazon Redshift cluster and S3 bucket are in the same Region.  | DBA, Cloud architect | 
| Create and attach an IAM policy to the Amazon Redshift cluster. | Create an IAM policy to allow the Amazon Redshift cluster to access SageMaker and Amazon S3. For instructions and steps, see [Cluster setup for using Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/cluster-setup.html) in the Amazon Redshift documentation. | DBA, Cloud architect | 
| Allow Amazon Redshift users and groups to access schemas and tables. | Grant permissions to allow users and groups in Amazon Redshift to access internal and external schemas and tables. For steps and instructions, see [Managing permissions and ownership](https://docs.aws.amazon.com/redshift/latest/dg/permissions-ownership.html) in the Amazon Redshift documentation. | DBA | 

### Create and train the ML model in Amazon Redshift
<a name="create-and-train-the-ml-model-in-amazon-redshift"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create and train the ML model in Amazon Redshift. | Create and train your ML model in Amazon Redshift ML. For more information, see the `CREATE MODEL` statement in the Amazon Redshift documentation. | Developer, Data scientist | 

### Perform batch inference and prediction in Amazon Redshift
<a name="perform-batch-inference-and-prediction-in-amazon-redshift"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Perform inference using the generated ML model function. | For more information about performing inference by using the generated ML model function, see [Prediction](https://docs.aws.amazon.com/redshift/latest/dg/prediction.html) in the Amazon Redshift documentation. | Data scientist, Business intelligence user | 

## Related resources
<a name="perform-advanced-analytics-using-amazon-redshift-ml-resources"></a>

**Prepare a training and test dataset**
+ [Building, training, and deploying a machine learning model with Amazon SageMaker](https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/)

 

**Prepare and configure the technology stack**
+ [Creating an Amazon Redshift cluster](https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data-launch-cluster.html)
+ [Choosing Amazon Redshift cluster maintenance tracks](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-mgmt-maintenance-tracks)
+ [Creating an S3 bucket](https://docs.aws.amazon.com/quickstarts/latest/s3backup/step-1-create-bucket.html)
+ [Setting up an Amazon Redshift cluster for using Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/cluster-setup.html)
+ [Managing permissions and ownership in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/permissions-ownership.html)

 

**Create and train the ML model in Amazon Redshift**
+ [CREATE MODEL statement in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_MODEL.html#r_simple_create_model) 

 

**Perform batch inference and prediction in Amazon Redshift**
+ [Prediction in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/prediction.html) 

 

**Other resources**
+ [Getting started with Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/geting-started-machine-learning.html)
+ [Creating, training, and deploying ML models in Amazon Redshift using SQL with Amazon Redshift ML](https://aws.amazon.com/blogs/big-data/create-train-and-deploy-machine-learning-models-in-amazon-redshift-using-sql-with-amazon-redshift-ml/)
+ [Amazon Redshift partners](https://aws.amazon.com/redshift/partners/?partner-solutions-cards.sort-by=item.additionalFields.partnerNameLower&partner-solutions-cards.sort-order=aschttps://aws.amazon.com/redshift/partners/?partner-solutions-cards.sort-by=item.additionalFields.partnerNameLower&partner-solutions-cards.sort-order=asc)
+ [AWS machine learning competency partners](https://aws.amazon.com/machine-learning/partner-solutions/?partner-solutions-cards.sort-by=item.additionalFields.partnerNameLower&partner-solutions-cards.sort-order=asc)