

# Machine Learning Frameworks and Languages
<a name="frameworks"></a>

Amazon SageMaker AI provides native support for popular programming languages and machine learning frameworks, empowering developers and data scientists to leverage their preferred tools and technologies. This section offers references for working with Python and R, as well as their respective software development kits (SDKs), within SageMaker AI. Additionally, it covers a wide range of machine learning and deep learning frameworks, including Apache MXNet, PyTorch, and TensorFlow.

You can use Python and R natively in Amazon SageMaker notebook kernels. There are also kernels that support specific frameworks. A very popular way to get started with SageMaker AI is to use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). It provides open source Python APIs and containers that make it easy to train and deploy models in SageMaker AI, as well as examples for use with several different machine learning and deep learning frameworks.

For information about using specific frameworks or how to use R in SageMaker AI, see the following topics.

Language SDKs and user guides:
+ [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable)
+ [R](r-guide.md)
+ [API Reference](api-and-sdk-reference.md)

Machine learning and deep learning frameworks guides:
+ [Apache MXNet](mxnet.md)
+ [Apache Spark](apache-spark.md)
+ [Chainer](chainer.md)
+ [Hugging Face](hugging-face.md)
+ [PyTorch](pytorch.md)
+ [Scikit-learn](sklearn.md)
+ [SparkML Serving](sparkml-serving.md)
+ [TensorFlow](tf.md)
+ [Triton Inference Server](triton.md)

# Resources for using Apache MXNet with Amazon SageMaker AI
<a name="mxnet"></a>

The [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) MXNet estimators and models and the SageMaker AI open-source MXNet container make writing an MXNet script and running it in SageMaker AI easier. The following section provides reference material you can use to learn how to use SageMaker AI to train and deploy a model using custom MXNet code.

## What do you want to do?
<a name="mxnet-intent"></a>

I want to train a custom MXNet model in SageMaker AI.  
For documentation, see [Train a Model with MXNet](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#train-a-model-with-mxnet).

I have an MXNet model that I trained in SageMaker AI, and I want to deploy it to a hosted endpoint.  
For more information, see [Deploy MXNet models](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#deploy-mxnet-models).

I have an MXNet model that I trained outside of SageMaker AI, and I want to deploy it to a SageMaker AI endpoint.  
For more information, see [Deploy Endpoints from Model Data](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#deploy-endpoints-from-model-data).

I want to see the API documentation for [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) MXNet classes.  
For more information, see [MXNet Classes](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html). 

I want to find the SageMaker AI MXNet container repository.  
For more information, see [SageMaker AI MXNet Container GitHub repository](https://github.com/aws/sagemaker-mxnet-container).

I want to find information about MXNet versions supported by AWS Deep Learning Containers.  
For more information, see [Available Deep Learning Container Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).

For general information about writing MXNet script mode training scripts and using MXNet script mode estimators and models with SageMaker AI, see [Using MXNet with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html).

# Apache Spark with Amazon SageMaker AI
<a name="apache-spark"></a>

Amazon SageMaker AI Spark is an open source Spark library that helps you build Spark machine learning (ML) pipelines with SageMaker AI. This simplifies the integration of Spark ML stages with SageMaker AI stages, like model training and hosting. For information about SageMaker AI Spark, see the [SageMaker AI Spark](https://github.com/aws/sagemaker-spark) GitHub repository. The following topics provide information to learn how to use Apache Spark with SageMaker AI.

The SageMaker AI Spark library is available in Python and Scala. You can use SageMaker AI Spark to train models in SageMaker AI using `org.apache.spark.sql.DataFrame` data frames in your Spark clusters. After model training, you can also host the model using SageMaker AI hosting services. 

The SageMaker AI Spark library, `com.amazonaws.services.sagemaker.sparksdk`, provides the following classes, among others:
+ `SageMakerEstimator`—Extends the `org.apache.spark.ml.Estimator` interface. You can use this estimator for model training in SageMaker AI.
+ `KMeansSageMakerEstimator`, `PCASageMakerEstimator`, and `XGBoostSageMakerEstimator`—Extend the `SageMakerEstimator` class. 
+ `SageMakerModel`—Extends the `org.apache.spark.ml.Model` class. You can use this `SageMakerModel` for model hosting and getting inferences in SageMaker AI.
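How these classes fit together follows the standard Spark ML estimator/model pattern: an estimator's `fit` produces a model, and the model's `transform` appends prediction columns. The following is a minimal plain-Python analogy of that pattern; the class bodies are hypothetical stand-ins for illustration, not the actual SDK.

```
# The Spark ML pattern the SageMaker AI classes plug into: Estimator.fit()
# returns a Model, and Model.transform() adds prediction columns.
# These classes are illustrative stand-ins, not the real library.

class Estimator:
    def fit(self, df):
        raise NotImplementedError

class Model:
    def transform(self, df):
        raise NotImplementedError

class SageMakerEstimator(Estimator):        # extends Estimator
    def fit(self, df):
        # (The real class uploads data, runs a training job, and hosts the model.)
        return SageMakerModel(params={"trained": True})

class SageMakerModel(Model):                # extends Model
    def __init__(self, params):
        self.params = params

    def transform(self, df):
        # (The real class invokes the hosted endpoint to get inferences.)
        return [dict(row, prediction=0.0) for row in df]

model = SageMakerEstimator().fit([{"features": [1.0, 2.0]}])
rows = model.transform([{"features": [1.0, 2.0]}])
```

Because `SageMakerModel` conforms to the Spark `Model` interface, it can be dropped into existing Spark ML pipelines, as shown later in this topic.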

You can download the source code for both Python Spark (PySpark) and Scala libraries from the [SageMaker AI Spark](https://github.com/aws/sagemaker-spark) GitHub repository.

For installation and examples of the SageMaker AI Spark library, see [SageMaker AI Spark for Scala examples](apache-spark-example1.md) or [Resources for using SageMaker AI Spark for Python (PySpark) examples](apache-spark-additional-examples.md).

If you use Amazon EMR on AWS to manage Spark clusters, see [Apache Spark](https://aws.amazon.com/emr/features/spark/). For more information on using Amazon EMR in SageMaker AI, see [Data preparation using Amazon EMR](studio-notebooks-emr-cluster.md).

**Topics**
+ [Integrate your Apache Spark application with SageMaker AI](#spark-sdk-common-process)
+ [SageMaker AI Spark for Scala examples](apache-spark-example1.md)
+ [Resources for using SageMaker AI Spark for Python (PySpark) examples](apache-spark-additional-examples.md)

## Integrate your Apache Spark application with SageMaker AI
<a name="spark-sdk-common-process"></a>

The following is a high-level summary of the steps for integrating your Apache Spark application with SageMaker AI.

1. Continue data preprocessing using the Apache Spark library that you are familiar with. Your dataset remains a `DataFrame` in your Spark cluster. Load your data into a `DataFrame` and preprocess it so that it has a `features` column containing an `org.apache.spark.ml.linalg.Vector` of `Double` values, and an optional `label` column with values of type `Double`.

1. Use the estimator in the SageMaker AI Spark library to train your model. For example, if you choose the k-means algorithm provided by SageMaker AI for model training, call the `KMeansSageMakerEstimator.fit` method. 

   Provide your `DataFrame` as input. The estimator returns a `SageMakerModel` object. 
**Note**  
`SageMakerModel` extends the `org.apache.spark.ml.Model` class.

   The `fit` method does the following: 

   1. Converts the input `DataFrame` to the protobuf format. It does so by selecting the `features` and `label` columns from the input `DataFrame`. It then uploads the protobuf data to an Amazon S3 bucket. The protobuf format is efficient for model training in SageMaker AI.

   1. Starts model training in SageMaker AI by sending a SageMaker AI [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. After model training has completed, SageMaker AI saves the model artifacts to an S3 bucket. 

      SageMaker AI assumes the IAM role that you specified for model training to perform tasks on your behalf. For example, it uses the role to read training data from an S3 bucket and to write model artifacts to a bucket. 

   1. Creates and returns a `SageMakerModel` object. The constructor does the following tasks, which are related to deploying your model to SageMaker AI. 

      1. Sends a [`CreateModel`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) request to SageMaker AI. 

      1. Sends a [`CreateEndpointConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) request to SageMaker AI.

      1. Sends a [`CreateEndpoint`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) request to SageMaker AI, which then launches the specified resources, and hosts the model on them. 

1. You can get inferences from your model hosted in SageMaker AI with the `SageMakerModel.transform` method. 

   Provide an input `DataFrame` with features as input. The `transform` method transforms it to a `DataFrame` containing inferences. Internally, the `transform` method sends a request to the [`InvokeEndpoint`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) SageMaker API to get inferences. The `transform` method appends the inferences to the input `DataFrame`.
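The preprocessing contract from step 1 can be made concrete with a small check. The following is a minimal plain-Python sketch, assuming a row is represented as a dictionary; the `validate_row` helper is hypothetical and not part of any SDK.

```
# Stand-in for the schema that the SageMaker AI Spark estimators expect:
# a "features" column holding a vector of Doubles and an optional
# "label" column of Double type. Plain Python, for illustration only.

def validate_row(row, feature_dim):
    features = row["features"]
    assert len(features) == feature_dim, "wrong feature dimension"
    assert all(isinstance(v, float) for v in features), "features must be Doubles"
    if "label" in row:                      # the label column is optional
        assert isinstance(row["label"], float), "label must be a Double"
    return row

# An MNIST-style row: a Double label and 784 Double features.
row = {"label": 5.0, "features": [0.0] * 784}
validate_row(row, feature_dim=784)
```

Rows that fail this contract (wrong dimension, non-`Double` values) would cause the estimator's protobuf conversion in step 2 to fail, so it pays to validate during preprocessing.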

# SageMaker AI Spark for Scala examples
<a name="apache-spark-example1"></a>

Amazon SageMaker AI provides an Apache Spark library ([SageMaker AI Spark](https://github.com/aws/sagemaker-spark/tree/master/sagemaker-spark-sdk)) that you can use to integrate your Apache Spark applications with SageMaker AI. This topic contains examples to help get you started with SageMaker AI Spark with Scala. For information about the SageMaker AI Apache Spark library, see [Apache Spark with Amazon SageMaker AI](apache-spark.md).

**Download Spark for Scala**

You can download the source code and examples for both Python Spark (PySpark) and Scala libraries from the [SageMaker AI Spark](https://github.com/aws/sagemaker-spark) GitHub repository.

For detailed instructions on installing the SageMaker AI Spark library, see [SageMaker AI Spark](https://github.com/aws/sagemaker-spark/tree/master/sagemaker-spark-sdk).

The SageMaker AI Spark SDK for Scala is available in the Maven central repository. Add the Spark library to your project by adding the appropriate dependency to your `pom.xml` file:
+ If your project depends on Spark 2.2, add the following to your pom.xml file:

  ```
  <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>sagemaker-spark_2.11</artifactId>
      <version>spark_2.2.0-1.0</version>
  </dependency>
  ```
+ If your project depends on Spark 2.1, add the following to your pom.xml file:

  ```
  <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>sagemaker-spark_2.11</artifactId>
      <version>spark_2.1.1-1.0</version>
  </dependency>
  ```

**Spark for Scala example**

This section provides example code that uses the Apache Spark Scala library provided by SageMaker AI to train a model in SageMaker AI using `DataFrame`s in your Spark cluster. This is then followed by examples on how to [Use Custom Algorithms for Model Training and Hosting on Amazon SageMaker AI with Apache Spark](apache-spark-example1-cust-algo.md) and [Use the SageMakerEstimator in a Spark Pipeline](apache-spark-example1-extend-pipeline.md).

The following example hosts the resulting model artifacts using SageMaker AI hosting services. For more details on this example, see [Getting Started: K-Means Clustering on SageMaker AI with SageMaker AI Spark SDK](https://github.com/aws/sagemaker-spark?tab=readme-ov-file#getting-started-k-means-clustering-on-sagemaker-with-sagemaker-spark-sdk). Specifically, this example does the following:
+ Uses the `KMeansSageMakerEstimator` to fit (or train) a model on data

  Because the example uses the k-means algorithm provided by SageMaker AI to train a model, you use the `KMeansSageMakerEstimator`. You train the model using images of handwritten single-digit numbers (from the MNIST dataset). You provide the images as an input `DataFrame`. For your convenience, SageMaker AI provides this dataset in an Amazon S3 bucket.

  In response, the estimator returns a `SageMakerModel` object.
+ Obtains inferences using the trained `SageMakerModel`

  To get inferences from a model hosted in SageMaker AI, you call the `SageMakerModel.transform` method. You pass a `DataFrame` as input. The method transforms the input `DataFrame` to another `DataFrame` containing inferences obtained from the model. 

  For a given input image of a handwritten single-digit number, the inference identifies a cluster that the image belongs to. For more information, see [K-Means Algorithm](k-means.md).

```
import org.apache.spark.sql.SparkSession
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.algorithms
import com.amazonaws.services.sagemaker.sparksdk.algorithms.KMeansSageMakerEstimator

val spark = SparkSession.builder.getOrCreate

// load mnist data as a dataframe from libsvm
val region = "us-east-1"
val trainingData = spark.read.format("libsvm")
  .option("numFeatures", "784")
  .load(s"s3://sagemaker-sample-data-$region/spark/mnist/train/")
val testData = spark.read.format("libsvm")
  .option("numFeatures", "784")
  .load(s"s3://sagemaker-sample-data-$region/spark/mnist/test/")

val roleArn = "arn:aws:iam::account-id:role/rolename"

val estimator = new KMeansSageMakerEstimator(
  sagemakerRole = IAMRole(roleArn),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1)
  .setK(10).setFeatureDim(784)

// train
val model = estimator.fit(trainingData)

val transformedData = model.transform(testData)
transformedData.show
```

The example code does the following:
+ Loads the MNIST dataset from an S3 bucket provided by SageMaker AI (`sagemaker-sample-data-<region>`) into Spark `DataFrame`s (`trainingData` and `testData`):

  ```
  // Get a Spark session.
  
  val spark = SparkSession.builder.getOrCreate
  
  // load mnist data as a dataframe from libsvm
  val region = "us-east-1"
  val trainingData = spark.read.format("libsvm")
    .option("numFeatures", "784")
    .load(s"s3://sagemaker-sample-data-$region/spark/mnist/train/")
  val testData = spark.read.format("libsvm")
    .option("numFeatures", "784")
    .load(s"s3://sagemaker-sample-data-$region/spark/mnist/test/")
  
  val roleArn = "arn:aws:iam::account-id:role/rolename"
  trainingData.show()
  ```

  The `show` method displays the first 20 rows in the data frame:

  ```
  +-----+--------------------+
  |label|            features|
  +-----+--------------------+
  |  5.0|(784,[152,153,154...|
  |  0.0|(784,[127,128,129...|
  |  4.0|(784,[160,161,162...|
  |  1.0|(784,[158,159,160...|
  |  9.0|(784,[208,209,210...|
  |  2.0|(784,[155,156,157...|
  |  1.0|(784,[124,125,126...|
  |  3.0|(784,[151,152,153...|
  |  1.0|(784,[152,153,154...|
  |  4.0|(784,[134,135,161...|
  |  3.0|(784,[123,124,125...|
  |  5.0|(784,[216,217,218...|
  |  3.0|(784,[143,144,145...|
  |  6.0|(784,[72,73,74,99...|
  |  1.0|(784,[151,152,153...|
  |  7.0|(784,[211,212,213...|
  |  2.0|(784,[151,152,153...|
  |  8.0|(784,[159,160,161...|
  |  6.0|(784,[100,101,102...|
  |  9.0|(784,[209,210,211...|
  +-----+--------------------+
  only showing top 20 rows
  ```

  In each row:
  + The `label` column identifies the image's label. For example, if the image of the handwritten number is the digit 5, the label value is 5. 
  + The `features` column stores a vector (`org.apache.spark.ml.linalg.Vector`) of `Double` values. These are the 784 features of the handwritten number. (Each handwritten number is a 28 x 28-pixel image, making 784 features.)
+ Creates a SageMaker AI estimator (`KMeansSageMakerEstimator`) 

  The `fit` method of this estimator uses the k-means algorithm provided by SageMaker AI to train models using an input `DataFrame`. In response, it returns a `SageMakerModel` object that you can use to get inferences.
**Note**  
The `KMeansSageMakerEstimator` extends the SageMaker AI `SageMakerEstimator`, which extends the Apache Spark `Estimator`. 

  ```
  val estimator = new KMeansSageMakerEstimator(
    sagemakerRole = IAMRole(roleArn),
    trainingInstanceType = "ml.p2.xlarge",
    trainingInstanceCount = 1,
    endpointInstanceType = "ml.c4.xlarge",
    endpointInitialInstanceCount = 1)
    .setK(10).setFeatureDim(784)
  ```

   

  The constructor parameters provide information that is used for training a model and deploying it on SageMaker AI:
  + `trainingInstanceType` and `trainingInstanceCount`—Identify the type and number of ML compute instances to use for model training.
  + `endpointInstanceType`—Identifies the ML compute instance type to use when hosting the model in SageMaker AI. By default, one ML compute instance is assumed.
  + `endpointInitialInstanceCount`—Identifies the number of ML compute instances initially backing the endpoint hosting the model in SageMaker AI.
  + `sagemakerRole`—SageMaker AI assumes this IAM role to perform tasks on your behalf. For example, for model training, it reads data from S3 and writes training results (model artifacts) to S3. 
**Note**  
This example implicitly creates a SageMaker AI client. To create this client, you must provide your credentials. The API uses these credentials to authenticate requests to SageMaker AI. For example, it uses the credentials to authenticate requests to create a training job and API calls for deploying the model using SageMaker AI hosting services.
  + After the `KMeansSageMakerEstimator` object has been created, you set the following parameters, which are used in model training: 
    + `setK(10)`—The number of clusters that the k-means algorithm should create during model training. You specify 10 clusters, one for each digit, 0 through 9. 
    + `setFeatureDim(784)`—The number of features in each input image (each handwritten number is a 28 x 28-pixel image, making 784 features).
+ Calls the estimator `fit` method

  ```
  // train
  val model = estimator.fit(trainingData)
  ```

  You pass the input `DataFrame` as a parameter. The estimator does all the work of training the model and deploying it to SageMaker AI. For more information, see [Integrate your Apache Spark application with SageMaker AI](apache-spark.md#spark-sdk-common-process). In response, you get a `SageMakerModel` object, which you can use to get inferences from your model deployed in SageMaker AI. 

  You provide only the input `DataFrame`. You don't need to specify the registry path to the k-means algorithm used for model training because the `KMeansSageMakerEstimator` knows it.
+ Calls the `SageMakerModel.transform` method to get inferences from the model deployed in SageMaker AI.

  The `transform` method takes a `DataFrame` as input, transforms it, and returns another `DataFrame` containing inferences obtained from the model. 

  ```
  val transformedData = model.transform(testData)
  transformedData.show
  ```

  In this example, you pass the test `DataFrame` (`testData`) as input to the `transform` method. The `transform` method does the following:
  + Serializes the `features` column in the input `DataFrame` to protobuf and sends it to the SageMaker AI endpoint for inference.
  + Deserializes the protobuf response into the two additional columns (`distance_to_cluster` and `closest_cluster`) in the transformed `DataFrame`.

  The `show` method displays the inferences for the first 20 rows in the input `DataFrame`: 

  ```
  +-----+--------------------+-------------------+---------------+
  |label|            features|distance_to_cluster|closest_cluster|
  +-----+--------------------+-------------------+---------------+
  |  5.0|(784,[152,153,154...|  1767.897705078125|            4.0|
  |  0.0|(784,[127,128,129...|  1392.157470703125|            5.0|
  |  4.0|(784,[160,161,162...| 1671.5711669921875|            9.0|
  |  1.0|(784,[158,159,160...| 1182.6082763671875|            6.0|
  |  9.0|(784,[208,209,210...| 1390.4002685546875|            0.0|
  |  2.0|(784,[155,156,157...|  1713.988037109375|            1.0|
  |  1.0|(784,[124,125,126...| 1246.3016357421875|            2.0|
  |  3.0|(784,[151,152,153...|  1753.229248046875|            4.0|
  |  1.0|(784,[152,153,154...|  978.8394165039062|            2.0|
  |  4.0|(784,[134,135,161...|  1623.176513671875|            3.0|
  |  3.0|(784,[123,124,125...|  1533.863525390625|            4.0|
  |  5.0|(784,[216,217,218...|  1469.357177734375|            6.0|
  |  3.0|(784,[143,144,145...|  1736.765869140625|            4.0|
  |  6.0|(784,[72,73,74,99...|   1473.69384765625|            8.0|
  |  1.0|(784,[151,152,153...|    944.88720703125|            2.0|
  |  7.0|(784,[211,212,213...| 1285.9071044921875|            3.0|
  |  2.0|(784,[151,152,153...| 1635.0125732421875|            1.0|
  |  8.0|(784,[159,160,161...| 1436.3162841796875|            6.0|
  |  6.0|(784,[100,101,102...| 1499.7366943359375|            7.0|
  |  9.0|(784,[209,210,211...| 1364.6319580078125|            6.0|
  +-----+--------------------+-------------------+---------------+
  ```

  You can interpret the data, as follows:
  + A handwritten number with the `label` 5 belongs to cluster 4 (`closest_cluster`).
  + A handwritten number with the `label` 0 belongs to cluster 5.
  + A handwritten number with the `label` 4 belongs to cluster 9.
  + A handwritten number with the `label` 1 belongs to cluster 6.
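The two columns that `transform` appends for k-means, `closest_cluster` and `distance_to_cluster`, can be reproduced locally for intuition. The following is a minimal plain-Python sketch using randomly generated pixel values and hypothetical centroids; the real distances come from the model hosted in SageMaker AI.

```
import math
import random

random.seed(0)

# A 28 x 28 grayscale image flattens to a 784-element feature vector,
# matching the "features" column in the DataFrame above.
image = [[random.random() for _ in range(28)] for _ in range(28)]
features = [pixel for row in image for pixel in row]
assert len(features) == 784

# Hypothetical trained k-means model: 10 centroids, one per cluster (0-9).
centroids = [[random.random() for _ in range(784)] for _ in range(10)]

def assign(features, centroids):
    """Return (closest_cluster, distance_to_cluster), mirroring the two
    columns that SageMakerModel.transform appends for k-means."""
    distances = [math.dist(features, c) for c in centroids]  # Euclidean
    best = min(range(len(centroids)), key=lambda i: distances[i])
    return float(best), distances[best]

closest_cluster, distance_to_cluster = assign(features, centroids)
```

The cluster index says nothing about the digit label by itself; as the interpretation above shows, you read off which cluster each labeled image landed in after training.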

**Topics**
+ [Use Custom Algorithms for Model Training and Hosting on Amazon SageMaker AI with Apache Spark](apache-spark-example1-cust-algo.md)
+ [Use the SageMakerEstimator in a Spark Pipeline](apache-spark-example1-extend-pipeline.md)

# Use Custom Algorithms for Model Training and Hosting on Amazon SageMaker AI with Apache Spark
<a name="apache-spark-example1-cust-algo"></a>

In [SageMaker AI Spark for Scala examples](apache-spark-example1.md), you use the `KMeansSageMakerEstimator` because the example uses the k-means algorithm provided by Amazon SageMaker AI for model training. You might choose to use your own custom algorithm for model training instead. Assuming that you have already created a Docker image, you can create your own `SageMakerEstimator` and specify the Amazon Elastic Container Registry path for your custom image. 

The following example shows how to create the equivalent of a `KMeansSageMakerEstimator` by using the generic `SageMakerEstimator` directly. In the new estimator, you explicitly specify the Docker registry path to your training and inference code images.

```
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator
import com.amazonaws.services.sagemaker.sparksdk.transformation.serializers.ProtobufRequestRowSerializer
import com.amazonaws.services.sagemaker.sparksdk.transformation.deserializers.KMeansProtobufResponseRowDeserializer

val estimator = new SageMakerEstimator(
  trainingImage =
    "811284229777.dkr.ecr.us-east-1.amazonaws.com/kmeans:1",
  modelImage =
    "811284229777.dkr.ecr.us-east-1.amazonaws.com/kmeans:1",
  requestRowSerializer = new ProtobufRequestRowSerializer(),
  responseRowDeserializer = new KMeansProtobufResponseRowDeserializer(),
  hyperParameters = Map("k" -> "10", "feature_dim" -> "784"),
  sagemakerRole = IAMRole(roleArn),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1,
  trainingSparkDataFormat = "sagemaker")
```

In the code, the parameters in the `SageMakerEstimator` constructor include:
+ `trainingImage`—Identifies the Docker registry path to the training image containing your custom code.
+ `modelImage`—Identifies the Docker registry path to the image containing inference code.
+ `requestRowSerializer`—Implements `com.amazonaws.services.sagemaker.sparksdk.transformation.RequestRowSerializer`.

  This parameter serializes rows in the input `DataFrame` to send them to the model hosted in SageMaker AI for inference.
+ `responseRowDeserializer`—Implements `com.amazonaws.services.sagemaker.sparksdk.transformation.ResponseRowDeserializer`.

  This parameter deserializes responses from the model hosted in SageMaker AI back into a `DataFrame`.
+ `trainingSparkDataFormat`—Specifies the data format that Spark uses when uploading training data from a `DataFrame` to S3. For example, `"sagemaker"` for protobuf format, `"csv"` for comma-separated values, and `"libsvm"` for LibSVM format. 

You can implement your own `RequestRowSerializer` and `ResponseRowDeserializer` to serialize and deserialize rows in a data format that your inference code supports, such as LibSVM or CSV.
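What such a serializer/deserializer pair does can be sketched outside the Scala SDK. The following is a minimal Python illustration for a CSV wire format; the class and method names are hypothetical and do not match the actual Scala interfaces.

```
# Sketch of the serializer/deserializer contract: turn a row into the
# wire format your inference container accepts, and turn the response
# back into row values. Illustrative only; not the real interfaces.

class CsvRequestRowSerializer:
    def serialize_row(self, features):
        # One request line per row: comma-separated feature values.
        return ",".join(repr(v) for v in features) + "\n"

class CsvResponseRowDeserializer:
    def deserialize_response(self, payload):
        # One response line per row: comma-separated output values.
        return [
            [float(v) for v in line.split(",")]
            for line in payload.strip().splitlines()
        ]

serializer = CsvRequestRowSerializer()
request = serializer.serialize_row([0.0, 1.5, 3.0])        # "0.0,1.5,3.0\n"

deserializer = CsvResponseRowDeserializer()
rows = deserializer.deserialize_response("1767.8,4.0\n1392.1,5.0")
```

In the real SDK, the serializer runs once per `DataFrame` row inside `transform`, and the deserializer maps each response line back onto the appended output columns.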

# Use the SageMakerEstimator in a Spark Pipeline
<a name="apache-spark-example1-extend-pipeline"></a>

You can use `org.apache.spark.ml.Estimator` estimators and `org.apache.spark.ml.Model` models, and `SageMakerEstimator` estimators and `SageMakerModel` models in `org.apache.spark.ml.Pipeline` pipelines, as shown in the following example:

```
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.PCA
import org.apache.spark.sql.SparkSession
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.algorithms
import com.amazonaws.services.sagemaker.sparksdk.algorithms.KMeansSageMakerEstimator
import com.amazonaws.services.sagemaker.sparksdk.transformation.serializers.ProtobufRequestRowSerializer

val spark = SparkSession.builder.getOrCreate

// load mnist data as a dataframe from libsvm
val region = "us-east-1"
val trainingData = spark.read.format("libsvm")
  .option("numFeatures", "784")
  .load(s"s3://sagemaker-sample-data-$region/spark/mnist/train/")
val testData = spark.read.format("libsvm")
  .option("numFeatures", "784")
  .load(s"s3://sagemaker-sample-data-$region/spark/mnist/test/")

// substitute your SageMaker IAM role here
val roleArn = "arn:aws:iam::account-id:role/rolename"

val pcaEstimator = new PCA()
  .setInputCol("features")
  .setOutputCol("projectedFeatures")
  .setK(50)

val kMeansSageMakerEstimator = new KMeansSageMakerEstimator(
  sagemakerRole = IAMRole(roleArn),
  requestRowSerializer =
    new ProtobufRequestRowSerializer(featuresColumnName = "projectedFeatures"),
  trainingSparkDataFormatOptions = Map("featuresColumnName" -> "projectedFeatures"),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1)
  .setK(10).setFeatureDim(50)

val pipeline = new Pipeline().setStages(Array(pcaEstimator, kMeansSageMakerEstimator))

// train
val pipelineModel = pipeline.fit(trainingData)

val transformedData = pipelineModel.transform(testData)
transformedData.show()
```

The `trainingSparkDataFormatOptions` parameter configures Spark to serialize the `projectedFeatures` column to protobuf for model training. Spark also serializes the `label` column to protobuf by default.

Because we want to make inferences using the `projectedFeatures` column, we pass that column name to the `ProtobufRequestRowSerializer`.
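What the PCA stage contributes can be sketched locally: it learns a projection from the 784 raw features down to 50 dimensions, and the k-means stage then clusters the projected vectors. The following is a minimal NumPy sketch of that projection on random stand-in data, for illustration only.

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 784))   # stand-in for 100 flattened MNIST images

# Fit PCA: center the data, then take the top 50 right-singular vectors.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:50]              # shape (50, 784)

# The "projectedFeatures" column: each 784-dim row becomes 50-dim.
projected = (X - mean) @ components.T
print(projected.shape)            # (100, 50)
```

This is why the pipeline's estimator is configured with `.setFeatureDim(50)` rather than 784: the k-means stage sees only the projected vectors.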

The following example shows a transformed `DataFrame`:

```
+-----+--------------------+--------------------+-------------------+---------------+
|label|            features|   projectedFeatures|distance_to_cluster|closest_cluster|
+-----+--------------------+--------------------+-------------------+---------------+
|  5.0|(784,[152,153,154...|[880.731433034386...|     1500.470703125|            0.0|
|  0.0|(784,[127,128,129...|[1768.51722024166...|      1142.18359375|            4.0|
|  4.0|(784,[160,161,162...|[704.949236329314...|  1386.246826171875|            9.0|
|  1.0|(784,[158,159,160...|[-42.328192193771...| 1277.0736083984375|            5.0|
|  9.0|(784,[208,209,210...|[374.043902028333...|   1211.00927734375|            3.0|
|  2.0|(784,[155,156,157...|[941.267714528850...|  1496.157958984375|            8.0|
|  1.0|(784,[124,125,126...|[30.2848596410594...| 1327.6766357421875|            5.0|
|  3.0|(784,[151,152,153...|[1270.14374062052...| 1570.7674560546875|            0.0|
|  1.0|(784,[152,153,154...|[-112.10792566485...|     1037.568359375|            5.0|
|  4.0|(784,[134,135,161...|[452.068280676606...| 1165.1236572265625|            3.0|
|  3.0|(784,[123,124,125...|[610.596447285397...|  1325.953369140625|            7.0|
|  5.0|(784,[216,217,218...|[142.959601818422...| 1353.4930419921875|            5.0|
|  3.0|(784,[143,144,145...|[1036.71862533658...| 1460.4315185546875|            7.0|
|  6.0|(784,[72,73,74,99...|[996.740157435754...| 1159.8631591796875|            2.0|
|  1.0|(784,[151,152,153...|[-107.26076167417...|   960.963623046875|            5.0|
|  7.0|(784,[211,212,213...|[619.771820430940...|   1245.13623046875|            6.0|
|  2.0|(784,[151,152,153...|[850.152101817161...|  1304.437744140625|            8.0|
|  8.0|(784,[159,160,161...|[370.041887230547...| 1192.4781494140625|            0.0|
|  6.0|(784,[100,101,102...|[546.674328209335...|    1277.0908203125|            2.0|
|  9.0|(784,[209,210,211...|[-29.259112927426...| 1245.8182373046875|            6.0|
+-----+--------------------+--------------------+-------------------+---------------+
```

# Resources for using SageMaker AI Spark for Python (PySpark) examples
<a name="apache-spark-additional-examples"></a>

Amazon SageMaker AI provides an Apache Spark Python library ([SageMaker AI PySpark](https://github.com/aws/sagemaker-spark/tree/master/sagemaker-pyspark-sdk)) that you can use to integrate your Apache Spark applications with SageMaker AI. This topic contains examples to help you get started with PySpark. For information about the SageMaker AI Apache Spark library, see [Apache Spark with Amazon SageMaker AI](apache-spark.md).

**Download PySpark**

You can download the source code for both Python Spark (PySpark) and Scala libraries from the [SageMaker AI Spark](https://github.com/aws/sagemaker-spark) GitHub repository.

For instructions on installing the SageMaker AI Spark library, use any of the following options or visit [SageMaker AI PySpark](https://github.com/aws/sagemaker-spark/tree/master/sagemaker-pyspark-sdk).
+ Install using pip:

  ```
  pip install sagemaker_pyspark
  ```
+ Install from the source:

  ```
  git clone git@github.com:aws/sagemaker-spark.git
  cd sagemaker-spark/sagemaker-pyspark-sdk
  python setup.py install
  ```
+ You can also create a new notebook in a notebook instance that uses either the `Sparkmagic (PySpark)` or the `Sparkmagic (PySpark3)` kernel and connect to a remote Amazon EMR cluster.
**Note**  
The Amazon EMR cluster must be configured with an IAM role that has the `AmazonSageMakerFullAccess` policy attached. For information about configuring roles for an EMR cluster, see [Configure IAM Roles for Amazon EMR Permissions to AWS Services](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html) in the *Amazon EMR Management Guide*.

**PySpark examples**

For examples on using SageMaker AI PySpark, see:
+ [Using Amazon SageMaker AI with Apache Spark](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-spark/index.html) in Read the Docs.
+ [SageMaker AI Spark](https://github.com/aws/sagemaker-spark) GitHub repository.

# Resources for using Chainer with Amazon SageMaker AI
<a name="chainer"></a>

You can use SageMaker AI to train and deploy a model using custom Chainer code. The SageMaker AI Python SDK Chainer estimators and models and the SageMaker AI open-source Chainer container make writing a Chainer script and running it in SageMaker AI easier. The following section provides reference material you can use to learn how to use Chainer with SageMaker AI.

## What do you want to do?
<a name="chainer-intent"></a>

I want to train a custom Chainer model in SageMaker AI.  
For a sample Jupyter notebook, see the [Chainer example notebooks](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk) in the Amazon SageMaker AI Examples GitHub repository.  
For documentation, see [Train a Model with Chainer](https://sagemaker.readthedocs.io/en/stable/using_chainer.html#train-a-model-with-chainer).

I have a Chainer model that I trained in SageMaker AI, and I want to deploy it to a hosted endpoint.  
For more information, see [Deploy Chainer models](https://sagemaker.readthedocs.io/en/stable/using_chainer.html#deploy-chainer-models).

I have a Chainer model that I trained outside of SageMaker AI, and I want to deploy it to a SageMaker AI endpoint.  
For more information, see [Deploy Endpoints from Model Data](https://sagemaker.readthedocs.io/en/stable/using_chainer.html#deploy-endpoints-from-model-data).

I want to see the API documentation for [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) Chainer classes.  
For more information, see [Chainer Classes](https://sagemaker.readthedocs.io/en/stable/sagemaker.chainer.html).

I want to find information about SageMaker AI Chainer containers.  
For more information, see the [SageMaker AI Chainer Container GitHub repository](https://github.com/aws/sagemaker-chainer-container).

 For information about supported Chainer versions, and for general information about writing Chainer training scripts and using Chainer estimators and models with SageMaker AI, see [Using Chainer with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/using_chainer.html). 

# Resources for using Hugging Face with Amazon SageMaker AI
<a name="hugging-face"></a>

Amazon SageMaker AI lets you train, fine-tune, and run inference with Hugging Face models for Natural Language Processing (NLP). You can use Hugging Face for both training and inference. The following section provides information on Hugging Face models and includes reference material you can use to learn how to use Hugging Face with SageMaker AI.

This functionality is available through the development of Hugging Face [AWS Deep Learning Containers](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html). These containers include the Hugging Face Transformers, Tokenizers, and Datasets libraries, which you can use for your training and inference jobs. For a list of the available Deep Learning Containers images, see [Available Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md). These Deep Learning Containers images are maintained and regularly updated with security patches.

To use the Hugging Face Deep Learning Containers with the SageMaker Python SDK for training, see the [Hugging Face SageMaker AI Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html). With the Hugging Face Estimator, you can use the Hugging Face models as you would any other SageMaker AI Estimator. However, using the SageMaker Python SDK is optional. You can also orchestrate your use of the Hugging Face Deep Learning Containers with the AWS CLI and AWS SDK for Python (Boto3).
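As a rough illustration of the Boto3 path, the following sketch assembles a `create_training_job` request for a Hugging Face Deep Learning Container. The image URI, role ARN, bucket, and hyperparameter names are placeholders, not real resources; look up real image URIs in the available Deep Learning Containers images list.

```python
# Hedged sketch: the request body a plain Boto3 call would send with
# create_training_job when using a Hugging Face Deep Learning Container.
# The image URI, role ARN, and bucket are placeholders, not real resources.
request = {
    "TrainingJobName": "hf-text-classification-demo",
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/huggingface-pytorch-training:<tag>",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    "OutputDataConfig": {"S3OutputPath": "s3://amzn-s3-demo-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    # Hyperparameter names are illustrative; values must be strings.
    "HyperParameters": {"epochs": "3", "model_name": "distilbert-base-uncased"},
}

# With real resources, the call would be:
#   boto3.client("sagemaker").create_training_job(**request)
print(sorted(request)[0])  # AlgorithmSpecification
```

The Hugging Face Estimator in the SageMaker Python SDK builds an equivalent request for you from a few constructor arguments.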

For more information on Hugging Face and the models available in it, see the [Hugging Face documentation](https://huggingface.co/). 

## Training
<a name="hugging-face-training"></a>

To run training, use any of the thousands of models available in Hugging Face and fine-tune them for your use case with additional training. With SageMaker AI, you can use standard training or take advantage of [SageMaker AI Distributed Data and Model Parallel training](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html). 

Like other SageMaker training jobs using custom code, you can capture your own metrics by passing a metrics definition to the SageMaker Python SDK. For an example, see [Defining Training Metrics (SageMaker Python SDK) ](https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html#define-train-metrics-sdk). You can access the captured metrics using [CloudWatch](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html) and as a Pandas `DataFrame` using the [TrainingJobAnalytics](https://sagemaker.readthedocs.io/en/stable/api/training/analytics.html#sagemaker.analytics.TrainingJobAnalytics) method. After your model is trained and fine-tuned, you can use it like any other model to run inference jobs.
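To make the metric-capture flow concrete, the following sketch shows a hypothetical `metric_definitions` list and simulates how a training log line is scraped with its regular expressions. The metric names, regexes, and log line are illustrative assumptions, not values prescribed by SageMaker.

```python
import re

# Hypothetical metric definitions, shaped like the metric_definitions
# parameter passed to a SageMaker estimator; names and regexes are
# assumptions for illustration.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"loss: ([0-9\.]+)"},
    {"Name": "eval:accuracy", "Regex": r"accuracy: ([0-9\.]+)"},
]

# Simulate how a training log line is scraped with these regexes.
log_line = "epoch 3 - loss: 0.4211 - accuracy: 0.8675"
extracted = {
    d["Name"]: float(re.search(d["Regex"], log_line).group(1))
    for d in metric_definitions
    if re.search(d["Regex"], log_line)
}
print(extracted)  # {'train:loss': 0.4211, 'eval:accuracy': 0.8675}
```

Metrics captured this way surface in CloudWatch under the training job, and `TrainingJobAnalytics` returns them as a `DataFrame` for analysis.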

### How to run training with the Hugging Face estimator
<a name="hugging-face-training-using"></a>

You can implement the Hugging Face Estimator for training jobs using the SageMaker AI Python SDK. The SageMaker Python SDK is an open source library for training and deploying machine learning models on SageMaker AI. For more information on the Hugging Face Estimator, see the [SageMaker AI Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html).

With the SageMaker Python SDK, you can run training jobs using the Hugging Face Estimator in the following environments: 
+ [Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html): Studio Classic is the first fully integrated development environment (IDE) for machine learning (ML). Studio Classic provides a single, web-based visual interface where you can perform all ML development steps required to:
  + prepare
  + build
  + train and tune
  + deploy and manage models

  For information on using Jupyter Notebooks in Studio Classic, see [Use Amazon SageMaker Studio Classic Notebooks](notebooks.md).
+ [SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html): An Amazon SageMaker notebook instance is a machine learning (ML) compute instance running the Jupyter Notebook App. This app lets you run Jupyter Notebooks in your notebook instance to:
  + prepare and process data
  + write code to train models
  + deploy models to SageMaker AI hosting
  + test or validate your models without SageMaker Studio features like Debugger, Model Monitoring, and a web-based IDE
+ Locally: If you have connectivity to AWS and the appropriate SageMaker AI permissions, you can use the SageMaker Python SDK locally to launch remote training and inference jobs for Hugging Face on SageMaker AI. This works on your local machine, as well as on other AWS services that have the SageMaker Python SDK installed and the appropriate permissions.

## Inference
<a name="hugging-face-inference"></a>

For inference, you can use your trained Hugging Face model or one of the pretrained Hugging Face models to deploy an inference job with SageMaker AI. With this collaboration, you only need one line of code to deploy both your trained models and pre-trained models with SageMaker AI. You can also run inference jobs without having to write any custom inference code. With custom inference code, you can customize the inference logic by providing your own Python script.

### How to deploy an inference job using the Hugging Face Deep Learning Containers
<a name="hugging-face-inference-using"></a>

You have two options for running inference with SageMaker AI. You can run inference using a model that you trained, or deploy a pre-trained Hugging Face model. 
+ **Run inference with your trained model:** You have two options for running inference with your own trained model:
  + Run inference with a model that you trained using an existing Hugging Face model with the SageMaker AI Hugging Face Deep Learning Containers.
  + Bring your own existing Hugging Face model and deploy it using SageMaker AI.

  When you run inference with a model that you trained with the SageMaker AI Hugging Face Estimator, you can deploy the model immediately after training completes. You can also upload the trained model to an Amazon S3 bucket and ingest it when running inference later. 

  If you bring your own existing Hugging Face model, you must upload the trained model to an Amazon S3 bucket. You then ingest that bucket when running inference as shown in [Deploy your Hugging Face Transformers for inference example](https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb).
+ **Run inference with a pre-trained Hugging Face model:** You can use one of the thousands of pre-trained Hugging Face models to run your inference jobs with no additional training needed. To run inference, select the pre-trained model from the list of [Hugging Face models](https://huggingface.co/models), as outlined in [Deploy pre-trained Hugging Face Transformers for inference example](https://github.com/huggingface/notebooks/blob/main/sagemaker/11_deploy_model_from_hf_hub/deploy_transformer_model_from_hf_hub.ipynb).
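Before you can ingest a bring-your-own model from Amazon S3, the artifacts must be packaged as a `model.tar.gz`. The following sketch shows one way to build that archive with the Python standard library; the artifact file names are placeholders for the files your model actually contains.

```python
import tarfile
import tempfile
from pathlib import Path

# Hedged sketch: package an existing model directory as the model.tar.gz
# that is uploaded to Amazon S3 before deployment. The artifact file names
# below are placeholders.
workdir = Path(tempfile.mkdtemp())
model_dir = workdir / "model"
model_dir.mkdir()
(model_dir / "config.json").write_text("{}")           # placeholder artifact
(model_dir / "pytorch_model.bin").write_bytes(b"\0")   # placeholder artifact

archive = workdir / "model.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    # Put artifacts at the archive root, not under a subdirectory.
    for path in model_dir.iterdir():
        tar.add(path, arcname=path.name)

with tarfile.open(archive, "r:gz") as tar:
    names = sorted(tar.getnames())
print(names)  # ['config.json', 'pytorch_model.bin']
```

The resulting archive is what you reference by S3 URI when creating the model for inference.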

## What do you want to do?
<a name="hugging-face-do"></a>

The following notebooks in the Hugging Face notebooks repository show how to use the Hugging Face Deep Learning Containers with SageMaker AI in various use cases.

I want to train and deploy a text classification model using Hugging Face in SageMaker AI with PyTorch.  
For a sample Jupyter Notebook, see the [PyTorch Getting Started Demo](https://github.com/huggingface/notebooks/blob/main/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb).

I want to train and deploy a text classification model using Hugging Face in SageMaker AI with TensorFlow.  
For a sample Jupyter Notebook, see the [TensorFlow Getting Started example](https://github.com/huggingface/notebooks/blob/main/sagemaker/02_getting_started_tensorflow/sagemaker-notebook.ipynb).

I want to run distributed training with data parallelism using Hugging Face and SageMaker AI Distributed.  
For a sample Jupyter Notebook, see the [Distributed Training example](https://github.com/huggingface/notebooks/blob/main/sagemaker/03_distributed_training_data_parallelism/sagemaker-notebook.ipynb).

I want to run distributed training with model parallelism using Hugging Face and SageMaker AI Distributed.  
For a sample Jupyter Notebook, see the [Model Parallelism example](https://github.com/huggingface/notebooks/blob/main/sagemaker/04_distributed_training_model_parallelism/sagemaker-notebook.ipynb).

I want to use a spot instance to train and deploy a model using Hugging Face in SageMaker AI.  
For a sample Jupyter Notebook, see the [Spot Instances example](https://github.com/huggingface/notebooks/blob/main/sagemaker/05_spot_instances/sagemaker-notebook.ipynb).

I want to capture custom metrics and use SageMaker AI Checkpointing when training a text classification model using Hugging Face in SageMaker AI.  
For a sample Jupyter Notebook, see the [Training with Custom Metrics example](https://github.com/huggingface/notebooks/blob/main/sagemaker/06_sagemaker_metrics/sagemaker-notebook.ipynb).

I want to train a distributed question-answering TensorFlow model using Hugging Face in SageMaker AI.  
For a sample Jupyter Notebook, see the [Distributed TensorFlow Training example](https://github.com/huggingface/notebooks/blob/main/sagemaker/07_tensorflow_distributed_training_data_parallelism/sagemaker-notebook.ipynb).

I want to train a distributed summarization model using Hugging Face in SageMaker AI.  
For a sample Jupyter Notebook, see the [Distributed Summarization Training example](https://github.com/huggingface/notebooks/blob/main/sagemaker/08_distributed_summarization_bart_t5/sagemaker-notebook.ipynb).

I want to train an image classification model using Hugging Face in SageMaker AI.  
For a sample Jupyter Notebook, see the [Vision Transformer Training example](https://github.com/huggingface/notebooks/blob/main/sagemaker/09_image_classification_vision_transformer/sagemaker-notebook.ipynb).

I want to deploy my trained Hugging Face model in SageMaker AI.  
For a sample Jupyter Notebook, see the [Deploy your Hugging Face Transformers for inference example](https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb).

I want to deploy a pre-trained Hugging Face model in SageMaker AI.  
For a sample Jupyter Notebook, see the [Deploy pre-trained Hugging Face Transformers for inference example](https://github.com/huggingface/notebooks/blob/main/sagemaker/11_deploy_model_from_hf_hub/deploy_transformer_model_from_hf_hub.ipynb).

# Resources for using PyTorch with Amazon SageMaker AI
<a name="pytorch"></a>

You can use Amazon SageMaker AI to train and deploy a model using custom PyTorch code. The SageMaker AI Python SDK PyTorch estimators and models and the SageMaker AI open-source PyTorch container make writing a PyTorch script and running it in SageMaker AI easier. The following section provides reference material you can use to learn how to use PyTorch with SageMaker AI.

## What do you want to do?
<a name="pytorch-intent"></a>

I want to train a custom PyTorch model in SageMaker AI.  
For a sample Jupyter notebook, see the [PyTorch example notebook](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/pytorch_mnist) in the Amazon SageMaker AI Examples GitHub repository.  
For documentation, see [Train a Model with PyTorch](https://sagemaker.readthedocs.io/en/stable/using_pytorch.html#train-a-model-with-pytorch).

I have a PyTorch model that I trained in SageMaker AI, and I want to deploy it to a hosted endpoint.  
For more information, see [Deploy PyTorch models](https://sagemaker.readthedocs.io/en/stable/using_pytorch.html#deploy-pytorch-models).

I have a PyTorch model that I trained outside of SageMaker AI, and I want to deploy it to a SageMaker AI endpoint.  
For more information, see [Deploy your own PyTorch model](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#bring-your-own-model).

I want to see the API documentation for [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) PyTorch classes.  
For more information, see [PyTorch Classes](https://sagemaker.readthedocs.io/en/stable/sagemaker.pytorch.html).

I want to find the SageMaker AI PyTorch container repository.  
For more information, see [SageMaker AI PyTorch Container GitHub repository](https://github.com/aws/deep-learning-containers/tree/master/pytorch).

I want to find information about PyTorch versions supported by AWS Deep Learning Containers.  
For more information, see [Available Deep Learning Container Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).

 For general information about writing PyTorch training scripts and using PyTorch estimators and models with SageMaker AI, see [Using PyTorch with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/using_pytorch.html).

# Resources for using R with Amazon SageMaker AI
<a name="r-guide"></a>

This document lists resources that can help you learn how to use Amazon SageMaker AI features with the R software environment. The following sections introduce SageMaker AI's built-in R kernel, explain how to get started with R on SageMaker AI, and provide several example notebooks.

The examples are organized in three levels: beginner, intermediate, and advanced. They start with [Getting Started with R on SageMaker AI](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_sagemaker_hello_world/r_sagemaker_hello_world.html), continue with end-to-end machine learning with R on SageMaker AI, and then finish with more advanced topics such as SageMaker Processing with R script, and bring-your-own R algorithm to SageMaker AI.

For information on how to bring your own custom R image to Studio, see [Custom Images in Amazon SageMaker Studio Classic](studio-byoi.md). For a similar blog article, see [Bringing your own R environment to Amazon SageMaker Studio](https://aws.amazon.com/blogs/machine-learning/bringing-your-own-r-environment-to-amazon-sagemaker-studio/).

**Topics**
+ [RStudio support in SageMaker AI](#rstudio-for-r)
+ [R kernel in SageMaker AI](#r-sagemaker-kernel-ni)
+ [Example notebooks](#r-sagemaker-example-notebooks)
+ [Get started with R in SageMaker AI](r-sagemaker-get-started.md)

## RStudio support in SageMaker AI
<a name="rstudio-for-r"></a>

Amazon SageMaker AI supports RStudio as a fully managed integrated development environment (IDE) within an Amazon SageMaker AI domain. With RStudio integration, you can launch an RStudio environment in your domain to run RStudio workflows on SageMaker AI resources. For more information, see [RStudio on Amazon SageMaker AI](rstudio.md).

## R kernel in SageMaker AI
<a name="r-sagemaker-kernel-ni"></a>

SageMaker notebook instances support R through a pre-installed R kernel. The R kernel also includes the reticulate library, an R-to-Python interface, so you can use the features of the SageMaker AI Python SDK from within an R script.
+ [reticulate library](https://rstudio.github.io/reticulate/): provides an R interface to the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). The reticulate package translates between R and Python objects.

## Example notebooks
<a name="r-sagemaker-example-notebooks"></a>

**Prerequisites**
+ [Getting Started with R on SageMaker AI](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_sagemaker_hello_world/r_sagemaker_hello_world.html) – This sample notebook describes how you can develop R scripts using the SageMaker AI R kernel. In this notebook you set up your SageMaker AI environment and permissions, download the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/datasets), do some basic processing and visualization on the data, and then save the data in .csv format to Amazon S3.

**Beginner Level**
+ [SageMaker AI Batch Transform using R Kernel](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_batch_transform/r_xgboost_batch_transform.html) – This sample Notebook describes how to conduct a batch transform job using SageMaker AI’s Transformer API and the [XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). The notebook also uses the Abalone dataset.

**Intermediate Level**
+ [Hyperparameter Optimization for XGBoost in R](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_xgboost_hpo_batch_transform/r_xgboost_hpo_batch_transform.html) – This sample notebook extends the previous beginner notebooks that use the abalone dataset and XGBoost. It describes how to do model tuning with [hyperparameter optimization](https://sagemaker.readthedocs.io/en/stable/tuner.html). You will also learn how to use batch transform for batching predictions, as well as how to create a model endpoint to make real-time predictions.
+ [Amazon SageMaker Processing with R](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_in_sagemaker_processing/r_in_sagemaker_processing.html) – [SageMaker Processing](https://aws.amazon.com/blogs/aws/amazon-sagemaker-processing-fully-managed-data-processing-and-model-evaluation/) lets you preprocess, post-process and run model evaluation workloads. This example shows you how to create an R script to orchestrate a Processing job.

**Advanced Level**
+ [Train and Deploy Your Own R Algorithm in SageMaker AI](https://sagemaker-examples.readthedocs.io/en/latest/r_examples/r_byo_r_algo_hpo/tune_r_bring_your_own.html) – Do you already have an R algorithm, and you want to bring it into SageMaker AI to tune, train, or deploy it? This example walks you through how to customize SageMaker AI containers with custom R packages, all the way to using a hosted endpoint for inference on your R-origin model.

# Get started with R in SageMaker AI
<a name="r-sagemaker-get-started"></a>

This topic explains how to get started using the R software environment in SageMaker AI. For more information about using R with SageMaker AI, see [Resources for using R with Amazon SageMaker AI](r-guide.md).

**To get started with R in the SageMaker AI console**

1. [Create a notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html) using the ml.t2.medium instance type and the default storage size. You can pick a faster instance type and more storage if you plan to continue using the instance for more advanced examples, or you can create a bigger instance later.

1. Wait until the status of the notebook is **In Service**, and then choose **Open Jupyter**.  
![\[Location of the InService status and the Open Jupyter link in the console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/An-R-User-Guide-to-SageMaker/An-R-User-Guide-to-SageMaker-1.png)

1. Create a new notebook with R kernel from the list of available environments.  
![\[Location of the R kernel in the list of available environments.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/An-R-User-Guide-to-SageMaker/An-R-User-Guide-to-SageMaker-2.png)

1. When the new notebook is created, you should see an R logo in the upper right corner of the notebook environment, and also R as the kernel under that logo. This indicates that SageMaker AI has successfully launched the R kernel for this notebook.  
![\[Location of the R logo and R kernel of the notebook environment.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/An-R-User-Guide-to-SageMaker/An-R-User-Guide-to-SageMaker-3.png)

Alternatively, when you are in a Jupyter notebook, you can use the **Kernel** menu, and then select **R** from the **Change kernel** submenu.

![\[Location of where to change your notebook kernel to R.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/An-R-User-Guide-to-SageMaker/An-R-User-Guide-to-SageMaker-4.png)


# Resources for using Scikit-learn with Amazon SageMaker AI
<a name="sklearn"></a>

You can use Amazon SageMaker AI to train and deploy a model using custom Scikit-learn code. The SageMaker AI Python SDK Scikit-learn estimators and models and the SageMaker AI open-source Scikit-learn containers make writing a Scikit-learn script and running it in SageMaker AI easier. The following section provides reference material you can use to learn how to use Scikit-learn with SageMaker AI.

**Requirements**

Scikit-learn 1.4 has the following minimum dependency versions.


| Dependency | Minimum version | 
| --- | --- | 
| Python | 3.9 | 
| NumPy | 1.19.5 | 
| SciPy | 1.6.0 | 
| joblib | 1.2.0 | 
| threadpoolctl | 2.0.0 | 

The SageMaker AI Scikit-learn container supports the following Scikit-learn versions.


| Supported Scikit-learn version | Minimum Python version | 
| --- | --- | 
| 1.4-2 | 3.10 | 
| 1.2-1 | 3.8 | 
| 1.0-1 | 3.7 | 
| 0.23-1 | 3.6 | 
| 0.20.0 | 2.7 or 3.4 | 

For general information about writing Scikit-learn training scripts and using Scikit-learn estimators and models with SageMaker AI, see [Using Scikit-learn with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html).
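As a minimal illustration of the script-mode conventions, the following sketch shows the argument-parsing boilerplate a Scikit-learn training script typically starts with. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are environment variables the training container sets; `--n-estimators` is a hypothetical hyperparameter used here for illustration.

```python
import argparse
import os

# Minimal sketch of the argument-parsing boilerplate in a script-mode
# training script. SM_MODEL_DIR and SM_CHANNEL_TRAIN are set by the
# training container; the defaults are fallbacks for local experiments.
def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    return parser.parse_args(argv)

# In the container, argv comes from the hyperparameters set on the
# estimator; here we pass it explicitly for illustration.
args = parse_args(["--n-estimators", "50"])
print(args.n_estimators)  # 50
```

The script would then load data from `args.train`, fit the model, and save it under `args.model_dir` so SageMaker AI can package it for deployment.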

## What do you want to do?
<a name="sklearn-intent"></a>

**Note**  
Matplotlib v2.2.3 or newer is required to run the SageMaker AI Scikit-learn example notebooks.

I want to use Scikit-learn for data processing, feature engineering, or model evaluation in SageMaker AI.  
For a sample Jupyter notebook, see [Scikit-learn Data Processing and Model Evaluation](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation).  
For a blog post on training and deploying a Scikit-learn model, see [Amazon SageMaker AI adds Scikit-Learn support](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-adds-scikit-learn-support/).  
For documentation, see [ReadTheDocs](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#data-pre-processing-and-model-evaluation-with-scikit-learn).

I want to train a custom Scikit-learn model in SageMaker AI.  
For a sample Jupyter notebook, see [Scikit-learn Iris](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_iris).  
For documentation, see [Train a Model with Scikit-learn](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#train-a-model-with-sklearn).

I have a Scikit-learn model that I trained in SageMaker AI, and I want to deploy it to a hosted endpoint.  
For more information, see [Deploy Scikit-learn models](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#deploy-sklearn-models).

I have a Scikit-learn model that I trained outside of SageMaker AI, and I want to deploy it to a SageMaker AI endpoint.  
For more information, see [Deploy Endpoints from Model Data](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#deploy-endpoints-from-model-data).

I want to see the API documentation for [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) Scikit-learn classes.  
For more information, see [Scikit-learn Classes](https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html).

I want to see information about SageMaker AI Scikit-learn containers.  
For more information, see [SageMaker Scikit-learn Container GitHub repository](https://github.com/aws/sagemaker-scikit-learn-container).

# Resources for using SparkML Serving with Amazon SageMaker AI
<a name="sparkml-serving"></a>

The [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) SparkML Serving model and predictor and the Amazon SageMaker AI open-source SparkML Serving container support deploying Apache Spark ML pipelines serialized with MLeap in SageMaker AI to get inferences. Use the following resources to learn how to use SparkML Serving with SageMaker AI.

For information about using the SparkML Serving container to deploy models to SageMaker AI, see [SageMaker Spark ML Container GitHub repository](https://github.com/aws/sagemaker-sparkml-serving-container). For information about the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) SparkML Serving model and predictors, see the [SparkML Serving Model and Predictor API documentation](https://sagemaker.readthedocs.io/en/stable/sagemaker.sparkml.html).
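As a hedged sketch of how the container is configured, the following shows one way an input/output schema might be serialized into the `SAGEMAKER_SPARKML_SCHEMA` environment variable that the SparkML Serving container reads (see the container's GitHub README for the authoritative format; the field names and types here are illustrative).

```python
import json

# Hedged sketch: a hypothetical input/output schema serialized into the
# SAGEMAKER_SPARKML_SCHEMA environment variable. Field names and types
# are illustrative; consult the container README for the exact format.
schema = {
    "input": [
        {"name": "sepal_length", "type": "double"},
        {"name": "sepal_width", "type": "double"},
    ],
    "output": {"name": "prediction", "type": "double"},
}
env = {"SAGEMAKER_SPARKML_SCHEMA": json.dumps(schema)}
print(json.loads(env["SAGEMAKER_SPARKML_SCHEMA"])["output"]["name"])  # prediction
```

In a real deployment, this environment variable is set on the SageMaker AI model that wraps the SparkML Serving container.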

# Resources for using TensorFlow with Amazon SageMaker AI
<a name="tf"></a>

You can use Amazon SageMaker AI to train and deploy a model using custom TensorFlow code. The SageMaker AI Python SDK TensorFlow estimators and models and the SageMaker AI open-source TensorFlow containers can help. Use the following list of resources to find more information, based on which version of TensorFlow you're using and what you want to do.

## TensorFlow Version 1.11 and Later
<a name="tf-script-mode"></a>

For TensorFlow versions 1.11 and later, the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) supports script mode training scripts.
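As a small illustration of the serving side, the following sketch builds a request body in the TensorFlow Serving REST format, where `"instances"` holds one input per prediction; the input values are arbitrary.

```python
import json

# Sketch: a JSON request body in the TensorFlow Serving REST API format,
# which the SageMaker AI TensorFlow Serving container accepts.
# "instances" holds one input per prediction; the values are arbitrary.
payload = {"instances": [[1.0, 2.0, 5.0], [0.5, 0.1, 0.2]]}
body = json.dumps(payload)

# With a deployed endpoint, this body would be sent via
# predictor.predict(payload), or via the InvokeEndpoint API with
# ContentType="application/json".
first = json.loads(body)["instances"][0]
print(first)  # [1.0, 2.0, 5.0]
```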

### What do you want to do?
<a name="tf-intent"></a>

I want to train a custom TensorFlow model in SageMaker AI.  
For a sample Jupyter notebook, see [TensorFlow script mode training and serving](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving/tensorflow_script_mode_training_and_serving.html).  
For documentation, see [Train a Model with TensorFlow](https://sagemaker.readthedocs.io/en/stable/using_tf.html#train-a-model-with-tensorflow).

I have a TensorFlow model that I trained in SageMaker AI, and I want to deploy it to a hosted endpoint.  
For more information, see [Deploy TensorFlow Serving models](https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploy-tensorflow-serving-models).

I have a TensorFlow model that I trained outside of SageMaker AI, and I want to deploy it to a SageMaker AI endpoint.  
For more information, see [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploying-directly-from-model-artifacts).

I want to see the API documentation for [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) TensorFlow classes.  
For more information, see [TensorFlow Estimator](https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html).

I want to find the SageMaker AI TensorFlow container repository.  
For more information, see [SageMaker TensorFlow Container GitHub repository](https://github.com/aws/sagemaker-tensorflow-container).

I want to find information about TensorFlow versions supported by AWS Deep Learning Containers.  
For more information, see [Available Deep Learning Container Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).

 For general information about writing TensorFlow script mode training scripts and using TensorFlow script mode estimators and models with SageMaker AI, see [Using TensorFlow with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/using_tf.html).

## TensorFlow Legacy Mode for Versions 1.11 and Earlier
<a name="tf-legacy-mode"></a>

The [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) provides a legacy mode that supports TensorFlow versions 1.11 and earlier. Use legacy mode TensorFlow training scripts to run TensorFlow jobs in SageMaker AI if:
+ You have existing legacy mode scripts that you do not want to convert to script mode.
+ You want to use a TensorFlow version earlier than 1.11.

For information about writing legacy mode TensorFlow scripts to use with the SageMaker AI Python SDK, see [TensorFlow SageMaker Estimators and Models](https://github.com/aws/sagemaker-python-sdk/tree/v1.12.0/src/sagemaker/tensorflow#tensorflow-sagemaker-estimators-and-models).

# Resources for using Triton Inference Server with Amazon SageMaker AI
<a name="triton"></a>

SageMaker AI enables customers to deploy a model using custom code with NVIDIA Triton Inference Server. Use the following resources to learn how to use Triton Inference Server with SageMaker AI.

 This functionality is available through the development of [Triton Inference Server Containers](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html). These containers include NVIDIA Triton Inference Server, support for common ML frameworks, and useful environment variables that let you optimize performance on SageMaker AI. For a list of all available Deep Learning Containers images, see [Available Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md). Deep Learning Containers images are maintained and regularly updated with security patches. 

You can use the Triton Inference Server Container with SageMaker Python SDK as you would any other container in your SageMaker AI models. However, using the SageMaker Python SDK is optional. You can use Triton Inference Server Containers with the AWS CLI and AWS SDK for Python (Boto3). 

For more information on NVIDIA Triton Inference Server, see the [Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/#).

## Inference
<a name="triton-inference"></a>

**Note**  
The Triton Python backend uses shared memory (SHMEM) to connect your code to Triton. SageMaker AI Inference provides up to half of the instance memory as SHMEM, so you can use an instance with more memory to get a larger SHMEM size.

For inference, you can use your trained ML models with Triton Inference Server to deploy an inference job with SageMaker AI.

Some of the key features of Triton Inference Server Container are:
+ **Support for multiple frameworks**: Triton can be used to deploy models from all major ML frameworks. Triton supports TensorFlow GraphDef and SavedModel, ONNX, PyTorch TorchScript, TensorRT, and custom Python/C++ model formats.
+ **Model pipelines**: A Triton model ensemble represents a pipeline of one or more models, along with pre/post-processing logic, and the connection of input and output tensors between them. A single inference request to an ensemble triggers the execution of the entire pipeline.
+ **Concurrent model execution**: Multiple instances of the same model can run simultaneously on the same GPU or on multiple GPUs.
+ **Dynamic batching**: For models that support batching, Triton has multiple built-in scheduling and batching algorithms that combine individual inference requests together to improve inference throughput. These scheduling and batching decisions are transparent to the client requesting inference.
+ **Diverse CPU and GPU support**: The models can be executed on CPUs or GPUs for maximum flexibility and to support heterogeneous computing requirements.
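To make the model layout concrete, the following sketch builds the on-disk model repository structure Triton expects: one directory per model, containing a `config.pbtxt` and numbered version subdirectories. The model name, backend, and batch size below are placeholders.

```python
import tempfile
from pathlib import Path

# Hedged sketch: the model repository layout Triton Inference Server
# expects — one directory per model with a config.pbtxt and numbered
# version subdirectories. Name, backend, and batch size are placeholders.
repo = Path(tempfile.mkdtemp()) / "model_repository"
model_dir = repo / "resnet50"
(model_dir / "1").mkdir(parents=True)              # version 1 of the model
(model_dir / "1" / "model.pt").write_bytes(b"\0")  # placeholder artifact

config = 'name: "resnet50"\nbackend: "pytorch"\nmax_batch_size: 8\n'
(model_dir / "config.pbtxt").write_text(config)

entries = sorted(p.name for p in model_dir.iterdir())
print(entries)  # ['1', 'config.pbtxt']
```

For SageMaker AI deployment, a repository like this is typically packaged as a `model.tar.gz` and uploaded to Amazon S3, as shown in the sample notebooks below.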

## What do you want to do?
<a name="triton-do"></a>

I want to deploy my trained PyTorch model in SageMaker AI.  
For a sample Jupyter Notebook, see the [Deploy your PyTorch Resnet50 model with Triton Inference Server example](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-triton/resnet50/triton_resnet50.ipynb).

I want to deploy my trained Hugging Face model in SageMaker AI.  
For a sample Jupyter Notebook, see the [Deploy your PyTorch BERT model with Triton Inference Server example](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-triton/nlp_bert/triton_nlp_bert.ipynb).