

# Using custom images with EMR Serverless
<a name="using-custom-images"></a>

**Topics**
+ [Use a custom Python version](#image-python)
+ [Use a custom Java version](#image-java)
+ [Build a data science image](#image-data-science)
+ [Processing geospatial data with Apache Sedona](#image-sedona)
+ [Licensing information for using custom images](concepts-licensing-images.md)

## Use a custom Python version
<a name="image-python"></a>

You can build a custom image to use a different version of Python. For example, to use Python 3.10 for Spark jobs, build an image from the following Dockerfile:

```
FROM public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest

USER root

# build and install Python 3.10 from source (installs /usr/local/bin/python3.10)
RUN yum install -y gcc openssl-devel bzip2-devel libffi-devel tar gzip wget make
RUN wget https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tgz && \
tar xzf Python-3.10.0.tgz && cd Python-3.10.0 && \
./configure --enable-optimizations && \
make altinstall

# EMR Serverless runs the image as hadoop
USER hadoop:hadoop
```

Before you submit the Spark job, set the following Spark properties so that the driver and executors use the Python 3.10 interpreter installed in the image.

```
--conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.10
--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=/usr/local/bin/python3.10
--conf spark.executorEnv.PYSPARK_PYTHON=/usr/local/bin/python3.10
```
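The properties above are usually passed to a job run as one space-separated string. The following is a minimal sketch of assembling that string in Python, assuming it is then supplied as the `sparkSubmitParameters` value of the job driver when you start the job run. The helper name `build_spark_submit_parameters` is hypothetical and not part of any AWS SDK.

```python
from typing import Mapping

# Path of the interpreter installed by `make altinstall` in the Dockerfile above.
PYTHON_BIN = "/usr/local/bin/python3.10"


def build_spark_submit_parameters(conf: Mapping[str, str]) -> str:
    """Join Spark properties into a single '--conf key=value ...' string.

    Hypothetical helper for illustration; values are assumed to contain
    no spaces, so no quoting is applied.
    """
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


python_conf = {
    "spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON": PYTHON_BIN,
    "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": PYTHON_BIN,
    "spark.executorEnv.PYSPARK_PYTHON": PYTHON_BIN,
}

spark_submit_parameters = build_spark_submit_parameters(python_conf)
print(spark_submit_parameters)
```

Keeping the interpreter path in one constant avoids the driver and executors pointing at different Python binaries.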

## Use a custom Java version
<a name="image-java"></a>

The following example demonstrates how to build a custom image to use Java 11 for your Spark jobs.

```
FROM public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest

USER root

# install JDK 11
RUN amazon-linux-extras install java-openjdk11

# EMR Serverless runs the image as hadoop
USER hadoop:hadoop
```

Before you submit the Spark job, set Spark properties to use Java 11, as follows.

```
--conf spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64 
--conf spark.emr-serverless.driverEnv.JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-
```
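To confirm that a job actually runs on Java 11, one option is to check the JVM version string at the start of the job. The sketch below only parses a version string; it assumes you obtain that string yourself, for example from the `java.version` system property of the driver JVM. The function name `java_major_version` is hypothetical.

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from a java.version style string.

    Handles the legacy scheme ("1.8.0_352" -> 8) and the current
    scheme ("11.0.16" -> 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    # Legacy versions are reported as 1.<major>, for example 1.8 for Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


if java_major_version("11.0.16") != 11:
    raise RuntimeError("Expected the job to run on Java 11")
```

Failing early in this way surfaces a wrong `JAVA_HOME` setting before the job does any real work.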

## Build a data science image
<a name="image-data-science"></a>

The following example shows how to include common data science Python packages, such as pandas and NumPy.

```
FROM public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest

USER root

# python packages
RUN pip3 install boto3 pandas numpy
RUN pip3 install -U scikit-learn==0.23.2 scipy 
RUN pip3 install sk-dist
RUN pip3 install xgboost

# EMR Serverless runs the image as hadoop
USER hadoop:hadoop
```
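After you build the image, it can be useful to check that the packages are importable by the interpreter that the job uses. The following sketch does this with the standard library only. The function name `missing_modules` is hypothetical, and the import names `sklearn` (for scikit-learn) and `skdist` (for sk-dist) are assumptions about how those packages are imported.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the packages installed in the Dockerfile above.
EXPECTED_MODULES = ["boto3", "pandas", "numpy", "sklearn", "scipy", "skdist", "xgboost"]


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Report what is missing without importing the (potentially heavy) packages.
print("Missing modules:", missing_modules(EXPECTED_MODULES))
```

You could run a check like this inside the container, or as the first step of a job, and stop if the returned list is not empty.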

## Processing geospatial data with Apache Sedona
<a name="image-sedona"></a>

The following example shows how to build an image to include Apache Sedona for geospatial processing.

```
FROM public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest

USER root

RUN yum install -y wget
RUN wget https://repo1.maven.org/maven2/org/apache/sedona/sedona-core-3.0_2.12/1.3.0-incubating/sedona-core-3.0_2.12-1.3.0-incubating.jar -P /usr/lib/spark/jars/
RUN pip3 install apache-sedona

# EMR Serverless runs the image as hadoop
USER hadoop:hadoop
```

# Licensing information for using custom images
<a name="concepts-licensing-images"></a>

You can build custom images with EMR Serverless to perform specific tasks or to use specific versions of a software package. Modification and distribution of custom images can be subject to rules and licensing terms. The licensing text appears in the subsection that follows.

## Licensing that applies to custom images
<a name="concepts-licensing-images-text"></a>

*Copyright Amazon.com and its affiliates; all rights reserved. This software is AWS Content under [AWS Customer Agreement](https://aws.amazon.com/agreement/) and may not be distributed without permission. In addition to the permissions in [AWS Intellectual Property License](https://aws.amazon.com/legal/aws-ip-license-terms/), the AWS Licensor grants you these additional permissions:*

*Create, Copy, and Use Derivatives of the AWS Content is permitted provided that the following conditions are met:*
+ *You do not modify the AWS Content itself, and any Derivatives are strictly the result of Your addition of new content.*
+ *Internal reproductions must retain the above copyright notice.*
+ *External distribution, in source or binary form, with or without modification, is not permitted under the terms of this license.*

For more information about using custom images, refer to [Using custom images with EMR Serverless](using-custom-images.html).