Understanding in-transit encryption - Amazon EMR

Understanding in-transit encryption

You can configure an EMR cluster to run open-source frameworks such as Apache Spark, Apache Hive, and Presto. Each of these open-source frameworks has a set of processes running on the EC2 instances of a cluster. Each of these processes can host network endpoints for network communication.

If in-transit encryption is enabled on an EMR cluster, different network endpoints use different encryption mechanisms. See the following sections to learn more about the specific open-source framework network endpoints supported with in-transit encryption, the related encryption mechanisms, and which Amazon EMR release added the support. Each open-source application might also have different best practices and open-source framework configurations that you can change.

For the most in-transit encryption coverage, we recommend that you enable both in-transit encryption and Kerberos. If you only enable in-transit encryption, then in-transit encryption will be available only for the network endpoints that support TLS. Kerberos is necessary because some open-source framework network endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption.
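As a rough illustration of enabling both features together, the following Python sketch builds a security configuration document with in-transit encryption and a cluster-dedicated KDC. The field names follow the Amazon EMR security configuration JSON layout as best understood here and should be checked against the security configuration reference. The S3 URI, the ticket lifetime, and the function name are placeholders, not values from this page.

```python
import json


def build_security_configuration(certificate_s3_uri: str) -> dict:
    """Return a security configuration that enables TLS and Kerberos.

    Assumption: PEM certificates are supplied as a zip file in Amazon S3,
    and Kerberos uses a cluster-dedicated KDC.
    """
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            # At-rest encryption is left off here only to keep the sketch short.
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certificate_s3_uri,
                }
            },
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": 24  # placeholder value
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder bucket and key; replace with your own certificate archive.
    config = build_security_configuration("s3://amzn-s3-demo-bucket/certs/my-certs.zip")
    print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document you would supply when you create a security configuration, after confirming each field against the current Amazon EMR documentation.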

Note that any open-source frameworks not supported in Amazon EMR 7.x.x releases are not included.

Spark

When you enable in-transit encryption in security configurations, Amazon EMR automatically sets spark.authenticate to true, and Spark uses AES-based encryption for RPC connections.

Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use Spark applications that depend on the Hive metastore. Hive 3 fixes this issue in HIVE-16340. SPARK-44114 fully resolves this issue when open-source Spark can upgrade to Hive 3. In the meantime, you can set hive.metastore.use.SSL to false to work around this issue. For more information, see Configure applications.
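The workaround can be expressed as a cluster configuration override. The sketch below is a minimal example that assumes the property belongs in the hive-site and spark-hive-site classifications. Those classification names are an assumption and are not stated on this page, so confirm them in Configure applications before you use them.

```python
import json


def metastore_ssl_workaround(classifications=("hive-site", "spark-hive-site")) -> list:
    """Build configuration objects that set hive.metastore.use.SSL to false.

    Assumption: the classification names passed in are the ones that
    carry Hive metastore client and server settings on your release.
    """
    return [
        {
            "Classification": name,
            "Properties": {"hive.metastore.use.SSL": "false"},
        }
        for name in classifications
    ]


if __name__ == "__main__":
    print(json.dumps(metastore_ssl_workaround(), indent=2))
```

Keep in mind that this override turns off TLS for Hive metastore traffic, so treat it as a temporary measure.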

For more information, see Spark security in the Apache Spark documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Spark History Server | spark.ssl.history.port | 18480 | TLS | emr-5.3.0+, emr-6.0.0+, emr-7.0.0+ |
| Spark UI | spark.ui.port | 4440 | TLS | emr-5.3.0+, emr-6.0.0+, emr-7.0.0+ |
| Spark Driver | spark.driver.port | Dynamic | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Spark Executor | Executor Port (no named config) | Dynamic | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| YARN NodeManager | spark.shuffle.service.port [1] | 7337 | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

[1] spark.shuffle.service.port is hosted on YARN NodeManager but is only used by Apache Spark.

Hadoop YARN

Secure Hadoop RPC is set to privacy and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration. If you don't want in-transit encryption for Hadoop RPC, configure hadoop.rpc.protection = authentication. We recommend that you use the default configuration for maximum security.

If your TLS certificates can't meet TLS hostname verification requirements, you can configure hadoop.ssl.hostname.verifier = ALLOW_ALL. We recommend that you use the default configuration of hadoop.ssl.hostname.verifier = DEFAULT, which enforces TLS hostname verification.

To disable HTTPS for the YARN web application endpoints, configure yarn.http.policy = HTTP_ONLY. With this setting, traffic to these endpoints is unencrypted. We recommend that you use the default configuration for maximum security.
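The three opt-out settings described in this section can be collected into configuration overrides. The following sketch assumes that hadoop.rpc.protection and hadoop.ssl.hostname.verifier belong in the core-site classification and that yarn.http.policy belongs in yarn-site. The function and its parameter names are hypothetical. With every argument left at its default, it emits no overrides, which matches the recommended configuration.

```python
def hadoop_transit_overrides(rpc_encryption: bool = True,
                             verify_hostnames: bool = True,
                             yarn_https: bool = True) -> list:
    """Return overrides only for the protections the caller turns off."""
    core_site = {}
    yarn_site = {}

    if not rpc_encryption:
        # Authenticate Hadoop RPC but do not encrypt it.
        core_site["hadoop.rpc.protection"] = "authentication"
    if not verify_hostnames:
        # Skip TLS hostname verification for certificates that cannot pass it.
        core_site["hadoop.ssl.hostname.verifier"] = "ALLOW_ALL"
    if not yarn_https:
        # Serve the YARN web application endpoints over plain HTTP.
        yarn_site["yarn.http.policy"] = "HTTP_ONLY"

    configurations = []
    if core_site:
        configurations.append({"Classification": "core-site", "Properties": core_site})
    if yarn_site:
        configurations.append({"Classification": "yarn-site", "Properties": yarn_site})
    return configurations


if __name__ == "__main__":
    # Example: keep encryption on, but relax hostname verification.
    print(hadoop_transit_overrides(verify_hostnames=False))
```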

For more information, see Hadoop in secure mode in the Apache Hadoop documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| ResourceManager | yarn.resourcemanager.webapp.address | 8088 | TLS | emr-7.3.0+ |
| ResourceManager | yarn.resourcemanager.resource-tracker.address | 8025 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| ResourceManager | yarn.resourcemanager.scheduler.address | 8030 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| ResourceManager | yarn.resourcemanager.address | 8032 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| ResourceManager | yarn.resourcemanager.admin.address | 8033 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| TimelineServer | yarn.timeline-service.address | 10200 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| TimelineServer | yarn.timeline-service.webapp.address | 8188 | TLS | emr-7.3.0+ |
| WebApplicationProxy | yarn.web-proxy.address | 20888 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| NodeManager | yarn.nodemanager.address | 8041 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| NodeManager | yarn.nodemanager.localizer.address | 8040 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| NodeManager | yarn.nodemanager.webapp.address | 8044 | TLS | emr-7.3.0+ |
| NodeManager | mapreduce.shuffle.port [1] | 13562 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| NodeManager | spark.shuffle.service.port [2] | 7337 | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

[1] mapreduce.shuffle.port is hosted on YARN NodeManager but is only used by Hadoop MapReduce.

[2] spark.shuffle.service.port is hosted on YARN NodeManager but is only used by Apache Spark.
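The YARN endpoints mix TLS and SASL + Kerberos, so enabling in-transit encryption without Kerberos leaves some of them unencrypted. The sketch below models that rule with a small sample of the YARN endpoints listed above. The Endpoint type and the function name are hypothetical, and the model assumes in-transit encryption is already enabled and ignores release versions.

```python
from typing import List, NamedTuple


class Endpoint(NamedTuple):
    component: str
    setting: str
    mechanism: str


# A sample of YARN endpoints and their encryption mechanisms.
YARN_ENDPOINTS = [
    Endpoint("ResourceManager", "yarn.resourcemanager.webapp.address", "TLS"),
    Endpoint("ResourceManager", "yarn.resourcemanager.address", "SASL + Kerberos"),
    Endpoint("NodeManager", "yarn.nodemanager.webapp.address", "TLS"),
    Endpoint("NodeManager", "yarn.nodemanager.address", "SASL + Kerberos"),
]


def encrypted_endpoints(endpoints: List[Endpoint], kerberos_enabled: bool) -> List[Endpoint]:
    """Return the endpoints that are encrypted in transit.

    TLS endpoints are always covered; SASL endpoints need Kerberos.
    """
    return [
        endpoint
        for endpoint in endpoints
        if endpoint.mechanism == "TLS"
        or (endpoint.mechanism == "SASL + Kerberos" and kerberos_enabled)
    ]


if __name__ == "__main__":
    for endpoint in encrypted_endpoints(YARN_ENDPOINTS, kerberos_enabled=False):
        print(endpoint.setting)
```

Without Kerberos, only the two web application addresses in this sample are reported as encrypted. With Kerberos enabled, all four are.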

Hadoop HDFS

The Hadoop name node, data node, and journal node all support TLS by default if in-transit encryption is enabled in EMR clusters.

Secure Hadoop RPC is set to privacy and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.

We recommend that you don't change the default ports used for HTTPS endpoints.

Data encryption on HDFS block transfer uses AES 256 and requires that at-rest encryption is enabled in the security configuration.

For more information, see Hadoop in secure mode in the Apache Hadoop documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Namenode | dfs.namenode.https-address | 9871 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Namenode | dfs.namenode.rpc-address | 8020 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Datanode | dfs.datanode.https.address | 9865 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Datanode | dfs.datanode.address | 9866 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Journal Node | dfs.journalnode.https-address | 8481 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| Journal Node | dfs.journalnode.rpc-address | 8485 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| DFSZKFailoverController | dfs.ha.zkfc.port | 8019 | None | TLS for ZKFC is only supported in Hadoop 3.4.0. See HADOOP-18919 for more information. Amazon EMR release 7.1.0 is currently on Hadoop 3.3.6. Future Amazon EMR releases are expected to move to Hadoop 3.4.0. |

Hadoop MapReduce

Hadoop MapReduce, job history server, and MapReduce shuffle all support TLS by default when in-transit encryption is enabled in EMR clusters.

Hadoop MapReduce encrypted shuffle uses TLS.

We recommend that you don't change the default ports for HTTPS endpoints.

For more information, see Hadoop in secure mode in the Apache Hadoop documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| JobHistoryServer | mapreduce.jobhistory.webapp.https.address | 19890 | TLS | emr-7.3.0+ |
| YARN NodeManager | mapreduce.shuffle.port [1] | 13562 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

[1] mapreduce.shuffle.port is hosted on YARN NodeManager but is only used by Hadoop MapReduce.

Presto

In Amazon EMR releases 5.6.0 and higher, internal communication between the Presto coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable secure internal communication in Presto.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Presto Coordinator | http-server.https.port | 8446 | TLS | emr-5.6.0+, emr-6.0.0+, emr-7.0.0+ |
| Presto Worker | http-server.https.port | 8446 | TLS | emr-5.6.0+, emr-6.0.0+, emr-7.0.0+ |
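One way to check that an endpoint such as the Presto HTTPS port negotiates TLS is to attempt a handshake from a host that can reach the cluster. The sketch below uses only the Python standard library. The function name, the host name in the usage note, and the CA file path are hypothetical, and the check verifies the server certificate, so certificates issued by a private CA need that CA passed in explicitly.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int, ca_file: Optional[str] = None,
              timeout: float = 5.0) -> Optional[str]:
    """Return the negotiated TLS version, or None if the handshake fails.

    Connection errors, timeouts, and certificate verification failures
    are all reported as None.
    """
    # With ca_file=None, the system default trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls_sock:
                return tls_sock.version()
    except OSError:
        # ssl.SSLError and socket timeouts are subclasses of OSError.
        return None
```

For example, probe_tls("coordinator.example.internal", 8446, ca_file="/path/to/ca.pem") would return a version string such as the one reported by the ssl module when the handshake succeeds, and None otherwise. Because hostname verification is part of the handshake, a None result can also indicate a certificate that doesn't match the host name.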

Trino

In Amazon EMR releases 6.1.0 and higher, internal communication between the Trino coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable secure internal communication in Trino.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Trino Coordinator | http-server.https.port | 8446 | TLS | emr-6.1.0+, emr-7.0.0+ |
| Trino Worker | http-server.https.port | 8446 | TLS | emr-6.1.0+, emr-7.0.0+ |

Hive and Tez

By default, Hive server 2, Hive metastore server, Hive LLAP Daemon web UI, and Hive LLAP shuffle all support TLS when in-transit encryption is enabled in the EMR clusters. For more information about the Hive configurations, see Configuration properties.

The Tez UI that's hosted on the Tomcat server is also HTTPS-enabled when in-transit encryption is enabled in the EMR cluster. However, HTTPS is disabled by default for the Tez AM web UI service so that AM users don't have access to the keystore file needed to open the SSL listener. You can enable HTTPS for the Tez AM web UI service with the Boolean configurations tez.am.tez-ui.webservice.enable.ssl and tez.am.tez-ui.webservice.enable.client.auth.
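The two Tez Boolean configurations named above can be set through a configuration override. This sketch assumes they belong in the tez-site classification, which isn't stated on this page. The function name is hypothetical.

```python
def tez_am_web_ui_ssl(enable_client_auth: bool = False) -> list:
    """Build a tez-site override that turns on HTTPS for the Tez AM web UI."""
    return [
        {
            "Classification": "tez-site",  # assumed classification name
            "Properties": {
                "tez.am.tez-ui.webservice.enable.ssl": "true",
                # Property values are strings, so render the flag as "true"/"false".
                "tez.am.tez-ui.webservice.enable.client.auth": str(enable_client_auth).lower(),
            },
        }
    ]


if __name__ == "__main__":
    print(tez_am_web_ui_ssl(enable_client_auth=True))
```

Enabling this gives the AM process access to the keystore file, which is the trade-off the default is designed to avoid.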

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| HiveServer2 | hive.server2.thrift.port | 10000 | TLS | emr-6.9.0+, emr-7.0.0+ |
| HiveServer2 | hive.server2.thrift.http.port | 10001 | TLS | emr-6.9.0+, emr-7.0.0+ |
| HiveServer2 | hive.server2.webui.port | 10002 | TLS | emr-7.3.0+ |
| HiveMetastoreServer | hive.metastore.port | 9083 | TLS | emr-7.3.0+ |
| LLAP Daemon | hive.llap.daemon.yarn.shuffle.port | 15551 | TLS | emr-7.3.0+ |
| LLAP Daemon | hive.llap.daemon.web.port | 15002 | TLS | emr-7.3.0+ |
| LLAP Daemon | hive.llap.daemon.output.service.port | 15003 | None | Hive doesn't support in-transit encryption for this endpoint |
| LLAP Daemon | hive.llap.management.rpc.port | 15004 | None | Hive doesn't support in-transit encryption for this endpoint |
| LLAP Daemon | hive.llap.plugin.rpc.port | Dynamic | None | Hive doesn't support in-transit encryption for this endpoint |
| LLAP Daemon | hive.llap.daemon.rpc.port | Dynamic | None | Hive doesn't support in-transit encryption for this endpoint |
| WebHCat | templeton.port | 50111 | TLS | emr-7.3.0+ |
| Tez Application Master | tez.am.client.am.port-range, tez.am.task.am.port-range | Dynamic | None | Tez doesn't support in-transit encryption for this endpoint |
| Tez Application Master | tez.am.tez-ui.webservice.port-range | Dynamic | None | Disabled by default. Can be enabled using Tez configurations in emr-7.3.0+ |
| Tez Task | N/A - not configurable | Dynamic | None | Tez doesn't support in-transit encryption for this endpoint |
| Tez UI | Configurable via Tomcat server on which Tez UI is hosted | 8080 | TLS | emr-7.3.0+ |

Flink

Apache Flink REST endpoints and internal communication between Flink processes support TLS by default when you enable in-transit encryption in EMR clusters.

security.ssl.internal.enabled is set to true and uses in-transit encryption for internal communication between the Flink processes. If you don't want in-transit encryption for internal communication, disable that configuration. We recommend you use the default configuration for maximum security.

Amazon EMR sets security.ssl.rest.enabled to true and uses in-transit encryption for the REST endpoints. Additionally, Amazon EMR sets historyserver.web.ssl.enabled to true to use TLS communication with the Flink history server. If you don't want in-transit encryption for the REST endpoints, disable these configurations. We recommend you use the default configuration for maximum security.

Amazon EMR uses security.ssl.algorithms to specify the list of ciphers that use AES-based encryption. Override this configuration to use the ciphers you want.
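A cipher override can be supplied as a configuration object. The sketch below assumes the Flink settings live in the flink-conf classification and that security.ssl.algorithms takes a comma-separated list of cipher suite names. The cipher suite in the example is only an illustration, not the Amazon EMR default, and the function name is hypothetical.

```python
def flink_ssl_overrides(cipher_suites, internal_ssl: bool = True,
                        rest_ssl: bool = True) -> list:
    """Build a flink-conf override for ciphers and optional TLS opt-outs."""
    if not cipher_suites:
        raise ValueError("Provide at least one cipher suite")

    properties = {"security.ssl.algorithms": ",".join(cipher_suites)}
    if not internal_ssl:
        properties["security.ssl.internal.enabled"] = "false"
    if not rest_ssl:
        # The REST endpoints and the history server are switched off together.
        properties["security.ssl.rest.enabled"] = "false"
        properties["historyserver.web.ssl.enabled"] = "false"

    return [{"Classification": "flink-conf", "Properties": properties}]


if __name__ == "__main__":
    # Example cipher suite name; choose suites that your JVM supports.
    print(flink_ssl_overrides(["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]))
```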

For more information, see SSL Setup in the Flink documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Flink History Server | historyserver.web.port | 8082 | TLS | emr-7.3.0+ |
| Job Manager Rest Server | rest.bind-port, rest.port | Dynamic | TLS | emr-7.3.0+ |

HBase

Amazon EMR sets Secure Hadoop RPC to privacy. HMaster and RegionServer use SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.

Amazon EMR sets hbase.ssl.enabled to true and uses TLS for UI endpoints. If you don't want to use TLS for UI endpoints, disable this configuration. We recommend that you use the default configuration for maximum security.

Amazon EMR sets hbase.rest.ssl.enabled and hbase.thrift.ssl.enabled to true and uses TLS for the REST and Thrift server endpoints, respectively. If you don't want to use TLS for these endpoints, disable these configurations. We recommend that you use the default configuration for maximum security.
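If you do need to opt out of TLS for one of the HBase surfaces, the three switches described above can be mapped to a configuration override. This sketch assumes the properties belong in the hbase-site classification. The surface names ("ui", "rest", "thrift") and the function name are hypothetical labels used only for this example.

```python
# Maps a hypothetical surface label to the property that controls its TLS.
HBASE_TLS_SWITCHES = {
    "ui": "hbase.ssl.enabled",
    "rest": "hbase.rest.ssl.enabled",
    "thrift": "hbase.thrift.ssl.enabled",
}


def hbase_tls_opt_out(surfaces) -> list:
    """Return an hbase-site override that disables TLS for the given surfaces."""
    requested = set(surfaces)
    unknown = requested - set(HBASE_TLS_SWITCHES)
    if unknown:
        raise ValueError(f"Unknown HBase surfaces: {sorted(unknown)}")

    properties = {HBASE_TLS_SWITCHES[name]: "false" for name in sorted(requested)}
    if not properties:
        # Nothing to override: the secure defaults stay in place.
        return []
    return [{"Classification": "hbase-site", "Properties": properties}]


if __name__ == "__main__":
    print(hbase_tls_opt_out(["rest"]))
```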

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| HMaster | HMaster | 16000 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| HMaster | HMaster UI | 16010 | TLS | emr-7.3.0+ |
| RegionServer | RegionServer | 16020 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
| RegionServer | RegionServer Info | 16030 | TLS | emr-7.3.0+ |
| HBase Rest Server | Rest Server | 8070 | TLS | emr-7.3.0+ |
| HBase Rest Server | Rest UI | 8085 | TLS | emr-7.3.0+ |
| HBase Thrift Server | Thrift Server | 9090 | TLS | emr-7.3.0+ |
| HBase Thrift Server | Thrift Server UI | 9095 | TLS | emr-7.3.0+ |

Phoenix

If you enabled in-transit encryption in your EMR cluster, Phoenix Query Server supports the TLS property phoenix.queryserver.tls.enabled, which is set to true by default.

To learn more, see Configurations relating to HTTPS in the Phoenix Query Server documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Query Server | phoenix.queryserver.http.port | 8765 | TLS | emr-7.3.0+ |

Oozie

OOZIE-3673 is available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. If you need to configure custom SSL or TLS protocols when you run an email action, you can set the property oozie.email.smtp.ssl.protocols in the oozie-site.xml file. By default, if you enabled in-transit encryption, Amazon EMR uses the TLS v1.3 protocol.

OOZIE-3677 and OOZIE-3674 are also available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. OOZIE-3677 lets you specify the properties keyStoreType and trustStoreType in oozie-site.xml. OOZIE-3674 adds the parameter --insecure to the Oozie client so it can ignore certificate errors.

Oozie enforces TLS hostname verification, which means that any certificate you use for in-transit encryption must meet hostname verification requirements. If the certificate doesn't meet the criteria, the cluster might get stuck at the oozie share lib update stage when Amazon EMR provisions the cluster. We recommend that you update your certificates to make sure they're compliant with hostname verification. However, if you can't update the certificates, you can disable SSL for Oozie by setting the oozie.https.enabled property to false in cluster configuration.
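The Oozie properties mentioned in this section can be set through an oozie-site configuration override. The sketch below is a minimal example. The comma-separated format for oozie.email.smtp.ssl.protocols is an assumption that isn't confirmed on this page, and the function name is hypothetical.

```python
def oozie_overrides(smtp_protocols=None, disable_https: bool = False) -> list:
    """Build an oozie-site override for SMTP TLS protocols and the HTTPS switch."""
    properties = {}
    if smtp_protocols:
        # Assumed format: protocol names joined with commas.
        properties["oozie.email.smtp.ssl.protocols"] = ",".join(smtp_protocols)
    if disable_https:
        # Last resort when certificates can't meet hostname verification.
        properties["oozie.https.enabled"] = "false"

    if not properties:
        return []
    return [{"Classification": "oozie-site", "Properties": properties}]


if __name__ == "__main__":
    print(oozie_overrides(smtp_protocols=["TLSv1.2", "TLSv1.3"]))
```

Updating the certificates remains the recommended fix. Disabling HTTPS removes in-transit encryption for the Oozie server endpoint.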

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| EmbeddedOozieServer | oozie.https.port | 11443 | TLS | emr-7.3.0+ |
| EmbeddedOozieServer | oozie.email.smtp.port | 25 | TLS | emr-7.3.0+ |

Zeppelin

By default, Zeppelin supports TLS when you enable in-transit encryption in your EMR cluster. For more information about the Zeppelin configurations, see SSL Configuration in the Zeppelin documentation.

| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
| --- | --- | --- | --- | --- |
| Zeppelin | zeppelin.server.ssl.port | 8890 | TLS | emr-7.3.0+ |