Links to Amazon EMR on EKS best practices guides on GitHub

We built the Amazon EMR on EKS Best Practices Guide through open source community collaboration so that we can iterate quickly and provide recommendations for many aspects of creating and running a virtual cluster. We recommend that you use the guide for the topics in the following sections. Choose the links in each section to go to the GitHub site.

Security

Note

For more information on security with Amazon EMR on EKS, see Amazon EMR on EKS security best practices.

Encryption best practices: how to use encryption for data at rest and in transit.

Managing network security: how to configure security groups for pods for Amazon EMR on EKS when you connect to data sources that are hosted in AWS services such as Amazon RDS and Amazon Redshift.

Using AWS Secrets Manager to store secrets.

PySpark job submission

PySpark job submission: how to package PySpark applications using formats such as zip, egg, wheel, and PEX.
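As a minimal sketch of the zip format, the following Python function builds the `jobDriver` part of an EMR on EKS `StartJobRun` request and passes zipped dependencies to `spark-submit` through `--py-files`. The function name and the S3 paths are hypothetical placeholders; egg, wheel, and PEX packaging may need different settings, which the guide covers.

```python
import json
from typing import List


def pyspark_job_driver(entry_point: str, py_files: List[str], args: List[str]) -> dict:
    """Build a hypothetical jobDriver dict for an EMR on EKS StartJobRun call.

    All archives in py_files are comma-joined into one --py-files argument so
    that Spark distributes them to the driver and executors.
    """
    driver = {"entryPoint": entry_point, "entryPointArguments": list(args)}
    if py_files:
        # Only set sparkSubmitParameters when there is something to ship.
        driver["sparkSubmitParameters"] = "--py-files " + ",".join(py_files)
    return {"sparkSubmitJobDriver": driver}


if __name__ == "__main__":
    job_driver = pyspark_job_driver(
        "s3://amzn-s3-demo-bucket/app/main.py",   # hypothetical entry point
        ["s3://amzn-s3-demo-bucket/app/deps.zip"],  # hypothetical dependency archive
        [],
    )
    print(json.dumps(job_driver, indent=2))
```

The resulting dict would be passed as the `jobDriver` field alongside the virtual cluster ID, execution role, and release label.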

Storage

Using EBS volumes: how to use static and dynamic provisioning for jobs that need Amazon EBS volumes.
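For dynamic provisioning, Spark on Kubernetes (3.1 and later) can create one PersistentVolumeClaim per executor when `claimName` is set to `OnDemand`. The sketch below builds the `applicationConfiguration` part of `configurationOverrides` with those `spark-defaults` properties. The volume name, storage class, size, and mount path are hypothetical and assume a StorageClass backed by the EBS CSI driver already exists in the cluster.

```python
import json


def ebs_on_demand_overrides(volume_name: str, storage_class: str,
                            size: str, mount_path: str) -> dict:
    """Return a hypothetical applicationConfiguration that asks Spark to
    create one dynamically provisioned PVC for each executor pod."""
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume_name}"
    properties = {
        # "OnDemand" makes Spark create a new PVC per executor.
        f"{prefix}.options.claimName": "OnDemand",
        # StorageClass used to provision the volume (assumed to exist).
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(ebs_on_demand_overrides("data", "gp3", "50Gi", "/data"), indent=2))
```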

Using Amazon FSx for Lustre volumes: how to use static and dynamic provisioning for jobs that need Amazon FSx for Lustre volumes.

Using instance store volumes: how to use instance store volumes for job processing.

Metastore integration

Using Hive metastore: offers different ways to use Hive metastore.

Using AWS Glue: offers different ways to configure the AWS Glue Data Catalog.
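The two metastore options above typically come down to one `spark-defaults` property each: a Glue client factory class for the AWS Glue Data Catalog, or a Thrift URI for an external Hive metastore. The following Python sketch builds either variant; the function name and the Thrift endpoint are hypothetical, and the job's execution role would still need the relevant AWS Glue or network permissions.

```python
import json
from typing import Optional

GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def metastore_overrides(use_glue: bool, thrift_uri: Optional[str] = None) -> dict:
    """Return a hypothetical applicationConfiguration for Glue or Hive metastore."""
    if use_glue:
        # Point Spark's Hive client at the AWS Glue Data Catalog.
        properties = {"spark.hadoop.hive.metastore.client.factory.class": GLUE_FACTORY}
    elif thrift_uri:
        # Point Spark at an external Hive metastore over Thrift.
        properties = {"spark.hadoop.hive.metastore.uris": thrift_uri}
    else:
        raise ValueError("set use_glue=True or provide thrift_uri")
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(metastore_overrides(use_glue=True), indent=2))
    # Hypothetical Hive metastore endpoint:
    print(json.dumps(metastore_overrides(False, "thrift://hms.example.internal:9083"), indent=2))
```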

Debugging

Using Spark debugging: how to change the log level.

Connecting to Spark UI on the driver pod.

How to use a self-hosted Spark history server with Amazon EMR on EKS.

Troubleshooting Amazon EMR on EKS issues

Troubleshooting.

Node placement

Using Kubernetes node selectors for single-AZ and other use cases.
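A single-AZ placement can be expressed with Spark's `spark.kubernetes.node.selector.[labelKey]` property, which adds the label to the node selector of both the driver and executor pods. The sketch below uses the well-known `topology.kubernetes.io/zone` node label; the function name and the zone value are hypothetical.

```python
import json

ZONE_LABEL = "topology.kubernetes.io/zone"


def single_az_overrides(zone: str) -> dict:
    """Return a hypothetical applicationConfiguration that pins the driver
    and all executors to nodes in one Availability Zone."""
    properties = {f"spark.kubernetes.node.selector.{ZONE_LABEL}": zone}
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(single_az_overrides("us-west-2a"), indent=2))
```

Keeping all pods in one zone can reduce cross-AZ data transfer during shuffles, at the cost of depending on capacity in that single zone.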

Using Fargate node placement.

Performance

Using Dynamic Resource Allocation (DRA).
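As a rough sketch, enabling DRA for Spark on Kubernetes usually involves turning on dynamic allocation together with shuffle tracking, because Kubernetes deployments do not run an external shuffle service by default. The executor bounds below are hypothetical; see the guide for tuning advice.

```python
import json


def dra_overrides(min_executors: int, max_executors: int) -> dict:
    """Return a hypothetical applicationConfiguration that enables DRA."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("expected 0 <= min_executors <= max_executors")
    properties = {
        "spark.dynamicAllocation.enabled": "true",
        # Track shuffle files on executors so they are not removed while
        # their shuffle data is still needed (no external shuffle service).
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(dra_overrides(1, 10), indent=2))
```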

EKS best practices for the Amazon VPC Container Network Interface (CNI) plugin, Cluster Autoscaler, and CoreDNS.

Cost optimization

Using Spot Instances: Amazon EC2 Spot Instance best practices and how to use the Spark node decommissioning feature.
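Spark's decommissioning feature is controlled by a small set of properties. When these properties are enabled, an executor on a node that is being reclaimed tries to shut down gracefully and migrate its shuffle and cached RDD blocks to other executors. This assumes the Spot interruption causes the pod to be terminated gracefully. The following Python sketch builds those `spark-defaults` properties; the function name and its flags are hypothetical.

```python
import json


def decommission_overrides(migrate_shuffle: bool = True, migrate_rdd: bool = True) -> dict:
    """Return a hypothetical applicationConfiguration enabling Spark decommissioning."""
    properties = {
        # Try to shut executors down gracefully when they are decommissioned.
        "spark.decommission.enabled": "true",
        # Allow block migration off decommissioning executors.
        "spark.storage.decommission.enabled": "true",
        # Choose which kinds of blocks to migrate.
        "spark.storage.decommission.shuffleBlocks.enabled": str(migrate_shuffle).lower(),
        "spark.storage.decommission.rddBlocks.enabled": str(migrate_rdd).lower(),
    }
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(decommission_overrides(), indent=2))
```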

Using AWS Outposts

Running Amazon EMR on EKS using AWS Outposts