Get started deploying Machine Learning tools on EKS
To jump into Machine Learning on EKS, start by choosing from these prescriptive patterns to quickly get an EKS cluster and its ML software and hardware ready to run ML workloads. Most of these patterns are based on Terraform blueprints that are available from the Data on Amazon EKS (DoEKS) project; a sketch of the common deployment flow appears after the pattern list below. Keep the following in mind:
- GPUs or Neuron instances are required to run these procedures. Lack of availability of these resources can cause these procedures to fail during cluster creation or node autoscaling.
- AWS Neuron SDK (for Trainium and Inferentia-based instances) can save money, and those instances are often more readily available than NVIDIA GPUs. So, when your workloads permit it, we recommend that you consider using Neuron for your Machine Learning workloads (see Welcome to AWS Neuron).
- Some of the getting started experiences here require that you get data via your own Hugging Face account.
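Several of these patterns pull model weights from Hugging Face at deployment time. How you supply your access token varies by blueprint (for example, as a Terraform variable or a Kubernetes secret), so the commands below are only a sketch; the secret name and key are illustrative rather than names any particular blueprint expects.

```
# Create a read token at https://huggingface.co/settings/tokens, then
# export it for tools that read the conventional HF_TOKEN variable.
export HF_TOKEN=<your-hugging-face-token>

# Illustrative only: store the token in a Kubernetes secret so that
# workloads can reference it. The secret name and key are examples,
# not names any specific blueprint requires.
kubectl create secret generic hf-token \
  --from-literal=HF_TOKEN="${HF_TOKEN}"
```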
To get started, choose from the following patterns, each designed to set up the infrastructure you need to run your Machine Learning workloads:
- JupyterHub on EKS: Explore the JupyterHub blueprint, which showcases Time Slicing and MIG features, as well as multi-tenant configurations with profiles. This is ideal for deploying large-scale JupyterHub platforms on EKS.
- Large Language Models on AWS Neuron and RayServe: Use AWS Neuron to run large language models (LLMs) on Amazon EKS with AWS Trainium and AWS Inferentia accelerators. See Serving LLMs with RayServe and vLLM on AWS Neuron for instructions on setting up a platform for making inference requests (a sample request is sketched after this list), with components that include:
  - AWS Neuron SDK toolkit for deep learning
  - AWS Inferentia and Trainium accelerators
  - vLLM serving engine for LLM inference (see the vLLM documentation site)
  - RayServe scalable model serving library (see the Ray Serve: Scalable and Programmable Serving site)
  - Llama-3 language model, using your own Hugging Face account
  - Observability with Amazon CloudWatch and Neuron Monitor
  - Open WebUI
- Large Language Models on NVIDIA and Triton: Deploy multiple large language models (LLMs) on Amazon EKS with NVIDIA GPUs. See Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM for instructions on setting up a platform for making inference requests, with components that include:
  - NVIDIA Triton Inference Server (see the Triton Inference Server GitHub site)
  - vLLM serving engine for LLM inference (see the vLLM documentation site)
  - Two language models: mistralai/Mistral-7B-Instruct-v0.2 and meta-llama/Llama-2-7b-chat-hf, using your own Hugging Face account
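Each pattern above is delivered as a Terraform blueprint from the Data on Amazon EKS (DoEKS) project, so the overall deployment flow is similar across patterns. The sketch below assumes the awslabs/data-on-eks repository layout; the blueprint directory, cluster name, and region are placeholders, and some blueprints wrap these steps in their own install script, so follow your chosen pattern's instructions for the exact commands.

```
# Minimal sketch of the common DoEKS blueprint workflow. The blueprint
# directory is a placeholder -- each pattern's page gives the real path.
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/<blueprint-directory>

terraform init
terraform plan    # review the cluster, node groups, and add-ons to be created
terraform apply   # some blueprints provide an install script that wraps this

# Once the cluster is up, point kubectl at it (name and region are examples).
aws eks update-kubeconfig --name <cluster-name> --region us-west-2
```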
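Both LLM patterns above expose their models behind a Kubernetes service for inference. As a hedged illustration, if the deployed stack serves vLLM's OpenAI-compatible HTTP API (which vLLM's standalone server provides), a request could look like the following. The service name, port, and route here are assumptions for illustration; check the blueprint's outputs and its page for the real endpoint.

```
# Forward the serving service to your machine (service name and port
# are illustrative; your blueprint's outputs give the real ones).
kubectl port-forward svc/<inference-service> 8000:8000 &

# Example completion request against vLLM's OpenAI-compatible API,
# using a model from the NVIDIA/Triton pattern above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "prompt": "What is Amazon EKS?",
        "max_tokens": 128
      }'
```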
Continuing with ML on EKS
Along with choosing from the blueprints described on this page, you can proceed through the ML on EKS documentation in other ways if you prefer. For example, you can:
- Try tutorials for ML on EKS – Run other end-to-end tutorials for building and running your own Machine Learning models on EKS. See Try tutorials for deploying Machine Learning workloads and platforms on EKS.
To improve your work with ML on EKS, refer to the following:
- Prepare for ML – Learn how to prepare for ML on EKS with features like custom AMIs and GPU reservations. See Prepare to create an EKS cluster for Machine Learning.