Get started deploying Machine Learning tools on EKS
To jump into Machine Learning on EKS, start by choosing from these prescriptive patterns to quickly get an EKS cluster and its ML software and hardware ready to run ML workloads. Most of these patterns are based on Terraform blueprints that are available from the Data on Amazon EKS (DoEKS) project; a sketch of the common deployment flow appears after the pattern list below. Keep the following in mind:
- GPUs or Neuron instances are required to run these procedures. Lack of availability of these resources can cause these procedures to fail during cluster creation or node autoscaling.
- AWS Neuron SDK (for Trainium and Inferentia-based instances) can save money, and those instances are often more readily available than NVIDIA GPUs. So, when your workloads permit it, we recommend that you consider using Neuron for your Machine Learning workloads (see Welcome to AWS Neuron).
- Some of the getting started experiences here require that you get data via your own Hugging Face account.
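Several of these patterns pull model weights from Hugging Face at deployment time. How you supply your access token varies by blueprint (for example, as a Terraform variable or a Kubernetes secret), so the commands below are only a sketch; the secret name and key are illustrative rather than names any particular blueprint expects.

```
# Create a read token at https://huggingface.co/settings/tokens, then
# export it for tools that read the conventional HF_TOKEN variable.
export HF_TOKEN=<your-hugging-face-token>

# Illustrative only: store the token in a Kubernetes secret so that
# workloads can reference it. The secret name and key are examples,
# not names any specific blueprint requires.
kubectl create secret generic hf-token \
  --from-literal=HF_TOKEN="${HF_TOKEN}"
```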
To get started, choose from the following patterns, each designed to set up the infrastructure you need to run your Machine Learning workloads:
- JupyterHub on EKS: Explore the JupyterHub blueprint, which showcases Time Slicing and MIG features, as well as multi-tenant configurations with profiles. This is ideal for deploying large-scale JupyterHub platforms on EKS.
- Large Language Models on AWS Neuron and RayServe: Use AWS Neuron to run large language models (LLMs) on Amazon EKS with AWS Trainium and AWS Inferentia accelerators. See Serving LLMs with RayServe and vLLM on AWS Neuron for instructions on setting up a platform for making inference requests (a sample request is sketched after this list), with components that include:
  - AWS Neuron SDK toolkit for deep learning
  - AWS Inferentia and Trainium accelerators
  - vLLM serving engine for LLM inference (see the vLLM documentation site)
  - RayServe scalable model serving library (see the Ray Serve: Scalable and Programmable Serving site)
  - Llama-3 language model, using your own Hugging Face account
  - Observability with Amazon CloudWatch and Neuron Monitor
  - Open WebUI
- Large Language Models on NVIDIA and Triton: Deploy multiple large language models (LLMs) on Amazon EKS with NVIDIA GPUs. See Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM for instructions on setting up a platform for making inference requests, with components that include:
  - NVIDIA Triton Inference Server (see the Triton Inference Server GitHub site)
  - vLLM serving engine for LLM inference (see the vLLM documentation site)
  - Two language models: mistralai/Mistral-7B-Instruct-v0.2 and meta-llama/Llama-2-7b-chat-hf, using your own Hugging Face account
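Each pattern above is delivered as a Terraform blueprint from the Data on Amazon EKS (DoEKS) project, so the overall deployment flow is similar across patterns. The sketch below assumes the awslabs/data-on-eks repository layout; the blueprint directory, cluster name, and region are placeholders, and some blueprints wrap these steps in their own install script, so follow your chosen pattern's instructions for the exact commands.

```
# Minimal sketch of the common DoEKS blueprint workflow. The blueprint
# directory is a placeholder -- each pattern's page gives the real path.
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/<blueprint-directory>

terraform init
terraform plan    # review the cluster, node groups, and add-ons to be created
terraform apply   # some blueprints provide an install script that wraps this

# Once the cluster is up, point kubectl at it (name and region are examples).
aws eks update-kubeconfig --name <cluster-name> --region us-west-2
```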
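Both LLM patterns above expose their models behind a Kubernetes service for inference. As a hedged illustration, if the deployed stack serves vLLM's OpenAI-compatible HTTP API (which vLLM's standalone server provides), a request could look like the following. The service name, port, and route here are assumptions for illustration; check the blueprint's outputs and its page for the real endpoint.

```
# Forward the serving service to your machine (service name and port
# are illustrative; your blueprint's outputs give the real ones).
kubectl port-forward svc/<inference-service> 8000:8000 &

# Example completion request against vLLM's OpenAI-compatible API,
# using a model from the NVIDIA/Triton pattern above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "prompt": "What is Amazon EKS?",
        "max_tokens": 128
      }'
```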
Continuing with ML on EKS
Along with choosing from the blueprints described on this page, you can proceed through the ML on EKS documentation in other ways if you prefer. For example, you can:
- Try tutorials for ML on EKS – Run other end-to-end tutorials for building and running your own Machine Learning models on EKS. See Try tutorials for deploying Machine Learning workloads and platforms on EKS.
To improve your work with ML on EKS, refer to the following:
- Prepare for ML – Learn how to prepare for ML on EKS with features like custom AMIs and GPU reservations. See Prepare to create an EKS cluster for Machine Learning.