The Large Model Inference (LMI) container documentation
This documentation is written for developers, data scientists, and machine learning engineers who need to deploy and optimize large language models (LLMs) on Amazon SageMaker. It helps you use LMI containers, AWS-provided Docker containers specialized for LLM inference, and includes an overview, deployment guides, user guides for the supported inference libraries, and advanced tutorials.
By using the LMI container documentation, you can:
- Understand the components and architecture of LMI containers
- Learn how to select the appropriate instance type and backend for your use case
- Configure and deploy LLMs on SageMaker using LMI containers (see the deployment sketch after this list)
- Optimize performance with features such as quantization, tensor parallelism, and continuous batching (see the configuration sketch after this list)
- Benchmark and tune your SageMaker endpoints for optimal throughput and latency