SageMaker AI endpoint parameters for large model inference

You can customize the following parameters to support low-latency large model inference (LMI) with SageMaker AI.

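Maximum Amazon EBS volume size for the instance (VolumeSizeInGB) - If your model is larger than 30 GB and you use an instance without a local disk, increase this parameter to slightly more than the size of your model.
Health check timeout quota (ContainerStartupHealthCheckTimeoutInSeconds) - If your container is set up correctly but the CloudWatch logs show that the health check timed out, increase this quota so that the container has enough time to respond to health checks.
Model download timeout quota (ModelDataDownloadTimeoutInSeconds) - If your model is larger than 40 GB, increase this quota so that there is enough time to download the model from Amazon S3 to the instance.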
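The three rules above can be expressed as a small decision helper. This is a minimal sketch, not an AWS API: the function name suggest_variant_settings is hypothetical, the 30 GB and 40 GB thresholds come from the guidance above, and the +10 GB headroom for "slightly more than the size of your model" and the 1800-second timeouts (the values used in this page's example) are assumptions you should tune for your model.

```python
import math
from typing import Dict

# Thresholds taken from the guidance above.
EBS_THRESHOLD_GB = 30       # VolumeSizeInGB matters above this size when there is no local disk
DOWNLOAD_THRESHOLD_GB = 40  # ModelDataDownloadTimeoutInSeconds matters above this size


def suggest_variant_settings(
    model_size_gb: float,
    has_local_disk: bool,
    saw_health_check_timeout: bool = False,
    headroom_gb: int = 10,               # assumption: "slightly larger" interpreted as +10 GB
    download_timeout_s: int = 1800,      # assumption: same value as this page's example
    health_check_timeout_s: int = 1800,  # assumption: same value as this page's example
) -> Dict[str, int]:
    """Hypothetical helper: return only the ProductionVariant keys the guidance says to raise."""
    settings: Dict[str, int] = {}
    # Large model on an instance without local storage: size the EBS volume above the model.
    if model_size_gb > EBS_THRESHOLD_GB and not has_local_disk:
        settings["VolumeSizeInGB"] = math.ceil(model_size_gb) + headroom_gb
    # Large model: allow more time for the Amazon S3 download.
    if model_size_gb > DOWNLOAD_THRESHOLD_GB:
        settings["ModelDataDownloadTimeoutInSeconds"] = download_timeout_s
    # Health check timeouts are diagnosed from CloudWatch logs, not from model size.
    if saw_health_check_timeout:
        settings["ContainerStartupHealthCheckTimeoutInSeconds"] = health_check_timeout_s
    return settings
```

The returned dictionary can be merged into a production variant entry, for example {"VariantName": "variant1", "ModelName": model_name, **suggest_variant_settings(70, has_local_disk=False)}. Returning only the keys that need raising keeps the service defaults for everything else.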

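The following code snippet shows how to configure these parameters programmatically. Replace the placeholder values in the example (such as "aws-region", "endpoint-name", and "instance-type") with your own information.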


import boto3

aws_region = "aws-region"
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = "endpoint-name"

# The name of the endpoint configuration to create.
endpoint_config_name = "endpoint-config-name"

# The name of the model that you want to host.
model_name = "the-name-of-your-model"

instance_type = "instance-type"

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,  # A comma is required before the next argument.
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": instance_type, # Specify the compute instance type.
            "InitialInstanceCount": 1, # Number of instances to launch initially.
            "VolumeSizeInGB": 256, # Specify the size of the Amazon EBS volume.
            "ModelDataDownloadTimeoutInSeconds": 1800, # Specify the model download timeout in seconds.
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800, # Specify the health check timeout in seconds.
        },
    ],
)

sagemaker_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
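With the timeouts in this example, model download and container startup alone can take up to an hour before the endpoint is ready, so a caller usually polls until the endpoint reaches InService. The sketch below assumes boto3's endpoint_in_service waiter for the SageMaker client and its standard WaiterConfig argument (Delay, MaxAttempts). The helper names waiter_config_for and wait_for_endpoint are hypothetical, and the 900-second provisioning margin and 30-second polling delay are illustrative assumptions, not documented values.

```python
import math
from typing import Any, Dict


def waiter_config_for(
    download_timeout_s: int,
    health_check_timeout_s: int,
    provisioning_margin_s: int = 900,  # assumption: extra time for instance provisioning
    delay_s: int = 30,                 # assumption: polling interval
) -> Dict[str, int]:
    """Hypothetical helper: size a waiter budget that covers both timeouts plus a margin."""
    budget_s = download_timeout_s + health_check_timeout_s + provisioning_margin_s
    return {"Delay": delay_s, "MaxAttempts": math.ceil(budget_s / delay_s)}


def wait_for_endpoint(
    sagemaker_client: Any,
    endpoint_name: str,
    download_timeout_s: int,
    health_check_timeout_s: int,
) -> None:
    """Block until the endpoint is InService, using the boto3 'endpoint_in_service' waiter."""
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig=waiter_config_for(download_timeout_s, health_check_timeout_s),
    )
```

For example, calling wait_for_endpoint(sagemaker_client, endpoint_name, 1800, 1800) right after create_endpoint polls every 30 seconds for up to 150 attempts. If the endpoint does not reach InService within that budget, the waiter raises botocore.exceptions.WaiterError.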

For more information about ProductionVariants, see ProductionVariant.

For examples that show how to achieve low-latency inference with large models, see Generative AI Inference Examples on Amazon SageMaker AI in the aws-samples GitHub repository.
