Language	Package
.NET	`Amazon.CDK.AWS.Sagemaker.Alpha`
Go	`github.com/aws/aws-cdk-go/awscdksagemakeralpha/v2`
Java	`software.amazon.awscdk.services.sagemaker.alpha`
Python	`aws_cdk.aws_sagemaker_alpha`
TypeScript	`@aws-cdk/aws-sagemaker-alpha`

Amazon SageMaker Construct Library

cdk-constructs: Experimental

The APIs of higher level constructs in this module are experimental and under active development. They are subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model and breaking changes will be announced in the release notes. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that covers the entire machine learning workflow to label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, make predictions, and take action. Your models get to production faster with much less effort and lower cost.

Model

To create a machine learning model with Amazon SageMaker, use the Model construct. Its properties define the model's components: the inference code, packaged as a Docker image, and an optional set of separate model data artifacts. See the AWS documentation to learn more about SageMaker models.

Single Container Model

If a single container is sufficient for your inference use case, you can define a single-container model:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';
import * as path from 'path';

const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory'));
const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz'));

const model = new sagemaker.Model(this, 'PrimaryContainerModel', {
  containers: [
    {
      image: image,
      modelData: modelData,
    }
  ]
});

Inference Pipeline Model

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of multiple containers that process requests for inferences on data. See the AWS documentation to learn more about SageMaker inference pipelines. To define an inference pipeline, you can provide additional containers for your model:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const image1: sagemaker.ContainerImage;
declare const modelData1: sagemaker.ModelData;
declare const image2: sagemaker.ContainerImage;
declare const modelData2: sagemaker.ModelData;
declare const image3: sagemaker.ContainerImage;
declare const modelData3: sagemaker.ModelData;

const model = new sagemaker.Model(this, 'InferencePipelineModel', {
  containers: [
    { image: image1, modelData: modelData1 },
    { image: image2, modelData: modelData2 },
    { image: image3, modelData: modelData3 }
  ],
});
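
As a rough mental model of the "linear sequence" described above, the sketch below chains plain functions so that each one receives the previous one's output. It does not use the CDK or call SageMaker: ContainerHandler, runPipeline, and the three handlers are hypothetical names invented for illustration, and real pipeline containers exchange HTTP requests rather than in-process strings.

```typescript
// Hypothetical stand-in for one container's request handler
// (not a CDK or SageMaker type).
type ContainerHandler = (payload: string) => string;

// Passes the payload through each handler in order, feeding every
// output into the next handler in the list.
function runPipeline(handlers: ContainerHandler[], payload: string): string {
  return handlers.reduce((current, handler) => handler(current), payload);
}

// Illustrative stages: preprocessing, prediction, postprocessing.
const preprocess: ContainerHandler = (p) => p.trim().toLowerCase();
const predict: ContainerHandler = (p) => (/good/.test(p) ? 'positive' : 'negative');
const postprocess: ContainerHandler = (p) => JSON.stringify({ label: p });

const pipelineResult = runPipeline([preprocess, predict, postprocess], '  A GOOD day ');
// pipelineResult is '{"label":"positive"}'
```

The order of the handlers matters in the same way the order of the containers array does: swapping two stages changes what each one receives.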

Model Properties

Network Isolation

If you enable network isolation, the containers can't make any outbound network calls, even to other AWS services such as Amazon S3. Additionally, no AWS credentials are made available to the container runtime environment.

To enable network isolation, set the networkIsolation property to true:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const image: sagemaker.ContainerImage;
declare const modelData: sagemaker.ModelData;

const model = new sagemaker.Model(this, 'ContainerModel', {
  containers: [
    {
      image,
      modelData,
    }
  ],
  networkIsolation: true,
});

Container Images

Inference code can be stored in Amazon Elastic Container Registry (Amazon ECR). Specify it through the image property of ContainerDefinition, which accepts a class that extends the ContainerImage abstract base class.

Asset Image

Reference a local directory containing a Dockerfile:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';
import * as path from 'path';

const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory'));

ECR Image

Reference an image available within ECR:

import * as ecr from 'aws-cdk-lib/aws-ecr';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

const repository = ecr.Repository.fromRepositoryName(this, 'Repository', 'repo');
const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag');

DLC Image

Reference an AWS Deep Learning Containers (DLC) image:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

const repositoryName = 'huggingface-pytorch-training';
const tag = '1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04';

const image = sagemaker.ContainerImage.fromDlc(repositoryName, tag);

Model Artifacts

If you decouple your model artifacts from your inference code (a natural choice, since the two tend to change at different rates), specify the artifacts via the modelData property, which accepts a class that extends the ModelData abstract base class. By default, a model has no associated model artifacts.

Asset Model Data

Reference local model data:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';
import * as path from 'path';

const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz'));

S3 Model Data

Reference an S3 bucket and object key as the artifacts for a model:

import * as s3 from 'aws-cdk-lib/aws-s3';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

const bucket = new s3.Bucket(this, 'MyBucket');
const modelData = sagemaker.ModelData.fromBucket(bucket, 'path/to/artifact/file.tar.gz');

Model Hosting

Amazon SageMaker provides model hosting services for model deployment. It exposes an HTTPS endpoint through which your machine learning model serves inferences.

Endpoint Configuration

The EndpointConfig construct defines an endpoint configuration that can be used to provision one or more endpoints. In this configuration, you identify one or more models to deploy and the resources that you want Amazon SageMaker to provision. You define one or more production variants, each of which identifies a model and describes the resources to provision for it. If you are hosting multiple models, you also assign each variant a weight that determines how much traffic it receives. For example, suppose that you want to host two models, A and B, and you assign a traffic weight of 2 to model A and 1 to model B. Amazon SageMaker then routes two-thirds of the traffic to model A and one-third to model B:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const modelA: sagemaker.Model;
declare const modelB: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [
    {
      model: modelA,
      variantName: 'modelA',
      initialVariantWeight: 2.0,
    },
    {
      model: modelB,
      variantName: 'variantB',
      initialVariantWeight: 1.0,
    },
  ]
});
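
The traffic split implied by variant weights is each weight divided by the sum of all weights. The following sketch works through that arithmetic for the two-to-one example above. The VariantWeight interface and the trafficShares function are hypothetical helpers written for this illustration; they are not part of the construct library.

```typescript
// Hypothetical shape holding only the two fields needed for the arithmetic.
interface VariantWeight {
  variantName: string;
  initialVariantWeight: number;
}

// Returns each variant's fraction of traffic: its weight divided by
// the total weight of all variants.
function trafficShares(variants: VariantWeight[]): Record<string, number> {
  const total = variants.reduce((sum, v) => sum + v.initialVariantWeight, 0);
  if (total <= 0) {
    throw new Error('total variant weight must be positive');
  }
  const shares: Record<string, number> = {};
  for (const v of variants) {
    shares[v.variantName] = v.initialVariantWeight / total;
  }
  return shares;
}

const shares = trafficShares([
  { variantName: 'A', initialVariantWeight: 2.0 },
  { variantName: 'B', initialVariantWeight: 1.0 },
]);
// shares.A is 2/3 and shares.B is 1/3
```

Because only the ratio matters, weights of 2 and 1 produce the same split as weights of 20 and 10.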

Container Startup Health Check Timeout

You can specify how long your inference container has to pass its startup health check by configuring the containerStartupHealthCheckTimeout property. This is useful when your model takes a long time to initialize and you want to avoid premature health check failures:

import * as cdk from 'aws-cdk-lib';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [
    {
      model: model,
      variantName: 'my-variant',
      containerStartupHealthCheckTimeout: cdk.Duration.minutes(5), // 5 minutes timeout
    },
  ]
});

The timeout value must be between 60 seconds and 1 hour (3600 seconds). If not specified, Amazon SageMaker uses the default timeout behavior.
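
A range check like the one below can catch an out-of-range timeout before you deploy. It is only a sketch of the bounds stated above: isValidStartupHealthCheckTimeout and the two constants are hypothetical names, and the construct may perform its own validation.

```typescript
// Bounds taken from the text: 60 seconds up to 1 hour (3600 seconds).
const MIN_TIMEOUT_SECONDS = 60;
const MAX_TIMEOUT_SECONDS = 3600;

// Returns true when the timeout, expressed in seconds, lies within
// the documented inclusive range.
function isValidStartupHealthCheckTimeout(seconds: number): boolean {
  return seconds >= MIN_TIMEOUT_SECONDS && seconds <= MAX_TIMEOUT_SECONDS;
}

const fiveMinutesOk = isValidStartupHealthCheckTimeout(5 * 60);    // true
const thirtySecondsOk = isValidStartupHealthCheckTimeout(30);      // false
const twoHoursOk = isValidStartupHealthCheckTimeout(2 * 60 * 60);  // false
```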

Serverless Inference

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. For more information, see SageMaker Serverless Inference.

To create a serverless endpoint configuration, use the serverlessProductionVariant property:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});

Serverless inference is ideal for workloads with intermittent or unpredictable traffic patterns. You can configure:

maxConcurrency: Maximum concurrent invocations (1-200)
memorySizeInMB: Memory allocation in 1 GB increments (1024, 2048, 3072, 4096, 5120, or 6144 MB)
provisionedConcurrency: Optional pre-warmed capacity to reduce cold starts

Note: Provisioned concurrency incurs charges even when the endpoint is not processing requests. Use it only when you need to minimize cold start latency.

You cannot mix serverless and instance-based variants in the same endpoint configuration.
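
The limits listed above can be expressed as a small pre-flight check. The sketch below covers only the maxConcurrency range and the allowed memory sizes from this section. ServerlessSettings and validateServerlessSettings are hypothetical names, and the check makes no claim about any additional rules the service or the construct may enforce, such as limits on provisionedConcurrency.

```typescript
// Hypothetical shape mirroring the three serverless settings discussed above.
interface ServerlessSettings {
  maxConcurrency: number;
  memorySizeInMB: number;
  provisionedConcurrency?: number; // carried along but not validated here
}

// Memory sizes listed in the text: 1 GB steps from 1024 MB to 6144 MB.
const ALLOWED_MEMORY_SIZES_MB: number[] = [1024, 2048, 3072, 4096, 5120, 6144];

// Returns a list of human-readable problems; an empty list means the
// settings fall within the ranges given in this section.
function validateServerlessSettings(settings: ServerlessSettings): string[] {
  const problems: string[] = [];
  if (settings.maxConcurrency < 1 || settings.maxConcurrency > 200) {
    problems.push('maxConcurrency must be between 1 and 200');
  }
  if (ALLOWED_MEMORY_SIZES_MB.indexOf(settings.memorySizeInMB) === -1) {
    problems.push('memorySizeInMB must be one of ' + ALLOWED_MEMORY_SIZES_MB.join(', '));
  }
  return problems;
}

const validProblems = validateServerlessSettings({
  maxConcurrency: 10,
  memorySizeInMB: 2048,
  provisionedConcurrency: 5,
});
const invalidProblems = validateServerlessSettings({ maxConcurrency: 500, memorySizeInMB: 1500 });
// validProblems is empty; invalidProblems holds two messages
```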

Endpoint

When you create an endpoint from an EndpointConfig, Amazon SageMaker launches the ML compute instances and deploys the model or models as specified in the configuration. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint. For more information about the API, see the InvokeEndpoint API. Defining an endpoint requires at minimum the associated endpoint configuration:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const endpointConfig: sagemaker.EndpointConfig;

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });

AutoScaling

To enable autoscaling on the production variant, use the autoScaleInstanceCount method:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const variantName = 'my-variant';
const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [
    {
      model: model,
      variantName: variantName,
    },
  ]
});

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
const productionVariant = endpoint.findInstanceProductionVariant(variantName);
const instanceCount = productionVariant.autoScaleInstanceCount({
  maxCapacity: 3
});
instanceCount.scaleOnInvocations('LimitRPS', {
  maxRequestsPerSecond: 30,
});

For load testing guidance on determining the maximum requests per second per instance, see the Amazon SageMaker documentation on load testing your auto scaling configuration.
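
One way to reason about maxRequestsPerSecond is the target-tracking formula from the AWS load testing guidance. It multiplies the peak requests per second that one instance can sustain by a safety factor, then by 60 to obtain a per-minute invocation target per instance. The sketch below assumes a safety factor of 0.5 as a starting point. invocationsPerInstanceTarget is a hypothetical helper, and the sketch does not claim to reproduce how scaleOnInvocations derives its target internally.

```typescript
// Computes a per-instance, per-minute invocation target from a measured
// peak requests-per-second figure.
// Assumption: safetyFactor defaults to 0.5 and must lie in (0, 1].
function invocationsPerInstanceTarget(maxRequestsPerSecond: number, safetyFactor: number = 0.5): number {
  if (maxRequestsPerSecond <= 0) {
    throw new Error('maxRequestsPerSecond must be positive');
  }
  if (safetyFactor <= 0 || safetyFactor > 1) {
    throw new Error('safetyFactor must be greater than 0 and at most 1');
  }
  return maxRequestsPerSecond * safetyFactor * 60;
}

// With the value from the example above (30 requests per second):
const perMinuteTarget = invocationsPerInstanceTarget(30);
// perMinuteTarget is 900
```

A lower safety factor makes scaling out start earlier, which leaves more headroom at the cost of running more instances.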

Metrics

To monitor CloudWatch metrics for a production variant, use one or more of the metric convenience methods:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const endpointConfig: sagemaker.EndpointConfig;

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
const productionVariant = endpoint.findInstanceProductionVariant('my-variant');
productionVariant.metricModelLatency().createAlarm(this, 'ModelLatencyAlarm', {
  threshold: 100000,
  evaluationPeriods: 3,
});
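
The threshold in the example is easier to read once converted: SageMaker reports the ModelLatency metric in microseconds, so 100000 corresponds to 100 milliseconds. The conversion helpers below are hypothetical and exist only to make the unit explicit when choosing a threshold.

```typescript
const MICROSECONDS_PER_MILLISECOND = 1000;

// Converts a latency budget in milliseconds to the microsecond value
// expected by a ModelLatency alarm threshold.
function millisecondsToMicroseconds(milliseconds: number): number {
  return milliseconds * MICROSECONDS_PER_MILLISECOND;
}

// Converts a microsecond threshold back to milliseconds for readability.
function microsecondsToMilliseconds(microseconds: number): number {
  return microseconds / MICROSECONDS_PER_MILLISECOND;
}

const thresholdMicros = millisecondsToMicroseconds(100);        // 100000
const thresholdMillis = microsecondsToMilliseconds(100000);     // 100
```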
