Manually create tracking entities

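You can manually create tracking entities for any property to establish model governance, reproduce your workflow, and maintain a record of your work history. For information about the tracking entities that Amazon SageMaker AI creates automatically, see "Amazon SageMaker AI-Created Tracking Entities". The following tutorial shows the steps needed to manually create artifacts, associate them between a SageMaker training job and an endpoint, and then track the workflow.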

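You can add tags to all entities except associations. Tags are arbitrary key-value pairs that provide custom information. You can filter or sort a list or a search query by tags. For more information, see "Tagging AWS resources" in the AWS General Reference.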
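
As a rough sketch of the tagging described above, the helper below converts a plain dictionary into the Key/Value list used by the boto3 `add_tags` call and attaches it to an entity by ARN. The function names `to_tag_list` and `tag_entity` are hypothetical, and the client is passed in as a parameter so that the helper works with any object exposing `add_tags`.

```python
def to_tag_list(tags):
    """Convert a dict into the [{'Key': ..., 'Value': ...}] shape used by AWS tagging APIs."""
    return [{"Key": str(key), "Value": str(value)} for key, value in sorted(tags.items())]


def tag_entity(sagemaker_client, resource_arn, tags):
    """Attach tags to a lineage entity (artifact, action, or context) identified by its ARN.

    Hypothetical helper: `sagemaker_client` is expected to be a boto3 SageMaker client.
    """
    tag_list = to_tag_list(tags)
    sagemaker_client.add_tags(ResourceArn=resource_arn, Tags=tag_list)
    return tag_list
```

For instance, `tag_entity(sagemaker_client, code_location_arn, {"project": "mnist"})` would tag the source-code artifact created later in this tutorial; the tag key and value here are placeholders.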

For a sample notebook that shows how to create lineage entities, see the "Amazon SageMaker AI Lineage" notebook in the Amazon SageMaker examples GitHub repository.

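Manually create entities

The following procedure shows how to create artifacts and associate them between a SageMaker AI training job and an endpoint. Perform the following steps.

To import tracking entities and associations

Import the lineage tracking entities.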


import sys
!{sys.executable} -m pip install -q sagemaker

from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.lineage import context, artifact, association, action

import boto3
region = boto3.Session().region_name  # default Region from your AWS configuration
boto_session = boto3.Session(region_name=region)
sagemaker_client = boto_session.client("sagemaker")

Create the input and output artifacts.


code_location_arn = artifact.Artifact.create(
    artifact_name='source-code-location',
    source_uri='s3://...',
    artifact_type='code-location'
).artifact_arn

# Similar constructs for train_data_location_arn and test_data_location_arn

model_location_arn = artifact.Artifact.create(
    artifact_name='model-location',
    source_uri='s3://...',
    artifact_type='model-location'
).artifact_arn

Train the model and get the trial_component_arn that represents the training job.
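
The step above does not show how to obtain the ARN. One possible approach, sketched here under the assumption that SageMaker AI has automatically created a trial component whose source is the training job, is to look it up with the boto3 `list_trial_components` call filtered by `SourceArn`. The helper name `find_trial_component_arn` is hypothetical.

```python
def find_trial_component_arn(sagemaker_client, training_job_arn):
    """Return the ARN of the first trial component whose source is the training job.

    Returns None when no trial component is found. Only the first page of
    results is inspected, which is assumed to be enough for a single job.
    """
    response = sagemaker_client.list_trial_components(SourceArn=training_job_arn)
    summaries = response.get("TrialComponentSummaries", [])
    if not summaries:
        return None
    return summaries[0]["TrialComponentArn"]
```

Given a training job name (a placeholder here), the job ARN is available from `sagemaker_client.describe_training_job(TrainingJobName=training_job_name)["TrainingJobArn"]`, and `trial_component_arn = find_trial_component_arn(sagemaker_client, training_job_arn)` then supplies the value used in the next step.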

Associate the input and output artifacts with the training job (trial component).


import logging

input_artifacts = [code_location_arn, train_data_location_arn, test_data_location_arn]
for artifact_arn in input_artifacts:
    try:
        association.Association.create(
            source_arn=artifact_arn,
            destination_arn=trial_component_arn,
            association_type='ContributedTo'
        )
    except Exception:
        # Assumed cause of the failure: the association already exists.
        logging.info('association between %s and %s already exists', artifact_arn, trial_component_arn)

output_artifacts = [model_location_arn]
for artifact_arn in output_artifacts:
    try:
        association.Association.create(
            source_arn=trial_component_arn,
            destination_arn=artifact_arn,
            association_type='Produced'
        )
    except Exception:
        # Assumed cause of the failure: the association already exists.
        logging.info('association between %s and %s already exists', trial_component_arn, artifact_arn)
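
Catching every exception can hide failures that have nothing to do with duplicates. As an alternative sketch, the hypothetical helper `ensure_association` below first checks for an existing link with the boto3 `list_associations` call and only then calls `add_association`. It assumes a boto3 SageMaker client is passed in and does not guard against two callers creating the same association at the same moment.

```python
def ensure_association(sagemaker_client, source_arn, destination_arn, association_type):
    """Create an association only if none already links source to destination.

    Returns True when a new association was created, False when one existed.
    """
    existing = sagemaker_client.list_associations(
        SourceArn=source_arn, DestinationArn=destination_arn
    ).get("AssociationSummaries", [])
    if existing:
        return False
    sagemaker_client.add_association(
        SourceArn=source_arn,
        DestinationArn=destination_arn,
        AssociationType=association_type,
    )
    return True
```

With this helper, each loop body above could become a single call such as `ensure_association(sagemaker_client, artifact_arn, trial_component_arn, "ContributedTo")`.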

Create the inference endpoint.


predictor = mnist_estimator.deploy(initial_instance_count=1,
                                     instance_type='ml.m4.xlarge')

Create the endpoint context.


from sagemaker.lineage import context

endpoint = sagemaker_client.describe_endpoint(EndpointName=predictor.endpoint_name)
endpoint_arn = endpoint['EndpointArn']

endpoint_context_arn = context.Context.create(
    context_name=predictor.endpoint_name,
    context_type='Endpoint',
    source_uri=endpoint_arn
).context_arn

Associate the training job (trial component) with the endpoint context.
```
association.Association.create(
    source_arn=trial_component_arn,
    destination_arn=endpoint_context_arn
)
```

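Manually track a workflow

You can manually track the workflow created in the previous section.

Given the Amazon Resource Name (ARN) of the endpoint from the previous example, the following procedure shows how to track the workflow back to the datasets used to train the model that was deployed to the endpoint. Perform the following steps.

To track a workflow from the endpoint to the training data source

Import the tracking entities.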


import sys
!{sys.executable} -m pip install -q sagemaker

from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.lineage import context, artifact, association, action

import boto3
region = boto3.Session().region_name  # default Region from your AWS configuration
boto_session = boto3.Session(region_name=region)
sagemaker_client = boto_session.client("sagemaker")

Get the endpoint context from the endpoint ARN.


endpoint_context_arn = sagemaker_client.list_contexts(
    SourceUri=endpoint_arn)['ContextSummaries'][0]['ContextArn']

Get the trial component from the association between the trial component and the endpoint context.
```
trial_component_arn = sagemaker_client.list_associations(
    DestinationArn=endpoint_context_arn)['AssociationSummaries'][0]['SourceArn']
```
Get the training data location artifact from the association between the training data location artifact and the trial component.
```
train_data_location_artifact_arn = sagemaker_client.list_associations(
    DestinationArn=trial_component_arn, SourceType='Model')['AssociationSummaries'][0]['SourceArn']
```

Get the training data location from the training data location artifact.


train_data_location = sagemaker_client.describe_artifact(
    ArtifactArn=train_data_location_artifact_arn)['Source']['SourceUri']
print(train_data_location)

Response:


s3://sagemaker-sample-data-us-east-2/mxnet/mnist/train
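
The individual lookups above index the first result directly, which raises an `IndexError` when nothing is found. The sketch below chains the same boto3 calls (`list_contexts`, `list_associations`, `describe_artifact`) into one hypothetical function, `trace_training_inputs`, with explicit checks. It assumes that artifact ARNs contain `:artifact/`, that the first context and first upstream association are the ones of interest, and that a single page of results is sufficient.

```python
def trace_training_inputs(sagemaker_client, endpoint_arn):
    """Walk from an endpoint ARN back to the source URIs of the artifacts
    that contributed to the trial component deployed to that endpoint."""
    contexts = sagemaker_client.list_contexts(
        SourceUri=endpoint_arn
    ).get("ContextSummaries", [])
    if not contexts:
        raise LookupError("no context found for endpoint " + endpoint_arn)
    endpoint_context_arn = contexts[0]["ContextArn"]

    upstream = sagemaker_client.list_associations(
        DestinationArn=endpoint_context_arn
    ).get("AssociationSummaries", [])
    if not upstream:
        raise LookupError("no entity is associated with " + endpoint_context_arn)
    trial_component_arn = upstream[0]["SourceArn"]

    inputs = sagemaker_client.list_associations(
        DestinationArn=trial_component_arn
    ).get("AssociationSummaries", [])

    source_uris = []
    for summary in inputs:
        source_arn = summary["SourceArn"]
        # Assumption: only artifacts can be described with describe_artifact,
        # and their ARNs contain ":artifact/".
        if ":artifact/" not in source_arn:
            continue
        artifact = sagemaker_client.describe_artifact(ArtifactArn=source_arn)
        source_uris.append(artifact["Source"]["SourceUri"])
    return source_uris
```

Calling `trace_training_inputs(sagemaker_client, endpoint_arn)` returns every input artifact location, so the code location and test data location appear alongside the training data location.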

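Limits

You can create an association between any entities, experiment and lineage, with the following exceptions:

You can't create an association between two experiment entities. Experiment entities consist of experiments, trials, and trial components.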
You can't create an association with another association.

An error occurs if you try to create an entity that already exists.
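
The restriction on experiment entities can be checked before calling the API. The sketch below classifies an entity by the resource type in its ARN; the resource-type names `experiment`, `experiment-trial`, and `experiment-trial-component` are assumptions about the ARN format, and the function names are hypothetical.

```python
# Assumed ARN resource types for experiments, trials, and trial components.
EXPERIMENT_RESOURCE_TYPES = ("experiment", "experiment-trial", "experiment-trial-component")


def resource_type(arn):
    """Extract the resource type from arn:partition:service:region:account:type/id."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or "/" not in parts[5]:
        raise ValueError("unexpected ARN format: " + arn)
    return parts[5].split("/", 1)[0]


def is_experiment_entity(arn):
    """True when the ARN names an experiment, trial, or trial component."""
    return resource_type(arn) in EXPERIMENT_RESOURCE_TYPES


def can_associate(source_arn, destination_arn):
    """False when both ends are experiment entities, which is not allowed."""
    return not (is_experiment_entity(source_arn) and is_experiment_entity(destination_arn))
```

Under these assumptions, a trial component paired with an endpoint context passes the check, while a trial component paired with an experiment does not.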

Maximum number of manually created lineage entities

Actions: 3,000
Artifacts: 6,000
Associations: 6,000
Contexts: 500

There is no limit on the number of lineage entities that Amazon SageMaker AI creates automatically.
