This page is a machine translation of the English version. If there is any ambiguity or inconsistency, the English version prevails.
Querying lineage entities
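Amazon SageMaker AI automatically generates graphs of lineage entities as you use it. You can query this data to answer a variety of questions. The following describes how to query this data with the SageMaker Python SDK.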
For information about how to view registered model lineage details in Amazon SageMaker Studio, see View model lineage details in Studio.
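You can query lineage entities to:
- Retrieve all datasets that went into the creation of a model.
- Retrieve all jobs that went into the creation of an endpoint.
- Retrieve all models that use a dataset.
- Retrieve all endpoints that use a model.
- Retrieve the endpoints that are derived from a particular dataset.
- Retrieve the pipeline execution that created a training job.
- Retrieve the relationships between entities for investigation, governance, and reproducibility.
- Retrieve all downstream trials that used an artifact.
- Retrieve all upstream trials that used an artifact.
- Retrieve a list of artifacts that use a provided S3 URI.
- Retrieve the upstream artifacts that use a dataset artifact.
- Retrieve the downstream artifacts that use a dataset artifact.
- Retrieve the datasets that use an image artifact.
- Retrieve the actions that use a context.
- Retrieve the processing jobs that use an endpoint.
- Retrieve the transform jobs that use an endpoint.
- Retrieve the trial components that use an endpoint.
- Retrieve the ARN of the pipeline execution associated with a model package group.
- Retrieve all artifacts that use an action.
- Retrieve all upstream datasets that use a model package approval action.
- Retrieve the model package from a model package approval action.
- Retrieve the downstream endpoint contexts that use an endpoint.
- Retrieve the ARN of the pipeline execution associated with a trial component.
- Retrieve the datasets that use a trial component.
- Retrieve the models that use a trial component.
- Explore your lineage for visualization.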
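Limitations
- Lineage querying is not available in the following Regions:
  - Africa (Cape Town) – af-south
  - Asia Pacific (Jakarta) – ap-southeast-3
  - Asia Pacific (Osaka) – ap-northeast-3
  - Europe (Milan) – eu-south-1
  - Europe (Spain) – eu-south-2
  - Israel (Tel Aviv) – il-central-1
- The maximum depth of relationships to discover is currently limited to 10.
- Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.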
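To make the depth limit concrete, here is a toy breadth-first traversal over an in-memory graph that stops expanding once a node is 10 hops from the start. It is only an illustration that assumes "depth" means hop count from the start ARN; `reachable_within`, `MAX_DEPTH`, and the node names are hypothetical, and the real traversal runs inside the service.

```python
from collections import deque
from typing import Dict, List

# Hypothetical constant mirroring the documented limit; the service enforces the real one.
MAX_DEPTH = 10


def reachable_within(graph: Dict[str, List[str]], start: str, max_depth: int = MAX_DEPTH) -> Dict[str, int]:
    """Breadth-first search that records each node's hop distance and stops at max_depth."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    depths.pop(start)  # report only the discovered entities, not the start node
    return depths


# A hypothetical 12-hop chain: n0 -> n1 -> ... -> n12
chain = {f"n{i}": [f"n{i + 1}"] for i in range(12)}
found = reachable_within(chain, "n0")
print(sorted(found, key=found.get))  # n1 through n10; n11 and n12 lie beyond the limit
```

Under this assumption, entities more than 10 hops away are never reported, so a long chain may need a second query that starts from an intermediate ARN.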
Get started querying lineage entities
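The easiest ways to get started querying lineage entities are:
- The Amazon SageMaker AI SDK for Python, which defines many common use cases.
- For a notebook that demonstrates how to use the SageMaker AI Lineage APIs to query relationships across the lineage graph, see sagemaker-lineage-multihop-queries.ipynb.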
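The following examples show how to use the LineageQuery and LineageFilter APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.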
Example: Using the LineageQuery API to find entity associations
import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact
from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Session and pretty-printer shared by all of the examples that follow.
sagemaker_session = sagemaker.Session()
pp = pprint.PrettyPrinter()

# Find the endpoint context and model artifact that should be used for the lineage queries.
# `endpoint_arn` is the ARN of an endpoint you have already deployed.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
Example: Find all the datasets associated with an endpoint
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all datasets.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
Example: Find the models associated with an endpoint
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all models.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
Example: Find the trial components associated with an endpoint
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all training job trial components.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
Example: Changing the focal point of lineage
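You can change the focal point of lineage by passing different start_arns to LineageQuery. In addition, LineageFilter can take multiple sources and entities to expand the scope of the query.
The following example uses the model as the lineage focal point and finds the endpoints and datasets associated with it.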
# Get the ModelArtifact. `model_package_arn` is the ARN of a model package you have already registered.
model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
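To see why adding a second source widens the result, the following offline sketch applies a filter to hypothetical vertex records shaped like the `Vertices` entries (`Arn`, `Type`, `LineageType`) of a `query_lineage` response. It assumes a vertex is kept when its lineage entity type is in `entities` and its source type is in `sources`; `apply_filter` and the sample ARNs are made up for illustration and are not the service's implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical vertices; real responses come from the service.
vertices: List[Dict[str, str]] = [
    {"Arn": "arn:example:dataset", "Type": "DataSet", "LineageType": "Artifact"},
    {"Arn": "arn:example:model", "Type": "Model", "LineageType": "Artifact"},
    {"Arn": "arn:example:endpoint", "Type": "Endpoint", "LineageType": "Artifact"},
    {"Arn": "arn:example:training-job", "Type": "TrainingJob", "LineageType": "TrialComponent"},
]


def apply_filter(records: Iterable[Dict[str, str]], entities: set, sources: set) -> List[str]:
    """Keep a vertex only if both its entity type and its source type are allowed."""
    return [
        v["Arn"]
        for v in records
        if v["LineageType"] in entities and v["Type"] in sources
    ]


narrow = apply_filter(vertices, {"Artifact"}, {"Endpoint"})
wide = apply_filter(vertices, {"Artifact"}, {"Endpoint", "DataSet"})
print(narrow)  # ['arn:example:endpoint']
print(wide)    # ['arn:example:dataset', 'arn:example:endpoint']
```

Values within one list act as alternatives in this sketch, while the entity and source conditions must both hold, which is why the training job trial component never matches an `Artifact`-only filter.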
Example: Using LineageQueryDirectionEnum.BOTH to find ascendant and descendant relationships
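When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal happens not only from the starting node but also from every node it visits. For example, if a training job runs twice and both models it produces are deployed to endpoints, a query with the direction set to BOTH shows both endpoints. This is because the same image is used to train and deploy the models. Because the model image is shared, the start_arn and both endpoints appear in the query result.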
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
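The shared-image behavior described above can be reproduced on a toy graph. In this sketch (all node names and the `traverse` helper are hypothetical, and edges point from source to destination) one image feeds two models, each deployed to its own endpoint. Walking only downstream from `model-1` finds one endpoint, while expanding in both directions at every visited node climbs to the image and then back down to the second endpoint.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source -> destination) with a shared image.
EDGES = [
    ("image", "model-1"),
    ("image", "model-2"),
    ("model-1", "endpoint-1"),
    ("model-2", "endpoint-2"),
]


def traverse(edges, start, direction):
    """Return nodes reachable from start; BOTH expands up and down at every visited node."""
    down, up = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        down[src].append(dst)
        up[dst].append(src)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        neighbors = []
        if direction in ("DESCENDANTS", "BOTH"):
            neighbors += down[node]
        if direction in ("ASCENDANTS", "BOTH"):
            neighbors += up[node]
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # report only the entities discovered from the start node
    return seen


endpoints_down = sorted(n for n in traverse(EDGES, "model-1", "DESCENDANTS") if n.startswith("endpoint"))
endpoints_both = sorted(n for n in traverse(EDGES, "model-1", "BOTH") if n.startswith("endpoint"))
print(endpoints_down)  # ['endpoint-1']
print(endpoints_both)  # ['endpoint-1', 'endpoint-2']
```

The second endpoint shows up only because the traversal re-expands in both directions at the image node, which matches the behavior the section describes for BOTH.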
Example: ASCENDANTS and DESCENDANTS directions in LineageQuery
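To understand direction in the lineage graph, consider the following entity relationship graph: Dataset -> Training Job -> Model -> Endpoint.
The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. Use the direction parameter to specify whether the query returns the descendants or the ascendants of the entities in start_arns. If start_arns contains a model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the dataset.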
# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
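The direction rule for the chain Dataset -> Training Job -> Model -> Endpoint can be checked with a few lines of plain Python. This is a hypothetical in-memory model, not the SDK: `descendants` follows edges forward, and `ascendants` follows the same edges backward by inverting the graph first.

```python
# Hypothetical chain from the text: Dataset -> Training Job -> Model -> Endpoint.
PARENT_TO_CHILD = {
    "dataset": ["training-job"],
    "training-job": ["model"],
    "model": ["endpoint"],
}


def descendants(graph, node):
    """Collect every node reachable by following edges forward (source -> destination)."""
    result = []
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(graph.get(current, []))
    return result


def ascendants(graph, node):
    """Collect every node reachable by following edges backward."""
    inverted = {}
    for src, dsts in graph.items():
        for dst in dsts:
            inverted.setdefault(dst, []).append(src)
    return descendants(inverted, node)


print(descendants(PARENT_TO_CHILD, "model"))  # ['endpoint']
print(ascendants(PARENT_TO_CHILD, "model"))   # ['training-job', 'dataset']
```

Starting from the model, DESCENDANTS yields only the endpoint and ASCENDANTS yields the training job and the dataset, mirroring what the query above returns for a real model artifact.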
Example: SDK helper functions that make lineage queries easier
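The EndpointContext, ModelArtifact, and DatasetArtifact classes have helper functions that wrap the LineageQuery API to make certain lineage queries easier to use. The following example shows how to use these helper functions.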
# Find all the datasets associated with this endpoint
datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any).
# pipeline_execution_arn() returns a single ARN string, or None when no pipeline execution is found.
pipeline_execution = endpoint_context.pipeline_execution_arn()
if pipeline_execution:
    print(pipeline_execution)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model
dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset. `training_data` is the S3 URI of the training dataset.
dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
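The idea of a helper that wraps a lineage query can be sketched without AWS access. The class below is hypothetical and is not how the SDK's `EndpointContext` is implemented: it fixes the direction and source filter for each named method and delegates to an injected query function, here a fake that returns made-up S3 URIs so the sketch runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical query signature: (start_arn, direction, sources) -> list of source URIs.
QueryFn = Callable[[str, str, List[str]], List[str]]


class EndpointLineage:
    """Hypothetical wrapper that hides direction and filter details behind named helpers."""

    def __init__(self, endpoint_arn: str, query: QueryFn):
        self.endpoint_arn = endpoint_arn
        self._query = query

    def dataset_uris(self) -> List[str]:
        # Datasets sit upstream of an endpoint, so ask for ascendants of type DataSet.
        return self._query(self.endpoint_arn, "ASCENDANTS", ["DataSet"])

    def model_uris(self) -> List[str]:
        # Models are also upstream of the endpoint.
        return self._query(self.endpoint_arn, "ASCENDANTS", ["Model"])


# Fake results standing in for a real LineageQuery call (made-up data).
FAKE_RESULTS: Dict[Tuple[str, str], List[str]] = {
    ("ASCENDANTS", "DataSet"): ["s3://example-bucket/train.csv"],
    ("ASCENDANTS", "Model"): ["s3://example-bucket/model.tar.gz"],
}


def fake_query(start_arn: str, direction: str, sources: List[str]) -> List[str]:
    """Look up canned URIs for each requested source type."""
    uris: List[str] = []
    for source in sources:
        uris.extend(FAKE_RESULTS.get((direction, source), []))
    return uris


lineage = EndpointLineage("arn:example:endpoint/demo", fake_query)
print(lineage.dataset_uris())
print(lineage.model_uris())
```

Injecting the query function keeps each helper a one-line statement of intent (which direction, which source type) and makes the wrapper testable without a live SageMaker session.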
Example: Getting a lineage graph visualization
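The sample notebook provides a helper class, Visualizer, in visualizer.py to help plot the lineage graph. When the query response is rendered, a graph of the lineage relationships from StartArns is displayed. Starting from StartArns, the visualization shows the relationships with the other lineage entities returned by the query_lineage API action.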
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.
import boto3

from visualizer import Visualizer  # helper class from the sample notebook's visualizer.py

sm_client = boto3.client("sagemaker")

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)
viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
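If you prefer not to depend on visualizer.py, the `Vertices` and `Edges` of a `query_lineage` response made with `IncludeEdges=True` can be turned into Graphviz DOT text using only the standard library. This is a sketch: `to_dot` is a hypothetical helper, and `sample` is made-up data shaped like the response fields `Arn`, `Type`, `LineageType`, `SourceArn`, `DestinationArn`, and `AssociationType`.

```python
def to_dot(response: dict) -> str:
    """Render a query_lineage-style response as Graphviz DOT text."""
    lines = ["digraph lineage {"]
    for vertex in response.get("Vertices", []):
        # Label each node with its source type and ARN on two lines ("\n" in DOT).
        label = f'{vertex.get("Type", "?")}\\n{vertex["Arn"]}'
        lines.append(f'  "{vertex["Arn"]}" [label="{label}"];')
    for edge in response.get("Edges", []):
        assoc = edge.get("AssociationType", "")
        lines.append(f'  "{edge["SourceArn"]}" -> "{edge["DestinationArn"]}" [label="{assoc}"];')
    lines.append("}")
    return "\n".join(lines)


# Made-up response for illustration only.
sample = {
    "Vertices": [
        {"Arn": "arn:example:dataset", "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:example:endpoint", "Type": "Endpoint", "LineageType": "Context"},
    ],
    "Edges": [
        {
            "SourceArn": "arn:example:dataset",
            "DestinationArn": "arn:example:endpoint",
            "AssociationType": "ContributedTo",
        },
    ],
}

dot = to_dot(sample)
print(dot)
```

You could save the text to a file such as lineage.dot and render it with the Graphviz command line, for example `dot -Tpng lineage.dot -o lineage.png`.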