End of support notice: On October 31, 2025, AWS will discontinue support for Amazon Lookout for Vision. After October 31, 2025, you will no longer be able to access the Lookout for Vision console or Lookout for Vision resources. For more information, see this blog post.


Exporting datasets from a project (SDK)

You can use the AWS SDK to export datasets from an Amazon Lookout for Vision project to an Amazon S3 bucket location.

Exporting a dataset lets you do tasks such as creating a Lookout for Vision project from a copy of the source project's datasets. You can also create a snapshot of the datasets used for a specific version of a model.

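The Python code in this procedure exports the training dataset of a project (the manifest and the dataset images) to a destination Amazon S3 location that you specify. If the project has a test dataset, the code also exports the test dataset manifest and dataset images. The destination can be in the same Amazon S3 bucket as the source project or in a different Amazon S3 bucket. The code uses the ListDatasetEntries operation to get the dataset manifest files. Amazon S3 operations then copy the dataset images and the updated manifest files to the destination Amazon S3 location.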
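The manifest update described above can be illustrated without any AWS calls. The following is a minimal sketch, assuming the source-ref field layout that the export script in this procedure relies on. The function name remap_source_ref, the sample bucket names, and the sample anomaly-label value are hypothetical and exist only for illustration.

```python
import json


def remap_source_ref(entry, dataset_type, destination):
    """Return a copy of a manifest JSON line that points at the export destination."""
    entry_json = json.loads(entry)
    # Split "s3://bucket/key" into bucket and key. The full key is kept so that
    # images with the same file name in different folders don't collide.
    _, key = entry_json["source-ref"].replace("s3://", "").split("/", 1)
    entry_json["source-ref"] = f"{destination}{dataset_type}/images/{key}"
    return json.dumps(entry_json)


# Hypothetical manifest line; real entries carry more fields.
line = '{"source-ref": "s3://source-bucket/assets/img1.jpg", "anomaly-label": 1}'
print(remap_source_ref(line, "train", "s3://bucket/path/"))
```

Only the image location changes. Every other field in the JSON line is carried over unchanged, which is why the exported manifest can be used to create a dataset in a new project.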

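This procedure shows how to export the datasets of a project. It also shows how to create a new project with the exported datasets.

To export the datasets from a project (SDK)

If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.
Determine the destination Amazon S3 path for the dataset export. Make sure that the destination is in an AWS Region that Amazon Lookout for Vision supports. To create a new Amazon S3 bucket, see Creating a bucket.
Make sure that the user has access permissions to the destination Amazon S3 path for the dataset export and to the S3 locations of the image files in the source project datasets. You can use the following policy, which allows the image files to be in any location. Replace bucket/path with the destination bucket and path for the dataset export.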
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PutExports",
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectTagging",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::bucket/path/*"
        },
        {
            "Sid": "GetSourceRefs",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectTagging",
                "s3:GetObjectVersion"
            ],
            "Resource": "*"
        }
    ]
}
```
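To provide access, add permissions to your users, groups, or roles:
- Users and groups in AWS IAM Identity Center:
  
  Create a permission set. Follow the instructions in Create a permission set in the AWS IAM Identity Center User Guide.
- Users managed in IAM through an identity provider:
  
  Create a role for identity federation. Follow the instructions in Create a role for a third-party identity provider (federation) in the IAM User Guide.
- IAM users:
  - Create a role that your user can assume. Follow the instructions in Create a role for an IAM user in the IAM User Guide.
  - (Not recommended) Attach a policy directly to a user or add a user to a user group. Follow the instructions in Adding permissions to a user (console) in the IAM User Guide.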
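If you export to several destinations, it can help to generate the policy document rather than edit it by hand. The following is a minimal sketch that reproduces the policy from this step for a given bucket and path. The function name build_export_policy is hypothetical, and the sketch only builds the JSON document; it doesn't attach the policy to any user, group, or role.

```python
import json


def build_export_policy(bucket, path):
    """Build the export policy document for a destination bucket and path."""
    path = path.strip("/")
    # With an empty path, grant write access to the whole bucket.
    prefix = f"{bucket}/{path}" if path else bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PutExports",
                "Effect": "Allow",
                "Action": ["s3:PutObjectTagging", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{prefix}/*",
            },
            {
                "Sid": "GetSourceRefs",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersion",
                ],
                # Source images can be in any location.
                "Resource": "*",
            },
        ],
    }


print(json.dumps(build_export_policy("bucket", "path"), indent=4))
```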

Save the following code to a file named dataset_export.py.



```python
"""
Purpose

Shows how to export the datasets (manifest files and images)
from an Amazon Lookout for Vision project to a new Amazon
S3 location.
"""

import argparse
import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

# Amazon S3 reports a missing object as "404" (HeadObject) or as
# "NoSuchKey" (for example, when the source of a copy doesn't exist).
MISSING_OBJECT_CODES = ("404", "NoSuchKey")


def copy_file(s3_resource, source_file, destination_file):
    """
    Copies a file from a source Amazon S3 folder to a destination
    Amazon S3 folder.
    The destination can be in a different S3 bucket.
    :param s3_resource: An Amazon S3 Boto3 resource.
    :param source_file: The Amazon S3 path to the source file.
    :param destination_file: The destination Amazon S3 path for
    the copy operation.
    """

    source_bucket, source_key = source_file.replace("s3://", "").split("/", 1)
    destination_bucket, destination_key = destination_file.replace("s3://", "").split(
        "/", 1
    )

    try:
        bucket = s3_resource.Bucket(destination_bucket)
        dest_object = bucket.Object(destination_key)
        dest_object.copy_from(CopySource={"Bucket": source_bucket, "Key": source_key})
        dest_object.wait_until_exists()
        logger.info("Copied %s to %s", source_file, destination_file)
    except ClientError as error:
        if error.response["Error"]["Code"] in MISSING_OBJECT_CODES:
            error_message = (
                f"Failed to copy {source_file} to "
                f"{destination_file}. : {error.response['Error']['Message']}"
            )
            logger.warning(error_message)
            # Pass the more descriptive message on to the caller.
            error.response["Error"]["Message"] = error_message
        raise


def upload_manifest_file(s3_resource, manifest_file, destination):
    """
    Uploads a manifest file to a destination Amazon S3 folder.
    :param s3_resource: An Amazon S3 Boto3 resource.
    :param manifest_file: The manifest file that you want to upload.
    :param destination: The Amazon S3 folder location to upload the manifest
    file to.
    """

    destination_bucket, destination_key = destination.replace("s3://", "").split("/", 1)

    bucket = s3_resource.Bucket(destination_bucket)
    obj = bucket.Object(destination_key + manifest_file)

    try:
        # The context manager closes the local file even if the upload fails.
        with open(manifest_file, "rb") as put_data:
            obj.put(Body=put_data)
        obj.wait_until_exists()
        logger.info("Put manifest file '%s' to bucket '%s'.", obj.key, obj.bucket_name)
    except ClientError:
        logger.exception(
            "Couldn't put manifest file '%s' to bucket '%s'.", obj.key, obj.bucket_name
        )
        raise


def get_dataset_types(lookoutvision_client, project):
    """
    Determines the types of the datasets (train or test) in an
    Amazon Lookout for Vision project.
    :param lookoutvision_client: A Lookout for Vision Boto3 client.
    :param project: The Lookout for Vision project that you want to check.
    :return: The dataset types in the project.
    """

    try:
        response = lookoutvision_client.describe_project(ProjectName=project)

        datasets = []

        # Only export datasets that have finished creating or updating.
        for dataset in response["ProjectDescription"]["Datasets"]:
            if dataset["Status"] in ("CREATE_COMPLETE", "UPDATE_COMPLETE"):
                datasets.append(dataset["DatasetType"])
        return datasets

    except lookoutvision_client.exceptions.ResourceNotFoundException:
        logger.exception("Project %s not found.", project)
        raise


def process_json_line(s3_resource, entry, dataset_type, destination):
    """
    Creates a JSON line for a new manifest file, copies image and mask to
    destination.
    :param s3_resource: An Amazon S3 Boto3 resource.
    :param entry: A JSON line from the manifest file.
    :param dataset_type: The type (train or test) of the dataset that
    you want to create the manifest file for.
    :param destination: The destination Amazon S3 folder for the manifest
    file and dataset images.
    :return: A JSON line with details for the destination location.
    """
    entry_json = json.loads(entry)

    print(f"source: {entry_json['source-ref']}")

    # Use existing folder paths to ensure console added image names don't clash.
    bucket, key = entry_json["source-ref"].replace("s3://", "").split("/", 1)
    logger.info("Source location: %s/%s", bucket, key)

    destination_image_location = destination + dataset_type + "/images/" + key

    copy_file(s3_resource, entry_json["source-ref"], destination_image_location)

    # Update JSON for writing.
    entry_json["source-ref"] = destination_image_location

    # Segmentation datasets also reference a mask image for each entry.
    if "anomaly-mask-ref" in entry_json:
        source_anomaly_ref = entry_json["anomaly-mask-ref"]
        _, mask_key = source_anomaly_ref.replace("s3://", "").split("/", 1)

        destination_mask_location = destination + dataset_type + "/masks/" + mask_key
        entry_json["anomaly-mask-ref"] = destination_mask_location

        copy_file(s3_resource, source_anomaly_ref, entry_json["anomaly-mask-ref"])

    return entry_json


def write_manifest_file(
    lookoutvision_client, s3_resource, project, dataset_type, destination
):
    """
    Creates a manifest file for a dataset. Copies the manifest file and
    dataset images (and masks, if present) to the specified Amazon S3 destination.
    :param lookoutvision_client: A Lookout for Vision Boto3 client.
    :param s3_resource: An Amazon S3 Boto3 resource.
    :param project: The Lookout for Vision project that you want to use.
    :param dataset_type: The type (train or test) of the dataset that
    you want to create the manifest file for.
    :param destination: The destination Amazon S3 folder for the manifest file
    and dataset images.
    """

    try:
        # Create a reusable Paginator
        paginator = lookoutvision_client.get_paginator("list_dataset_entries")

        # Create a PageIterator from the Paginator
        page_iterator = paginator.paginate(
            ProjectName=project,
            DatasetType=dataset_type,
            PaginationConfig={"PageSize": 100},
        )

        output_manifest_file = dataset_type + ".manifest"

        # Create manifest file then upload to Amazon S3 with images.
        with open(output_manifest_file, "w", encoding="utf-8") as manifest_file:
            for page in page_iterator:
                for entry in page["DatasetEntries"]:
                    try:
                        entry_json = process_json_line(
                            s3_resource, entry, dataset_type, destination
                        )

                        manifest_file.write(json.dumps(entry_json) + "\n")

                    except ClientError as error:
                        # Skip entries whose image or mask no longer exists.
                        if error.response["Error"]["Code"] in MISSING_OBJECT_CODES:
                            print(error.response["Error"]["Message"])
                            print(f"Excluded JSON line: {entry}")
                        else:
                            raise
        upload_manifest_file(
            s3_resource, output_manifest_file, destination + "datasets/"
        )

    except ClientError:
        logger.exception("Problem getting dataset_entries")
        raise


def export_datasets(lookoutvision_client, s3_resource, project, destination):
    """
    Exports the datasets from an Amazon Lookout for Vision project to a specified
    Amazon S3 destination.
    :param lookoutvision_client: A Lookout for Vision Boto3 client.
    :param s3_resource: An Amazon S3 Boto3 resource.
    :param project: The Lookout for Vision project that you want to use.
    :param destination: The destination Amazon S3 folder for the exported datasets.
    """
    # Add trailing slash, if missing.
    destination = destination if destination.endswith("/") else destination + "/"

    print(f"Exporting project {project} datasets to {destination}.")

    # Get each dataset and export to destination.

    dataset_types = get_dataset_types(lookoutvision_client, project)
    for dataset in dataset_types:
        logger.info("Copying %s dataset to %s.", dataset, destination)

        write_manifest_file(
            lookoutvision_client, s3_resource, project, dataset, destination
        )

    print("Exported dataset locations")
    for dataset in dataset_types:
        print(f"   {dataset}: {destination}datasets/{dataset}.manifest")

    print("Done.")


def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument("project", help="The project that contains the dataset.")
    parser.add_argument("destination", help="The destination Amazon S3 folder.")


def main():
    """
    Exports the datasets from an Amazon Lookout for Vision project to a
    destination Amazon S3 location.
    """
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
    add_arguments(parser)

    args = parser.parse_args()

    try:
        session = boto3.Session(profile_name="lookoutvision-access")
        lookoutvision_client = session.client("lookoutvision")
        s3_resource = session.resource("s3")

        export_datasets(
            lookoutvision_client, s3_resource, args.project, args.destination
        )
    except ClientError as err:
        logger.exception(err)
        print(f"Failed: {err}")


if __name__ == "__main__":
    main()
```

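Run the code. Supply the following command line arguments:
- project: The name of the source project that contains the datasets that you want to export.
- destination: The destination Amazon S3 path for the datasets.
For example, python dataset_export.py myproject s3://bucket/path/
Note the manifest file locations that the code displays. You need them in step 8.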
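The manifest locations follow a fixed pattern, so you can also derive them from the destination path. The following is a minimal sketch that mirrors the naming used by dataset_export.py (a datasets/ folder under the destination, with one manifest per dataset type). The function name manifest_locations is hypothetical.

```python
def manifest_locations(destination, dataset_types):
    """Map each dataset type to the S3 location of its exported manifest file."""
    # The export script adds a trailing slash when it is missing.
    if not destination.endswith("/"):
        destination += "/"
    return {
        dataset_type: f"{destination}datasets/{dataset_type}.manifest"
        for dataset_type in dataset_types
    }


locations = manifest_locations("s3://bucket/path", ["train", "test"])
for dataset_type, location in locations.items():
    print(f"{dataset_type}: {location}")
```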
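Create a new Lookout for Vision project for the exported datasets by following the instructions at Creating your project.
Do one of the following:
- Use the Lookout for Vision console to create datasets for your new project by following the instructions at Creating a dataset with a manifest file (console). You don't need to do steps 1 through 6.
  
  For step 12, do the following:
  1. If the source project has a test dataset, choose [Separate training and test datasets]. Otherwise, choose [Single dataset].
  2. For [.manifest file location], enter the location of the appropriate manifest file (train or test) that you noted in step 6.
- Use the CreateDataset operation to create datasets for your new project by using the code at Creating a dataset with a manifest file (SDK). For the manifest_file parameter, use the manifest file location that you noted in step 6. If the source project has a test dataset, use the code again to create the test dataset.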
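For the SDK option, the CreateDataset request takes the manifest location as a bucket and key rather than as a single S3 path. The following is a minimal sketch of how that request could be assembled with Boto3. The helper names create_dataset_args and create_dataset and the project name my-new-project are hypothetical; the code referenced in the linked topic remains the authoritative example.

```python
def create_dataset_args(project, dataset_type, manifest_file):
    """Build keyword arguments for CreateDataset from a manifest S3 path."""
    # Split "s3://bucket/key" into the bucket and key that the API expects.
    bucket, key = manifest_file.replace("s3://", "").split("/", 1)
    return {
        "ProjectName": project,
        "DatasetType": dataset_type,
        "DatasetSource": {
            "GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Key": key}}
        },
    }


def create_dataset(lookoutvision_client, project, dataset_type, manifest_file):
    """Start dataset creation. Requires AWS credentials and an existing project."""
    response = lookoutvision_client.create_dataset(
        **create_dataset_args(project, dataset_type, manifest_file)
    )
    return response["DatasetMetadata"]["Status"]


# Building the arguments needs no AWS access, so it can be checked locally.
args = create_dataset_args(
    "my-new-project", "train", "s3://bucket/path/datasets/train.manifest"
)
print(args)
```

Dataset creation is asynchronous, so the returned status is typically an in-progress value. Check the dataset status again before you train a model.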
When you're ready, train the model by following the instructions at Training your model.
