Importing images with a manifest file

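You can create a dataset by using an Amazon SageMaker Ground Truth format manifest file. You can use the manifest file produced by an Amazon SageMaker Ground Truth job. If your images and labels aren't in the format of a SageMaker Ground Truth manifest file, you can create a SageMaker format manifest file and use it to import your labeled images.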
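As a rough illustration of what one entry in such a manifest file can look like, the following Python sketch builds a single JSON Lines entry for an image-level label. The field layout is an assumption based on the Ground Truth image classification output format, and the function name, job name, and label attribute name are hypothetical. Check "Creating a manifest file" for the authoritative format before relying on it.

```python
import json
from datetime import datetime, timezone


def make_manifest_line(image_s3_uri, class_name, label_attribute="label"):
    """Builds one JSON Lines manifest entry for an image-level label.

    Assumption: the layout mirrors the Ground Truth image classification
    output (a source-ref, a label attribute, and a matching -metadata field).
    """
    entry = {
        "source-ref": image_s3_uri,
        label_attribute: 1,
        f"{label_attribute}-metadata": {
            "confidence": 1,
            "job-name": "labeling-job/" + label_attribute,  # hypothetical job name
            "class-name": class_name,
            "human-annotated": "yes",
            "creation-date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "type": "groundtruth/image-classification",
        },
    }
    # json.dumps emits no line breaks by default, so the entry stays on one line.
    return json.dumps(entry)
```

Calling `make_manifest_line("s3://amzn-s3-demo-bucket/images/sunrise.png", "sunrise")` returns one line that you could append to a manifest file; the bucket and image key here are placeholders.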

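The CreateDataset operation has been updated so that you can optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.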

Creating a dataset with a SageMaker Ground Truth manifest file (Console)

The following procedure shows you how to create a dataset by using a SageMaker Ground Truth format manifest file.

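1. Create a manifest file for your training dataset by doing one of the following:
   - Create a manifest file with a SageMaker Ground Truth job by following the instructions at "Labeling images with an Amazon SageMaker Ground Truth job".
   - Create your own manifest file by following the instructions at "Creating a manifest file".
   If you want to create a test dataset, repeat step 1 to create a manifest file for the test dataset.
2. Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.
3. Choose [Use Custom Labels].
4. Choose [Get started].
5. In the left navigation pane, choose [Projects].
6. On the "Projects" page, choose the project to which you want to add a dataset. The details page for your project is displayed.
7. Choose [Create dataset]. The "Create dataset" page is displayed.
8. In [Starting configuration], choose either [Start with a single dataset] or [Start with a training dataset]. To create a higher quality model, we recommend starting with separate training and test datasets.

   Single dataset
   a. In the "Training dataset details" section, choose [Import images labeled by SageMaker Ground Truth].
   b. In [Manifest file location], enter the location of the manifest file that you created in step 1.
   c. Choose [Create dataset]. The datasets page for your project opens.

   Separate training and test datasets
   a. In the "Training dataset details" section, choose [Import images labeled by SageMaker Ground Truth].
   b. In [Manifest file location], enter the location of the training dataset manifest file that you created in step 1.
   c. In the "Test dataset details" section, choose [Import images labeled by SageMaker Ground Truth].
      Note: Your training and test datasets can have different image sources.
   d. In [Manifest file location], enter the location of the test dataset manifest file that you created in step 1.
   e. Choose [Create dataset]. The datasets page for your project opens.
9. If you need to add or change labels, see "Labeling images".
10. Follow the steps in "Training a model (Console)" to train your model.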
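
The console takes the manifest file location as a single value, whereas the SDK examples later in this topic take the bucket and the object key separately. Assuming the location is written as an S3 URI of the form s3://bucket/key (an assumption, not something this topic states), a small helper such as the hypothetical `split_s3_uri` below can convert between the two forms.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Splits an S3 URI such as s3://bucket/path/file.manifest into (bucket, key).

    Limitation: keys that contain '?' or '#' are not handled, because
    urlparse treats those characters as query and fragment separators.
    """
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 object URI: {s3_uri}")
    return parsed.netloc, key
```

For example, `split_s3_uri("s3://amzn-s3-demo-bucket/datasets/train/output.manifest")` yields the bucket name and the manifest path that the SDK examples expect as `bucket` and `manifest_file`.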

Creating a dataset with a SageMaker Ground Truth manifest file (SDK)

The following procedure shows you how to create a training or test dataset from a manifest file by using the CreateDataset API.

You can use an existing manifest file, such as the output from a SageMaker Ground Truth job, or create your own manifest file.
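
If you write your own manifest file, a quick local check before uploading it can catch malformed entries early. The sketch below is one possible check, not an official validator: it assumes that each non-blank line is a JSON object with a `source-ref` field that points to an S3 location, and the function name `find_manifest_problems` is hypothetical.

```python
import json


def find_manifest_problems(lines):
    """Returns (line_number, problem) tuples for entries that look malformed.

    Assumption: every non-blank line is a JSON object whose "source-ref"
    value is an s3:// URI. Blank lines are skipped rather than reported.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "entry is not a JSON object"))
            continue
        source = entry.get("source-ref")
        if not isinstance(source, str) or not source.startswith("s3://"):
            problems.append((number, "missing or non-S3 source-ref"))
    return problems
```

You could pass it the lines of a local file, for example `find_manifest_problems(open("train.manifest", encoding="utf-8"))`, where the file name is a placeholder. An empty list means that none of these basic checks failed; it does not guarantee that the service accepts the file.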

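1. If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see "Step 4: Set up the AWS CLI and AWS SDKs".
2. Create a manifest file for your training dataset by doing one of the following:
   - Create a manifest file with a SageMaker Ground Truth job by following the instructions at "Labeling images with an Amazon SageMaker Ground Truth job".
   - Create your own manifest file by following the instructions at "Creating a manifest file".
   If you want to create a test dataset, repeat step 2 to create a manifest file for the test dataset.
3. Use the following example code to create the training and test datasets.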

AWS CLI

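Use the following command to create a dataset. Replace the following:

project_arn - the ARN of the project that you want to add the dataset to.
type - the type of dataset that you want to create (TRAIN or TEST).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.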


aws rekognition create-dataset --project-arn project_arn \
  --dataset-type type \
  --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
  --profile custom-labels-access \
  --tags '{"key1": "value1", "key2": "value2"}'

Python

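Use the following code to create a dataset. Supply the following command line parameters:

project_arn - the ARN of the project that you want to add the dataset to.
dataset_type - the type of dataset that you want to create (train or test).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.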


#Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-custom-labels-developer-guide/blob/master/LICENSE-SAMPLECODE.)


import argparse
import logging
import time
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def create_dataset(rek_client, project_arn, dataset_type, bucket, manifest_file):
    """
    Creates an Amazon Rekognition Custom Labels dataset.
    :param rek_client: The Amazon Rekognition Custom Labels Boto3 client.
    :param project_arn: The ARN of the project in which you want to create a dataset.
    :param dataset_type: The type of the dataset that you want to create (train or test).
    :param bucket: The S3 bucket that contains the manifest file.
    :param manifest_file: The path and filename of the manifest file.
    :return: The ARN of the created dataset.
    """

    try:
        #Create the dataset
        logger.info("Creating %s dataset for project %s", dataset_type, project_arn)

        dataset_type = dataset_type.upper()

        #Build the dataset source as a dictionary rather than by concatenating
        #JSON text, so that special characters in the names can't break it.
        dataset_source = {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        }

        response = rek_client.create_dataset(
            ProjectArn=project_arn, DatasetType=dataset_type, DatasetSource=dataset_source
        )

        dataset_arn=response['DatasetArn']

        logger.info("dataset ARN: %s",dataset_arn)

        finished=False
        while finished is False:

            dataset=rek_client.describe_dataset(DatasetArn=dataset_arn)

            status=dataset['DatasetDescription']['Status']
            
            if status == "CREATE_IN_PROGRESS":
                logger.info("Creating dataset: %s ",dataset_arn)
                time.sleep(5)
                continue

            if status == "CREATE_COMPLETE":
                logger.info("Dataset created: %s", dataset_arn)
                finished=True
                continue

            if status == "CREATE_FAILED":
                error_message = f"Dataset creation failed: {status} : {dataset_arn}"
                #No exception is active here, so log with error() rather than exception().
                logger.error(error_message)
                raise Exception(error_message)

            error_message = f"Failed. Unexpected state for dataset creation: {status} : {dataset_arn}"
            logger.error(error_message)
            raise Exception(error_message)
            
        return dataset_arn
   
    
    except ClientError as err:
        logger.exception("Couldn't create dataset: %s",err.response['Error']['Message'])
        raise

def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument(
        "project_arn", help="The ARN of the project in which you want to create the dataset."
    )

    parser.add_argument(
        "dataset_type", help="The type of the dataset that you want to create (train or test)."
    )

    parser.add_argument(
        "bucket", help="The S3 bucket that contains the manifest file."
    )
    
    parser.add_argument(
        "manifest_file", help="The path and filename of the manifest file."
    )


def main():

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    try:

        #Get command line arguments.
        parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
        add_arguments(parser)
        args = parser.parse_args()

        print(f"Creating {args.dataset_type} dataset for project {args.project_arn}")

        #Create the dataset.
        session = boto3.Session(profile_name='custom-labels-access')
        rekognition_client = session.client("rekognition")

        dataset_arn=create_dataset(rekognition_client, 
            args.project_arn,
            args.dataset_type,
            args.bucket,
            args.manifest_file)

        print(f"Finished creating dataset: {dataset_arn}")


    except ClientError as err:
        logger.exception("Problem creating dataset: %s", err)
        print(f"Problem creating dataset: {err}")



if __name__ == "__main__":
    main()
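
The Python example above polls DescribeDataset every five seconds with no upper time limit. As one possible variation, the following sketch adds a timeout to the same polling loop. The names `wait_for_dataset`, `timeout_seconds`, and `poll_seconds` are hypothetical, the default values are arbitrary, and the `sleep` and `clock` parameters exist only so that the loop can be exercised without real delays.

```python
import time


def wait_for_dataset(rek_client, dataset_arn, timeout_seconds=1800, poll_seconds=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Polls DescribeDataset until creation completes, fails, or times out.

    Returns the DatasetDescription dictionary on success.
    """
    deadline = clock() + timeout_seconds
    while True:
        description = rek_client.describe_dataset(DatasetArn=dataset_arn)["DatasetDescription"]
        status = description["Status"]
        if status == "CREATE_COMPLETE":
            return description
        if status != "CREATE_IN_PROGRESS":
            # CREATE_FAILED or any unexpected state ends the wait.
            message = description.get("StatusMessage", "")
            raise RuntimeError(f"Dataset creation ended in state {status}: {message}")
        if clock() >= deadline:
            raise TimeoutError(f"Dataset {dataset_arn} is still being created after {timeout_seconds}s")
        sleep(poll_seconds)
```

A caller could replace the `while` loop in `create_dataset` with `wait_for_dataset(rek_client, dataset_arn)` and handle `TimeoutError` separately from a failed creation.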

Java V2

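Use the following code to create a dataset. Supply the following command line parameters:

project_arn - the ARN of the project that you want to add the dataset to.
dataset_type - the type of dataset that you want to create (train or test).
bucket - the bucket that contains the manifest file for the dataset.
manifest_file - the path and file name of the manifest file.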


/*
   Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   SPDX-License-Identifier: Apache-2.0
*/

package com.example.rekognition;

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.DatasetDescription;
import software.amazon.awssdk.services.rekognition.model.DatasetSource;
import software.amazon.awssdk.services.rekognition.model.DatasetStatus;
import software.amazon.awssdk.services.rekognition.model.DatasetType;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.GroundTruthManifest;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.S3Object;

import java.util.logging.Level;
import java.util.logging.Logger;

public class CreateDatasetManifestFiles {

    public static final Logger logger = Logger.getLogger(CreateDatasetManifestFiles.class.getName());

    public static String createMyDataset(RekognitionClient rekClient, String projectArn, String datasetType,
            String bucket, String name) throws Exception, RekognitionException {

        try {

            logger.log(Level.INFO, "Creating {0} dataset for project : {1} from s3://{2}/{3} ",
                    new Object[] { datasetType, projectArn, bucket, name });

            DatasetType requestDatasetType = null;

            switch (datasetType) {
            case "train":
                requestDatasetType = DatasetType.TRAIN;
                break;
            case "test":
                requestDatasetType = DatasetType.TEST;
                break;
            default:
                logger.log(Level.SEVERE, "Could not create dataset. Unrecognized dataset type: {0}", datasetType);
                throw new Exception("Could not create dataset. Unrecognized dataset type: " + datasetType);

            }

            GroundTruthManifest groundTruthManifest = GroundTruthManifest.builder()
                    .s3Object(S3Object.builder().bucket(bucket).name(name).build()).build();

            DatasetSource datasetSource = DatasetSource.builder().groundTruthManifest(groundTruthManifest).build();

            CreateDatasetRequest createDatasetRequest = CreateDatasetRequest.builder().projectArn(projectArn)
                    .datasetType(requestDatasetType).datasetSource(datasetSource).build();

            CreateDatasetResponse response = rekClient.createDataset(createDatasetRequest);

            boolean created = false;

            do {

                DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
                        .datasetArn(response.datasetArn()).build();
                DescribeDatasetResponse describeDatasetResponse = rekClient.describeDataset(describeDatasetRequest);

                DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();

                DatasetStatus status = datasetDescription.status();

                logger.log(Level.INFO, "Creating dataset ARN: {0} ", response.datasetArn());

                switch (status) {

                case CREATE_COMPLETE:
                    logger.log(Level.INFO, "Dataset created");
                    created = true;
                    break;

                case CREATE_IN_PROGRESS:
                    Thread.sleep(5000);
                    break;

                case CREATE_FAILED:
                    String error = "Dataset creation failed: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, error);
                    throw new Exception(error);

                default:
                    String unexpectedError = "Unexpected creation state: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, unexpectedError);
                    throw new Exception(unexpectedError);
                }

            } while (created == false);

            return response.datasetArn();

        } catch (RekognitionException e) {
            logger.log(Level.SEVERE, "Could not create dataset: {0}", e.getMessage());
            throw e;
        }

    }

    public static void main(String[] args) {

        String datasetType = null;
        String bucket = null;
        String name = null;
        String projectArn = null;
        String datasetArn = null;

        final String USAGE = "\n" + "Usage: " + "<project_arn> <dataset_type> <bucket> <name>\n\n" + "Where:\n"
                + "   project_arn - the ARN of the project that you want to add the dataset to.\n\n"
                + "   dataset_type - the type of the dataset that you want to create (train or test).\n\n"
                + "   bucket - the S3 bucket that contains the manifest file.\n\n"
                + "   name - the location and name of the manifest file within the bucket.\n\n";

        if (args.length != 4) {
            System.out.println(USAGE);
            System.exit(1);
        }

        projectArn = args[0];
        datasetType = args[1];
        bucket = args[2];
        name = args[3];

        try {

            // Get the Rekognition client
            RekognitionClient rekClient = RekognitionClient.builder()
                .credentialsProvider(ProfileCredentialsProvider.create("custom-labels-access"))
                .region(Region.US_WEST_2)
                .build();


            // Create the dataset
            datasetArn = createMyDataset(rekClient, projectArn, datasetType, bucket, name);

            System.out.println(String.format("Created dataset: %s", datasetArn));

            rekClient.close();

        } catch (RekognitionException rekError) {
            logger.log(Level.SEVERE, "Rekognition client error: {0}", rekError.getMessage());
            System.exit(1);
        } catch (Exception rekError) {
            logger.log(Level.SEVERE, "Error: {0}", rekError.getMessage());
            System.exit(1);
        }

    }

}

If you need to add or change labels, see "Managing labels (SDK)".
Follow the steps in "Training a model (SDK)" to train your model.
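
Before you add labels or train, it can help to check how many entries the new dataset contains and how many of them are labeled. The sketch below reads that information from DescribeDataset. The `DatasetStats` field names are assumed from the DescribeDataset response, and `summarize_dataset` is a hypothetical helper name.

```python
def summarize_dataset(rek_client, dataset_arn):
    """Returns entry counts for a dataset, taken from DescribeDataset.

    Assumption: the response carries DatasetStats with TotalEntries,
    LabeledEntries, TotalLabels, and ErrorEntries. Missing values count as 0.
    """
    description = rek_client.describe_dataset(DatasetArn=dataset_arn)["DatasetDescription"]
    stats = description.get("DatasetStats", {})
    total = stats.get("TotalEntries", 0)
    labeled = stats.get("LabeledEntries", 0)
    return {
        "status": description.get("Status"),
        "total_entries": total,
        "labeled_entries": labeled,
        "unlabeled_entries": total - labeled,
        "total_labels": stats.get("TotalLabels", 0),
        "error_entries": stats.get("ErrorEntries", 0),
    }
```

A nonzero `unlabeled_entries` or `error_entries` value suggests reviewing the dataset before you start training.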

Create dataset request

The following is the format of the CreateDataset operation request:



{
    "DatasetSource": {
        "DatasetArn": "string",
        "GroundTruthManifest": {
            "S3Object": {
                "Bucket": "string",
                "Name": "string",
                "Version": "string"
            }
        }
    },
    "DatasetType": "string",
    "ProjectArn": "string",
    "Tags": {
        "string": "string"
    }
}
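
To show how the fields in the request format above fit together, the following sketch assembles the keyword arguments for a Boto3 `create_dataset` call. It assumes that `DatasetSource` takes either `DatasetArn` (to copy an existing dataset) or `GroundTruthManifest`, but not both; confirm this against the CreateDataset API reference. The helper name `build_create_dataset_request` and its parameter names are hypothetical.

```python
def build_create_dataset_request(project_arn, dataset_type, *, bucket=None, manifest_file=None,
                                 version=None, source_dataset_arn=None, tags=None):
    """Builds keyword arguments for a CreateDataset call.

    Assumption: exactly one source is given, either a manifest in S3
    (bucket and manifest_file) or an existing dataset ARN to copy.
    """
    has_manifest = bucket is not None and manifest_file is not None
    has_source_arn = source_dataset_arn is not None
    if has_manifest == has_source_arn:
        raise ValueError("Supply bucket and manifest_file, or source_dataset_arn, but not both.")

    if has_manifest:
        s3_object = {"Bucket": bucket, "Name": manifest_file}
        if version is not None:
            s3_object["Version"] = version  # optional S3 object version
        dataset_source = {"GroundTruthManifest": {"S3Object": s3_object}}
    else:
        dataset_source = {"DatasetArn": source_dataset_arn}

    request = {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type.upper(),
        "DatasetSource": dataset_source,
    }
    if tags:
        # Tags is a map of string keys to string values.
        request["Tags"] = dict(tags)
    return request
```

The result can be passed to a Rekognition client as `rek_client.create_dataset(**request)`. Keeping the request construction separate from the call makes it possible to inspect or log the request before any dataset is created.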
