Create a manifest file from a CSV file


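This example Python script simplifies the creation of a manifest file by using a comma-separated values (CSV) file to label images. You create the CSV file. The manifest file is suitable for image classification or multi-label image classification. For more information, see Finding objects, scenes, and concepts.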

Note

This script doesn't create a manifest file that's suitable for finding object locations or brand locations.

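A manifest file describes the images used to train a model, for example the image locations and the labels assigned to the images. A manifest file is made up of one or more JSON lines, and each JSON line describes a single image. For more information, see Importing image-level labels in manifest files.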
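To make the JSON Lines structure concrete, the following minimal sketch reads manifest lines one at a time and collects each image's `source-ref` together with its image-level labels. The function `read_manifest` is hypothetical, and it assumes the classification layout used on this page, where each label is a top-level key with a matching `<label>-metadata` key.

```python
import json


def read_manifest(lines):
    """Return (source_ref, labels) pairs from manifest JSON Lines.

    Hypothetical helper: assumes each label is a top-level key whose
    metadata is stored under '<label>-metadata'.
    """
    images = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Ignore blank lines.
        record = json.loads(line)  # One JSON object describes one image.
        labels = [key for key in record if f"{key}-metadata" in record]
        images.append((record["source-ref"], labels))
    return images


manifest = [
    '{"source-ref": "s3://bucket/flowers/train/camellia1.jpg", '
    '"camellia": 1, "camellia-metadata": {"class-name": "camellia"}, '
    '"with_leaves": 1, "with_leaves-metadata": {"class-name": "with_leaves"}}',
]
print(read_manifest(manifest))
# [('s3://bucket/flowers/train/camellia1.jpg', ['camellia', 'with_leaves'])]
```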

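A CSV file represents tabular data over multiple rows in a text file. Fields on a row are separated by commas. For more information, see Comma-separated values. For this script, each row in your CSV file represents a single image and maps to a JSON line in the manifest file. To create a CSV file for a manifest file that supports multi-label image classification, add one or more image-level labels to each row. To create a manifest file that's suitable for image classification, add a single image-level label to each row.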

For example, the following CSV file describes the images in the Multi-label image classification (Flowers) Getting started project.


```
camellia1.jpg,camellia,with_leaves
camellia2.jpg,camellia,with_leaves
camellia3.jpg,camellia,without_leaves
helleborus1.jpg,helleborus,without_leaves,not_fully_grown
helleborus2.jpg,helleborus,with_leaves,fully_grown
helleborus3.jpg,helleborus,with_leaves,fully_grown
jonquil1.jpg,jonquil,with_leaves
jonquil2.jpg,jonquil,with_leaves
jonquil3.jpg,jonquil,with_leaves
jonquil4.jpg,jonquil,without_leaves
mauve_honey_myrtle1.jpg,mauve_honey_myrtle,without_leaves
mauve_honey_myrtle2.jpg,mauve_honey_myrtle,with_leaves
mauve_honey_myrtle3.jpg,mauve_honey_myrtle,with_leaves
mediterranean_spurge1.jpg,mediterranean_spurge,with_leaves
mediterranean_spurge2.jpg,mediterranean_spurge,without_leaves
```

The script generates a JSON line for each row. For example, the following is the JSON line for the first row (camellia1.jpg,camellia,with_leaves).


```
{"source-ref": "s3://bucket/flowers/train/camellia1.jpg","camellia": 1,"camellia-metadata":{"confidence": 1,"job-name": "labeling-job/camellia","class-name": "camellia","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"},"with_leaves": 1,"with_leaves-metadata":{"confidence": 1,"job-name": "labeling-job/with_leaves","class-name": "with_leaves","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"}}
```
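The row-to-JSON-line mapping can be sketched in isolation. This minimal example mirrors the label loop in `create_manifest_file` from the script later on this page. The function `row_to_json_line` is hypothetical, and it takes a fixed `creation_date` (an assumption for reproducible output) where the script stamps the current UTC time.

```python
import json


def row_to_json_line(row, s3_path="", creation_date="2022-01-21T14:21:05"):
    """Convert one CSV row (image, label, label, ...) to a manifest JSON line.

    Hypothetical helper. creation_date is fixed here for reproducible
    output; the script uses the current UTC time instead.
    """
    json_line = {"source-ref": s3_path + row[0]}
    for label in row[1:]:
        if not label:
            continue  # Skip empty columns, as the script does.
        json_line[label] = 1
        json_line[f"{label}-metadata"] = {
            "confidence": 1,
            "job-name": f"labeling-job/{label}",
            "class-name": label,
            "human-annotated": "yes",
            "creation-date": creation_date,
            "type": "groundtruth/image-classification",
        }
    return json.dumps(json_line)


line = row_to_json_line(["camellia1.jpg", "camellia", "with_leaves"],
                        s3_path="s3://bucket/flowers/train/")
print(json.loads(line)["source-ref"])  # s3://bucket/flowers/train/camellia1.jpg
```

Each label contributes two keys: the label itself with the value `1`, and a `<label>-metadata` object describing it.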

The example CSV file doesn't include Amazon S3 paths for the images. If your CSV file doesn't include the Amazon S3 paths for the images, use the --s3_path command line argument to specify the Amazon S3 path to the images.

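The script records the first entry for each image in a deduplicated image CSV file (`<csv name>-deduplicated.csv`), which contains a single instance of each image in the input CSV file. Further occurrences of an image in the input CSV file are recorded in a duplicate image CSV file (`<csv name>-duplicates.csv`), and no manifest file is created. If the script finds duplicate images, review the duplicate image CSV file, update the deduplicated image CSV file as necessary, and then rerun the script with the deduplicated file. If the input CSV file has no duplicates, the script deletes both files (the deduplicated file would be identical to the input and the duplicates file would be empty) and creates the manifest file.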
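The deduplication rule can be sketched in memory. The following minimal example uses a hypothetical `split_duplicates` function that keys each row on column 1 and keeps only the first occurrence, the same rule `check_duplicates` in the script applies to files.

```python
def split_duplicates(rows):
    """Split CSV rows into (deduplicated, duplicates).

    Hypothetical in-memory version of the rule: the image name in
    column 1 is the key, and only its first occurrence is kept.
    """
    seen = set()
    deduplicated, duplicates = [], []
    for row in rows:
        if not "".join(row).strip():
            continue  # Skip empty rows.
        if row[0] in seen:
            duplicates.append(row)
        else:
            seen.add(row[0])
            deduplicated.append(row)
    return deduplicated, duplicates


rows = [
    ["camellia1.jpg", "camellia", "with_leaves"],
    ["jonquil1.jpg", "jonquil", "with_leaves"],
    ["camellia1.jpg", "camellia", "without_leaves"],
]
dedup, dups = split_duplicates(rows)
print(len(dedup), len(dups))  # 2 1
```

Only the image name is compared, so the later `camellia1.jpg` row lands in the duplicates list even though its labels differ. That is why you must review the duplicates and decide which labels are correct.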

In this procedure, you create the CSV file and run the Python script to create the manifest file.

To create a manifest file from a CSV file

Create a CSV file with the following fields in each row (one row per image). Don't add a header row to the CSV file.

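| Field 1 | Field 2 | Field n |
|---|---|---|
| The image name or Amazon S3 image path, for example `s3://my-bucket/flowers/train/camellia1.jpg`. You can't mix images that have an Amazon S3 path with images that don't. | The first image-level label for the image. | One or more additional image-level labels, separated by commas. Add these only if you want to create a manifest file that supports multi-label image classification. |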

For example, `camellia1.jpg,camellia,with_leaves` or `s3://my-bucket/flowers/train/camellia1.jpg,camellia,with_leaves`.
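Because field 1 must be consistent across the whole file, it can help to check it before you run the script. The following is a minimal sketch with a hypothetical `check_image_paths` function. It assumes that an S3 path is any field 1 value starting with `s3://`.

```python
import csv
import io


def check_image_paths(rows):
    """Return True if field 1 is an S3 path in every row, False if in none.

    Hypothetical helper that enforces the rule that images with and
    without Amazon S3 paths can't be mixed; raises ValueError otherwise.
    """
    has_s3 = {row[0].startswith("s3://") for row in rows if row}
    if len(has_s3) > 1:
        raise ValueError("Field 1 mixes S3 paths and bare image names.")
    return has_s3 == {True}


rows = list(csv.reader(io.StringIO("camellia1.jpg,camellia,with_leaves\n")))
print(check_image_paths(rows))  # False: pass --s3_path when running the script
```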

Save the CSV file.

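Run the following Python script with these arguments:

- `csv_file`: The CSV file that you created in the first step.
- (Optional) `--s3_path s3://path_to_folder/`: The Amazon S3 path to prepend to the image file names in field 1. Use `--s3_path` only if the images in field 1 don't already include an S3 path.

The script doesn't take a manifest file name. It creates the manifest file in the same folder as the CSV file, using the CSV file's base name with a `.manifest` extension (for example, `flowers.csv` produces `flowers.manifest`).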


```python
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier:  Apache-2.0

"""
Purpose
Amazon Rekognition Custom Labels model example used in the service documentation.
Shows how to create an image-level (classification) manifest file from a CSV file.
You can specify multiple image level labels per image.
CSV file format is
image,label,label,..
If necessary, use the --s3_path argument to specify the S3 bucket folder for the images.
https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-gt-cl-transform.html
"""

from datetime import datetime, timezone
import argparse
import logging
import csv
import os
import json

logger = logging.getLogger(__name__)


def check_duplicates(csv_file, deduplicated_file, duplicates_file):
    """
    Checks for duplicate images in a CSV file. If duplicate images
    are found, deduplicated_file is the deduplicated CSV file - only the first
    occurrence of a duplicate is recorded. Other duplicates are recorded in duplicates_file.
    :param csv_file: The source CSV file.
    :param deduplicated_file: The deduplicated CSV file to create. If no duplicates are found
    this file is removed.
    :param duplicates_file: The duplicate images CSV file to create. If no duplicates are found
    this file is removed.
    :return: True if duplicates are found, otherwise False.
    """

    logger.info("Deduplicating %s", csv_file)

    duplicates_found = False

    # Find duplicates. newline='' on the output files stops the csv module
    # from writing extra blank lines on Windows.
    with open(csv_file, 'r', newline='', encoding="UTF-8") as f,\
            open(deduplicated_file, 'w', newline='', encoding="UTF-8") as dedup,\
            open(duplicates_file, 'w', newline='', encoding="UTF-8") as duplicates:

        reader = csv.reader(f, delimiter=',')
        dedup_writer = csv.writer(dedup)
        duplicates_writer = csv.writer(duplicates)

        entries = set()
        for row in reader:
            # Skip empty lines.
            if not ''.join(row).strip():
                continue

            # The image name (or S3 path) in column 1 identifies the image.
            key = row[0]
            if key not in entries:
                dedup_writer.writerow(row)
                entries.add(key)
            else:
                duplicates_writer.writerow(row)
                duplicates_found = True

    if duplicates_found:
        logger.info("Duplicates found check %s", duplicates_file)

    else:
        os.remove(duplicates_file)
        os.remove(deduplicated_file)

    return duplicates_found


def create_manifest_file(csv_file, manifest_file, s3_path):
    """
    Reads a CSV file and creates a Custom Labels classification manifest file.
    :param csv_file: The source CSV file.
    :param manifest_file: The name of the manifest file to create.
    :param s3_path: The S3 path to the folder that contains the images.
    :return: The number of images and the number of labels written.
    """
    logger.info("Processing CSV file %s", csv_file)

    image_count = 0
    label_count = 0

    with open(csv_file, newline='', encoding="UTF-8") as csvfile,\
            open(manifest_file, "w", encoding="UTF-8") as output_file:

        # Parse with the same CSV dialect that check_duplicates() uses.
        image_classifications = csv.reader(csvfile, delimiter=',')

        # Process each row (image) in CSV file.
        for row in image_classifications:
            # Skip empty lines, as check_duplicates() does.
            if not ''.join(row).strip():
                continue

            source_ref = str(s3_path) + row[0]

            image_count += 1

            # Create JSON for image source ref.
            json_line = {}
            json_line['source-ref'] = source_ref

            # Process each image level label.
            for image_level_label in row[1:]:

                # Skip empty columns.
                if image_level_label == '':
                    continue
                label_count += 1

                # Create the JSON line metadata.
                json_line[image_level_label] = 1
                metadata = {}
                metadata['confidence'] = 1
                metadata['job-name'] = 'labeling-job/' + image_level_label
                metadata['class-name'] = image_level_label
                metadata['human-annotated'] = "yes"
                metadata['creation-date'] = \
                    datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f')
                metadata['type'] = "groundtruth/image-classification"

                json_line[f'{image_level_label}-metadata'] = metadata

            # Write the image JSON Line.
            output_file.write(json.dumps(json_line))
            output_file.write('\n')

    logger.info("Finished creating manifest file %s\nImages: %s\nLabels: %s",
                manifest_file, image_count, label_count)

    return image_count, label_count


def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument(
        "csv_file", help="The CSV file that you want to process."
    )

    parser.add_argument(
        "--s3_path", help="The S3 bucket and folder path for the images."
        " If not supplied, column 1 is assumed to include the S3 path.", required=False
    )


def main():

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    try:

        # Get command line arguments
        parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
        add_arguments(parser)
        args = parser.parse_args()

        s3_path = args.s3_path
        if s3_path is None:
            s3_path = ''
        elif not s3_path.endswith('/'):
            # Make sure the folder path joins cleanly with the image names.
            s3_path += '/'

        # Create file names.
        csv_file = args.csv_file
        file_name = os.path.splitext(csv_file)[0]
        manifest_file = f'{file_name}.manifest'
        duplicates_file = f'{file_name}-duplicates.csv'
        deduplicated_file = f'{file_name}-deduplicated.csv'

        # Create manifest file, if there are no duplicate images.
        if check_duplicates(csv_file, deduplicated_file, duplicates_file):
            print(f"Duplicates found. Use {duplicates_file} to view duplicates "
                  f"and then update {deduplicated_file}. ")
            print(f"{deduplicated_file} contains the first occurrence of a duplicate. "
                  "Update as necessary with the correct label information.")
            print(f"Re-run the script with {deduplicated_file}")
        else:
            print("No duplicates found. Creating manifest file.")

            image_count, label_count = create_manifest_file(csv_file,
                                                            manifest_file,
                                                            s3_path)

            print(f"Finished creating manifest file: {manifest_file} \n"
                  f"Images: {image_count}\nLabels: {label_count}")

    except FileNotFoundError as err:
        logger.exception("File not found: %s", err)
        print(f"File not found: {err}. Check your input CSV file.")


if __name__ == "__main__":
    main()
```

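If you want to use a test dataset, repeat the previous steps (create and save a CSV file, then run the script) to create a manifest file for your test dataset.
If necessary, copy the images to the Amazon S3 bucket path that you specified in column 1 of the CSV file (or with the --s3_path command line argument). You can use the following AWS CLI command: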
```
aws s3 cp --recursive your-local-folder s3://your-target-S3-location
```
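If you'd rather script the copy in Python, the following is a minimal sketch using boto3's `upload_file`. The helpers `plan_uploads` and `upload` are hypothetical. The sketch assumes the object keys should be the target prefix plus each file's path relative to the local folder, which is how `aws s3 cp --recursive` lays out copied files, and that AWS credentials are configured for the upload step.

```python
import os
from urllib.parse import urlparse


def plan_uploads(local_folder, s3_uri):
    """List (local_path, bucket, key) for every file under local_folder.

    Hypothetical helper: each key is the target prefix plus the file's
    path relative to local_folder, using '/' as the separator.
    """
    parsed = urlparse(s3_uri)  # s3://bucket/prefix -> netloc, path
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    plan = []
    for root, _dirs, files in os.walk(local_folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_folder).replace(os.sep, "/")
            plan.append((path, bucket, prefix + rel))
    return plan


def upload(plan):
    """Upload the planned files with boto3 (requires AWS credentials)."""
    import boto3  # Imported here so plan_uploads() works without boto3.

    s3 = boto3.client("s3")
    for path, bucket, key in plan:
        s3.upload_file(path, bucket, key)


# Example (placeholder names from the command above):
# upload(plan_uploads("your-local-folder", "s3://your-target-S3-location"))
```

Splitting planning from uploading lets you print and review the keys before anything is written to Amazon S3.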
Upload the manifest file to the Amazon S3 bucket that you want to use to store it.

Note
Make sure that Amazon Rekognition Custom Labels has access to the Amazon S3 bucket referenced in the source-ref field of the manifest file JSON lines. For more information, see Accessing external Amazon S3 Buckets. If your Ground Truth job stores images in the Amazon Rekognition Custom Labels console bucket, you don't need to add permissions.
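To see which buckets need those permissions, you can list the buckets named in a manifest's `source-ref` fields. The following is a minimal sketch; the function `referenced_buckets` is hypothetical and assumes every `source-ref` is an `s3://bucket/key` URI.

```python
import json
from urllib.parse import urlparse


def referenced_buckets(manifest_lines):
    """Return the sorted bucket names found in source-ref fields.

    Hypothetical helper: these are the buckets that Amazon Rekognition
    Custom Labels must be able to read.
    """
    buckets = set()
    for line in manifest_lines:
        if line.strip():
            source_ref = json.loads(line)["source-ref"]
            buckets.add(urlparse(source_ref).netloc)  # s3://bucket/... -> bucket
    return sorted(buckets)


lines = ['{"source-ref": "s3://my-bucket/flowers/train/camellia1.jpg"}', '']
print(referenced_buckets(lines))  # ['my-bucket']
```

To check a real manifest, pass an open file object, since iterating over it yields one JSON line at a time.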
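Follow the instructions at Creating a dataset with a SageMaker AI Ground Truth manifest file (Console) to create a dataset with the uploaded manifest file. For step 8, in .manifest file location, enter the Amazon S3 URL for the location of the manifest file. If you're using the AWS SDK, follow Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK) instead.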
