Demo template: Annotation of images with `crowd-bounding-box`

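When you choose a custom template as your task type in the Amazon SageMaker Ground Truth console, you reach the Custom labeling task panel. There you can choose from several base templates. The templates represent some of the most common tasks and give you a sample to work from as you create the template for your own customized labeling task. If you are not using the console, or want an additional resource, see Amazon SageMaker AI Ground Truth Sample Task UIs for a repository of demo templates covering a variety of labeling job task types.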

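This demonstration uses the BoundingBox template. It also uses the AWS Lambda functions needed to process your data before and after the task. In the GitHub repository mentioned above, you can find the templates that work with AWS Lambda functions by looking for {{ task.input.<property name> }} in the template.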

Starter bounding box custom template

This is the starter bounding box template.


<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-bounding-box
    name="boundingBox"
    src="{{ task.input.taskObject | grant_read_access }}"
    header="{{ task.input.header }}"
    labels="{{ task.input.labels | to_json | escape }}"
  >

    <!-- The <full-instructions> tag is where you will define the full instructions of your task. -->
    <full-instructions header="Bounding Box Instructions" >
      <p>Use the bounding box tool to draw boxes around the requested target of interest:</p>
      <ol>
        <li>Draw a rectangle using your mouse over each instance of the target.</li>
        <li>Make sure the box does not cut into the target, leave a 2 - 3 pixel margin</li>
        <li>
          When targets are overlapping, draw a box around each object,
          include all contiguous parts of the target in the box.
          Do not include parts that are completely overlapped by another object.
        </li>
        <li>
          Do not include parts of the target that cannot be seen,
          even though you think you can interpolate the whole shape of the target.
        </li>
        <li>Avoid shadows, they're not considered as a part of the target.</li>
        <li>If the target goes off the screen, label up to the edge of the image.</li>
      </ol>
    </full-instructions>

    <!-- The <short-instructions> tag allows you to specify instructions that are displayed in the left hand side of the task interface.
    It is a best practice to provide good and bad examples in this section for quick reference. -->
    <short-instructions>
      Use the bounding box tool to draw boxes around the requested target of interest.
    </short-instructions>
  </crowd-bounding-box>
</crowd-form>

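The custom templates use the Liquid template language, and each item between double curly braces is a variable. The pre-annotation AWS Lambda function must provide an object named taskInput, and that object's properties can be accessed in your template as {{ task.input.<property name> }}.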
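To make the mapping concrete, the following Python sketch mimics, in a very reduced way, how a {{ task.input.<property name> }} placeholder could be filled from a taskInput object. It is only an illustration: the real rendering is done by the Ground Truth Liquid engine, filters such as grant_read_access are ignored here, and the render_placeholders helper and the sample values are hypothetical.

```python
import re

# Matches {{ task.input.<name> }}; an optional "| filter" part is accepted
# but ignored, because this sketch does not implement Liquid filters.
PLACEHOLDER = re.compile(r"\{\{\s*task\.input\.([\w-]+)\s*(?:\|[^}]*)?\}\}")


def render_placeholders(template, task_input):
    """Replace each task.input placeholder with the matching taskInput value."""

    def substitute(match):
        name = match.group(1)
        # Properties missing from taskInput become an empty string in this sketch.
        return str(task_input.get(name, ""))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical response from a pre-annotation Lambda function.
lambda_response = {
    "taskInput": {
        "header": "Draw a box around the bird",
        "taskObject": "s3://example-bucket/bird.jpg",
    }
}

rendered = render_placeholders(
    'header="{{ task.input.header }}"', lambda_response["taskInput"]
)
print(rendered)
```

The point of the sketch is only the naming rule: a property called header inside taskInput is what the template reads as task.input.header.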

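Your own bounding box custom template

As an example, assume you have a large collection of animal photos, and a prior image classification job has already told you which kind of animal is in each image. Now you want a bounding box drawn around the animal.

The starter sample has three variables: taskObject, header, and labels.

Each one appears in a different part of the bounding box task.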

taskObject is an HTTP(S) URL or S3 URI for the photo to be annotated. The added | grant_read_access is a filter that converts an S3 URI into an HTTPS URL with short-lived access to that resource. If you are using an HTTP(S) URL, the filter is not needed.
header is the text shown above the photo to be labeled, for example "Draw a box around the bird in the photo".
labels is an array, represented as ['item1', 'item2', ...]. These are the labels a worker can assign to the different boxes they draw. You can have one or many.

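Each variable name comes from the JSON object in the response from your pre-annotation Lambda. The names above are only suggestions. Use whatever variable names make sense to you and keep the code readable for your team.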
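As a rough sketch of where those three names could come from, the pre-annotation function below builds taskObject, header, and labels from one manifest entry. It assumes the manifest entry carries a source-ref property, as in the manifest sample later on this page; the header text, the label list, and the sample event are hypothetical values chosen for illustration.

```python
def lambda_handler(event, context):
    """Build the three starter-template variables from one manifest entry."""
    data_object = event["dataObject"]
    return {
        "taskInput": {
            # Image location taken from the manifest entry's source-ref.
            "taskObject": data_object["source-ref"],
            # Hypothetical fixed header text and label list for this sketch.
            "header": "Draw a box around the bird in the photo",
            "labels": ["bird"],
        }
    }


# Local check with a hand-written event shaped like one manifest entry.
sample_event = {
    "dataObject": {"source-ref": "s3://example-bucket/images/bird-01.jpg"}
}
response = lambda_handler(sample_event, None)
print(response["taskInput"]["labels"])
```

Renaming a key inside taskInput only requires the matching rename in the template, which is why the names are free to choose.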

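Only use variables when necessary

If a field never changes, you can remove its variable from the template and replace it with the literal text. Otherwise you have to repeat that text as a value in every object in your manifest or code it into your pre-annotation Lambda function.

Example: Final customized bounding box template

To keep things simple, this template has one variable, one label, and very basic instructions. Assuming each data object in your manifest has an "animal" property, that value can be reused in more than one part of the template.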


<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-bounding-box
    name="boundingBox"
    labels="[ '{{ task.input.animal }}' ]"
    src="{{ task.input.source-ref | grant_read_access }}"
    header="Draw a box around the {{ task.input.animal }}."
  >
    <full-instructions header="Bounding Box Instructions" >
      <p>Draw a bounding box around the {{ task.input.animal }} in the image. If 
      there is more than one {{ task.input.animal }} per image, draw a bounding 
      box around the largest one.</p>
      <p>The box should be tight around the {{ task.input.animal }} with 
      no more than a couple of pixels of buffer around the 
      edges.</p>
      <p>If the image does not contain a {{ task.input.animal }}, check the <strong>
      Nothing to label</strong> box.</p>
    </full-instructions>
    <short-instructions>
      <p>Draw a bounding box around the {{ task.input.animal }} in each image. If 
      there is more than one {{ task.input.animal }} per image, draw a bounding 
      box around the largest one.</p>
    </short-instructions>
  </crowd-bounding-box>
</crowd-form>

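Note the reuse of {{ task.input.animal }} throughout the template. If all the animal names in your manifest began with a capital letter, you could use {{ task.input.animal | downcase }}, which applies one of Liquid's built-in filters, in sentences where the name should appear in lowercase.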

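Your manifest file

Your manifest file should provide the variable values you use in your template. You can transform your manifest data in your pre-annotation Lambda, but if you don't need to, you keep the risk of errors lower and your Lambda runs faster. Here is a sample manifest file for the template.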


{"source-ref": "<S3 image URI>", "animal": "horse"}
{"source-ref": "<S3 image URI>", "animal": "bird"}
{"source-ref": "<S3 image URI>", "animal": "dog"}
{"source-ref": "<S3 image URI>", "animal": "cat"}
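A manifest like this one has one JSON object per line, so it can be generated from a list of images. The sketch below shows one possible way to do that in Python; the build_manifest_lines helper, the bucket name, and the object keys are hypothetical.

```python
import json


def build_manifest_lines(images):
    """Return one JSON line per (S3 URI, animal) pair."""
    return [
        json.dumps({"source-ref": uri, "animal": animal})
        for uri, animal in images
    ]


# Hypothetical image locations and the animal already known for each one.
images = [
    ("s3://example-bucket/images/horse-01.jpg", "horse"),
    ("s3://example-bucket/images/bird-01.jpg", "bird"),
]

lines = build_manifest_lines(images)
# A manifest holds exactly one JSON object per line.
manifest_text = "\n".join(lines) + "\n"
print(manifest_text)
```

Serializing each entry with json.dumps keeps every record on a single line and takes care of quoting.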

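Your pre-annotation Lambda function

As part of the job setup, provide the ARN of an AWS Lambda function that can be called to process your manifest entries and pass them to the template engine.

Naming your Lambda function

The best practice when naming your function is to use one of the following four strings as part of the function name: SageMaker, Sagemaker, sagemaker, or LabelingFunction. This applies to both your pre-annotation and post-annotation functions.

When you use the console and your account owns AWS Lambda functions, a drop-down list of the functions that meet the naming requirements is shown so that you can choose one.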
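The naming practice can be checked before you deploy a function. This is a minimal sketch that tests a name against the four strings listed above; the follows_naming_practice helper and the sample function names are hypothetical, and the check is deliberately case-sensitive because the four strings are given with specific capitalization.

```python
# The four strings named in the best practice above.
RECOMMENDED_SUBSTRINGS = ("SageMaker", "Sagemaker", "sagemaker", "LabelingFunction")


def follows_naming_practice(function_name):
    """Return True if the name contains one of the recommended strings."""
    return any(part in function_name for part in RECOMMENDED_SUBSTRINGS)


# Hypothetical function names.
print(follows_naming_practice("my-sagemaker-pre-annotation"))
print(follows_naming_practice("pre-annotation-handler"))
```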

This very basic example passes the information from the manifest straight through without any additional processing. The sample pre-annotation function is written for Python 3.7.


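def lambda_handler(event, context):
    # event['dataObject'] holds the JSON object from one manifest entry.
    # Returning it as taskInput exposes every property to the template.
    return {
        "taskInput": event['dataObject']
    }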

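The JSON object from your manifest is provided as a child of the event object. The properties inside the taskInput object are available as variables in your template, so setting the value of taskInput to event['dataObject'] passes all the values from your manifest object to your template without copying them individually. If you want to send more values to the template, you can add them to the taskInput object.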
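Adding a value to taskInput could look like the sketch below. It copies the manifest entry first so the incoming event is left unchanged; the instructions_version property and the sample event are hypothetical and only show the pattern.

```python
def lambda_handler(event, context):
    """Pass the manifest entry through and add one extra template value."""
    # Copy the manifest entry so the incoming event is not modified.
    task_input = dict(event["dataObject"])
    # Hypothetical extra value, readable as {{ task.input.instructions_version }}.
    task_input["instructions_version"] = "v1"
    return {"taskInput": task_input}


# Local check with a hand-written event shaped like one manifest entry.
sample_event = {
    "dataObject": {
        "source-ref": "s3://example-bucket/images/dog-01.jpg",
        "animal": "dog",
    }
}
response = lambda_handler(sample_event, None)
print(sorted(response["taskInput"]))
```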

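Your post-annotation Lambda function

As part of the job setup, provide the ARN of an AWS Lambda function that can be called to process the form data when a worker completes a task. This function can be as simple or as complex as you want. If you want to consolidate and score answers as they come in, you can apply the scoring or consolidation algorithms of your choice. If you want to store the raw data for offline processing, that is an option too.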

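Provide permissions to your post-annotation Lambda

The annotation data is in a file designated by the s3Uri string in the payload object. To process the annotations as they come in, even with a simple pass-through function, you need to assign S3ReadOnly access to your Lambda so that it can read the annotation files.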
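Reading that file starts with splitting the s3Uri string into a bucket and a key. The sketch below shows one way to do this with the Python 3 standard library; the split_s3_uri helper and the sample URI are hypothetical, and the actual download (for example with boto3) is left out so the snippet runs without AWS credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not an S3 URI: " + s3_uri)
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path[1:]


# Hypothetical value of event['payload']['s3Uri'].
bucket, key = split_s3_uri("s3://example-bucket/annotations/batch-01.json")
print(bucket, key)
```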

On the console page for creating your Lambda, scroll to the Execution role panel. Select Create a new role from one or more templates. Give the role a name. From the Policy templates drop-down, choose Amazon S3 object read-only permissions. When you save the Lambda, the role is saved and selected.

The following sample is in Python 2.7.


import json
import boto3

# urlparse lives in urllib.parse on Python 3 and in urlparse on Python 2.7.
try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

def lambda_handler(event, context):
    consolidated_labels = []

    # Download the annotation file named by the payload's s3Uri.
    parsed_url = urlparse(event['payload']['s3Uri'])
    s3 = boto3.client('s3')
    textFile = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
    filecont = textFile['Body'].read()
    annotations = json.loads(filecont)

    # Emit one consolidated label per worker annotation of each data object.
    for dataset in annotations:
        for annotation in dataset['annotations']:
            new_annotation = json.loads(annotation['annotationData']['content'])
            label = {
                'datasetObjectId': dataset['datasetObjectId'],
                'consolidatedAnnotation': {
                    'content': {
                        event['labelAttributeName']: {
                            'workerId': annotation['workerId'],
                            'boxesInfo': new_annotation,
                            'imageSource': dataset['dataObject']
                        }
                    }
                }
            }
            consolidated_labels.append(label)

    return consolidated_labels

The post-annotation Lambda often receives batches of task results in the event object. That batch is the payload object the Lambda should iterate through. What you send back is an object that meets the API contract.
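The iteration over a batch can be tried locally by separating it from the S3 download. The sketch below restates the loop from the sample above as a plain function and runs it on a hand-written batch; the consolidate helper, the label attribute name, and all sample values are hypothetical, and the batch only mirrors the field names that the sample function reads.

```python
import json


def consolidate(annotations, label_attribute_name):
    """Return one consolidated label per worker annotation in the batch."""
    consolidated = []
    for dataset in annotations:
        for annotation in dataset["annotations"]:
            # Each worker's answer arrives as a JSON-encoded string.
            content = json.loads(annotation["annotationData"]["content"])
            consolidated.append({
                "datasetObjectId": dataset["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {
                        label_attribute_name: {
                            "workerId": annotation["workerId"],
                            "boxesInfo": content,
                            "imageSource": dataset["dataObject"],
                        }
                    }
                },
            })
    return consolidated


# Hypothetical batch with one data object annotated by two workers.
empty_answer = json.dumps({"boundingBox": {"boundingBoxes": []}})
sample_batch = [{
    "datasetObjectId": "0",
    "dataObject": {"s3Uri": "s3://example-bucket/images/bird-01.jpg"},
    "annotations": [
        {"workerId": "worker-a", "annotationData": {"content": empty_answer}},
        {"workerId": "worker-b", "annotationData": {"content": empty_answer}},
    ],
}]

results = consolidate(sample_batch, "animal-boxes")
print(len(results))
```

Like the sample function, this emits one entry per worker answer and does not merge answers; a real consolidation step would replace the inner loop with the algorithm of your choice.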

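The output of your labeling job

You can find the output of the job in the target S3 bucket you specified, in a folder named after your labeling job. The output is in a subfolder named manifests.

For a bounding box task, the output in the output manifest looks similar to the demo below. The example has been cleaned up for printing. The actual output is a single line per record.

Example: JSON in your output manifest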


{
  "source-ref":"<URL>",
  "<label attribute name>":
    {
       "workerId":"<URL>",
       "imageSource":"<image URL>",
       "boxesInfo":"{\"boundingBox\":{\"boundingBoxes\":[{\"height\":878, \"label\":\"bird\", \"left\":208, \"top\":6, \"width\":809}], \"inputImageProperties\":{\"height\":924, \"width\":1280}}}"
    },
  "<label attribute name>-metadata":
    {
      "type":"groundTruth/custom",
      "job_name":"<Labeling job name>",
      "human-annotated":"yes"
    },
  "animal" : "bird"
}

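Note how the additional animal attribute from your original manifest is passed to the output manifest at the same level as source-ref and the labeling data. Any property from your input manifest is passed to the output manifest, whether or not it was used in your template.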
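To read such a record back, note that boxesInfo in the example above is itself a JSON-encoded string and needs a second decoding step. The sketch below extracts the boxes from one output manifest line under that assumption; the extract_boxes helper, the label attribute name animal-boxes, and the sample record are hypothetical.

```python
import json


def extract_boxes(manifest_line, label_attribute_name):
    """Return the list of bounding boxes stored in one output manifest line."""
    record = json.loads(manifest_line)
    boxes_info = record[label_attribute_name]["boxesInfo"]
    # In the example above boxesInfo is a JSON string, so decode it again.
    if isinstance(boxes_info, str):
        boxes_info = json.loads(boxes_info)
    return boxes_info["boundingBox"]["boundingBoxes"]


# Hypothetical single-line record shaped like the example output above.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/bird-01.jpg",
    "animal-boxes": {
        "boxesInfo": json.dumps({
            "boundingBox": {
                "boundingBoxes": [
                    {"height": 878, "label": "bird", "left": 208, "top": 6, "width": 809}
                ],
                "inputImageProperties": {"height": 924, "width": 1280},
            }
        }),
    },
    "animal": "bird",
})

boxes = extract_boxes(sample_line, "animal-boxes")
print(boxes[0]["label"])
```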
