

# Use input and output data
<a name="sms-data"></a>

The input data that you provide to Amazon SageMaker Ground Truth is sent to your workers for labeling. You choose the data to send to your workers by creating a single manifest file that defines all of the data that requires labeling or by sending input data objects to an ongoing, streaming labeling job to be labeled in real time. 

The output data is the result of your labeling job. The output data file, or *augmented manifest file*, contains label data for each object you send to the labeling job and metadata about the label assigned to data objects.

When you use the image classification (single and multi-label), text classification (single and multi-label), object detection, and semantic segmentation built-in task types to create a labeling job, you can use the resulting augmented manifest file to launch a SageMaker training job. For a demonstration of how to use an augmented manifest to train an object detection machine learning model with Amazon SageMaker AI, see [object\_detection\_augmented\_manifest\_training.ipynb](https://sagemaker-examples.readthedocs.io/en/latest/ground_truth_labeling_jobs/object_detection_augmented_manifest_training/object_detection_augmented_manifest_training.html). For more information, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

**Topics**
+ [Input data](sms-data-input.md)
+ [3D Point Cloud Input Data](sms-point-cloud-input-data.md)
+ [Video Frame Input Data](sms-video-frame-input-data-overview.md)
+ [Labeling job output data](sms-data-output.md)

# Input data
<a name="sms-data-input"></a>

The input data are the data objects that you send to your workforce to be labeled. There are two ways to send data objects to Ground Truth for labeling: 
+ Send a list of data objects that require labeling using an input manifest file.
+ Send individual data objects in real time to a perpetually running, streaming labeling job. 

If you have a dataset that needs to be labeled one time, and you do not require an ongoing labeling job, create a standard labeling job using an input manifest file. 

If you want to regularly send new data objects to your labeling job after it has started, create a streaming labeling job. When you create a streaming labeling job, you can optionally use an input manifest file to specify a group of data that you want labeled immediately when the job starts. You can continuously send new data objects to a streaming labeling job as long as it is active. 

**Note**  
Streaming labeling jobs are only supported through the SageMaker API. You cannot create a streaming labeling job using the SageMaker AI console.

The following task types have special input data requirements and options:
+ For [3D point cloud](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud.html) labeling job input data requirements, see [3D Point Cloud Input Data](sms-point-cloud-input-data.md). 
+ For [video frame](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-task-types.html) labeling job input data requirements, see [Video Frame Input Data](sms-video-frame-input-data-overview.md).

**Topics**
+ [Input manifest files](sms-input-data-input-manifest.md)
+ [Automate data setup for labeling jobs](sms-console-create-manifest-file.md)
+ [Supported data formats](sms-supported-data-formats.md)
+ [Ground Truth streaming labeling jobs](sms-streaming-labeling-job.md)
+ [Input Data Quotas](input-data-limits.md)
+ [Select Data for Labeling](sms-data-filtering.md)

# Input manifest files
<a name="sms-input-data-input-manifest"></a>

Each line in an input manifest file is an entry containing an object, or a reference to an object, to label. An entry can also contain labels from previous jobs and for some task types, additional information. 

Input data and the manifest file must be stored in Amazon Simple Storage Service (Amazon S3). Each has specific storage and access requirements, as follows:
+ The Amazon S3 bucket that contains the input data must be in the same AWS Region in which you are running Amazon SageMaker Ground Truth. You must give Amazon SageMaker AI access to the data stored in the Amazon S3 bucket so that it can read it. For more information about Amazon S3 buckets, see [Working with Amazon S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html). 
+ The manifest file must be in the same AWS Region as the data files, but it doesn't need to be in the same location as the data files. It can be stored in any Amazon S3 bucket that is accessible to the AWS Identity and Access Management (IAM) role that you assigned to Ground Truth when you created the labeling job.

**Note**  
3D point cloud and video frame [task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html) have different input manifest requirements and attributes.   
For [3D point cloud task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud.html), refer to [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md).  
For [video frame task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-task-types.html), refer to [Create a Video Frame Input Manifest File](sms-video-manual-data-setup.md#sms-video-create-manifest).

The manifest is a UTF-8 encoded file in which each line is a complete and valid JSON object. Each line is delimited by a standard line break, `\n` or `\r\n`. Because each line must be a valid JSON object, you can't have unescaped line break characters. For more information about the data format, see [JSON Lines](http://jsonlines.org/).

Each JSON object in the manifest file can be no larger than 100,000 characters. No single attribute within an object can be larger than 20,000 characters. Attribute names can't begin with `$` (dollar sign).

Each JSON object in the manifest file must contain one of the following keys: `source-ref` or `source`. The value of the keys are interpreted as follows:
+ `source-ref` – The source of the object is the Amazon S3 object specified in the value. Use this value when the object is a binary object, such as an image.
+ `source` – The source of the object is the value. Use this value when the object is a text value.
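As an illustrative sketch (not part of Ground Truth), the constraints above can be checked before you upload a manifest; the limits and key requirements are taken from this section:

```
import json

MAX_OBJECT_CHARS = 100_000
MAX_ATTRIBUTE_CHARS = 20_000

def validate_manifest_line(line: str) -> None:
    """Raise ValueError if a manifest line breaks the documented limits."""
    if len(line) > MAX_OBJECT_CHARS:
        raise ValueError("JSON object exceeds 100,000 characters")
    obj = json.loads(line)  # each line must be a complete, valid JSON object
    if "source" not in obj and "source-ref" not in obj:
        raise ValueError("line must contain a 'source' or 'source-ref' key")
    for key, value in obj.items():
        if key.startswith("$"):
            raise ValueError(f"attribute name {key!r} can't begin with '$'")
        if len(json.dumps(value)) > MAX_ATTRIBUTE_CHARS:
            raise ValueError(f"attribute {key!r} exceeds 20,000 characters")

validate_manifest_line('{"source-ref": "s3://amzn-s3-demo-bucket/image.png"}')
```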



The following is an example of a manifest file for files stored in an Amazon S3 bucket:

```
{"source-ref": "S3 bucket location 1"}
{"source-ref": "S3 bucket location 2"}
   ...
{"source-ref": "S3 bucket location n"}
```

Use the `source-ref` key for image files for bounding box, image classification (single and multi-label), and semantic segmentation labeling jobs, and for video clips for video classification labeling jobs. 3D point cloud and video frame labeling jobs also use the `source-ref` key, but these labeling jobs require additional information in the input manifest file. For more information, see [3D Point Cloud Input Data](sms-point-cloud-input-data.md) and [Video Frame Input Data](sms-video-frame-input-data-overview.md).

The following is an example of a manifest file with the input data stored in the manifest:

```
{"source": "Lorem ipsum dolor sit amet"}
{"source": "consectetur adipiscing elit"}
   ...
{"source": "mollit anim id est laborum"}
```

Use the `source` key for single and multi-label text classification and named entity recognition labeling jobs. 

You can include other key-value pairs in the manifest file. These pairs are passed to the output file unchanged. This is useful when you want to pass information between your applications. For more information, see [Labeling job output data](sms-data-output.md).
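For illustration, a minimal sketch that writes a JSON Lines manifest containing a pass-through pair (the bucket name and the `batch-id` key are hypothetical):

```
import json

# Each record carries a custom "batch-id" pair; Ground Truth copies any extra
# key-value pairs into the output file unchanged.
records = [
    {"source-ref": "s3://amzn-s3-demo-bucket/images/0001.jpg", "batch-id": "2024-06-a"},
    {"source-ref": "s3://amzn-s3-demo-bucket/images/0002.jpg", "batch-id": "2024-06-a"},
]

# UTF-8 JSON Lines: one complete JSON object per line.
manifest = "\n".join(json.dumps(r) for r in records) + "\n"

with open("dataset.manifest", "w", encoding="utf-8") as f:
    f.write(manifest)
```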

# Automate data setup for labeling jobs
<a name="sms-console-create-manifest-file"></a>

You can use the automated data setup to create manifest files for your labeling jobs in the Ground Truth console using images, videos, video frames, text (.txt) files, and comma-separated value (.csv) files stored in Amazon S3. When you use automated data setup, you specify an Amazon S3 location where your input data is stored and the input data type, and Ground Truth looks for the files that match that type in the location you specify.

**Note**  
Ground Truth does not use an AWS KMS key to access your input data or write the input manifest file in the Amazon S3 location that you specify. The user or role that creates the labeling job must have permissions to access your input data objects in Amazon S3.

Before using the following procedure, ensure that your input images or files are correctly formatted:
+ Image files – Image files must comply with the size and resolution limits listed in the tables found in [Input File Size Quota](input-data-limits.md#input-file-size-limit). 
+ Text files – Text data can be stored in one or more .txt files. Each item that you want labeled must be separated by a standard line break. 
+ CSV files – Text data can be stored in one or more .csv files. Each item that you want labeled must be in a separate row.
+ Videos – Video files can be any of the following formats: .mp4, .ogg, and .webm. If you want to extract video frames from your video files for object detection or object tracking, see [Provide Video Files](sms-point-cloud-video-input-data.md#sms-point-cloud-video-frame-extraction).
+ Video frames – Video frames are images extracted from a video. All images extracted from a single video are referred to as a *sequence of video frames*. Each sequence of video frames must have unique prefix keys in Amazon S3. See [Provide Video Frames](sms-point-cloud-video-input-data.md#sms-video-provide-frames). For this data type, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md).

**Important**  
For video frame object detection and video frame object tracking labeling jobs, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md) to learn how to use the automated data setup. 
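For text stored in .csv files, the automated setup reads one item to label per row. As an illustration of the equivalent manual conversion, here is a sketch that assumes the item text is in the first column of each row:

```
import csv
import io
import json

def csv_to_manifest_lines(csv_text: str) -> list[str]:
    """Turn each CSV row (one item to label per row) into a {"source": ...} line."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip empty rows
            lines.append(json.dumps({"source": row[0]}))
    return lines

print(csv_to_manifest_lines("first review\nsecond review\n"))
```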

Use these instructions to automatically set up your input dataset connection with Ground Truth.

**Automatically connect your data in Amazon S3 with Ground Truth**

1. Navigate to the **Create labeling job** page in the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/). 

   This link puts you in the North Virginia (us-east-1) AWS Region. If your input data is in an Amazon S3 bucket in another Region, switch to that Region. To change your AWS Region, on the [navigation bar](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html#select-region), choose the name of the currently displayed Region.

1. Select **Create labeling job**.

1. Enter a **Job name**. 

1. In the section **Input data setup**, select **Automated data setup**.

1. Enter an Amazon S3 URI for **S3 location for input datasets**. 

1. Specify your **S3 location for output datasets**. This is where your output data is stored. 

1. Choose your **Data type** using the dropdown list.

1. Use the dropdown menu under **IAM Role** to select an execution role. If you select **Create a new role**, specify the Amazon S3 buckets that you want to grant this role permission to access. This role must have permission to access the S3 buckets you specified in Steps 5 and 6.

1. Select **Complete data setup**.

This creates an input manifest in the Amazon S3 location for input datasets that you specified in step 5. If you are creating a labeling job using the SageMaker API, the AWS CLI, or an AWS SDK, use the Amazon S3 URI for this input manifest file as input to the parameter `ManifestS3Uri`. 
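When you create the labeling job programmatically, the generated manifest URI goes in the `ManifestS3Uri` field of the request's input configuration. The following is an illustrative fragment with a placeholder bucket and file name; a real `create_labeling_job` call requires many additional parameters:

```
# Illustrative: the DataSource portion of a CreateLabelingJob request.
input_config = {
    "DataSource": {
        "S3DataSource": {
            "ManifestS3Uri": "s3://example-groundtruth-images/dataset-20240601T120000.manifest"
        }
    }
}

# With boto3, this would be passed as InputConfig, e.g.:
# sagemaker = boto3.client("sagemaker")
# sagemaker.create_labeling_job(..., InputConfig=input_config, ...)
```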

The following GIF demonstrates how to use the automated data setup for image data. This example creates a file, `dataset-YYMMDDTHHmmSS.manifest`, in the Amazon S3 bucket `example-groundtruth-images`, where `YYMMDDTHHmmSS` indicates the year (`YY`), month (`MM`), and day (`DD`), and the time in hours (`HH`), minutes (`mm`), and seconds (`SS`), at which the input manifest file was created. 

![\[GIF showing how to use the automated data setup for image data.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sms/gifs/automated-data-setup.gif)


# Supported data formats
<a name="sms-supported-data-formats"></a>

When you manually create an input manifest file for a [built-in task type](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html), your input data must be in one of the following supported file formats for the respective input data type. To learn about automated data setup, see [Automate data setup for labeling jobs](sms-console-create-manifest-file.md).

**Tip**  
When you use the automated data setup, additional data formats can be used to generate an input manifest file for video frame and text-based task types.



| Task Types | Input Data Type | Supported Formats | Example Input Manifest Line | 
| --- | --- | --- | --- | 
|  Bounding Box, Semantic Segmentation, Image Classification (Single Label and Multi-label), Verify and Adjust Labels  |  Image  |  .jpg, .jpeg, .png  |  <pre>{"source-ref": "s3://amzn-s3-demo-bucket1/example-image.png"}</pre>  | 
|  Named Entity Recognition, Text Classification (Single and Multi-Label)  | Text | Raw text |  <pre>{"source": "Lorem ipsum dolor sit amet"}</pre>  | 
|  Video Classification  | Video clips | .mp4, .ogg, and .webm |  <pre>{"source-ref": "s3://amzn-s3-demo-bucket1/example-video.mp4"}</pre>  | 
| Video Frame Object Detection, Video Frame Object Tracking (bounding boxes, polylines, polygons, or keypoints) | Video frames and video frame sequence files (for Object Tracking) |  **Video frames**: .jpg, .jpeg, .png **Sequence files**: .json  | Refer to [Create a Video Frame Input Manifest File](sms-video-manual-data-setup.md#sms-video-create-manifest). | 
|  3D Point Cloud Semantic Segmentation, 3D Point Cloud Object Detection, 3D Point Cloud Object Tracking  | Point clouds and point cloud sequence files (for Object Tracking) |  **Point clouds**: Binary pack format and ASCII. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md). **Sequence files**: .json  | Refer to [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md). | 

# Ground Truth streaming labeling jobs
<a name="sms-streaming-labeling-job"></a>

If you want to perpetually send new data objects to Amazon SageMaker Ground Truth to be labeled, use a streaming labeling job. Streaming labeling jobs allow you to:
+ Send new dataset objects to workers in real time using a perpetually running labeling job. Workers continuously receive new data objects to label as long as the labeling job is active and new objects are being sent to it.
+ Gain visibility into the number of objects that have been queued and are waiting to be labeled. Use this information to control the flow of data objects sent to your labeling job.
+ Receive label data for individual data objects in real time as workers finish labeling them. 

Ground Truth streaming labeling jobs remain active until they are manually stopped or have been idle for more than 10 days. You can intermittently send new data objects to workers while the labeling job is active.

If you are a new user of Ground Truth streaming labeling jobs, we recommend that you review [How it works](#sms-streaming-how-it-works). 

To learn how to create a streaming labeling job, see [Create a streaming labeling job](sms-streaming-create-job.md).

**Note**  
Ground Truth streaming labeling jobs are only supported through the SageMaker API.

## How it works
<a name="sms-streaming-how-it-works"></a>

When you create a Ground Truth streaming labeling job, the job remains active until it is manually stopped, remains idle for more than 10 days, or is unable to access input data sources. You can intermittently send new data objects to workers while it is active. A worker can continue to receive new data objects in real time as long as the total number of tasks currently available to the worker is less than the value in [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-MaxConcurrentTaskCount](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-MaxConcurrentTaskCount). Otherwise, the data object is sent to a queue that Ground Truth creates on your behalf in [Amazon Simple Queue Service](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html) (Amazon SQS) for later processing. These tasks are sent to workers as soon as the total number of tasks currently available to a worker falls below `MaxConcurrentTaskCount`. If a data object is not sent to a worker after 14 days, it expires. You can view the number of tasks pending in the queue and adjust the number of objects you send to the labeling job. For example, you may decrease the speed at which you send objects to the labeling job if the backlog of pending objects moves above a threshold. 

**Topics**
+ [How it works](#sms-streaming-how-it-works)
+ [Send data to a streaming labeling job](sms-streaming-how-it-works-send-data.md)
+ [Manage labeling requests with an Amazon SQS queue](sms-streaming-how-it-works-sqs.md)
+ [Receive output data from a streaming labeling job](sms-streaming-how-it-works-output-data.md)
+ [Duplicate message handling](sms-streaming-impotency.md)

# Send data to a streaming labeling job
<a name="sms-streaming-how-it-works-send-data"></a>

You can optionally submit input data to a streaming labeling job one time when you create the labeling job using an input manifest file. Once the labeling job has started and the state is `InProgress`, you can submit new data objects to your labeling job in real time using your Amazon SNS input topic and Amazon S3 event notifications. 

***Submit Data Objects When You Start the Labeling Job (One Time):***
+ **Use an Input Manifest File** – You can optionally specify an input manifest file Amazon S3 URI in `ManifestS3Uri` when you create the streaming labeling job. Ground Truth sends each data object in the manifest file to workers for labeling as soon as the labeling job starts. To learn more, see [Create a Manifest File (Optional)](sms-streaming-manifest.md).

  After you submit a request to create the streaming labeling job, its status will be `Initializing`. Once the labeling job is active, the state changes to `InProgress` and you can start using the real-time options to submit additional data objects for labeling. 

***Submit Data Objects in Real Time:***
+ **Send data objects using Amazon SNS messages** – You can send Ground Truth new data objects to label by sending an Amazon SNS message. You will send this message to an Amazon SNS input topic that you create and specify when you create your streaming labeling job. For more information, see [Send data objects using Amazon SNS](#sms-streaming-how-it-works-sns).
+ **Send data objects by placing them in an Amazon S3 bucket** – Each time you add a new data object to an Amazon S3 bucket, you can prompt Ground Truth to process that object for labeling. To do this, you add an event notification to the bucket so that it notifies your Amazon SNS input topic each time a new object is added to (or *created in*) that bucket. For more information, see [Send data objects using Amazon S3](#sms-streaming-how-it-works-s3). This option is not available for text-based labeling jobs such as text classification and named entity recognition. 
**Important**  
If you use the Amazon S3 configuration, do not use the same Amazon S3 location for your input data configuration and your output data. You specify the S3 prefix for your output data when you create a labeling job.

## Send data objects using Amazon SNS
<a name="sms-streaming-how-it-works-sns"></a>

You can send data objects to your streaming labeling job using Amazon Simple Notification Service (Amazon SNS). Amazon SNS is a web service that coordinates and manages the delivery of messages to and from *endpoints* (for example, an email address or AWS Lambda function). An Amazon SNS *topic* acts as a communication channel between two or more endpoints. You use Amazon SNS to send, or *publish*, new data objects to the topic specified in the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) parameter `SnsTopicArn` in `InputConfig`. The format of these messages is the same as a single line from an [input manifest file](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-input.html). 

For example, you may send a piece of text to an active text classification labeling job by publishing it to your input topic. The message that you publish may look similar to the following:

```
{"source": "Lorem ipsum dolor sit amet"}
```

To send a new image object to an image classification labeling job, your message may look similar to the following:

```
{"source-ref": "s3://amzn-s3-demo-bucket/example-image.jpg"}
```

**Note**  
You can also include custom deduplication IDs and deduplication keys in your Amazon SNS messages. To learn more, see [Duplicate message handling](sms-streaming-impotency.md).

When Ground Truth creates your streaming labeling job, it subscribes to your Amazon SNS input topic. 
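A minimal sketch of publishing a data object to the input topic follows. The topic ARN is a placeholder, and the `publish` call is shown as a comment because it requires AWS credentials and a real topic:

```
import json

# Hypothetical topic ARN; use the SnsTopicArn you set in InputConfig.
INPUT_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:example-input-topic"

def build_label_request(s3_uri: str) -> str:
    """Format a data object as an SNS message body (one manifest line)."""
    return json.dumps({"source-ref": s3_uri})

message = build_label_request("s3://amzn-s3-demo-bucket/example-image.jpg")

# Publishing requires AWS credentials and a real topic, e.g. with boto3:
# boto3.client("sns").publish(TopicArn=INPUT_TOPIC_ARN, Message=message)
```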

## Send data objects using Amazon S3
<a name="sms-streaming-how-it-works-s3"></a>

You can send one or more new data objects to a streaming labeling job by placing them in an Amazon S3 bucket that is configured with an Amazon SNS event notification. You can set up an event to notify your Amazon SNS input topic anytime a new object is created in your bucket. You must specify this same Amazon SNS input topic in the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) parameter `SnsTopicArn` in `InputConfig`.

Whenever you configure an Amazon S3 bucket to send notifications to Amazon SNS, Amazon S3 publishes a test event, `"s3:TestEvent"`, to ensure that the topic exists and that the owner of the specified Amazon S3 bucket has permission to publish to the specified topic. We recommend that you set up your Amazon S3 connection with Amazon SNS before starting a streaming labeling job. If you do not, this test event may register as a data object and be sent to Ground Truth for labeling. 

**Important**  
If you use the Amazon S3 configuration, do not use the same Amazon S3 location for your input data configuration and your output data. You specify the S3 prefix for your output data when you create a labeling job.  
For image-based labeling jobs, Ground Truth requires all S3 buckets to have a CORS policy attached. To learn more, see [CORS Requirement for Input Image Data](sms-cors-update.md).

Once you have configured your Amazon S3 bucket and created your labeling job, you can add objects to your bucket and Ground Truth either sends that object to workers or places it on your Amazon SQS queue. 

To learn more, see [Creating Amazon S3 bucket event notifications based on the Amazon SNS input topic defined in your labeling job](sms-streaming-s3-setup.md).

**Important**  
This option is not available for text-based labeling jobs such as text classification and named entity recognition.
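As a sketch, the bucket's notification configuration might look like the following. The topic ARN is a placeholder, and applying the configuration requires AWS credentials and existing resources:

```
# Illustrative: notify the SNS input topic whenever a new object is
# created in the bucket (placeholder topic ARN).
notification_config = {
    "TopicConfigurations": [
        {
            "TopicArn": "arn:aws:sns:us-east-1:111122223333:example-input-topic",
            "Events": ["s3:ObjectCreated:*"],
        }
    ]
}

# Applied with boto3 (requires credentials and an existing bucket/topic):
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="amzn-s3-demo-bucket",
#     NotificationConfiguration=notification_config,
# )
```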

# Manage labeling requests with an Amazon SQS queue
<a name="sms-streaming-how-it-works-sqs"></a>

When Ground Truth creates your streaming labeling job, it creates an Amazon SQS queue in the AWS account used to create the labeling job. The queue name is `GroundTruth-labeling_job_name` where `labeling_job_name` is the name of your labeling job, in lowercase letters. When you send data objects to your labeling job, Ground Truth either sends the data objects directly to workers or places the task in your queue to be processed at a later time. If a data object is not sent to a worker after 14 days, it expires and is removed from the queue. You can set up an alarm in Amazon SQS to detect when objects expire and use this mechanism to control the volume of objects that you send to your labeling job.

**Important**  
Modifying, deleting, or sending objects directly to the Amazon SQS queue associated with your streaming labeling job may lead to job failures. 
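A read-only monitoring sketch based on the naming rule above; the job name is hypothetical, and the boto3 calls are shown as comments because they require AWS credentials:

```
# The queue Ground Truth creates is named after the labeling job, lowercased.
def streaming_queue_name(labeling_job_name: str) -> str:
    return "GroundTruth-" + labeling_job_name.lower()

queue_name = streaming_queue_name("MyStreamingJob")
print(queue_name)  # GroundTruth-mystreamingjob

# Read-only backlog check (do not modify or delete the queue), e.g.:
# sqs = boto3.client("sqs")
# url = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]
# attrs = sqs.get_queue_attributes(
#     QueueUrl=url, AttributeNames=["ApproximateNumberOfMessages"]
# )
```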

# Receive output data from a streaming labeling job
<a name="sms-streaming-how-it-works-output-data"></a>

Your Amazon S3 output bucket is periodically updated with new output data from your streaming labeling job. Optionally, you can specify an Amazon SNS output topic. Each time a worker submits a labeled object, a notification with the output data is sent to that topic. You can subscribe an endpoint to your SNS output topic to receive notifications or trigger events when you receive output data from a labeling task. Use an Amazon SNS output topic if you want to chain to another streaming job in real time and receive an Amazon SNS notification each time a worker submits a data object.

To learn more, see [Subscribe an Endpoint to Your Amazon SNS Output Topic](sms-create-sns-input-topic.md#sms-streaming-subscribe-output-topic).

# Duplicate message handling
<a name="sms-streaming-impotency"></a>

For data objects sent in real time, Ground Truth guarantees idempotency by ensuring each unique object is only sent for labeling once, even if the input message referring to that object is received multiple times (duplicate messages). To do this, each data object sent to a streaming labeling job is assigned a *deduplication ID*, which is identified with a *deduplication key*. If you send your requests to label data objects directly through your Amazon SNS input topic using Amazon SNS messages, you can optionally choose a custom deduplication key and deduplication IDs for your objects. For more information, see [Specify a deduplication key and ID in an Amazon SNS message](sms-streaming-impotency-create.md).

If you do not provide your own deduplication key, or if you use the Amazon S3 configuration to send data objects to your labeling job, Ground Truth uses one of the following for the deduplication ID:
+ For messages sent directly to your Amazon SNS input topic, Ground Truth uses the SNS message ID. 
+ For messages that come from an Amazon S3 configuration, Ground Truth creates a deduplication ID by combining the Amazon S3 URI of the object with the [sequencer token](https://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html) in the message.

# Specify a deduplication key and ID in an Amazon SNS message
<a name="sms-streaming-impotency-create"></a>

When you send a data object to your streaming labeling job using an Amazon SNS message, you have the option to specify your deduplication key and deduplication ID in one of the following ways. In all of these scenarios, identify your deduplication key with `dataset-objectid-attribute-name`.

**Bring Your Own Deduplication Key and ID**

Create your own deduplication key and deduplication ID by configuring your Amazon SNS message as follows. Replace `byo-key` with your key and `UniqueId` with the deduplication ID for that data object.

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/prefix/object1", 
    "dataset-objectid-attribute-name":"byo-key",
    "byo-key":"UniqueId" 
}
```

Your deduplication key can be up to 140 characters. Supported patterns include: `"^[$a-zA-Z0-9](-*[a-zA-Z0-9])*"`.

Your deduplication ID can be up to 1,024 characters. Supported patterns include: `^(https|s3)://([^/]+)/?(.*)$`.
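A sketch validating candidate keys and IDs against the documented lengths and patterns; anchoring the patterns to the full string is an assumption made here for strictness:

```
import re

KEY_PATTERN = re.compile(r"^[$a-zA-Z0-9](-*[a-zA-Z0-9])*$")  # up to 140 chars
ID_PATTERN = re.compile(r"^(https|s3)://([^/]+)/?(.*)$")     # up to 1,024 chars

def is_valid_dedup_key(key: str) -> bool:
    return len(key) <= 140 and KEY_PATTERN.match(key) is not None

def is_valid_dedup_id(dedup_id: str) -> bool:
    return len(dedup_id) <= 1024 and ID_PATTERN.match(dedup_id) is not None

print(is_valid_dedup_key("byo-key"))                                 # True
print(is_valid_dedup_id("s3://amzn-s3-demo-bucket/prefix/object1"))  # True
```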

**Use an Existing Key for your Deduplication Key**

You can use an existing key in your message as the deduplication key. When you do this, the value associated with that key is used for the deduplication ID. 

For example, you can use the `source-ref` key as your deduplication key by formatting your message as follows: 

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/prefix/object1",
    "dataset-objectid-attribute-name":"source-ref" 
}
```

In this example, Ground Truth uses `"s3://amzn-s3-demo-bucket/prefix/object1"` as the deduplication ID.

# Find deduplication key and ID in your output data
<a name="sms-streaming-impotency-output"></a>

You can see the deduplication key and ID in your output data. The deduplication key is identified by `dataset-objectid-attribute-name`. When you use your own custom deduplication key, your output contains something similar to the following:

```
"dataset-objectid-attribute-name": "byo-key",
"byo-key": "UniqueId",
```

When you do not specify a key, you can find the deduplication ID that Ground Truth assigned to your data object as follows. The `$label-attribute-name-object-id` parameter identifies your deduplication ID. 

```
{
    "source-ref": "s3://bucket/prefix/object1", 
    "dataset-objectid-attribute-name": "$label-attribute-name-object-id",
    "label-attribute-name": 0,
    "label-attribute-name-metadata": {...},
    "$label-attribute-name-object-id": "<service-generated-key>"
}
```

For `<service-generated-key>`, if the data object came through an Amazon S3 configuration, Ground Truth adds a unique value used by the service and emits an additional field keyed by `$sequencer`, which shows the Amazon S3 sequencer token used. If the object was sent directly to Amazon SNS, Ground Truth uses the SNS message ID.

**Note**  
Do not use the `$` character in your label attribute name. 
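Because the deduplication key's name is itself stored under `dataset-objectid-attribute-name`, the ID can be looked up generically from an output record. A minimal sketch:

```
def deduplication_id(output_record: dict) -> str:
    """Return the deduplication ID named by dataset-objectid-attribute-name."""
    key = output_record["dataset-objectid-attribute-name"]
    return output_record[key]

# Record shape taken from the examples above.
record = {
    "source-ref": "s3://amzn-s3-demo-bucket/prefix/object1",
    "dataset-objectid-attribute-name": "byo-key",
    "byo-key": "UniqueId",
}
print(deduplication_id(record))  # UniqueId
```

The same lookup works for service-generated IDs, where the key name is `$label-attribute-name-object-id`.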

# Input Data Quotas
<a name="input-data-limits"></a>

Input datasets used in semantic segmentation labeling jobs have a quota of 20,000 items. For all other labeling job types, the dataset size quota is 100,000 items. To request a quota increase for labeling jobs other than semantic segmentation jobs, follow the procedures in [AWS Service Quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

Input image data for active and non-active learning labeling jobs must not exceed size and resolution quotas. *Active learning* refers to labeling jobs that use [automated data labeling](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-automated-labeling.html). *Non-active learning* refers to labeling jobs that don't use automated data labeling.

Additional quotas apply for label categories for all task types, and for input data and labeling category attributes for 3D point cloud and video frame task types. 

## Input File Size Quota
<a name="input-file-size-limit"></a>

Input files can't exceed the following size quotas for both active and non-active learning labeling jobs. There is no input file size quota for videos used in [video classification](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-classification.html) labeling jobs.


| Labeling Job Task Type | Input File Size Quota | 
| --- | --- | 
| Image classification | 40 MB | 
| Bounding box (Object detection) | 40 MB | 
| Semantic segmentation | 40 MB | 
| Bounding box (Object detection) label adjustment | 40 MB | 
| Semantic segmentation label adjustment | 40 MB | 
| Bounding box (Object detection) label verification | 40 MB | 
| Semantic segmentation label verification | 40 MB | 

## Input Image Resolution Quotas
<a name="non-active-learning-input-data-limits"></a>

Image file resolution refers to the number of pixels in an image, and determines the amount of detail an image holds. Image resolution quotas differ depending on the labeling job type and the SageMaker AI built-in algorithm used. The following table lists the resolution quotas for images used in active and non-active learning labeling jobs.


| Labeling Job Task Type | Resolution Quota - Non-Active Learning | Resolution Quota - Active Learning | 
| --- | --- | --- | 
| Image classification | 100 million pixels | 3840 x 2160 pixels (4K) | 
| Bounding box (Object detection) | 100 million pixels | 3840 x 2160 pixels (4K) | 
| Semantic segmentation | 100 million pixels | 1920 x 1080 pixels (1080p) | 
| Object detection label adjustment | 100 million pixels | 3840 x 2160 pixels (4K) | 
| Semantic segmentation label adjustment | 100 million pixels | 1920 x 1080 pixels (1080p) | 
| Object detection label verification | 100 million pixels | Not available | 
| Semantic segmentation label verification | 100 million pixels | Not available | 
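
You can screen images against these quotas before building a manifest. The helper below encodes the table above; the task-type keys are illustrative names, not official identifiers, and label adjustment jobs share the quotas of their base task types.

```python
# Active learning pixel quotas from the table above (illustrative keys).
# Label verification jobs do not support active learning.
ACTIVE_LEARNING_MAX_PIXELS = {
    "image-classification": 3840 * 2160,   # 4K
    "bounding-box": 3840 * 2160,           # 4K
    "semantic-segmentation": 1920 * 1080,  # 1080p
}

NON_ACTIVE_LEARNING_MAX_PIXELS = 100_000_000  # 100 million pixels

def within_resolution_quota(width, height, task_type, active_learning=False):
    """Return True if an image of width x height fits the resolution quota."""
    pixels = width * height
    if active_learning:
        return pixels <= ACTIVE_LEARNING_MAX_PIXELS[task_type]
    return pixels <= NON_ACTIVE_LEARNING_MAX_PIXELS
```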

## Label Category Quotas
<a name="sms-label-quotas"></a>

Each labeling job task type has a quota for the number of label categories you can specify. Workers select label categories to create annotations. For example, you may specify label categories *car*, *pedestrian*, and *biker* when creating a bounding box labeling job and workers will select the *car* category before drawing bounding boxes around cars.

**Important**  
Label category names cannot exceed 256 characters.   
All label categories must be unique. You cannot specify duplicate label categories. 

The following label category limits apply to labeling jobs. Quotas for label categories depend on whether you use the SageMaker API operation `CreateLabelingJob` or the console to create a labeling job.



| Labeling Job Task Type | Label Category Quota - API | Label Category Quota - Console | 
| --- | --- | --- | 
| Image classification (Multi-label) | 50 | 50 | 
| Image classification (Single label) | Unlimited | 30 | 
| Bounding box (Object detection) | 50 | 50 | 
| Label verification | Unlimited | 30 | 
| Semantic segmentation (with active learning) | 20 | 10 | 
| Semantic segmentation (without active learning) | Unlimited | 10 | 
| Named entity recognition | Unlimited | 30 | 
| Text classification (Multi-label) | 50 | 50 | 
| Text classification (Single label) | Unlimited | 30 | 
| Video classification | 30 | 30 | 
| Video frame object detection | 30 | 30 | 
| Video frame object tracking | 30 | 30 | 
| 3D point cloud object detection | 30 | 30 | 
| 3D point cloud object tracking | 30 | 30 | 
| 3D point cloud semantic segmentation | 30 | 30 | 
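
The uniqueness and 256-character constraints noted above are easy to verify before calling `CreateLabelingJob`. This helper is an illustrative sketch, not part of the SageMaker SDK.

```python
def validate_label_categories(categories):
    """Check label category names: each <= 256 characters, no duplicates."""
    if len(set(categories)) != len(categories):
        raise ValueError("Label categories must be unique.")
    for name in categories:
        if len(name) > 256:
            raise ValueError(f"Label category {name!r} exceeds 256 characters.")
```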

## Generative AI Labeling Job Quotas
<a name="gen-ai-labeling-job-quotas"></a>

The following quotas apply for question-answer pairs that you provide in the labeling application.


| Quota Type | Data Quota | 
| --- | --- | 
| Question-answer pairs | Minimum is one pair. Maximum is 20 pairs. | 
| Word count of a question | Minimum is one word. Maximum is 200 words. | 
| Word count of an answer | Minimum is one word. Maximum is 200 words. | 

## 3D Point Cloud and Video Frame Labeling Job Quotas
<a name="sms-input-data-quotas-other"></a>

The following quotas apply for 3D point cloud and video frame labeling job input data.



| Labeling Job Task Type | Input Data Quota | 
| --- | --- | 
| Video frame object detection  |  2,000 video frames (images) per sequence  | 
| Video frame object detection  |  10 video frame sequences per manifest file | 
| Video frame object tracking |  2,000 video frames (images) per sequence  | 
| Video frame object tracking |  10 video frame sequences per manifest file | 
| 3D point cloud object detection |  100,000 point cloud frames per labeling job | 
| 3D point cloud object tracking |  100,000 point cloud frame sequences per labeling job | 
| 3D point cloud object tracking |  500 point cloud frames in each sequence file | 

When you create a video frame or 3D point cloud labeling job, you can add one or more *label category attributes* to each label category that you specify to have workers provide more information about an annotation.

Each label category attribute has a single label category attribute `name`, and a list of one or more options (values) to choose from. To learn more, see [Worker user interface (UI)](sms-point-cloud-general-information.md#sms-point-cloud-worker-task-ui) for 3D point cloud labeling jobs and [Worker user interface (UI)](sms-video-overview.md#sms-video-worker-task-ui) for video frame labeling jobs. 

The following quotas apply to the number of label category attribute names and values you can specify for labeling jobs.



| Labeling Job Task Type | Label Category Attribute (name) Quota | Label Category Attribute Values Quota | 
| --- | --- | --- | 
| Video frame object detection  | 10 | 10 | 
| Video frame object tracking | 10 | 10 | 
| 3D point cloud object detection | 10 | 10 | 
| 3D point cloud object tracking | 10 | 10 | 
| 3D point cloud semantic segmentation | 10 | 10 | 

# Select Data for Labeling
<a name="sms-data-filtering"></a>

You can use the Amazon SageMaker AI console to select a portion of your dataset for labeling. The data must be stored in an Amazon S3 bucket. You have three options:
+ Use the full dataset.
+ Choose a randomly selected sample of the dataset.
+ Specify a subset of the dataset using a query.

The following options are available in the **Labeling jobs** section of the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/groundtruth) after selecting **Create labeling job**. To learn how to create a labeling job in the console, see [Getting started: Create a bounding box labeling job with Ground Truth](sms-getting-started.md). To configure the dataset that you use for labeling, in the **Job overview** section, choose **Additional configuration**.

## Use the Full Dataset
<a name="sms-full-dataset"></a>

When you choose to use the **Full dataset**, you must provide a manifest file for your data objects. You can provide the path of the Amazon S3 bucket that contains the manifest file or use the SageMaker AI console to create the file. To learn how to create a manifest file using the console, see [Automate data setup for labeling jobs](sms-console-create-manifest-file.md). 

## Choose a Random Sample
<a name="sms-random-dataset"></a>

When you want to label a random subset of your data, select **Random sample**. The dataset is stored in the Amazon S3 bucket specified in the **Input dataset location** field. 

After you have specified the percentage of data objects that you want to include in the sample, choose **Create subset**. SageMaker AI randomly picks the data objects for your labeling job. After the objects are selected, choose **Use this subset**. 

SageMaker AI creates a manifest file for the selected data objects. It also modifies the value in the **Input dataset location** field to point to the new manifest file.

## Specify a Subset
<a name="sms-select-dataset"></a>

**Amazon S3 Select**  
Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. To learn more, see [How to optimize querying your data in Amazon S3](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/).

You can specify a subset of your data objects using an Amazon S3 `SELECT` query on the object file names. 

The `SELECT` statement of the SQL query is defined for you. You provide the `WHERE` clause to specify which data objects should be returned.

For more information about the Amazon S3 `SELECT` statement, see [Selecting Content from Objects](https://docs.aws.amazon.com/AmazonS3/latest/dev/selecting-content-from-objects.html).

Choose **Create subset** to start the selection, and then choose **Use this subset** to use the selected data. 

SageMaker AI creates a manifest file for the selected data objects. It also updates the value in the **Input dataset location** field to point to the new manifest file.
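
If you select the subset yourself (for example, from a list of object keys you already have), you can write the subset manifest directly in the same JSON Lines shape the console produces. The bucket name, suffix filter, and helper below are illustrative.

```python
import json

def write_subset_manifest(keys, bucket, manifest_path, suffix=".jpg"):
    """Write a JSON Lines manifest for the object keys that match `suffix`.

    Each line is a single-line JSON object with a `source-ref` key, the
    shape used by Ground Truth input manifest files.
    """
    with open(manifest_path, "w") as out:
        for key in keys:
            if key.endswith(suffix):
                out.write(json.dumps({"source-ref": f"s3://{bucket}/{key}"}) + "\n")
```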

# 3D Point Cloud Input Data
<a name="sms-point-cloud-input-data"></a>

To create a 3D point cloud labeling job, you must create an input manifest file. Use this topic to learn the formatting requirements of the input manifest file for each task type. To learn about the raw input data formats Ground Truth accepts for 3D point cloud labeling jobs, see the section [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).

Use your [labeling job task type](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html) to choose a topic in [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md) and learn about the formatting requirements for each line of your input manifest file.

**Topics**
+ [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md)
+ [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md)
+ [Understand Coordinate Systems and Sensor Fusion](sms-point-cloud-sensor-fusion-details.md)

# Accepted Raw 3D Data Formats
<a name="sms-point-cloud-raw-data-types"></a>

Ground Truth uses your 3D point cloud data to render the 3D scenes that workers annotate. This section describes the raw data formats that are accepted for point cloud data and sensor fusion data for a point cloud frame. To learn how to create an input manifest file to connect your raw input data files with Ground Truth, see [Input Manifest Files for 3D Point Cloud Labeling Jobs](sms-point-cloud-input-manifest.md).

For each frame, Ground Truth supports Compact Binary Pack Format (.bin) and ASCII (.txt) files. These files contain information about the location (`x`, `y`, and `z` coordinates) of all points that make up that frame, and, optionally, information about the pixel color of each point for colored point clouds. When you create a 3D point cloud labeling job input manifest file, you can specify the format of your raw data in the `format` parameter. 

The following table lists elements that Ground Truth supports in point cloud frame files to describe individual points. 



| Symbol | Value | 
| --- | --- | 
|  `x`  |  The x coordinate of the point.  | 
|  `y`  |  The y coordinate of the point.  | 
|  `z`  |  The z coordinate of the point.  | 
|  `i`  |  The intensity of the point.  | 
|  `r`  |  The red color channel component. An 8-bit value (0-255).  | 
|  `g`  |  The green color channel component. An 8-bit value (0-255).  | 
|  `b`  |  The blue color channel component. An 8-bit value (0-255).  | 

Ground Truth assumes the following about your input data:
+ All of the positional coordinates (x, y, z) are in meters. 
+ All pose headings (`qx`, `qy`, `qz`, `qw`) are measured as spatial [quaternions](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation).

## Compact Binary Pack Format
<a name="sms-point-cloud-raw-data-cbpf-format"></a>

The Compact Binary Pack Format represents a point cloud as an ordered stream of points. Each point in the stream is an ordered binary pack of 4-byte float values in some variant of the form `xyzirgb`. The `x`, `y`, and `z` elements are required, and additional information about that point can be included in a variety of ways using `i`, `r`, `g`, and `b`. 

To use a binary file to input point cloud frame data to a Ground Truth 3D point cloud labeling job, enter `binary/<format>` in the `format` parameter for your input manifest file and replace `<format>` with the order of elements in each binary pack. For example, you may enter one of the following for the `format` parameter. 
+ `binary/xyzi` – When you use this format, your point element stream would be in the following order: `x1y1z1i1x2y2z2i2...`
+ `binary/xyzrgb` – When you use this format, your point element stream would be in the following order: `x1y1z1r1g1b1x2y2z2r2g2b2...`
+ `binary/xyzirgb` – When you use this format, your point element stream would be in the following order: `x1y1z1i1r1g1b1x2y2z2i2r2g2b2...`

When you use a binary file for your point cloud frame data, if you do not enter a value for `format`, the default pack format `binary/xyzi` is used. 
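
Because each binary pack is a flat run of 4-byte floats, a `binary/<format>` file can be decoded with Python's `struct` module. Little-endian byte order is an assumption in this sketch; confirm what your sensor actually writes.

```python
import struct

def read_binary_points(data, fmt="xyzi"):
    """Decode a binary/<fmt> point stream of 4-byte little-endian floats.

    Returns a list of tuples, one per point, in the element order of `fmt`
    (e.g. "xyzi" yields (x, y, z, i) tuples). Little-endian byte order is
    an assumption about the capture pipeline.
    """
    elements_per_point = len(fmt)
    return list(struct.iter_unpack("<" + "f" * elements_per_point, data))
```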

## ASCII Format
<a name="sms-point-cloud-raw-data-ascii-format"></a>

The ASCII format uses a text file to represent a point cloud, where each line in the ASCII point cloud file represents a single point. Each point occupies one line of the text file and contains whitespace-separated values, each an ASCII representation of a 4-byte float. The `x`, `y`, and `z` elements are required for each point, and additional information about that point can be included in a variety of ways using `i`, `r`, `g`, and `b`.

To use a text file to input point cloud frame data to a Ground Truth 3D point cloud labeling job, enter `text/<format>` in the `format` parameter for your input manifest file and replace `<format>` with the order of point elements on each line. 

For example, if you enter `text/xyzi` for `format`, your text file for each point cloud frame should look similar to the following: 

```
x1 y1 z1 i1
x2 y2 z2 i2
...
...
```

If you enter `text/xyzrgb`, your text file should look similar to the following: 

```
x1 y1 z1 r1 g1 b1
x2 y2 z2 r2 g2 b2
...
...
```

When you use a text file for your point cloud frame data, if you do not enter a value for `format`, the default format `text/xyzi` is used. 
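
A `text/<format>` file like the examples above can be parsed with plain string splitting. This sketch assumes the element order passed in `fmt` matches the file; the helper name is illustrative.

```python
def read_text_points(text, fmt="xyzi"):
    """Parse a text/<fmt> point cloud body: one whitespace-separated point per line."""
    points = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = [float(v) for v in line.split()]
        if len(values) != len(fmt):
            raise ValueError(f"Expected {len(fmt)} values per line, got {len(values)}")
        points.append(tuple(values))
    return points
```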

## Point Cloud Resolution Limits
<a name="sms-point-cloud-resolution"></a>

Ground Truth does not have a resolution limit for 3D point cloud frames. However, we recommend that you limit each point cloud frame to 500K points for optimal performance. When Ground Truth renders the 3D point cloud visualization, it must be viewable on your workers' computers, and rendering performance depends on their hardware. Point cloud frames that are larger than 1 million points may not render on standard machines, or may take too long to load. 
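
If your frames exceed the recommended size, a simple random subsample keeps rendering responsive. Whether random subsampling preserves enough detail for your labeling task is an assumption you should validate on your own data.

```python
import random

def downsample_points(points, target=500_000, seed=0):
    """Randomly subsample a frame to at most `target` points (recommended ~500K)."""
    if len(points) <= target:
        return points
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    return rng.sample(points, target)
```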

# Input Manifest Files for 3D Point Cloud Labeling Jobs
<a name="sms-point-cloud-input-manifest"></a>

When you create a labeling job, you provide an input manifest file where each line of the manifest describes a unit of work to be completed by annotators. The format of your input manifest file depends on your task type. 
+ If you are creating a 3D point cloud **object detection** or **semantic segmentation** labeling job, each line in your input manifest file contains information about a single 3D point cloud frame. This is called a *point cloud frame input manifest*. To learn more, see [Create a Point Cloud Frame Input Manifest File](sms-point-cloud-single-frame-input-data.md). 
+ If you are creating a 3D point cloud **object tracking** labeling job, each line of your input manifest file contains a sequence of 3D point cloud frames and associated data. This is called a *point cloud sequence input manifest*. To learn more, see [Create a Point Cloud Sequence Input Manifest](sms-point-cloud-multi-frame-input-data.md). 

# Create a Point Cloud Frame Input Manifest File
<a name="sms-point-cloud-single-frame-input-data"></a>

The manifest is a UTF-8 encoded file in which each line is a complete and valid JSON object. Each line is delimited by a standard line break, `\n` or `\r\n`. Because each line must be a valid JSON object, you can't have unescaped line break characters. In the single-frame input manifest file, each line in the manifest contains data for a single point cloud frame. The point cloud frame data can either be stored in binary or ASCII format (see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md)). This is the manifest file formatting required for 3D point cloud object detection and semantic segmentation. Optionally, you can also provide camera sensor fusion data for each point cloud frame. 

Ground Truth supports point cloud and video camera sensor fusion in the [world coordinate system](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-world-coordinate-system) for all modalities. If you can obtain your 3D sensor extrinsic (like a LiDAR extrinsic), we recommend that you transform 3D point cloud frames into the world coordinate system using the extrinsic. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion). 

However, if you cannot obtain a point cloud in world coordinate system, you can provide coordinates in the original coordinate system that the data was captured in. If you are providing camera data for sensor fusion, it is recommended that you provide LiDAR sensor and camera pose in the world coordinate system. 

To create a single-frame input manifest file, you will identify the location of each point cloud frame that you want workers to label using the `source-ref` key. Additionally, you must use the `source-ref-metadata` key to identify the format of your dataset, a timestamp for that frame, and, optionally, sensor fusion data and video camera images.

The following example demonstrates the syntax used for an input manifest file for a single-frame point cloud labeling job. The example includes two point cloud frames. For details about each parameter, see the table following this example. 

**Important**  
Each line in your input manifest file must be in [JSON Lines](http://jsonlines.org/) format. The following code block shows an input manifest file with two JSON objects. Each JSON object is used to point to and provide details about a single point cloud frame. The JSON objects have been expanded for readability, but you must minimize each JSON object to fit on a single line when creating an input manifest file. An example is provided under this code block.

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.bin",
    "source-ref-metadata":{
        "format": "binary/xyzi",
        "unix-timestamp": 1566861644.759115,
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        },
        "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/",
        "images": [
        {
            "image-path": "images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    }
}
{
    "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame2.bin",
    "source-ref-metadata":{
        "format": "binary/xyzi",
        "unix-timestamp": 1566861632.759133,
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        },
        "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/",
        "images": [
        {
            "image-path": "images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    }
}
```

When you create an input manifest file, you must collapse your JSON objects to fit on a single line. For example, the code block above would appear as follows in an input manifest file:

```
{"source-ref":"s3://amzn-s3-demo-bucket/examplefolder/frame1.bin","source-ref-metadata":{"format":"binary/xyzi","unix-timestamp":1566861644.759115,"ego-vehicle-pose":{"position":{"x":-2.7161461413869947,"y":116.25822288149078,"z":1.8348751887989483},"heading":{"qx":-0.02111296123795955,"qy":-0.006495469416730261,"qz":-0.008024565904865688,"qw":0.9997181192298087}},"prefix":"s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/","images":[{"image-path":"images/frame300.bin_camera0.jpg","unix-timestamp":1566861644.759115,"fx":847.7962624528487,"fy":850.0340893791985,"cx":576.2129134707038,"cy":317.2423573573745,"k1":0,"k2":0,"k3":0,"k4":0,"p1":0,"p2":0,"skew":0,"position":{"x":-2.2722515189268138,"y":116.86003310568965,"z":1.454614668542299},"heading":{"qx":0.7594754093069037,"qy":0.02181790885672969,"qz":-0.02461725233103356,"qw":-0.6496916273040025},"camera-model":"pinhole"}]}}
{"source-ref":"s3://amzn-s3-demo-bucket/examplefolder/frame2.bin","source-ref-metadata":{"format":"binary/xyzi","unix-timestamp":1566861632.759133,"ego-vehicle-pose":{"position":{"x":-2.7161461413869947,"y":116.25822288149078,"z":1.8348751887989483},"heading":{"qx":-0.02111296123795955,"qy":-0.006495469416730261,"qz":-0.008024565904865688,"qw":0.9997181192298087}},"prefix":"s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/someprefix/","images":[{"image-path":"images/frame300.bin_camera0.jpg","unix-timestamp":1566861644.759115,"fx":847.7962624528487,"fy":850.0340893791985,"cx":576.2129134707038,"cy":317.2423573573745,"k1":0,"k2":0,"k3":0,"k4":0,"p1":0,"p2":0,"skew":0,"position":{"x":-2.2722515189268138,"y":116.86003310568965,"z":1.454614668542299},"heading":{"qx":0.7594754093069037,"qy":0.02181790885672969,"qz":-0.02461725233103356,"qw":-0.6496916273040025},"camera-model":"pinhole"}]}}
```
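
Rather than minimizing JSON by hand, you can write the manifest programmatically: `json.dumps` emits each object on a single line, which satisfies the JSON Lines requirement. The field values below are placeholders taken from the example above.

```python
import json

# Placeholder frame data from the example above; replace with your own frames.
frames = [
    {
        "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.bin",
        "source-ref-metadata": {
            "format": "binary/xyzi",
            "unix-timestamp": 1566861644.759115,
        },
    },
]

# json.dumps emits each object on one line, so no manual minimizing is needed.
with open("manifest.jsonl", "w") as out:
    for frame in frames:
        out.write(json.dumps(frame) + "\n")
```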

The following table shows the parameters you can include in your input manifest file:



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `source-ref`  |  Yes  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/point-cloud-frame-file`  |  The Amazon S3 location of a single point cloud frame.  | 
|  `source-ref-metadata`  |  Yes  |  JSON object **Accepted parameters**:  `format`, `unix-timestamp`, `ego-vehicle-pose`, `position`, `prefix`, `images`  |  Use this parameter to include additional information about the point cloud in `source-ref`, and to provide camera data for sensor fusion.   | 
|  `format`  |  No  |  String **Accepted string values**: `"binary/xyz"`, `"binary/xyzi"`, `"binary/xyzrgb"`, `"binary/xyzirgb"`, `"text/xyz"`, `"text/xyzi"`, `"text/xyzrgb"`, `"text/xyzirgb"` **Default Values**:  When the file identified in `source-ref` has a .bin extension, `binary/xyzi` When the file identified in `source-ref` has a .txt extension, `text/xyzi`  |  Use this parameter to specify the format of your point cloud data. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).  | 
|  `unix-timestamp`  |  Yes  |  Number A unix timestamp.   |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a sensor.   | 
|  `ego-vehicle-pose`  |  No  |  JSON object  |  The pose of the device used to collect the point cloud data. For more information about this parameter, see [Include Vehicle Pose Information in Your Input Manifest](#sms-point-cloud-single-frame-ego-vehicle-input).  | 
|  `prefix`  |  No  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/`  |  The location in Amazon S3 where your metadata, such as camera images, is stored for this frame.  The prefix must end with a forward slash: `/`.  | 
|  `images`  |  No  |  List  |  A list of parameters describing color camera images used for sensor fusion. You can include up to 8 images in this list. For more information about the parameters required for each image, see [Include Camera Data in Your Input Manifest](#sms-point-cloud-single-frame-image-input).   | 

## Include Vehicle Pose Information in Your Input Manifest
<a name="sms-point-cloud-single-frame-ego-vehicle-input"></a>

Use the ego-vehicle location to provide information about the location of the vehicle used to capture point cloud data. Ground Truth uses this information to compute the LiDAR extrinsic matrix. 

Ground Truth uses extrinsic matrices to project labels to and from the 3D scene and 2D images. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion).

The following table provides more information about the `position` and orientation (`heading`) parameters that are required when you provide ego-vehicle information. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The translation vector of the ego vehicle in the world coordinate system.   | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the device or sensor mounted on the vehicle sensing its surroundings, measured as [quaternions](https://en.wikipedia.org/wiki/Quaternion) (`qx`, `qy`, `qz`, `qw`) in the world coordinate system.  | 

## Include Camera Data in Your Input Manifest
<a name="sms-point-cloud-single-frame-image-input"></a>

If you want to include video camera data with a frame, use the following parameters to provide information about each image. The **Required** column below applies when the `images` parameter is included in the input manifest file under `source-ref-metadata`. You are not required to include images in your input manifest file. 

If you include camera images, you must include information about the camera `position` and `heading` used to capture the images in the world coordinate system.

If your images are distorted, Ground Truth can automatically undistort them using information you provide about the image in your input manifest file, including distortion coefficients (`k1`, `k2`, `k3`, `k4`, `p1`, `p2`), the camera model, and the camera intrinsic matrix. The intrinsic matrix is made up of the focal length (`fx`, `fy`) and the principal point (`cx`, `cy`). See [Intrinsic Matrix](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-intrinsic) to learn how Ground Truth uses the camera intrinsic matrix. If distortion coefficients are not included, Ground Truth does not undistort an image. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `image-path`  |  Yes  |  String **Example of format**:  `<folder-name>/<imagefile.png>`  |  The relative location, in Amazon S3, of your image file. This relative path is appended to the path you specify in `prefix`.   | 
|  `unix-timestamp`  |  Yes  |  Number  |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a camera.   | 
|  `camera-model`  |  No  |  String: **Accepted Values**: `"pinhole"`, `"fisheye"` **Default**: `"pinhole"`  |  The model of the camera used to capture the image. This information is used to undistort camera images.   | 
|  `fx, fy`  |  Yes  |  Numbers  |  The focal length of the camera, in the x (`fx`) and y (`fy`) directions.  | 
|  `cx, cy`  |  Yes  | Numbers |  The x (`cx`) and y (`cy`) coordinates of the principal point.   | 
|  `k1, k2, k3, k4`  |  No  |  Number  |  Radial distortion coefficients. Supported for both **fisheye** and **pinhole** camera models.   | 
|  `p1, p2`  |  No  |  Number  |  Tangential distortion coefficients. Supported for **pinhole** camera models.  | 
|  `skew`  |  No  |  Number  |  A parameter to measure the skew of an image.   | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   | The location or origin of the frame of reference of the camera mounted on the vehicle capturing images. | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the camera mounted on the vehicle capturing images, measured using [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`), in the world coordinate system.   | 
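
For reference, the `fx`, `fy`, `cx`, `cy`, and `skew` parameters above assemble into the standard 3x3 camera intrinsic matrix used in computer vision. This is a generic construction, not a Ground Truth API.

```python
def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """Assemble the 3x3 camera intrinsic matrix from focal lengths,
    principal point, and skew, as nested lists (row-major)."""
    return [
        [fx,  skew, cx],
        [0.0, fy,   cy],
        [0.0, 0.0,  1.0],
    ]
```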

## Point Cloud Frame Limits
<a name="sms-point-cloud-single-frame-limits"></a>

You can include up to 100,000 point cloud frames in your input manifest file. 3D point cloud labeling jobs have longer pre-processing times than other Ground Truth task types. For more information, see [Job pre-processing time](sms-point-cloud-general-information.md#sms-point-cloud-job-creation-time).

# Create a Point Cloud Sequence Input Manifest
<a name="sms-point-cloud-multi-frame-input-data"></a>

The manifest is a UTF-8 encoded file in which each line is a complete and valid JSON object. Each line is delimited by a standard line break, `\n` or `\r\n`. Because each line must be a valid JSON object, you can't have unescaped line break characters. In the point cloud sequence input manifest file, each line in the manifest contains a sequence of point cloud frames. The point cloud data for each frame in the sequence can either be stored in binary or ASCII format. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md). This is the manifest file formatting required for 3D point cloud object tracking. Optionally, you can also provide point attribute and camera sensor fusion data for each point cloud frame. When you create a sequence input manifest file, you must provide LiDAR and video camera sensor fusion data in a [world coordinate system](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-world-coordinate-system). 

The following example demonstrates the syntax used for an input manifest file when each line in the manifest is a sequence file. Each line in your input manifest file must be in [JSON Lines](http://jsonlines.org/) format.

```
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq1.json"}
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq2.json"}
```

The data for each sequence of point cloud frames needs to be stored in a JSON data object. The following is an example of the format you use for a sequence file. Information about each frame is included as a JSON object and is listed in the `frames` list. This is an example of a sequence file with two point cloud frame files, `frame300.bin` and `frame303.bin`. The *...* is used to indicate where you should include information for additional frames. Add a JSON object for each frame in the sequence.

The following code block includes a JSON object for a single sequence file. The JSON object has been expanded for readability.

```
{
  "seq-no": 1,
  "prefix": "s3://amzn-s3-demo-bucket/example_lidar_sequence_dataset/seq1/",
  "number-of-frames": 100,
  "frames":[
    {
        "frame-no": 300, 
        "unix-timestamp": 1566861644.759115, 
        "frame": "example_lidar_frames/frame300.bin", 
        "format": "binary/xyzi", 
        "ego-vehicle-pose":{
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            },
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            }
        }, 
        "images": [
        {
            "image-path": "example_images/frame300.bin_camera0.jpg",
            "unix-timestamp": 1566861644.759115,
            "fx": 847.7962624528487,
            "fy": 850.0340893791985,
            "cx": 576.2129134707038,
            "cy": 317.2423573573745,
            "k1": 0,
            "k2": 0,
            "k3": 0,
            "k4": 0,
            "p1": 0,
            "p2": 0,
            "skew": 0,
            "position": {
                "x": -2.2722515189268138,
                "y": 116.86003310568965,
                "z": 1.454614668542299
            },
            "heading": {
                "qx": 0.7594754093069037,
                "qy": 0.02181790885672969,
                "qz": -0.02461725233103356,
                "qw": -0.6496916273040025
            },
            "camera-model": "pinhole"
        }]
    },
    {
        "frame-no": 303, 
        "unix-timestamp": 1566861645.059115, 
        "frame": "example_lidar_frames/frame303.bin", 
        "format": "text/xyzi", 
        "ego-vehicle-pose":{...}, 
        "images":[{...}]
    },
     ...
  ]
}
```

The following table provides details about the top-level parameters of a sequence file. For detailed information about the parameters required for individual frames in the sequence file, see [Parameters for Individual Point Cloud Frames](#sms-point-cloud-multi-frame-input-single-frame).


****  

|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `seq-no`  |  Yes  |  Integer  |  The ordered number of the sequence.   | 
|  `prefix`  |  Yes  |  String **Accepted Values**: `s3://<bucket-name>/<prefix>/`  |  The Amazon S3 location where the sequence files are located.  The prefix must end with a forward slash: `/`.  | 
|  `number-of-frames`  |  Yes  |  Integer  |  The total number of frames included in the sequence file. This number must match the total number of frames listed in the `frames` parameter in the next row.  | 
|  `frames`  |  Yes  |  List of JSON objects  |  A list of frame data. The length of the list must equal `number-of-frames`. In the worker UI, frames in a sequence appear in the same order as they are listed in this array.  For details about the format of each frame, see [Parameters for Individual Point Cloud Frames](#sms-point-cloud-multi-frame-input-single-frame).   | 
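
The top-level structure above can be assembled programmatically. The following sketch builds a minimal sequence-file object and enforces the rules that `number-of-frames` must equal the length of `frames` and that `prefix` must end with a forward slash; the S3 prefix and frame file names are hypothetical.

```python
import json

def build_sequence(seq_no, prefix, frames):
    """Assemble a sequence-file dict; number-of-frames must equal len(frames)."""
    if not prefix.endswith("/"):
        raise ValueError("prefix must end with a forward slash")
    return {
        "seq-no": seq_no,
        "prefix": prefix,
        "number-of-frames": len(frames),
        "frames": frames,
    }

# Hypothetical frames with strictly increasing timestamps.
frames = [
    {"frame-no": 300, "unix-timestamp": 1566861644.759115,
     "frame": "example_lidar_frames/frame300.bin", "format": "binary/xyzi"},
    {"frame-no": 303, "unix-timestamp": 1566861645.059115,
     "frame": "example_lidar_frames/frame303.bin", "format": "binary/xyzi"},
]

seq = build_sequence(1, "s3://amzn-s3-demo-bucket/example_lidar_sequence_dataset/seq1/", frames)
print(json.dumps(seq, indent=2))
```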

## Parameters for Individual Point Cloud Frames
<a name="sms-point-cloud-multi-frame-input-single-frame"></a>

The following table shows the parameters you can include in your input manifest file.


****  

|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `frame-no`  |  No  |  Integer  |  A frame number. This is an optional identifier specified by the customer to identify the frame within a sequence. It is not used by Ground Truth.  | 
|  `unix-timestamp`  |  Yes  |  Number  |  The unix timestamp is the number of seconds since January 1st, 1970 until the UTC time that the data was collected by a sensor.  The timestamp for each frame must be different and timestamps must be sequential because they are used for cuboid interpolation. Ideally, this should be the real timestamp when the data was collected. If this is not available, you must use an incremental sequence of timestamps, where the first frame in your sequence file corresponds to the first timestamp in the sequence.  | 
|  `frame`  |  Yes  |  String **Example of format**: `<folder-name>/<frame-file.bin>`  |  The relative location, in Amazon S3, of your point cloud frame file. This relative path is appended to the path you specify in `prefix`.  | 
|  `format`  |  No  |  String **Accepted string values**: `"binary/xyz"`, `"binary/xyzi"`, `"binary/xyzrgb"`, `"binary/xyzirgb"`, `"text/xyz"`, `"text/xyzi"`, `"text/xyzrgb"`, `"text/xyzirgb"` **Default Values**:  When the file identified in `frame` has a .bin extension, `binary/xyzi` When the file identified in `frame` has a .txt extension, `text/xyzi`  |  Use this parameter to specify the format of your point cloud data. For more information, see [Accepted Raw 3D Data Formats](sms-point-cloud-raw-data-types.md).  | 
|  `ego-vehicle-pose`  |  No  |  JSON object  |  The pose of the device used to collect the point cloud data. For more information about this parameter, see [Include Vehicle Pose Information in Your Input Manifest](#sms-point-cloud-multi-frame-ego-vehicle-input).  | 
|  `prefix`  |  No  |  String **Accepted string value format**:  `s3://<bucket-name>/<folder-name>/`  |  The location in Amazon S3 where your metadata, such as camera images, is stored for this frame.  The prefix must end with a forward slash: `/`.  | 
|  `images`  |  No  |  List  |  A list of parameters describing color camera images used for sensor fusion. You can include up to 8 images in this list. For more information about the parameters required for each image, see [Include Camera Data in Your Input Manifest](#sms-point-cloud-multi-frame-image-input).   | 

## Include Vehicle Pose Information in Your Input Manifest
<a name="sms-point-cloud-multi-frame-ego-vehicle-input"></a>

Use the ego-vehicle location to provide information about the pose of the vehicle used to capture point cloud data. Ground Truth uses this information to compute LiDAR extrinsic matrices. 

Ground Truth uses extrinsic matrices to project labels to and from the 3D scene and 2D images. For more information, see [Sensor Fusion](sms-point-cloud-sensor-fusion-details.md#sms-point-cloud-sensor-fusion).

The following table provides more information about the `position` and orientation (`heading`) parameters that are required when you provide ego-vehicle information. 


****  

|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The translation vector of the ego vehicle in the world coordinate system.   | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the device or sensor mounted on the vehicle sensing the surroundings, measured in [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`), in the world coordinate system.  | 

## Include Camera Data in Your Input Manifest
<a name="sms-point-cloud-multi-frame-image-input"></a>

If you want to include color camera data with a frame, use the following parameters to provide information about each image. The **Required** column in the following table applies when the `images` parameter is included in the input manifest file. You are not required to include images in your input manifest file. 

If you include camera images, you must include information about the `position` and orientation (`heading`) of the camera used to capture the images. 

If your images are distorted, Ground Truth can automatically undistort them using information you provide about the image in your input manifest file, including distortion coefficients (`k1`, `k2`, `k3`, `k4`, `p1`, `p2`), the camera model, focal length (`fx`, `fy`), and the principal point (`cx`, `cy`). To learn more about these coefficients and undistorting images, see [Camera calibration With OpenCV](https://docs.opencv.org/2.4.13.7/doc/tutorials/calib3d/camera_calibration/camera_calibration.html). If distortion coefficients are not included, Ground Truth will not undistort an image. 
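
As an illustration of how these parameters fit together, the following sketch assembles the camera matrix and the distortion coefficients in the layout that OpenCV's `cv2.undistort` expects. The values are copied from the example frame above and stand in for your own calibration; the OpenCV call itself is shown as a comment so the sketch runs with NumPy alone.

```python
import numpy as np

# Intrinsics and distortion coefficients as they appear in the manifest
# (values from the example frame above; placeholders for your calibration).
fx, fy = 847.7962624528487, 850.0340893791985
cx, cy = 576.2129134707038, 317.2423573573745
k1, k2, k3, p1, p2 = 0.0, 0.0, 0.0, 0.0, 0.0

camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])

# OpenCV expects distortion coefficients in the order (k1, k2, p1, p2, k3).
dist_coeffs = np.array([k1, k2, p1, p2, k3])

# With OpenCV installed, a pinhole image could then be undistorted with:
# import cv2
# undistorted = cv2.undistort(image, camera_matrix, dist_coeffs)
```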


****  

|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `image-path`  |  Yes  |  String **Example of format**:  `<folder-name>/<imagefile.png>`  |  The relative location, in Amazon S3, of your image file. This relative path is appended to the path you specify in `prefix`.   | 
|  `unix-timestamp`  |  Yes  |  Number  |  The timestamp of the image.   | 
|  `camera-model`  |  No  |  String: **Accepted Values**: `"pinhole"`, `"fisheye"` **Default**: `"pinhole"`  |  The model of the camera used to capture the image. This information is used to undistort camera images.   | 
|  `fx, fy`  |  Yes  |  Numbers  |  The focal length of the camera, in the x (`fx`) and y (`fy`) directions.  | 
|  `cx, cy`  |  Yes  | Numbers |  The x (`cx`) and y (`cy`) coordinates of the principal point.   | 
|  `k1, k2, k3, k4`  |  No  |  Number  |  Radial distortion coefficients. Supported for both **fisheye** and **pinhole** camera models.   | 
|  `p1, p2`  |  No  |  Number  |  Tangential distortion coefficients. Supported for **pinhole** camera models.  | 
|  `skew`  |  No  |  Number  |  A parameter to measure any known skew in the image.  | 
|  `position`  |  Yes  |  JSON object **Required Parameters**: `x`, `y`, and `z`. Enter numbers for these parameters.   |  The location or origin of the frame of reference of the camera mounted on the vehicle capturing images.  | 
|  `heading`  |  Yes  |  JSON Object **Required Parameters**: `qx`, `qy`, `qz`, and `qw`. Enter numbers for these parameters.   |  The orientation of the frame of reference of the camera mounted on the vehicle capturing images, measured using [quaternions](https://en.wikipedia.org/wiki/Quaternion), (`qx`, `qy`, `qz`, `qw`).   | 

## Sequence File and Point Cloud Frame Limits
<a name="sms-point-cloud-multi-frame-limits"></a>

You can include up to 100,000 point cloud frame sequences in your input manifest file. You can include up to 500 point cloud frames in each sequence file. 

Keep in mind that 3D point cloud labeling jobs have longer pre-processing times than other Ground Truth task types. For more information, see [Job pre-processing time](sms-point-cloud-general-information.md#sms-point-cloud-job-creation-time).

# Understand Coordinate Systems and Sensor Fusion
<a name="sms-point-cloud-sensor-fusion-details"></a>

Point cloud data is always located in a coordinate system. This coordinate system may be local to the vehicle or the device sensing the surroundings, or it may be a world coordinate system. When you use Ground Truth 3D point cloud labeling jobs, all the annotations are generated using the coordinate system of your input data. For some labeling job task types and features, you must provide data in a world coordinate system. 

In this topic, you'll learn the following:
+ When you *are required to* provide input data in a world coordinate system or global frame of reference.
+ What a world coordinate is and how you can convert point cloud data to a world coordinate system. 
+ How you can use your sensor and camera extrinsic matrices to provide pose data when using sensor fusion. 

## Coordinate System Requirements for Labeling Jobs
<a name="sms-point-cloud-sensor-fusion-coordinate-requirements"></a>

If your point cloud data was collected in a local coordinate system, you can use an extrinsic matrix of the sensor used to collect the data to convert it to a world coordinate system or a global frame of reference. If you cannot obtain an extrinsic for your point cloud data and, as a result, cannot obtain point clouds in a world coordinate system, you can provide point cloud data in a local coordinate system for 3D point cloud object detection and semantic segmentation task types. 

For object tracking, you must provide point cloud data in a world coordinate system. This is because when you are tracking objects across multiple frames, the ego vehicle itself is moving in the world and so all of the frames need a point of reference. 

If you include camera data for sensor fusion, it is recommended that you provide camera poses in the same world coordinate system as the 3D sensor (such as a LiDAR sensor). 

## Using Point Cloud Data in a World Coordinate System
<a name="sms-point-cloud-world-coordinate-system"></a>

This section explains what a world coordinate system (WCS), also referred to as a *global frame of reference*, is and explains how you can provide point cloud data in a world coordinate system.

### What is a World Coordinate System?
<a name="sms-point-cloud-what-is-wcs"></a>

A WCS or global frame of reference is a fixed universal coordinate system in which vehicle and sensor coordinate systems are placed. For example, if multiple point cloud frames are located in different coordinate systems because they were collected from two sensors, a WCS can be used to translate all of the coordinates in these point cloud frames into a single coordinate system, where all frames have the same origin, (0,0,0). This transformation is done by translating the origin of each frame to the origin of the WCS using a translation vector, and rotating the three axes (typically x, y, and z) to the right orientation using a rotation matrix. This rigid body transformation is called a *homogeneous transformation*.
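
A minimal NumPy sketch of this rigid body transformation, using a hypothetical rotation and translation:

```python
import numpy as np

# A hypothetical rigid body transformation into the WCS: a 90-degree
# rotation about the z axis plus a translation of (10, 2, 0) meters.
R_mat = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
T = np.array([10.0, 2.0, 0.0])

# A point in the local (sensor) coordinate system.
p_local = np.array([1.0, 0.0, 0.0])

# Homogeneous (rigid body) transformation: rotate, then translate.
p_world = R_mat @ p_local + T
print(p_world)
```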

A world coordinate system is important in global path planning, localization, mapping, and driving scenario simulations. Ground Truth uses a right-handed Cartesian world coordinate system such as the one defined in [ISO 8855](https://www.iso.org/standard/51180.html), where the x axis points forward in the direction of the car's movement, the y axis points left, and the z axis points up from the ground. 

The global frame of reference depends on the data. Some datasets use the LiDAR position in the first frame as the origin. In this scenario, all the frames use the first frame as a reference and device heading and position will be near the origin in the first frame. For example, KITTI datasets have the first frame as a reference for world coordinates. Other datasets use a device position that is different from the origin.

Note that this is not the GPS/IMU coordinate system, which is typically rotated by 90 degrees along the z-axis. If your point cloud data is in a GPS/IMU coordinate system (such as OxTS in the open source AV KITTI dataset), then you need to transform the origin to a world coordinate system (typically the vehicle's reference coordinate system). You apply this transformation by multiplying your data with transformation matrices (the rotation matrix and translation vector). This transforms the data from its original coordinate system to a global reference coordinate system. Learn more about this transformation in the next section. 

### Convert 3D Point Cloud Data to a WCS
<a name="sms-point-cloud-coordinate-system-general"></a>

Ground Truth assumes that your point cloud data has already been transformed into a reference coordinate system of your choice. For example, you can choose the reference coordinate system of the sensor (such as LiDAR) as your global reference coordinate system. You can also take point clouds from various sensors and transform them from the sensor's view to the vehicle's reference coordinate system view. You use a sensor's extrinsic matrix, made up of a rotation matrix and translation vector, to convert your point cloud data to a WCS or global frame of reference. 

Collectively, the translation vector and rotation matrix can be used to make up an *extrinsic matrix*, which can be used to convert data from a local coordinate system to a WCS. For example, your LiDAR extrinsic matrix may be composed as follows, where `R` is the rotation matrix and `T` is the translation vector:

```
LiDAR_extrinsic = [R T;0 0 0 1]
```

For example, the autonomous driving KITTI dataset includes a rotation matrix and translation vector for the LiDAR extrinsic transformation matrix for each frame. The [pykitti](https://github.com/utiasSTARS/pykitti) python module can be used for loading the KITTI data, and in the dataset `dataset.oxts[i].T_w_imu` gives the LiDAR extrinsic transform for the `i`th frame, which can be multiplied with points in that frame to convert them to a world frame: `np.matmul(lidar_transform_matrix, points)`. Multiplying a point in the LiDAR frame with the LiDAR extrinsic matrix transforms it into world coordinates. Multiplying a point in the world frame with the camera extrinsic matrix gives the point coordinates in the camera's frame of reference.

The following code example demonstrates how you can convert point cloud frames from the KITTI dataset into a WCS. 

```
import pykitti
import numpy as np

basedir = '/Users/nameofuser/kitti-data'
date = '2011_09_26'
drive = '0079'

# The 'frames' argument is optional - default: None, which loads the whole dataset.
# Calibration, timestamps, and IMU data are read automatically. 
# Camera and velodyne data are available via properties that create generators
# when accessed, or through getter methods that provide random access.
data = pykitti.raw(basedir, date, drive, frames=range(0, 50, 5))

# i is frame number
i = 0

# lidar extrinsic for the ith frame
lidar_extrinsic_matrix = data.oxts[i].T_w_imu

# velodyne raw point cloud in the lidar scanner's own coordinate system
points = data.get_velo(i)

# transform points from the lidar frame to the global frame using lidar_extrinsic_matrix
def generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix):
    tps = []
    for point in points:
        transformed_point = np.matmul(
            lidar_extrinsic_matrix,
            np.array([point[0], point[1], point[2], 1], dtype=np.float32).reshape(4, 1)
        ).tolist()
        if len(point) > 3 and point[3] is not None:
            # keep the intensity value when it is present
            tps.append([transformed_point[0][0], transformed_point[1][0], transformed_point[2][0], point[3]])
        else:
            tps.append([transformed_point[0][0], transformed_point[1][0], transformed_point[2][0]])

    return tps

# transform the raw point cloud into the world coordinate system
transformed_pcl = generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix)
```

## Sensor Fusion
<a name="sms-point-cloud-sensor-fusion"></a>

Ground Truth supports sensor fusion of point cloud data with up to 8 video camera inputs. This feature allows human labelers to view the 3D point cloud frame side-by-side with the synchronized video frame. In addition to providing more visual context for labeling, sensor fusion allows workers to adjust annotations in the 3D scene and in 2D images, and the adjustments are projected into the other view. The following video demonstrates a 3D point cloud labeling job with LiDAR and camera sensor fusion. 

![\[Gif showing a 3D point cloud labeling job with LiDAR and camera sensor fusion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/gifs/object_tracking/sensor-fusion.gif)


For best results, when using sensor fusion, your point cloud should be in a WCS. Ground Truth uses your sensor (such as LiDAR), camera, and ego vehicle pose information to compute extrinsic and intrinsic matrices for sensor fusion. 

### Extrinsic Matrix
<a name="sms-point-cloud-extrinsics"></a>

Ground Truth uses sensor (such as LiDAR) extrinsic and camera extrinsic and intrinsic matrices to project objects to and from the point cloud data's frame of reference to the camera's frame of reference. 

For example, in order to project a label from the 3D point cloud to the camera image plane, Ground Truth transforms 3D points from the LiDAR's own coordinate system to the camera's coordinate system. This is typically done by first transforming 3D points from the LiDAR's own coordinate system to a world coordinate system (or a global reference frame) using the LiDAR extrinsic matrix. Ground Truth then uses the camera inverse extrinsic (which converts points from a global frame of reference to the camera's frame of reference) to transform the 3D points obtained in the previous step from the world coordinate system into the camera image plane. The LiDAR extrinsic matrix can also be used to transform 3D data into a world coordinate system. If your 3D data is already transformed into the world coordinate system, then the first transformation doesn't have any impact on label translation, and label translation depends only on the camera inverse extrinsic. A view matrix is used to visualize projected labels. To learn more about these transformations and the view matrix, see [Ground Truth Sensor Fusion Transformations](#sms-point-cloud-extrinsic-intrinsic-explanation).
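
The two-step projection described above can be sketched in a few lines of NumPy. The poses and intrinsic values below are hypothetical; they only illustrate the order in which the LiDAR extrinsic, the camera inverse extrinsic, and the intrinsic matrix are applied.

```python
import numpy as np

def make_extrinsic(R_mat, T):
    """Assemble a 4x4 extrinsic matrix [R T; 0 0 0 1]."""
    E = np.eye(4)
    E[:3, :3] = R_mat
    E[:3, 3] = T
    return E

# Hypothetical poses: LiDAR at the world origin, camera 0.5 m along x.
lidar_extrinsic = make_extrinsic(np.eye(3), np.zeros(3))
camera_extrinsic = make_extrinsic(np.eye(3), np.array([0.5, 0.0, 0.0]))
camera_inverse_extrinsic = np.linalg.inv(camera_extrinsic)

# Hypothetical intrinsic matrix built from fx, fy, cx, cy.
K = np.array([[800.0, 0.0, 600.0],
              [0.0, 800.0, 320.0],
              [0.0, 0.0, 1.0]])

# A 3D point in the LiDAR coordinate system (homogeneous coordinates),
# with z treated here as the depth axis for simplicity.
p = np.array([2.0, 0.0, 10.0, 1.0])

# Step 1: LiDAR coordinates -> world coordinates (LiDAR extrinsic).
p_world = lidar_extrinsic @ p
# Step 2: world coordinates -> camera coordinates (camera inverse extrinsic).
p_cam = camera_inverse_extrinsic @ p_world
# Step 3: perspective projection onto the image plane (intrinsic).
uvw = K @ p_cam[:3]
pixel = uvw[:2] / uvw[2]
print(pixel)
```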

Ground Truth computes these extrinsic matrices by using the LiDAR and camera *pose data* that you provide: `heading` (in quaternions: `qx`, `qy`, `qz`, and `qw`) and `position` (`x`, `y`, `z`). For the vehicle, the heading and position are typically described in the vehicle's reference frame in a world coordinate system and are called an *ego vehicle pose*. For each camera extrinsic, you can add pose information for that camera. For more information, see [Pose](#sms-point-cloud-pose).

### Intrinsic Matrix
<a name="sms-point-cloud-intrinsic"></a>

Ground Truth uses the camera extrinsic and intrinsic matrices to compute view matrices to transform labels to and from the 3D scene and camera images. Ground Truth computes the camera intrinsic matrix using the camera focal length (`fx`, `fy`) and optical center coordinates (`cx`, `cy`) that you provide. For more information, see [Intrinsic and Distortion](#sms-point-cloud-camera-intrinsic-distortion).

### Image Distortion
<a name="sms-point-cloud-image-distortion"></a>

Image distortion can occur for a variety of reasons. For example, images may be distorted due to barrel or fish-eye effects. Ground Truth uses intrinsic parameters along with distortion coefficients to undistort images you provide when creating 3D point cloud labeling jobs. If a camera image has already been undistorted, all distortion coefficients should be set to 0. 

For more information about the transformations Ground Truth performs to undistort images, see [Camera Calibrations: Extrinsic, Intrinsic and Distortion](#sms-point-cloud-extrinsic-camera-explanation).

### Ego Vehicle
<a name="sms-point-cloud-ego-vehicle"></a>

To collect data for autonomous driving applications, the measurements used to generate point cloud data are taken from sensors mounted on a vehicle, or the *ego vehicle*. To project label adjustments to and from the 3D scene and 2D images, Ground Truth needs your ego vehicle pose in a world coordinate system. The ego vehicle pose consists of position coordinates and an orientation quaternion. 

Ground Truth uses your ego vehicle pose to compute rotation and transformation matrices. Rotations in 3 dimensions can be represented by a sequence of 3 rotations around a sequence of axes. In theory, any three axes spanning the 3D Euclidean space are enough. In practice, the axes of rotation are chosen to be the basis vectors. The three rotations are expected to be in a global frame of reference (extrinsic). Ground Truth does not support a body-centered frame of reference (intrinsic), which is attached to, and moves with, the object under rotation. To track objects, Ground Truth needs to measure from a global reference frame where all vehicles are moving. When using Ground Truth 3D point cloud labeling jobs, z specifies the axis of rotation (extrinsic rotation) and yaw Euler angles are in radians (rotation angle).
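
For example, using scipy (an illustration, not a Ground Truth requirement), an extrinsic rotation of 90 degrees about the global z axis can be expressed as a heading quaternion:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# A hypothetical yaw of 90 degrees (pi/2 radians) about the global z axis.
yaw = np.pi / 2

# In scipy, a lowercase axis letter denotes an extrinsic (global-frame)
# rotation; as_quat() returns the quaternion in (qx, qy, qz, qw) order.
heading = R.from_euler("z", yaw).as_quat()
print(heading)
```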

### Pose
<a name="sms-point-cloud-pose"></a>

Ground Truth uses pose information for 3D visualizations and sensor fusion. Pose information you input through your manifest file is used to compute extrinsic matrices. If you already have an extrinsic matrix, you can use it to extract sensor and camera pose data. 

For example, in the autonomous driving KITTI dataset, the [pykitti](https://github.com/utiasSTARS/pykitti) python module can be used for loading the KITTI data. In the dataset, `dataset.oxts[i].T_w_imu` gives the LiDAR extrinsic transform for the `i`th frame, and it can be multiplied with the points to get them in a world frame: `matmul(lidar_transform_matrix, points)`. This transform can be converted into the position (translation vector) and heading (in quaternion) of the LiDAR for the input manifest file JSON format. The camera extrinsic transform for `cam0` in the `i`th frame can be calculated by `inv(matmul(dataset.calib.T_cam0_velo, inv(dataset.oxts[i].T_w_imu)))`, and this can be converted into heading and position for `cam0`.

```
import numpy as np
from scipy.spatial.transform import Rotation as R

rotation = [[ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03],
            [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02],
            [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01]]

origin = [1.71104606e+00,
          5.80000039e-01,
          9.43144935e-01]

# position is the origin
position = origin
r = R.from_matrix(np.asarray(rotation))

# heading in WCS using scipy
heading = r.as_quat()
print(f"position: {position}\nheading: {heading}")
```

**Position**  
In the input manifest file, `position` refers to the position of the sensor with respect to a world frame. If you are unable to put the device position in a world coordinate system, you can use LiDAR data with local coordinates. Similarly, for mounted video cameras you can specify the position and heading in a world coordinate system. For a camera, if you do not have position information, use (0, 0, 0). 

The following are the fields in the position object:

1.  `x` (float) – x coordinate of ego vehicle, sensor, or camera position in meters. 

1.  `y` (float) – y coordinate of ego vehicle, sensor, or camera position in meters. 

1.  `z` (float) – z coordinate of ego vehicle, sensor, or camera position in meters. 

The following is an example of a `position` JSON object: 

```
{
    "position": {
        "y": -152.77584902657554,
        "x": 311.21505956090624,
        "z": -10.854137529636024
      }
}
```

**Heading**  
In the input manifest file, `heading` is an object that represents the orientation of a device with respect to the world frame. Heading values should be quaternions. A [quaternion](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation) is a representation of orientation consistent with geodesic spherical properties. If you are unable to put the sensor heading in world coordinates, use the identity quaternion `(qx = 0, qy = 0, qz = 0, qw = 1)`. Similarly, for cameras, specify the heading in quaternions. If you are unable to obtain extrinsic camera calibration parameters, also use the identity quaternion. 

Fields in `heading` object are as follows:

1.  `qx` (float) - x component of ego vehicle, sensor, or camera orientation. 

1.  `qy` (float) - y component of ego vehicle, sensor, or camera orientation. 

1.  `qz` (float) - z component of ego vehicle, sensor, or camera orientation. 

1. `qw` (float) - w component of ego vehicle, sensor, or camera orientation. 

The following is an example of a `heading` JSON object: 

```
{
    "heading": {
        "qy": -0.7046155108831117,
        "qx": 0.034278837280808494,
        "qz": 0.7070617895701465,
        "qw": -0.04904659893885366
      }
}
```
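
A quick sanity check you can run on your own manifest values: a valid orientation quaternion should have approximately unit norm. The following sketch checks the example heading above.

```python
import math

heading = {
    "qy": -0.7046155108831117,
    "qx": 0.034278837280808494,
    "qz": 0.7070617895701465,
    "qw": -0.04904659893885366,
}

# A valid orientation quaternion should have (approximately) unit norm.
norm = math.sqrt(sum(v * v for v in heading.values()))
print(norm)
```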

To learn more, see [Compute Orientation Quaternions and Position](#sms-point-cloud-ego-vehicle-orientation).

## Compute Orientation Quaternions and Position
<a name="sms-point-cloud-ego-vehicle-orientation"></a>

Ground Truth requires that all orientation, or heading, data be given in quaternions. A [quaternion](https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation) is a representation of orientation consistent with geodesic spherical properties. Compared to [Euler angles](https://en.wikipedia.org/wiki/Euler_angles), quaternions are simpler to compose and avoid the problem of [gimbal lock](https://en.wikipedia.org/wiki/Gimbal_lock). Compared to rotation matrices, they are more compact, more numerically stable, and more efficient. 

You can compute quaternions from a rotation matrix or a transformation matrix.

If you have a rotation matrix (made up of the axis rotations) and a translation vector (or origin) in the world coordinate system instead of a single 4x4 rigid transformation matrix, then you can directly use the rotation matrix and translation vector to compute quaternions. Libraries like [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html) and [pyquaternion](http://kieranwynn.github.io/pyquaternion/#explicitly-by-rotation-or-transformation-matrix) can help. The following code block shows an example that uses scipy to compute a quaternion from a rotation matrix. 

```
import numpy as np
from scipy.spatial.transform import Rotation as R

rotation = [[ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03],
            [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02],
            [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01]]

origin = [1.71104606e+00,
          5.80000039e-01,
          9.43144935e-01]

# position is the origin
position = origin
r = R.from_matrix(np.asarray(rotation))
# heading in WCS using scipy
heading = r.as_quat()
print(f"position: {position}\nheading: {heading}")
```

A UI tool like [3D Rotation Converter](https://www.andre-gaschler.com/rotationconverter/) can also be useful.

If you have a 4x4 extrinsic transformation matrix, note that the transformation matrix is in the form `[R T; 0 0 0 1]`, where `R` is the rotation matrix and `T` is the origin translation vector. That means you can extract the rotation matrix and translation vector from the transformation matrix as follows.

```
import numpy as np
from scipy.spatial.transform import Rotation as R

transformation = [[ 9.96714314e-01, -8.09890350e-02,  1.16333982e-03, 1.71104606e+00],
                  [ 8.09967396e-02,  9.96661051e-01, -1.03090934e-02, 5.80000039e-01],
                  [-3.24531964e-04,  1.03694477e-02,  9.99946183e-01, 9.43144935e-01],
                  [              0,               0,               0,              1]]

transformation = np.array(transformation)
rotation = transformation[0:3, 0:3]
translation = transformation[0:3, 3]

# position is the origin translation
position = translation
r = R.from_matrix(np.asarray(rotation))
# heading in WCS using scipy
heading = r.as_quat()
print(f"position: {position}\nheading: {heading}")
```

With your own setup, you can compute an extrinsic transformation matrix using the GPS/IMU position and orientation (latitude, longitude, altitude and roll, pitch, yaw) with respect to the LiDAR sensor on the ego vehicle. For example, you can compute a pose from KITTI raw data using `pose = convertOxtsToPose(oxts)` to transform the oxts data into local Euclidean poses, specified by 4x4 rigid transformation matrices. You can then transform this pose transformation matrix to a global reference frame using the reference frame's transformation matrix in the world coordinate system. If you have an orientation as yaw, pitch, and roll Euler angles, you can convert it to a quaternion as follows.

```
struct Quaternion
{
    double w, x, y, z;
};

Quaternion ToQuaternion(double yaw, double pitch, double roll) // yaw (Z), pitch (Y), roll (X)
{
    // Abbreviations for the various angular functions
    double cy = cos(yaw * 0.5);
    double sy = sin(yaw * 0.5);
    double cp = cos(pitch * 0.5);
    double sp = sin(pitch * 0.5);
    double cr = cos(roll * 0.5);
    double sr = sin(roll * 0.5);

    Quaternion q;
    q.w = cr * cp * cy + sr * sp * sy;
    q.x = sr * cp * cy - cr * sp * sy;
    q.y = cr * sp * cy + sr * cp * sy;
    q.z = cr * cp * sy - sr * sp * cy;

    return q;
}
```
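
As a sanity check, the following Python sketch ports `ToQuaternion` above and compares it against scipy's intrinsic Z-Y-X Euler sequence, which follows the same yaw-pitch-roll convention. The input angles are arbitrary example values.

```python
import math
import numpy as np
from scipy.spatial.transform import Rotation as R

def to_quaternion(yaw, pitch, roll):
    """Python port of the C++ ToQuaternion above; returns (w, x, y, z)."""
    cy, sy = math.cos(yaw * 0.5), math.sin(yaw * 0.5)
    cp, sp = math.cos(pitch * 0.5), math.sin(pitch * 0.5)
    cr, sr = math.cos(roll * 0.5), math.sin(roll * 0.5)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# Hypothetical Euler angles in radians.
yaw, pitch, roll = 0.5, 0.1, -0.2
w, x, y, z = to_quaternion(yaw, pitch, roll)
ours = np.array([x, y, z, w])

# scipy's intrinsic "ZYX" sequence uses the same yaw-pitch-roll convention;
# note that scipy returns quaternions in (x, y, z, w) order.
ref = R.from_euler("ZYX", [yaw, pitch, roll]).as_quat()
print(ours, ref)
```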

## Ground Truth Sensor Fusion Transformations
<a name="sms-point-cloud-extrinsic-intrinsic-explanation"></a>

The following sections go into greater detail about the Ground Truth sensor fusion transformations that are performed using the pose data you provide.

### LiDAR Extrinsic
<a name="sms-point-cloud-extrinsic-lidar-explanation"></a>

In order to project to and from a 3D LiDAR scene to a 2D camera image, Ground Truth computes the rigid transformation projection matrices using the ego vehicle pose and heading. Ground Truth computes the rotation and translation of world coordinates into the 3D plane by doing a simple sequence of rotations and translations. 

Ground Truth computes the rotation matrix using the heading quaternions as follows:

![\[Equation: Ground Truth point cloud rotation metrics.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-rotation-matrix.png)


Here, `[x, y, z, w]` corresponds to the parameters in the `heading` JSON object, `[qx, qy, qz, qw]`. Ground Truth computes the translation column vector as `T = [poseX, poseY, poseZ]`. The extrinsic matrix is then as follows:

```
LiDAR_extrinsic = [R T;0 0 0 1]
```
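The assembly of `LiDAR_extrinsic` can be sketched in NumPy. The function name and the `position`/`heading` field names below follow the pose JSON format described in this guide; treat this as an illustration of the math, not the exact Ground Truth implementation:

```python
import numpy as np

def lidar_extrinsic(position, heading):
    """Assemble the 4x4 rigid transformation [R T; 0 0 0 1] from an ego pose.

    position: dict with keys "x", "y", "z" (the pose translation)
    heading: dict with keys "qx", "qy", "qz", "qw" (a unit quaternion)
    """
    x, y, z, w = heading["qx"], heading["qy"], heading["qz"], heading["qw"]
    # Standard rotation matrix of the unit quaternion [x, y, z, w].
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = [position["x"], position["y"], position["z"]]
    return extrinsic
```

With the identity heading `[0, 0, 0, 1]`, the rotation block is the identity and only the translation remains.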

### Camera Calibrations: Extrinsic, Intrinsic and Distortion
<a name="sms-point-cloud-extrinsic-camera-explanation"></a>

*Geometric camera calibration*, also referred to as *camera resectioning*, estimates the parameters of the lens and image sensor of an image or video camera. You can use these parameters to correct for lens distortion, measure the size of an object in world units, or determine the location of the camera in the scene. Camera parameters include intrinsics, extrinsics, and distortion coefficients.

#### Camera Extrinsic
<a name="sms-point-cloud-camera-extrinsic"></a>

If the camera pose is given, then Ground Truth computes the camera extrinsic based on a rigid transformation from the 3D plane into the camera plane. The calculation is the same as the one used for the [LiDAR Extrinsic](#sms-point-cloud-extrinsic-lidar-explanation), except that Ground Truth uses camera pose (`position` and `heading`) and computes the inverse extrinsic.

```
 camera_inverse_extrinsic = inv([Rc Tc;0 0 0 1]) #where Rc and Tc are camera pose components
```
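The inverse of a rigid transformation has a closed form: `inv([R T; 0 0 0 1]) = [R.T, -R.T @ T; 0 0 0 1]`. The following NumPy sketch illustrates this (it is not the Ground Truth implementation):

```python
import numpy as np

def camera_inverse_extrinsic(Rc, Tc):
    """Invert the camera pose [Rc Tc; 0 0 0 1] in closed form.

    Rc: 3x3 rotation matrix of the camera pose
    Tc: length-3 translation vector of the camera pose
    """
    inverse = np.eye(4)
    inverse[:3, :3] = Rc.T            # transpose of a rotation is its inverse
    inverse[:3, 3] = -Rc.T @ Tc       # translation mapped back through R
    return inverse
```

Composing the result with the forward transformation yields the identity, which is a quick way to sanity-check a pose.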

#### Intrinsic and Distortion
<a name="sms-point-cloud-camera-intrinsic-distortion"></a>

Some cameras, such as pinhole or fisheye cameras, may introduce significant distortion in photos. This distortion can be corrected using distortion coefficients and the camera focal length. To learn more, see [Camera calibration With OpenCV](https://docs.opencv.org/2.4.13.7/doc/tutorials/calib3d/camera_calibration/camera_calibration.html) in the OpenCV documentation.

There are two types of distortion Ground Truth can correct for: radial distortion and tangential distortion.

*Radial distortion* occurs when light rays bend more near the edges of a lens than they do at its optical center. The smaller the lens, the greater the distortion. Radial distortion manifests as a *barrel* or *fish-eye* effect, and Ground Truth uses Formula 1 to undistort it. 

**Formula 1:**

![\[Formula 1: equations for x_{corrected} and y_{corrected}, to undistort radial distortion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-distortion-1.png)


*Tangential distortion* occurs because the lenses used to take the images are not perfectly parallel to the imaging plane. This can be corrected with Formula 2. 

**Formula 2:**

![\[Formula 2: equations for x_{corrected} and y_{corrected}, to correct for tangential distortion.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-distortion-2.png)


In the input manifest file, you can provide distortion coefficients and Ground Truth will undistort your images. All distortion coefficients are floats. 
+ `k1`, `k2`, `k3`, `k4` – Radial distortion coefficients. Supported for both fisheye and pinhole camera models.
+ `p1` ,`p2` – Tangential distortion coefficients. Supported for pinhole camera models.

If images are already undistorted, all distortion coefficients should be 0 in your input manifest. 
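For intuition, Formulas 1 and 2 follow the standard pinhole distortion model (as used by OpenCV). The following sketch applies the radial and tangential terms to a normalized image coordinate; the function name is illustrative, and Ground Truth applies the inverse of this mapping to undistort:

```python
def apply_distortion(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion terms
    to a normalized image coordinate (x, y), following the standard
    pinhole model."""
    r2 = x * x + y * y
    # Radial term: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_out = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_out = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_out, y_out
```

With all coefficients set to 0, the mapping is the identity, which is consistent with the guidance that already-undistorted images should use all-zero coefficients in the input manifest.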

To correctly reconstruct the corrected image, Ground Truth does a unit conversion of the images based on focal lengths. If a common focal length is used with a given aspect ratio for both axes (such as 1), the formula above has a single focal length. The matrix containing these four parameters is referred to as the *camera intrinsic calibration matrix*. 

![\[The in camera intrinsic calibration matrix.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pointcloud/sms-point-cloud-camera-intrinsic.png)


While the distortion coefficients are the same regardless of the camera resolution used, the intrinsic parameters should be scaled from the calibrated resolution to the current resolution. 

The following are float values. 
+ `fx` – Focal length in x direction.
+ `fy` – Focal length in y direction.
+ `cx` – X coordinate of principal point.
+ `cy` – Y coordinate of principal point.
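Assembling these four values into the 3x3 intrinsic calibration matrix can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 camera intrinsic calibration matrix from the focal
    lengths (fx, fy) and principal point (cx, cy)."""
    return np.array([
        [fx,  0.0, cx],
        [0.0, fy,  cy],
        [0.0, 0.0, 1.0],
    ])
```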

Ground Truth uses the camera extrinsic and camera intrinsic matrices to compute the view matrix, as shown in the following code block, to transform labels between the 3D scene and 2D images.

```
import numpy as np

def generate_view_matrix(intrinsic_matrix, extrinsic_matrix):
    # Append a zero column so the 3x3 intrinsic matrix becomes 3x4.
    intrinsic_matrix = np.c_[intrinsic_matrix, np.zeros(3)]
    # Compose with the 4x4 extrinsic matrix to get a 3x4 projection.
    view_matrix = np.matmul(intrinsic_matrix, extrinsic_matrix)
    # Insert the row (0, 0, 0, 1) to produce a 4x4 view matrix.
    view_matrix = np.insert(view_matrix, 2, np.array((0, 0, 0, 1)), 0)
    return view_matrix
```
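The composition above can be checked numerically. The following sketch uses hypothetical calibration values (the intrinsics here are made up for illustration) and repeats the same composition inline; with an identity extrinsic, a 3D point projects to pixel coordinates by dividing the first two rows by the depth row:

```python
import numpy as np

# Hypothetical intrinsics: fx = fy = 1000, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
E = np.eye(4)  # identity extrinsic: camera at the world origin

# Same composition as generate_view_matrix above.
view = np.insert(np.matmul(np.c_[K, np.zeros(3)], E),
                 2, np.array((0, 0, 0, 1)), 0)

# Project a 3D point (homogeneous coordinates) into the image.
p = view @ np.array([1.0, 0.5, 2.0, 1.0])
u, v = p[0] / p[3], p[1] / p[3]  # pixel coordinates: (1140.0, 610.0)
```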

# Video Frame Input Data
<a name="sms-video-frame-input-data-overview"></a>

When you create a video frame object detection or object tracking labeling job, you can choose video files (MP4 files) or video frames for input data. All worker tasks are created using video frames, so if you choose video files, use the Ground Truth frame extraction tool to extract video frames (images) from your video files. 

For both of these options, you can use the **Automated data setup** option in the Ground Truth section of the Amazon SageMaker AI console to set up a connection between Ground Truth and your input data in Amazon S3 so that Ground Truth knows where to look for your input data when creating your labeling tasks. This creates and stores an input manifest file in your Amazon S3 input dataset location. To learn more, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md).

Alternatively, you can manually create sequence files for each sequence of video frames that you want labeled and provide the Amazon S3 location of an input manifest file that references each of these sequence files using the `source-ref` key. To learn more, see [Create a Video Frame Input Manifest File](sms-video-manual-data-setup.md#sms-video-create-manifest). 

**Topics**
+ [Choose Video Files or Video Frames for Input Data](sms-point-cloud-video-input-data.md)
+ [Input Data Setup](sms-video-data-setup.md)

# Choose Video Files or Video Frames for Input Data
<a name="sms-point-cloud-video-input-data"></a>

When you create a video frame object detection or object tracking labeling job, you can provide a sequence of video frames (images) or you can use the Amazon SageMaker AI console to have Ground Truth automatically extract video frames from your video files. Use the following sections to learn more about these options. 

## Provide Video Frames
<a name="sms-video-provide-frames"></a>

Video frames are sequences of images extracted from a video file. You can create a Ground Truth labeling job to have workers label multiple sequences of video frames. Each sequence is made up of images extracted from a single video. 

To create a labeling job using video frame sequences, you must store each sequence using a unique [key name prefix](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-keys) in Amazon S3. In the Amazon S3 console, key name prefixes are folders. So in the Amazon S3 console, each sequence of video frames must be located in its own folder in Amazon S3.

For example, if you have two sequences of video frames, you might use the key name prefixes `sequence1/` and `sequence2/` to identify your sequences. In this example, your sequences may be located in `s3://amzn-s3-demo-bucket/video-frames/sequence1/` and `s3://amzn-s3-demo-bucket/video-frames/sequence2/`.

If you are using the Ground Truth console to create an input manifest file, all of the sequence key name prefixes should be in the same location in Amazon S3. For example, in the Amazon S3 console, each sequence could be in a folder in `s3://amzn-s3-demo-bucket/video-frames/`. In this example, your first sequence of video frames (images) may be located in `s3://amzn-s3-demo-bucket/video-frames/sequence1/` and your second sequence may be located in `s3://amzn-s3-demo-bucket/video-frames/sequence2/`. 

**Important**  
Even if you only have a single sequence of video frames that you want workers to label, that sequence must have a key name prefix in Amazon S3. If you are using the Amazon S3 console, this means that your sequence is located in a folder. It cannot be located in the root of your S3 bucket. 

When creating worker tasks using sequences of video frames, Ground Truth uses one sequence per task. In each task, Ground Truth orders your video frames using [UTF-8](https://en.wikipedia.org/wiki/UTF-8) binary order. 

For example, video frames might be in the following order in Amazon S3: 

```
[0001.jpg, 0002.jpg, 0003.jpg, ..., 0011.jpg]
```

They are arranged in the same order in the worker’s task: `0001.jpg, 0002.jpg, 0003.jpg, ..., 0011.jpg`.

Frames might also be ordered using a naming convention like the following:

```
[frame1.jpg, frame2.jpg, ..., frame11.jpg]
```

In this case, `frame10.jpg` and `frame11.jpg` come before `frame2.jpg` in the worker task. Your worker sees your video frames in the following order: `frame1.jpg, frame10.jpg, frame11.jpg, frame2.jpg, ..., frame9.jpg`. 
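For ASCII file names, UTF-8 binary order matches a plain string sort, so you can preview the task order locally. A quick sketch:

```python
# Unpadded names: frame1.jpg ... frame11.jpg
frames = [f"frame{i}.jpg" for i in range(1, 12)]
# For ASCII names, Python's default string sort matches UTF-8 binary order.
ordered = sorted(frames)
# ordered begins: frame1.jpg, frame10.jpg, frame11.jpg, frame2.jpg, ...

# Zero-padded names: 0001.jpg ... 0011.jpg sort in numeric order.
padded = [f"{i:04d}.jpg" for i in range(1, 12)]
```

This is why zero-padding frame names (as in `0001.jpg`) keeps them in numeric order in the worker task.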

## Provide Video Files
<a name="sms-point-cloud-video-frame-extraction"></a>

You can use the Ground Truth frame splitting feature when creating a new labeling job in the console to extract video frames from video files (MP4 files). A series of video frames extracted from a single video file is referred to as a *sequence of video frames*.

You can either have Ground Truth automatically extract all frames, up to 2,000, from the video, or you can specify a frequency for frame extraction. For example, you can have Ground Truth extract every 10th frame from your videos.

You can provide up to 50 videos when you use automated data setup to extract frames. However, your input manifest file cannot reference more than 10 video frame sequence files when you create a video frame object tracking or video frame object detection labeling job. If you use the automated data setup console tool to extract video frames from more than 10 video files, you must modify the manifest file that the tool generates, or create a new one, so that it includes 10 or fewer video frame sequence files. To learn more about these quotas, see [3D Point Cloud and Video Frame Labeling Job Quotas](input-data-limits.md#sms-input-data-quotas-other).

To use the video frame extraction tool, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md). 

When all of your video frames have been successfully extracted from your videos, you will see the following in your S3 input dataset location:
+ A key name prefix (a folder in the Amazon S3 console) named after each video. Each of these prefixes leads to:
  + A sequence of video frames extracted from the video used to name that prefix.
  + A sequence file used to identify all of the images that make up that sequence. 
+ An input manifest file with a .manifest extension. This identifies all of the sequence files that will be used to create your labeling job. 

All of the frames extracted from a single video file are used for a labeling task. If you extract video frames from multiple video files, multiple tasks are created for your labeling job, one for each sequence of video frames. 

 Ground Truth stores each sequence of video frames that it extracts in your Amazon S3 location for input datasets using a unique [key name prefix](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-keys). In the Amazon S3 console, key name prefixes are folders.

# Input Data Setup
<a name="sms-video-data-setup"></a>

When you create a video frame labeling job, you need to let Ground Truth know where to look for your input data. You can do this in one of two ways:
+ You can store your input data in Amazon S3 and have Ground Truth automatically detect the input dataset used for your labeling job. See [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md) to learn more about this option. 
+ You can create an input manifest file and sequence files and upload them to Amazon S3. See [Set up Video Frame Input Data Manually](sms-video-manual-data-setup.md) to learn more about this option. 

**Topics**
+ [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md)
+ [Set up Video Frame Input Data Manually](sms-video-manual-data-setup.md)

# Set up Automated Video Frame Input Data
<a name="sms-video-automated-data-setup"></a>

You can use the Ground Truth automated data setup to automatically detect video files in your Amazon S3 bucket and extract video frames from those files. To learn how, see [Provide Video Files](sms-point-cloud-video-input-data.md#sms-point-cloud-video-frame-extraction).

If you already have video frames in Amazon S3, you can use the automated data setup to use these video frames in your labeling job. For this option, all video frames from a single video must be stored using a unique prefix. To learn about the requirements to use this option, see [Provide Video Frames](sms-point-cloud-video-input-data.md#sms-video-provide-frames).

Select one of the following sections to learn how to set up your automatic input dataset connection with Ground Truth.

## Provide Video Files and Extract Frames
<a name="sms-video-provide-files-auto-setup-console"></a>

Use the following procedure to connect your video files with Ground Truth and automatically extract video frames from those files for video frame object detection and object tracking labeling jobs.

**Note**  
If you use the automated data setup console tool to extract video frames from more than 10 video files, you must modify the manifest file that the tool generates, or create a new one, so that it includes 10 or fewer video frame sequence files. To learn more, see [Provide Video Files](sms-point-cloud-video-input-data.md#sms-point-cloud-video-frame-extraction).

Make sure your video files are stored in an Amazon S3 bucket in the same AWS Region that you perform the automated data setup in. 

**Automatically connect your video files in Amazon S3 with Ground Truth and extract video frames:**

1. Navigate to the **Create labeling job** page in the Amazon SageMaker AI console: [https://console.aws.amazon.com/sagemaker/groundtruth](https://console.aws.amazon.com//sagemaker/groundtruth). 

   Your input and output S3 buckets must be located in the same AWS Region that you create your labeling job in. This link puts you in the North Virginia (us-east-1) AWS Region. If your input data is in an Amazon S3 bucket in another Region, switch to that Region. To change your AWS Region, on the [navigation bar](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html#select-region), choose the name of the currently displayed Region.

1. Select **Create labeling job**.

1. Enter a **Job name**. 

1. In the section **Input data setup**, select **Automated data setup**.

1. Enter an Amazon S3 URI for **S3 location for input datasets**. An S3 URI looks like the following: `s3://amzn-s3-demo-bucket/path-to-files/`. This URI should point to the Amazon S3 location where your video files are stored.

1. Specify your **S3 location for output datasets**. This is where your output data is stored. You can choose to store your output data in the **Same location as input dataset**, or select **Specify a new location** and enter the S3 URI of the location where you want to store your output data.

1. Choose **Video Files** for your **Data type** using the dropdown list.

1. Choose **Yes, extract frames for object tracking and detection tasks**. 

1. Choose a method of **Frame extraction**.
   + When you choose **Use all frames extracted from the video to create a labeling task**, Ground Truth extracts all frames from each video in your **S3 location for input datasets**, up to 2,000 frames. If a video in your input dataset contains more than 2,000 frames, the first 2,000 are extracted and used for that labeling task. 
   + When you choose **Use every *x* frame from a video to create a labeling task**, Ground Truth extracts every *x*th frame from each video in your **S3 location for input datasets**. 

     For example, if your video is 2 seconds long, and has a [frame rate](https://en.wikipedia.org/wiki/Frame_rate) of 30 frames per second, there are 60 frames in your video. If you specify 10 here, Ground Truth extracts every 10th frame from your video. This means the 1st, 10th, 20th, 30th, 40th, 50th, and 60th frames are extracted. 

1. Choose or create an IAM execution role. Make sure that this role has permission to access your Amazon S3 locations for input and output data specified in steps 5 and 6. 

1. Select **Complete data setup**.

## Provide Video Frames
<a name="sms-video-provide-frames-auto-setup-console"></a>

Use the following procedure to connect your sequences of video frames with Ground Truth for video frame object detection and object tracking labeling jobs. 

Make sure your video frames are stored in an Amazon S3 bucket in the same AWS Region that you perform the automated data setup in. Each sequence of video frames must have a unique prefix. For example, if you have two sequences stored in `s3://amzn-s3-demo-bucket/video-frames/sequences/`, each should have a unique prefix like `sequence1` and `sequence2`, and both should be located directly under the `/sequences/` prefix. In this example, the locations of the two sequences are `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence1/` and `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence2/`. 

**Automatically connect your video frames in Amazon S3 with Ground Truth:**

1. Navigate to the **Create labeling job** page in the Amazon SageMaker AI console: [https://console.aws.amazon.com/sagemaker/groundtruth](https://console.aws.amazon.com//sagemaker/groundtruth). 

   Your input and output S3 buckets must be located in the same AWS Region that you create your labeling job in. This link puts you in the North Virginia (us-east-1) AWS Region. If your input data is in an Amazon S3 bucket in another Region, switch to that Region. To change your AWS Region, on the [navigation bar](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html#select-region), choose the name of the currently displayed Region.

1. Select **Create labeling job**.

1. Enter a **Job name**. 

1. In the section **Input data setup**, select **Automated data setup**.

1. Enter an Amazon S3 URI for **S3 location for input datasets**. 

   This should be the Amazon S3 location where your sequences are stored. For example, if you have two sequences stored in `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence1/`, `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence2/`, enter `s3://amzn-s3-demo-bucket/video-frames/sequences/` here.

1. Specify your **S3 location for output datasets**. This is where your output data is stored. You can choose to store your output data in the **Same location as input dataset**, or select **Specify a new location** and enter the S3 URI of the location where you want to store your output data.

1. Choose **Video frames** for your **Data type** using the dropdown list. 

1. Choose or create an IAM execution role. Make sure that this role has permission to access your Amazon S3 locations for input and output data specified in steps 5 and 6. 

1. Select **Complete data setup**.

These procedures create an input manifest file in the Amazon S3 location for input datasets that you specified in step 5. If you are creating a labeling job using the SageMaker API, the AWS CLI, or an AWS SDK, use the Amazon S3 URI for this input manifest file as input to the parameter `ManifestS3Uri`.

# Set up Video Frame Input Data Manually
<a name="sms-video-manual-data-setup"></a>

Choose the manual data setup option if you have created sequence files for each of your video frame sequences, and a manifest file listing references to those sequence files.

## Create a Video Frame Input Manifest File
<a name="sms-video-create-manifest"></a>

 Ground Truth uses the input manifest file to identify the location of your input dataset when creating labeling tasks. For video frame object detection and object tracking labeling jobs, each line in the input manifest file identifies the location of a video frame sequence file. Each sequence file identifies the images included in a single sequence of video frames.

Use this page to learn how to create a video frame sequence file and an input manifest file for video frame object tracking and object detection labeling jobs.

If you want Ground Truth to automatically generate your sequence files and input manifest file, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md). 

### Create a Video Frame Sequence Input Manifest
<a name="sms-video-create-input-manifest-file"></a>

In the video frame sequence input manifest file, each line in the manifest is a JSON object, with a `"source-ref"` key that references a sequence file. Each sequence file identifies the location of a sequence of video frames. This is the manifest file formatting required for all video frame labeling jobs. 

The following example demonstrates the syntax used for an input manifest file:

```
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq1.json"}
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq2.json"}
```
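Because each manifest line is a standalone JSON object (JSON Lines format), you can generate a manifest programmatically. A minimal sketch using hypothetical sequence file URIs:

```python
import json

# Hypothetical sequence file URIs; replace with your own S3 locations.
sequence_files = [
    "s3://amzn-s3-demo-bucket/example-folder/seq1.json",
    "s3://amzn-s3-demo-bucket/example-folder/seq2.json",
]

# One JSON object per line, each with a "source-ref" key.
manifest_lines = [json.dumps({"source-ref": uri}) for uri in sequence_files]
manifest = "\n".join(manifest_lines) + "\n"
```

You would then upload the resulting `manifest` string to Amazon S3 as your input manifest file.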

### Create a Video Frame Sequence File
<a name="sms-video-create-sequence-file"></a>

The data for each sequence of video frames needs to be stored in a JSON data object. The following is an example of the format you use for a sequence file. Information about each frame is included as a JSON object and is listed in the `frames` list. The following JSON has been expanded for readability. 

```
{
 "seq-no": 1,
 "prefix": "s3://amzn-s3-demo-bucket/prefix/video1/",
 "number-of-frames": 3,
 "frames":[
   {"frame-no": 1, "unix-timestamp": 1566861644, "frame": "frame0001.jpg" },
   {"frame-no": 2, "unix-timestamp": 1566861644, "frame": "frame0002.jpg" }, 
   {"frame-no": 3, "unix-timestamp": 1566861644, "frame": "frame0003.jpg" }   
 ]
}
```

The following table provides details about the parameters shown in this code example. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `seq-no`  |  Yes  |  Integer  |  The ordered number of the sequence.   | 
|  `prefix`  |  Yes  |  String **Accepted Values**: `s3://<bucket-name>/<prefix>/`  |  The Amazon S3 location where the sequence files are located.  The prefix must end with a forward slash: `/`.  | 
|  `number-of-frames`  |  Yes  |  Integer  |  The total number of frames included in the sequence file. This number must match the total number of frames listed in the `frames` parameter in the next row.  | 
|  `frames`  |  Yes  |  List of JSON objects **Required**: `frame-no`, `frame` **Optional**: `unix-timestamp`  |  A list of frame data. The length of the list must equal `number-of-frames`. In the worker UI, frames in a sequence are ordered in [UTF-8](https://en.wikipedia.org/wiki/UTF-8) binary order. To learn more about this ordering, see [Provide Video Frames](sms-point-cloud-video-input-data.md#sms-video-provide-frames).  | 
|  `frame-no`  |  Yes  |  Integer  |  The frame order number. This determines the order of a frame in the sequence.   | 
|  `unix-timestamp`  |  No  |  Integer  |  The unix timestamp of a frame. The number of seconds since January 1st, 1970 until the UTC time when the frame was captured.   | 
|  `frame`  |  Yes  |  String  |  The name of a video frame image file.   | 
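A sequence file with these parameters can be generated programmatically. A minimal sketch with hypothetical frame names; deriving `number-of-frames` from the `frames` list guarantees the two always match:

```python
import json

# Hypothetical frame names and S3 prefix.
frame_names = ["frame0001.jpg", "frame0002.jpg", "frame0003.jpg"]

sequence = {
    "seq-no": 1,
    "prefix": "s3://amzn-s3-demo-bucket/prefix/video1/",
    # number-of-frames must equal the length of the frames list.
    "number-of-frames": len(frame_names),
    "frames": [
        {"frame-no": i + 1, "unix-timestamp": 1566861644, "frame": name}
        for i, name in enumerate(frame_names)
    ],
}

sequence_json = json.dumps(sequence)
```

You would then upload the resulting JSON to the sequence file location referenced by your input manifest.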

# Labeling job output data
<a name="sms-data-output"></a>

The output from a labeling job is placed in the Amazon S3 location that you specified in the console or in the call to the [CreateLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) operation. Output data appears in this location when the workers have submitted one or more tasks, or when tasks expire. Note that it may take a few minutes for output data to appear in Amazon S3 after the worker submits the task or the task expires.

Each line in the output data file is identical to the manifest file with the addition of an attribute and value for the label assigned to the input object. The attribute name for the value is defined in the console or in the call to the `CreateLabelingJob` operation. You can't use `-metadata` in the label attribute name. If you are running an image semantic segmentation, 3D point cloud semantic segmentation, or 3D point cloud object tracking job, the label attribute must end with `-ref`. For any other type of job, the attribute name can't end with `-ref`.

The output of the labeling job is the value of the key-value pair with the label. The label and its value overwrite any existing JSON data with that key in the input file. 

For example, the following is the output from an image classification labeling job where the input data files were stored in the Amazon S3 bucket `amzn-s3-demo-bucket` and the label attribute name was defined as *`sport`*. In this example, the JSON object is formatted for readability; in the actual output file, the JSON object is on a single line. For more information about the data format, see [JSON Lines](http://jsonlines.org/). 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/image_example.png",
    "sport":0,
    "sport-metadata":
    {
        "class-name": "football",
        "confidence": 0.00,
        "type":"groundtruth/image-classification",
        "job-name": "identify-sport",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256"
    }
}
```

The value of the label can be any valid JSON. In this case the label's value is the index of the class in the classification list. Other job types, such as bounding box, have more complex values.

Any key-value pair in the input manifest file other than the label attribute is unchanged in the output file. You can use this to pass data to your application.
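Because the output file is in JSON Lines format, each record can be parsed independently. The following sketch extracts the label value and class name from a single output line, using the `sport` label attribute from the example above:

```python
import json

# One line from an output manifest, matching the example above.
line = json.dumps({
    "source-ref": "s3://amzn-s3-demo-bucket/image_example.png",
    "sport": 0,
    "sport-metadata": {
        "class-name": "football",
        "confidence": 0.00,
        "type": "groundtruth/image-classification",
        "job-name": "identify-sport",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256",
    },
})

record = json.loads(line)
label_attribute = "sport"  # the label attribute name chosen at job creation
label_index = record[label_attribute]                             # class index
class_name = record[label_attribute + "-metadata"]["class-name"]  # class name
```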

The output from a labeling job can be used as the input to another labeling job. You can use this when you are chaining together labeling jobs. For example, you can send one labeling job to determine the sport that is being played. Then you send another using the same data to determine if the sport is being played indoors or outdoors. By using the output data from the first job as the manifest for the second job, you can consolidate the results of the two jobs into one output file for easier processing by your applications. 

The output data file is written to the output location periodically while the job is in progress. These intermediate files contain one line for each line in the manifest file. If an object is labeled, the label is included. If the object hasn't been labeled, it is written to the intermediate output file identically to the manifest file.

## Output directories
<a name="sms-output-directories"></a>

Ground Truth creates several directories in your Amazon S3 output path. These directories contain the results of your labeling job and other artifacts of the job. The top-level directory for a labeling job is given the same name as your labeling job; the output directories are placed beneath it. For example, if you named your labeling job **find-people**, your output would be in the following directories:

```
s3://amzn-s3-demo-bucket/find-people/activelearning
s3://amzn-s3-demo-bucket/find-people/annotations
s3://amzn-s3-demo-bucket/find-people/inference
s3://amzn-s3-demo-bucket/find-people/manifests
s3://amzn-s3-demo-bucket/find-people/training
```

Each directory contains the following output:

### Active learning directory
<a name="sms-output-activelearning"></a>

The `activelearning` directory is only present when you are using automated data labeling. It contains the input and output validation set for automated data labeling, and the input and output folder for automatically labeled data.

### Annotations directory
<a name="sms-directories-annotations"></a>

The `annotations` directory contains all of the annotations made by the workforce. These are the responses from individual workers that have not been consolidated into a single label for the data object. 

There are three subdirectories in the `annotations` directory. 
+ The first, `worker-response`, contains the responses from individual workers. This contains a subdirectory for each iteration, which in turn contains a subdirectory for each data object in that iteration. The worker response data for each data object is stored in a timestamped JSON file that contains the answers submitted by each worker for that data object, and if you use a private workforce, metadata about those workers. To learn more about this metadata, see [Worker metadata](#sms-worker-id-private).
+ The second, `consolidated-annotation`, contains information required to consolidate the annotations in the current batch into labels for your data objects.
+ The third, `intermediate`, contains the output manifest for the current batch with any completed labels. This file is updated as the label for each data object is completed.

**Note**  
We recommend that you do not use files that are not mentioned in the documentation.

### Inference directory
<a name="sms-directories-inference"></a>

The `inference` directory is only present when you are using automated data labeling. This directory contains the input and output files for the SageMaker AI batch transform used while labeling data objects.

### Manifest directory
<a name="sms-directories-manifest"></a>

The `manifests` directory contains the output manifest from your labeling job. There is one subdirectory in the manifests directory, `output`. The `output` directory contains the output manifest file for your labeling job. The file is named `output.manifest`.

### Training directory
<a name="sms-directories-training"></a>

The `training` directory is only present when you are using automated data labeling. This directory contains the input and output files used to train the automated data labeling model.

## Confidence score
<a name="sms-output-confidence"></a>

When you have more than one worker annotate a single task, your label is the result of annotation consolidation. Ground Truth calculates a confidence score for each label. A *confidence score* is a number between 0 and 1 that indicates how confident Ground Truth is in the label. You can use the confidence score to compare labeled data objects to each other, and to identify the least or most confident labels.

You should not interpret the value of a confidence score as an absolute value, or compare confidence scores across labeling jobs. For example, if all of the confidence scores are between 0.98 and 0.998, you should only compare the data objects with each other and not rely on the high confidence scores. 

You should not compare the confidence scores of human-labeled data objects and auto-labeled data objects. The confidence scores for humans are calculated using the annotation consolidation function for the task, while the confidence scores for automated labeling are calculated using a model that incorporates object features. The two models generally have different scales and average confidence.

For a bounding box labeling job, Ground Truth calculates a confidence score per box. You can compare confidence scores within one image or across images for the same labeling type (human or auto). You can't compare confidence scores across labeling jobs.

If a single worker annotates a task (`NumberOfHumanWorkersPerDataObject` is set to `1`, or, in the console, you enter **1** for **Number of workers per dataset object**), the confidence score is set to `0.00`. 
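To identify the least confident labels within a single labeling job, you can sort output manifest records by their confidence scores. A minimal sketch (the helper name is illustrative):

```python
import json

def least_confident(manifest_lines, label_attribute, n=5):
    """Return the n (confidence, source-ref) pairs with the lowest
    confidence scores from output manifest lines (JSON Lines)."""
    scored = []
    for line in manifest_lines:
        record = json.loads(line)
        meta = record[label_attribute + "-metadata"]
        scored.append((meta["confidence"], record["source-ref"]))
    scored.sort()  # lowest confidence first
    return scored[:n]
```

Consistent with the guidance above, this is only meaningful within one labeling job and one labeling type (human or auto).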

## Worker metadata
<a name="sms-worker-id-private"></a>

Ground Truth provides information that you can use to track individual workers in task output data. The following data is located in the directories under the `worker-response` prefix in the [Annotations directory](#sms-directories-annotations):
+ The `acceptanceTime` is the time that the worker accepted the task. The format of this date and time stamp is `YYYY-MM-DDTHH:MM:SS.mmmZ` for the year (`YYYY`), month (`MM`), day (`DD`), hour (`HH`), minute (`MM`), second (`SS`) and millisecond (`mmm`). The date and time are separated by a **T**. 
+ The `submissionTime` is the time that the worker submitted their annotations using the **Submit** button. The format of this date and time stamp is `YYYY-MM-DDTHH:MM:SS.mmmZ` for the year (`YYYY`), month (`MM`), day (`DD`), hour (`HH`), minute (`MM`), second (`SS`) and millisecond (`mmm`). The date and time are separated by a **T**. 
+ `timeSpentInSeconds` reports the total time, in seconds, that a worker actively worked on that task. This metric does not include time when a worker paused or took a break.
+ The `workerId` is unique to each worker. 
+ If you use a [private workforce](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-private.html), in `workerMetadata`, you see the following.
  + The `identityProviderType` is the service used to manage the private workforce. 
  + The `issuer` is the Cognito user pool or OIDC Identity Provider (IdP) issuer associated with the work team assigned to this human review task.
  + A unique `sub` identifier refers to the worker. If you create a workforce using Amazon Cognito, you can use this ID to retrieve details about the worker (such as the name or user name) from Amazon Cognito. To learn how, see [Managing and Searching for User Accounts](https://docs.aws.amazon.com/cognito/latest/developerguide/how-to-manage-user-accounts.html#manage-user-accounts-searching-user-attributes) in the [Amazon Cognito Developer Guide](https://docs.aws.amazon.com/cognito/latest/developerguide/).

The following is an example of the output you may see if you use Amazon Cognito to create a private workforce. The workforce type is identified by the `identityProviderType` value.

```
"submissionTime": "2020-12-28T18:59:58.321Z",
"acceptanceTime": "2020-12-28T18:59:15.191Z", 
"timeSpentInSeconds": 40.543,
"workerId": "a12b3cdefg4h5i67",
"workerMetadata": {
    "identityData": {
        "identityProviderType": "Cognito",
        "issuer": "https://cognito-idp.aws-region.amazonaws.com/aws-region_123456789",
        "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
    }
}
```

 The following is an example of the `workerMetadata` you may see if you use your own OIDC IdP to create a private workforce:

```
"workerMetadata": {
        "identityData": {
            "identityProviderType": "Oidc",
            "issuer": "https://example-oidc-ipd.com/adfs",
            "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
        }
}
```

To learn more about using private workforces, see [Private workforce](sms-workforce-private.md).
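As an illustration of working with these timestamps, the following sketch computes the wall-clock time between `acceptanceTime` and `submissionTime` for a single worker response. Note that this elapsed time can exceed `timeSpentInSeconds`, which excludes pauses and breaks.

```python
from datetime import datetime

def wall_clock_seconds(worker_response):
    """Elapsed wall-clock seconds between task acceptance and submission.

    Can exceed timeSpentInSeconds, which excludes pauses and breaks.
    """
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    # Replace the trailing "Z" so strptime's %z directive can parse the offset
    accepted = datetime.strptime(
        worker_response["acceptanceTime"].replace("Z", "+0000"), fmt)
    submitted = datetime.strptime(
        worker_response["submissionTime"].replace("Z", "+0000"), fmt)
    return (submitted - accepted).total_seconds()

response = {
    "acceptanceTime": "2020-12-28T18:59:15.191Z",
    "submissionTime": "2020-12-28T18:59:58.321Z",
    "timeSpentInSeconds": 40.543,
}
print(wall_clock_seconds(response))  # 43.13
```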

## Output metadata
<a name="sms-output-metadata"></a>

The output from each job contains metadata about the label assigned to data objects. These elements are the same for all jobs with minor variations. The following example shows the metadata elements:

```
    "confidence": 0.00,
    "type": "groundtruth/image-classification",
    "job-name": "identify-animal-species",
    "human-annotated": "yes",
    "creation-date": "2020-10-18T22:18:13.527256"
```

The elements have the following meaning:
+ `confidence` – The confidence that Ground Truth has that the label is correct. For more information, see [Confidence score](#sms-output-confidence).
+ `type` – The type of classification job. For job types, see [Built-in Task Types](sms-task-types.md). 
+ `job-name` – The name assigned to the job when it was created.
+ `human-annotated` – Whether the data object was labeled by a human or by automated data labeling. For more information, see [Automate data labeling](sms-automated-labeling.md).
+ `creation-date` – The date and time that the label was created.

## Classification job output
<a name="sms-output-class"></a>

The following are sample outputs (output manifest files) from an image classification job and a text classification job. They include the label that Ground Truth assigned to the data object, the value for the label, and metadata that describes the label.

In addition to the standard metadata elements, the metadata for a classification job includes the text value of the label's class. For more information, see [Image Classification - MXNet](image-classification.md).

The values shown in the examples below depend on your labeling job specifications and output data. 

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/example_image.jpg",
    "species":"0",
    "species-metadata":
    {
        "class-name": "dog",
        "confidence": 0.00,
        "type": "groundtruth/image-classification",
        "job-name": "identify-animal-species",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256"
    }
}
```

```
{
    "source":"The food was delicious",
    "mood":"1",
    "mood-metadata":
    {
        "class-name": "positive",
        "confidence": 0.8,
        "type": "groundtruth/text-classification",
        "job-name": "label-sentiment",
        "human-annotated": "yes",
        "creation-date": "2020-10-18T22:18:13.527256"
    }
}
```
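To consume these records programmatically, you can parse each line of the output manifest and read the class name and confidence from the metadata. The following sketch uses the `mood` attribute from the text classification example above; substitute your own label attribute name.

```python
import json

def read_classification_manifest(lines, label_attribute):
    """Yield (source, class name, confidence) from an output manifest."""
    for line in lines:
        record = json.loads(line)
        meta = record[f"{label_attribute}-metadata"]
        # text jobs use "source"; image jobs use "source-ref"
        source = record.get("source", record.get("source-ref"))
        yield source, meta["class-name"], meta["confidence"]

manifest = [
    '{"source": "The food was delicious", "mood": "1", '
    '"mood-metadata": {"class-name": "positive", "confidence": 0.8}}'
]
rows = list(read_classification_manifest(manifest, "mood"))
print(rows[0])  # ('The food was delicious', 'positive', 0.8)
```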

## Multi-label classification job output
<a name="sms-output-multi-label-classification"></a>

The following are example output manifest files from a multi-label image classification job and a multi-label text classification job. They include the labels that Ground Truth assigned to the data object (for example, the image or piece of text) and metadata that describes the labels the worker saw when completing the labeling task. 

The label attribute name parameter (for example, `image-label-attribute-name`) contains an array of all of the labels selected by at least one of the workers who completed this task. This array contains integer keys (for example, `[1,0,8]`) that correspond to the labels found in `class-map`. In the multi-label image classification example, `bicycle`, `person`, and `clothing` were selected by at least one of the workers who completed the labeling task for the image, `exampleimage.jpg`.

The `confidence-map` shows the confidence score that Ground Truth assigned to each label selected by a worker. To learn more about Ground Truth confidence scores, see [Confidence score](#sms-output-confidence).

The values shown in the examples below depend on your labeling job specifications and output data. 

The following is an example of a multi-label image classification output manifest file. 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example_image.jpg",
    "image-label-attribute-name":[1,0,8],
    "image-label-attribute-name-metadata":
       {
        "job-name":"labeling-job/image-label-attribute-name",
        "class-map":
            {
                "1":"bicycle","0":"person","8":"clothing"
            },
        "human-annotated":"yes",
        "creation-date":"2020-02-27T21:36:25.000201",
        "confidence-map":
            {
                "1":0.95,"0":0.77,"8":0.2
            },
        "type":"groundtruth/image-classification-multilabel"
        }
}
```

The following is an example of a multi-label text classification output manifest file. In this example, `approving`, `sad` and `critical` were selected by at least one of the workers who completed the labeling task for the object `exampletext.txt` found in `amzn-s3-demo-bucket`.

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/exampletext.txt",
    "text-label-attribute-name":[1,0,4],
    "text-label-attribute-name-metadata":
       {
        "job-name":"labeling-job/text-label-attribute-name",
        "class-map":
            {
                "1":"approving","0":"sad","4":"critical"
            },
        "human-annotated":"yes",
        "creation-date":"2020-02-20T21:36:25.000201",
        "confidence-map":
            {
                "1":0.95,"0":0.77,"4":0.2
            },
        "type":"groundtruth/text-classification-multilabel"
        }
}
```
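The following sketch shows how the integer keys in the label attribute array resolve to class names and confidence scores through `class-map` and `confidence-map`. The record mirrors the multi-label image classification example above.

```python
def labels_with_confidence(record, attribute):
    """Map the integer label keys to (class name, confidence) pairs."""
    meta = record[f"{attribute}-metadata"]
    return [(meta["class-map"][str(k)], meta["confidence-map"][str(k)])
            for k in record[attribute]]

record = {
    "source-ref": "s3://amzn-s3-demo-bucket/example_image.jpg",
    "image-label-attribute-name": [1, 0, 8],
    "image-label-attribute-name-metadata": {
        "class-map": {"1": "bicycle", "0": "person", "8": "clothing"},
        "confidence-map": {"1": 0.95, "0": 0.77, "8": 0.2},
    },
}
print(labels_with_confidence(record, "image-label-attribute-name"))
# [('bicycle', 0.95), ('person', 0.77), ('clothing', 0.2)]
```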

## Bounding box job output
<a name="sms-output-box"></a>

The following is sample output (output manifest file) from a bounding box job. For this task, three bounding boxes are returned. The label value contains information about the size of the image, and the location of the bounding boxes.

The `class_id` element is the index of the box's class in the list of available classes for the task. The `class-map` metadata element contains the text of the class.

The metadata has a separate confidence score for each bounding box. The metadata also includes the `class-map` element that maps the `class_id` to the text value of the class. For more information, see [Object Detection - MXNet](object-detection.md).

The values shown in the examples below depend on your labeling job specifications and output data. 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example_image.png",
    "bounding-box-attribute-name":
    {
        "image_size": [{ "width": 500, "height": 400, "depth":3}],
        "annotations":
        [
            {"class_id": 0, "left": 111, "top": 134,
                    "width": 61, "height": 128},
            {"class_id": 5, "left": 161, "top": 250,
                     "width": 30, "height": 30},
            {"class_id": 5, "left": 20, "top": 20,
                     "width": 30, "height": 30}
        ]
    },
    "bounding-box-attribute-name-metadata":
    {
        "objects":
        [
            {"confidence": 0.8},
            {"confidence": 0.9},
            {"confidence": 0.9}
        ],
        "class-map":
        {
            "0": "dog",
            "5": "bone"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256",
        "job-name": "identify-dogs-and-toys"
    }
 }
```
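Because the `annotations` list and the `objects` confidence list are parallel, you can pair them positionally. The following sketch also converts each box from Ground Truth's `(left, top, width, height)` format to corner coordinates, which many training frameworks expect. The two dictionaries stand in for the `bounding-box-attribute-name` and `bounding-box-attribute-name-metadata` values in the example above.

```python
def boxes_with_labels(label_data, metadata):
    """Pair each bounding box with its class name and confidence,
    converting (left, top, width, height) to corner coordinates."""
    results = []
    for ann, obj in zip(label_data["annotations"], metadata["objects"]):
        results.append({
            "class": metadata["class-map"][str(ann["class_id"])],
            "confidence": obj["confidence"],
            "xmin": ann["left"],
            "ymin": ann["top"],
            "xmax": ann["left"] + ann["width"],
            "ymax": ann["top"] + ann["height"],
        })
    return results

label_data = {
    "image_size": [{"width": 500, "height": 400, "depth": 3}],
    "annotations": [
        {"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128},
        {"class_id": 5, "left": 161, "top": 250, "width": 30, "height": 30},
    ],
}
metadata = {
    "objects": [{"confidence": 0.8}, {"confidence": 0.9}],
    "class-map": {"0": "dog", "5": "bone"},
}
boxes = boxes_with_labels(label_data, metadata)
print(boxes[0])
# {'class': 'dog', 'confidence': 0.8, 'xmin': 111, 'ymin': 134, 'xmax': 172, 'ymax': 262}
```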

The output of a bounding box adjustment job looks like the following JSON. Note that the original JSON is kept intact, and two new attributes are added whose names begin with "adjusted-". 

```
{
    "source-ref": "S3 bucket location",
    "bounding-box-attribute-name":
    {
        "image_size": [{ "width": 500, "height": 400, "depth":3}],
        "annotations":
        [
            {"class_id": 0, "left": 111, "top": 134,
                    "width": 61, "height": 128},
            {"class_id": 5, "left": 161, "top": 250,
                     "width": 30, "height": 30},
            {"class_id": 5, "left": 20, "top": 20,
                     "width": 30, "height": 30}
        ]
    },
    "bounding-box-attribute-name-metadata":
    {
        "objects":
        [
            {"confidence": 0.8},
            {"confidence": 0.9},
            {"confidence": 0.9}
        ],
        "class-map":
        {
            "0": "dog",
            "5": "bone"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256",
        "job-name": "identify-dogs-and-toys"
    },
    "adjusted-bounding-box":
    {
        "image_size": [{ "width": 500, "height": 400, "depth":3}],
        "annotations":
        [
            {"class_id": 0, "left": 110, "top": 135,
                    "width": 61, "height": 128},
            {"class_id": 5, "left": 161, "top": 250,
                     "width": 30, "height": 30},
            {"class_id": 5, "left": 10, "top": 10,
                     "width": 30, "height": 30}
        ]
    },
    "adjusted-bounding-box-metadata":
    {
        "objects":
        [
            {"confidence": 0.8},
            {"confidence": 0.9},
            {"confidence": 0.9}
        ],
        "class-map":
        {
            "0": "dog",
            "5": "bone"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2018-11-20T22:18:13.527256",
        "job-name": "adjust-bounding-boxes-on-dogs-and-toys",
        "adjustment-status": "adjusted"
    }
}
```

In this output, the job's `type` doesn't change, but an `adjustment-status` field is added. This field has the value of `adjusted` or `unadjusted`. If multiple workers have reviewed the object and at least one adjusted the label, the status is `adjusted`.

## Named entity recognition
<a name="sms-output-data-ner"></a>

The following is an example output manifest file from a named entity recognition (NER) labeling task. For this task, seven `entities` are returned.

In the output manifest, the JSON object, `annotations`, includes a list of the `labels` (label categories) that you provided.

Worker responses are in a list named `entities`. Each entity in this list is a JSON object that contains a `label` value that matches one in the `labels` list, an integer `startOffset` value for the labeled span's starting Unicode offset, and an integer `endOffset` value for the ending Unicode offset.

The metadata has a separate confidence score for each entity. If a single worker labeled each data object, the confidence value for each entity will be zero.

The values shown in the examples below depend on your labeling job inputs and worker responses.

```
{
    "source": "Amazon SageMaker is a cloud machine-learning platform that was launched in November 2017. SageMaker enables developers to create, train, and deploy machine-learning (ML) models in the cloud. SageMaker also enables developers to deploy ML models on embedded systems and edge-devices",
    "ner-labeling-job-attribute-name": {
        "annotations": {
            "labels": [
                {
                    "label": "Date",
                    "shortDisplayName": "dt"
                },
                {
                    "label": "Verb",
                    "shortDisplayName": "vb"
                },
                {
                    "label": "Thing",
                    "shortDisplayName": "tng"
                },
                {
                    "label": "People",
                    "shortDisplayName": "ppl"
                }
            ],
            "entities": [
                {
                    "label": "Thing",
                    "startOffset": 22,
                    "endOffset": 53
                },
                {
                    "label": "Thing",
                    "startOffset": 269,
                    "endOffset": 281
                },
                {
                    "label": "Verb",
                    "startOffset": 63,
                    "endOffset": 71
                },
                {
                    "label": "Verb",
                    "startOffset": 228,
                    "endOffset": 234
                },
                {
                    "label": "Date",
                    "startOffset": 75,
                    "endOffset": 88
                },
                {
                    "label": "People",
                    "startOffset": 108,
                    "endOffset": 118
                },
                {
                    "label": "People",
                    "startOffset": 214,
                    "endOffset": 224
                }
            ]
        }
    },
    "ner-labeling-job-attribute-name-metadata": {
        "job-name": "labeling-job/example-ner-labeling-job",
        "type": "groundtruth/text-span",
        "creation-date": "2020-10-29T00:40:39.398470",
        "human-annotated": "yes",
        "entities": [
            {
                "confidence": 0
            },
            {
                "confidence": 0
            },
            {
                "confidence": 0
            },
            {
                "confidence": 0
            },
            {
                "confidence": 0
            },
            {
                "confidence": 0
            },
            {
                "confidence": 0
            }
        ]
    }
}
```
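Because `startOffset` and `endOffset` are Unicode character offsets into `source`, you can slice the labeled spans directly out of the text. The sketch below uses a shortened version of the example above, with a hypothetical attribute name `ner-attr`.

```python
def extract_entities(record, attribute):
    """Slice labeled spans out of the source text using Unicode offsets."""
    text = record["source"]
    entities = record[attribute]["annotations"]["entities"]
    return [(e["label"], text[e["startOffset"]:e["endOffset"]])
            for e in entities]

record = {
    "source": "Amazon SageMaker is a cloud machine-learning platform that was launched in November 2017.",
    "ner-attr": {
        "annotations": {
            "entities": [
                {"label": "Thing", "startOffset": 22, "endOffset": 53},
                {"label": "Date", "startOffset": 75, "endOffset": 88},
            ]
        }
    },
}
print(extract_entities(record, "ner-attr"))
# [('Thing', 'cloud machine-learning platform'), ('Date', 'November 2017')]
```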

## Label verification job output
<a name="sms-output-bounding-box-verification"></a>

The output (output manifest file) of a bounding box verification job looks different from the output of a bounding box annotation job. That's because the workers have a different type of task. They're not labeling objects, but evaluating the accuracy of prior labeling, making a judgment, and then providing that judgment and perhaps some comments.

If human workers are verifying or adjusting prior bounding box labels, the output of a verification job looks like the following JSON. The values shown in the examples below depend on your labeling job specifications and output data. 

```
{
    "source-ref":"s3://amzn-s3-demo-bucket/image_example.png",
    "bounding-box-attribute-name":
    {
        "image_size": [{ "width": 500, "height": 400, "depth":3}],
        "annotations":
        [
            {"class_id": 0, "left": 111, "top": 134,
                    "width": 61, "height": 128},
            {"class_id": 5, "left": 161, "top": 250,
                     "width": 30, "height": 30},
            {"class_id": 5, "left": 20, "top": 20,
                     "width": 30, "height": 30}
        ]
    },
    "bounding-box-attribute-name-metadata":
    {
        "objects":
        [
            {"confidence": 0.8},
            {"confidence": 0.9},
            {"confidence": 0.9}
        ],
        "class-map":
        {
            "0": "dog",
            "5": "bone"
        },
        "type": "groundtruth/object-detection",
        "human-annotated": "yes",
        "creation-date": "2018-10-18T22:18:13.527256",
        "job-name": "identify-dogs-and-toys"
    },
    "verify-bounding-box-attribute-name":"1",
    "verify-bounding-box-attribute-name-metadata":
    {
        "class-name": "bad",
        "confidence": 0.93,
        "type": "groundtruth/label-verification",
        "job-name": "verify-bounding-boxes",
        "human-annotated": "yes",
        "creation-date": "2018-11-20T22:18:13.527256",
        "worker-feedback": [
            {"comment": "The bounding box on the bird is too wide on the right side."},
            {"comment": "The bird on the upper right is not labeled."}
        ]
    }
}
```

Although the `type` on the original bounding box output was `groundtruth/object-detection`, the new `type` is `groundtruth/label-verification`. Also note that the `worker-feedback` array provides worker comments. If the worker doesn't provide comments, the empty fields are excluded during consolidation.

## Semantic Segmentation Job Output
<a name="sms-output-segmentation"></a>

The following is the output manifest file from a semantic segmentation labeling job. The value of the label for this job is a reference to a PNG file in an Amazon S3 bucket.

In addition to the standard elements, the metadata for the label includes a color map that defines which color is used to label the image, the class name associated with the color, and the confidence score for each color. For more information, see [Semantic Segmentation Algorithm](semantic-segmentation.md).

The values shown in the examples below depend on your labeling job specifications and output data. 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example_city_image.png",
    "city-streets-ref": "S3 bucket location",
    "city-streets-ref-metadata": {
      "internal-color-map": {
        "0": {
           "class-name": "BACKGROUND",
           "confidence": 0.9,
           "hex-color": "#ffffff"
        },
        "1": {
           "class-name": "buildings",
           "confidence": 0.9,
           "hex-color": "#2acf59"
        },
        "2":  {
           "class-name": "road",
           "confidence": 0.9,
           "hex-color": "#f28333"
       }
     },
     "type": "groundtruth/semantic-segmentation",
     "human-annotated": "yes",
     "creation-date": "2018-10-18T22:18:13.527256",
      "job-name": "label-city-streets"
      },
     "verify-city-streets-ref":"1",
     "verify-city-streets-ref-metadata":
     {
        "class-name": "bad",
        "confidence": 0.93,
        "type": "groundtruth/label-verification",
        "job-name": "verify-city-streets",
        "human-annotated": "yes",
        "creation-date": "2018-11-20T22:18:13.527256",
        "worker-feedback": [
            {"comment": "The mask on the leftmost building is assigned the wrong side of the road."},
            {"comment": "The curb of the road is not labeled but the instructions say otherwise."}
        ]
     }
}
```

Confidence is scored on a per-image basis. Confidence scores are the same across all classes within an image. 
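To relate pixel values in the PNG mask back to classes, you can convert each `hex-color` entry in `internal-color-map` to an RGB tuple. The following is a minimal sketch using the color map from the example above.

```python
def color_to_class(internal_color_map):
    """Build a lookup from RGB pixel values to class names."""
    lookup = {}
    for entry in internal_color_map.values():
        hex_color = entry["hex-color"].lstrip("#")
        # split "rrggbb" into three two-character hex pairs
        rgb = tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
        lookup[rgb] = entry["class-name"]
    return lookup

internal_color_map = {
    "0": {"class-name": "BACKGROUND", "confidence": 0.9, "hex-color": "#ffffff"},
    "1": {"class-name": "buildings", "confidence": 0.9, "hex-color": "#2acf59"},
    "2": {"class-name": "road", "confidence": 0.9, "hex-color": "#f28333"},
}
lookup = color_to_class(internal_color_map)
print(lookup[(42, 207, 89)])  # buildings
```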

The output of a semantic segmentation adjustment job looks similar to the following JSON.

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example_city_image.png",
    "city-streets-ref": "S3 bucket location",
    "city-streets-ref-metadata": {
      "internal-color-map": {
        "0": {
           "class-name": "BACKGROUND",
           "confidence": 0.9,
           "hex-color": "#ffffff"
        },
        "1": {
           "class-name": "buildings",
           "confidence": 0.9,
           "hex-color": "#2acf59"
        },
        "2":  {
           "class-name": "road",
           "confidence": 0.9,
           "hex-color": "#f28333"
       }
     },
     "type": "groundtruth/semantic-segmentation",
     "human-annotated": "yes",
     "creation-date": "2018-10-18T22:18:13.527256",
      "job-name": "label-city-streets"
      },
     "adjusted-city-streets-ref": "s3://amzn-s3-demo-bucket/example_city_image.png",
     "adjusted-city-streets-ref-metadata": {
      "internal-color-map": {
        "0": {
           "class-name": "BACKGROUND",
           "confidence": 0.9,
           "hex-color": "#ffffff"
        },
        "1": {
           "class-name": "buildings",
           "confidence": 0.9,
           "hex-color": "#2acf59"
        },
        "2":  {
           "class-name": "road",
           "confidence": 0.9,
           "hex-color": "#f28333"
       }
     },
     "type": "groundtruth/semantic-segmentation",
     "human-annotated": "yes",
     "creation-date": "2018-11-20T22:18:13.527256",
      "job-name": "adjust-label-city-streets"
      }
}
```

## Video frame object detection output
<a name="sms-output-video-object-detection"></a>

The following is the output manifest file from a video frame object detection labeling job. The values shown in the examples below depend on your labeling job specifications and output data.

In addition to the standard elements, the metadata includes a class map that lists each class that has at least one label in the sequence. The metadata also includes `job-name`, which is the name you assigned to the labeling job. For adjustment tasks, if one or more bounding boxes were modified, the metadata for audit workflows contains an `adjustment-status` parameter that is set to `adjusted`. 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example-path/input-manifest.json",
    "CarObjectDetection-ref": "s3://amzn-s3-demo-bucket/output/labeling-job-name/annotations/consolidated-annotation/output/0/SeqLabel.json",
    "CarObjectDetection-ref-metadata": {
        "class-map": {
            "0": "car",
            "1": "bus"
        },
        "job-name": "labeling-job/labeling-job-name",
        "human-annotated": "yes",
        "creation-date": "2021-09-29T05:50:35.566000",
        "type": "groundtruth/video-object-detection"
        }
}
```

Ground Truth creates one output sequence file for each sequence of video frames that was labeled. Each output sequence file contains the following: 
+ All annotations for all frames in a sequence in the `detection-annotations` list of JSON objects. 
+ For each frame that was annotated by a worker, the frame file name (`frame`), number (`frame-no`), a list of JSON objects containing annotations (`annotations`), and if applicable, `frame-attributes`. The name of this list is defined by the task type you use: `polylines`, `polygons`, `keypoints`, and for bounding boxes, `annotations`.

   

  Each JSON object contains information about a single annotation and associated label. The following table outlines the parameters you'll see for each video frame task type.   
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html)

  In addition to task type specific values, you will see the following in each JSON object:
  + Values of any `label-category-attributes` that were specified for that label. 
  + The `class-id` of the box. Use the `class-map` in the output manifest file to see which label category this ID maps to. 

The following is an example of a `SeqLabel.json` file from a bounding box video frame object detection labeling job. This file is located under `s3://amzn-s3-demo-bucket/output-prefix/annotations/consolidated-annotation/output/annotation-number/`.

```
{
    "detection-annotations": [
        {
            "annotations": [
                {
                    "height": 41,
                    "width": 53,
                    "top": 152,
                    "left": 339,
                    "class-id": "1",
                    "label-category-attributes": {
                        "occluded": "no",
                        "size": "medium"
                    }
                },
                {
                    "height": 24,
                    "width": 37,
                    "top": 148,
                    "left": 183,
                    "class-id": "0",
                    "label-category-attributes": {
                        "occluded": "no"
                    }
                }
            ],
            "frame-no": 0,
            "frame": "frame_0000.jpeg", 
            "frame-attributes": {name: value, name: value}
        },
        {
            "annotations": [
                {
                    "height": 41,
                    "width": 53,
                    "top": 152,
                    "left": 341,
                    "class-id": "0",
                    "label-category-attributes": {}
                },
                {
                    "height": 24,
                    "width": 37,
                    "top": 141,
                    "left": 177,
                    "class-id": "0",
                    "label-category-attributes": {
                        "occluded": "no"
                    }
                }
            ],
            "frame-no": 1,
            "frame": "frame_0001.jpeg",
            "frame-attributes": {name: value, name: value}
        }
    ]
}
```
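The following sketch shows one way to summarize a `SeqLabel.json` file, counting labeled boxes per class name for each frame by resolving each `class-id` through the manifest's `class-map`. The annotations are trimmed to the fields the summary needs.

```python
def count_detections(seq_label, class_map):
    """Count labeled boxes per class name, per frame, from a SeqLabel file."""
    counts = {}
    for frame in seq_label["detection-annotations"]:
        per_frame = {}
        for ann in frame["annotations"]:
            name = class_map[ann["class-id"]]
            per_frame[name] = per_frame.get(name, 0) + 1
        counts[frame["frame"]] = per_frame
    return counts

seq_label = {
    "detection-annotations": [
        {"frame-no": 0, "frame": "frame_0000.jpeg",
         "annotations": [{"class-id": "1"}, {"class-id": "0"}]},
        {"frame-no": 1, "frame": "frame_0001.jpeg",
         "annotations": [{"class-id": "0"}, {"class-id": "0"}]},
    ]
}
class_map = {"0": "car", "1": "bus"}
counts = count_detections(seq_label, class_map)
print(counts)
# {'frame_0000.jpeg': {'bus': 1, 'car': 1}, 'frame_0001.jpeg': {'car': 2}}
```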

## Video frame object tracking output
<a name="sms-output-video-object-tracking"></a>

The following is the output manifest file from a video frame object tracking labeling job. The values shown in the examples below depend on your labeling job specifications and output data.

In addition to the standard elements, the metadata includes a class map that lists each class that has at least one label in the sequence of frames. The metadata also includes `job-name`, which is the name you assigned to the labeling job. For adjustment tasks, if one or more bounding boxes were modified, the metadata for audit workflows contains an `adjustment-status` parameter that is set to `adjusted`. 

```
{
    "source-ref": "s3://amzn-s3-demo-bucket/example-path/input-manifest.json",
    "CarObjectTracking-ref": "s3://amzn-s3-demo-bucket/output/labeling-job-name/annotations/consolidated-annotation/output/0/SeqLabel.json",
    "CarObjectTracking-ref-metadata": {
        "class-map": {
            "0": "car",
            "1": "bus"
        },
        "job-name": "labeling-job/labeling-job-name",
        "human-annotated": "yes",
        "creation-date": "2021-09-29T05:50:35.566000",
        "type": "groundtruth/video-object-tracking"
        }
 }
```

Ground Truth creates one output sequence file for each sequence of video frames that was labeled. Each output sequence file contains the following: 
+ All annotations for all frames in a sequence in the `tracking-annotations` list of JSON objects. 
+ For each frame that was annotated by a worker, the frame file name (`frame`), number (`frame-no`), a list of JSON objects containing annotations (`annotations`), and if applicable, frame attributes (`frame-attributes`). The name of this list is defined by the task type you use: `polylines`, `polygons`, `keypoints`, and for bounding boxes, `annotations`.

  Each JSON object contains information about a single annotation and associated label. The following table outlines the parameters you'll see for each video frame task type.   
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html)

  In addition to task type specific values, you will see the following in each JSON object: 
  + Values of any `label-category-attributes` that were specified for that label. 
  + The `class-id` of the box. Use the `class-map` in the output manifest file to see which label category this ID maps to. 
  + An `object-id`, which identifies an instance of a label. This ID will be the same across frames if a worker identified the same instance of an object in multiple frames. For example, if a car appeared in multiple frames, all bounding boxes used to identify that car would have the same `object-id`.
  + The `object-name`, which is the instance ID of that annotation. 

The following is an example of a `SeqLabel.json` file from a bounding box video frame object tracking labeling job. This file is located under `s3://amzn-s3-demo-bucket/output-prefix/annotations/consolidated-annotation/output/annotation-number/`.

```
{
    "tracking-annotations": [
        {
            "annotations": [
                {
                    "height": 36,
                    "width": 46,
                    "top": 178,
                    "left": 315,
                    "class-id": "0",
                    "label-category-attributes": {
                        "occluded": "no"
                    },
                    "object-id": "480dc450-c0ca-11ea-961f-a9b1c5c97972",
                    "object-name": "car:1"
                }
            ],
            "frame-no": 0,
            "frame": "frame_0001.jpeg",
            "frame-attributes": {}
        },
        {
            "annotations": [
                {
                    "height": 30,
                    "width": 47,
                    "top": 163,
                    "left": 344,
                    "class-id": "1",
                    "label-category-attributes": {
                        "occluded": "no",
                        "size": "medium"
                    },
                    "object-id": "98f2b0b0-c0ca-11ea-961f-a9b1c5c97972",
                    "object-name": "bus:1"
                },
                {
                    "height": 28,
                    "width": 33,
                    "top": 150,
                    "left": 192,
                    "class-id": "0",
                    "label-category-attributes": {
                        "occluded": "partially"
                    },
                    "object-id": "480dc450-c0ca-11ea-961f-a9b1c5c97972",
                    "object-name": "car:1"
                }
            ],
            "frame-no": 1,
            "frame": "frame_0002.jpeg",
            "frame-attributes": {name: value, name: value}
        }
    ]
}
```
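Because `object-id` stays constant for the same instance across frames, you can group annotations by `object-id` to reconstruct each object's track. The following is a minimal sketch using the IDs from the example above, with the annotations trimmed to the fields the grouping needs.

```python
def build_tracks(seq_label):
    """Group tracking annotations by object-id to follow each labeled
    instance across frames."""
    tracks = {}
    for frame in seq_label["tracking-annotations"]:
        for ann in frame["annotations"]:
            tracks.setdefault(ann["object-id"], []).append(
                (frame["frame-no"], ann["object-name"]))
    return tracks

seq_label = {
    "tracking-annotations": [
        {"frame-no": 0, "annotations": [
            {"object-id": "480dc450-c0ca-11ea-961f-a9b1c5c97972",
             "object-name": "car:1"}]},
        {"frame-no": 1, "annotations": [
            {"object-id": "98f2b0b0-c0ca-11ea-961f-a9b1c5c97972",
             "object-name": "bus:1"},
            {"object-id": "480dc450-c0ca-11ea-961f-a9b1c5c97972",
             "object-name": "car:1"}]},
    ]
}
tracks = build_tracks(seq_label)
print(tracks["480dc450-c0ca-11ea-961f-a9b1c5c97972"])
# [(0, 'car:1'), (1, 'car:1')]
```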

## 3D point cloud semantic segmentation output
<a name="sms-output-point-cloud-segmentation"></a>

The following is the output manifest file from a 3D point cloud semantic segmentation labeling job. 

In addition to the standard elements, the metadata for the label includes a color map that defines which color is used to label the image, the class name associated with the color, and the confidence score for each color. Additionally, there is an `adjustment-status` parameter in the metadata for audit workflows that is set to `adjusted` if the color mask is modified. If you added one or more `frameAttributes` to your label category configuration file, worker responses for frame attributes are in the JSON object, `dataset-object-attributes`.

The `your-label-attribute-ref` parameter contains the location of a compressed file with a .zlib extension. When you uncompress this file, it contains an array. Each index in the array corresponds to the index of an annotated point in the input point cloud. The value of the array at a given index gives the class of the point at the same index in the point cloud, based on the semantic color map found in the `color-map` parameter of the `metadata`.

You can use Python code similar to the following to decompress a .zlib file:

```
import zlib
from array import array

# read the label file
compressed_binary_file = open('zlib_file_path/file.zlib', 'rb').read()

# uncompress the label file
binary_content = zlib.decompress(compressed_binary_file)

# load labels to an array
my_int_array_data = array('B', binary_content)

print(my_int_array_data)
```

The code block above will produce output similar to the following. Each element of the printed array contains the class of the point at that index in the point cloud. For example, `my_int_array_data[0] = 1` means `point[0]` in the input point cloud has class `1`. In the following output manifest file example, class `0` corresponds with `"Background"`, `1` with `"Car"`, and `2` with `"Pedestrian"`.

```
>> array('B', [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```

The following is an example of a semantic segmentation 3D point cloud labeling job output manifest file. The *red, italicized text* in the examples below depends on labeling job specifications and output data. 

```
{
   "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.bin",
   "source-ref-metadata":{
      "format": "binary/xyzi",
      "unix-timestamp": 1566861644.759115,
      "ego-vehicle-pose":{...}, 
      "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/prefix",
      "images": [{...}] 
    },
    "lidar-ss-label-attribute-ref": "s3://amzn-s3-demo-bucket/labeling-job-name/annotations/consolidated-annotation/output/dataset-object-id/filename.zlib",
    "lidar-ss-label-attribute-ref-metadata": { 
        "color-map": {
            "0": {
                "class-name": "Background",
                "hex-color": "#ffffff",
                "confidence": 0.00
            },
            "1": {
                "class-name": "Car",
                "hex-color": "#2ca02c",
                "confidence": 0.00
            },
            "2": {
                "class-name": "Pedestrian",
                "hex-color": "#1f77b4",
                "confidence": 0.00
            },
            "3": {
                "class-name": "Tree",
                "hex-color": "#ff7f0e",
                "confidence": 0.00
            }
        },
        "type": "groundtruth/point_cloud_single_frame_semantic_segmentation",
        "human-annotated": "yes",
        "creation-date": "2019-11-12T01:18:14.271944",
        "job-name": "labeling-job-name",
        //only present for adjustment audit workflow
        "adjustment-status": "adjusted", // "adjusted" means the label was adjusted
        "dataset-object-attributes": {name: value, name: value}
    }
}
```

## 3D point cloud object detection output
<a name="sms-output-point-cloud-object-detection"></a>

The following is sample output from a 3D point cloud object detection job. For this task type, the data about 3D cuboids is returned in the `3d-bounding-box` parameter, in a list named `annotations`. In this list, each 3D cuboid is described using the following information. 
+ Each class, or label category, that you specify in your input manifest is associated with a `class-id`. Use the `class-map` to identify the class associated with each class ID.
+ These classes are used to give each 3D cuboid an `object-name` in the format `<class>:<integer>` where `integer` is a unique number to identify that cuboid in the frame. 
+ `center-x`, `center-y`, and `center-z` are the coordinates of the center of the cuboid, in the same coordinate system as the 3D point cloud input data used in your labeling job.
+ `length`, `width`, and `height` describe the dimensions of the cuboid. 
+ `yaw` is used to describe the orientation (heading) of the cuboid in radians.
**Note**  
`yaw` is now reported in the right-handed Cartesian system. This change was made on September 02, 2022 19:02:17 UTC. To convert a `yaw` measurement from output data produced before that date, use the following (all units are in radians):  

  ```
  old_yaw_in_output = pi - yaw
  ```
+ In our definition, +x is to the right, +y is forward, and +z is up from the ground plane. The rotation order is x - y - z. `roll`, `pitch`, and `yaw` are represented in the right-handed Cartesian system. In 3D space, `roll` is along the x-axis, `pitch` is along the y-axis, and `yaw` is along the z-axis. All three are counterclockwise.
+ If you included label attributes in your input manifest file for a given class, a `label-category-attributes` parameter is included for all cuboids for which workers selected label attributes. 
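If you work with output data produced before the cutover date in the note above, the conversion can be wrapped in a small helper. This is a sketch; `convert_legacy_yaw` is a hypothetical name, and all values are in radians:

```
import math

def convert_legacy_yaw(old_yaw_in_output):
    # Recover the current right-handed yaw from a legacy output value,
    # inverting old_yaw_in_output = pi - yaw.
    return math.pi - old_yaw_in_output

# A legacy yaw of 0 corresponds to pi in the current convention.
print(convert_legacy_yaw(0))
```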

If one or more cuboids were modified, there is an `adjustment-status` parameter in the metadata for audit workflows that is set to `adjusted`. If you added one or more `frameAttributes` to your label category configuration file, worker responses for frame attributes are in the JSON object, `dataset-object-attributes`.

The *red, italicized text* in the examples below depends on labeling job specifications and output data. The ellipses (*...*) denote a continuation of that list, where additional objects with the same format as the preceding object can appear.

```
{
   "source-ref": "s3://amzn-s3-demo-bucket/examplefolder/frame1.txt",
   "source-ref-metadata":{
      "format": "text/xyzi",
      "unix-timestamp": 1566861644.759115, 
      "prefix": "s3://amzn-s3-demo-bucket/lidar_singleframe_dataset/prefix",
      "ego-vehicle-pose": {
            "heading": {
                "qx": -0.02111296123795955,
                "qy": -0.006495469416730261,
                "qz": -0.008024565904865688,
                "qw": 0.9997181192298087
            },
            "position": {
                "x": -2.7161461413869947,
                "y": 116.25822288149078,
                "z": 1.8348751887989483
            }
       },
       "images": [
            {
                "fx": 847.7962624528487,
                "fy": 850.0340893791985,
                "cx": 576.2129134707038,
                "cy": 317.2423573573745,
                "k1": 0,
                "k2": 0,
                "k3": 0,
                "k4": 0,
                "p1": 0,
                "p2": 0,
                "skew": 0,
                "unix-timestamp": 1566861644.759115,
                "image-path": "images/frame_0_camera_0.jpg", 
                "position": {
                    "x": -2.2722515189268138,
                    "y": 116.86003310568965,
                    "z": 1.454614668542299
                },
                "heading": {
                    "qx": 0.7594754093069037,
                    "qy": 0.02181790885672969,
                    "qz": -0.02461725233103356,
                    "qw": -0.6496916273040025
                },
                "camera_model": "pinhole"
            }
        ]
    },
   "3d-bounding-box": 
    {
       "annotations": [
            {
                "label-category-attributes": {
                    "Occlusion": "Partial",
                    "Type": "Sedan"
                },
                "object-name": "Car:1",
                "class-id": 0,
                "center-x": -2.616382013657516,
                "center-y": 125.04149850484193,
                "center-z": 0.311272296465834,
                "length": 2.993000265181146,
                "width": 1.8355260519692056,
                "height": 1.3233490884304047,
                "roll": 0,
                "pitch": 0,
                "yaw": 1.6479308313703527
            },
            {
                "label-category-attributes": {
                    "Occlusion": "Partial",
                    "Type": "Sedan"
                },
                "object-name": "Car:2",
                "class-id": 0,
                "center-x": -5.188984560617168,
                "center-y": 99.7954483288783,
                "center-z": 0.2226435567445657,
                "length": 4,
                "width": 2,
                "height": 2,
                "roll": 0,
                "pitch": 0,
                "yaw": 1.6243170732068055
            }
        ]
    },
    "3d-bounding-box-metadata":
    {
        "objects": [], 
        "class_map": 
        {
            "0": "Car",
        },
        "type": "groundtruth/point_cloud_object_detection",
        "human-annotated": "yes", 
        "creation-date": "2018-10-18T22:18:13.527256",
        "job-name": "identify-3d-objects",
        "adjustment-status": "adjusted",
        "dataset-object-attributes": {name: value, name: value}
    }
}
```
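As a quick way to consume this output, you can parse each manifest line as JSON and resolve class names through `class_map`. The following sketch uses a trimmed version of the example above with illustrative values:

```
import json

# One trimmed output manifest line (illustrative values from the example).
manifest_line = '''{
  "3d-bounding-box": {"annotations": [
      {"object-name": "Car:1", "class-id": 0,
       "center-x": -2.616, "center-y": 125.041, "center-z": 0.311}]},
  "3d-bounding-box-metadata": {"class_map": {"0": "Car"}}}'''

record = json.loads(manifest_line)
class_map = record["3d-bounding-box-metadata"]["class_map"]

# Resolve each cuboid's class name and collect its center coordinates.
cuboids = []
for box in record["3d-bounding-box"]["annotations"]:
    cuboids.append({
        "class": class_map[str(box["class-id"])],
        "object-name": box["object-name"],
        "center": (box["center-x"], box["center-y"], box["center-z"]),
    })
print(cuboids)
```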

## 3D point cloud object tracking output
<a name="sms-output-point-cloud-object-tracking"></a>

The following is an example of an output manifest file from a 3D point cloud object tracking labeling job. The *red, italicized text* in the examples below depends on labeling job specifications and output data. The ellipses (*...*) denote a continuation of that list, where additional objects with the same format as the preceding object can appear.

In addition to the standard elements, the metadata includes a class map that lists each class that has at least one label in the sequence. If one or more cuboids were modified, there is an `adjustment-status` parameter in the metadata for audit workflows that is set to `adjusted`. 

```
{
   "source-ref": "s3://amzn-s3-demo-bucket/myfolder/seq1.json",
    "lidar-label-attribute-ref": "s3://amzn-s3-demo-bucket/<labelingJobName>/annotations/consolidated-annotation/output/<datasetObjectId>/SeqLabel.json",
    "lidar-label-attribute-ref-metadata": { 
        "objects": 
        [
            {   
                "frame-no": 300,
                "confidence": []
            },
            {
                "frame-no": 301,
                "confidence": []
            },
            ...
        ],    
        "class-map": {"0": "Car", "1": "Person"},
        "type": "groundtruth/point_cloud_object_tracking",
        "human-annotated": "yes",
        "creation-date": "2019-11-12T01:18:14.271944",
        "job-name": "identify-3d-objects",
        "adjustment-status": "adjusted" 
    }
}
```

In the above example, the cuboid data for each frame in `seq1.json` is in `SeqLabel.json` in the Amazon S3 location, `s3://amzn-s3-demo-bucket/<labelingJobName>/annotations/consolidated-annotation/output/<datasetObjectId>/SeqLabel.json`. The following is an example of this label sequence file.

For each frame in the sequence, you see the `frame-number`, the `frame-name`, `frame-attributes` (if applicable), and a list of `annotations`. This list contains the 3D cuboids that were drawn for that frame. Each annotation includes the following information: 
+ An `object-name` in the format `<class>:<integer>` where `class` identifies the label category and `integer` is a unique ID across the dataset.
+ When workers draw a cuboid, it is associated with a unique `object-id` which is associated with all cuboids that identify the same object across multiple frames.
+ Each class, or label category, that you specified in your input manifest is associated with a `class-id`. Use the `class-map` to identify the class associated with each class ID.
+ `center-x`, `center-y`, and `center-z` are the coordinates of the center of the cuboid, in the same coordinate system as the 3D point cloud input data used in your labeling job.
+ `length`, `width`, and `height` describe the dimensions of the cuboid. 
+ `yaw` is used to describe the orientation (heading) of the cuboid in radians.
**Note**  
`yaw` is now reported in the right-handed Cartesian system. This change was made on September 02, 2022 19:02:17 UTC. To convert a `yaw` measurement from output data produced before that date, use the following (all units are in radians):  

  ```
  old_yaw_in_output = pi - yaw
  ```
+ In our definition, +x is to the right, +y is forward, and +z is up from the ground plane. The rotation order is x - y - z. `roll`, `pitch`, and `yaw` are represented in the right-handed Cartesian system. In 3D space, `roll` is along the x-axis, `pitch` is along the y-axis, and `yaw` is along the z-axis. All three are counterclockwise.
+ If you included label attributes in your input manifest file for a given class, a `label-category-attributes` parameter is included for all cuboids for which workers selected label attributes. 

```
{
    "tracking-annotations": [
        {
            "frame-number": 0,
            "frame-name": "0.txt.pcd",
            "frame-attributes": {name: value, name: value},
            "annotations": [
                {
                    "label-category-attributes": {},
                    "object-name": "Car:4",
                    "class-id": 0,
                    "center-x": -2.2906369208300674,
                    "center-y": 103.73924823843463,
                    "center-z": 0.37634114027023313,
                    "length": 4,
                    "width": 2,
                    "height": 2,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": 1.5827222214406014,
                    "object-id": "ae5dc770-a782-11ea-b57d-67c51a0561a1"
                },
                {
                    "label-category-attributes": {
                        "Occlusion": "Partial",
                        "Type": "Sedan"
                    },
                    "object-name": "Car:1",
                    "class-id": 0,
                    "center-x": -2.6451293634707413,
                    "center-y": 124.9534455706848,
                    "center-z": 0.5020834081743839,
                    "length": 4,
                    "width": 2,
                    "height": 2.080488827301309,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": -1.5963335581398077,
                    "object-id": "06efb020-a782-11ea-b57d-67c51a0561a1"
                },
                {
                    "label-category-attributes": {
                        "Occlusion": "Partial",
                        "Type": "Sedan"
                    },
                    "object-name": "Car:2",
                    "class-id": 0,
                    "center-x": -5.205611313118477,
                    "center-y": 99.91731932137061,
                    "center-z": 0.22917217081212138,
                    "length": 3.8747142207671956,
                    "width": 1.9999999999999918,
                    "height": 2,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": 1.5672228760316775,
                    "object-id": "26fad020-a782-11ea-b57d-67c51a0561a1"
                }
            ]
        },
        {
            "frame-number": 1,
            "frame-name": "1.txt.pcd",
            "frame-attributes": {},
            "annotations": [
                {
                    "label-category-attributes": {},
                    "object-name": "Car:4",
                    "class-id": 0,
                    "center-x": -2.2906369208300674,
                    "center-y": 103.73924823843463,
                    "center-z": 0.37634114027023313,
                    "length": 4,
                    "width": 2,
                    "height": 2,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": 1.5827222214406014,
                    "object-id": "ae5dc770-a782-11ea-b57d-67c51a0561a1"
                },
                {
                    "label-category-attributes": {
                        "Occlusion": "Partial",
                        "Type": "Sedan"
                    },
                    "object-name": "Car:1",
                    "class-id": 0,
                    "center-x": -2.6451293634707413,
                    "center-y": 124.9534455706848,
                    "center-z": 0.5020834081743839,
                    "length": 4,
                    "width": 2,
                    "height": 2.080488827301309,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": -1.5963335581398077,
                    "object-id": "06efb020-a782-11ea-b57d-67c51a0561a1"
                },
                {
                    "label-category-attributes": {
                        "Occlusion": "Partial",
                        "Type": "Sedan"
                    },
                    "object-name": "Car:2",
                    "class-id": 0,
                    "center-x": -5.221311072916759,
                    "center-y": 100.4639841045424,
                    "center-z": 0.22917217081212138,
                    "length": 3.8747142207671956,
                    "width": 1.9999999999999918,
                    "height": 2,
                    "roll": 0,
                    "pitch": 0,
                    "yaw": 1.5672228760316775,
                    "object-id": "26fad020-a782-11ea-b57d-67c51a0561a1"
                }
            ]
        }       
    ]
}
```
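Because each `object-id` persists across frames, you can reconstruct an object's trajectory by grouping annotations by that ID. This is a minimal sketch using trimmed, illustrative values modeled on the example above:

```
from collections import defaultdict

# Trimmed SeqLabel.json content (illustrative values).
seq_label = {
    "tracking-annotations": [
        {"frame-number": 0, "annotations": [
            {"object-name": "Car:1", "object-id": "06efb020", "center-x": -2.645}]},
        {"frame-number": 1, "annotations": [
            {"object-name": "Car:1", "object-id": "06efb020", "center-x": -2.651}]},
    ]
}

# Group (frame-number, center-x) pairs by object-id to follow each object.
tracks = defaultdict(list)
for frame in seq_label["tracking-annotations"]:
    for ann in frame["annotations"]:
        tracks[ann["object-id"]].append((frame["frame-number"], ann["center-x"]))

print(dict(tracks))
```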

## 3D-2D point cloud object tracking output
<a name="sms-output-3d-2d-point-cloud-object-tracking"></a>

The following is an example of an output manifest file from a 3D-2D point cloud object tracking labeling job. The *red, italicized text* in the examples below depends on labeling job specifications and output data. The ellipses (*...*) denote a continuation of that list, where additional objects with the same format as the preceding object can appear.

In addition to the standard elements, the metadata includes a class map that lists each class that has at least one label in the sequence. If one or more cuboids were modified, there is an `adjustment-status` parameter in the metadata for audit workflows that is set to `adjusted`. 

```
{
  "source-ref": "s3://amzn-s3-demo-bucket/artifacts/gt-point-cloud-demos/sequences/seq2.json",
  "source-ref-metadata": {
    "json-paths": [
      "number-of-frames",
      "prefix",
      "frames{frame-no, frame}"
    ]
  },
  "3D2D-linking-ref": "s3://amzn-s3-demo-bucket/xyz/3D2D-linking/annotations/consolidated-annotation/output/0/SeqLabel.json",
  "3D2D-linking-ref-metadata": {
    "objects": [
      {
        "frame-no": 0,
        "confidence": []
      },
      {
        "frame-no": 1,
        "confidence": []
      },
      {
        "frame-no": 2,
        "confidence": []
      },
      {
        "frame-no": 3,
        "confidence": []
      },
      {
        "frame-no": 4,
        "confidence": []
      },
      {
        "frame-no": 5,
        "confidence": []
      },
      {
        "frame-no": 6,
        "confidence": []
      },
      {
        "frame-no": 7,
        "confidence": []
      },
      {
        "frame-no": 8,
        "confidence": []
      },
      {
        "frame-no": 9,
        "confidence": []
      }
    ],
    "class-map": {
      "0": "Car"
    },
    "type": "groundtruth/point_cloud_object_tracking",
    "human-annotated": "yes",
    "creation-date": "2023-01-19T02:55:10.206508",
    "job-name": "mcm-linking"
  },
  "3D2D-linking-chain-ref": "s3://amzn-s3-demo-bucket/xyz/3D2D-linking-chain/annotations/consolidated-annotation/output/0/SeqLabel.json",
  "3D2D-linking-chain-ref-metadata": {
    "objects": [
      {
        "frame-no": 0,
        "confidence": []
      },
      {
        "frame-no": 1,
        "confidence": []
      },
      {
        "frame-no": 2,
        "confidence": []
      },
      {
        "frame-no": 3,
        "confidence": []
      },
      {
        "frame-no": 4,
        "confidence": []
      },
      {
        "frame-no": 5,
        "confidence": []
      },
      {
        "frame-no": 6,
        "confidence": []
      },
      {
        "frame-no": 7,
        "confidence": []
      },
      {
        "frame-no": 8,
        "confidence": []
      },
      {
        "frame-no": 9,
        "confidence": []
      }
    ],
    "class-map": {
      "0": "Car"
    },
    "type": "groundtruth/point_cloud_object_tracking",
    "human-annotated": "yes",
    "creation-date": "2023-01-19T03:29:49.149935",
    "job-name": "3d2d-linking-chain"
  }
}
```

In the above example, the cuboid data for each frame in `seq2.json` is in `SeqLabel.json` in the Amazon S3 location, `s3://amzn-s3-demo-bucket/<labelingJobName>/annotations/consolidated-annotation/output/<datasetObjectId>/SeqLabel.json`. The following is an example of this label sequence file.

For each frame in the sequence, you see the `frame-number`, the `frame-name`, `frame-attributes` (if applicable), and a list of `annotations`. This list contains the 3D cuboids that were drawn for that frame. Each annotation includes the following information: 
+ An `object-name` in the format `<class>:<integer>` where `class` identifies the label category and `integer` is a unique ID across the dataset.
+ When workers draw a cuboid, it is associated with a unique `object-id` which is associated with all cuboids that identify the same object across multiple frames.
+ Each class, or label category, that you specified in your input manifest is associated with a `class-id`. Use the `class-map` to identify the class associated with each class ID.
+ `center-x`, `center-y`, and `center-z` are the coordinates of the center of the cuboid, in the same coordinate system as the 3D point cloud input data used in your labeling job.
+ `length`, `width`, and `height` describe the dimensions of the cuboid. 
+ `yaw` is used to describe the orientation (heading) of the cuboid in radians.
**Note**  
`yaw` is now reported in the right-handed Cartesian system. This change was made on September 02, 2022 19:02:17 UTC. To convert a `yaw` measurement from output data produced before that date, use the following (all units are in radians):  

  ```
  old_yaw_in_output = pi - yaw
  ```
+ In our definition, +x is to the right, +y is forward, and +z is up from the ground plane. The rotation order is x - y - z. `roll`, `pitch`, and `yaw` are represented in the right-handed Cartesian system. In 3D space, `roll` is along the x-axis, `pitch` is along the y-axis, and `yaw` is along the z-axis. All three are counterclockwise.
+ If you included label attributes in your input manifest file for a given class, a `label-category-attributes` parameter is included for all cuboids for which workers selected label attributes. 

```
{
  "lidar": {
    "tracking-annotations": [
      {
        "frame-number": 0,
        "frame-name": "0.txt.pcd",
        "annotations": [
          {
            "label-category-attributes": {
              "Type": "Sedan"
            },
            "object-name": "Car:1",
            "class-id": 0,
            "center-x": 12.172361721602815,
            "center-y": 120.23067521992364,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "505b39e0-97a4-11ed-8903-dd5b8b903715"
          },
          {
            "label-category-attributes": {},
            "object-name": "Car:4",
            "class-id": 0,
            "center-x": 17.192725195301094,
            "center-y": 114.55705365827872,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "1afcb670-97a9-11ed-9a84-ff627d099e16"
          }
        ],
        "frame-attributes": {}
      },
      {
        "frame-number": 1,
        "frame-name": "1.txt.pcd",
        "annotations": [
          {
            "label-category-attributes": {
              "Type": "Sedan"
            },
            "object-name": "Car:1",
            "class-id": 0,
            "center-x": -1.6841480600695489,
            "center-y": 126.20198882749516,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "505b39e0-97a4-11ed-8903-dd5b8b903715"
          },
          {
            "label-category-attributes": {},
            "object-name": "Car:4",
            "class-id": 0,
            "center-x": 17.192725195301094,
            "center-y": 114.55705365827872,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "1afcb670-97a9-11ed-9a84-ff627d099e16"
          }
        ],
        "frame-attributes": {}
      },
      {
        "frame-number": 2,
        "frame-name": "2.txt.pcd",
        "annotations": [
          {
            "label-category-attributes": {
              "Type": "Sedan"
            },
            "object-name": "Car:1",
            "class-id": 0,
            "center-x": -1.6841480600695489,
            "center-y": 126.20198882749516,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "505b39e0-97a4-11ed-8903-dd5b8b903715"
          },
          {
            "label-category-attributes": {},
            "object-name": "Car:4",
            "class-id": 0,
            "center-x": 17.192725195301094,
            "center-y": 114.55705365827872,
            "center-z": 1.590525771183712,
            "length": 4,
            "width": 2,
            "height": 2,
            "roll": 0,
            "pitch": 0,
            "yaw": 0,
            "object-id": "1afcb670-97a9-11ed-9a84-ff627d099e16"
          }
        ],
        "frame-attributes": {}
      }
    ]
  },
  "camera-0": {
    "tracking-annotations": [
      {
        "frame-no": 0,
        "frame": "0.txt.pcd",
        "annotations": [
          {
            "label-category-attributes": {
              "Occlusion": "Partial"
            },
            "object-name": "Car:2",
            "class-id": 0,
            "width": 223,
            "height": 164,
            "top": 225,
            "left": 486,
            "object-id": "5229df60-97a4-11ed-8903-dd5b8b903715"
          }
        ],
        "frame-attributes": {}
      },
      {
        "frame-no": 1,
        "frame": "1.txt.pcd",
        "annotations": [
          {
            "label-category-attributes": {},
            "object-name": "Car:4",
            "class-id": 0,
            "width": 252,
            "height": 246,
            "top": 237,
            "left": 473,
            "object-id": "1afcb670-97a9-11ed-9a84-ff627d099e16"
          }
        ],
        "frame-attributes": {}
      }
    ]
  }
}
```

The cuboid and bounding box for an object are linked through a common `object-id`.
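For example, you can join the lidar cuboids and camera bounding boxes on that shared ID. This sketch uses a trimmed, illustrative version of the output above:

```
# Trimmed 3D-2D output (illustrative values from the example above).
output = {
    "lidar": {"tracking-annotations": [
        {"frame-number": 1, "annotations": [
            {"object-name": "Car:4", "object-id": "1afcb670"}]}]},
    "camera-0": {"tracking-annotations": [
        {"frame-no": 1, "annotations": [
            {"object-name": "Car:4", "object-id": "1afcb670",
             "top": 237, "left": 473, "width": 252, "height": 246}]}]},
}

# Index camera boxes by object-id.
camera_boxes = {
    ann["object-id"]: ann
    for frame in output["camera-0"]["tracking-annotations"]
    for ann in frame["annotations"]
}

# Link each cuboid to its 2D box through the shared object-id.
links = {}
for frame in output["lidar"]["tracking-annotations"]:
    for cuboid in frame["annotations"]:
        box = camera_boxes.get(cuboid["object-id"])
        if box:
            links[cuboid["object-name"]] = (box["left"], box["top"])
print(links)
```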