Using the Amazon Mechanical Turk Workforce
The Amazon Mechanical Turk (Mechanical Turk) workforce provides the most workers for your Amazon SageMaker Ground Truth labeling job and Amazon Augmented AI human review task. The Amazon Mechanical Turk workforce is a worldwide resource, and workers are available 24 hours a day, 7 days a week. You typically get the fastest turnaround for your human review tasks and labeling jobs when you use the Amazon Mechanical Turk workforce.
Any Amazon Mechanical Turk workforce billing is handled as part of your Ground Truth or Amazon Augmented AI billing. You do not need to create a separate Mechanical Turk account to use the Amazon Mechanical Turk workforce.
Important
You should not share confidential information, personal information, or protected health information with this workforce. You should not use the Amazon Mechanical Turk workforce when you use Amazon A2I in conjunction with AWS HIPAA-eligible services, such as Amazon Textract and Amazon Rekognition, for workloads containing protected health information.
You can choose Mechanical Turk as your workforce when you create a Ground Truth labeling job or Amazon A2I human review workflow (flow definition). You can create labeling jobs and human review workflows using either the SageMaker console or the API.
When you use an API operation to create a labeling job or human review workflow, use the following ARN for the Amazon Mechanical Turk workforce as your WorkteamArn. Replace region with the AWS Region you are using to create the labeling job or human loops. For example, if you create a labeling job in US West (Oregon), replace region with us-west-2.
arn:aws:sagemaker:region:394669845002:workteam/public-crowd/default
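Because only the Region changes in this ARN, it can be filled in programmatically. The following is a minimal sketch; the helper name mechanical_turk_workteam_arn is hypothetical and not part of any AWS SDK.

```python
# Hypothetical helper: fills the Region into the documented public workforce ARN.
PUBLIC_CROWD_ARN_TEMPLATE = (
    "arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default"
)


def mechanical_turk_workteam_arn(region: str) -> str:
    """Return the Mechanical Turk workforce ARN for the given AWS Region."""
    return PUBLIC_CROWD_ARN_TEMPLATE.format(region=region)


# US West (Oregon) is us-west-2.
print(mechanical_turk_workteam_arn("us-west-2"))
# arn:aws:sagemaker:us-west-2:394669845002:workteam/public-crowd/default
```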
Ground Truth and Amazon A2I require that your input data is free of personally identifiable information (PII) when you use Mechanical Turk. If you use the Mechanical Turk workforce and do not specify that your input data is free of PII, your Ground Truth labeling jobs and Amazon A2I tasks will fail. You specify that your input data is free of PII when you create a Ground Truth labeling job, and when you create an Amazon A2I human loop using a built-in integration or the StartHumanLoop operation.
Use the following sections to learn how to use Mechanical Turk with these services.
Use Mechanical Turk with Ground Truth
You can use Mechanical Turk with Ground Truth when you create a labeling job using the console or the CreateLabelingJob operation.
When you create a labeling job, we recommend that you adjust the number of workers that annotate each data object based on the complexity of the job and the quality that you need. Amazon SageMaker Ground Truth uses annotation consolidation to improve the quality of the labels. Adding workers can improve label quality for more complex labeling jobs, but might not make a difference for simpler jobs. For more information, see Annotation consolidation. Note that annotation consolidation is not supported for Amazon A2I human review workflows.
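To illustrate why several workers per data object can help, the following sketch combines labels with a plain majority vote. This is a deliberately simple stand-in, not the consolidation algorithm Ground Truth uses (see Annotation consolidation for that); the function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label and the fraction of workers who chose it.

    Simplified illustration only; Ground Truth's annotation consolidation
    functions are more sophisticated than a plain vote.
    """
    if not labels:
        raise ValueError("need at least one worker label")
    counts = Counter(labels)
    # For tied counts, most_common keeps the order in which labels were first seen.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


print(majority_vote(["cat", "cat", "dog"]))  # ('cat', 0.6666666666666666)
```

With a single worker, the agreement fraction is always 1.0 and carries no information about label quality; with three or more workers, disagreement becomes visible, which matters most for ambiguous, complex tasks.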
To use Mechanical Turk when you create a labeling job (console):
-
Use the following to create a labeling job using the Ground Truth area of the SageMaker console: Create a Labeling Job (Console).
-
In the Workers section, for Worker types, select Amazon Mechanical Turk.
-
Specify the total amount of time workers have to complete a task using Task timeout.
-
Specify the total amount of time a task remains available to workers in Task expiration. This is how long workers have to pick up a task before it fails.
-
Select the Price per task using the dropdown list. This is the amount of money a worker receives for completing a single task.
-
(Optional) If applicable, select The dataset does not contain adult content. SageMaker may restrict the Mechanical Turk workers that can view your task if it contains adult content.
-
You must read and confirm the following statement by selecting the check box to use the Mechanical Turk workforce. If your input data contains confidential information, personal information, or protected health information, you must select another workforce.
You understand and agree that the Mechanical Turk workforce consists of independent contractors located worldwide and that you should not share confidential information, personal information, or protected health information with this workforce.
-
(Optional) Select the check box next to Enable automated data labeling if you want to enable automated data labeling. To learn more about this feature, see Automate data labeling.
-
You can specify the Number of workers per dataset object under Additional configuration. For example, if you enter 3 in this field, each data object will be labeled by 3 workers.
When you create your labeling job by selecting Create, your labeling tasks are sent to Mechanical Turk workers.
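The console settings in these steps correspond to CreateLabelingJob request fields used in the API procedure that follows. The mapping below is inferred from the matching descriptions in the two procedures; the dictionary itself is only an illustrative lookup table.

```python
# Console setting -> CreateLabelingJob field (inferred from the matching descriptions).
CONSOLE_TO_API = {
    "Task timeout": "TaskTimeLimitInSeconds",
    "Task expiration": "TaskAvailabilityLifetimeInSeconds",
    "Price per task": "PublicWorkforceTaskPrice",
    "Number of workers per dataset object": "NumberOfHumanWorkersPerDataObject",
    # Not a top-level parameter: a ContentClassifiers value inside DataAttributes.
    "The dataset does not contain adult content": "FreeOfAdultContent",
}

for setting, field in CONSOLE_TO_API.items():
    print(f"{setting:45} -> {field}")
```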
To use Mechanical Turk when you create a labeling job (API):
-
Use the following to create a labeling job using the CreateLabelingJob operation: Create a Labeling Job (API).
-
Use the following for the WorkteamArn. Replace region with the AWS Region you are using to create the labeling job.
arn:aws:sagemaker:region:394669845002:workteam/public-crowd/default
-
Use TaskTimeLimitInSeconds to specify the total amount of time workers have to complete a task.
-
Use TaskAvailabilityLifetimeInSeconds to specify the total amount of time a task remains available to workers. This is how long workers have to pick up a task before it fails.
-
Use NumberOfHumanWorkersPerDataObject to specify the number of workers per dataset object.
-
Use PublicWorkforceTaskPrice to set the price per task. This is the amount of money a worker receives for completing a single task.
-
Use DataAttributes to specify that your input data is free of confidential information, personal information, or protected health information.
Ground Truth requires that your input data is free of personally identifiable information (PII) if you use the Mechanical Turk workforce. If you use Mechanical Turk and do not specify that your input data is free of PII using the FreeOfPersonallyIdentifiableInformation content classifier, your labeling job will fail.
Use the FreeOfAdultContent content classifier to declare that your input data is free of adult content. SageMaker may restrict the Mechanical Turk workers that can view your task if it contains adult content.
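The steps above can be sketched as plain request fragments. This is a minimal sketch, assuming the boto3 request shape for CreateLabelingJob, where PublicWorkforceTaskPrice is expressed as AmountInUsd with Dollars, Cents, and TenthFractionsOfACent fields, and DataAttributes sits inside InputConfig. The helper names are hypothetical, and the other required parts of the request (such as UiConfig, TaskTitle, and TaskDescription) are not shown. The code only builds dictionaries and does not call AWS.

```python
from decimal import Decimal

PUBLIC_CROWD_ARN = "arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default"


def amount_in_usd(price: Decimal) -> dict:
    """Split a USD price into AmountInUsd fields, rounded to the nearest tenth of a cent."""
    tenths_of_cent = int((price * 1000).to_integral_value())
    dollars, remainder = divmod(tenths_of_cent, 1000)
    cents, tenth_fractions = divmod(remainder, 10)
    return {"Dollars": dollars, "Cents": cents, "TenthFractionsOfACent": tenth_fractions}


def mturk_labeling_job_fields(region, workers_per_object, task_time_limit_s,
                              availability_s, price, free_of_adult_content=False):
    """Return (HumanTaskConfig fields, DataAttributes) for a Mechanical Turk job.

    Hypothetical helper: only the Mechanical Turk-related fields are built here.
    """
    human_task_config = {
        "WorkteamArn": PUBLIC_CROWD_ARN.format(region=region),
        "NumberOfHumanWorkersPerDataObject": workers_per_object,
        "TaskTimeLimitInSeconds": task_time_limit_s,
        "TaskAvailabilityLifetimeInSeconds": availability_s,
        "PublicWorkforceTaskPrice": {"AmountInUsd": amount_in_usd(price)},
    }
    # Without this classifier, a labeling job that uses Mechanical Turk fails.
    classifiers = ["FreeOfPersonallyIdentifiableInformation"]
    if free_of_adult_content:
        classifiers.append("FreeOfAdultContent")
    return human_task_config, {"ContentClassifiers": classifiers}


# Example values only: 3 workers, 5-minute timeout, 6-hour availability, $0.036 per task.
task_fields, data_attributes = mturk_labeling_job_fields(
    "us-west-2", 3, 300, 21600, Decimal("0.036"), free_of_adult_content=True
)
print(task_fields["PublicWorkforceTaskPrice"])
# {'AmountInUsd': {'Dollars': 0, 'Cents': 3, 'TenthFractionsOfACent': 6}}
```

To use these fragments, merge task_fields into the HumanTaskConfig of a complete request, set data_attributes as InputConfig["DataAttributes"], and pass the request to boto3.client("sagemaker").create_labeling_job(**request). Decimal is used so that a price such as 0.036 converts without binary floating-point rounding errors.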
You can see examples of how to use this API in the following notebooks on GitHub: Ground Truth Jupyter Notebook Examples
Use Mechanical Turk with Amazon A2I
You can specify that you want to use Mechanical Turk with Amazon A2I when you create a human review workflow, also referred to as a flow definition, in the console or with the CreateFlowDefinition API operation. When you use this human review workflow to configure human loops, you must specify that your input data is free of PII.
To use Mechanical Turk when you create a human review workflow (console):
-
Use the following to create a human review workflow in the Augmented AI section of the SageMaker console: Create a Human Review Workflow (Console).
-
In the Workers section, for Worker types, select Amazon Mechanical Turk.
-
Select the Price per task using the dropdown list. This is the amount of money a worker receives for completing a single task.
-
(Optional) You can specify the Number of workers per dataset object under Additional configuration. For example, if you enter 3 in this field, each data object will be labeled by 3 workers.
-
(Optional) Specify the total amount of time workers have to complete a task using Task timeout.
-
(Optional) Specify the total amount of time a task remains available to workers in Task expiration. This is how long workers have to pick up a task before it fails.
-
Once you have created your human review workflow, you can use it to configure a human loop by providing its Amazon Resource Name (ARN) in the FlowDefinitionArn parameter. You configure a human loop using one of the API operations of a built-in task type, or the Amazon A2I runtime API operation StartHumanLoop. To learn more, see Create and Start a Human Loop.
When you configure your human loop, you must specify that your input data is free of personally identifiable information (PII) using the FreeOfPersonallyIdentifiableInformation content classifier in DataAttributes. If you use Mechanical Turk and do not specify that your input data is free of PII, your human review tasks will fail.
Use the FreeOfAdultContent content classifier to declare that your input data is free of adult content. SageMaker may restrict the Mechanical Turk workers that can view your task if it contains adult content.
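The human loop step above can be sketched as the keyword arguments for StartHumanLoop. This is a minimal sketch, assuming the boto3 request shape (HumanLoopName, FlowDefinitionArn, HumanLoopInput with a JSON InputContent string, and DataAttributes with ContentClassifiers). The helper name, loop name, flow definition ARN, and input keys are hypothetical placeholders; the code builds the request without calling AWS.

```python
import json


def start_human_loop_request(loop_name, flow_definition_arn, task_input,
                             free_of_adult_content=False):
    """Build keyword arguments for the Amazon A2I runtime StartHumanLoop operation."""
    # Declaring the input free of PII is required when the workflow uses Mechanical Turk.
    classifiers = ["FreeOfPersonallyIdentifiableInformation"]
    if free_of_adult_content:
        classifiers.append("FreeOfAdultContent")
    return {
        "HumanLoopName": loop_name,
        "FlowDefinitionArn": flow_definition_arn,
        # InputContent is a JSON string; its keys depend on your worker task template.
        "HumanLoopInput": {"InputContent": json.dumps(task_input)},
        "DataAttributes": {"ContentClassifiers": classifiers},
    }


# Hypothetical loop name, ARN, and input keys, for illustration only.
request = start_human_loop_request(
    "example-review-loop",
    "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/example-flow",
    {"taskObject": "s3://amzn-s3-demo-bucket/image.jpg"},
)
```

You would then pass the dictionary to boto3.client("sagemaker-a2i-runtime").start_human_loop(**request). Built-in integrations accept the same DataAttributes structure through their own human loop configuration parameters.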
To use Mechanical Turk when you create a human review workflow (API):
-
Use the following to create a human review workflow using the CreateFlowDefinition operation: Create a Human Review Workflow (API).
-
Use the following for the WorkteamArn. Replace region with the AWS Region you are using to create the human review workflow.
arn:aws:sagemaker:region:394669845002:workteam/public-crowd/default
-
Use TaskTimeLimitInSeconds to specify the total amount of time workers have to complete a task.
-
Use TaskAvailabilityLifetimeInSeconds to specify the total amount of time a task remains available to workers. This is how long workers have to pick up a task before it fails.
-
Use TaskCount to specify the number of workers per dataset object. For example, if you specify 3 for this parameter, each data object will be labeled by 3 workers.
-
Use PublicWorkforceTaskPrice to set the price per task. This is the amount of money a worker receives for completing a single task.
-
Once you have created your human review workflow, you can use it to configure a human loop by providing its Amazon Resource Name (ARN) in the FlowDefinitionArn parameter. You configure a human loop using one of the API operations of a built-in task type, or the Amazon A2I runtime API operation StartHumanLoop. To learn more, see Create and Start a Human Loop.
When you configure your human loop, you must specify that your input data is free of personally identifiable information (PII) using the FreeOfPersonallyIdentifiableInformation content classifier in DataAttributes. If you use Mechanical Turk and do not specify that your input data is free of PII, your human review tasks will fail.
Use the FreeOfAdultContent content classifier to declare that your input data is free of adult content. SageMaker may restrict the Mechanical Turk workers that can view your task if it contains adult content.
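The Mechanical Turk-specific fields of the CreateFlowDefinition HumanLoopConfig can be sketched as follows. This is a minimal sketch under the assumption of the boto3 request shape; the helper name is hypothetical, and the values are examples only. The code builds a dictionary and does not call AWS.

```python
PUBLIC_CROWD_ARN = "arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default"


def mturk_human_loop_config(region, task_count, time_limit_s, availability_s,
                            dollars, cents, tenth_fractions_of_a_cent):
    """Return the Mechanical Turk-related fields of a HumanLoopConfig (hypothetical helper)."""
    return {
        "WorkteamArn": PUBLIC_CROWD_ARN.format(region=region),
        "TaskCount": task_count,  # number of workers per data object
        "TaskTimeLimitInSeconds": time_limit_s,
        "TaskAvailabilityLifetimeInSeconds": availability_s,
        "PublicWorkforceTaskPrice": {
            "AmountInUsd": {
                "Dollars": dollars,
                "Cents": cents,
                "TenthFractionsOfACent": tenth_fractions_of_a_cent,
            }
        },
    }


# Example values only: 3 workers, 5-minute timeout, 1-hour availability, $0.036 per task.
config = mturk_human_loop_config("us-west-2", 3, 300, 3600, 0, 3, 6)
```

Before use, the config still needs the other required HumanLoopConfig fields (HumanTaskUiArn, TaskTitle, and TaskDescription). You would then call boto3.client("sagemaker").create_flow_definition with FlowDefinitionName, HumanLoopConfig=config, OutputConfig (including an S3OutputPath), and RoleArn.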
You can see examples of how to use this API in the following notebooks on GitHub: Amazon A2I Jupyter Notebook Examples
When is Mechanical Turk Not Supported?
This workforce is not supported under the following scenarios. In each scenario, you must use a private or vendor workforce.
-
This workforce is not supported for Ground Truth video frame labeling jobs and 3D point cloud labeling jobs.
-
You cannot use this workforce if your input data contains personally identifiable information (PII).
-
Mechanical Turk is not available in some of the AWS special regions. If applicable, refer to the documentation for your special region for more information.
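These restrictions can be checked up front before you choose a workforce. The following is a hypothetical helper, not an AWS API: the task-type labels are illustrative strings rather than Ground Truth's own names, and region_supports_mturk is a value you determine from the documentation for your Region.

```python
# Hypothetical task-type labels; Ground Truth's own task names differ.
UNSUPPORTED_TASK_TYPES = {"video-frame", "3d-point-cloud"}


def mechanical_turk_blockers(task_type, contains_pii, region_supports_mturk):
    """List the reasons, if any, why the Mechanical Turk workforce cannot be used."""
    reasons = []
    if task_type in UNSUPPORTED_TASK_TYPES:
        reasons.append("video frame and 3D point cloud jobs are not supported")
    if contains_pii:
        reasons.append("input data contains PII")
    if not region_supports_mturk:
        reasons.append("Mechanical Turk is not available in this Region")
    return reasons


print(mechanical_turk_blockers("image-classification", False, True))  # []
```

An empty list means none of the listed restrictions applies; any entry means you should use a private or vendor workforce instead.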