Detecting labels in a video
Amazon Rekognition Video can detect labels (objects and concepts), and the time a label is detected, in a video. For an SDK code example, see Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK). For an AWS CLI example, see Analyzing a video with the AWS Command Line Interface.
Amazon Rekognition Video label detection is an asynchronous operation. To start the detection of labels in a video, call StartLabelDetection.
Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon Simple Notification Service topic. If the video analysis is succesful, call GetLabelDetection to get the detected labels. For information about calling the video analysis API operations, see Calling Amazon Rekognition Video operations.
StartLabelDetection Request
The following example is a request for the StartLabelDetection
operation. You provide the StartLabelDetection
operation with a video
stored in an Amazon S3 bucket. In the example request JSON, the Amazon S3 bucket and the video
name are specified, along with MinConfidence
, Features
,
Settings
, and NotificationChannel
.
MinConfidence
is the minimum confidence that Amazon Rekognition Video
must have in the accuracy of the detected label, or an instance bounding box (if
detected), for it to be returned in the response.
With Features
, you can specify that you want GENERAL_LABELS returned
as part of the response.
With Settings
, you can filter the returned items for GENERAL_LABELS.
For labels you can use inclusive and exclusive filters. You can also filter by label
specific, individual labels or by label category:
-
LabelInclusionFilters
- Used to specify which labels you want included in the response -
LabelExclusionFilters
- Used to specify which labels you want excluded from the response. -
LabelCategoryInclusionFilters
- Used to specify which label categories you want included in the response. -
LabelCategoryExclusionFilters
- Used to specify which label categories you want excluded from the response.
You can also combine inclusive and exclusive filters according to your needs, excluding some labels or categories and including others.
NotificationChannel
is the ARN of the Amazon SNS topic you want Amazon Rekognition Video
to publish the completion status of the label detection operation to. If you’re
using the AmazonRekognitionServiceRole
permissions policy, then the
Amazon SNS topic must have a topic name that begins with Rekognition.
The following is a sample StartLabelDetection
request in JSON form,
including filters:
{ "ClientRequestToken": "5a6e690e-c750-460a-9d59-c992e0ec8638", "JobTag": "5a6e690e-c750-460a-9d59-c992e0ec8638", "Video": { "S3Object": { "Bucket": "bucket", "Name": "video.mp4" } }, "Features": ["GENERAL_LABELS"], "MinConfidence": 75, "Settings": { "GeneralLabels": { "LabelInclusionFilters": ["Cat", "Dog"], "LabelExclusionFilters": ["Tiger"], "LabelCategoryInclusionFilters": ["Animals and Pets"], "LabelCategoryExclusionFilters": ["Popular Landmark"] } }, "NotificationChannel": { "RoleArn": "arn:aws:iam::012345678910:role/SNSAccessRole", "SNSTopicArn": "arn:aws:sns:us-east-1:012345678910:notification-topic", } }
GetLabelDetection Operation Response
GetLabelDetection
returns an array (Labels
) that
contains information about the labels detected in the video. The array can be sorted
either by time or by the label detected when specifying the SortBy
parameter.You can also select how response items are aggregated by using the
AggregateBy
parameter.
The following example is the JSON response of the GetLabelDetection
.
In the response, note the following:
-
Sort order – The array of labels returned is sorted by time. To sort by label, specify
NAME
in theSortBy
input parameter forGetLabelDetection
. If the label appears multiple times in the video, there will be multiples instances of the (LabelDetection) element. The default sort oder isTIMESTAMP
, while the secondary sort order isNAME
. -
Label information – The
LabelDetection
array element contains a (Label) object, which in turn contains the label name and the confidence Amazon Rekognition has in the accuracy of the detected label. ALabel
object also includes a hierarchical taxonomy of labels and bounding box information for common labels.Timestamp
is the time the label was detected, defined as the number of milliseconds elapsed since the start of the video.Information about any Categories or Aliases associated with a label is also returned. For results aggregated by video
SEGMENTS
, theStartTimestampMillis
,EndTimestampMillis
, andDurationMillis
structures are returned, which define the start time, end time, and duration of a segment respectively. -
Aggregation – Specifies how results are aggregated when returned. The default is to aggregate by
TIMESTAMPS
. You can also choose to aggregate bySEGMENTS
, which aggregates results over a time window. If aggregating bySEGMENTS
, information about detected instances with bounding boxes are not returned. Only labels detected during the segments are returned. -
Paging information – The example shows one page of label detection information. You can specify how many
LabelDetection
objects to return in theMaxResults
input parameter forGetLabelDetection
. If more results thanMaxResults
exist,GetLabelDetection
returns a token (NextToken
) used to get the next page of results. For more information, see Getting Amazon Rekognition Video analysis results. -
Video information – The response includes information about the video format (
VideoMetadata
) in each page of information returned byGetLabelDetection
.
The following is a sample GetLabelDetection response in JSON form with aggregation by TIMESTAMPS:
{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "Timestamp": 1000, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 1000, "Label": { "Name": "Cup", "Categories": [ { "Name": "Kitchen and Dining" } ], "Aliases": [ { "Name": "Mug" } ], "Parents": [], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 2000, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567, }, "Confidence": 99.9364013671875 } ] } }, { "Timestamp": 4000, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 } ] } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }
The following is a sample GetLabelDetection response in JSON form with aggregation by SEGMENTS:
{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "StartTimestampMillis": 225, "EndTimestampMillis": 3578, "DurationMillis": 3353, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 // Maximum confidence score for Segment mode } }, { "StartTimestampMillis": 7578, "EndTimestampMillis": 12371, "DurationMillis": 4793, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875 } }, { "StartTimestampMillis": 22225, "EndTimestampMillis": 22578, "DurationMillis": 2353, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }
Transforming the GetLabelDetection Response
When retrieving results with the GetLabelDetection API operation, you might need the response structure to mimic the older API response structure, where both primary labels and aliases were contained in the same list.
The example JSON response found in the preceding section displays the current form of the API response from GetLabelDetection.
The following example shows the previous response from the GetLabelDetection API:
{ "Labels": [ { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 60.51791763305664, "Parents": [], "Name": "Leaf" } }, { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 99.53411102294922, "Parents": [], "Name": "Human" } }, { "Timestamp": 0, "Label": { "Instances": [ { "BoundingBox": { "Width": 0.11109819263219833, "Top": 0.08098889887332916, "Left": 0.8881205320358276, "Height": 0.9073750972747803 }, "Confidence": 99.5831298828125 }, { "BoundingBox": { "Width": 0.1268676072359085, "Top": 0.14018426835536957, "Left": 0.0003282368124928324, "Height": 0.7993982434272766 }, "Confidence": 99.46029663085938 } ], "Confidence": 99.63411102294922, "Parents": [], "Name": "Person" } }, . . . { "Timestamp": 166, "Label": { "Instances": [], "Confidence": 73.6471176147461, "Parents": [ { "Name": "Clothing" } ], "Name": "Sleeve" } } ], "LabelModelVersion": "2.0", "JobStatus": "SUCCEEDED", "VideoMetadata": { "Format": "QuickTime / MOV", "FrameRate": 23.976024627685547, "Codec": "h264", "DurationMillis": 5005, "FrameHeight": 674, "FrameWidth": 1280 } }
If needed, you can transform the current response to follow the format of the older response. You can use the following sample code to transform the latest API response to the previous API response structure:
from copy import deepcopy VIDEO_LABEL_KEY = "Labels" LABEL_KEY = "Label" ALIASES_KEY = "Aliases" INSTANCE_KEY = "Instances" NAME_KEY = "Name" #Latest API response sample for AggregatedBy SEGMENTS EXAMPLE_SEGMENT_OUTPUT = { "Labels": [ { "Timestamp": 0, "Label":{ "Name": "Person", "Confidence": 97.530106, "Parents": [], "Aliases": [ { "Name": "Human" }, ], "Categories": [ { "Name": "Person Description" } ], }, "StartTimestampMillis": 0, "EndTimestampMillis": 500666, "DurationMillis": 500666 }, { "Timestamp": 6400, "Label": { "Name": "Leaf", "Confidence": 89.77790069580078, "Parents": [ { "Name": "Plant" } ], "Aliases": [], "Categories": [ { "Name": "Plants and Flowers" } ], }, "StartTimestampMillis": 6400, "EndTimestampMillis": 8200, "DurationMillis": 1800 }, ] } #Output example after the transformation for AggregatedBy SEGMENTS EXPECTED_EXPANDED_SEGMENT_OUTPUT = { "Labels": [ { "Timestamp": 0, "Label":{ "Name": "Person", "Confidence": 97.530106, "Parents": [], "Aliases": [ { "Name": "Human" }, ], "Categories": [ { "Name": "Person Description" } ], }, "StartTimestampMillis": 0, "EndTimestampMillis": 500666, "DurationMillis": 500666 }, { "Timestamp": 6400, "Label": { "Name": "Leaf", "Confidence": 89.77790069580078, "Parents": [ { "Name": "Plant" } ], "Aliases": [], "Categories": [ { "Name": "Plants and Flowers" } ], }, "StartTimestampMillis": 6400, "EndTimestampMillis": 8200, "DurationMillis": 1800 }, { "Timestamp": 0, "Label":{ "Name": "Human", "Confidence": 97.530106, "Parents": [], "Categories": [ { "Name": "Person Description" } ], }, "StartTimestampMillis": 0, "EndTimestampMillis": 500666, "DurationMillis": 500666 }, ] } #Latest API response sample for AggregatedBy TIMESTAMPS EXAMPLE_TIMESTAMP_OUTPUT = { "Labels": [ { "Timestamp": 0, "Label": { "Name": "Person", "Confidence": 97.530106, "Instances": [ { "BoundingBox": { "Height": 0.1549897, "Width": 0.07747964, "Top": 0.50858885, "Left": 0.00018205095 }, "Confidence": 97.530106 }, ], "Parents": [], "Aliases": [ { "Name": "Human" }, ], "Categories": [ { "Name": "Person Description" } ], }, }, { "Timestamp": 6400, "Label": { "Name": "Leaf", "Confidence": 89.77790069580078, "Instances": [], "Parents": [ { "Name": "Plant" } ], "Aliases": [], "Categories": [ { "Name": "Plants and Flowers" } ], }, }, ] } #Output example after the transformation for AggregatedBy TIMESTAMPS EXPECTED_EXPANDED_TIMESTAMP_OUTPUT = { "Labels": [ { "Timestamp": 0, "Label": { "Name": "Person", "Confidence": 97.530106, "Instances": [ { "BoundingBox": { "Height": 0.1549897, "Width": 0.07747964, "Top": 0.50858885, "Left": 0.00018205095 }, "Confidence": 97.530106 }, ], "Parents": [], "Aliases": [ { "Name": "Human" }, ], "Categories": [ { "Name": "Person Description" } ], }, }, { "Timestamp": 6400, "Label": { "Name": "Leaf", "Confidence": 89.77790069580078, "Instances": [], "Parents": [ { "Name": "Plant" } ], "Aliases": [], "Categories": [ { "Name": "Plants and Flowers" } ], }, }, { "Timestamp": 0, "Label": { "Name": "Human", "Confidence": 97.530106, "Parents": [], "Categories": [ { "Name": "Person Description" } ], }, }, ] } def expand_aliases(inferenceOutputsWithAliases): if VIDEO_LABEL_KEY in inferenceOutputsWithAliases: expandInferenceOutputs = [] for segmentLabelDict in inferenceOutputsWithAliases[VIDEO_LABEL_KEY]: primaryLabelDict = segmentLabelDict[LABEL_KEY] if ALIASES_KEY in primaryLabelDict: for alias in primaryLabelDict[ALIASES_KEY]: aliasLabelDict = deepcopy(segmentLabelDict) aliasLabelDict[LABEL_KEY][NAME_KEY] = alias[NAME_KEY] del aliasLabelDict[LABEL_KEY][ALIASES_KEY] if INSTANCE_KEY in aliasLabelDict[LABEL_KEY]: del aliasLabelDict[LABEL_KEY][INSTANCE_KEY] expandInferenceOutputs.append(aliasLabelDict) inferenceOutputsWithAliases[VIDEO_LABEL_KEY].extend(expandInferenceOutputs) return inferenceOutputsWithAliases if __name__ == "__main__": segmentOutputWithExpandAliases = expand_aliases(EXAMPLE_SEGMENT_OUTPUT) assert segmentOutputWithExpandAliases == EXPECTED_EXPANDED_SEGMENT_OUTPUT timestampOutputWithExpandAliases = expand_aliases(EXAMPLE_TIMESTAMP_OUTPUT) assert timestampOutputWithExpandAliases == EXPECTED_EXPANDED_TIMESTAMP_OUTPUT