Videos
BDA offers a set of standard outputs to process and generate insights for videos. Here's a detailed look at each operation type:
Full Video Summary
Full video summary generates an overall summary of the entire video, distilling the key themes, events, and information presented throughout the video into a concise summary. It is optimized for content with descriptive dialogue such as product overviews, trainings, newscasts, talk shows, and documentaries. In both the full video summaries and the scene summaries, BDA attempts to provide a name for each unique speaker based on audio signals (e.g., the speaker introduces themselves) or visual signals (e.g., a presentation slide shows a speaker's name). When a unique speaker's name cannot be resolved, the speaker is represented by a unique identifier (e.g., speaker_0).
Scene Summaries
Video scene summarization provides descriptive summaries for individual scenes within a video. A video scene is a sequence of shots that forms a coherent unit of action or narrative within the video. This feature breaks the video down into meaningful segments based on visual and audible cues, provides timestamps for those segments, and summarizes each one.
IAB Taxonomy
The Interactive Advertising Bureau (IAB) classification applies a standard advertising taxonomy to classify video scenes based on visual and audio elements. For Preview, BDA supports 24 top-level (L1) categories and 85 second-level (L2) categories. The full list of IAB categories supported by BDA is available for download.
Full Audio Transcript
The full audio transcript feature provides a complete text representation of all speech in the audio file. It uses advanced speech recognition technology to accurately transcribe dialogue, narration, and other audio elements. The transcription includes speaker identification, making it easy to navigate and search through the audio content based on the speaker.
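Because each transcript segment carries a speaker label, the output can be regrouped by speaker. The sketch below assumes segments follow the `audio_segments` shape shown in the sample standard output later in this section (the field names `text` and `speaker.speaker_id` come from that example and may differ across output versions):

```python
from collections import defaultdict

def transcript_by_speaker(audio_segments):
    """Group transcript text by speaker ID, preserving segment order."""
    lines = defaultdict(list)
    for seg in audio_segments:
        speaker = seg.get("speaker", {}).get("speaker_id", "unknown")
        lines[speaker].append(seg["text"])
    return {spk: " ".join(texts) for spk, texts in lines.items()}

# Illustrative segments in the shape of the sample output:
segments = [
    {"text": "Welcome to the show.", "speaker": {"speaker_id": "SPK_001"}},
    {"text": "Thanks for having me.", "speaker": {"speaker_id": "SPK_002"}},
    {"text": "Let's begin.", "speaker": {"speaker_id": "SPK_001"}},
]
print(transcript_by_speaker(segments))
```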
Text in Video
This feature detects and extracts text that appears visually in the video. It can identify both static text (like titles or captions) and dynamic text (such as moving text in graphics). Similar to image text detection, it provides bounding box information for each detected text element, allowing for precise localization within video frames.
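The bounding-box coordinates in the sample output appear to be normalized to the range 0-1 relative to the frame dimensions. Assuming that convention, converting a detected text box to pixel coordinates is a simple scaling step:

```python
def to_pixels(bounding_box, frame_width, frame_height):
    """Convert a normalized bounding box (values in 0..1) to pixel
    coordinates for a frame of the given size."""
    return {
        "left": round(bounding_box["left"] * frame_width),
        "top": round(bounding_box["top"] * frame_height),
        "width": round(bounding_box["width"] * frame_width),
        "height": round(bounding_box["height"] * frame_height),
    }

# The box for the word "technology" from the sample output, on a
# 1920x1080 frame (dimensions come from the metadata section):
box = {"left": 0.1, "top": 0.2, "width": 0.2, "height": 0.1}
print(to_pixels(box, 1920, 1080))
# {'left': 192, 'top': 216, 'width': 384, 'height': 108}
```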
Content Moderation
Content moderation detects inappropriate, unwanted, or offensive content in a video. For Preview, BDA supports 7 moderation categories: Explicit, Non-Explicit Nudity of Intimate Parts and Kissing, Swimwear or Underwear, Violence, Drugs & Tobacco, Alcohol, and Hate Symbols. Explicit text in videos is not flagged.
Bounding boxes and their associated confidence scores can be enabled or disabled for relevant features, such as text detection, to provide location coordinates and timestamps within the video file. By default, full video summarization, scene summarization, and video text detection are enabled.
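As an illustration only, a standard-output configuration that toggles these features might resemble the following sketch. The field names and enum values here are assumptions based on the general shape of BDA project configuration, not a verified schema; check the BDA API reference for the exact names:

```json
{
  "video": {
    "extraction": {
      "category": {
        "state": "ENABLED",
        "types": ["TEXT_DETECTION", "TRANSCRIPT", "CONTENT_MODERATION"]
      },
      "boundingBox": { "state": "ENABLED" }
    },
    "generativeField": {
      "state": "ENABLED",
      "types": ["VIDEO_SUMMARY", "SCENE_SUMMARY", "IAB"]
    }
  }
}
```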
Note
Only one audio track per video is supported. Subtitle file formats (e.g., SRT, VTT, etc.) are not supported.
Video Standard Output
The following is an example of a standard output for a video processed through BDA:
{
  "metadata": {
    "id": "video_123",
    "semantic_modality": "VIDEO",
    "s3_bucket": "my-video-bucket",
    "s3_prefix": "videos/",
    "format": "MP4",
    "frame_rate": 24.0,
    "codec": "h264",
    "duration_millis": 120000,
    "frame_width": 1920,
    "frame_height": 1080
  },
  "video": {
    "summary": "A tech conference presentation discussing AI advancements and their impact on various industries.",
    "transcript": {
      "representation": {
        "text": "This is a sample video transcript. The video discusses various topics including technology, innovation, and the future of our society."
      }
    }
  },
  "scenes": [
    {
      "scene_index": 0,
      "start_timecode_SMPTE": "00:00:00:00",
      "end_timecode_SMPTE": "00:00:30:00",
      "start_timestamp_millis": 0,
      "end_timestamp_millis": 30000,
      "start_frame_index": 0,
      "end_frame_index": 720,
      "duration_smpte": "00:00:30:00",
      "duration_millis": 30000,
      "duration_frames": 720,
      "shot_indices": [0, 1],
      "summary": "This scene introduces the main topic of the video and provides an overview of the key themes.",
      "transcript": {
        "representation": {
          "text": "Welcome to this video on the future of technology. In this presentation, we will explore the latest advancements in various fields, including artificial intelligence, renewable energy, and smart city initiatives."
        }
      },
      "iab_categories": [
        {
          "id": "iab_12345",
          "type": "IAB",
          "category": "Technology & Computing",
          "confidence": 0.9,
          "parent_name": "Business & Industrial",
          "taxonomy_level": 2
        },
        {
          "id": "iab_67890",
          "type": "IAB",
          "category": "Renewable Energy",
          "confidence": 0.8,
          "parent_name": "Energy & Utilities",
          "taxonomy_level": 2
        }
      ],
      "content_moderation": [
        {
          "id": "mod_12345",
          "type": "CONTENT_MODERATION",
          "confidence": 0.1,
          "start_timestamp_millis": 0,
          "end_timestamp_millis": 30000,
          "moderation_categories": [
            { "category": "profanity", "confidence": 0.2 }
          ]
        }
      ],
      "audio_segments": [
        {
          "start_timestamp_millis": 0,
          "end_timestamp_millis": 30000,
          "id": "audio_segment_1",
          "type": "TRANSCRIPT",
          "text": "Welcome to this video on the future of technology. In this presentation, we will explore the latest advancements in various fields, including artificial intelligence, renewable energy, and smart city initiatives.",
          "speaker": { "speaker_id": "SPK_001" }
        }
      ],
      "frames": [
        {
          "timecode_SMPTE": "00:00:05:00",
          "timestamp_millis": 5000,
          "index": 120,
          "features": {
            "content_moderation": [
              {
                "id": "mod_67890",
                "type": "MODERATION",
                "category": "Adult",
                "confidence": 0.2,
                "parent_name": "Sensitive",
                "taxonomy_level": 2
              }
            ],
            "text_words": [
              {
                "id": "word_1",
                "text": "technology",
                "confidence": 0.9,
                "line_id": "line_1",
                "locations": [
                  {
                    "bounding_box": { "left": 0.1, "top": 0.2, "width": 0.2, "height": 0.1 },
                    "polygon": [
                      { "x": 0.1, "y": 0.2 },
                      { "x": 0.3, "y": 0.2 },
                      { "x": 0.3, "y": 0.3 },
                      { "x": 0.1, "y": 0.3 }
                    ]
                  }
                ]
              }
            ],
            "text_lines": [
              {
                "id": "line_1",
                "text": "The future of technology",
                "confidence": 0.85,
                "locations": [
                  {
                    "bounding_box": { "left": 0.05, "top": 0.1, "width": 0.4, "height": 0.2 },
                    "polygon": [
                      { "x": 0.05, "y": 0.1 },
                      { "x": 0.45, "y": 0.1 },
                      { "x": 0.45, "y": 0.3 },
                      { "x": 0.05, "y": 0.3 }
                    ]
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  ],
  "statistics": {
    "entity_count": 20,
    "shot_count": 4,
    "scene_count": 2,
    "speaker_count": 1
  }
}
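A common way to consume this output is to walk the `scenes` array and print each scene's time range and summary. The sketch below uses only field names that appear in the example above (`start_timestamp_millis`, `end_timestamp_millis`, `summary`):

```python
def millis_to_timestamp(ms):
    """Format a millisecond offset as HH:MM:SS for display."""
    seconds, _ = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def scene_overview(output):
    """Yield (start, end, summary) tuples for each scene in a BDA
    video standard output dict."""
    for scene in output.get("scenes", []):
        yield (
            millis_to_timestamp(scene["start_timestamp_millis"]),
            millis_to_timestamp(scene["end_timestamp_millis"]),
            scene["summary"],
        )

# A trimmed-down output in the shape of the example above:
output = {
    "scenes": [
        {
            "start_timestamp_millis": 0,
            "end_timestamp_millis": 30000,
            "summary": "Intro scene.",
        }
    ]
}
for start, end, summary in scene_overview(output):
    print(f"[{start} - {end}] {summary}")
# [00:00:00 - 00:00:30] Intro scene.
```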
This example illustrates the comprehensive nature of the BDA output, providing rich, structured data that can be easily integrated into various applications for further analysis or processing.