Audio
The Amazon Bedrock Data Automation (BDA) feature offers a set of standard outputs to process audio files and generate insights from them. Here's a detailed look at each operation type:
Full Audio Summary
Full audio summary generates an overall summary of the entire audio file. It distills the key themes, events, and information presented throughout the audio into a concise summary.
Full Audio Transcript
The full audio transcript feature provides a complete text representation of all spoken content in the audio. It uses advanced speech recognition technology to accurately transcribe dialogue, narration, and other audio elements. The transcription includes time-stamping, making it easy to navigate and search through audio content based on spoken words.
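As a minimal sketch of how the time-stamped transcript can be searched, the following Python snippet looks for a keyword in a list of transcript segments and returns the matching time ranges. It assumes segments shaped like the `audio_segments` entries in the sample standard output on this page; the function name `find_mentions` is hypothetical and not part of BDA.

```python
from typing import Iterator


def find_mentions(segments: list[dict], keyword: str) -> Iterator[tuple[int, int, str]]:
    """Yield (start_ms, end_ms, text) for each segment whose text contains keyword."""
    needle = keyword.casefold()
    for segment in segments:
        text = segment.get("text", "")
        # Case-insensitive substring match on the transcribed text.
        if needle in text.casefold():
            yield (
                segment["start_timestamp_millis"],
                segment["end_timestamp_millis"],
                text,
            )


# Illustrative data only, shaped like the "audio_segments" entries in the sample output.
segments = [
    {
        "start_timestamp_millis": 0,
        "end_timestamp_millis": 30000,
        "text": "Welcome to our podcast on AI advancements.",
    },
    {
        "start_timestamp_millis": 30000,
        "end_timestamp_millis": 60000,
        "text": "Let's start by looking at the healthcare industry.",
    },
]

hits = list(find_mentions(segments, "healthcare"))
for start_ms, end_ms, text in hits:
    print(f"{start_ms}-{end_ms} ms: {text}")
```

A substring match is the simplest choice here; a real application might prefer word-boundary matching or a search index.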
Topic Summary
Audio topic summary separates the audio file into sections called topics and summarizes each one to provide key information. Each topic is given timestamps that locate it within the audio file as a whole. This feature is not enabled by default.
Content Moderation
Content moderation uses audio and text-based cues to identify and classify voice-based toxic content into seven different categories:
- Profanity: Speech that contains words, phrases, or acronyms that are impolite, vulgar, or offensive.
- Hate speech: Speech that criticizes, insults, denounces, or dehumanizes a person or group on the basis of an identity (such as race, ethnicity, gender, religion, sexual orientation, ability, and national origin).
- Sexual: Speech that indicates sexual interest, activity, or arousal using direct or indirect references to body parts, physical traits, or sex.
- Insults: Speech that includes demeaning, humiliating, mocking, insulting, or belittling language. This type of language is also labeled as bullying.
- Violence or threat: Speech that includes threats seeking to inflict pain, injury, or hostility toward a person or group.
- Graphic: Speech that uses visually descriptive and unpleasantly vivid imagery. This type of language is often intentionally verbose to amplify a recipient's discomfort.
- Harassment or abusive: Speech intended to affect the psychological well-being of the recipient, including demeaning and objectifying terms. This type of language is also labeled as harassment.
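One way an application might act on these classifications is to keep only the categories whose confidence reaches a chosen threshold. The sketch below assumes moderation results shaped like the `content_moderation` entries in the sample standard output on this page. The function name `flagged_categories` and the default threshold of 0.5 are illustrative assumptions, not BDA recommendations.

```python
def flagged_categories(moderation_results: list[dict], threshold: float = 0.5) -> dict[str, float]:
    """Return {category: highest confidence seen} for categories at or above threshold."""
    flagged: dict[str, float] = {}
    for result in moderation_results:
        for entry in result.get("moderation_categories", []):
            name, score = entry["category"], entry["confidence"]
            # Keep the highest score per category across all results.
            if score >= threshold and score > flagged.get(name, -1.0):
                flagged[name] = score
    return flagged


# Illustrative data only, copied from the shape of the sample output.
results = [
    {
        "id": "mod_12345",
        "type": "CONTENT_MODERATION",
        "confidence": 0.1,
        "start_timestamp_millis": 0,
        "end_timestamp_millis": 180000,
        "moderation_categories": [{"category": "profanity", "confidence": 0.05}],
    }
]

default_flags = flagged_categories(results)        # hypothetical default threshold
strict_flags = flagged_categories(results, 0.01)   # much stricter threshold
print(default_flags, strict_flags)
```

The right threshold depends on the application, so it is worth tuning against representative audio rather than relying on the placeholder value used here.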
Audio Standard Output
The following is an example of a standard output for an audio file processed through BDA:
{
  "metadata": {
    "id": "audio_123",
    "semantic_modality": "AUDIO",
    "s3_bucket": "my-audio-bucket",
    "s3_prefix": "audios/",
    "format": "MP3",
    "sample_rate": 44100,
    "bit_rate": 128000,
    "duration_millis": 180000,
    "channels": 2
  },
  "audio_segments": [
    {
      "start_timestamp_millis": 0,
      "end_timestamp_millis": 30000,
      "id": "audio_segment_1",
      "type": "TRANSCRIPT",
      "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance."
    },
    {
      "start_timestamp_millis": 30000,
      "end_timestamp_millis": 60000,
      "id": "audio_segment_2",
      "type": "TRANSCRIPT",
      "text": "Let's start by looking at the healthcare industry. AI is revolutionizing diagnostics, drug discovery, and personalized medicine."
    }
  ],
  "topics": [
    {
      "topic_index": 0,
      "start_timestamp_millis": 0,
      "end_timestamp_millis": 30000,
      "summary": "As follows: The opening of a podcast, introducing the topic of discussion, which involves how AI is impacting various industries.",
      "transcript": {
        "representation": {
          "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance."
        }
      }
    }
  ],
  "audio": {
    "summary": "A podcast discussion about recent advancements in artificial intelligence and their potential impact on various industries.",
    "transcript": {
      "representation": {
        "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance. Let's start by looking at the healthcare industry. AI is revolutionizing diagnostics, drug discovery, and personalized medicine."
      }
    },
    "content_moderation": [
      {
        "id": "mod_12345",
        "type": "CONTENT_MODERATION",
        "confidence": 0.1,
        "start_timestamp_millis": 0,
        "end_timestamp_millis": 180000,
        "moderation_categories": [
          {
            "category": "profanity",
            "confidence": 0.05
          }
        ]
      }
    ]
  },
  "statistics": {
    "word_count": 150,
    "segment_count": 6
  }
}
This output includes:
- Audio metadata
- Audio summarization
- Topic summarization
- Full transcript
- Content moderation results
- Statistics about the analyzed content
This example illustrates the comprehensive nature of the BDA output for audio, providing rich, structured data that can be easily integrated into various applications for further analysis or processing.
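As one illustration of downstream processing, the following Python sketch turns the topics in a parsed standard output into a readable, time-stamped outline. It assumes a top-level `topics` array whose entries carry `start_timestamp_millis`, `end_timestamp_millis`, and `summary`, as in the example above; if the actual output nests these fields differently, the lookups need adjusting. The helper names `format_millis` and `topic_outline` are hypothetical.

```python
import json


def format_millis(millis: int) -> str:
    """Format a millisecond offset as H:MM:SS."""
    hours, remainder = divmod(millis // 1000, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def topic_outline(output: dict) -> list[str]:
    """Return one '[start-end] summary' line per topic in the parsed output."""
    lines = []
    for topic in output.get("topics", []):
        start = format_millis(topic["start_timestamp_millis"])
        end = format_millis(topic["end_timestamp_millis"])
        lines.append(f"[{start}-{end}] {topic['summary']}")
    return lines


# Trimmed, illustrative stand-in for a standard output document.
raw = """
{
  "metadata": {"duration_millis": 180000},
  "topics": [
    {
      "topic_index": 0,
      "start_timestamp_millis": 0,
      "end_timestamp_millis": 30000,
      "summary": "The opening of a podcast."
    }
  ]
}
"""

output = json.loads(raw)
outline = topic_outline(output)
total_length = format_millis(output["metadata"]["duration_millis"])
print(f"Total length {total_length}")
print("\n".join(outline))
```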
BDA Audio Processing Restrictions
BDA supports audio clips in the file formats AMR, FLAC, M4A, MP3, Ogg, and WAV. The maximum file size of audio files is 2048 MB. The minimum audio sample rate is 8000 Hz, and the maximum sample rate is 48000 Hz. The maximum audio length is 240 minutes and the minimum length is 500 milliseconds. If an audio file has multiple audio streams, BDA processes only the first stream.
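These limits can be checked before a file is submitted. The sketch below is one possible pre-flight check in Python, not part of BDA. The `AudioInfo` class and `check_limits` function are hypothetical, the caller is assumed to have obtained the file's properties by other means, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from dataclasses import dataclass

# Limits taken from the restrictions listed above.
SUPPORTED_FORMATS = {"AMR", "FLAC", "M4A", "MP3", "OGG", "WAV"}
MAX_FILE_SIZE_BYTES = 2048 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_SAMPLE_RATE_HZ, MAX_SAMPLE_RATE_HZ = 8_000, 48_000
MIN_DURATION_MS = 500
MAX_DURATION_MS = 240 * 60 * 1000


@dataclass
class AudioInfo:
    """Hypothetical container for properties the caller has already measured."""
    file_format: str
    size_bytes: int
    sample_rate_hz: int
    duration_ms: int


def check_limits(info: AudioInfo) -> list[str]:
    """Return a list of human-readable problems; an empty list means no limit is violated."""
    problems = []
    if info.file_format.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {info.file_format}")
    if info.size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 2048 MB")
    if not MIN_SAMPLE_RATE_HZ <= info.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {info.sample_rate_hz} Hz is outside 8000-48000 Hz")
    if not MIN_DURATION_MS <= info.duration_ms <= MAX_DURATION_MS:
        problems.append(f"duration {info.duration_ms} ms is outside 500 ms to 240 minutes")
    return problems


ok_file = AudioInfo("mp3", 5_000_000, 44_100, 180_000)
bad_file = AudioInfo("aac", 5_000_000, 96_000, 200)

print(check_limits(ok_file))
print(check_limits(bad_file))
```

Returning every violated limit at once, rather than stopping at the first, lets a caller report all problems with a file in a single pass.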