Audio

The Amazon Bedrock Data Automation (BDA) feature offers a set of standard outputs that process audio files and generate insights from them. Here's a detailed look at each operation type:

Full Audio Summary

Full audio summary generates an overall summary of the entire audio file. It distills the key themes, events, and information presented throughout the audio into a concise summary.

Full Audio Transcript

The full audio transcript feature provides a complete text representation of all spoken content in the audio. It uses speech recognition to transcribe dialogue, narration, and other audio elements. The transcript includes timestamps, so you can navigate and search the audio content based on spoken words.
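
As a rough illustration of searching a timestamped transcript, the following Python sketch looks for a keyword in transcript segments. It assumes the result has already been loaded into a dictionary and that segments appear under `audio_segments` with the field names used in the standard output example in this section. The `find_segments` helper and the inline sample data are hypothetical and are not part of BDA.

```python
def find_segments(result: dict, keyword: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for transcript segments that mention keyword."""
    needle = keyword.lower()
    hits = []
    for segment in result.get("audio_segments", []):
        # Only transcript segments carry spoken text in the sample output.
        if segment.get("type") != "TRANSCRIPT":
            continue
        text = segment.get("text", "")
        if needle in text.lower():
            hits.append(
                (segment["start_timestamp_millis"], segment["end_timestamp_millis"], text)
            )
    return hits


# Hypothetical sample, trimmed from the standard output example in this section.
sample = {
    "audio_segments": [
        {
            "start_timestamp_millis": 0,
            "end_timestamp_millis": 30000,
            "id": "audio_segment_1",
            "type": "TRANSCRIPT",
            "text": "Welcome to our podcast on AI advancements.",
        },
        {
            "start_timestamp_millis": 30000,
            "end_timestamp_millis": 60000,
            "id": "audio_segment_2",
            "type": "TRANSCRIPT",
            "text": "Let's start by looking at the healthcare industry.",
        },
    ]
}

for start_ms, end_ms, text in find_segments(sample, "healthcare"):
    print(f"{start_ms / 1000:.0f}s to {end_ms / 1000:.0f}s: {text}")
```

The match is a plain case-insensitive substring test, which is enough for a sketch; a real application might tokenize the text or use a search index instead.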

Topic Summary

Audio topic summary separates the audio file into sections called topics and summarizes each one to provide key information. Each topic is given timestamps that locate it within the audio file as a whole. This feature is not enabled by default.
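
One way to use topic timestamps is to build a short outline of the audio. The sketch below assumes a `topics` list shaped like the one in the standard output example in this section. The `format_millis` and `topic_outline` helpers and the sample summary text are hypothetical.

```python
def format_millis(millis: int) -> str:
    """Format a millisecond offset as MM:SS (minutes are not capped at 59)."""
    minutes, seconds = divmod(millis // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def topic_outline(result: dict) -> list[str]:
    """Return one outline line per topic, ordered by start time."""
    topics = sorted(result.get("topics", []), key=lambda t: t["start_timestamp_millis"])
    return [
        f"[{format_millis(t['start_timestamp_millis'])}-"
        f"{format_millis(t['end_timestamp_millis'])}] {t['summary']}"
        for t in topics
    ]


# Hypothetical sample with a single topic.
sample = {
    "topics": [
        {
            "topic_index": 0,
            "start_timestamp_millis": 0,
            "end_timestamp_millis": 30000,
            "summary": "Podcast opening that introduces the discussion topic.",
        }
    ]
}

for line in topic_outline(sample):
    print(line)
```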

Content Moderation

Content moderation uses audio and text-based cues to identify and classify voice-based toxic content into seven different categories:

Profanity: Speech that contains words, phrases, or acronyms that are impolite, vulgar, or offensive.
Hate speech: Speech that criticizes, insults, denounces, or dehumanizes a person or group on the basis of an identity (such as race, ethnicity, gender, religion, sexual orientation, ability, and national origin).
Sexual: Speech that indicates sexual interest, activity, or arousal using direct or indirect references to body parts, physical traits, or sex.
Insults: Speech that includes demeaning, humiliating, mocking, insulting, or belittling language. This type of language is also labeled as bullying.
Violence or threat: Speech that includes threats seeking to inflict pain, injury, or hostility toward a person or group.
Graphic: Speech that uses visually descriptive and unpleasantly vivid imagery. This type of language is often intentionally verbose to amplify a recipient's discomfort.
Harassment or abusive: Speech intended to affect the psychological well-being of the recipient, including demeaning and objectifying terms. This type of language is also labeled as harassment.
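
Applications typically act on moderation results only above some confidence level. The following sketch filters categories by a threshold, assuming the results sit under `audio.content_moderation` with the field names used in the standard output example in this section. The `flagged_categories` helper and the threshold values are hypothetical; BDA does not prescribe a cutoff here.

```python
def flagged_categories(result: dict, threshold: float = 0.5) -> list[tuple[str, float, int, int]]:
    """Return (category, confidence, start_ms, end_ms) for categories at or above threshold."""
    flagged = []
    for item in result.get("audio", {}).get("content_moderation", []):
        for category in item.get("moderation_categories", []):
            if category["confidence"] >= threshold:
                flagged.append(
                    (
                        category["category"],
                        category["confidence"],
                        item["start_timestamp_millis"],
                        item["end_timestamp_millis"],
                    )
                )
    return flagged


# Hypothetical sample, trimmed from the standard output example in this section.
sample = {
    "audio": {
        "content_moderation": [
            {
                "id": "mod_12345",
                "type": "CONTENT_MODERATION",
                "confidence": 0.1,
                "start_timestamp_millis": 0,
                "end_timestamp_millis": 180000,
                "moderation_categories": [
                    {"category": "profanity", "confidence": 0.05}
                ],
            }
        ]
    }
}

print(flagged_categories(sample))                  # default threshold (assumed value)
print(flagged_categories(sample, threshold=0.01))  # a much stricter cutoff
```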

Audio Standard Output

The following is an example of a standard output for an audio file processed through BDA:


{
    "metadata": {
        "id": "audio_123",
        "semantic_modality": "AUDIO",
        "s3_bucket": "my-audio-bucket",
        "s3_prefix": "audios/",
        "format": "MP3",
        "sample_rate": 44100,
        "bit_rate": 128000,
        "duration_millis": 180000,
        "channels": 2
    },
    "audio_segments": [
        {
            "start_timestamp_millis": 0,
            "end_timestamp_millis": 30000,
            "id": "audio_segment_1",
            "type": "TRANSCRIPT",
            "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance."
        },
        {
            "start_timestamp_millis": 30000,
            "end_timestamp_millis": 60000,
            "id": "audio_segment_2",
            "type": "TRANSCRIPT",
            "text": "Let's start by looking at the healthcare industry. AI is revolutionizing diagnostics, drug discovery, and personalized medicine."
        }
    ],
    "topics": [
        {
            "topic_index": 0,
            "start_timestamp_millis": 0,
            "end_timestamp_millis": 30000,
            "summary": "As follows: The opening of a podcast, introducing the topic of discussion, which involves how AI is impacting various industries.",
            "transcript": {
                "representation": {
                    "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance."
                }
            }
        }
    ],
    "audio": {
        "summary": "A podcast discussion about recent advancements in artificial intelligence and their potential impact on various industries.",
        "transcript": {
            "representation": {
                "text": "Welcome to our podcast on AI advancements. Today, we'll be discussing how recent developments in artificial intelligence are reshaping industries from healthcare to finance. Let's start by looking at the healthcare industry. AI is revolutionizing diagnostics, drug discovery, and personalized medicine."
            }
        },
        "content_moderation": [
            {
                "id": "mod_12345",
                "type": "CONTENT_MODERATION",
                "confidence": 0.1,
                "start_timestamp_millis": 0,
                "end_timestamp_millis": 180000,
                "moderation_categories": [
                    {
                        "category": "profanity",
                        "confidence": 0.05
                    }
                ]
            }
        ]
    },
    "statistics": {
        "word_count": 150,
        "segment_count": 6
    }
}

This output includes:

Audio metadata
Audio summarization
Topic summarization
Full transcript
Content moderation results
Statistics about the analyzed content

This example shows the structured data that BDA returns for audio, which applications can consume for further analysis or processing.
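
As a minimal sketch of consuming this output, the following Python code parses a result and pulls out a few headline fields. It assumes the result is available as a well-formed JSON string and uses the key names from the example in this section. The `overview` helper and the trimmed `raw` sample are hypothetical.

```python
import json


def overview(raw: str) -> dict:
    """Extract a few headline fields from a BDA audio result given as JSON text."""
    result = json.loads(raw)
    metadata = result.get("metadata", {})
    audio = result.get("audio", {})
    return {
        "format": metadata.get("format"),
        "duration_seconds": metadata.get("duration_millis", 0) / 1000,
        "summary": audio.get("summary"),
        "segments_returned": len(result.get("audio_segments", [])),
        "word_count": result.get("statistics", {}).get("word_count"),
    }


# Hypothetical, heavily trimmed version of the example output.
raw = """
{
    "metadata": {"id": "audio_123", "format": "MP3", "duration_millis": 180000},
    "audio_segments": [
        {"id": "audio_segment_1", "type": "TRANSCRIPT", "text": "Welcome to our podcast."}
    ],
    "audio": {"summary": "A podcast discussion about artificial intelligence."},
    "statistics": {"word_count": 150, "segment_count": 6}
}
"""

for key, value in overview(raw).items():
    print(f"{key}: {value}")
```

Using `.get` with defaults keeps the sketch tolerant of outputs where an optional feature, such as topic summary, was not enabled.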

BDA Audio Processing Restrictions

BDA supports audio clips in the file formats AMR, FLAC, M4A, MP3, Ogg, and WAV. The maximum size of an audio file is 2048 MB. The minimum audio sample rate is 8000 Hz, and the maximum sample rate is 48000 Hz. The maximum audio length is 240 minutes, and the minimum length is 500 milliseconds. If an audio file has multiple audio streams, BDA processes only the first stream.
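
These limits can be checked before a file is submitted. The sketch below encodes the restrictions listed above as constants and reports any violations. The `check_audio_limits` helper is hypothetical, the caller is expected to supply the file's properties, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
SUPPORTED_FORMATS = {"AMR", "FLAC", "M4A", "MP3", "OGG", "WAV"}
MAX_FILE_BYTES = 2048 * 1024 * 1024      # 2048 MB, assuming 1 MB = 1024 * 1024 bytes
MIN_SAMPLE_RATE_HZ = 8000
MAX_SAMPLE_RATE_HZ = 48000
MIN_DURATION_MILLIS = 500
MAX_DURATION_MILLIS = 240 * 60 * 1000    # 240 minutes


def check_audio_limits(fmt: str, size_bytes: int, sample_rate_hz: int, duration_millis: int) -> list[str]:
    """Return a list of human-readable violations; an empty list means all checks passed."""
    problems = []
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 2048 MB")
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append("sample rate is outside 8000-48000 Hz")
    if not MIN_DURATION_MILLIS <= duration_millis <= MAX_DURATION_MILLIS:
        problems.append("duration is outside 500 ms to 240 minutes")
    return problems


# Values taken from the metadata in the example output, plus an assumed file size.
print(check_audio_limits("MP3", 5_000_000, 44100, 180000))
# A file that breaks every limit.
print(check_audio_limits("aac", 3 * 1024**3, 4000, 100))
```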
