Example PII redaction and identification output - Amazon Transcribe

The following examples show redacted output from batch and streaming jobs, and PII identification from a streaming job.

Transcription jobs that use content redaction generate two types of confidence values. The Automatic Speech Recognition (ASR) confidence indicates how confident Amazon Transcribe is that an item of type pronunciation or punctuation is a specific utterance. In the following transcript output, the word 'Good' has a confidence of 1.0, which means that Amazon Transcribe is 100 percent confident that the word uttered is 'Good'. The confidence value for a [PII] tag is the confidence that the speech flagged for redaction is truly PII. In the following transcript output, the confidence of 0.9999 means that Amazon Transcribe is 99.99 percent confident that the entity it redacted is PII.

Example redacted output (batch)

{
    "jobName": "my-first-transcription-job",
    "accountId": "111122223333",
    "isRedacted": true,
    "results": {
        "transcripts": [
            {
                "transcript": "Good morning, everybody. My name is [PII], and today I feel like sharing a whole lot of personal information with you. Let's start with my Social Security number [PII]. My credit card number is [PII] and my C V V code is [PII]. I hope that Amazon Transcribe is doing a good job at redacting that personal information away. Let's check."
            }
        ],
        "items": [
            {
                "id": 0,
                "start_time": "2.86",
                "end_time": "3.35",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "Good"
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
            {
                "id": 8,
                "start_time": "5.56",
                "end_time": "6.25",
                "alternatives": [
                    {
                        "content": "[PII]",
                        "redactions": [
                            {
                                "confidence": "0.9999",
                                "type": "NAME",
                                "category": "PII"
                            }
                        ]
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
        ]
    },
    "status": "COMPLETED"
}

Here's the unredacted transcript for comparison:

{
    "jobName": "job id",
    "accountId": "111122223333",
    "isRedacted": false,
    "results": {
        "transcripts": [
            {
                "transcript": "Good morning, everybody. My name is Mike, and today I feel like sharing a whole lot of personal information with you. Let's start with my Social Security number 000000000. My credit card number is 5555555555555555 and my C V V code is 000. I hope that Amazon Transcribe is doing a good job at redacting that personal information away. Let's check."
            }
        ],
        "items": [
            {
                "id": 0,
                "start_time": "2.86",
                "end_time": "3.35",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "Good"
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
            {
                "id": 8,
                "start_time": "5.56",
                "end_time": "6.25",
                "alternatives": [
                    {
                        "confidence": "0.9999",
                        "content": "Mike"
                    }
                ],
                "type": "pronunciation"
            },
            Items removed for brevity
        ]
    },
    "status": "COMPLETED"
}
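
The two confidence values described above sit in different places in a batch result: the ASR confidence is a field of the alternative itself, while the PII confidence is nested inside the alternative's redactions list. The following Python sketch shows one way to separate them. It assumes the transcript has already been downloaded and that its items are shaped like the examples above; the SAMPLE data and the split_confidences helper are hypothetical names used for illustration, not part of any AWS SDK.

```python
import json

# Hypothetical, trimmed sample shaped like the redacted batch output above.
SAMPLE = """
{
  "results": {
    "items": [
      {"id": 0, "start_time": "2.86", "end_time": "3.35",
       "alternatives": [{"confidence": "1.0", "content": "Good"}],
       "type": "pronunciation"},
      {"id": 8, "start_time": "5.56", "end_time": "6.25",
       "alternatives": [{"content": "[PII]",
                         "redactions": [{"confidence": "0.9999",
                                         "type": "NAME",
                                         "category": "PII"}]}],
       "type": "pronunciation"}
    ]
  }
}
"""


def split_confidences(transcript):
    """Return (asr, redactions) lists from a batch transcript dict.

    asr        -> (content, ASR confidence) for ordinary items
    redactions -> (start_time, PII type, PII confidence) for redacted items
    """
    asr, redactions = [], []
    for item in transcript["results"]["items"]:
        for alt in item.get("alternatives", []):
            if "redactions" in alt:
                # Redacted item: the confidence lives on each redaction entry.
                for red in alt["redactions"]:
                    redactions.append(
                        (item["start_time"], red["type"], float(red["confidence"]))
                    )
            elif "confidence" in alt:
                # Ordinary item: the confidence is the ASR confidence.
                asr.append((alt["content"], float(alt["confidence"])))
    return asr, redactions


asr, redactions = split_confidences(json.loads(SAMPLE))
print(asr)
print(redactions)
```

Confidence values in batch output are strings, so the sketch converts them to floats before returning them.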

Example redacted streaming output

{
    "TranscriptResultStream": {
        "TranscriptEvent": {
            "Transcript": {
                "Results": [
                    {
                        "Alternatives": [
                            {
                                "Transcript": "my name is [NAME]",
                                "Items": [
                                    {
                                        "Content": "my",
                                        "EndTime": 0.3799375,
                                        "StartTime": 0.0299375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "name",
                                        "EndTime": 0.5899375,
                                        "StartTime": 0.3899375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "is",
                                        "EndTime": 0.7899375,
                                        "StartTime": 0.5999375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "[NAME]",
                                        "EndTime": 1.0199375,
                                        "StartTime": 0.7999375,
                                        "Type": "pronunciation"
                                    }
                                ],
                                "Entities": [
                                    {
                                        "Content": "[NAME]",
                                        "Category": "PII",
                                        "Type": "NAME",
                                        "StartTime": 0.7999375,
                                        "EndTime": 1.0199375,
                                        "Confidence": 0.9989
                                    }
                                ]
                            }
                        ],
                        "EndTime": 1.02,
                        "IsPartial": false,
                        "ResultId": "12345a67-8bc9-0de1-2f34-a5b678c90d12",
                        "StartTime": 0.0199375
                    }
                ]
            }
        }
    }
}

Example PII identification output

PII identification is an additional feature that you can use with your streaming transcription job. The identified PII is listed in each segment's Entities section.

{
    "TranscriptResultStream": {
        "TranscriptEvent": {
            "Transcript": {
                "Results": [
                    {
                        "Alternatives": [
                            {
                                "Transcript": "my name is mike",
                                "Items": [
                                    {
                                        "Content": "my",
                                        "EndTime": 0.3799375,
                                        "StartTime": 0.0299375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "name",
                                        "EndTime": 0.5899375,
                                        "StartTime": 0.3899375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "is",
                                        "EndTime": 0.7899375,
                                        "StartTime": 0.5999375,
                                        "Type": "pronunciation"
                                    },
                                    {
                                        "Content": "mike",
                                        "EndTime": 0.9199375,
                                        "StartTime": 0.7999375,
                                        "Type": "pronunciation"
                                    }
                                ],
                                "Entities": [
                                    {
                                        "Content": "mike",
                                        "Category": "PII",
                                        "Type": "NAME",
                                        "StartTime": 0.7999375,
                                        "EndTime": 1.0199375,
                                        "Confidence": 0.9989
                                    }
                                ]
                            }
                        ],
                        "EndTime": 1.02,
                        "IsPartial": false,
                        "ResultId": "12345a67-8bc9-0de1-2f34-a5b678c90d12",
                        "StartTime": 0.0199375
                    }
                ]
            }
        }
    }
}
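
Because identified PII is listed in each segment's Entities section, a consumer can collect it by walking Results, then Alternatives, then Entities. The following Python sketch does that for a single event. It assumes the event has been parsed into a dict shaped exactly like the example above (events delivered by an SDK may be wrapped differently); the EVENT data, the pii_entities helper, and its min_confidence parameter are hypothetical, and skipping partial results is a choice made here for illustration rather than documented behavior.

```python
import json

# Hypothetical, trimmed event shaped like the PII identification output above.
EVENT = json.loads("""
{
  "TranscriptResultStream": {
    "TranscriptEvent": {
      "Transcript": {
        "Results": [
          {
            "Alternatives": [
              {
                "Transcript": "my name is mike",
                "Entities": [
                  {"Content": "mike", "Category": "PII", "Type": "NAME",
                   "StartTime": 0.7999375, "EndTime": 1.0199375,
                   "Confidence": 0.9989}
                ]
              }
            ],
            "IsPartial": false
          }
        ]
      }
    }
  }
}
""")


def pii_entities(event, min_confidence=0.0):
    """Collect (type, content, start, end) for entities in one event dict."""
    transcript = event["TranscriptResultStream"]["TranscriptEvent"]["Transcript"]
    found = []
    for result in transcript["Results"]:
        if result.get("IsPartial"):
            # Assumption: wait for the final version of a segment.
            continue
        for alt in result.get("Alternatives", []):
            for entity in alt.get("Entities", []):
                # Streaming confidences are numbers, not strings.
                if entity["Confidence"] >= min_confidence:
                    found.append(
                        (entity["Type"], entity["Content"],
                         entity["StartTime"], entity["EndTime"])
                    )
    return found


entities = pii_entities(EVENT, min_confidence=0.9)
print(entities)
```

Raising min_confidence trades recall for precision: a stricter threshold reports fewer entities, and each one is more likely to be genuine PII.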
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.