Using a custom vocabulary filter - Amazon Transcribe

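Once you have created a custom vocabulary filter, you can include it in your transcription requests; see the following sections for examples.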

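The language of the custom vocabulary filter that you include in your request must match the language code that you specify for your media. If you use language identification and specify multiple language options, you can include one custom vocabulary filter per specified language. If the language of a custom vocabulary filter does not match the language identified in your audio, the filter is not applied to your transcription, and no warning or error is raised.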
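
Because a language mismatch is silent, it can help to check the filter's language before you submit a request. The following is a minimal sketch, not an official recipe: the helper name `filter_language_matches` is hypothetical, and it assumes the Boto3 `get_vocabulary_filter` call, whose response includes a `LanguageCode` field.

```python
def filter_language_matches(client, filter_name, language_code):
    """Return True if the named vocabulary filter uses `language_code`.

    `client` is expected to be a Boto3 Amazon Transcribe client (or any
    object that offers the same get_vocabulary_filter method).
    """
    response = client.get_vocabulary_filter(VocabularyFilterName=filter_name)
    return response["LanguageCode"] == language_code


# Example use (requires AWS credentials and an existing filter):
#   import boto3
#   transcribe = boto3.client("transcribe", "us-west-2")
#   if not filter_language_matches(transcribe, "my-first-vocabulary-filter", "en-US"):
#       print("Language mismatch: the filter would be ignored without warning.")
```

Passing the client in as an argument keeps the check independent of how you configure credentials and Regions.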

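Using a custom vocabulary filter in a batch transcription

To use a custom vocabulary filter with a batch transcription, see the following examples: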

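AWS Management Console

1. Sign in to the AWS Management Console.
2. In the navigation pane, choose Transcription jobs, then choose Create job (top right). This opens the Specify job details page.
3. Name your job and specify your input media. Optionally fill in any other fields, then choose Next.
4. On the Configure job page, in the Content removal panel, turn on Vocabulary filtering.
5. Select your custom vocabulary filter from the dropdown menu, then specify the filtration method.
6. Choose Create job to run your transcription job.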
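
The filtration method corresponds to the `VocabularyFilterMethod` values `mask`, `remove`, and `tag`. Amazon Transcribe applies the filter on the service side; the sketch below is only a toy local model of what each method does to a transcript, written under stated assumptions: `mask` replaces a matched word with three asterisks, `remove` deletes it, and `tag` keeps the word but flags it as a match. The function `apply_filter` and the case-insensitive exact-word matching are illustrative assumptions, not the service's implementation.

```python
def apply_filter(words, filtered_words, method):
    """Mimic mask/remove/tag on a list of transcript words (toy model only)."""
    blocked = {w.lower() for w in filtered_words}
    result = []
    for word in words:
        hit = word.lower() in blocked
        if method == "mask":
            # Matched words are replaced by three asterisks.
            result.append("***" if hit else word)
        elif method == "remove":
            # Matched words are dropped from the output.
            if not hit:
                result.append(word)
        elif method == "tag":
            # Words are left unchanged and paired with a match flag.
            result.append((word, hit))
        else:
            raise ValueError(f"unknown filtration method: {method!r}")
    return result


words = ["that", "darn", "printer"]
print(apply_filter(words, ["darn"], "mask"))    # ['that', '***', 'printer']
print(apply_filter(words, ["darn"], "remove"))  # ['that', 'printer']
print(apply_filter(words, ["darn"], "tag"))     # [('that', False), ('darn', True), ('printer', False)]
```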

AWS CLI

This example uses the start-transcription-job command and the Settings parameter with the VocabularyFilterName and VocabularyFilterMethod sub-parameters. For more information, see StartTranscriptionJob and Settings.


aws transcribe start-transcription-job \
--region us-west-2 \
--transcription-job-name my-first-transcription-job \
--media MediaFileUri=s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac \
--output-bucket-name amzn-s3-demo-bucket \
--output-key my-output-files/ \
--language-code en-US \
--settings VocabularyFilterName=my-first-vocabulary-filter,VocabularyFilterMethod=mask

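Here is another example that uses the start-transcription-job command, this time with a request body that includes your custom vocabulary filter for the job.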


aws transcribe start-transcription-job \
--region us-west-2 \
--cli-input-json file://my-first-vocabulary-filter-job.json

The file my-first-vocabulary-filter-job.json contains the following request body.


{
  "TranscriptionJobName": "my-first-transcription-job",
  "Media": {
    "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"
  },
  "OutputBucketName": "amzn-s3-demo-bucket",
  "OutputKey": "my-output-files/",
  "LanguageCode": "en-US",
  "Settings": {
    "VocabularyFilterName": "my-first-vocabulary-filter",
    "VocabularyFilterMethod": "mask"
  }
}
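
If you prefer to generate the request body rather than write it by hand, a short script can build the same structure and save it for use with --cli-input-json. This is a minimal sketch: `build_request` is a hypothetical helper, and the up-front check assumes that `remove`, `mask`, and `tag` are the accepted values of VocabularyFilterMethod.

```python
import json
from pathlib import Path

# Assumed set of accepted VocabularyFilterMethod values.
VALID_METHODS = ("remove", "mask", "tag")


def build_request(job_name, media_uri, bucket, key, language_code,
                  filter_name, method="mask"):
    """Build a StartTranscriptionJob request body with a vocabulary filter."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": bucket,
        "OutputKey": key,
        "LanguageCode": language_code,
        "Settings": {
            "VocabularyFilterName": filter_name,
            "VocabularyFilterMethod": method,
        },
    }


request = build_request(
    "my-first-transcription-job",
    "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac",
    "amzn-s3-demo-bucket",
    "my-output-files/",
    "en-US",
    "my-first-vocabulary-filter",
)

# Write the file that the CLI command reads through --cli-input-json.
Path("my-first-vocabulary-filter-job.json").write_text(json.dumps(request, indent=2))
```

Validating the method locally catches a typo before the request reaches the service.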

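AWS SDK for Python (Boto3)

This example uses the AWS SDK for Python (Boto3) to include a custom vocabulary filter through the Settings argument of the start_transcription_job method. For more information, see StartTranscriptionJob and Settings.

For additional examples that use the AWS SDKs, including feature-specific, scenario, and cross-service examples, see the Code examples for Amazon Transcribe using AWS SDKs chapter.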


import time
import boto3

# Create an Amazon Transcribe client for the us-west-2 Region.
transcribe = boto3.client('transcribe', 'us-west-2')
job_name = "my-first-transcription-job"
job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"

# Start the job; Settings names the custom vocabulary filter and the method.
transcribe.start_transcription_job(
    TranscriptionJobName = job_name,
    Media = {
        'MediaFileUri': job_uri
    },
    OutputBucketName = 'amzn-s3-demo-bucket',
    OutputKey = 'my-output-files/',
    LanguageCode = 'en-US',
    Settings = {
        'VocabularyFilterName': 'my-first-vocabulary-filter',
        'VocabularyFilterMethod': 'mask'
    }
)

# Poll every 5 seconds until the job completes or fails.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName = job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
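
The polling loop above runs until the job reaches a terminal state, however long that takes. A variant with an upper time limit might look like the following. This is a sketch under assumptions: `wait_for_job` and its parameters are hypothetical names, and the one-hour default timeout is arbitrary.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll get_transcription_job until the job completes, fails, or times out.

    Returns the TranscriptionJob dictionary from the final response.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} did not finish within {timeout_seconds} seconds")
        sleep(poll_seconds)


# Example use (requires AWS credentials and a running job):
#   import boto3
#   transcribe = boto3.client("transcribe", "us-west-2")
#   job = wait_for_job(transcribe, "my-first-transcription-job")
#   print(job["TranscriptionJobStatus"])
```

The `sleep` parameter exists only so that the wait can be replaced during testing.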

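Using a custom vocabulary filter in a streaming transcription

To use a custom vocabulary filter with a streaming transcription, see the following examples: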

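AWS Management Console

1. Sign in to the AWS Management Console.
2. In the navigation pane, choose Real-time transcription. Scroll down to Content removal settings and expand this field if it is minimized.
3. Turn on Vocabulary filtering. Select a custom vocabulary filter from the dropdown menu, then specify the filtration method.
4. Add any other settings that you want to apply to your stream.
5. You are now ready to transcribe your stream. Choose Start streaming and begin speaking. To end your dictation, choose Stop streaming.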

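HTTP/2 stream

This example creates an HTTP/2 request that includes your custom vocabulary filter and filter method. For more information on using HTTP/2 streaming with Amazon Transcribe, see Setting up an HTTP/2 stream. For more detail on the parameters and headers specific to Amazon Transcribe, see StartStreamTranscription.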


POST /stream-transcription HTTP/2
host: transcribestreaming.us-west-2.amazonaws.com
X-Amz-Target: com.amazonaws.transcribe.Transcribe.StartStreamTranscription
Content-Type: application/vnd.amazon.eventstream
X-Amz-Content-Sha256: string
X-Amz-Date: 20220208T235959Z
Authorization: AWS4-HMAC-SHA256 Credential=access-key/20220208/us-west-2/transcribe/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-target;x-amz-security-token, Signature=string
x-amzn-transcribe-language-code: en-US
x-amzn-transcribe-media-encoding: flac
x-amzn-transcribe-sample-rate: 16000      
x-amzn-transcribe-vocabulary-filter-name: my-first-vocabulary-filter
x-amzn-transcribe-vocabulary-filter-method: mask
transfer-encoding: chunked

Parameter definitions can be found in the API Reference; parameters common to all AWS API operations are listed in the Common Parameters section.
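
The Amazon Transcribe headers in the request above can be assembled programmatically. The following sketch builds only the x-amzn-transcribe-* headers as a dictionary; it does not sign the request or open a stream. The function name `transcribe_stream_headers` is hypothetical, and the set of accepted filter methods is an assumption.

```python
# Assumed set of accepted vocabulary filter methods.
VALID_METHODS = ("remove", "mask", "tag")


def transcribe_stream_headers(language_code, media_encoding, sample_rate,
                              filter_name, filter_method="mask"):
    """Return the x-amzn-transcribe-* headers for a filtered stream."""
    if filter_method not in VALID_METHODS:
        raise ValueError(f"filter_method must be one of {VALID_METHODS}")
    return {
        "x-amzn-transcribe-language-code": language_code,
        "x-amzn-transcribe-media-encoding": media_encoding,
        # Header values are strings, so the numeric rate is converted.
        "x-amzn-transcribe-sample-rate": str(sample_rate),
        "x-amzn-transcribe-vocabulary-filter-name": filter_name,
        "x-amzn-transcribe-vocabulary-filter-method": filter_method,
    }


headers = transcribe_stream_headers(
    "en-US", "flac", 16000, "my-first-vocabulary-filter", "mask"
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The authentication headers (X-Amz-Date, Authorization, and so on) still have to be produced by a Signature Version 4 signer.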

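WebSocket stream

This example creates a presigned URL that applies your custom vocabulary filter to a WebSocket stream. Line breaks have been added for readability. For more information on using WebSocket streams with Amazon Transcribe, see Setting up a WebSocket stream. For more detail on the parameters, see StartStreamTranscription.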


GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/stream-transcription-websocket?
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=string
&X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-date
&language-code=en-US
&media-encoding=flac
&sample-rate=16000    
&vocabulary-filter-name=my-first-vocabulary-filter
&vocabulary-filter-method=mask

Parameter definitions can be found in the API Reference; parameters common to all AWS API operations are listed in the Common Parameters section.
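
The transcription-specific part of the query string above can be produced with the Python standard library. This sketch covers only the unsigned parameters; the X-Amz-* values come from Signature Version 4 presigning, which is deliberately left out. The function name `transcription_query` is hypothetical.

```python
from urllib.parse import urlencode


def transcription_query(language_code, media_encoding, sample_rate,
                        filter_name, filter_method):
    """Return the URL-encoded, unsigned transcription query parameters."""
    params = {
        "language-code": language_code,
        "media-encoding": media_encoding,
        "sample-rate": str(sample_rate),
        "vocabulary-filter-name": filter_name,
        "vocabulary-filter-method": filter_method,
    }
    return urlencode(params)


query = transcription_query(
    "en-US", "flac", 16000, "my-first-vocabulary-filter", "mask"
)
print(query)
# language-code=en-US&media-encoding=flac&sample-rate=16000&vocabulary-filter-name=my-first-vocabulary-filter&vocabulary-filter-method=mask
```

Using `urlencode` keeps the values correctly escaped if a filter name ever contains characters that are not URL-safe.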
