Amazon Transcribe examples using SDK for Python (Boto3) - AWS SDK Code Examples

There are more AWS SDK examples available in the AWS Doc SDK Examples GitHub repo.


The following code examples show how to perform actions and implement common scenarios by using the AWS SDK for Python (Boto3) with Amazon Transcribe.


Scenarios are code examples that show how to accomplish a specific task by calling multiple functions within one service or in combination with other AWS services.



The following code example shows how to use CreateVocabulary.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def create_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Creates a custom vocabulary that can be used to improve the accuracy of
    transcription jobs. This function returns as soon as the vocabulary processing
    is started. Call get_vocabulary to get the current status of the vocabulary.
    The vocabulary is ready to use when its status is 'READY'.

    :param vocabulary_name: The name of the custom vocabulary.
    :param language_code: The language code of the vocabulary.
                          For example, en-US or nl-NL.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    :return: Information about the newly created vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        # Phrases take precedence when both phrases and table_uri are supplied.
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.create_vocabulary(**vocab_args)
        logger.info("Created custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't create custom vocabulary %s.", vocabulary_name)
        raise
    else:
        return response

  • For API details, see CreateVocabulary in AWS SDK for Python (Boto3) API Reference.
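
The create_vocabulary wrapper silently prefers phrases when both phrases and table_uri are passed, and sends neither when both are omitted. As a minimal sketch, assuming the service expects exactly one of the two sources per request, the arguments could be validated before the call is made. The helper name build_vocabulary_args is hypothetical and is not part of the AWS example.

```python
def build_vocabulary_args(vocabulary_name, language_code, phrases=None, table_uri=None):
    """Build keyword arguments for create_vocabulary (hypothetical helper).

    Assumption: exactly one of phrases or table_uri should be supplied.
    """
    if (phrases is None) == (table_uri is None):
        raise ValueError("Provide exactly one of phrases or table_uri.")
    args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
    if phrases is not None:
        # Copy the caller's sequence so later mutation does not affect the request.
        args["Phrases"] = list(phrases)
    else:
        args["VocabularyFileUri"] = table_uri
    return args


args = build_vocabulary_args("Jabber-vocabulary", "en-US", phrases=["brillig", "slithy"])
print(args)
```

The resulting dictionary could then be passed as transcribe_client.create_vocabulary(**args), so that a mistaken call fails locally instead of reaching the service.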

The following code example shows how to use DeleteTranscriptionJob.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def delete_job(job_name, transcribe_client):
    """
    Deletes a transcription job. This also deletes the transcript associated with
    the job.

    :param job_name: The name of the job to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_transcription_job(TranscriptionJobName=job_name)
        logger.info("Deleted job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't delete job %s.", job_name)
        raise

  • For API details, see DeleteTranscriptionJob in AWS SDK for Python (Boto3) API Reference.

The following code example shows how to use DeleteVocabulary.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def delete_vocabulary(vocabulary_name, transcribe_client):
    """
    Deletes a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Deleted vocabulary %s.", vocabulary_name)
    except ClientError:
        logger.exception("Couldn't delete vocabulary %s.", vocabulary_name)
        raise

  • For API details, see DeleteVocabulary in AWS SDK for Python (Boto3) API Reference.

The following code example shows how to use GetTranscriptionJob.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def get_job(job_name, transcribe_client):
    """
    Gets details about a transcription job.

    :param job_name: The name of the job to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The retrieved transcription job.
    """
    try:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        logger.info("Got job %s.", job["TranscriptionJobName"])
    except ClientError:
        logger.exception("Couldn't get job %s.", job_name)
        raise
    else:
        return job

  • For API details, see GetTranscriptionJob in AWS SDK for Python (Boto3) API Reference.

The following code example shows how to use GetVocabulary.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def get_vocabulary(vocabulary_name, transcribe_client):
    """
    Gets information about a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: Information about the vocabulary.
    """
    try:
        response = transcribe_client.get_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Got vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't get vocabulary %s.", vocabulary_name)
        raise
    else:
        return response

  • For API details, see GetVocabulary in AWS SDK for Python (Boto3) API Reference.

The following code example shows how to use ListTranscriptionJobs.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def list_jobs(job_filter, transcribe_client):
    """
    Lists summaries of the transcription jobs for the current AWS account.

    :param job_filter: The list of returned jobs must contain this string in their
                       names.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The list of retrieved transcription job summaries.
    """
    try:
        response = transcribe_client.list_transcription_jobs(JobNameContains=job_filter)
        jobs = response["TranscriptionJobSummaries"]
        next_token = response.get("NextToken")
        # Keep requesting pages until the service stops returning a NextToken.
        while next_token is not None:
            response = transcribe_client.list_transcription_jobs(
                JobNameContains=job_filter, NextToken=next_token
            )
            jobs += response["TranscriptionJobSummaries"]
            next_token = response.get("NextToken")
        logger.info("Got %s jobs with filter %s.", len(jobs), job_filter)
    except ClientError:
        logger.exception("Couldn't get jobs with filter %s.", job_filter)
        raise
    else:
        return jobs

  • For API details, see ListTranscriptionJobs in AWS SDK for Python (Boto3) API Reference.
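
The NextToken loop in list_jobs has the same shape as the one in the ListVocabularies example, so it could be factored into one reusable generator. The following is a minimal sketch, not part of the AWS example: the paginate helper and the FakeTranscribeClient class are hypothetical, and the fake client exists only so the sketch runs without AWS credentials.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(
    list_call: Callable[..., Dict[str, Any]], result_key: str, **kwargs: Any
) -> Iterator[Any]:
    """Yield items from every page of a NextToken-paginated list operation."""
    next_token = None
    while True:
        page_args = dict(kwargs)
        # NextToken is sent only after a previous page has returned one.
        if next_token is not None:
            page_args["NextToken"] = next_token
        response = list_call(**page_args)
        yield from response.get(result_key, [])
        next_token = response.get("NextToken")
        if next_token is None:
            break


class FakeTranscribeClient:
    """Hypothetical stand-in for the Boto3 client; returns two canned pages."""

    def __init__(self):
        self._pages = {
            None: {
                "TranscriptionJobSummaries": [{"TranscriptionJobName": "Jabber-1"}],
                "NextToken": "page-2",
            },
            "page-2": {
                "TranscriptionJobSummaries": [{"TranscriptionJobName": "Jabber-2"}],
            },
        }

    def list_transcription_jobs(self, JobNameContains, NextToken=None):
        return self._pages[NextToken]


jobs = list(
    paginate(
        FakeTranscribeClient().list_transcription_jobs,
        "TranscriptionJobSummaries",
        JobNameContains="Jabber",
    )
)
job_names = [job["TranscriptionJobName"] for job in jobs]
print(job_names)
```

With a real client, the same helper could be called with transcribe_client.list_vocabularies and the "Vocabularies" result key. A generator also lets the caller stop early without fetching the remaining pages.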

The following code example shows how to use ListVocabularies.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def list_vocabularies(vocabulary_filter, transcribe_client):
    """
    Lists the custom vocabularies created for this AWS account.

    :param vocabulary_filter: The returned vocabularies must contain this string in
                              their names.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The list of retrieved vocabularies.
    """
    try:
        response = transcribe_client.list_vocabularies(NameContains=vocabulary_filter)
        vocabs = response["Vocabularies"]
        next_token = response.get("NextToken")
        # Keep requesting pages until the service stops returning a NextToken.
        while next_token is not None:
            response = transcribe_client.list_vocabularies(
                NameContains=vocabulary_filter, NextToken=next_token
            )
            vocabs += response["Vocabularies"]
            next_token = response.get("NextToken")
        logger.info(
            "Got %s vocabularies with filter %s.", len(vocabs), vocabulary_filter
        )
    except ClientError:
        logger.exception(
            "Couldn't list vocabularies with filter %s.", vocabulary_filter
        )
        raise
    else:
        return vocabs

  • For API details, see ListVocabularies in AWS SDK for Python (Boto3) API Reference.

The following code example shows how to use StartTranscriptionJob.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def start_job(
    job_name,
    media_uri,
    media_format,
    language_code,
    transcribe_client,
    vocabulary_name=None,
):
    """
    Starts a transcription job. This function returns as soon as the job is started.
    To get the current status of the job, call get_transcription_job. The job is
    successfully completed when the job status is 'COMPLETED'.

    :param job_name: The name of the transcription job. This must be unique for
                     your AWS account.
    :param media_uri: The URI where the audio file is stored. This is typically
                      in an Amazon S3 bucket.
    :param media_format: The format of the audio file. For example, mp3 or wav.
    :param language_code: The language code of the audio file.
                          For example, en-US or ja-JP.
    :param transcribe_client: The Boto3 Transcribe client.
    :param vocabulary_name: The name of a custom vocabulary to use when transcribing
                            the audio file.
    :return: Data about the job.
    """
    try:
        job_args = {
            "TranscriptionJobName": job_name,
            "Media": {"MediaFileUri": media_uri},
            "MediaFormat": media_format,
            "LanguageCode": language_code,
        }
        # A custom vocabulary is optional and is passed through job settings.
        if vocabulary_name is not None:
            job_args["Settings"] = {"VocabularyName": vocabulary_name}
        response = transcribe_client.start_transcription_job(**job_args)
        job = response["TranscriptionJob"]
        logger.info("Started transcription job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't start transcription job %s.", job_name)
        raise
    else:
        return job

  • For API details, see StartTranscriptionJob in AWS SDK for Python (Boto3) API Reference.
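
Because start_job returns as soon as the job is started, the caller has to check the job status until it reaches 'COMPLETED'. The following is a minimal polling sketch under stated assumptions, not the waiter used by the complete example: the wait_for_job helper, its delay_seconds and max_attempts defaults, and the FakeTranscribeClient class are all hypothetical, and the fake client is there only so the sketch runs offline.

```python
import time


def wait_for_job(
    job_name, transcribe_client, delay_seconds=10, max_attempts=60, sleep=time.sleep
):
    """Poll get_transcription_job until the job reaches COMPLETED or FAILED."""
    for _ in range(max_attempts):
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            raise RuntimeError(
                f"Job {job_name} failed: {job.get('FailureReason', 'unknown')}"
            )
        # Still QUEUED or IN_PROGRESS: pause before asking again.
        sleep(delay_seconds)
    raise TimeoutError(f"Job {job_name} did not finish after {max_attempts} checks.")


class FakeTranscribeClient:
    """Hypothetical stand-in for the Boto3 client; replays a list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def get_transcription_job(self, TranscriptionJobName):
        self.calls += 1
        return {
            "TranscriptionJob": {
                "TranscriptionJobName": TranscriptionJobName,
                "TranscriptionJobStatus": next(self._statuses),
            }
        }


fake_client = FakeTranscribeClient(["QUEUED", "IN_PROGRESS", "COMPLETED"])
# The sleep function is swapped for a no-op so the demonstration returns at once.
finished_job = wait_for_job("Jabber-demo", fake_client, sleep=lambda seconds: None)
print(finished_job["TranscriptionJobStatus"])
```

Bounding the number of attempts keeps a stuck job from blocking the caller indefinitely. The delay and attempt limit would need tuning to the length of the audio being transcribed.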

The following code example shows how to use UpdateVocabulary.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def update_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Updates an existing custom vocabulary. The entire vocabulary is replaced with
    the contents of the update.

    :param vocabulary_name: The name of the vocabulary to update.
    :param language_code: The language code of the vocabulary.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        # Phrases take precedence when both phrases and table_uri are supplied.
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.update_vocabulary(**vocab_args)
        logger.info("Updated custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't update custom vocabulary %s.", vocabulary_name)
        raise

  • For API details, see UpdateVocabulary in AWS SDK for Python (Boto3) API Reference.
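
Since an update replaces the entire vocabulary, adding a few terms means sending the complete list again, old terms included. The following is a small sketch of that merge step. It assumes the caller already holds the current phrase list locally, and the merge_phrases helper is hypothetical rather than part of the AWS example.

```python
def merge_phrases(existing, additions):
    """Return existing phrases followed by new ones, without duplicates.

    dict.fromkeys keeps the first occurrence of each phrase and preserves order.
    """
    return list(dict.fromkeys([*existing, *additions]))


current_phrases = ["brillig", "slithy"]
updated_phrases = merge_phrases(current_phrases, ["slithy", "mome"])
print(updated_phrases)
```

The merged list could then be passed as the phrases argument of update_vocabulary, so that the replacement still contains the earlier terms.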



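The following scenario shows how to:

  • Upload an audio file to Amazon S3.

  • Run an Amazon Transcribe job to transcribe the file and get the results.

  • Create and refine a custom vocabulary to improve transcription accuracy.

  • Run jobs with the custom vocabulary and get the results.

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

Transcribe an audio file that contains a reading of Jabberwocky by Lewis Carroll. Start by creating functions that wrap Amazon Transcribe actions.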

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def start_job(
    job_name,
    media_uri,
    media_format,
    language_code,
    transcribe_client,
    vocabulary_name=None,
):
    """
    Starts a transcription job. This function returns as soon as the job is started.
    To get the current status of the job, call get_transcription_job. The job is
    successfully completed when the job status is 'COMPLETED'.

    :param job_name: The name of the transcription job. This must be unique for
                     your AWS account.
    :param media_uri: The URI where the audio file is stored. This is typically
                      in an Amazon S3 bucket.
    :param media_format: The format of the audio file. For example, mp3 or wav.
    :param language_code: The language code of the audio file.
                          For example, en-US or ja-JP.
    :param transcribe_client: The Boto3 Transcribe client.
    :param vocabulary_name: The name of a custom vocabulary to use when transcribing
                            the audio file.
    :return: Data about the job.
    """
    try:
        job_args = {
            "TranscriptionJobName": job_name,
            "Media": {"MediaFileUri": media_uri},
            "MediaFormat": media_format,
            "LanguageCode": language_code,
        }
        if vocabulary_name is not None:
            job_args["Settings"] = {"VocabularyName": vocabulary_name}
        response = transcribe_client.start_transcription_job(**job_args)
        job = response["TranscriptionJob"]
        logger.info("Started transcription job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't start transcription job %s.", job_name)
        raise
    else:
        return job


def get_job(job_name, transcribe_client):
    """
    Gets details about a transcription job.

    :param job_name: The name of the job to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The retrieved transcription job.
    """
    try:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        logger.info("Got job %s.", job["TranscriptionJobName"])
    except ClientError:
        logger.exception("Couldn't get job %s.", job_name)
        raise
    else:
        return job


def delete_job(job_name, transcribe_client):
    """
    Deletes a transcription job. This also deletes the transcript associated with
    the job.

    :param job_name: The name of the job to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_transcription_job(TranscriptionJobName=job_name)
        logger.info("Deleted job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't delete job %s.", job_name)
        raise


def create_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Creates a custom vocabulary that can be used to improve the accuracy of
    transcription jobs. This function returns as soon as the vocabulary processing
    is started. Call get_vocabulary to get the current status of the vocabulary.
    The vocabulary is ready to use when its status is 'READY'.

    :param vocabulary_name: The name of the custom vocabulary.
    :param language_code: The language code of the vocabulary.
                          For example, en-US or nl-NL.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    :return: Information about the newly created vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.create_vocabulary(**vocab_args)
        logger.info("Created custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't create custom vocabulary %s.", vocabulary_name)
        raise
    else:
        return response


def get_vocabulary(vocabulary_name, transcribe_client):
    """
    Gets information about a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: Information about the vocabulary.
    """
    try:
        response = transcribe_client.get_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Got vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't get vocabulary %s.", vocabulary_name)
        raise
    else:
        return response


def update_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Updates an existing custom vocabulary. The entire vocabulary is replaced with
    the contents of the update.

    :param vocabulary_name: The name of the vocabulary to update.
    :param language_code: The language code of the vocabulary.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.update_vocabulary(**vocab_args)
        logger.info("Updated custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't update custom vocabulary %s.", vocabulary_name)
        raise


def list_vocabularies(vocabulary_filter, transcribe_client):
    """
    Lists the custom vocabularies created for this AWS account.

    :param vocabulary_filter: The returned vocabularies must contain this string in
                              their names.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The list of retrieved vocabularies.
    """
    try:
        response = transcribe_client.list_vocabularies(NameContains=vocabulary_filter)
        vocabs = response["Vocabularies"]
        next_token = response.get("NextToken")
        while next_token is not None:
            response = transcribe_client.list_vocabularies(
                NameContains=vocabulary_filter, NextToken=next_token
            )
            vocabs += response["Vocabularies"]
            next_token = response.get("NextToken")
        logger.info(
            "Got %s vocabularies with filter %s.", len(vocabs), vocabulary_filter
        )
    except ClientError:
        logger.exception(
            "Couldn't list vocabularies with filter %s.", vocabulary_filter
        )
        raise
    else:
        return vocabs


def delete_vocabulary(vocabulary_name, transcribe_client):
    """
    Deletes a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Deleted vocabulary %s.", vocabulary_name)
    except ClientError:
        logger.exception("Couldn't delete vocabulary %s.", vocabulary_name)
        raise


import logging
import time

import boto3
import requests

# TranscribeCompleteWaiter and VocabularyReadyWaiter are custom waiter classes
# that are not shown in this excerpt; see the complete example on GitHub.
# list_jobs is the wrapper shown in the ListTranscriptionJobs example.


def usage_demo():
    """Shows how to use the Amazon Transcribe service."""
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    s3_resource = boto3.resource("s3")
    transcribe_client = boto3.client("transcribe")

    print("-" * 88)
    print("Welcome to the Amazon Transcribe demo!")
    print("-" * 88)

    # Upload the audio file to a new, uniquely named bucket.
    bucket_name = f"jabber-bucket-{time.time_ns()}"
    print(f"Creating bucket {bucket_name}.")
    bucket = s3_resource.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            "LocationConstraint": transcribe_client.meta.region_name
        },
    )
    media_file_name = ".media/Jabberwocky.mp3"
    media_object_key = "Jabberwocky.mp3"
    print(f"Uploading media file {media_file_name}.")
    bucket.upload_file(media_file_name, media_object_key)
    media_uri = f"s3://{bucket_name}/{media_object_key}"

    # First pass: transcribe without a custom vocabulary.
    job_name_simple = f"Jabber-{time.time_ns()}"
    print(f"Starting transcription job {job_name_simple}.")
    start_job(
        job_name_simple,
        f"s3://{bucket_name}/{media_object_key}",
        "mp3",
        "en-US",
        transcribe_client,
    )
    transcribe_waiter = TranscribeCompleteWaiter(transcribe_client)
    transcribe_waiter.wait(job_name_simple)
    job_simple = get_job(job_name_simple, transcribe_client)
    transcript_simple = requests.get(
        job_simple["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_simple['jobName']}:")
    print(transcript_simple["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print(
        "Creating a custom vocabulary that lists the nonsense words to try to "
        "improve the transcription."
    )
    # Second pass: transcribe with a vocabulary built from a list of phrases.
    vocabulary_name = f"Jabber-vocabulary-{time.time_ns()}"
    create_vocabulary(
        vocabulary_name,
        "en-US",
        transcribe_client,
        phrases=[
            "brillig",
            "slithy",
            "borogoves",
            "mome",
            "raths",
            "Jub-Jub",
            "frumious",
            "manxome",
            "Tumtum",
            "uffish",
            "whiffling",
            "tulgey",
            "thou",
            "frabjous",
            "callooh",
            "callay",
            "chortled",
        ],
    )
    vocabulary_ready_waiter = VocabularyReadyWaiter(transcribe_client)
    vocabulary_ready_waiter.wait(vocabulary_name)

    job_name_vocabulary_list = f"Jabber-vocabulary-list-{time.time_ns()}"
    print(f"Starting transcription job {job_name_vocabulary_list}.")
    start_job(
        job_name_vocabulary_list,
        media_uri,
        "mp3",
        "en-US",
        transcribe_client,
        vocabulary_name,
    )
    transcribe_waiter.wait(job_name_vocabulary_list)
    job_vocabulary_list = get_job(job_name_vocabulary_list, transcribe_client)
    transcript_vocabulary_list = requests.get(
        job_vocabulary_list["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_vocabulary_list['jobName']}:")
    print(transcript_vocabulary_list["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print(
        "Updating the custom vocabulary with table data that provides additional "
        "pronunciation hints."
    )
    # Third pass: replace the vocabulary with a table that adds pronunciation hints.
    table_vocab_file = "jabber-vocabulary-table.txt"
    bucket.upload_file(table_vocab_file, table_vocab_file)
    update_vocabulary(
        vocabulary_name,
        "en-US",
        transcribe_client,
        table_uri=f"s3://{bucket_name}/{table_vocab_file}",
    )
    vocabulary_ready_waiter.wait(vocabulary_name)

    job_name_vocab_table = f"Jabber-vocab-table-{time.time_ns()}"
    print(f"Starting transcription job {job_name_vocab_table}.")
    start_job(
        job_name_vocab_table,
        media_uri,
        "mp3",
        "en-US",
        transcribe_client,
        vocabulary_name=vocabulary_name,
    )
    transcribe_waiter.wait(job_name_vocab_table)
    job_vocab_table = get_job(job_name_vocab_table, transcribe_client)
    transcript_vocab_table = requests.get(
        job_vocab_table["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_vocab_table['jobName']}:")
    print(transcript_vocab_table["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print("Getting data for jobs and vocabularies.")
    jabber_jobs = list_jobs("Jabber", transcribe_client)
    print(f"Found {len(jabber_jobs)} jobs:")
    for job_sum in jabber_jobs:
        job = get_job(job_sum["TranscriptionJobName"], transcribe_client)
        print(
            f"\t{job['TranscriptionJobName']}, {job['Media']['MediaFileUri']}, "
            f"{job['Settings'].get('VocabularyName')}"
        )

    jabber_vocabs = list_vocabularies("Jabber", transcribe_client)
    print(f"Found {len(jabber_vocabs)} vocabularies:")
    for vocab_sum in jabber_vocabs:
        vocab = get_vocabulary(vocab_sum["VocabularyName"], transcribe_client)
        vocab_content = requests.get(vocab["DownloadUri"]).text
        print(f"\t{vocab['VocabularyName']} contents:")
        print(vocab_content)

    # Clean up every resource that the demo created.
    print("-" * 88)
    print("Deleting demo jobs.")
    for job_name in [job_name_simple, job_name_vocabulary_list, job_name_vocab_table]:
        delete_job(job_name, transcribe_client)
    print("Deleting demo vocabulary.")
    delete_vocabulary(vocabulary_name, transcribe_client)
    print("Deleting demo bucket.")
    bucket.objects.delete()
    bucket.delete()
    print("Thanks for watching!")
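
The demo prints each transcript and leaves the comparison to the reader. As a rough sketch of how the three runs could be compared, the snippet below pulls the text out of a transcript document and reports which vocabulary terms appear in it. The transcript_text and terms_found helpers are hypothetical, and sample_doc is made-up data that only mirrors the keys the demo reads (jobName and results.transcripts), not real service output.

```python
def transcript_text(transcript_doc):
    """Join the text of every entry under results.transcripts."""
    return " ".join(
        entry["transcript"] for entry in transcript_doc["results"]["transcripts"]
    )


def terms_found(text, terms):
    """Return the terms that occur in the text, ignoring case."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# Made-up document with the same keys the demo reads from the transcript file.
sample_doc = {
    "jobName": "Jabber-demo",
    "results": {
        "transcripts": [
            {"transcript": "Twas brillig and the slithy toves did gyre and gimble."}
        ]
    },
}

text = transcript_text(sample_doc)
recognized = terms_found(text, ["brillig", "slithy", "borogoves"])
print(recognized)
```

Running the same check against the transcript of each job would give a simple count of how many nonsense words each vocabulary version recovered. Substring matching is a coarse measure, so a term that is part of a longer word would also count as found.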


  • Amazon S3 に音声ファイルをアップロードします。

  • Amazon Transcribe ジョブを実行してファイルを文字起こしし、結果を取得します。

  • カスタム語彙を作成して改良し、文字起こしの精度を向上させます。

  • カスタム語彙を使ってジョブを実行し、結果を取得します。

SDK for Python (Boto3)

GitHub には、その他のリソースもあります。用例一覧を検索し、AWS コード例リポジトリでの設定と実行の方法を確認してください。

ルイス キャロルによる「ジャバウォッキー」の朗読を収録した音声ファイルを文字起こしします。まず、Amazon Transcribe アクションをラップする関数を作成します。

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def start_job(
    job_name,
    media_uri,
    media_format,
    language_code,
    transcribe_client,
    vocabulary_name=None,
):
    """
    Starts a transcription job. This function returns as soon as the job is started.
    To get the current status of the job, call get_transcription_job. The job is
    successfully completed when the job status is 'COMPLETED'.

    :param job_name: The name of the transcription job. This must be unique for
                     your AWS account.
    :param media_uri: The URI where the audio file is stored. This is typically
                      in an Amazon S3 bucket.
    :param media_format: The format of the audio file. For example, mp3 or wav.
    :param language_code: The language code of the audio file.
                          For example, en-US or ja-JP.
    :param transcribe_client: The Boto3 Transcribe client.
    :param vocabulary_name: The name of a custom vocabulary to use when transcribing
                            the audio file.
    :return: Data about the job.
    """
    try:
        job_args = {
            "TranscriptionJobName": job_name,
            "Media": {"MediaFileUri": media_uri},
            "MediaFormat": media_format,
            "LanguageCode": language_code,
        }
        # A custom vocabulary is optional and is passed through job settings.
        if vocabulary_name is not None:
            job_args["Settings"] = {"VocabularyName": vocabulary_name}
        response = transcribe_client.start_transcription_job(**job_args)
        job = response["TranscriptionJob"]
        logger.info("Started transcription job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't start transcription job %s.", job_name)
        raise
    else:
        return job


def get_job(job_name, transcribe_client):
    """
    Gets details about a transcription job.

    :param job_name: The name of the job to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The retrieved transcription job.
    """
    try:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        logger.info("Got job %s.", job["TranscriptionJobName"])
    except ClientError:
        logger.exception("Couldn't get job %s.", job_name)
        raise
    else:
        return job


def delete_job(job_name, transcribe_client):
    """
    Deletes a transcription job. This also deletes the transcript associated with
    the job.

    :param job_name: The name of the job to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_transcription_job(TranscriptionJobName=job_name)
        logger.info("Deleted job %s.", job_name)
    except ClientError:
        logger.exception("Couldn't delete job %s.", job_name)
        raise


def create_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Creates a custom vocabulary that can be used to improve the accuracy of
    transcription jobs. This function returns as soon as the vocabulary processing
    is started. Call get_vocabulary to get the current status of the vocabulary.
    The vocabulary is ready to use when its status is 'READY'.

    :param vocabulary_name: The name of the custom vocabulary.
    :param language_code: The language code of the vocabulary.
                          For example, en-US or nl-NL.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    :return: Information about the newly created vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        # Phrases take precedence; the table URI is used only when no phrases are given.
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.create_vocabulary(**vocab_args)
        logger.info("Created custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't create custom vocabulary %s.", vocabulary_name)
        raise
    else:
        return response


def get_vocabulary(vocabulary_name, transcribe_client):
    """
    Gets information about a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to retrieve.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: Information about the vocabulary.
    """
    try:
        response = transcribe_client.get_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Got vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't get vocabulary %s.", vocabulary_name)
        raise
    else:
        return response


def update_vocabulary(
    vocabulary_name, language_code, transcribe_client, phrases=None, table_uri=None
):
    """
    Updates an existing custom vocabulary. The entire vocabulary is replaced with
    the contents of the update.

    :param vocabulary_name: The name of the vocabulary to update.
    :param language_code: The language code of the vocabulary.
    :param transcribe_client: The Boto3 Transcribe client.
    :param phrases: A list of comma-separated phrases to include in the vocabulary.
    :param table_uri: A table of phrases and pronunciation hints to include in the
                      vocabulary.
    """
    try:
        vocab_args = {"VocabularyName": vocabulary_name, "LanguageCode": language_code}
        if phrases is not None:
            vocab_args["Phrases"] = phrases
        elif table_uri is not None:
            vocab_args["VocabularyFileUri"] = table_uri
        response = transcribe_client.update_vocabulary(**vocab_args)
        logger.info("Updated custom vocabulary %s.", response["VocabularyName"])
    except ClientError:
        logger.exception("Couldn't update custom vocabulary %s.", vocabulary_name)
        raise


def list_vocabularies(vocabulary_filter, transcribe_client):
    """
    Lists the custom vocabularies created for this AWS account.

    :param vocabulary_filter: The returned vocabularies must contain this string in
                              their names.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The list of retrieved vocabularies.
    """
    try:
        response = transcribe_client.list_vocabularies(NameContains=vocabulary_filter)
        vocabs = response["Vocabularies"]
        next_token = response.get("NextToken")
        # Keep requesting pages until the service stops returning a token.
        while next_token is not None:
            response = transcribe_client.list_vocabularies(
                NameContains=vocabulary_filter, NextToken=next_token
            )
            vocabs += response["Vocabularies"]
            next_token = response.get("NextToken")
        logger.info(
            "Got %s vocabularies with filter %s.", len(vocabs), vocabulary_filter
        )
    except ClientError:
        logger.exception(
            "Couldn't list vocabularies with filter %s.", vocabulary_filter
        )
        raise
    else:
        return vocabs


def delete_vocabulary(vocabulary_name, transcribe_client):
    """
    Deletes a custom vocabulary.

    :param vocabulary_name: The name of the vocabulary to delete.
    :param transcribe_client: The Boto3 Transcribe client.
    """
    try:
        transcribe_client.delete_vocabulary(VocabularyName=vocabulary_name)
        logger.info("Deleted vocabulary %s.", vocabulary_name)
    except ClientError:
        logger.exception("Couldn't delete vocabulary %s.", vocabulary_name)
        raise
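The demo code calls a `list_jobs` helper that is not shown alongside the wrapper functions. The following is a minimal sketch of what such a helper could look like, assuming it mirrors `list_vocabularies` and pages through the `ListTranscriptionJobs` action with `NextToken`. The function body is an illustration, not the version from the example repository, and it leaves out the logging and `ClientError` handling used by the other wrappers.

```python
def list_jobs(job_filter, transcribe_client):
    """
    Lists transcription jobs whose names contain job_filter.

    Sketch only: follows NextToken until the service stops returning one,
    in the same way that list_vocabularies does.

    :param job_filter: The returned jobs must contain this string in their names.
    :param transcribe_client: The Boto3 Transcribe client.
    :return: The list of retrieved job summaries.
    """
    jobs = []
    kwargs = {"JobNameContains": job_filter}
    while True:
        response = transcribe_client.list_transcription_jobs(**kwargs)
        jobs.extend(response["TranscriptionJobSummaries"])
        next_token = response.get("NextToken")
        if next_token is None:
            break
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = next_token
    return jobs
```

Each returned item is a job summary, so the caller still needs `get_job` to retrieve full details such as the media URI.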


import logging
import time

import boto3
import requests


# list_jobs, TranscribeCompleteWaiter, and VocabularyReadyWaiter are defined in
# the complete example on GitHub.
def usage_demo():
    """Shows how to use the Amazon Transcribe service."""
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    s3_resource = boto3.resource("s3")
    transcribe_client = boto3.client("transcribe")

    print("-" * 88)
    print("Welcome to the Amazon Transcribe demo!")
    print("-" * 88)

    # Create a bucket in the same Region as the Transcribe client and upload the audio.
    bucket_name = f"jabber-bucket-{time.time_ns()}"
    print(f"Creating bucket {bucket_name}.")
    bucket = s3_resource.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            "LocationConstraint": transcribe_client.meta.region_name
        },
    )
    media_file_name = ".media/Jabberwocky.mp3"
    media_object_key = "Jabberwocky.mp3"
    print(f"Uploading media file {media_file_name}.")
    bucket.upload_file(media_file_name, media_object_key)
    media_uri = f"s3://{bucket_name}/{media_object_key}"

    # First pass: transcribe without a custom vocabulary.
    job_name_simple = f"Jabber-{time.time_ns()}"
    print(f"Starting transcription job {job_name_simple}.")
    start_job(
        job_name_simple,
        f"s3://{bucket_name}/{media_object_key}",
        "mp3",
        "en-US",
        transcribe_client,
    )
    transcribe_waiter = TranscribeCompleteWaiter(transcribe_client)
    transcribe_waiter.wait(job_name_simple)
    job_simple = get_job(job_name_simple, transcribe_client)
    transcript_simple = requests.get(
        job_simple["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_simple['jobName']}:")
    print(transcript_simple["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print(
        "Creating a custom vocabulary that lists the nonsense words to try to "
        "improve the transcription."
    )
    vocabulary_name = f"Jabber-vocabulary-{time.time_ns()}"
    create_vocabulary(
        vocabulary_name,
        "en-US",
        transcribe_client,
        phrases=[
            "brillig",
            "slithy",
            "borogoves",
            "mome",
            "raths",
            "Jub-Jub",
            "frumious",
            "manxome",
            "Tumtum",
            "uffish",
            "whiffling",
            "tulgey",
            "thou",
            "frabjous",
            "callooh",
            "callay",
            "chortled",
        ],
    )
    vocabulary_ready_waiter = VocabularyReadyWaiter(transcribe_client)
    vocabulary_ready_waiter.wait(vocabulary_name)

    # Second pass: transcribe with the phrase-list vocabulary.
    job_name_vocabulary_list = f"Jabber-vocabulary-list-{time.time_ns()}"
    print(f"Starting transcription job {job_name_vocabulary_list}.")
    start_job(
        job_name_vocabulary_list,
        media_uri,
        "mp3",
        "en-US",
        transcribe_client,
        vocabulary_name,
    )
    transcribe_waiter.wait(job_name_vocabulary_list)
    job_vocabulary_list = get_job(job_name_vocabulary_list, transcribe_client)
    transcript_vocabulary_list = requests.get(
        job_vocabulary_list["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_vocabulary_list['jobName']}:")
    print(transcript_vocabulary_list["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print(
        "Updating the custom vocabulary with table data that provides additional "
        "pronunciation hints."
    )
    table_vocab_file = "jabber-vocabulary-table.txt"
    bucket.upload_file(table_vocab_file, table_vocab_file)
    update_vocabulary(
        vocabulary_name,
        "en-US",
        transcribe_client,
        table_uri=f"s3://{bucket_name}/{table_vocab_file}",
    )
    vocabulary_ready_waiter.wait(vocabulary_name)

    # Third pass: transcribe with the table-based vocabulary.
    job_name_vocab_table = f"Jabber-vocab-table-{time.time_ns()}"
    print(f"Starting transcription job {job_name_vocab_table}.")
    start_job(
        job_name_vocab_table,
        media_uri,
        "mp3",
        "en-US",
        transcribe_client,
        vocabulary_name=vocabulary_name,
    )
    transcribe_waiter.wait(job_name_vocab_table)
    job_vocab_table = get_job(job_name_vocab_table, transcribe_client)
    transcript_vocab_table = requests.get(
        job_vocab_table["Transcript"]["TranscriptFileUri"]
    ).json()
    print(f"Transcript for job {transcript_vocab_table['jobName']}:")
    print(transcript_vocab_table["results"]["transcripts"][0]["transcript"])

    print("-" * 88)
    print("Getting data for jobs and vocabularies.")
    jabber_jobs = list_jobs("Jabber", transcribe_client)
    print(f"Found {len(jabber_jobs)} jobs:")
    for job_sum in jabber_jobs:
        job = get_job(job_sum["TranscriptionJobName"], transcribe_client)
        print(
            f"\t{job['TranscriptionJobName']}, {job['Media']['MediaFileUri']}, "
            f"{job['Settings'].get('VocabularyName')}"
        )

    jabber_vocabs = list_vocabularies("Jabber", transcribe_client)
    print(f"Found {len(jabber_vocabs)} vocabularies:")
    for vocab_sum in jabber_vocabs:
        vocab = get_vocabulary(vocab_sum["VocabularyName"], transcribe_client)
        vocab_content = requests.get(vocab["DownloadUri"]).text
        print(f"\t{vocab['VocabularyName']} contents:")
        print(vocab_content)

    # Clean up every resource that the demo created.
    print("-" * 88)
    print("Deleting demo jobs.")
    for job_name in [job_name_simple, job_name_vocabulary_list, job_name_vocab_table]:
        delete_job(job_name, transcribe_client)
    print("Deleting demo vocabulary.")
    delete_vocabulary(vocabulary_name, transcribe_client)
    print("Deleting demo bucket.")
    bucket.objects.delete()
    bucket.delete()
    print("Thanks for watching!")
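The demo relies on `TranscribeCompleteWaiter` and `VocabularyReadyWaiter`, whose definitions do not appear next to it. As a rough stand-in, the sketch below polls the service until a job or vocabulary reaches a terminal state. The names `wait_for_status`, `wait_for_job`, and `wait_for_vocabulary` are hypothetical and are not the waiter classes from the example repository. The status values (`COMPLETED`, `READY`, `FAILED`) are the ones the service reports for transcription jobs and vocabularies.

```python
import time


def wait_for_status(get_status, done_states, failed_states, delay=10, max_tries=60):
    """
    Calls get_status() until it returns a state in done_states.

    Raises RuntimeError if a failed state is seen, and TimeoutError if
    max_tries polls pass without reaching a terminal state.
    """
    for _ in range(max_tries):
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"Resource entered failure state {status}.")
        time.sleep(delay)
    raise TimeoutError(f"Gave up after {max_tries} tries.")


def wait_for_job(job_name, transcribe_client, **kwargs):
    """Waits until the transcription job is COMPLETED (hypothetical helper)."""

    def get_status():
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        return response["TranscriptionJob"]["TranscriptionJobStatus"]

    return wait_for_status(get_status, {"COMPLETED"}, {"FAILED"}, **kwargs)


def wait_for_vocabulary(vocabulary_name, transcribe_client, **kwargs):
    """Waits until the custom vocabulary is READY (hypothetical helper)."""

    def get_status():
        response = transcribe_client.get_vocabulary(VocabularyName=vocabulary_name)
        return response["VocabularyState"]

    return wait_for_status(get_status, {"READY"}, {"FAILED"}, **kwargs)
```

With helpers like these, a call such as `wait_for_job(job_name_simple, transcribe_client)` could take the place of `transcribe_waiter.wait(job_name_simple)`. Raising on `FAILED` rather than returning keeps a failed job from being mistaken for a finished one.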


  • Start a transcription job with Amazon Transcribe.

  • Wait for the job to complete.

  • Get the URI where the transcript is stored.

For more information, see "Getting started with Amazon Transcribe".

SDK for Python (Boto3)

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

import time

import boto3


def transcribe_file(job_name, file_uri, transcribe_client):
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": file_uri},
        MediaFormat="wav",
        LanguageCode="en-US",
    )

    # Poll every 10 seconds, up to 60 times, until the job succeeds or fails.
    max_tries = 60
    while max_tries > 0:
        max_tries -= 1
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if job_status in ["COMPLETED", "FAILED"]:
            print(f"Job {job_name} is {job_status}.")
            if job_status == "COMPLETED":
                print(
                    f"Download the transcript from\n"
                    f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
                )
            break
        else:
            print(f"Waiting for {job_name}. Current status is {job_status}.")
        time.sleep(10)


def main():
    transcribe_client = boto3.client("transcribe")
    file_uri = "s3://test-transcribe/answer2.wav"
    transcribe_file("Example-job", file_uri, transcribe_client)


if __name__ == "__main__":
    main()
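The example above stops at printing the transcript URI. As a small follow-on sketch, the function below pulls the transcript text out of the JSON document once it has been downloaded. The field names (`jobName`, `results`, `transcripts`, `transcript`) are the ones the custom vocabulary demo reads. The function name `transcript_text` is hypothetical, and the download step is left out so that the sketch runs without network access.

```python
import json


def transcript_text(transcript_json):
    """
    Returns (job name, transcript text) from a transcript document.

    :param transcript_json: The transcript file contents as a JSON string.
    """
    document = json.loads(transcript_json)
    # The demo reads the first entry of results.transcripts, so do the same here.
    text = document["results"]["transcripts"][0]["transcript"]
    return document["jobName"], text


if __name__ == "__main__":
    # Hand-written sample data for illustration, not real service output.
    sample = json.dumps(
        {
            "jobName": "Example-job",
            "results": {"transcripts": [{"transcript": "Twas brillig."}]},
        }
    )
    name, text = transcript_text(sample)
    print(f"Transcript for job {name}:")
    print(text)
```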


© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.