Using toxic speech detection

Using toxic speech detection in a batch transcription

To use toxic speech detection with a batch transcription, see the following examples.

AWS Management Console

1. Sign in to the AWS Management Console.
2. In the navigation pane, choose Transcription jobs, then select Create job (top right). This opens the Specify job details page.
3. On the Specify job details page, you can optionally enable PII redaction. The other listed options are not supported with toxicity detection. Select Next to go to the Configure job - optional page. In the Audio settings panel, select Toxicity detection.
4. Select Create job to run the transcription job.
5. When the transcription job is complete, download the transcript from the Download drop-down menu on the transcription job's detail page.

AWS CLI

This example uses the start-transcription-job command and the ToxicityDetection parameter. For more information, see StartTranscriptionJob and ToxicityDetection.



aws transcribe start-transcription-job \
--region us-west-2 \
--transcription-job-name my-first-transcription-job \
--media MediaFileUri=s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac \
--output-bucket-name amzn-s3-demo-bucket \
--output-key my-output-files/ \
--language-code en-US \
--toxicity-detection ToxicityCategories=ALL

The following is another example that uses the start-transcription-job command, this time with a request body that includes toxicity detection.



aws transcribe start-transcription-job \
--region us-west-2 \
--cli-input-json file://filepath/my-first-toxicity-job.json

The file my-first-toxicity-job.json contains the following request body.



{
  "TranscriptionJobName": "my-first-transcription-job",
  "Media": {
    "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"
  },
  "OutputBucketName": "amzn-s3-demo-bucket",
  "OutputKey": "my-output-files/",
  "LanguageCode": "en-US",
  "ToxicityDetection": [
    {
      "ToxicityCategories": [ "ALL" ]
    }
  ]
}
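The request body can also be generated from a script rather than written by hand. The following is a minimal sketch, assuming a hypothetical helper named build_toxicity_request (it is not part of the AWS CLI or any AWS SDK). It reuses the placeholder job, bucket, and file names from the example above and writes the JSON file that --cli-input-json points to.

```python
import json


def build_toxicity_request(job_name, media_uri, output_bucket, output_key):
    """Return a StartTranscriptionJob request body with toxicity detection on.

    Hypothetical helper for illustration only; the field names mirror the
    request body shown above.
    """
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "OutputKey": output_key,
        "LanguageCode": "en-US",
        "ToxicityDetection": [{"ToxicityCategories": ["ALL"]}],
    }


request = build_toxicity_request(
    "my-first-transcription-job",
    "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac",
    "amzn-s3-demo-bucket",
    "my-output-files/",
)

# Write the file that the --cli-input-json option reads.
with open("my-first-toxicity-job.json", "w") as f:
    json.dump(request, f, indent=2)
```

Generating the file this way keeps the job name and S3 locations in one place if you submit many jobs with the same settings.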

AWS SDK for Python (Boto3)

This example uses the AWS SDK for Python (Boto3) to enable ToxicityDetection with the start_transcription_job method. For more information, see StartTranscriptionJob and ToxicityDetection.

For additional examples that use the AWS SDKs, including feature-specific, scenario, and cross-service examples, see the Code examples for Amazon Transcribe using AWS SDKs chapter.



import time

import boto3

# Create a Transcribe client for the us-west-2 Region.
transcribe = boto3.client('transcribe', 'us-west-2')
job_name = "my-first-transcription-job"
job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"

# Start a batch job with toxicity detection enabled for all categories.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    OutputBucketName='amzn-s3-demo-bucket',
    OutputKey='my-output-files/',
    LanguageCode='en-US',
    ToxicityDetection=[
        {'ToxicityCategories': ['ALL']}
    ]
)

# Poll every 5 seconds until the job either completes or fails.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
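The polling loop above runs until the job reaches COMPLETED or FAILED, with no upper bound on the wait. The following is a sketch of a bounded variant, assuming a hypothetical helper named wait_for_job; the poll_seconds and timeout_seconds defaults are illustrative choices, not values recommended by the service. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll get_transcription_job until the job completes, fails, or times out.

    Hypothetical helper: `client` is expected to be a boto3 Transcribe client.
    Returns the final status string ('COMPLETED' or 'FAILED').
    """
    waited = 0
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{job_name} is still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client from the example above, this would be called as wait_for_job(transcribe, job_name).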

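Example output

Toxic speech is tagged and categorized in your transcription output. Each instance of toxic speech is categorized and assigned a confidence score (a value between 0 and 1). A higher confidence value indicates a greater likelihood that the content is toxic speech within the specified category.

The following is an example output in JSON format that shows categorized toxic speech with the associated confidence scores.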



{
    "jobName": "my-toxicity-job",
    "accountId": "111122223333",
    "results": {
        "transcripts": [...],
        "items":[...],
        "toxicity_detection": [
            {
                "text": "What the * are you doing man? That's why I didn't want to play with your * .  man it was a no, no I'm not calming down * man. I well I spent I spent too much * money on this game.",
                "toxicity": 0.7638,
                "categories": {
                    "profanity": 0.9913,
                    "hate_speech": 0.0382,
                    "sexual": 0.0016,
                    "insult": 0.6572,
                    "violence_or_threat": 0.0024,
                    "graphic": 0.0013,
                    "harassment_or_abuse": 0.0249
                },
                "start_time": 8.92,
                "end_time": 21.45
            },
            Items removed for brevity
            {
                "text": "What? Who? What the * did you just say to me? What's your address? What is your * address? I will pull up right now on your * * man. Take your * back to , tired of this **.",
                "toxicity": 0.9816,
                "categories": {
                    "profanity": 0.9865,
                    "hate_speech": 0.9123,
                    "sexual": 0.0037,
                    "insult": 0.5447,
                    "violence_or_threat": 0.5078,
                    "graphic": 0.0037,
                    "harassment_or_abuse": 0.0613
                },
                "start_time": 43.459,
                "end_time": 54.639
            }
        ]
    },
    ...
    "status": "COMPLETED"
}
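Once the transcript is downloaded, the toxicity_detection array can be filtered by score. The following is a minimal sketch, assuming a hypothetical helper named flag_toxic_segments and an arbitrary threshold of 0.8 (an illustrative value, not one recommended by the service). The sample data copies the scores from the example output above and omits the transcript text.

```python
def flag_toxic_segments(transcript, threshold=0.5):
    """Return segments whose overall toxicity score meets the threshold.

    Hypothetical helper: `transcript` is the parsed transcript JSON. Each
    result records the segment's time range, overall score, and the
    highest-scoring category.
    """
    flagged = []
    for segment in transcript["results"]["toxicity_detection"]:
        if segment["toxicity"] >= threshold:
            categories = segment["categories"]
            top_category = max(categories, key=categories.get)
            flagged.append({
                "start_time": segment["start_time"],
                "end_time": segment["end_time"],
                "toxicity": segment["toxicity"],
                "top_category": top_category,
                "top_score": categories[top_category],
            })
    return flagged


# Scores copied from the example output above; "text" fields omitted.
sample = {
    "results": {
        "toxicity_detection": [
            {
                "toxicity": 0.7638,
                "categories": {
                    "profanity": 0.9913, "hate_speech": 0.0382,
                    "sexual": 0.0016, "insult": 0.6572,
                    "violence_or_threat": 0.0024, "graphic": 0.0013,
                    "harassment_or_abuse": 0.0249,
                },
                "start_time": 8.92,
                "end_time": 21.45,
            },
            {
                "toxicity": 0.9816,
                "categories": {
                    "profanity": 0.9865, "hate_speech": 0.9123,
                    "sexual": 0.0037, "insult": 0.5447,
                    "violence_or_threat": 0.5078, "graphic": 0.0037,
                    "harassment_or_abuse": 0.0613,
                },
                "start_time": 43.459,
                "end_time": 54.639,
            },
        ]
    }
}

for item in flag_toxic_segments(sample, threshold=0.8):
    print(item)
```

With the 0.8 threshold, only the second segment (overall score 0.9816) is returned. Lowering the threshold to 0.5 returns both segments.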
