Detecting text in a stored video

Amazon Rekognition Video text detection in stored videos is an asynchronous operation. To start detecting text, call StartTextDetection. Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon SNS topic. If the video analysis is successful, call GetTextDetection to get the analysis results. For more information about starting video analysis and getting the results, see Calling Amazon Rekognition Video operations.
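As a rough illustration of the completion check, the following Python sketch parses a notification as it might arrive in an Amazon SQS queue that is subscribed to the Amazon SNS topic. It assumes the SQS message body is an SNS JSON envelope whose Message field is itself a JSON string containing JobId and Status. The helper name job_succeeded and the sample values are hypothetical and exist only for this example.

```python
import json


def job_succeeded(sqs_body, expected_job_id):
    """Return True/False when the notification is for our job, else None.

    Assumes the SQS body is an SNS envelope whose 'Message' field is a
    JSON string with 'JobId' and 'Status' keys.
    """
    envelope = json.loads(sqs_body)
    rek_message = json.loads(envelope["Message"])
    if rek_message.get("JobId") != expected_job_id:
        return None  # notification belongs to a different job
    return rek_message.get("Status") == "SUCCEEDED"


# Hypothetical notification, built here only to exercise the parser.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "JobId": "job-id-number",
        "Status": "SUCCEEDED",
        "API": "StartTextDetection",
    }),
})

print(job_succeeded(sample_body, "job-id-number"))
print(job_succeeded(sample_body, "some-other-job"))
```

Returning None for a non-matching JobId lets a caller keep polling the queue instead of treating another job's notification as a failure.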

This procedure expands on the code in Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK). It uses an Amazon SQS queue to get the completion status of a video analysis request.

To detect text in a video stored in an Amazon S3 bucket (SDK)

Perform the steps in Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK).

Add the following code to the class VideoDetect in step 1.

Java


//Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)


private static void StartTextDetection(String bucket, String video) throws Exception{
           
    NotificationChannel channel= new NotificationChannel()
            .withSNSTopicArn(snsTopicArn)
            .withRoleArn(roleArn);
    
    StartTextDetectionRequest req = new StartTextDetectionRequest()
            .withVideo(new Video()
                    .withS3Object(new S3Object()
                        .withBucket(bucket)
                        .withName(video)))
            .withNotificationChannel(channel);
    
    
    StartTextDetectionResult startTextDetectionResult = rek.startTextDetection(req);
    startJobId=startTextDetectionResult.getJobId();
    
} 

private static void GetTextDetectionResults() throws Exception{
    
    int maxResults=10;
    String paginationToken=null;
    GetTextDetectionResult textDetectionResult=null;
    
    do{
        if (textDetectionResult !=null){
            paginationToken = textDetectionResult.getNextToken();

        }
        
    
        textDetectionResult = rek.getTextDetection(new GetTextDetectionRequest()
             .withJobId(startJobId)
             .withNextToken(paginationToken)
             .withMaxResults(maxResults));
    
        VideoMetadata videoMetaData=textDetectionResult.getVideoMetadata();
            
        System.out.println("Format: " + videoMetaData.getFormat());
        System.out.println("Codec: " + videoMetaData.getCodec());
        System.out.println("Duration: " + videoMetaData.getDurationMillis());
        System.out.println("FrameRate: " + videoMetaData.getFrameRate());
            
            
        //Show text, confidence values
        List<TextDetectionResult> textDetections = textDetectionResult.getTextDetections();


        for (TextDetectionResult text: textDetections) {
            long seconds=text.getTimestamp()/1000;
            System.out.println("Sec: " + Long.toString(seconds) + " ");
            TextDetection detectedText=text.getTextDetection();
            
            System.out.println("Text Detected: " + detectedText.getDetectedText());
            System.out.println("Confidence: " + detectedText.getConfidence().toString());
            System.out.println("Id : " + detectedText.getId());
            System.out.println("Parent Id: " + detectedText.getParentId());
            System.out.println("Bounding Box: " + detectedText.getGeometry().getBoundingBox().toString());
            System.out.println("Type: " + detectedText.getType());
            System.out.println();
        }
    } while (textDetectionResult !=null && textDetectionResult.getNextToken() != null);
      
        
}

In the function main, replace the lines:


        StartLabelDetection(bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetLabelDetectionResults();

with:


        StartTextDetection(bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetTextDetectionResults();

Java V2

This code is taken from the AWS Documentation SDK examples GitHub repository. See the full example here.


//snippet-start:[rekognition.java2.recognize_video_text.import]
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.S3Object;
import software.amazon.awssdk.services.rekognition.model.NotificationChannel;
import software.amazon.awssdk.services.rekognition.model.Video;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.VideoMetadata;
import software.amazon.awssdk.services.rekognition.model.TextDetectionResult;
import java.util.List;
//snippet-end:[rekognition.java2.recognize_video_text.import]

/**
* Before running this Java V2 code example, set up your development environment, including your credentials.
*
* For more information, see the following documentation topic:
*
* https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
*/
public class DetectTextVideo {

 private static String startJobId ="";
 public static void main(String[] args) {

     final String usage = "\n" +
         "Usage: " +
         "   <bucket> <video> <topicArn> <roleArn>\n\n" +
         "Where:\n" +
         "   bucket - The name of the bucket in which the video is located (for example, myBucket). \n\n"+
         "   video - The name of video (for example, people.mp4). \n\n" +
         "   topicArn - The ARN of the Amazon Simple Notification Service (Amazon SNS) topic. \n\n" +
         "   roleArn - The ARN of the AWS Identity and Access Management (IAM) role to use. \n\n" ;

     if (args.length != 4) {
         System.out.println(usage);
         System.exit(1);
     }

     String bucket = args[0];
     String video = args[1];
     String topicArn = args[2];
     String roleArn = args[3];

     Region region = Region.US_EAST_1;
     RekognitionClient rekClient = RekognitionClient.builder()
         .region(region)
         .credentialsProvider(ProfileCredentialsProvider.create("profile-name"))
         .build();

     NotificationChannel channel = NotificationChannel.builder()
         .snsTopicArn(topicArn)
         .roleArn(roleArn)
         .build();

     startTextLabels(rekClient, channel, bucket, video);
     GetTextResults(rekClient);
     System.out.println("This example is done!");
     rekClient.close();
 }

 // snippet-start:[rekognition.java2.recognize_video_text.main]
 public static void startTextLabels(RekognitionClient rekClient,
                                NotificationChannel channel,
                                String bucket,
                                String video) {
     try {
         S3Object s3Obj = S3Object.builder()
             .bucket(bucket)
             .name(video)
             .build();

         Video vidOb = Video.builder()
             .s3Object(s3Obj)
             .build();

         StartTextDetectionRequest textDetectionRequest = StartTextDetectionRequest.builder()
             .jobTag("DetectingText")
             .notificationChannel(channel)
             .video(vidOb)
             .build();

         StartTextDetectionResponse textDetectionResponse = rekClient.startTextDetection(textDetectionRequest);
         startJobId = textDetectionResponse.jobId();

     } catch (RekognitionException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }

 public static void GetTextResults(RekognitionClient rekClient) {

     try {
         String paginationToken=null;
         GetTextDetectionResponse textDetectionResponse=null;
         boolean finished = false;
         String status;
         int yy=0 ;

         do{
             if (textDetectionResponse !=null)
                 paginationToken = textDetectionResponse.nextToken();

             GetTextDetectionRequest recognitionRequest = GetTextDetectionRequest.builder()
                 .jobId(startJobId)
                 .nextToken(paginationToken)
                 .maxResults(10)
                 .build();

             // Wait until the job succeeds.
             while (!finished) {
                 textDetectionResponse = rekClient.getTextDetection(recognitionRequest);
                 status = textDetectionResponse.jobStatusAsString();

                 if (status.compareTo("SUCCEEDED") == 0)
                     finished = true;
                 else {
                     System.out.println(yy + " status is: " + status);
                     Thread.sleep(1000);
                 }
                 yy++;
             }

             finished = false;

             // Proceed when the job is done - otherwise VideoMetadata is null.
             VideoMetadata videoMetaData=textDetectionResponse.videoMetadata();
             System.out.println("Format: " + videoMetaData.format());
             System.out.println("Codec: " + videoMetaData.codec());
             System.out.println("Duration: " + videoMetaData.durationMillis());
             System.out.println("FrameRate: " + videoMetaData.frameRate());
             System.out.println("Job");

             List<TextDetectionResult> labels= textDetectionResponse.textDetections();
             for (TextDetectionResult detectedText: labels) {
                 System.out.println("Confidence: " + detectedText.textDetection().confidence().toString());
                 System.out.println("Id : " + detectedText.textDetection().id());
                 System.out.println("Parent Id: " + detectedText.textDetection().parentId());
                 System.out.println("Type: " + detectedText.textDetection().type());
                 System.out.println("Text: " + detectedText.textDetection().detectedText());
                 System.out.println();
             }

         } while (textDetectionResponse !=null && textDetectionResponse.nextToken() != null);

     } catch(RekognitionException | InterruptedException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }
 // snippet-end:[rekognition.java2.recognize_video_text.main]
}

Python


#Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

    def StartTextDetection(self):
        response=self.rek.start_text_detection(Video={'S3Object': {'Bucket': self.bucket, 'Name': self.video}},
            NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})

        self.startJobId=response['JobId']
        print('Start Job Id: ' + self.startJobId)
  
    def GetTextDetectionResults(self):
        maxResults = 10
        paginationToken = ''
        finished = False

        while finished == False:
            response = self.rek.get_text_detection(JobId=self.startJobId,
                                            MaxResults=maxResults,
                                            NextToken=paginationToken)

            print('Codec: ' + response['VideoMetadata']['Codec'])
            
            print('Duration: ' + str(response['VideoMetadata']['DurationMillis']))
            print('Format: ' + response['VideoMetadata']['Format'])
            print('Frame rate: ' + str(response['VideoMetadata']['FrameRate']))
            print()

            for textDetection in response['TextDetections']:
                text=textDetection['TextDetection']

                print("Timestamp: " + str(textDetection['Timestamp']))
                print("   Text Detected: " + text['DetectedText'])
                print("   Confidence: " +  str(text['Confidence']))
                print ("      Bounding box")
                print ("        Top: " + str(text['Geometry']['BoundingBox']['Top']))
                print ("        Left: " + str(text['Geometry']['BoundingBox']['Left']))
                print ("        Width: " +  str(text['Geometry']['BoundingBox']['Width']))
                print ("        Height: " +  str(text['Geometry']['BoundingBox']['Height']))
                print ("   Type: " + str(text['Type']) )
                print()

            if 'NextToken' in response:
                paginationToken = response['NextToken']
            else:
                finished = True

In the function main, replace the lines:


    analyzer.StartLabelDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetLabelDetectionResults()

with:


    analyzer.StartTextDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetTextDetectionResults()

CLI

Run the following AWS CLI command to start detecting text in a video.


 aws rekognition start-text-detection --video '{"S3Object":{"Bucket":"bucket-name","Name":"video-name"}}' \
 --notification-channel '{"SNSTopicArn":"topic-arn","RoleArn":"role-arn"}' \
 --region region-name --profile profile-name

Update the following values:

Change bucket-name and video-name to the Amazon S3 bucket name and file name that you specified in step 2.
Change region-name to the AWS Region that you're using.
Replace the value of profile-name with the name of your developer profile.
Change topic-arn to the ARN of the Amazon SNS topic that you created in step 3 of Configuring Amazon Rekognition Video.
Change role-arn to the ARN of the IAM service role that you created in step 7 of Configuring Amazon Rekognition Video.

If you are accessing the CLI on a Windows device, use double quotes instead of single quotes and escape the inner double quotes with a backslash (that is, \) to address any parser errors you may encounter. For an example, see the following:


aws rekognition start-text-detection --video \
 "{\"S3Object\":{\"Bucket\":\"bucket-name\",\"Name\":\"video-name\"}}" \
 --notification-channel "{\"SNSTopicArn\":\"topic-arn\",\"RoleArn\":\"role-arn\"}" \
 --region region-name --profile profile-name
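One way to sidestep shell quoting differences is to build the command from a script and pass the arguments as a list, so that no shell has to interpret the JSON. The following Python sketch only assembles the argument list. The function name build_start_text_detection_args is hypothetical, the values are the same placeholders used above, and the commented-out subprocess call assumes the AWS CLI is installed and configured.

```python
import json


def build_start_text_detection_args(bucket, video, topic_arn, role_arn,
                                    region, profile):
    """Build an argv list for `aws rekognition start-text-detection`."""
    # json.dumps produces the JSON text, so no manual escaping is needed.
    video_arg = json.dumps({"S3Object": {"Bucket": bucket, "Name": video}})
    channel_arg = json.dumps({"SNSTopicArn": topic_arn, "RoleArn": role_arn})
    return [
        "aws", "rekognition", "start-text-detection",
        "--video", video_arg,
        "--notification-channel", channel_arg,
        "--region", region,
        "--profile", profile,
    ]


args = build_start_text_detection_args(
    "bucket-name", "video-name", "topic-arn", "role-arn",
    "region-name", "profile-name")
print(args)

# To run the command (requires the AWS CLI and valid credentials):
# import subprocess
# subprocess.run(args, check=True)
```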

After running the preceding code example, copy the returned jobID and provide it to the following GetTextDetection command to get your results, replacing job-id-number with the jobID that you received:


aws rekognition get-text-detection --job-id job-id-number --profile profile-name

Note

If you've already run a video example other than Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK), the code to replace might be different.

Run the code. Text that was detected in the video is shown in a list.

Filters

Filters are optional request parameters that can be used when you call StartTextDetection. Filtering by text region, size, and confidence score gives you additional flexibility to control your text detection output. By using regions of interest, you can limit text detection to the regions that are relevant to you, for example, the bottom third of the frame for graphics or the top-left corner for reading a scoreboard in a soccer game. The word bounding box size filter can be used to avoid small background text that might be noisy or irrelevant. Finally, the word confidence filter lets you remove results that might be unreliable because the text is blurry or smudged.

For information about filter values, see DetectTextFilters.

You can use the following filters:

MinConfidence – Sets the confidence level of word detection. Words with detection confidence below this level are excluded from the result. Values should be between 0 and 100.
MinBoundingBoxWidth – Sets the minimum width of the word bounding box. Words with bounding boxes narrower than this value are excluded from the result. The value is relative to the video frame width.
MinBoundingBoxHeight – Sets the minimum height of the word bounding box. Words with bounding box heights less than this value are excluded from the result. The value is relative to the video frame height.
RegionsOfInterest – Limits detection to a specific region of the frame. The values are relative to the frame's dimensions. For objects only partially within the regions, the response is undefined.
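The filters listed above can be combined in a single Filters request parameter. The following Python sketch assembles such a structure, assuming the request shape of a WordFilter object plus a RegionsOfInterest list of bounding boxes. The helper name build_text_filters and the chosen thresholds are hypothetical, and the range checks reflect the relative (0 to 1) and percentage (0 to 100) values described in the list.

```python
def build_text_filters(min_confidence=None, min_box_width=None,
                       min_box_height=None, regions=()):
    """Assemble a Filters dict for StartTextDetection.

    regions is an iterable of (left, top, width, height) tuples, each
    expressed as a ratio of the frame dimensions (0 to 1).
    """
    word_filter = {}
    if min_confidence is not None:
        if not 0 <= min_confidence <= 100:
            raise ValueError("min_confidence must be between 0 and 100")
        word_filter["MinConfidence"] = float(min_confidence)
    for key, value in (("MinBoundingBoxWidth", min_box_width),
                       ("MinBoundingBoxHeight", min_box_height)):
        if value is not None:
            if not 0 <= value <= 1:
                raise ValueError(key + " must be between 0 and 1")
            word_filter[key] = float(value)

    filters = {}
    if word_filter:
        filters["WordFilter"] = word_filter
    rois = [{"BoundingBox": {"Left": left, "Top": top,
                             "Width": width, "Height": height}}
            for (left, top, width, height) in regions]
    if rois:
        filters["RegionsOfInterest"] = rois
    return filters


# Keep confident, reasonably tall words found in the bottom third of the frame.
filters = build_text_filters(min_confidence=80, min_box_height=0.05,
                             regions=[(0.0, 2 / 3, 1.0, 1 / 3)])
print(filters)

# Hypothetical usage with a boto3 Rekognition client named rek:
# rek.start_text_detection(Video={...}, NotificationChannel={...},
#                          Filters=filters)
```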

GetTextDetection response

GetTextDetection returns an array (TextDetections) that contains information about the text detected in the video. An array element, TextDetectionResult, exists for each time a word or line is detected in the video. The array elements are sorted by time (in milliseconds) since the start of the video.
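Because every word and line gets its own element, it can help to collapse the array into one entry per timestamp. The following Python sketch keeps only LINE elements and formats the millisecond timestamp for display. The helper names and the trimmed sample entries are hypothetical. Only the Timestamp, Type, and DetectedText fields are assumed.

```python
def lines_by_timestamp(text_detections):
    """Map each timestamp (ms) to the LINE texts detected at that time."""
    grouped = {}
    for item in text_detections:
        detection = item["TextDetection"]
        if detection["Type"] == "LINE":
            grouped.setdefault(item["Timestamp"], []).append(
                detection["DetectedText"])
    return grouped


def format_ms(ms):
    """Format milliseconds since the start of the video as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical, trimmed-down entries in the shape of the response array.
sample = [
    {"Timestamp": 967,
     "TextDetection": {"Type": "LINE",
                       "DetectedText": "Twinkle Twinkle Little Star"}},
    {"Timestamp": 967,
     "TextDetection": {"Type": "WORD", "DetectedText": "Twinkle"}},
    {"Timestamp": 61500,
     "TextDetection": {"Type": "LINE", "DetectedText": "How I wonder"}},
]

for timestamp, lines in lines_by_timestamp(sample).items():
    print(format_ms(timestamp), lines)
```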

The following is a partial JSON response from GetTextDetection. In the response, note the following:

Text information – The TextDetectionResult array element contains information about the detected text (TextDetection) and the time that the text was detected in the video (Timestamp).
Paging information – The example shows one page of text detection information. You can specify how many text elements to return in the MaxResults input parameter for GetTextDetection. If more results than MaxResults exist, or there are more results than the default maximum, GetTextDetection returns a token (NextToken) that's used to get the next page of results. For more information, see Getting Amazon Rekognition Video analysis results.
Video information – The response includes information about the video format (VideoMetadata) in each page of information that's returned by GetTextDetection.



{
    "JobStatus": "SUCCEEDED",
    "VideoMetadata": {
        "Codec": "h264",
        "DurationMillis": 174441,
        "Format": "QuickTime / MOV",
        "FrameRate": 29.970029830932617,
        "FrameHeight": 480,
        "FrameWidth": 854
    },
    "TextDetections": [
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle Twinkle Little Star",
                "Type": "LINE",
                "Id": 0,
                "Confidence": 99.91780090332031,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.8337579369544983,
                        "Height": 0.08365312218666077,
                        "Left": 0.08313830941915512,
                        "Top": 0.4663468301296234
                    },
                    "Polygon": [
                        {
                            "X": 0.08313830941915512,
                            "Y": 0.4663468301296234
                        },
                        {
                            "X": 0.9168962240219116,
                            "Y": 0.4674469828605652
                        },
                        {
                            "X": 0.916861355304718,
                            "Y": 0.5511001348495483
                        },
                        {
                            "X": 0.08310343325138092,
                            "Y": 0.5499999523162842
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 1,
                "ParentId": 0,
                "Confidence": 99.98338317871094,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.0833333358168602,
                        "Left": 0.08313817530870438,
                        "Top": 0.46666666865348816
                    },
                    "Polygon": [
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 2,
                "ParentId": 0,
                "Confidence": 99.982666015625,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.08124999701976776,
                        "Left": 0.3454332649707794,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Little",
                "Type": "WORD",
                "Id": 3,
                "ParentId": 0,
                "Confidence": 99.8787612915039,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.16627635061740875,
                        "Height": 0.08124999701976776,
                        "Left": 0.6053864359855652,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Star",
                "Type": "WORD",
                "Id": 4,
                "ParentId": 0,
                "Confidence": 99.82640075683594,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.12997658550739288,
                        "Height": 0.08124999701976776,
                        "Left": 0.7868852615356445,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        }
    ],
    "NextToken": "NiHpGbZFnkM/S8kLcukMni15wb05iKtquu/Mwc+Qg1LVlMjjKNOD0Z0GusSPg7TONLe+OZ3P",
    "TextModelVersion": "3.0"
}
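The NextToken value in the response above is what drives paging. The following Python sketch shows one way to gather every page into a single list. It assumes a callable that accepts JobId, MaxResults, and NextToken keyword arguments, such as the get_text_detection method of a boto3 Rekognition client. The function name collect_text_detections and the fake page data are hypothetical and stand in for the service so that the example runs without AWS access.

```python
def collect_text_detections(get_page, job_id, max_results=10):
    """Gather TextDetections from every page of a text detection job.

    get_page is any callable accepting JobId, MaxResults and (optionally)
    NextToken keyword arguments and returning a response dict.
    """
    detections = []
    kwargs = {"JobId": job_id, "MaxResults": max_results}
    while True:
        page = get_page(**kwargs)
        detections.extend(page.get("TextDetections", []))
        token = page.get("NextToken")
        if not token:
            return detections  # no token means this was the last page
        kwargs["NextToken"] = token


# Stand-in for the service: two pages keyed by the token that requests them.
_fake_pages = {
    None: {"TextDetections": [{"Timestamp": 967}], "NextToken": "page-2"},
    "page-2": {"TextDetections": [{"Timestamp": 1001}]},
}


def fake_get_text_detection(JobId, MaxResults, NextToken=None):
    return _fake_pages[NextToken]


results = collect_text_detections(fake_get_text_detection, "job-id-number")
print(len(results))
```

Omitting NextToken on the first call, rather than sending an empty string, keeps the first request identical to a plain single-page request.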
