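More AWS SDK examples are available in the [AWS Doc SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) GitHub repo.

# AWS Glue examples using SDK for Kotlin

The following code examples show you how to perform actions and implement common scenarios by using the AWS SDK for Kotlin with AWS Glue.

*Basics* are code examples that show you how to perform the essential operations within a service.

*Actions* are code excerpts from larger programs and must be run in context. While actions show you how to call individual service functions, you can see actions in context in their related scenarios.

Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.

**Topics**
+ [Basics](#basics)
+ [Actions](#actions)

## Basics

### Learn the basics

The following code example shows how to:
+ Create a crawler that crawls a public Amazon S3 bucket and generates a database of CSV-formatted metadata.
+ List information about databases and tables in your AWS Glue Data Catalog.
+ Create a job to extract CSV data from the S3 bucket, transform the data, and load JSON-formatted output into another S3 bucket.
+ List information about job runs, view transformed data, and clean up resources.

For more information, see [Tutorial: Getting started with AWS Glue Studio](https://docs.aws.amazon.com/glue/latest/ug/tutorial-create-job.html).

**SDK for Kotlin**

There's more on GitHub. Find the complete example and learn how to set up and run in the [AWS Code Examples Repository](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/kotlin/services/glue#code-examples).

```kotlin
import aws.sdk.kotlin.services.glue.GlueClient
import aws.sdk.kotlin.services.glue.model.CrawlerTargets
import aws.sdk.kotlin.services.glue.model.CreateCrawlerRequest
import aws.sdk.kotlin.services.glue.model.CreateDatabaseRequest
import aws.sdk.kotlin.services.glue.model.CreateJobRequest
import aws.sdk.kotlin.services.glue.model.DatabaseInput
import aws.sdk.kotlin.services.glue.model.DeleteCrawlerRequest
import aws.sdk.kotlin.services.glue.model.DeleteDatabaseRequest
import aws.sdk.kotlin.services.glue.model.DeleteJobRequest
import aws.sdk.kotlin.services.glue.model.GetCrawlerRequest
import aws.sdk.kotlin.services.glue.model.GetDatabaseRequest
import aws.sdk.kotlin.services.glue.model.GetJobRunsRequest
import aws.sdk.kotlin.services.glue.model.GetJobsRequest
import aws.sdk.kotlin.services.glue.model.GetTablesRequest
import aws.sdk.kotlin.services.glue.model.JobCommand
import aws.sdk.kotlin.services.glue.model.S3Target
import aws.sdk.kotlin.services.glue.model.StartCrawlerRequest
import aws.sdk.kotlin.services.glue.model.StartJobRunRequest
import aws.sdk.kotlin.services.glue.model.WorkerType
import java.util.concurrent.TimeUnit
import kotlin.system.exitProcess

suspend fun main(args: Array<String>) {
    val usage = """
    Usage:
        <iam> <s3Path> <cron> <dbName> <crawlerName> <jobName> <scriptLocation> <locationUri>

    Where:
        iam - The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that has AWS Glue and Amazon Simple Storage Service (Amazon S3) permissions.
        s3Path - The Amazon Simple Storage Service (Amazon S3) target that contains data (for example, CSV data).
        cron - A cron expression used to specify the schedule (for example, cron(15 12 * * ? *)).
        dbName - The database name.
        crawlerName - The name of the crawler.
        jobName - The name you assign to this job definition.
        scriptLocation - Specifies the Amazon S3 path to a script that runs a job.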
        locationUri - Specifies the location of the database.
    """

    if (args.size != 8) {
        println(usage)
        exitProcess(1)
    }

    val iam = args[0]
    val s3Path = args[1]
    val cron = args[2]
    val dbName = args[3]
    val crawlerName = args[4]
    val jobName = args[5]
    val scriptLocation = args[6]
    val locationUri = args[7]

    println("About to start the AWS Glue Scenario")
    createDatabase(dbName, locationUri)
    createCrawler(iam, s3Path, cron, dbName, crawlerName)
    getCrawler(crawlerName)
    startCrawler(crawlerName)

    // The crawl runs asynchronously, so tables it creates may not be
    // listed until the crawl has finished.
    getDatabase(dbName)
    getGlueTables(dbName)

    createJob(jobName, iam, scriptLocation)
    startJob(jobName)
    getJobs()
    getJobRuns(jobName)
    deleteJob(jobName)

    // A crawler cannot be deleted while it is running.
    println("*** Wait for 5 MIN so the $crawlerName is ready to be deleted")
    TimeUnit.MINUTES.sleep(5)
    deleteMyDatabase(dbName)
    deleteCrawler(crawlerName)
}

suspend fun createDatabase(
    dbName: String?,
    locationUriVal: String?,
) {
    val input = DatabaseInput {
        description = "Built with the AWS SDK for Kotlin"
        name = dbName
        locationUri = locationUriVal
    }

    val request = CreateDatabaseRequest {
        databaseInput = input
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.createDatabase(request)
        println("The database was successfully created")
    }
}

suspend fun createCrawler(
    iam: String?,
    s3Path: String?,
    cron: String?,
    dbName: String?,
    crawlerName: String,
) {
    val s3Target = S3Target {
        path = s3Path
    }

    // A crawler can have several S3 targets; this example uses one.
    val targetList = listOf(s3Target)

    val targetOb = CrawlerTargets {
        s3Targets = targetList
    }

    val crawlerRequest = CreateCrawlerRequest {
        databaseName = dbName
        name = crawlerName
        description = "Created by the AWS Glue Kotlin API"
        targets = targetOb
        role = iam
        schedule = cron
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.createCrawler(crawlerRequest)
        println("$crawlerName was successfully created")
    }
}

suspend fun getCrawler(crawlerName: String?) {
    val request = GetCrawlerRequest {
        name = crawlerName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getCrawler(request)
        val role = response.crawler?.role
        println("The role associated with this crawler is $role")
    }
}

suspend fun startCrawler(crawlerName: String) {
    val crawlerRequest = StartCrawlerRequest {
        name = crawlerName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.startCrawler(crawlerRequest)
        println("$crawlerName was successfully started.")
    }
}

suspend fun getDatabase(databaseName: String?) {
    val request = GetDatabaseRequest {
        name = databaseName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getDatabase(request)
        val dbDesc = response.database?.description
        println("The database description is $dbDesc")
    }
}

suspend fun getGlueTables(dbName: String?) {
    val tableRequest = GetTablesRequest {
        databaseName = dbName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getTables(tableRequest)
        response.tableList?.forEach { tableName ->
            println("Table name is ${tableName.name}")
        }
    }
}

suspend fun startJob(jobNameVal: String?) {
    val runRequest = StartJobRunRequest {
        workerType = WorkerType.G1X
        numberOfWorkers = 10
        jobName = jobNameVal
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.startJobRun(runRequest)
        println("The job run Id is ${response.jobRunId}")
    }
}

suspend fun createJob(
    jobName: String,
    iam: String?,
    scriptLocationVal: String?,
) {
    val commandOb = JobCommand {
        pythonVersion = "3"
        // For an Apache Spark ETL job, the command name must be "glueetl".
        name = "glueetl"
        scriptLocation = scriptLocationVal
    }

    val jobRequest = CreateJobRequest {
        description = "A Job created by using the AWS SDK for Kotlin"
        glueVersion = "2.0"
        workerType = WorkerType.G1X
        numberOfWorkers = 10
        name = jobName
        role = iam
        command = commandOb
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.createJob(jobRequest)
        println("$jobName was successfully created.")
    }
}

suspend fun getJobs() {
    val request = GetJobsRequest {
        maxResults = 10
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getJobs(request)
        response.jobs?.forEach { job ->
            println("Job name is ${job.name}")
        }
    }
}

suspend fun getJobRuns(jobNameVal: String?) {
    val request = GetJobRunsRequest {
        jobName = jobNameVal
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getJobRuns(request)
        response.jobRuns?.forEach { job ->
            println("Job name is ${job.jobName}")
        }
    }
}

suspend fun deleteJob(jobNameVal: String) {
    val jobRequest = DeleteJobRequest {
        jobName = jobNameVal
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.deleteJob(jobRequest)
        println("$jobNameVal was successfully deleted")
    }
}

suspend fun deleteMyDatabase(databaseName: String) {
    val request = DeleteDatabaseRequest {
        name = databaseName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.deleteDatabase(request)
        println("$databaseName was successfully deleted")
    }
}

suspend fun deleteCrawler(crawlerName: String) {
    val request = DeleteCrawlerRequest {
        name = crawlerName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        glueClient.deleteCrawler(request)
        println("$crawlerName was deleted")
    }
}
```

+ For API details, see the following topics in *AWS SDK for Kotlin API reference*.
  + [CreateCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [CreateJob](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [DeleteCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [DeleteDatabase](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [DeleteJob](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [DeleteTable](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetDatabase](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetDatabases](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetJob](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetJobRun](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetJobRuns](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [GetTables](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [ListJobs](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [StartCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html)
  + [StartJobRun](https://sdk.amazonaws.com/kotlin/api/latest/index.html)

## Actions

### `CreateCrawler`

The following code example shows how to use `CreateCrawler`.

**SDK for Kotlin**

There's more on GitHub. Find the complete example and learn how to set up and run in the [AWS Code Examples Repository](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/kotlin/services/glue#code-examples).

```kotlin
import aws.sdk.kotlin.services.glue.GlueClient
import aws.sdk.kotlin.services.glue.model.CrawlerTargets
import aws.sdk.kotlin.services.glue.model.CreateCrawlerRequest
import aws.sdk.kotlin.services.glue.model.S3Target

suspend fun createGlueCrawler(
    iam: String?,
    s3Path: String?,
    cron: String?,
    dbName: String?,
    crawlerName: String,
) {
    val s3Target = S3Target {
        path = s3Path
    }

    // Add the S3Target to a list.
    val targetList = mutableListOf<S3Target>()
    targetList.add(s3Target)

    val targetOb = CrawlerTargets {
        s3Targets = targetList
    }

    val request = CreateCrawlerRequest {
        databaseName = dbName
        name = crawlerName
        description = "Created by the AWS Glue Kotlin API"
        targets = targetOb
        role = iam
        schedule = cron
    }

    GlueClient.fromEnvironment { region = "us-west-2" }.use { glueClient ->
        glueClient.createCrawler(request)
        println("$crawlerName was successfully created")
    }
}
```

+ For API details, see [CreateCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html) in *AWS SDK for Kotlin API reference*.

### `GetCrawler`

The following code example shows how to use `GetCrawler`.

**SDK for Kotlin**

There's more on GitHub. Find the complete example and learn how to set up and run in the [AWS Code Examples Repository](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/kotlin/services/glue#code-examples).

```kotlin
import aws.sdk.kotlin.services.glue.GlueClient
import aws.sdk.kotlin.services.glue.model.GetCrawlerRequest

suspend fun getSpecificCrawler(crawlerName: String?) {
    val request = GetCrawlerRequest {
        name = crawlerName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getCrawler(request)
        val role = response.crawler?.role
        println("The role associated with this crawler is $role")
    }
}
```

+ For API details, see [GetCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html) in *AWS SDK for Kotlin API reference*.

### `GetDatabase`

The following code example shows how to use `GetDatabase`.

**SDK for Kotlin**

There's more on GitHub. Find the complete example and learn how to set up and run in the [AWS Code Examples Repository](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/kotlin/services/glue#code-examples).

```kotlin
import aws.sdk.kotlin.services.glue.GlueClient
import aws.sdk.kotlin.services.glue.model.GetDatabaseRequest

suspend fun getSpecificDatabase(databaseName: String?) {
    val request = GetDatabaseRequest {
        name = databaseName
    }

    GlueClient.fromEnvironment { region = "us-east-1" }.use { glueClient ->
        val response = glueClient.getDatabase(request)
        val dbDesc = response.database?.description
        println("The database description is $dbDesc")
    }
}
```

+ For API details, see [GetDatabase](https://sdk.amazonaws.com/kotlin/api/latest/index.html) in *AWS SDK for Kotlin API reference*.

### `StartCrawler`

The following code example shows how to use `StartCrawler`.

**SDK for Kotlin**

There's more on GitHub. Find the complete example and learn how to set up and run in the [AWS Code Examples Repository](https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/kotlin/services/glue#code-examples).

```kotlin
import aws.sdk.kotlin.services.glue.GlueClient
import aws.sdk.kotlin.services.glue.model.StartCrawlerRequest

suspend fun startSpecificCrawler(crawlerName: String?) {
    val request = StartCrawlerRequest {
        name = crawlerName
    }

    GlueClient.fromEnvironment { region = "us-west-2" }.use { glueClient ->
        glueClient.startCrawler(request)
        println("$crawlerName was successfully started.")
    }
}
```

+ For API details, see [StartCrawler](https://sdk.amazonaws.com/kotlin/api/latest/index.html) in *AWS SDK for Kotlin API reference*.
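`StartCrawler` only starts a crawl; the crawl itself runs asynchronously. A minimal sketch of one way to wait for it, assuming you would rather poll than sleep for a fixed time: a generic `pollUntil` helper (a hypothetical name, not part of the AWS SDK) that repeatedly fetches a state, stops once a condition holds, and throws `TimeoutCancellationException` if the timeout elapses first. The 15-second interval and 15-minute timeout defaults are illustrative assumptions. The helper depends only on `kotlinx.coroutines`.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.withTimeout
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper (not an AWS SDK API): calls fetchState until isDone
// returns true, suspending for pollInterval between calls. withTimeout throws
// TimeoutCancellationException if the condition is not met within timeout.
suspend fun <S> pollUntil(
    pollInterval: Duration = 15.seconds,
    timeout: Duration = 15.minutes,
    fetchState: suspend () -> S,
    isDone: (S) -> Boolean,
): S = withTimeout(timeout) {
    var state = fetchState()
    while (!isDone(state)) {
        // delay is also the point where a timeout can cancel the loop.
        delay(pollInterval)
        state = fetchState()
    }
    // Return the state that satisfied the condition.
    state
}
```

To wait for a crawler inside a `GlueClient.use { glueClient -> ... }` block, you could pass `fetchState = { glueClient.getCrawler(GetCrawlerRequest { name = crawlerName }).crawler?.state }` and `isDone = { it == CrawlerState.Ready }`. This assumes the crawler has already left the `READY` state when polling begins. Keeping the Glue call in a lambda leaves the timing logic reusable for other asynchronous resources, such as job runs.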