AWS Glue 使用 的範例 AWS CLI - AWS SDK 程式碼範例

文件 AWS SDK AWS 範例 SDK 儲存庫中有更多可用的 GitHub 範例。

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

AWS Glue 使用 的範例 AWS CLI

下列程式碼範例示範如何使用 AWS Command Line Interface 搭配 來執行動作和實作常見案例 AWS Glue。

Actions 是大型程式的程式碼摘錄,必須在內容中執行。雖然 動作會示範如何呼叫個別服務函數,但您可以在其相關案例中查看內容中的動作。

每個範例都包含完整原始程式碼的連結,您可以在其中找到如何在內容中設定和執行程式碼的指示。

主題

動作

下列程式碼範例示範如何使用 batch-stop-job-run

AWS CLI

停止任務執行

下列batch-stop-job-run範例會停止任務執行。

aws glue batch-stop-job-run \ --job-name "my-testing-job" \ --job-run-id jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f

輸出:

{ "SuccessfulSubmissions": [ { "JobName": "my-testing-job", "JobRunId": "jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f" } ], "Errors": [], "ResponseMetadata": { "RequestId": "66bd6b90-01db-44ab-95b9-6aeff0e73d88", "HTTPStatusCode": 200, "HTTPHeaders": { "date": "Fri, 16 Oct 2020 20:54:51 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "148", "connection": "keep-alive", "x-amzn-requestid": "66bd6b90-01db-44ab-95b9-6aeff0e73d88" }, "RetryAttempts": 0 } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 BatchStopJobRun

下列程式碼範例示範如何使用 create-connection

AWS CLI

建立 Glue AWS 資料存放區的連線

下列create-connection範例會在 AWS Glue Data Catalog 中建立連線,提供 Kafka 資料存放區的連線資訊。

aws glue create-connection \ --connection-input '{ \ "Name":"conn-kafka-custom", \ "Description":"kafka connection with ssl to custom kafka", \ "ConnectionType":"KAFKA", \ "ConnectionProperties":{ \ "KAFKA_BOOTSTRAP_SERVERS":"<Kafka-broker-server-url>:<SSL-Port>", \ "KAFKA_SSL_ENABLED":"true", \ "KAFKA_CUSTOM_CERT": "s3://bucket/prefix/cert-file.pem" \ }, \ "PhysicalConnectionRequirements":{ \ "SubnetId":"subnet-1234", \ "SecurityGroupIdList":["sg-1234"], \ "AvailabilityZone":"us-east-1a"} \ }' \ --region us-east-1 --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的 Glue Data Catalog 中的定義連線AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 CreateConnection

下列程式碼範例示範如何使用 create-database

AWS CLI

建立資料庫

下列create-database範例會在 Glue Data Catalog AWS 中建立資料庫。

aws glue create-database \ --database-input "{\"Name\":\"tempdb\"}" \ --profile my_profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的在 Data Catalog 中定義資料庫

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 CreateDatabase

下列程式碼範例示範如何使用 create-job

AWS CLI

建立工作以轉換資料

下列 create-job 範例會建立串流工作,它可執行存放在 S3 中的指令碼。

aws glue create-job \ --name my-testing-job \ --role AWSGlueServiceRoleDefault \ --command '{ \ "Name": "gluestreaming", \ "ScriptLocation": "s3://DOC-EXAMPLE-BUCKET/folder/" \ }' \ --region us-east-1 \ --output json \ --default-arguments '{ \ "--job-language":"scala", \ "--class":"GlueApp" \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

test_script.scala 的內容:

import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }

輸出:

{ "Name": "my-testing-job" }

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的在 Glue 中編寫任務AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 CreateJob

下列程式碼範例示範如何使用 create-table

AWS CLI

範例 1:為 Kinesis 資料串流建立資料表

下列create-table範例會在 Glue Data Catalog AWS 中建立描述 Kinesis 資料串流的資料表。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"test-kinesis-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"my-testing-stream", \ "Parameters":{ \ "typeOfData":"kinesis","streamName":"my-testing-stream", \ "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com" \ }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的 Glue Data Catalog 中的定義資料表AWS

範例 2:為 Kafka 資料存放區建立資料表

下列create-table範例會在 Glue Data Catalog AWS 中建立描述 Kafka 資料存放區的資料表。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"test-kafka-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"glue-topic", \ "Parameters":{ \ "typeOfData":"kafka","topicName":"glue-topic", \ "connectionName":"my-kafka-connection" }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"} \ }, \ "Parameters":{ \ "separatorChar":","} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的 Glue Data Catalog 中的定義資料表AWS

範例 3:為 a AWS S3 資料存放區建立資料表

下列create-table範例會在 Glue Data Catalog AWS 中建立描述 AWS Simple Storage Service (AWS S3) 資料存放區的資料表。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"s3-output", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"s1", "Type":"string"}, \ {"Name":"s2", "Type":"int"}, \ {"Name":"s3", "Type":"string"} ], \ "Location":"s3://bucket-path/", \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的 Glue Data Catalog 中的定義資料表AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 CreateTable

下列程式碼範例示範如何使用 delete-job

AWS CLI

若要刪除工作

下列 delete-job 範例會刪除不再需要的工作。

aws glue delete-job \ --job-name my-testing-job

輸出:

{ "JobName": "my-testing-job" }

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的使用 Glue 主控台上的任務AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 DeleteJob

下列程式碼範例示範如何使用 get-databases

AWS CLI

在 Glue Data Catalog AWS 中列出部分或全部資料庫的定義

下列 get-databases 範例會傳回 Data Catalog 中的資料庫相關資訊。

aws glue get-databases

輸出:

{ "DatabaseList": [ { "Name": "default", "Description": "Default Hive database", "LocationUri": "file:/spark-warehouse", "CreateTime": 1602084052.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "flights-db", "CreateTime": 1587072847.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "legislators", "CreateTime": 1601415625.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "tempdb", "CreateTime": 1601498566.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" } ] }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的在 Data Catalog 中定義資料庫

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetDatabases

下列程式碼範例示範如何使用 get-job-run

AWS CLI

取得工作執行的相關資訊

以下 get-job-run 範例會擷取工作執行的相關資訊。

aws glue get-job-run \ --job-name "Combine legistators data" \ --run-id jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e

輸出:

{ "JobRun": { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "Combine legistators data", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetJobRun

下列程式碼範例示範如何使用 get-job-runs

AWS CLI

取得工作的全部工作執行相關資訊

以下 get-job-runs 範例會擷取工作的工作執行相關資訊。

aws glue get-job-runs \ --job-name "my-testing-job"

輸出:

{ "JobRuns": [ { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "my-testing-job", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_2", "Attempt": 2, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "JobName": "my-testing-job", "StartedOn": 1602811168.496, "LastModifiedOn": 1602811282.39, "CompletedOn": 1602811282.39, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 021AAB703DB20A2D; S3 Extended Request ID: teZk24Y09TkXzBvMPG502L5VJBhe9DJuWA9/TXtuGOqfByajkfL/Tlqt5JBGdEGpigAqzdMDM/U=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 110, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "Attempt": 1, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f", "JobName": "my-testing-job", "StartedOn": 1602811020.518, "LastModifiedOn": 1602811138.364, "CompletedOn": 1602811138.364, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 2671D37856AE7ABB; S3 Extended Request ID: RLJCJw20brV+PpC6GpORahyF2fp9flB5SSb2bTGPnUSPVizLXRl1PN3QZldb+v1o9qRVktNYbW8=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 113, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } ] }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetJobRuns

下列程式碼範例示範如何使用 get-job

AWS CLI

擷取工作相關資訊

以下 get-job 範例會擷取工作相關資訊。

aws glue get-job \ --job-name my-testing-job

輸出:

{ "Job": { "Name": "my-testing-job", "Role": "Glue_DefaultRole", "CreatedOn": 1602805698.167, "LastModifiedOn": 1602805698.167, "ExecutionProperty": { "MaxConcurrentRuns": 1 }, "Command": { "Name": "gluestreaming", "ScriptLocation": "s3://janetst-bucket-01/Scripts/test_script.scala", "PythonVersion": "2" }, "DefaultArguments": { "--class": "GlueApp", "--job-language": "scala" }, "MaxRetries": 0, "AllocatedCapacity": 10, "MaxCapacity": 10.0, "GlueVersion": "1.0" } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetJob

下列程式碼範例示範如何使用 get-plan

AWS CLI

取得產生的程式碼,以將資料從來源資料表映射至目標資料表

以下內容會get-plan擷取產生的程式碼,用於將資料欄從資料來源對應至資料目標。

aws glue get-plan --mapping '[ \ { \ "SourcePath":"sensorid", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"sensorid", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"currenttemperature", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"currenttemperature", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"status", \ "SourceTable":"anything", \ "SourceType":"string", \ "TargetPath":"status", \ "TargetTable":"anything", \ "TargetType":"string" \ }]' \ --source '{ \ "DatabaseName":"tempdb", \ "TableName":"s3-source" \ }' \ --sinks '[ \ { \ "DatabaseName":"tempdb", \ "TableName":"my-s3-sink" \ }]' --language "scala" --endpoint https://glue.us-east-1.amazonaws.com --output "text"

輸出:

import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的在 Glue 中編輯指令碼AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetPlan

下列程式碼範例示範如何使用 get-tables

AWS CLI

列出指定資料庫中部分或全部資料表的定義

下列 get-tables 範例會傳回指定資料庫中資料表的相關資訊。

aws glue get-tables --database-name 'tempdb'

輸出:

{ "TableList": [ { "Name": "my-s3-sink", "DatabaseName": "tempdb", "CreateTime": 1602730539.0, "UpdateTime": 1602730539.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/test-s3-output/", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "s3-source", "DatabaseName": "tempdb", "CreateTime": 1602730658.0, "UpdateTime": 1602730658.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/", "Compressed": false, "NumberOfBuckets": 0, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "test-kinesis-input", "DatabaseName": "tempdb", "CreateTime": 1601507001.0, "UpdateTime": 1601507001.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "my-testing-stream", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "Parameters": { "kinesisUrl": "https://kinesis.us-east-1.amazonaws.com", "streamName": "my-testing-stream", "typeOfData": "kinesis" }, "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" } ] }

如需詳細資訊,請參閱 Glue AWS 開發人員指南中的 Glue Data Catalog 中的定義資料表AWS

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 GetTables

下列程式碼範例示範如何使用 start-crawler

AWS CLI

啟動爬蟲程式

以下 start-crawler 範例會啟動爬蟲程式。

aws glue start-crawler --name my-crawler

輸出:

None

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的定義爬蟲程式

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 StartCrawler

下列程式碼範例示範如何使用 start-job-run

AWS CLI

開始執行工作

以下 start-job-run 範例會啟動工作。

aws glue start-job-run \ --job-name my-job

輸出:

{ "JobRunId": "jr_22208b1f44eb5376a60569d4b21dd20fcb8621e1a366b4e7b2494af764b82ded" }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的授權工作

  • 如需 API 詳細資訊,請參閱 AWS CLI 命令參考中的 StartJobRun