AWS Glue examples using AWS CLI


The following code examples show you how to perform actions and implement common scenarios by using the AWS Command Line Interface with AWS Glue.

Actions are code excerpts from larger programs and must be run in context. While actions show you how to call individual service functions, you can see actions in context in their related scenarios.

Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.

Topics

Actions

The following code example shows how to use batch-stop-job-run.

AWS CLI

To stop a job run

The following batch-stop-job-run example stops a job run.

aws glue batch-stop-job-run \
    --job-name "my-testing-job" \
    --job-run-id jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f

Output:

{ "SuccessfulSubmissions": [ { "JobName": "my-testing-job", "JobRunId": "jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f" } ], "Errors": [], "ResponseMetadata": { "RequestId": "66bd6b90-01db-44ab-95b9-6aeff0e73d88", "HTTPStatusCode": 200, "HTTPHeaders": { "date": "Fri, 16 Oct 2020 20:54:51 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "148", "connection": "keep-alive", "x-amzn-requestid": "66bd6b90-01db-44ab-95b9-6aeff0e73d88" }, "RetryAttempts": 0 } }

For more information, see Job Runs in the AWS Glue Developer Guide.

  • For API details, see BatchStopJobRun in AWS CLI Command Reference.

The following code example shows how to use create-connection.

AWS CLI

To create a connection for AWS Glue data stores

The following create-connection example creates a connection in the AWS Glue Data Catalog that provides connection information for a Kafka data store.

aws glue create-connection \
    --connection-input '{
        "Name":"conn-kafka-custom",
        "Description":"kafka connection with ssl to custom kafka",
        "ConnectionType":"KAFKA",
        "ConnectionProperties":{
            "KAFKA_BOOTSTRAP_SERVERS":"<Kafka-broker-server-url>:<SSL-Port>",
            "KAFKA_SSL_ENABLED":"true",
            "KAFKA_CUSTOM_CERT": "s3://bucket/prefix/cert-file.pem"
        },
        "PhysicalConnectionRequirements":{
            "SubnetId":"subnet-1234",
            "SecurityGroupIdList":["sg-1234"],
            "AvailabilityZone":"us-east-1a"
        }
    }' \
    --region us-east-1 --endpoint-url https://glue.us-east-1.amazonaws.com

This command produces no output.
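
Because create-connection returns nothing on success, an optional way to confirm the result is to read the connection back with get-connection (not part of the original example):

aws glue get-connection \
    --name conn-kafka-custom \
    --region us-east-1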

For more information, see Defining Connections in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

  • For API details, see CreateConnection in AWS CLI Command Reference.

The following code example shows how to use create-database.

AWS CLI

To create a database

The following create-database example creates a database in the AWS Glue Data Catalog.

aws glue create-database \
    --database-input "{\"Name\":\"tempdb\"}" \
    --profile my_profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

This command produces no output.
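
As an optional check, you can read the new database definition back with get-database (not part of the original example):

aws glue get-database \
    --name tempdb \
    --profile my_profile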

For more information, see Defining a Database in Your Data Catalog in the AWS Glue Developer Guide.

  • For API details, see CreateDatabase in AWS CLI Command Reference.

The following code example shows how to use create-job.

AWS CLI

To create a job to transform data

The following create-job example creates a streaming job that runs a script stored in S3.

aws glue create-job \
    --name my-testing-job \
    --role AWSGlueServiceRoleDefault \
    --command '{
        "Name": "gluestreaming",
        "ScriptLocation": "s3://DOC-EXAMPLE-BUCKET/folder/"
    }' \
    --region us-east-1 \
    --output json \
    --default-arguments '{
        "--job-language":"scala",
        "--class":"GlueApp"
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

Contents of test_script.scala:

import com.amazonaws.services.glue.ChoiceOption
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.MappingSpec
import com.amazonaws.services.glue.ResolveSpec
import com.amazonaws.services.glue.errors.CallSite
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext
import scala.collection.JavaConverters._

object GlueApp {
  def main(sysArgs: Array[String]) {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    // @params: [JOB_NAME]
    val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)
    // @type: DataSource
    // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"]
    // @return: datasource0
    // @inputs: []
    val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame()
    // @type: ApplyMapping
    // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"]
    // @return: applymapping1
    // @inputs: [frame = datasource0]
    val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1")
    // @type: SelectFields
    // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"]
    // @return: selectfields2
    // @inputs: [frame = applymapping1]
    val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2")
    // @type: ResolveChoice
    // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"]
    // @return: resolvechoice3
    // @inputs: [frame = selectfields2]
    val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3")
    // @type: DataSink
    // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"]
    // @return: datasink4
    // @inputs: [frame = resolvechoice3]
    val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3)
    Job.commit()
  }
}

Output:

{ "Name": "my-testing-job" }

For more information, see Authoring Jobs in AWS Glue in the AWS Glue Developer Guide.

  • For API details, see CreateJob in AWS CLI Command Reference.

The following code example shows how to use create-table.

AWS CLI

Example 1: To create a table for a Kinesis data stream

The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kinesis data stream.

aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"test-kinesis-input",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"sensorid", "Type":"int"},
                {"Name":"currenttemperature", "Type":"int"},
                {"Name":"status", "Type":"string"}
            ],
            "Location":"my-testing-stream",
            "Parameters":{
                "typeOfData":"kinesis",
                "streamName":"my-testing-stream",
                "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com"
            },
            "SerdeInfo":{
                "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"
            }
        },
        "Parameters":{
            "classification":"json"
        }
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

This command produces no output.
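
As an optional check, you can read the new table definition back with get-table (not part of the original example):

aws glue get-table \
    --database-name tempdb \
    --name test-kinesis-input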

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

Example 2: To create a table for a Kafka data store

The following create-table example creates a table in the AWS Glue Data Catalog that describes a Kafka data store.

aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"test-kafka-input",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"sensorid", "Type":"int"},
                {"Name":"currenttemperature", "Type":"int"},
                {"Name":"status", "Type":"string"}
            ],
            "Location":"glue-topic",
            "Parameters":{
                "typeOfData":"kafka",
                "topicName":"glue-topic",
                "connectionName":"my-kafka-connection"
            },
            "SerdeInfo":{
                "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"
            }
        },
        "Parameters":{
            "separatorChar":","
        }
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

This command produces no output.

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

Example 3: To create a table for an AWS S3 data store

The following create-table example creates a table in the AWS Glue Data Catalog that describes an AWS Simple Storage Service (AWS S3) data store.

aws glue create-table \
    --database-name tempdb \
    --table-input '{
        "Name":"s3-output",
        "StorageDescriptor":{
            "Columns":[
                {"Name":"s1", "Type":"string"},
                {"Name":"s2", "Type":"int"},
                {"Name":"s3", "Type":"string"}
            ],
            "Location":"s3://bucket-path/",
            "SerdeInfo":{
                "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"
            }
        },
        "Parameters":{
            "classification":"json"
        }
    }' \
    --profile my-profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

This command produces no output.

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

  • For API details, see CreateTable in AWS CLI Command Reference.

The following code example shows how to use delete-job.

AWS CLI

To delete a job

The following delete-job example deletes a job that is no longer needed.

aws glue delete-job \
    --job-name my-testing-job

Output:

{ "JobName": "my-testing-job" }

For more information, see Working with Jobs on the AWS Glue Console in the AWS Glue Developer Guide.

  • For API details, see DeleteJob in AWS CLI Command Reference.

The following code example shows how to use get-databases.

AWS CLI

To list the definitions of some or all of the databases in the AWS Glue Data Catalog

The following get-databases example returns information about the databases in the Data Catalog.

aws glue get-databases

Output:

{ "DatabaseList": [ { "Name": "default", "Description": "Default Hive database", "LocationUri": "file:/spark-warehouse", "CreateTime": 1602084052.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "flights-db", "CreateTime": 1587072847.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "legislators", "CreateTime": 1601415625.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "tempdb", "CreateTime": 1601498566.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" } ] }

For more information, see Defining a Database in Your Data Catalog in the AWS Glue Developer Guide.

  • For API details, see GetDatabases in AWS CLI Command Reference.

The following code example shows how to use get-job-run.

AWS CLI

To get information about a job run

The following get-job-run example retrieves information about a job run.

aws glue get-job-run \
    --job-name "Combine legistators data" \
    --run-id jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e

Output:

{ "JobRun": { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "Combine legistators data", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } }

For more information, see Job Runs in the AWS Glue Developer Guide.

  • For API details, see GetJobRun in AWS CLI Command Reference.

The following code example shows how to use get-job-runs.

AWS CLI

To get information about all job runs for a job

The following get-job-runs example retrieves information about the job runs for a job.

aws glue get-job-runs \
    --job-name "my-testing-job"

Output:

{ "JobRuns": [ { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "my-testing-job", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_2", "Attempt": 2, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "JobName": "my-testing-job", "StartedOn": 1602811168.496, "LastModifiedOn": 1602811282.39, "CompletedOn": 1602811282.39, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 021AAB703DB20A2D; S3 Extended Request ID: teZk24Y09TkXzBvMPG502L5VJBhe9DJuWA9/TXtuGOqfByajkfL/Tlqt5JBGdEGpigAqzdMDM/U=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 110, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "Attempt": 1, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f", "JobName": "my-testing-job", "StartedOn": 1602811020.518, "LastModifiedOn": 1602811138.364, "CompletedOn": 1602811138.364, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 2671D37856AE7ABB; S3 Extended Request ID: RLJCJw20brV+PpC6GpORahyF2fp9flB5SSb2bTGPnUSPVizLXRl1PN3QZldb+v1o9qRVktNYbW8=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 113, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } ] }

For more information, see Job Runs in the AWS Glue Developer Guide.

  • For API details, see GetJobRuns in AWS CLI Command Reference.

The following code example shows how to use get-job.

AWS CLI

To retrieve information about a job

The following get-job example retrieves information about a job.

aws glue get-job \
    --job-name my-testing-job

Output:

{ "Job": { "Name": "my-testing-job", "Role": "Glue_DefaultRole", "CreatedOn": 1602805698.167, "LastModifiedOn": 1602805698.167, "ExecutionProperty": { "MaxConcurrentRuns": 1 }, "Command": { "Name": "gluestreaming", "ScriptLocation": "s3://janetst-bucket-01/Scripts/test_script.scala", "PythonVersion": "2" }, "DefaultArguments": { "--class": "GlueApp", "--job-language": "scala" }, "MaxRetries": 0, "AllocatedCapacity": 10, "MaxCapacity": 10.0, "GlueVersion": "1.0" } }

For more information, see Jobs in the AWS Glue Developer Guide.

  • For API details, see GetJob in AWS CLI Command Reference.

The following code example shows how to use get-plan.

AWS CLI

To get the generated code for mapping data from source tables to target tables

The following get-plan example retrieves the generated code for mapping columns from the data source to the data target.

aws glue get-plan \
    --mapping '[
        {
            "SourcePath":"sensorid",
            "SourceTable":"anything",
            "SourceType":"int",
            "TargetPath":"sensorid",
            "TargetTable":"anything",
            "TargetType":"int"
        },
        {
            "SourcePath":"currenttemperature",
            "SourceTable":"anything",
            "SourceType":"int",
            "TargetPath":"currenttemperature",
            "TargetTable":"anything",
            "TargetType":"int"
        },
        {
            "SourcePath":"status",
            "SourceTable":"anything",
            "SourceType":"string",
            "TargetPath":"status",
            "TargetTable":"anything",
            "TargetType":"string"
        }
    ]' \
    --source '{
        "DatabaseName":"tempdb",
        "TableName":"s3-source"
    }' \
    --sinks '[
        {
            "DatabaseName":"tempdb",
            "TableName":"my-s3-sink"
        }
    ]' \
    --language "scala" \
    --endpoint-url https://glue.us-east-1.amazonaws.com \
    --output "text"

Output:

import com.amazonaws.services.glue.ChoiceOption
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.MappingSpec
import com.amazonaws.services.glue.ResolveSpec
import com.amazonaws.services.glue.errors.CallSite
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext
import scala.collection.JavaConverters._

object GlueApp {
  def main(sysArgs: Array[String]) {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    // @params: [JOB_NAME]
    val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)
    // @type: DataSource
    // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"]
    // @return: datasource0
    // @inputs: []
    val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame()
    // @type: ApplyMapping
    // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"]
    // @return: applymapping1
    // @inputs: [frame = datasource0]
    val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1")
    // @type: SelectFields
    // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"]
    // @return: selectfields2
    // @inputs: [frame = applymapping1]
    val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2")
    // @type: ResolveChoice
    // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"]
    // @return: resolvechoice3
    // @inputs: [frame = selectfields2]
    val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3")
    // @type: DataSink
    // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"]
    // @return: datasink4
    // @inputs: [frame = resolvechoice3]
    val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3)
    Job.commit()
  }
}

For more information, see Editing Scripts in AWS Glue in the AWS Glue Developer Guide.

  • For API details, see GetPlan in AWS CLI Command Reference.

The following code example shows how to use get-tables.

AWS CLI

To list the definitions of some or all of the tables in the specified database

The following get-tables example returns information about the tables in the specified database.

aws glue get-tables --database-name 'tempdb'

Output:

{ "TableList": [ { "Name": "my-s3-sink", "DatabaseName": "tempdb", "CreateTime": 1602730539.0, "UpdateTime": 1602730539.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/test-s3-output/", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "s3-source", "DatabaseName": "tempdb", "CreateTime": 1602730658.0, "UpdateTime": 1602730658.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/", "Compressed": false, "NumberOfBuckets": 0, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "test-kinesis-input", "DatabaseName": "tempdb", "CreateTime": 1601507001.0, "UpdateTime": 1601507001.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "my-testing-stream", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "Parameters": { "kinesisUrl": "https://kinesis.us-east-1.amazonaws.com", "streamName": "my-testing-stream", "typeOfData": "kinesis" }, "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" } ] }

For more information, see Defining Tables in the AWS Glue Data Catalog in the AWS Glue Developer Guide.

  • For API details, see GetTables in AWS CLI Command Reference.

The following code example shows how to use start-crawler.

AWS CLI

To start a crawler

The following start-crawler example starts a crawler.

aws glue start-crawler --name my-crawler

Output:

None
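
Because start-crawler returns no output, an optional way to watch the crawler's progress is to poll its state with get-crawler (not part of the original example):

aws glue get-crawler \
    --name my-crawler \
    --query 'Crawler.State'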

For more information, see Defining Crawlers in the AWS Glue Developer Guide.

  • For API details, see StartCrawler in AWS CLI Command Reference.

The following code example shows how to use start-job-run.

AWS CLI

To start running a job

The following start-job-run example starts a job.

aws glue start-job-run \
    --job-name my-job

Output:

{ "JobRunId": "jr_22208b1f44eb5376a60569d4b21dd20fcb8621e1a366b4e7b2494af764b82ded" }

For more information, see Authoring Jobs in the AWS Glue Developer Guide.

  • For API details, see StartJobRun in AWS CLI Command Reference.