本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
AWS Glue 使用示例 AWS CLI
以下代码示例向您展示了如何使用with来执行操作和实现常见场景 AWS Glue。 AWS Command Line Interface
操作是大型程序的代码摘录,必须在上下文中运行。您可以通过操作了解如何调用单个服务函数,还可以通过函数相关场景的上下文查看操作。
每个示例都包含一个指向完整源代码的链接,您可以在其中找到有关如何在上下文中设置和运行代码的说明。
主题
操作
以下代码示例显示了如何使用batch-stop-job-run
。
- AWS CLI
-
停止作业运行
以下
batch-stop-job-run
示例停止作业运行。aws glue batch-stop-job-run \ --job-name
"my-testing-job"
\ --job-run-idjr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f
输出:
{ "SuccessfulSubmissions": [ { "JobName": "my-testing-job", "JobRunId": "jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f" } ], "Errors": [], "ResponseMetadata": { "RequestId": "66bd6b90-01db-44ab-95b9-6aeff0e73d88", "HTTPStatusCode": 200, "HTTPHeaders": { "date": "Fri, 16 Oct 2020 20:54:51 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "148", "connection": "keep-alive", "x-amzn-requestid": "66bd6b90-01db-44ab-95b9-6aeff0e73d88" }, "RetryAttempts": 0 } }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的任务运行。
-
有关API详细信息,请参阅 “BatchStopJobRun AWS CLI
命令参考”。
-
以下代码示例显示了如何使用create-connection
。
- AWS CLI
-
为 Glue 数据 AWS 存储创建连接
以下
create-connection
示例在 AWS Glue 数据目录中创建一个连接,该连接为 Kafka 数据存储提供连接信息。aws glue create-connection \ --connection-input '
{ \ "Name":"conn-kafka-custom", \ "Description":"kafka connection with ssl to custom kafka", \ "ConnectionType":"KAFKA", \ "ConnectionProperties":{ \ "KAFKA_BOOTSTRAP_SERVERS":"<Kafka-broker-server-url>:<SSL-Port>", \ "KAFKA_SSL_ENABLED":"true", \ "KAFKA_CUSTOM_CERT": "s3://bucket/prefix/cert-file.pem" \ }, \ "PhysicalConnectionRequirements":{ \ "SubnetId":"subnet-1234", \ "SecurityGroupIdList":["sg-1234"], \ "AvailabilityZone":"us-east-1a"} \ }
' \ --regionus-east-1
--endpointhttps://glue.us-east-1.amazonaws.com
此命令不生成任何输出。
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 Gl AWS u e 数据目录中定义连接”。
-
有关API详细信息,请参阅 “CreateConnection AWS CLI
命令参考”。
-
以下代码示例显示了如何使用create-database
。
- AWS CLI
-
创建数据库
以下
create-database
示例在 Glue 数据目录 AWS 中创建了一个数据库。aws glue create-database \ --database-input "{\"Name\":\"tempdb\"}" \ --profile
my_profile
\ --endpointhttps://glue.us-east-1.amazonaws.com
此命令不生成任何输出。
有关更多信息,请参阅《AWS Glue 开发人员指南》中的在数据目录中定义数据库。
-
有关API详细信息,请参阅 “CreateDatabase AWS CLI
命令参考”。
-
以下代码示例显示了如何使用create-job
。
- AWS CLI
-
创建用于转换数据的任务
以下
create-job
示例创建了一个运行存储在 S3 中的脚本的流式处理任务。aws glue create-job \ --name
my-testing-job
\ --roleAWSGlueServiceRoleDefault
\ --command '{ \ "Name": "gluestreaming", \ "ScriptLocation": "s3://DOC-EXAMPLE-BUCKET/folder/" \ }
' \ --regionus-east-1
\ --outputjson
\ --default-arguments '{ \ "--job-language":"scala", \ "--class":"GlueApp" \ }
' \ --profilemy-profile
\ --endpointhttps://glue.us-east-1.amazonaws.com
test_script.scala
的内容:import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }
输出:
{ "Name": "my-testing-job" }
有关更多信息,请参阅《Glue 开发者指南》中的 “在 AWS Glue 中AWS 创作作作业”。
-
有关API详细信息,请参阅 “CreateJob AWS CLI
命令参考”。
-
以下代码示例显示了如何使用create-table
。
- AWS CLI
-
示例 1:为 Kinesis 数据流创建表
以下
create-table
示例在 Glue 数据目录中 AWS 创建了一个描述 Kinesis 数据流的表。aws glue create-table \ --database-name
tempdb
\ --table-input '{"Name":"test-kinesis-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"my-testing-stream", \ "Parameters":{ \ "typeOfData":"kinesis","streamName":"my-testing-stream", \ "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com" \ }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }
' \ --profilemy-profile
\ --endpointhttps://glue.us-east-1.amazonaws.com
此命令不生成任何输出。
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 Gl AWS u e 数据目录中定义表”。
示例 2:为 Kafka 数据存储库创建表
以下
create-table
示例在 Glue 数据目录中 AWS 创建了一个描述 Kafka 数据存储的表。aws glue create-table \ --database-name
tempdb
\ --table-input '{"Name":"test-kafka-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"glue-topic", \ "Parameters":{ \ "typeOfData":"kafka","topicName":"glue-topic", \ "connectionName":"my-kafka-connection" }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"} \ }, \ "Parameters":{ \ "separatorChar":","} \ }
' \ --profilemy-profile
\ --endpointhttps://glue.us-east-1.amazonaws.com
此命令不生成任何输出。
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 Gl AWS u e 数据目录中定义表”。
示例 3:为 AWS S3 数据存储创建表
以下
create-table
示例在 Glue 数据目录 AWS 中创建了一个描述 AWS 简单存储服务 (AWS S3) 数据存储的表。aws glue create-table \ --database-name
tempdb
\ --table-input '{"Name":"s3-output", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"s1", "Type":"string"}, \ {"Name":"s2", "Type":"int"}, \ {"Name":"s3", "Type":"string"} ], \ "Location":"s3://bucket-path/", \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }
' \ --profilemy-profile
\ --endpointhttps://glue.us-east-1.amazonaws.com
此命令不生成任何输出。
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 Gl AWS u e 数据目录中定义表”。
-
有关API详细信息,请参阅 “CreateTable AWS CLI
命令参考”。
-
以下代码示例显示了如何使用delete-job
。
- AWS CLI
-
删除任务
以下
delete-job
示例删除了不再需要的任务。aws glue delete-job \ --job-name
my-testing-job
输出:
{ "JobName": "my-testing-job" }
有关更多信息,请参阅《Glue 开发者指南》中的 “在 AWS Glue 控制台AWS 上处理作业”。
-
有关API详细信息,请参阅 “DeleteJob AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-databases
。
- AWS CLI
-
在 Glue 数据目录中列出部分或全部 AWS 数据库的定义
以下
get-databases
示例返回有关数据目录中数据库的信息。aws glue get-databases
输出:
{ "DatabaseList": [ { "Name": "default", "Description": "Default Hive database", "LocationUri": "file:/spark-warehouse", "CreateTime": 1602084052.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "flights-db", "CreateTime": 1587072847.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "legislators", "CreateTime": 1601415625.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "tempdb", "CreateTime": 1601498566.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" } ] }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的在数据目录中定义数据库。
-
有关API详细信息,请参阅 “GetDatabases AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-job-run
。
- AWS CLI
-
获取有关任务运行的信息
以下
get-job-run
示例检索有关任务运行的信息。aws glue get-job-run \ --job-name
"Combine legistators data"
\ --run-idjr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e
输出:
{ "JobRun": { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "Combine legistators data", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的任务运行。
-
有关API详细信息,请参阅 “GetJobRun AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-job-runs
。
- AWS CLI
-
获取有关任务的所有任务运行的信息
以下
get-job-runs
示例检索有关任务的任务运行的信息。aws glue get-job-runs \ --job-name
"my-testing-job"
输出:
{ "JobRuns": [ { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "my-testing-job", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_2", "Attempt": 2, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "JobName": "my-testing-job", "StartedOn": 1602811168.496, "LastModifiedOn": 1602811282.39, "CompletedOn": 1602811282.39, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 021AAB703DB20A2D; S3 Extended Request ID: teZk24Y09TkXzBvMPG502L5VJBhe9DJuWA9/TXtuGOqfByajkfL/Tlqt5JBGdEGpigAqzdMDM/U=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 110, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "Attempt": 1, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f", "JobName": "my-testing-job", "StartedOn": 1602811020.518, "LastModifiedOn": 1602811138.364, "CompletedOn": 1602811138.364, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 2671D37856AE7ABB; S3 Extended Request ID: RLJCJw20brV+PpC6GpORahyF2fp9flB5SSb2bTGPnUSPVizLXRl1PN3QZldb+v1o9qRVktNYbW8=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 113, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } ] }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的任务运行。
-
有关API详细信息,请参阅 “GetJobRuns AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-job
。
- AWS CLI
-
检索有关任务的信息
以下
get-job
示例检索有关任务的信息。aws glue get-job \ --job-name
my-testing-job
输出:
{ "Job": { "Name": "my-testing-job", "Role": "Glue_DefaultRole", "CreatedOn": 1602805698.167, "LastModifiedOn": 1602805698.167, "ExecutionProperty": { "MaxConcurrentRuns": 1 }, "Command": { "Name": "gluestreaming", "ScriptLocation": "s3://janetst-bucket-01/Scripts/test_script.scala", "PythonVersion": "2" }, "DefaultArguments": { "--class": "GlueApp", "--job-language": "scala" }, "MaxRetries": 0, "AllocatedCapacity": 10, "MaxCapacity": 10.0, "GlueVersion": "1.0" } }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的任务。
-
有关API详细信息,请参阅 “GetJob AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-plan
。
- AWS CLI
-
获取生成的代码,用于将数据从源表映射到目标表
以下内容
get-plan
检索生成的代码,用于将列从数据源映射到数据目标。aws glue get-plan --mapping '
[ \ { \ "SourcePath":"sensorid", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"sensorid", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"currenttemperature", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"currenttemperature", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"status", \ "SourceTable":"anything", \ "SourceType":"string", \ "TargetPath":"status", \ "TargetTable":"anything", \ "TargetType":"string" \ }]
' \ --source '{ \ "DatabaseName":"tempdb", \ "TableName":"s3-source" \ }
' \ --sinks '[ \ { \ "DatabaseName":"tempdb", \ "TableName":"my-s3-sink" \ }]
' --language"scala"
--endpointhttps://glue.us-east-1.amazonaws.com
--output"text"
输出:
import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 G AWS l ue 中编辑脚本”。
-
有关API详细信息,请参阅 “GetPlan AWS CLI
命令参考”。
-
以下代码示例显示了如何使用get-tables
。
- AWS CLI
-
列出指定数据库中的部分或全部表的定义
以下
get-tables
示例返回有关指定数据库中表的信息。aws glue get-tables --database-name '
tempdb
'输出:
{ "TableList": [ { "Name": "my-s3-sink", "DatabaseName": "tempdb", "CreateTime": 1602730539.0, "UpdateTime": 1602730539.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/test-s3-output/", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "s3-source", "DatabaseName": "tempdb", "CreateTime": 1602730658.0, "UpdateTime": 1602730658.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/", "Compressed": false, "NumberOfBuckets": 0, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "test-kinesis-input", "DatabaseName": "tempdb", "CreateTime": 1601507001.0, "UpdateTime": 1601507001.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "my-testing-stream", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "Parameters": { "kinesisUrl": "https://kinesis.us-east-1.amazonaws.com", "streamName": "my-testing-stream", "typeOfData": "kinesis" }, "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" } ] }
有关更多信息,请参阅《Glue 开发者指南》中的 AWS “在 Gl AWS u e 数据目录中定义表”。
-
有关API详细信息,请参阅 “GetTables AWS CLI
命令参考”。
-
以下代码示例显示了如何使用start-crawler
。
- AWS CLI
-
启动爬网程序
以下
start-crawler
示例启动了一个爬网程序。aws glue start-crawler --name
my-crawler
输出:
None
有关更多信息,请参阅《AWS Glue 开发人员指南》中的定义爬网程序。
-
有关API详细信息,请参阅 “StartCrawler AWS CLI
命令参考”。
-
以下代码示例显示了如何使用start-job-run
。
- AWS CLI
-
开始运行任务
以下
start-job-run
示例启动了一个任务。aws glue start-job-run \ --job-name
my-job
输出:
{ "JobRunId": "jr_22208b1f44eb5376a60569d4b21dd20fcb8621e1a366b4e7b2494af764b82ded" }
有关更多信息,请参阅《AWS Glue 开发人员指南》中的编写任务。
-
有关API详细信息,请参阅 “StartJobRun AWS CLI
命令参考”。
-