Creating a table for AWS WAF logs without partitioning - Amazon Athena

Creating a table for AWS WAF logs without partitioning

This section describes how to create a table for AWS WAF logs that uses neither partitioning nor partition projection.

Note

For performance and cost reasons, we do not recommend querying with a non-partitioned schema. For more information, see Top 10 Performance Tuning Tips for Amazon Athena in the AWS Big Data Blog.

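To create the AWS WAF table
  1. Copy and paste the following DDL statement into the Athena console. Modify the fields as necessary to match your log output. Change the LOCATION value to the Amazon S3 bucket that stores your logs.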

    This query uses the OpenX JSON SerDe.

    Note

    The SerDe expects each JSON document to be on a single line of text, with no line termination characters separating the fields in the record. If the JSON text is in pretty print format, you may receive an error message like HIVE_CURSOR_ERROR: Row is not a valid JSON Object or HIVE_CURSOR_ERROR: JsonParseException: Unexpected end-of-input: expected close marker for OBJECT when you query the table after creating it. For more information, see JSON Data Files in the OpenX SerDe documentation on GitHub.

    CREATE EXTERNAL TABLE `waf_logs`(
      `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
        struct <
          conditiontype: string,
          sensitivitylevel: string,
          location: string,
          matcheddata: array < string >
        >
      >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
        struct <
          rulegroupid: string,
          terminatingrule: struct <
            ruleid: string,
            action: string,
            rulematchdetails: array <
              struct <
                conditiontype: string,
                sensitivitylevel: string,
                location: string,
                matcheddata: array < string >
              >
            >
          >,
          nonterminatingmatchingrules: array <
            struct <
              ruleid: string,
              action: string,
              overriddenaction: string,
              rulematchdetails: array <
                struct <
                  conditiontype: string,
                  sensitivitylevel: string,
                  location: string,
                  matcheddata: array < string >
                >
              >,
              challengeresponse: struct < responsecode: string, solvetimestamp: string >,
              captcharesponse: struct < responsecode: string, solvetimestamp: string >
            >
          >,
          excludedrules: string
        >
      >,
      `ratebasedrulelist` array <
        struct < ratebasedruleid: string, limitkey: string, maxrateallowed: int >
      >,
      `nonterminatingmatchingrules` array <
        struct <
          ruleid: string,
          action: string,
          rulematchdetails: array <
            struct <
              conditiontype: string,
              sensitivitylevel: string,
              location: string,
              matcheddata: array < string >
            >
          >,
          challengeresponse: struct < responsecode: string, solvetimestamp: string >,
          captcharesponse: struct < responsecode: string, solvetimestamp: string >
        >
      >,
      `requestheadersinserted` array < struct < name: string, value: string > >,
      `responsecodesent` string,
      `httprequest` struct <
        clientip: string,
        country: string,
        headers: array < struct < name: string, value: string > >,
        uri: string,
        args: string,
        httpversion: string,
        httpmethod: string,
        requestid: string
      >,
      `labels` array < struct < name: string > >,
      `captcharesponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >,
      `challengeresponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >,
      `ja3Fingerprint` string,
      `oversizefields` string,
      `requestbodysize` int,
      `requestbodysizeinspectedbywaf` int
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://amzn-s3-demo-bucket/prefix/'
  2. Run the CREATE EXTERNAL TABLE statement in the Athena console query editor. This registers the waf_logs table and makes the data in it available for queries from Athena.
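
Once the table is registered, a quick query can confirm that the SerDe reads the log files. The statements below are a minimal sketch, not part of the original procedure. They assume the table is named waf_logs as in the DDL above and that the timestamp column holds epoch milliseconds (hence the division by 1000); adjust both if your logs differ. A file in pretty print format would be expected to surface here as one of the HIVE_CURSOR_ERROR messages described in the note.

```sql
-- Sanity check: read a few rows to confirm each record parses as single-line JSON.
SELECT "timestamp", action, httprequest.clientip, httprequest.uri
FROM waf_logs
LIMIT 10;

-- Illustrative aggregate: requests per action over the last day.
-- Assumes "timestamp" is in epoch milliseconds.
SELECT action, COUNT(*) AS request_count
FROM waf_logs
WHERE from_unixtime("timestamp" / 1000) > now() - interval '1' day
GROUP BY action
ORDER BY request_count DESC;
```

Because the table is not partitioned, both queries scan every object under the LOCATION prefix, which is the cost and performance concern raised in the note at the top of this section.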