使用分区投影为 Athena 中的 AWS WAF S3 日志创建表 - Amazon Athena

使用分区投影为 Athena 中的 AWS WAF S3 日志创建表

由于 AWS WAF 日志具有您可以预先指定其分区方案的已知结构,因此您可以使用 Athena 分区投影功能减少查询运行时间并自动管理分区。当添加新数据时,分区投影会自动添加新分区。这样就不必使用 ALTER TABLE ADD PARTITION 手动添加分区了。

以下示例 CREATE TABLE 语句会自动在 AWS WAF 日志上从指定日期开始到当前日期为止,为四个不同 AWS 区域使用分区投影。本示例中的 PARTITION BY 子句按区域和日期进行分区,但您可以根据自己的要求修改此子句。根据需要修改字段以匹配您的日志输出。在 LOCATIONstorage.location.template 子句中,将 bucketaccountID 占位符替换为值,该值标识 AWS WAF 日志在 Amazon S3 存储桶中的位置。对于 projection.day.range,将 2021/01/01 替换为要使用的开始日期。成功运行查询后,您可以查询表。您无需运行 ALTER TABLE ADD PARTITION 来加载分区。

CREATE EXTERNAL TABLE `waf_logs`( `timestamp` bigint, `formatversion` int, `webaclid` string, `terminatingruleid` string, `terminatingruletype` string, `action` string, `terminatingrulematchdetails` array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, `httpsourcename` string, `httpsourceid` string, `rulegrouplist` array < struct < rulegroupid: string, terminatingrule: struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > > >, nonterminatingmatchingrules: array < struct < ruleid: string, action: string, overriddenaction: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, excludedrules: string > >, `ratebasedrulelist` array < struct < ratebasedruleid: string, limitkey: string, maxrateallowed: int > >, `nonterminatingmatchingrules` array < struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, `requestheadersinserted` array < struct < name: string, value: string > >, `responsecodesent` string, `httprequest` struct < clientip: string, country: string, headers: array < struct < name: string, value: string > >, uri: string, args: string, httpversion: string, httpmethod: string, requestid: string >, `labels` array < struct < name: string > >, `captcharesponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `challengeresponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `ja3Fingerprint` string, `oversizefields` string, `requestbodysize` int, `requestbodysizeinspectedbywaf` int ) PARTITIONED BY ( `region` string, `date` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/accountID/WAFLogs/region/DOC-EXAMPLE-WEBACL/' TBLPROPERTIES( 'projection.enabled' = 'true', 'projection.region.type' = 'enum', 'projection.region.values' = 'us-east-1,us-west-2,eu-central-1,eu-west-1', 'projection.date.type' = 'date', 'projection.date.range' = '2021/01/01,NOW', 'projection.date.format' = 'yyyy/MM/dd', 'projection.date.interval' = '1', 'projection.date.interval.unit' = 'DAYS', 'storage.location.template' = 's3://amzn-s3-demo-bucket/AWSLogs/accountID/WAFLogs/${region}/DOC-EXAMPLE-WEBACL/${date}/')
注意

示例中 LOCATION 子句中的路径格式是标准格式,但可能因所实施的 AWS WAF 配置而异。例如,以下示例 AWS WAF 日志路径适用于 CloudFront 分配:

s3://amzn-s3-demo-bucket/AWSLogs/12345678910/WAFLogs/cloudfront/cloudfronyt/2022/08/08/17/55/

如果您在创建或查询 AWS WAF 日志表时遇到问题,请确认日志数据位置或联系 AWS Support

更多有关分区投影的信息,请参阅 将分区投影与 Amazon Athena 结合使用