本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
使用分割區投影為 Athena 中的 AWS WAF S3 日誌建立表格
因為 AWS WAF 記錄檔具有已知的結構,您可以事先指定其資料分割配置,因此您可以使用 Athena 分割區投影功能,縮短查詢執行階段並自動化分割區管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 ALTER TABLE ADD PARTITION
手動新增分割區。
下列範例CREATE TABLE
陳述式會自動在指定日期的 AWS WAF 記錄上使用分割區投影,直到四個不同 AWS 區域為止。本範例中的 PARTITION BY
子句按區域和日期進行分割,但您可以根據自己的要求進行修改。視需要修改欄位,以符合您的日誌輸出。在LOCATION
和子storage.location.template
句中,取代 bucket
以及 accountID
具有可識別 AWS WAF 日誌之 Amazon S3 儲存貯體位置值的預留位置。對於projection.day.range
,取代 2021
/01
/01
與您要使用的開始日期。成功執行查詢之後,您可以查詢資料表。您無須執行 ALTER TABLE ADD PARTITION
就能載入分割區。
CREATE EXTERNAL TABLE `waf_logs`( `timestamp` bigint, `formatversion` int, `webaclid` string, `terminatingruleid` string, `terminatingruletype` string, `action` string, `terminatingrulematchdetails` array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, `httpsourcename` string, `httpsourceid` string, `rulegrouplist` array < struct < rulegroupid: string, terminatingrule: struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > > >, nonterminatingmatchingrules: array < struct < ruleid: string, action: string, overriddenaction: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, excludedrules: string > >, `ratebasedrulelist` array < struct < ratebasedruleid: string, limitkey: string, maxrateallowed: int > >, `nonterminatingmatchingrules` array < struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, `requestheadersinserted` array < struct < name: string, value: string > >, `responsecodesent` string, `httprequest` struct < clientip: string, country: string, headers: array < struct < name: string, value: string > >, uri: string, args: string, httpversion: string, httpmethod: string, requestid: string >, `labels` array < struct < name: string > >, `captcharesponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `challengeresponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `ja3Fingerprint` string, `oversizefields` string, `requestbodysize` int, `requestbodysizeinspectedbywaf` int ) PARTITIONED BY ( `region` string, `date` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/
accountID
/WAFLogs/region
/DOC-EXAMPLE-WEBACL
/' TBLPROPERTIES( 'projection.enabled' = 'true', 'projection.region.type' = 'enum', 'projection.region.values' = 'us-east-1,us-west-2,eu-central-1,eu-west-1', 'projection.date.type' = 'date', 'projection.date.range' = '2021/01/01,NOW', 'projection.date.format' = 'yyyy/MM/dd', 'projection.date.interval' = '1', 'projection.date.interval.unit' = 'DAYS', 'storage.location.template' = 's3://amzn-s3-demo-bucket/AWSLogs/accountID
/WAFLogs/${region}/DOC-EXAMPLE-WEBACL
/${date}/')
注意
範例中LOCATION
子句中的路徑格式為標準格式,但可能會根據您實作的 AWS WAF 組態而有所不同。例如,下列範例 AWS WAF 記錄檔路徑適用於 CloudFront 散發:
s3://amzn-s3-demo-bucket/AWSLogs/12345678910
/WAFLogs/cloudfront/cloudfronyt/2022/08/08/17/55/
如果您在建立或查詢 AWS WAF 記錄資料表時遇到問題,請確認記錄資料或連絡人
如需有關分割區投影的詳細資訊,請參閱搭配 Amazon Athena 使用分割區投影。