使用分区投影功能创建和查询 Amazon VPC 流日志表
使用如下所示的 CREATE TABLE
语句创建表、对表进行分区并使用分区投影自动填充分区。将示例中的表名称 test_table_vpclogs
替换为您的表名称。编辑 LOCATION
子句以指定包含 Amazon VPC 日志数据的 Amazon S3 存储桶。
以下 CREATE TABLE
语句适用于以非 Hive 样式的分区格式传送的 VPC 流日志。该示例支持多账户聚合。要将来自多个账户的 VPC 流日志集中到一个 Amazon S3 存储桶中,必须在 Amazon S3 路径中输入账户 ID。
CREATE EXTERNAL TABLE IF NOT EXISTS test_table_vpclogs ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY (accid string, region string, day string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LOCATION '
$LOCATION_OF_LOGS
' TBLPROPERTIES ( "skip.header.line.count"="1", "projection.enabled" = "true", "projection.accid.type" = "enum", "projection.accid.values" = "$ACCID_1
,$ACCID_2
", "projection.region.type" = "enum", "projection.region.values" = "$REGION_1
,$REGION_2
,$REGION_3
", "projection.day.type" = "date", "projection.day.range" = "$START_RANGE
,NOW", "projection.day.format" = "yyyy/MM/dd", "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs/${accid}/vpcflowlogs/${region}/${day}" )
test_table_vpclogs 表的查询示例
以下示例查询将查询之前 CREATE TABLE
语句创建的 test_table_vpclogs
。将查询中的 test_table_vpclogs
替换为您自己的表名称。根据您的要求修改列值和其他变量。
若要在指定时间段内按时间顺序返回前 100 个访问日志条目,请运行如下所示的查询。
SELECT * FROM test_table_vpclogs WHERE day >= '2021/02/01' AND day < '2021/02/28' ORDER BY day ASC LIMIT 100
若要查看哪个服务器在指定时间段内接收前十个 HTTP 数据包,请运行如下所示的查询。该查询计算在 HTTPS 端口 443 上接收的数据包数,按目标 IP 地址对其进行分组,并返回上一周的前 10 个条目。
SELECT SUM(packets) AS packetcount, dstaddr FROM test_table_vpclogs WHERE dstport = 443 AND day >= '2021/03/01' AND day < '2021/03/31' GROUP BY dstaddr ORDER BY packetcount DESC LIMIT 10
若要返回在指定时间段内创建的日志,请运行如下所示的查询。
SELECT interface_id, srcaddr, action, protocol, to_iso8601(from_unixtime(start)) AS start_time, to_iso8601(from_unixtime("end")) AS end_time FROM test_table_vpclogs WHERE DAY >= '2021/04/01' AND DAY < '2021/04/30'
若要返回指定时间段内源 IP 地址的访问日志,请运行如下所示的查询。
SELECT * FROM test_table_vpclogs WHERE srcaddr = '10.117.1.22' AND day >= '2021/02/01' AND day < '2021/02/28'
若要列出已被拒绝的 TCP 连接,请运行如下所示的查询。
SELECT day, interface_id, srcaddr, action, protocol FROM test_table_vpclogs WHERE action = 'REJECT' AND protocol = 6 AND day >= '2021/02/01' AND day < '2021/02/28' LIMIT 10
若要返回以 10.117
开头的 IP 地址范围的访问日志,请运行如下所示的查询。
SELECT * FROM test_table_vpclogs WHERE split_part(srcaddr,'.', 1)='10' AND split_part(srcaddr,'.', 2) ='117'
若要返回某个时间范围内目标 IP 地址的访问日志,请运行如下所示的查询。
SELECT * FROM test_table_vpclogs WHERE dstaddr = '10.0.1.14' AND day >= '2021/01/01' AND day < '2021/01/31'