Analyze bucket access logs with Amazon Athena in Lightsail - Amazon Lightsail

Analyze bucket access logs with Amazon Athena in Lightsail

In this guide, we show you how to identify requests to a bucket using access logs. For more information, see Bucket access logs.

Contents

Query access logs for requests using Amazon Athena

You can use Amazon Athena to query and identify requests to a bucket in access logs.

Lightsail stores access logs as objects in a Lightsail bucket. It is often easier to use a tool that can analyze the logs. Athena supports analysis of objects and can be used to query access logs.

Example

The following example shows how you can query bucket server access logs in Amazon Athena.

Note

To specify a bucket location in an Athena query, you need to format the target bucket name and target prefix where your logs are delivered as an S3 URI, as follows: s3://amzn-s3-demo-bucket1-logs/prefix/

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. In the Query Editor, run a command similar to the following.

    create database bucket_access_logs_db
    Note

    It's a best practice to create the database in the same AWS Region as your S3 bucket.

  3. In the Query Editor, run a command similar to the following to create a table schema in the database that you created in step 2. The STRING and BIGINT data type values are the access log properties. You can query these properties in Athena. For LOCATION, enter the bucket and prefix path as noted earlier.

    CREATE EXTERNAL TABLE `s3_access_logs_db.amzn-s3-demo-bucket_logs`( `bucketowner` STRING, `bucket_name` STRING, `requestdatetime` STRING, `remoteip` STRING, `requester` STRING, `requestid` STRING, `operation` STRING, `key` STRING, `request_uri` STRING, `httpstatus` STRING, `errorcode` STRING, `bytessent` BIGINT, `objectsize` BIGINT, `totaltime` STRING, `turnaroundtime` STRING, `referrer` STRING, `useragent` STRING, `versionid` STRING, `hostid` STRING, `sigv` STRING, `ciphersuite` STRING, `authtype` STRING, `endpoint` STRING, `tlsversion` STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket1-logs/prefix/'
  4. In the navigation pane, under Database, choose your database.

  5. Under Tables, choose Preview table next to your table name.

    In the Results pane, you should see data from the server access logs, such as bucketowner, bucket, requestdatetime, and so on. This means that you successfully created the Athena table. You can now query the bucket server access logs.

Example — Show who deleted an object and when (timestamp, IP address, and IAM user)

SELECT RequestDateTime, RemoteIP, Requester, Key FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE key = 'images/picture.jpg' AND operation like '%DELETE%';

Example — Show all operations that were performed by an IAM user

SELECT * FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE requester='arn:aws:iam::123456789123:user/user_name';

Example — Show all operations that were performed on an object in a specific time period

SELECT * FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Key='prefix/images/picture.jpg' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2017-02-18:07:00:00','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2017-02-18:08:00:00','yyyy-MM-dd:HH:mm:ss');

Example — Show how much data was transferred by a specific IP address in a specific time period

SELECT SUM(bytessent) AS uploadTotal, SUM(objectsize) AS downloadTotal, SUM(bytessent + objectsize) AS Total FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE RemoteIP='1.2.3.4' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2017-06-01','yyyy-MM-dd') AND parse_datetime('2017-07-01','yyyy-MM-dd');

Identify object access requests using Amazon S3 access logs

You can use queries on access logs to identify object access requests, for operations such as GET, PUT, and DELETE, and discover further information about those requests.

The following Amazon Athena query example shows how to get all PUT object requests for a bucket from the server access log.

Example — Show all requesters that are sending PUT object requests in a certain period

SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db WHERE Operation='REST.PUT.OBJECT' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')

The following Amazon Athena query example shows how to get all GET object requests for Amazon S3 from the server access log.

Example — Show all requesters that are sending GET object requests in a certain period

SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db WHERE Operation='REST.GET.OBJECT' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')

The following Amazon Athena query example shows how to get all anonymous requests to your S3 buckets from the server access log.

Example — Show all anonymous requesters that are making requests to a bucket in a certain period

SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Requester IS NULL AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
Note
  • You can modify the date range to suit your needs.

  • These query examples might also be useful for security monitoring. You can review the results for PutObject or GetObject calls from unexpected or unauthorized IP addresses/requesters and for identifying any anonymous requests to your buckets.

  • This query only retrieves information from the time at which logging was enabled.