Examples of using Amazon S3 Select on an object
Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual.
You can use S3 Select to select content from one object by using the Amazon S3 console, the REST API, and the AWS SDKs.
For more information about supported SQL functions for S3 Select, see SQL functions.
To select content from an object in the Amazon S3 console
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the left navigation pane, choose Buckets.
- Choose the bucket that contains the object that you want to select content from, and then choose the name of the object.
- Choose Object actions, and choose Query with S3 Select.
- Configure Input settings based on the format of your input data.
- Configure Output settings based on the format of the output that you want to receive.
- To extract records from the chosen object, under SQL query, enter the SELECT SQL commands. For more information on how to write SQL commands, see SQL reference for Amazon S3 Select.
- After you enter SQL queries, choose Run SQL query. Then, under Query results, you can see the results of your SQL queries.
You can use the AWS SDKs to select content from an object. However, if your application requires it, you can send REST requests directly. For more information about the request and response format, see SelectObjectContent.
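When you send REST requests directly, the SQL query and the serialization settings travel in an XML request body. The following is a minimal sketch, not an SDK API, that assembles such a body for an uncompressed CSV object and prints the request line. It assumes the element names from the SelectObjectContent API reference and default CSV output settings; the class and method names (SelectRequestBodySketch, buildSelectRequestBody, requestTarget) and the key data/example.csv are hypothetical. A real request must also be sent to the bucket's endpoint and signed with AWS Signature Version 4, which this sketch does not do.

```java
/**
 * Sketch only: builds the XML body of a SelectObjectContent REST request for a
 * CSV object. Signing (Signature Version 4) and sending are deliberately omitted.
 */
public class SelectRequestBodySketch {

    // Escapes the five XML special characters so that a SQL expression such as
    // "... where s._2 > 'a'" can be embedded safely inside an element.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: CSV input without compression, CSV output with defaults.
    static String buildSelectRequestBody(String expression) {
        return "<SelectObjectContentRequest xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">"
                + "<Expression>" + escapeXml(expression) + "</Expression>"
                + "<ExpressionType>SQL</ExpressionType>"
                + "<InputSerialization>"
                + "<CompressionType>NONE</CompressionType>"
                + "<CSV><FileHeaderInfo>NONE</FileHeaderInfo></CSV>"
                + "</InputSerialization>"
                + "<OutputSerialization><CSV/></OutputSerialization>"
                + "</SelectObjectContentRequest>";
    }

    // The request is a POST to the object key with the select query parameters.
    // Assumption: the key contains no characters that need URL encoding.
    static String requestTarget(String key) {
        return "POST /" + key + "?select&select-type=2";
    }

    public static void main(String[] args) {
        System.out.println(requestTarget("data/example.csv"));
        System.out.println(buildSelectRequestBody("select s._1 from S3Object s"));
    }
}
```

Escaping the expression matters because ordinary SQL comparisons, such as s._2 > 'a', contain characters that are special in XML.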
You can use Amazon S3 Select to select some of the content of an object by using the selectObjectContent method. If this method is successful, it returns the results of the SQL expression.
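The results of a select request arrive as a stream of events rather than as a single response body. The following sketch models that stream with a hypothetical Event class and EventType enum (these are not SDK types) to show the rule that the Java example in this section also follows: only Records events carry result data, Progress and Stats events report byte counts, Continuation events keep the connection alive, and the results should be treated as complete only if an End event arrives.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: models how a select response stream is consumed.
 * Event and EventType are hypothetical; the real SDKs define their own event types.
 */
public class SelectEventStreamSketch {

    enum EventType { RECORDS, CONTINUATION, PROGRESS, STATS, END }

    static final class Event {
        final EventType type;
        final String payload; // record data as text; empty for non-Records events

        Event(EventType type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Concatenates Records payloads and fails if the End event never arrived.
    static String collectResults(List<Event> events) {
        StringBuilder results = new StringBuilder();
        boolean complete = false;
        for (Event event : events) {
            if (event.type == EventType.RECORDS) {
                results.append(event.payload);
            } else if (event.type == EventType.END) {
                complete = true;
            }
            // CONTINUATION, PROGRESS, and STATS events carry no result data.
        }
        if (!complete) {
            throw new IllegalStateException("End event not received; results may be incomplete");
        }
        return results.toString();
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(EventType.RECORDS, "a\n"),
                new Event(EventType.RECORDS, "b\n"),
                new Event(EventType.STATS, ""),
                new Event(EventType.END, ""));
        System.out.print(collectResults(events)); // prints a and b on separate lines
    }
}
```

Failing loudly when the End event is missing is the design choice here: a truncated stream can otherwise look like a valid but shorter result.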
- Java
The following Java code retrieves the value of the first column (s._1) for each record in an object that contains data in CSV format, and writes the results to a local file. It prints the byte counts reported in the Stats message, and it throws an exception if the End message is not received, because the results might be incomplete. Before you run it, replace the placeholder values with a valid bucket name, the key of an object that contains data in CSV format, and a local path for the results file.
For instructions on creating and testing a working sample, see Getting Started in the AWS SDK for Java Developer Guide.
package com.amazonaws;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CSVInput;
import com.amazonaws.services.s3.model.CSVOutput;
import com.amazonaws.services.s3.model.CompressionType;
import com.amazonaws.services.s3.model.ExpressionType;
import com.amazonaws.services.s3.model.InputSerialization;
import com.amazonaws.services.s3.model.OutputSerialization;
import com.amazonaws.services.s3.model.SelectObjectContentEvent;
import com.amazonaws.services.s3.model.SelectObjectContentEventVisitor;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

import static com.amazonaws.util.IOUtils.copy;

/**
 * This example shows how to query data with S3 Select, consume the response as an
 * InputStream of records, and write the records to a file.
 */
public class RecordInputStreamExample {

    private static final String BUCKET_NAME = "${my-s3-bucket}";
    private static final String CSV_OBJECT_KEY = "${my-csv-object-key}";
    private static final String S3_SELECT_RESULTS_PATH = "${my-s3-select-results-path}";
    private static final String QUERY = "select s._1 from S3Object s";

    public static void main(String[] args) throws Exception {
        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

        SelectObjectContentRequest request = generateBaseCSVRequest(BUCKET_NAME, CSV_OBJECT_KEY, QUERY);
        final AtomicBoolean isResultComplete = new AtomicBoolean(false);

        try (OutputStream fileOutputStream = new FileOutputStream(new File(S3_SELECT_RESULTS_PATH));
             SelectObjectContentResult result = s3Client.selectObjectContent(request)) {
            InputStream resultInputStream = result.getPayload().getRecordsInputStream(
                    new SelectObjectContentEventVisitor() {
                        @Override
                        public void visit(SelectObjectContentEvent.StatsEvent event) {
                            System.out.println(
                                    "Received Stats, Bytes Scanned: " + event.getDetails().getBytesScanned()
                                            + " Bytes Processed: " + event.getDetails().getBytesProcessed());
                        }

                        /*
                         * An End Event indicates that the request finished successfully.
                         */
                        @Override
                        public void visit(SelectObjectContentEvent.EndEvent event) {
                            isResultComplete.set(true);
                            System.out.println("Received End Event. Result is complete.");
                        }
                    }
            );

            copy(resultInputStream, fileOutputStream);
        }

        /*
         * The End Event indicates all matching records have been transmitted.
         * If the End Event is not received, the results may be incomplete.
         */
        if (!isResultComplete.get()) {
            throw new Exception("S3 Select request was incomplete as End Event was not received.");
        }
    }

    private static SelectObjectContentRequest generateBaseCSVRequest(String bucket, String key, String query) {
        SelectObjectContentRequest request = new SelectObjectContentRequest();
        request.setBucketName(bucket);
        request.setKey(key);
        request.setExpression(query);
        request.setExpressionType(ExpressionType.SQL);

        InputSerialization inputSerialization = new InputSerialization();
        inputSerialization.setCsv(new CSVInput());
        inputSerialization.setCompressionType(CompressionType.NONE);
        request.setInputSerialization(inputSerialization);

        OutputSerialization outputSerialization = new OutputSerialization();
        outputSerialization.setCsv(new CSVOutput());
        request.setOutputSerialization(outputSerialization);

        return request;
    }
}
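To anticipate what the results file should contain, the following local illustration (it does not call S3 Select) computes the same projection as the query select s._1 from S3Object s on a small CSV string. It assumes simple CSV with no quoted fields, no blank lines, and newline record delimiters. It also assumes that FileHeaderInfo is NONE, which is taken here to be the default when CSVInput is left unconfigured as in the example above. Under that setting, a header line is returned as an ordinary record. FirstColumnSketch and firstColumn are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Local illustration, not S3 Select itself: the first-column projection that
 * "select s._1 from S3Object s" performs, assuming FileHeaderInfo is NONE so
 * that a header line is treated as data. Assumes simple CSV without quoted
 * fields and without blank lines.
 */
public class FirstColumnSketch {

    static List<String> firstColumn(String csv) {
        List<String> values = new ArrayList<>();
        // String.split drops trailing empty strings, so a final newline adds no record.
        for (String line : csv.split("\n")) {
            // _1 is the first field; a limit of 2 stops splitting after the first comma.
            values.add(line.split(",", 2)[0]);
        }
        return values;
    }

    public static void main(String[] args) {
        String csv = "name,city\nAlice,Seattle\nBob,Denver\n";
        System.out.println(firstColumn(csv)); // [name, Alice, Bob]
    }
}
```

If your object has a header row that should not appear in the results, the FileHeaderInfo setting of the CSV input (IGNORE or USE) described in the SelectObjectContent API reference is the setting to review.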
- JavaScript
For a JavaScript example that uses the AWS SDK for JavaScript with the S3 SelectObjectContent API operation to select records from JSON and CSV files that are stored in Amazon S3, see the blog post Introducing support for Amazon S3 Select in the AWS SDK for JavaScript.
- Python
For a Python example that uses S3 Select to run SQL queries against data that was uploaded to Amazon S3 as a comma-separated values (CSV) file, see the blog post Querying data without servers or databases using Amazon S3 Select.