Query large datasets using an AWS Glue crawler

This sample project demonstrates how to ingest a large dataset in Amazon S3, partition it with an AWS Glue crawler, and then run Amazon Athena queries against the partitioned data.

In this project, a Step Functions state machine invokes an AWS Glue crawler that partitions a large dataset in Amazon S3. After the crawler returns a success message, the workflow runs Athena queries against the partitioned data. After the queries complete successfully, the workflow sends a notification to an Amazon SNS topic.
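The flow described above (crawl, wait for the crawler, query, notify) could be expressed in Amazon States Language roughly as follows. This is a minimal sketch and not the definition that the starter template deploys: the state names, the 30-second polling interval, the polling loop itself, and every argument to the hypothetical `build_definition` helper are assumptions made for illustration.

```python
import json


def build_definition(crawler_name, query, workgroup, output_location, topic_arn):
    """Return an illustrative Amazon States Language definition as a dict.

    All arguments are placeholders supplied by the caller; none of them
    come from the starter template.
    """
    return {
        "Comment": "Sketch: run a Glue crawler, query with Athena, notify via SNS",
        "StartAt": "StartCrawler",
        "States": {
            # Start the crawler through the AWS SDK service integration.
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "GetCrawler",
            },
            # Fetch the crawler's current state.
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "IsCrawlerDone",
            },
            # READY means the crawler is idle again. It does not by itself
            # prove the last crawl succeeded; a real workflow may also want
            # to inspect the last crawl status.
            "IsCrawlerDone": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunAthenaQuery",
                    }
                ],
                "Default": "WaitForCrawler",
            },
            # Assumed polling interval of 30 seconds.
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
            # The .sync suffix makes the task wait for the query to finish.
            "RunAthenaQuery": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": query,
                    "WorkGroup": workgroup,
                    "ResultConfiguration": {"OutputLocation": output_location},
                },
                "Next": "NotifySuccess",
            },
            # Publish a completion message to the SNS topic.
            "NotifySuccess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": topic_arn, "Message": "Athena query finished"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_definition(
        crawler_name="example-crawler",  # hypothetical name
        query="SELECT 1",  # placeholder query
        workgroup="primary",
        output_location="s3://example-bucket/athena-results/",  # hypothetical bucket
        topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",  # hypothetical ARN
    )
    print(json.dumps(definition, indent=2))
```

Building the definition as a Python dictionary keeps the sketch checkable with the standard library alone. The JSON it prints could be pasted into the Step Functions code editor as a starting point if you chose the Build on it option.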

Step 1: Create the state machine

  1. Open the Step Functions console and choose Create state machine.

  2. Choose Create from template and find the related starter template. Choose Next to continue.

  3. Choose how to use the template:

    1. Run a demo – creates a read-only state machine. After review, you can create the workflow and all related resources.

    2. Build on it – provides an editable workflow definition that you can review, customize, and deploy with your own resources. (Related resources, such as functions or queues, will not be created automatically.)

  4. Choose Use template to continue with your selection.

    Note

    Standard charges apply for services deployed to your account.

Step 2: Run the demo state machine

If you chose the Run a demo option, all related resources will be deployed and ready to run. If you chose the Build on it option, you might need to set placeholder values and create additional resources before you can run your custom workflow.

  1. Choose Deploy and run.

  2. Wait for the AWS CloudFormation stack to deploy. This can take up to 10 minutes.

  3. After the Start execution option appears, review the Input and choose Start execution.

Congratulations!

You should now have a running demo of your state machine. You can choose states in the Graph view to review input, output, variables, definition, and events.
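After the stack is deployed, an execution can also be started outside the console. The sketch below assumes the third-party `boto3` package is installed and AWS credentials are configured. The state machine ARN and the input payload are placeholders, because the input that the template expects is not shown here, and the helper names are hypothetical.

```python
import json


def build_start_execution_request(state_machine_arn, payload, name=None):
    """Build keyword arguments for the Step Functions StartExecution call.

    `payload` is serialized to a JSON string because the API expects the
    execution input as text, not as a nested object.
    """
    request = {
        "stateMachineArn": state_machine_arn,
        "input": json.dumps(payload),
    }
    if name is not None:
        # Optional execution name supplied by the caller.
        request["name"] = name
    return request


def start_demo_execution(state_machine_arn, payload, name=None):
    """Start an execution and return its ARN (requires boto3 and credentials)."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        **build_start_execution_request(state_machine_arn, payload, name)
    )
    return response["executionArn"]


if __name__ == "__main__":
    # Placeholder ARN; replace it with the ARN of the deployed state machine.
    arn = "arn:aws:states:us-east-1:123456789012:stateMachine:example"
    print(build_start_execution_request(arn, {}, name="demo-run-1"))
```

Keeping the request builder separate from the API call lets you inspect the exact arguments before anything is sent to AWS.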
