
Trino history and design - Amazon EMR

Trino is specialized for querying large datasets from many different sources. Trino can access and query HDFS in a traditional big-data use case, but it can also query additional sources such as relational databases and NoSQL databases. Trino started in 2019 as a fork of the Presto query engine and has since been developed independently of the Presto code base.

For more information about the Trino query engine and how it's used, see the Trino website. To read the Trino source documentation, see Trino Overview.

Architectural concepts

Trino can run quick, efficient queries because it processes data in parallel across a cluster. It's designed with querying a data lake in mind, as it's specialized for queries on large data volumes, typically in use cases involving Hadoop and HDFS, but it can also query traditional relational databases. For more information, see Architecture in the Trino documentation.

Components of Trino

Trino has a few key architectural components that work together to make queries run fast. It helps to have a working knowledge of these when you fine-tune your cluster for better performance:

  • The coordinator is responsible for query orchestration. It parses and optimizes incoming SQL queries, generates execution plans, assigns tasks to worker nodes, and collects and assembles query results. Additionally, it monitors resource usage and tracks the status of worker nodes. For more information, see Coordinator in the Trino documentation.

  • Worker nodes handle data processing for queries. After the coordinator assigns tasks, workers retrieve data, perform necessary operations, like joins and aggregations, and exchange intermediate data with other workers. For more information, see Worker in the Trino documentation.

  • Connectors are plugins that allow Trino to connect with and query various data sources. Each connector knows how to access and retrieve data from its source, such as Amazon S3, Apache Hive, or relational databases. These connectors map source data to Trino’s schema structure.

  • A catalog is a logical collection of schemas and tables associated with a specific connector. Defined in the coordinator, catalogs enable Trino to treat different data sources as a single namespace. This lets users query multiple sources, such as Hive and MySQL, together in a single query.

  • Clients, such as the Trino CLI or applications using JDBC and ODBC drivers, connect to the Trino coordinator to submit SQL queries. The coordinator manages the query lifecycle and returns results to the client for further analysis or reporting.
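Because each catalog maps to a connector, a single statement can join tables that live in different systems. The following sketch assumes a cluster where a `hive` catalog and a `mysql` catalog have already been configured; the schema, table, and column names are illustrative:

```sql
-- Fully qualified names take the form catalog.schema.table.
-- Join order data stored in Hive with customer data in MySQL
-- (hive.sales.orders and mysql.crm.customers are hypothetical tables).
SELECT c.name, SUM(o.total) AS lifetime_total
FROM hive.sales.orders o
JOIN mysql.crm.customers c
  ON o.customer_id = c.id
GROUP BY c.name;
```

The coordinator plans this as one query; the connectors handle reading from each underlying source, and workers perform the join and aggregation.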

Running queries

To understand how Trino takes SQL statements and runs them as queries, see Trino concepts in the Trino documentation.
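One way to see this process in practice is to prefix a query with EXPLAIN, which asks the coordinator to print the query plan without executing it. The catalog, schema, and table names below are illustrative:

```sql
-- Show the plan the coordinator generates for a query,
-- without running it (hive.sales.orders is a hypothetical table).
EXPLAIN SELECT order_status, count(*)
FROM hive.sales.orders
GROUP BY order_status;
```

The output shows the plan fragments that would be distributed to worker nodes, which can help when you tune queries or the cluster.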
