Trino history and design
Trino is specialized for querying large datasets from many different sources. Trino can access and query HDFS in a traditional big-data use case, but it can also query additional sources such as relational databases and NoSQL databases. Trino originally started in 2019 as a fork of the Presto query engine. Since then, it has been developed independently from the Presto code base.
For more information about the Trino query engine and how it's used, see the Trino website.
Architectural concepts
Trino can run quick and efficient queries because it processes data in parallel across a cluster. It's designed with querying a data lake in mind: it specializes in queries on large data volumes, typically in use cases involving Hadoop and HDFS, but it can also query traditional relational databases. For more information, see Architecture in the Trino documentation.
Components of Trino
Trino has a few key architectural components that work together to make queries run fast. It helps to have a working knowledge of these components when you fine-tune your cluster for better performance:
The coordinator is responsible for query orchestration. It parses and optimizes incoming SQL queries, generates execution plans, assigns tasks to worker nodes, and collects and assembles query results. Additionally, it monitors resource usage and tracks the status of worker nodes. For more information, see Coordinator in the Trino documentation.
Worker nodes handle data processing for queries. After the coordinator assigns tasks, workers retrieve data, perform necessary operations, like joins and aggregations, and exchange intermediate data with other workers. For more information, see Worker in the Trino documentation.
in the Trino documentation. Connectors are plugins that allow Trino to connect with and query various data sources. Each connector knows how to access and retrieve data from its source, such as Amazon S3, Apache Hive, or relational databases. These connectors map source data to Trino’s schema structure.
A catalog is a logical collection of schemas and tables associated with a specific connector. Defined on the coordinator, catalogs enable Trino to treat different data sources as a single namespace. This lets users query multiple sources, such as Hive and MySQL, together in the same query.
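For example, a catalog is typically defined with a small properties file in Trino's etc/catalog directory; the file name becomes the catalog name, and the connector.name property selects the connector. The following sketch shows a hypothetical catalog named mysqlsales backed by the MySQL connector. The host and credentials are placeholders, not values from this guide.

# etc/catalog/mysqlsales.properties (hypothetical file name)
# The file name defines the catalog name; connector.name selects the connector.
connector.name=mysql
connection-url=jdbc:mysql://mysql.example.com:3306
connection-user=trino_user
connection-password=example_password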
Clients, such as the Trino CLI or applications that use the JDBC and ODBC drivers, connect to the Trino coordinator to submit SQL queries. The coordinator manages the query lifecycle and returns results to the client for further analysis or reporting.
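As an illustration of the client path, the following sketch uses the open source trino Python client to connect to a coordinator and run a query that joins tables from two catalogs. The host, user, catalog, schema, and table names are hypothetical, and the example assumes catalogs named hive and mysqlsales exist on the cluster.

import trino  # the open source Trino Python client (pip install trino)

# Connect to the coordinator; it plans the query and assigns tasks to workers.
conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",  # hypothetical coordinator address
    port=8080,
    user="analyst",
    catalog="hive",     # default catalog for unqualified table names
    schema="default",
)
cur = conn.cursor()

# Fully qualified names (catalog.schema.table) let a single query span
# multiple data sources, for example Hive and MySQL.
cur.execute("""
    SELECT o.order_id, c.customer_name
    FROM hive.sales.orders AS o
    JOIN mysqlsales.crm.customers AS c
      ON o.customer_id = c.customer_id
    LIMIT 10
""")

for row in cur.fetchall():
    print(row)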
Running queries
To understand how Trino takes SQL statements and runs them as queries, see Trino concepts in the Trino documentation.
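One quick way to see how a SQL statement becomes a query is to ask the coordinator for its plan with an EXPLAIN statement. The sketch below reuses the hypothetical connection details and table names from the previous example.

import trino  # pip install trino

# Hypothetical coordinator address, as in the earlier client example.
conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",
    port=8080,
    user="analyst",
)
cur = conn.cursor()

# EXPLAIN returns the plan the coordinator builds for the statement instead of
# executing it, which shows how the work would be split up for the workers.
cur.execute("EXPLAIN SELECT COUNT(*) FROM hive.sales.orders")

# The plan comes back as a small result set containing the plan text.
for row in cur.fetchall():
    print(row[0])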