Trino
Trino is an open-source query engine designed for interactive queries on a wide range of data sources, including relational databases, file-based data, and HDFS data. The most common use of Trino with Amazon EMR is to run complex SQL queries on large datasets stored in Amazon S3. Trino is also compliant with ANSI SQL, so database engineers, data analysts, and data scientists who already know SQL will find it familiar.
Note
PrestoSQL was renamed to Trino in December 2020. Amazon EMR versions 6.4.0 and later generally use the name Trino.
Important
PrestoSQL, the predecessor of Trino, is still available for use with Amazon EMR. However, we highly recommend that you use Trino going forward. Also note that Trino and PrestoSQL can't run simultaneously on the same cluster.
The following table lists the version of Trino included in the latest release of Amazon EMR 7.x, along with the components that Amazon EMR installs with Trino. For the versions of the components installed with Trino in this release, see Release 7.8.0 Component Versions.
Amazon EMR Release Label | Trino Version | Components Installed With Trino |
---|---|---|
emr-7.8.0 | Trino 467 | emrfs, emr-goodies, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hudi, hudi-trino, hcatalog-server, mariadb-server, trino-coordinator, trino-worker |
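
As an illustration of how the release label in the table might be used, the following Python sketch builds a request for a cluster that has Trino installed. It only assembles a dictionary shaped like the parameters of the boto3 EMR `run_job_flow` call and does not contact AWS. The helper name `build_trino_cluster_request`, the cluster name, the instance type, the instance count, and the use of the default EMR roles are assumptions made for this example, not values taken from this page.

```python
# Sketch only: assembles a request dictionary and never calls AWS.
RELEASE_LABEL = "emr-7.8.0"  # release label listed in the table


def build_trino_cluster_request(name, instance_type="m5.xlarge",
                                instance_count=3, extra_applications=()):
    """Return keyword arguments shaped for boto3's EMR run_job_flow.

    Hypothetical helper: instance settings and role names are assumed
    defaults and should be adjusted for a real account.
    """
    applications = ["Trino", *extra_applications]

    # Trino and PrestoSQL can't run simultaneously on the same cluster,
    # so reject that combination before any request is made.
    if "PrestoSQL" in applications:
        raise ValueError("Trino and PrestoSQL can't run on the same cluster")

    return {
        "Name": name,
        "ReleaseLabel": RELEASE_LABEL,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_trino_cluster_request("trino-demo")
print(request["ReleaseLabel"], [app["Name"] for app in request["Applications"]])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`. Checking the application list on the client side is a design choice for the sketch that surfaces the Trino and PrestoSQL conflict early.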