Optimizing query performance
Important
End of support notice: Existing customers will be able to use Amazon QLDB until end of support on 07/31/2025. For more details, see
Migrate an Amazon QLDB Ledger to Amazon Aurora PostgreSQL
Amazon QLDB is intended to address the needs of high-performance online transaction processing (OLTP) workloads. This means that QLDB is optimized for a specific set of query patterns, even though it supports SQL-like query capabilities. It's critical to design applications and their data models to work with these query patterns. Otherwise, as your tables grow, you will encounter significant performance problems, including query latency, transaction timeouts, and concurrency conflicts.
This section describes query constraints in QLDB and provides guidance for writing optimal queries given these constraints.
Topics
Transaction timeout limit
In QLDB, every PartiQL statement (including every SELECT
query) is
processed in a transaction and is subject to a transaction timeout limit.
A transaction can run for up to 30 seconds before
being committed. After this limit, QLDB rejects any work done on the transaction
and discards the session that ran the
transaction. This limit protects the service's client from leaking sessions by
starting transactions and not committing or canceling them.
Concurrency conflicts
QLDB implements concurrency control by using optimistic concurrency control (OCC). Suboptimal queries can also lead to more OCC conflicts. For information about OCC, see Amazon QLDB concurrency model.
Optimal query patterns
As a best practice, you should run statements with a WHERE
predicate
clause that filters on an indexed field or a document ID. QLDB requires an
equality operator (=
or IN
) on an
indexed field to efficiently look up a document.
The following are examples of optimal query patterns in the user view.
--Indexed field (VIN) lookup using the = operator SELECT * FROM VehicleRegistration WHERE VIN = '1N4AL11D75C109151' --Indexed field (VIN) AND non-indexed field (City) lookup SELECT * FROM VehicleRegistration WHERE VIN = '1N4AL11D75C109151' AND City = 'Seattle' --Indexed field (VIN) lookup using the IN operator SELECT * FROM VehicleRegistration WHERE VIN IN ('1N4AL11D75C109151', 'KM8SRDHF6EU074761') --Document ID (r_id) lookup using the BY clause SELECT * FROM VehicleRegistration BY r_id WHERE r_id = '
3Qv67yjXEwB9SjmvkuG6Cp
'
Any query that doesn't follow these patterns invokes a full table scan. Table scans can cause transaction timeouts for queries on large tables or queries that return large result sets. They can also lead to OCC conflicts with competing transactions.
High-cardinality indexes
We recommend indexing fields that contain high-cardinality values. For example,
the VIN
and LicensePlateNumber
fields in the
VehicleRegistration
table are indexed fields that are intended to
be unique.
Avoid indexing low-cardinality fields such as status codes, address states or provinces, and postal codes. If you index such a field, your queries can produce large result sets that are more likely to result in transaction timeouts or cause unintended OCC conflicts.
Committed view queries
Queries that you run in the committed view follow the same optimization guidelines as user view queries. Indexes that you create on a table are also used for queries in the committed view.
History function queries
History function queries don't use the indexes that you create on a table. QLDB history is indexed by document ID only, and you can't create additional history indexes at this time.
As a best practice, qualify a history query with both a date range
(start time and end time) and a
document ID (metadata.id
). History queries that include a start time
and end time gain the benefit of date range qualification.
Inner join queries
For inner join queries, use join criteria that includes at least an indexed field
for the table on the right side of the join. Without a join index, a join query
invokes multiple table scans—for every document in the left table of the join,
the query fully scans the right table. The best practice is to join on fields that
are indexed for each table that you're joining, in addition to specifying a
WHERE
equality predicate for at least one table.
For example, the following query joins the VehicleRegistration
and
Vehicle
tables on their respective VIN
fields, which
are both indexed. This query also has an equality predicate on
VehicleRegistration.VIN
.
SELECT * FROM VehicleRegistration AS r INNER JOIN Vehicle AS v ON r.VIN = v.VIN WHERE r.VIN IN ('1N4AL11D75C109151', 'KM8SRDHF6EU074761')
Choose high-cardinality indexes for both the join criteria and the equality predicates in your join queries.
Query patterns to avoid
The following are some examples of suboptimal statements that don't scale well for larger tables in QLDB. We strongly recommend that you don't rely on these types of queries for tables that grow over time because your queries will eventually result in transaction timeouts. Because tables contain documents that vary in size, it's difficult to define precise limits for non-indexed queries.
--No predicate clause
SELECT * FROM Vehicle
--COUNT() is not an optimized function
SELECT COUNT(*) FROM Vehicle
--Low-cardinality predicate
SELECT * FROM Vehicle WHERE Color = 'Silver'
--Inequality (>) does not qualify for indexed lookup
SELECT * FROM Vehicle WHERE "Year" > 2019
--Inequality (LIKE)
SELECT * FROM Vehicle WHERE VIN LIKE '1N4AL%'
--Inequality (BETWEEN)
SELECT SUM(PendingPenaltyTicketAmount) FROM VehicleRegistration
WHERE ValidToDate BETWEEN `2020-01-01T` AND `2020-07-01T`
--No predicate clause
DELETE FROM Vehicle
--No document id, and no date range for the history() function
SELECT * FROM history(Vehicle)
In general, we don't recommend running the following types of query patterns for production use cases in QLDB:
-
Online analytical processing (OLAP) queries
-
Exploratory queries without a predicate clause
-
Reporting queries
-
Text search
Instead, we recommend streaming your data to a purpose-built database service that
is optimized for analytical use cases. For example, you can stream QLDB data to
Amazon OpenSearch Service to provide full text search capabilities over documents. For a
sample application that demonstrates this use case, see the GitHub repository aws-samples/amazon-qldb-streaming-amazon-opensearch-service-sample-python
Monitoring performance
The QLDB driver provides consumed I/O usage and timing information in the result object of a statement. You can use these metrics to identify inefficient PartiQL statements. To learn more, proceed to Getting PartiQL statement statistics.
You can also use Amazon CloudWatch to track your ledger's performance for data operations.
Monitor the CommandLatency
metric for a specified
LedgerName
and CommandType
. For more information, see
Monitoring with Amazon CloudWatch.
To learn how QLDB uses commands to manage data operations, see Session management with the
driver.