Caching query results in Amazon Neptune Gremlin
Starting in engine release 1.0.5.1, Amazon Neptune supports a results cache for Gremlin queries.
You can enable the query results cache and then use a query hint to cache the results of a Gremlin read-only query.
Any re-run of the query then retrieves the cached results with low latency and no I/O costs, as long as they are still in the cache. This works for queries submitted to the HTTP endpoint or over WebSockets, either as byte-code or in string form.
Note
Queries sent to the profile endpoint are not cached even when the query cache is enabled.
You can control how the Neptune query results cache behaves in several ways. For example:
You can get cached results paginated, in blocks.
You can specify the time-to-live (TTL) for specified queries.
You can clear the cache for specified queries.
You can clear the entire cache.
You can arrange to be notified if results exceed the cache size.
The cache is maintained using a least-recently-used (LRU) policy, meaning that once the space allotted to the cache is full, the least-recently-used results are removed to make room when new results are being cached.
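The eviction behavior described above can be pictured with a toy model. The sketch below is only an illustration of a least-recently-used policy, not Neptune's implementation: the class name LruResultCache is hypothetical, and capacity is counted in entries, whereas Neptune sizes its cache by memory.

```python
from collections import OrderedDict


class LruResultCache:
    """Toy LRU cache keyed by query string (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumption: capacity in entries, not bytes
        self._entries = OrderedDict()     # oldest (least recently used) entry first

    def get(self, key):
        if key not in self._entries:
            return None
        # A cache hit makes this entry the most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, results):
        self._entries[key] = results
        self._entries.move_to_end(key)
        # Evict from the least-recently-used end until the cache fits again.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def keys(self):
        return list(self._entries)


cache = LruResultCache(capacity=2)
cache.put("q1", [1])
cache.put("q2", [2])
cache.get("q1")        # q1 is now more recently used than q2
cache.put("q3", [3])   # the cache is full, so q2 is evicted
```

In this model, reading q1 before adding q3 is what protects q1 from eviction.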
Important
The query-results cache is not available on t3.medium or t4.medium instance types.
Enabling the query results cache in Neptune
To enable the query results cache in Neptune, use the console to set the neptune_result_cache DB instance parameter to 1 (enabled).
Once the results cache is enabled, Neptune sets aside a portion of current memory for caching query results. The larger the instance type you're using and the more memory is available, the more memory Neptune sets aside for the cache.
If the results cache memory fills up, Neptune automatically drops least-recently-used (LRU) cached results to make way for new ones.
You can check the current status of the results cache using the Instance Status command.
Using hints to cache query results
Once the query results cache is enabled, you use query hints to control query caching. All the examples below apply to the same query traversal, namely:
g.V().has('genre','drama').in('likes')
Using enableResultCache
With the query results cache enabled, you can cache the results of a Gremlin query using the enableResultCache query hint, as follows:
g.with('Neptune#enableResultCache', true) .V().has('genre','drama').in('likes')
Neptune then returns the query results to you, and also caches them. Later, you can access the cached results by issuing exactly the same query again:
g.with('Neptune#enableResultCache', true) .V().has('genre','drama').in('likes')
The cache key that identifies the cached results is the query string itself, namely:
g.V().has('genre','drama').in('likes')
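When queries are submitted in string form, the hint is simply part of the query text. The following is a minimal sketch of building that text and the JSON body for a POST to the Gremlin HTTP endpoint, which takes the query under a "gremlin" key. The helper names with_hints and http_payload are hypothetical, and the sketch only builds the request body; it does not contact a cluster.

```python
import json


def with_hints(traversal, hints):
    """Prefix a traversal that starts with 'g.' with Neptune query hints.

    Hypothetical helper: hints maps a hint name (without the 'Neptune#'
    prefix) to a boolean or numeric value.
    """
    if not traversal.startswith("g."):
        raise ValueError("expected a traversal starting with 'g.'")
    parts = []
    for name, value in hints.items():
        # Gremlin booleans are lowercase; numbers pass through unchanged.
        literal = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f".with('Neptune#{name}', {literal})")
    return "g" + "".join(parts) + traversal[1:]


def http_payload(query):
    """JSON body for a POST to the Gremlin HTTP endpoint."""
    return json.dumps({"gremlin": query})


base_query = "g.V().has('genre','drama').in('likes')"
query = with_hints(base_query, {"enableResultCache": True})
payload = http_payload(query)
```

Because the cache key is derived from the query itself, sending the same text again is what produces a cache hit.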
Using enableResultCacheWithTTL
You can specify how long the query results should be cached for by using the enableResultCacheWithTTL query hint. For example, the following query specifies that the query results should expire after 120 seconds:
g.with('Neptune#enableResultCacheWithTTL', 120) .V().has('genre','drama').in('likes')
Again, the cache key that identifies the cached results is the base query string:
g.V().has('genre','drama').in('likes')
And again, you can access the cached results using that query string with the enableResultCache query hint:
g.with('Neptune#enableResultCache', true) .V().has('genre','drama').in('likes')
If 120 or more seconds have passed since the results were cached, that query will return new results, and cache them, without any time-to-live.
You can also access the cached results by issuing the same query again with the enableResultCacheWithTTL query hint. For example:
g.with('Neptune#enableResultCacheWithTTL', 140) .V().has('genre','drama').in('likes')
Until 120 seconds have passed (that is, the TTL currently in effect), this new query using the enableResultCacheWithTTL query hint returns the cached results. After 120 seconds, it would return new results and cache them with a time-to-live of 140 seconds.
Note
If results for a query key are already cached, then the same query key with enableResultCacheWithTTL does not generate new results and has no effect on the time-to-live of the currently cached results.
If results were previously cached using enableResultCache, the cache must first be cleared before enableResultCacheWithTTL generates new results and caches them for the TTL that it specifies.
If results were previously cached using enableResultCacheWithTTL, that previous TTL must first expire before enableResultCacheWithTTL generates new results and caches them for the TTL that it specifies.
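The TTL rules above are easier to follow as a small simulation. The sketch below is a toy model of the behavior this section describes, not Neptune's implementation: the class TtlCacheModel, its run method, and the injected clock are all hypothetical names chosen for illustration.

```python
class TtlCacheModel:
    """Toy model of the TTL rules described above (hypothetical)."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # key -> (results, expiry time, or None for no TTL)

    def run(self, key, compute, ttl=None):
        """Model enableResultCache (ttl=None) or enableResultCacheWithTTL (ttl in seconds)."""
        entry = self._entries.get(key)
        if entry is not None:
            results, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                # Cache hit: the hint on this call does not change the stored TTL.
                return results
        # Cache miss or expired entry: compute and store with this call's TTL.
        results = compute()
        expires_at = None if ttl is None else self._clock() + ttl
        self._entries[key] = (results, expires_at)
        return results


now = [0]  # mutable fake clock so the example does not have to sleep
model = TtlCacheModel(clock=lambda: now[0])
key = "g.V().has('genre','drama').in('likes')"

first = model.run(key, lambda: ["first"], ttl=120)    # cached until t=120
now[0] = 60
second = model.run(key, lambda: ["second"], ttl=140)  # still the cached results
now[0] = 120
third = model.run(key, lambda: ["third"], ttl=140)    # recomputed, cached until t=260
```

In this model, an entry stored without a TTL is never replaced by a later call that carries a TTL, which mirrors the requirement to clear the cache first.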
Using invalidateResultCacheKey
You can use the invalidateResultCacheKey query hint to clear cached results for one particular query. For example:
g.with('Neptune#invalidateResultCacheKey', true) .V().has('genre','drama').in('likes')
That query clears the cache for the query key g.V().has('genre','drama').in('likes') and returns new results for that query.
You can also combine invalidateResultCacheKey with enableResultCache or enableResultCacheWithTTL. For example, the following query clears the current cached results, caches new results, and returns them:
g.with('Neptune#enableResultCache', true) .with('Neptune#invalidateResultCacheKey', true) .V().has('genre','drama').in('likes')
Using invalidateResultCache
You can use the invalidateResultCache query hint to clear all cached results in the query result cache. For example:
g.with('Neptune#invalidateResultCache', true) .V().has('genre','drama').in('likes')
That query clears the entire result cache and returns new results for the query.
You can also combine invalidateResultCache with enableResultCache or enableResultCacheWithTTL. For example, the following query clears the entire results cache, caches new results for this query, and returns them:
g.with('Neptune#enableResultCache', true) .with('Neptune#invalidateResultCache', true) .V().has('genre','drama').in('likes')
Paginating cached query results
Suppose you have already cached a large number of results like this:
g.with('Neptune#enableResultCache', true) .V().has('genre','drama').in('likes')
Now suppose you issue the following range query:
g.with('Neptune#enableResultCache', true) .V().has('genre','drama').in('likes').range(0,10)
Neptune first looks for the full cache key, namely g.V().has('genre','drama').in('likes').range(0,10). If that key doesn't exist, Neptune next looks to see if there is a key for that query string without the range (namely g.V().has('genre','drama').in('likes')). When it finds that key, Neptune then fetches the first ten results from its cache, as the range specifies.
Note
If you use the invalidateResultCacheKey query hint with a query that has a range at the end, Neptune clears the cache for a query without the range if it doesn't find an exact match for the query with the range.
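The two-step lookup for range queries described in this section can be sketched as follows. This is a simplified, string-based model rather than Neptune's implementation: the cache is a plain dictionary, the function name lookup is hypothetical, and only a trailing range with non-negative bounds is recognized.

```python
import re

# Matches a trailing .range(low, high) step on a query-key string.
_TRAILING_RANGE = re.compile(r"\.range\(\s*(\d+)\s*,\s*(\d+)\s*\)$")


def lookup(cache, key):
    """Return cached results for key, falling back to the key without its range."""
    # Step 1: look for an exact match on the full key.
    if key in cache:
        return cache[key]
    # Step 2: strip a trailing range and look for the base key instead.
    match = _TRAILING_RANGE.search(key)
    if match is None:
        return None
    base_key = key[: match.start()]
    if base_key not in cache:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Gremlin's range(low, high) is low-inclusive and high-exclusive, like a slice.
    return cache[base_key][low:high]


base = "g.V().has('genre','drama').in('likes')"
cache = {base: [f"v{i}" for i in range(100)]}   # stand-in for cached vertices
first_page = lookup(cache, base + ".range(0,10)")
second_page = lookup(cache, base + ".range(10,20)")
```

Each page is served from the single cached entry for the base query, so no separate cache entry is needed per page.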
Using numResultsCached with .iterate()
Using the numResultsCached query hint, you can populate the results cache without returning all the results being cached, which can be useful when you prefer to paginate a large number of results.
The numResultsCached query hint only works with queries that end with iterate().
For example, if you want to cache the first 50 results of the sample query:
g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 50) .V().has('genre','drama').in('likes').iterate()
In this case the query key in the cache is:
g.with("Neptune#numResultsCached", 50).V().has('genre','drama').in('likes')
You can now retrieve the first ten of the cached results with this query:
g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 50) .V().has('genre','drama').in('likes').range(0, 10)
And, you can retrieve the next ten results from the query as follows:
g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 50) .V().has('genre','drama').in('likes').range(10, 20)
Don't forget to include the numResultsCached hint! It is an essential part of the query key and must therefore be present in order to access the cached results.
Some things to keep in mind when using numResultsCached
- The number you supply with numResultsCached is applied at the end of the query. This means, for example, that the following query actually caches results in the range (1000, 1500):
  g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 500) .V().range(1000, 2000).iterate()
- The number you supply with numResultsCached specifies the maximum number of results to cache. This means, for example, that the following query actually caches results in the range (1000, 2000):
  g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 100000) .V().range(1000, 2000).iterate()
- Results cached by queries that end with .range().iterate() have their own range. For example, suppose you cache results using a query like this:
  g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 500) .V().range(1000, 2000).iterate()
  To retrieve the first 100 results from the cache, you would write a query like this:
  g.with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 500) .V().range(1000, 2000).range(0, 100)
  Those hundred results would be equivalent to results from the base query in the range (1000, 1100).
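The range arithmetic in the points above can be checked with a short calculation. The two functions below are hypothetical helpers written for this illustration; they only reproduce the offsets stated in the text and do not query Neptune.

```python
def cached_window(range_low, range_high, num_results_cached):
    """Base-query offsets [low, high) that end up in the cache.

    numResultsCached is applied after the query's own range and acts as an
    upper bound on how many results are kept.
    """
    return range_low, min(range_high, range_low + num_results_cached)


def base_offsets(window, page_low, page_high):
    """Translate a page over the cached window back to base-query offsets."""
    low, high = window
    return low + page_low, min(high, low + page_high)


capped = cached_window(1000, 2000, 500)         # numResultsCached smaller than the range
uncapped = cached_window(1000, 2000, 100000)    # numResultsCached larger than the range
first_hundred = base_offsets(capped, 0, 100)    # .range(0, 100) over the cached window
```

The three values correspond to the three examples in the list: a capped window, an uncapped window, and a page read relative to the start of the cached window.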
The query cache keys used to locate cached results
After the results of a query have been cached, subsequent queries with the same query cache key retrieve results from the cache rather than generating new ones. The query cache key of a query is evaluated as follows:
All the cache-related query hints are ignored, except for numResultsCached.
A final iterate() step is ignored.
The rest of the query is ordered according to its byte-code representation.
The resulting string is matched against an index of the query results already in the cache to determine whether there is a cache hit for the query.
For example, take this query:
g.withSideEffect('Neptune#typePromotion', false).with("Neptune#enableResultCache", true) .with("Neptune#numResultsCached", 50) .V().has('genre','drama').in('likes').iterate()
It will be stored as the byte-code version of this:
g.withSideEffect('Neptune#typePromotion', false) .with("Neptune#numResultsCached", 50) .V().has('genre','drama').in('likes')
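The first two rules can be approximated on query strings, as in the rough sketch below. It is an assumption-laden illustration only: Neptune derives the key from the byte-code form of the query, which this string-based version does not reproduce, and the function name approximate_cache_key is hypothetical. Treating noCacheExceptions as one of the ignored cache-related hints is also an assumption.

```python
import re

# Cache-related hints dropped from the key; numResultsCached is deliberately kept.
# Assumption: noCacheExceptions counts as a cache-related hint.
_IGNORED_HINTS = (
    "enableResultCacheWithTTL",
    "enableResultCache",
    "invalidateResultCacheKey",
    "invalidateResultCache",
    "noCacheExceptions",
)
_HINT = re.compile(
    r"\s*\.with\(\s*[\"']Neptune#(?:" + "|".join(_IGNORED_HINTS) + r")[\"']\s*,\s*[^)]*\)"
)
_FINAL_ITERATE = re.compile(r"\s*\.iterate\(\)\s*$")


def approximate_cache_key(query):
    """String-level approximation of the query cache key."""
    key = _HINT.sub("", query)           # rule 1: drop ignored cache hints
    key = _FINAL_ITERATE.sub("", key)    # rule 2: drop a final iterate() step
    # Remove whitespace before step separators so layout differences vanish.
    return re.sub(r"\s+\.", ".", key).strip()


example = (
    "g.withSideEffect('Neptune#typePromotion', false)"
    ".with(\"Neptune#enableResultCache\", true) "
    ".with(\"Neptune#numResultsCached\", 50) "
    ".V().has('genre','drama').in('likes').iterate()"
)
key = approximate_cache_key(example)
```

Unlike the real key, this version is sensitive to differences such as quote style, so it is useful only for reasoning about which parts of a query contribute to the key.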
Exceptions related to the results cache
If the results of a query that you are trying to cache are too large to fit in the cache memory even after removing everything previously cached, Neptune raises a QueryLimitExceededException fault. No results are returned, and the exception generates the following error message:
The result size is larger than the allocated cache, please refer to results cache best practices for options to rerun the query.
You can suppress this message using the noCacheExceptions query hint, as follows:
g.with('Neptune#enableResultCache', true) .with('Neptune#noCacheExceptions', true) .V().has('genre','drama').in('likes')