Amazon Athena Redis OSS connector
The Amazon Athena Redis OSS connector enables Amazon Athena to communicate with your Redis OSS instances so that you can query your Redis OSS data with SQL. You can use the AWS Glue Data Catalog to map your Redis OSS key-value pairs into virtual tables.
Unlike traditional relational data stores, Redis OSS does not have the concept of a table
or a column. Instead, Redis OSS offers key-value access patterns where the key is
essentially a string
and the value is a string
,
z-set
, or hmap
.
You can use the AWS Glue Data Catalog to create schema and configure virtual tables. Special table properties tell the Athena Redis OSS connector how to map your Redis OSS keys and values into a table. For more information, see Setting up databases and tables in AWS Glue later in this document.
If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.
The Amazon Athena Redis OSS connector supports Amazon MemoryDB and Amazon ElastiCache (Redis OSS).
Prerequisites
Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see Deploy a data source connector or Use the AWS Serverless Application Repository to deploy a data source connector.
Set up a VPC and a security group before you use this connector. For more information, see Create a VPC for a data source connector.
Parameters
Use the Lambda environment variables in this section to configure the Redis connector.
-
spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
-
spill_prefix – (Optional) Defaults to a subfolder in the specified
spill_bucket
calledathena-federation-spill
. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours. -
spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3
putObject
request that is used for spilling (for example,{"x-amz-server-side-encryption" : "AES256"}
). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference. -
kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like
a7e63k4b-8loc-40db-a2a1-4d0en2cd8331
, you can specify a KMS key ID. -
disable_spill_encryption – (Optional) When set to
True
, disables spill encryption. Defaults toFalse
so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption. -
glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.
Setting up databases and tables in AWS Glue
To enable an AWS Glue table for use with Redis OSS, you can set the following table
properties on the table: redis-endpoint
, redis-value-type
, and
either redis-keys-zset
or redis-key-prefix
.
In addition, any AWS Glue database that contains Redis OSS tables must have a
redis-db-flag
in the URI property of the database. To set the
redis-db-flag
URI property, use the AWS Glue console to edit the
database.
The following list describes the table properties.
-
redis-endpoint – (Required) The
hostname
:
port
:
password
of the Redis OSS server that contains data for this table (for example,athena-federation-demo.cache.amazonaws.com:6379
) Alternatively, you can store the endpoint, or part of the endpoint, in AWS Secrets Manager by using ${Secret_Name
} as the table property value.
Note
To use the Athena Federated Query feature with AWS Secrets Manager, the VPC connected to your
Lambda function should have internet
access
-
redis-keys-zset – (Required if
redis-key-prefix
is not used) A comma-separated list of keys whose value is a zset(for example, active-orders,pending-orders
). Each of the values in the zset is treated as a key that is part of the table. Either theredis-keys-zset
property or theredis-key-prefix
property must be set. -
redis-key-prefix – (Required if
redis-keys-zset
is not used) A comma separated list of key prefixes to scan for values in the table (for example,accounts-*,acct-
). Either theredis-key-prefix
property or theredis-keys-zset
property must be set. -
redis-value-type – (Required) Defines how the values for the keys defined by either
redis-key-prefix
orredis-keys-zset
are mapped to your table. A literal maps to a single column. A zset also maps to a single column, but each key can store many rows. A hash enables each key to be a row with multiple columns (for example, a hash, literal, or zset.) -
redis-ssl-flag – (Optional) When
True
, creates a Redis connection that uses SSL/TLS. The default isFalse
. -
redis-cluster-flag – (Optional) When
True
, enables support for clustered Redis instances. The default isFalse
. -
redis-db-number – (Optional) Applies only to standalone, non-clustered instances.) Set this number (for example 1, 2, or 3) to read from a non-default Redis database. The default is Redis logical database 0. This number does not refer to a database in Athena or AWS Glue, but to a Redis logical database. For more information, see SELECT index
in the Redis documentation.
Data types
The Redis OSS connector supports the following data types. Redis OSS streams are not supported.
All Redis OSS values are retrieved as the string
data type. Then they are
converted to one of the following Apache Arrow data types based on how your tables are
defined in the AWS Glue Data Catalog.
AWS Glue data type | Apache Arrow data type |
---|---|
int | INT |
string | VARCHAR |
bigint | BIGINT |
double | FLOAT8 |
float | FLOAT4 |
smallint | SMALLINT |
tinyint | TINYINT |
boolean | BIT |
binary | VARBINARY |
Required Permissions
For full details on the IAM policies that this
connector requires, review the Policies
section of the athena-redis.yaml
-
Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
-
Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
-
AWS Glue Data Catalog – The Redis connector requires read only access to the AWS Glue Data Catalog to obtain schema information.
-
CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.
-
AWS Secrets Manager read access – If you choose to store Redis endpoint details in Secrets Manager, you must grant the connector access to those secrets.
-
VPC access – The connector requires the ability to attach and detach interfaces to your VPC so that it can connect to it and communicate with your Redis instances.
Performance
The Athena Redis OSS connector attempts to parallelize queries against your Redis OSS instance according to the type of table that you have defined (for example, zset keys or prefix keys).
The Athena Redis connector performs predicate pushdown to decrease the data scanned by the query. However, queries containing a predicate against the primary key
fail with timeout. LIMIT
clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT
queries with a LIMIT
clause to scan at least 16 MB of data. The Redis connector is resilient to throttling due to concurrency.
Passthrough queries
The Redis connector supports passthrough queries. You can use this feature to run queries that use Lua script on Redis databases.
To create passthrough queries with Redis, use the following syntax:
SELECT * FROM TABLE( system.script( script => 'return redis.[call|pcall](
query_script
)', keys => '[key_pattern
]', argv => '[script_arguments
]' ))
The following example runs a Lua script to get the value at key
l:a
.
SELECT * FROM TABLE( system.script( script => 'return redis.call("GET", KEYS[1])', keys => '[l:a]', argv => '[]' ))
License information
The Amazon Athena Redis connector project is licensed under the Apache-2.0 License
Additional resources
For additional information about this connector, visit the corresponding site