Amazon Athena Redis OSS connector - Amazon Athena

Amazon Athena Redis OSS connector

The Amazon Athena Redis OSS connector enables Amazon Athena to communicate with your Redis OSS instances so that you can query your Redis OSS data with SQL. You can use the AWS Glue Data Catalog to map your Redis OSS key-value pairs into virtual tables.

Unlike traditional relational data stores, Redis OSS does not have the concept of a table or a column. Instead, Redis OSS offers key-value access patterns where the key is essentially a string and the value is a string, z-set, or hmap.

You can use the AWS Glue Data Catalog to create schema and configure virtual tables. Special table properties tell the Athena Redis OSS connector how to map your Redis OSS keys and values into a table. For more information, see Setting up databases and tables in AWS Glue later in this document.

If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.

The Amazon Athena Redis OSS connector supports Amazon MemoryDB and Amazon ElastiCache (Redis OSS).

Prerequisites

Parameters

Use the Lambda environment variables in this section to configure the Redis connector.

  • spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.

  • spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.

  • spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.

  • kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.

  • disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.

  • glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.

Setting up databases and tables in AWS Glue

To enable an AWS Glue table for use with Redis OSS, you can set the following table properties on the table: redis-endpoint, redis-value-type, and either redis-keys-zset or redis-key-prefix.

In addition, any AWS Glue database that contains Redis OSS tables must have a redis-db-flag in the URI property of the database. To set the redis-db-flag URI property, use the AWS Glue console to edit the database.

The following list describes the table properties.

  • redis-endpoint – (Required) The hostname:port:password of the Redis OSS server that contains data for this table (for example, athena-federation-demo.cache.amazonaws.com:6379) Alternatively, you can store the endpoint, or part of the endpoint, in AWS Secrets Manager by using ${Secret_Name} as the table property value.

Note

To use the Athena Federated Query feature with AWS Secrets Manager, the VPC connected to your Lambda function should have internet access or a VPC endpoint to connect to Secrets Manager.

  • redis-keys-zset – (Required if redis-key-prefix is not used) A comma-separated list of keys whose value is a zset (for example, active-orders,pending-orders). Each of the values in the zset is treated as a key that is part of the table. Either the redis-keys-zset property or the redis-key-prefix property must be set.

  • redis-key-prefix – (Required if redis-keys-zset is not used) A comma separated list of key prefixes to scan for values in the table (for example, accounts-*,acct-). Either the redis-key-prefix property or the redis-keys-zset property must be set.

  • redis-value-type – (Required) Defines how the values for the keys defined by either redis-key-prefix or redis-keys-zset are mapped to your table. A literal maps to a single column. A zset also maps to a single column, but each key can store many rows. A hash enables each key to be a row with multiple columns (for example, a hash, literal, or zset.)

  • redis-ssl-flag – (Optional) When True, creates a Redis connection that uses SSL/TLS. The default is False.

  • redis-cluster-flag – (Optional) When True, enables support for clustered Redis instances. The default is False.

  • redis-db-number – (Optional) Applies only to standalone, non-clustered instances.) Set this number (for example 1, 2, or 3) to read from a non-default Redis database. The default is Redis logical database 0. This number does not refer to a database in Athena or AWS Glue, but to a Redis logical database. For more information, see SELECT index in the Redis documentation.

Data types

The Redis OSS connector supports the following data types. Redis OSS streams are not supported.

All Redis OSS values are retrieved as the string data type. Then they are converted to one of the following Apache Arrow data types based on how your tables are defined in the AWS Glue Data Catalog.

AWS Glue data type Apache Arrow data type
int INT
string VARCHAR
bigint BIGINT
double FLOAT8
float FLOAT4
smallint SMALLINT
tinyint TINYINT
boolean BIT
binary VARBINARY

Required Permissions

For full details on the IAM policies that this connector requires, review the Policies section of the athena-redis.yaml file. The following list summarizes the required permissions.

  • Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.

  • Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.

  • AWS Glue Data Catalog – The Redis connector requires read only access to the AWS Glue Data Catalog to obtain schema information.

  • CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.

  • AWS Secrets Manager read access – If you choose to store Redis endpoint details in Secrets Manager, you must grant the connector access to those secrets.

  • VPC access – The connector requires the ability to attach and detach interfaces to your VPC so that it can connect to it and communicate with your Redis instances.

Performance

The Athena Redis OSS connector attempts to parallelize queries against your Redis OSS instance according to the type of table that you have defined (for example, zset keys or prefix keys).

The Athena Redis connector performs predicate pushdown to decrease the data scanned by the query. However, queries containing a predicate against the primary key fail with timeout. LIMIT clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT queries with a LIMIT clause to scan at least 16 MB of data. The Redis connector is resilient to throttling due to concurrency.

Passthrough queries

The Redis connector supports passthrough queries. You can use this feature to run queries that use Lua script on Redis databases.

To create passthrough queries with Redis, use the following syntax:

SELECT * FROM TABLE( system.script( script => 'return redis.[call|pcall](query_script)', keys => '[key_pattern]', argv => '[script_arguments]' ))

The following example runs a Lua script to get the value at key l:a.

SELECT * FROM TABLE( system.script( script => 'return redis.call("GET", KEYS[1])', keys => '[l:a]', argv => '[]' ))

License information

The Amazon Athena Redis connector project is licensed under the Apache-2.0 License.

Additional resources

For additional information about this connector, visit the corresponding site on GitHub.com.