Considerations for data sharing in Amazon Redshift

The following are considerations for working with Amazon Redshift data sharing.

  • Cross-region data sharing includes additional cross-region data-transfer charges. These data-transfer charges don't apply within the same region, only across regions. For more information, see Managing cost control for cross-Region data sharing.

  • When you read data from a datashare, you remain connected to your local cluster database. For more information about setting up and reading from a database created from a datashare, see Querying datashare objects and Materialized views on external data lake tables in Amazon Redshift Spectrum.

  • The consumer is charged for all compute and cross-region data transfer fees required to query the producer's data. The producer is charged for the underlying storage of data in their provisioned cluster or serverless namespace.

  • The performance of the queries on shared data depends on the compute capacity of the consumer clusters.

Cluster encryption management for data sharing

To share data across AWS accounts, both the producer and consumer clusters must be encrypted.

In Amazon Redshift, you can turn on database encryption for your clusters to help protect data at rest. When you turn on encryption for a cluster, the data blocks and system metadata are encrypted for the cluster and its snapshots. You can turn on encryption when you launch your cluster, or you can modify an unencrypted cluster to use AWS Key Management Service (AWS KMS) encryption.

For more information about Amazon Redshift database encryption, see Amazon Redshift database encryption in the Amazon Redshift Management Guide.

To protect data in transit, all data is encrypted in transit through the encryption schema of the producer cluster. The consumer cluster adopts this encryption schema when data is loaded. The consumer cluster then operates as a normal encrypted cluster. Communications between the producer and consumer are also encrypted using a shared key schema. For more information about encryption in transit, see Encryption in transit.

Data sharing within and between clusters

You only need datashares when you are sharing data between different Amazon Redshift provisioned clusters or serverless workgroups. Within the same cluster, you can query another database using simple three-part notation database.schema.table as long as you have the required permissions on the objects in the other database.
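
For example, assuming the same cluster also contains a database named inventory_db with a products table in its public schema (the names here are illustrative), a cross-database query using three-part notation might look like this:

    SELECT product_id, product_name
    FROM inventory_db.public.products
    WHERE in_stock = true;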

Comparing read-only data sharing and multi-warehouse writes

Previously, objects in datashares were read only in all circumstances. Writing to an object in a datashare is a new feature. Objects in datashares are only write-enabled when a producer specifically grants write privileges like INSERT or CREATE on objects to the datashare. Additionally, for cross-account sharing, a producer has to authorize the datashare for writes and the consumer has to associate specific clusters and workgroups for writes.
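
For example, a producer might enable writes with GRANT statements like the following sketch. The schema, table, and datashare names are illustrative, and the exact privileges to grant depend on what the consumer needs to do:

    -- Allow consumers of the datashare to create objects in the shared schema
    GRANT USAGE, CREATE ON SCHEMA sales_schema TO DATASHARE sales_share;

    -- Allow consumers of the datashare to insert rows into an existing shared table
    GRANT INSERT ON TABLE sales_schema.ticket_sales TO DATASHARE sales_share;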

Limitations for data sharing reads

The following are limitations when working with datashare reads in Amazon Redshift:

  • Data sharing is supported for all provisioned RA3 cluster types and Amazon Redshift Serverless. It isn't supported for other cluster types.

  • If the producer and consumer clusters or serverless namespaces are in the same account, they must have the same encryption type (either both unencrypted or both encrypted). In every other case, including Lake Formation managed datashares, both the consumer and producer must be encrypted for security purposes. However, they don't need to share the same encryption key.

  • You can only share SQL UDFs through datashares. Python and Lambda UDFs aren't supported.

  • If the producer database has a specific collation, use the same collation settings for the consumer database.

  • Amazon Redshift doesn't support nested SQL user-defined functions on producer clusters.

  • Amazon Redshift doesn't support sharing tables with interleaved sort keys, or views that refer to tables with interleaved sort keys.

  • Consumers can't add datashare objects to another datashare. Additionally, consumers can't add views referencing datashare objects to another datashare.

  • Amazon Redshift doesn't support accessing a datashare object that had concurrent DDL occur between the prepare and execute phases of the access.

  • Amazon Redshift doesn't support sharing stored procedures through datashares.

  • Amazon Redshift doesn't support sharing metadata system views and system tables.

Limitations for multi-warehouse writes

Note

Amazon Redshift multi-warehouse writes using data sharing is only supported on Amazon Redshift patch 186 for provisioned clusters on current track version 1.0.78881 or greater, and for Amazon Redshift Serverless workgroups on version 1.0.78890 or greater.

The following are limitations when working with datashare writes in Amazon Redshift:

  • Connections – You must be connected directly to a datashare database or run the USE command to write to datashares. You can also use three-part notation. The USE command is not supported on external tables.

  • Compute type – You must use Serverless workgroups, ra3.xlplus clusters, ra3.4xlarge clusters, or ra3.16xlarge clusters to use this feature.

  • Metadata discovery – When you're a consumer connected directly to a datashare database through the Redshift JDBC, ODBC, or Python drivers, you can view catalog data for the datashare database.

  • Permissions visibility – Consumers can see the permissions granted to the datashares through the SHOW GRANTS SQL command.

  • Encryption – For cross-account data sharing, both the producer and consumer cluster must be encrypted.

  • Isolation level – Your database’s isolation level must be snapshot isolation in order to allow other Serverless workgroups and provisioned clusters to write to it.

  • Auto operations – Writes from consumers to datashare objects don't trigger an auto analyze operation. As a result, the producer must manually run ANALYZE after data is inserted into the table so that table statistics are updated. Without updated statistics, query plans might not be optimal.

  • Multi-statement queries and transactions – Multi-statement queries outside of a transaction block aren't currently supported. As a result, if you are using a query editor like DBeaver and you have multiple write queries, you need to wrap your queries in an explicit BEGIN...END transaction statement, as shown in the sketch after this list.

    When multi-command statements are used outside of transactions, if the first command is a write to a producer database, subsequent write commands in the statement are only allowed to that producer database. If the first command is a read, subsequent write commands are only allowed to the used database, if set, otherwise to the local database. Note that the writes in a transaction are only supported to a single database.

  • Consumer sizing – Consumer clusters must have at least 64 slices to perform writes using data sharing.

  • Views and materialized views – You can't create, update, or alter views or materialized views on a datashare database.

  • Security – You can't attach security policies such as column-level security (CLS), row-level security (RLS), and dynamic data masking (DDM) to datashare objects, or remove such policies from them.

  • Manageability – Consumer warehouses can't add datashare objects or views referencing datashare objects to another datashare. Consumers also can't modify or drop an existing datashare.
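
As a minimal sketch of the connection, isolation level, and transaction requirements above, a write through a datashare might look like the following. The database, schema, and table names are illustrative; the first statement runs on the producer database, and the rest run from the consumer:

    -- Producer side (one time): snapshot isolation lets other warehouses write to this database
    ALTER DATABASE sales_db ISOLATION LEVEL SNAPSHOT;

    -- Consumer side: switch to the database created from the datashare
    USE sales_consumer_db;

    -- Wrap multiple write statements in an explicit transaction
    BEGIN;
    INSERT INTO sales_schema.ticket_sales VALUES (1001, 'stadium', 250.00);
    UPDATE sales_schema.ticket_sales SET price = 275.00 WHERE sale_id = 1001;
    COMMIT;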

Limitations for data lake tables

The following are limitations when working with data lake tables in Amazon Redshift:

Permissions you can grant to datashares

The following are the object types and the permissions that you can grant on them in a data sharing context.

Schemas:

  • USAGE

  • CREATE

Tables:

  • SELECT

  • INSERT

  • UPDATE

  • DELETE

  • TRUNCATE

  • DROP

  • REFERENCES

Functions:

  • EXECUTE

Databases:

  • CREATE
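
As an illustration of how these map onto SQL, a producer might grant a mix of these permissions to a datashare as follows. The object and datashare names are illustrative, and the function grant assumes the usual argument-list form of GRANT EXECUTE:

    GRANT USAGE ON SCHEMA sales_schema TO DATASHARE sales_share;
    GRANT SELECT, UPDATE, DELETE ON TABLE sales_schema.ticket_sales TO DATASHARE sales_share;
    GRANT EXECUTE ON FUNCTION sales_schema.total_price(integer) TO DATASHARE sales_share;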

Supported SQL statements

These statements are supported for data sharing with writes:

  • ALTER TABLE RENAME { TO | COLUMN TO }

  • ALTER SCHEMA RENAME

  • BEGIN | START TRANSACTION

  • END | COMMIT | ROLLBACK

  • COPY without COMPUPDATE

  • { CREATE | DROP } SCHEMA

  • { CREATE | DROP | SHOW } TABLE

  • CREATE TABLE table_name AS

  • DELETE

  • { GRANT | REVOKE } privilege_name ON OBJECT_TYPE object_name TO consumer_user

  • INSERT

  • SELECT

  • INSERT INTO SELECT

  • SHOW GRANTS. For more information, see SHOW GRANTS.

  • TRUNCATE

  • UPDATE

  • SUPER data type columns
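
For example, a consumer connected to the datashare database might load and transform shared data with statements like the following. The bucket, IAM role, schema, and table names are illustrative; note that COPY must run without compression analysis (COMPUPDATE OFF):

    -- Load a shared table; COMPUPDATE must be turned off for datashare writes
    COPY sales_schema.ticket_sales
    FROM 's3://amzn-s3-demo-bucket/tickets/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    COMPUPDATE OFF;

    -- Create a new table in the shared schema from a query result
    CREATE TABLE sales_schema.ticket_sales_2024 AS
    SELECT * FROM sales_schema.ticket_sales WHERE sale_year = 2024;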

Unsupported statement types – The following aren't supported:

  • Multi-statement queries to consumer warehouses when writing to producers.

  • Multi-statement queries to consumer warehouses in a different database, if the previous command is a read statement.

  • Object references that use one-part or two-part notation instead of three-part (database.schema.object) notation, when you aren't connected to the shared database.

  • Concurrency scaling queries writing from consumers to producers.

  • Auto-copy jobs writing from consumers to producers.

  • Streaming jobs writing from consumers to producers.

  • Consumers creating zero-ETL integration tables on producer clusters. For more information about zero-ETL integrations, see Working with zero-ETL integrations.

  • Writing to a table with an interleaved sort key through a datashare.

  • Writing to a stored procedure through a datashare.

  • Writing to a SQL user-defined function (UDF) through a datashare. This includes nested, Python, and Lambda UDFs.

  • UPDATE, INSERT, or COPY statements on identity columns to consumer warehouses with more compute slices than the producer.

  • MERGE statements on non-RMS external tables to consumer warehouses, when writing to producers.

  • CREATE TABLE statements with:

    • DEFAULT expression set to data type VARBYTE. The VARBYTE data type can't be implicitly converted to any other data type. For more information, see CAST function.

    • AS OF SELECT statements with NULL parameter to consumer warehouses, when writing to producers.

    • LIKE parameter to consumer warehouses, when writing to producers.