Data sharing overview

With Amazon Redshift data sharing, you can securely and easily share live data across Amazon Redshift clusters without copying or replicating it, reducing the need for complex extract, transform, and load (ETL) processes. Database administrators, data analysts, and data engineers can use this feature to streamline data access and collaboration within or across teams and organizations. It enables use cases such as sharing live production data with analytics teams, providing real-time reporting across distributed data sources, and simplifying data governance by centrally controlling access permissions. The following sections cover the details of configuring and managing data sharing in Amazon Redshift.

For information about how to get started working with data sharing and manage datashares using the AWS Management Console, see Managing data sharing tasks.

Data sharing use cases for Amazon Redshift

Amazon Redshift data sharing is especially useful for these use cases:

  • Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance.

  • Enabling cross-group collaboration – Enable seamless collaboration across teams and business groups for broader analytics, data science, and cross-product impact analysis.

  • Delivering data as a service – Share data as a service across your organization.

  • Sharing data between environments – Share data among development, test, and production environments. You can improve team agility by sharing data at different levels of granularity.

  • Licensing access to data in Amazon Redshift – List Amazon Redshift data sets in the AWS Data Exchange catalog that customers can find, subscribe to, and query in minutes.

Data sharing write-access use cases (preview)

Data sharing for writes has several important use cases:

  • Update business source data on the producer – You can share data as a service across your organization, and consumers can also perform actions on the source data. For instance, they can write back up-to-date values or acknowledge receipt of data.

  • Insert additional records on the producer – Consumers can add records to the original source data. These records can be marked as originating from the consumer, if needed.
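As a rough sketch of these write operations (preview functionality; all object names here are hypothetical, and the consumer is assumed to have been granted the required write permissions by the producer), a consumer cluster could update and insert rows in shared tables using three-part notation against a database created from the datashare:

```sql
-- On the consumer cluster, in a database (sales_db) created from a datashare.
-- Assumes the producer granted write permissions on these objects (preview).

-- Acknowledge receipt of a record by updating the source data.
UPDATE sales_db.public.orders
SET status = 'ACKNOWLEDGED'
WHERE order_id = 1001;

-- Add a consumer-originated record, marked with its source.
INSERT INTO sales_db.public.orders (order_id, amount, status, source)
VALUES (9001, 250.00, 'NEW', 'consumer');
```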

For information specifically regarding how to perform write operations on a datashare, see Sharing write access to data (Preview).

Data sharing at different levels in Amazon Redshift

With Amazon Redshift, you can share data at different levels. These levels include databases, schemas, tables, views (including regular, late-binding, and materialized views), and SQL user-defined functions (UDFs). You can create multiple datashares for a given database. A datashare can contain objects from multiple schemas in the database for which the datashare is created.
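As a sketch of how these levels map to SQL (the datashare, schema, table, and namespace values below are hypothetical), a producer can create a datashare, add objects at the schema and table level, and grant usage to a consumer namespace:

```sql
-- On the producer cluster: create a datashare in the current database.
CREATE DATASHARE sales_share;

-- Add a schema, then individual objects from it.
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.orders;

-- Grant access to a consumer cluster by its namespace GUID (hypothetical value).
GRANT USAGE ON DATASHARE sales_share
TO NAMESPACE '13b8833d-17c6-4f16-8fe4-1a018f5ed00d';
```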

This flexibility gives you fine-grained access control, which you can tailor for the different users and business groups that need access to Amazon Redshift data.

Consistency management for data sharing in Amazon Redshift

Amazon Redshift provides transactional consistency on all producer and consumer clusters and shares up-to-date and consistent views of the data with all consumers.

You can continuously update data on the producer cluster. All queries on a consumer cluster within a transaction read the same state of the shared data; a consumer transaction doesn't see changes that another transaction committed on the producer cluster after the consumer transaction began. After a data change is committed on the producer cluster, new transactions on the consumer cluster can immediately query the updated data.
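To illustrate this behavior (object names and the namespace GUID are hypothetical), a consumer can create a local database from the datashare and query it; all reads inside one transaction see the same committed snapshot of the producer's data:

```sql
-- On the consumer cluster: create a database that points at the datashare.
CREATE DATABASE sales_db FROM DATASHARE sales_share
OF NAMESPACE 'a1b2c3d4-5678-90ab-cdef-112233445566';  -- producer namespace

-- Both reads below see the same committed state of the shared data,
-- even if the producer commits new changes between the two statements.
BEGIN;
SELECT COUNT(*) FROM sales_db.public.orders;
SELECT SUM(amount) FROM sales_db.public.orders;
END;
```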

This strong consistency removes the risk of business reports containing invalid or partial results while data is being shared. That guarantee is especially important for financial analysis, or when the results are used to prepare datasets for training machine learning models.