


# Streaming ingestion to a materialized view
<a name="materialized-view-streaming-ingestion"></a>

This topic describes how to use materialized views for fast access to streaming data.

 Streaming ingestion provides low-latency, high-speed data ingestion from [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/) or [Amazon Managed Streaming for Apache Kafka](https://aws.amazon.com/msk/) to an Amazon Redshift provisioned or Amazon Redshift Serverless database. The data lands in a Redshift materialized view that's configured for the purpose, which gives you fast access to external data. Streaming ingestion lowers data-access time and reduces storage cost. You can configure it for your Amazon Redshift cluster or for your Amazon Redshift Serverless workgroup, using a small collection of SQL commands. After it's set up, each materialized-view refresh can ingest hundreds of megabytes of data per second. 
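
A minimal setup for Kinesis Data Streams looks like the following sketch. The IAM role ARN, schema name, stream name, and view name are placeholders for illustration; adapt them to your environment.

```sql
-- Map the Kinesis service into Redshift as an external schema
-- (the role ARN here is a hypothetical placeholder).
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::012345678901:role/my-redshift-streaming-role';

-- Create a streaming materialized view over a stream named "my_stream".
-- kinesis_data carries the raw record payload; the other columns are stream metadata.
CREATE MATERIALIZED VIEW my_stream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       json_parse(kinesis_data) AS payload
FROM kds."my_stream";
```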

## How data flows from a streaming service to Redshift
<a name="materialized-view-streaming-ingestion-data-flow"></a>

 It helps to understand how streaming ingestion works and the database objects used in the process. Data flows directly from a data-stream provider to an Amazon Redshift provisioned cluster or to an Amazon Redshift Serverless workgroup. There isn't a temporary landing area, such as an Amazon S3 bucket. The provisioned cluster or workgroup is the stream consumer. In the Redshift database, the data read from the stream lands in a materialized view. The data is processed as it arrives. For instance, JSON values can be consumed and mapped to a materialized view's data columns, using SQL. When the materialized view is refreshed, Redshift consumes data from allocated Kinesis data shards or Kafka partitions until the view is brought up to date with the stream. 
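
For instance, with the hypothetical `my_stream_mv` view from the earlier sketch, a manual refresh consumes any records that arrived since the last refresh:

```sql
-- Bring the materialized view up to date with the stream. With AUTO REFRESH YES,
-- Redshift can also perform this refresh in the background as data arrives.
REFRESH MATERIALIZED VIEW my_stream_mv;

-- Query recently ingested records using the stream's arrival-time metadata.
SELECT COUNT(*)
FROM my_stream_mv
WHERE approximate_arrival_timestamp > DATEADD(minute, -5, GETDATE());
```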

 Use cases for Amazon Redshift streaming ingestion involve data that's generated continually and must be processed within a short period, or *latency*, of its origination. This is commonly called *near real-time* analytics. Sources can include IoT devices, system-telemetry data, and clickstream data from a busy website or application.

## Data parsing best practices for improving performance
<a name="materialized-view-streaming-recommendations"></a>

When you configure streaming ingestion, there are options in how you parse the incoming data. These can include applying business logic or formatting as data arrives. We recommend the following best practices to avoid errors or data loss. They are derived from internal testing and from helping customers troubleshoot configuration and parsing issues.
+ **Extracting values from streamed data** – If you use the [JSON\_EXTRACT\_PATH\_TEXT](https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html) function in your materialized view definition to parse or *shred* streamed JSON, it can significantly impact performance and latency. To explain, for each column extracted using JSON\_EXTRACT\_PATH\_TEXT, the incoming JSON is re-parsed. After this, data-type conversion, filtering, and business-logic calculations occur. This means, for example, that if you extract 10 columns from JSON data, each JSON record is parsed 10 times, which includes additional logic. This results in higher ingestion latency. An alternative approach we recommend is to use the [JSON\_PARSE function](https://docs.aws.amazon.com/redshift/latest/dg/JSON_PARSE.html) to convert JSON records to Redshift's SUPER data type. After the streamed data lands in the materialized view, use PartiQL to extract individual strings from the SUPER representation of the JSON data. For a sketch of this pattern, see the first example following this list. For more information, see [Querying semi-structured data](https://docs.aws.amazon.com/redshift/latest/dg/query-super.html).

   Additionally, note that JSON\_EXTRACT\_PATH\_TEXT has a 64KB data-size maximum. Thus, if any JSON record is larger than 64KB, processing it with JSON\_EXTRACT\_PATH\_TEXT results in an error. 
+ **Mapping an Amazon Kinesis Data Streams stream or Amazon MSK topic to multiple materialized views** – We don't recommend creating multiple materialized views to ingest data from a single stream or topic. This is because each materialized view creates a consumer for each shard in the Kinesis Data Streams stream or each partition in the Kafka topic. This can result in throttling or in exceeding the throughput of the stream or topic. It can also result in higher cost, because you ingest the same data multiple times. When you configure streaming ingestion, we recommend that you create one materialized view for each stream or topic (see the second example following this list).

  If your use case requires that you ingest data from one Kinesis Data Streams stream or Amazon MSK topic into multiple materialized views, consult the [AWS Big Data blog](https://aws.amazon.com/blogs/big-data/), specifically [Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK](https://aws.amazon.com/blogs/big-data/best-practices-to-implement-near-real-time-analytics-using-amazon-redshift-streaming-ingestion-with-amazon-msk/), before you do so.
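
The following sketch contrasts the two JSON-parsing approaches from the first recommendation. The schema, stream, and column names are hypothetical; note that `kinesis_data` arrives as VARBYTE, so the JSON\_EXTRACT\_PATH\_TEXT variant must first decode it with FROM\_VARBYTE.

```sql
-- Anti-pattern: each JSON_EXTRACT_PATH_TEXT call re-parses the record, so
-- extracting N columns parses every record N times.
-- SELECT json_extract_path_text(from_varbyte(kinesis_data, 'utf-8'), 'user_id')    AS user_id,
--        json_extract_path_text(from_varbyte(kinesis_data, 'utf-8'), 'event_type') AS event_type,
--        ...

-- Recommended: parse each record once into SUPER at ingestion time.
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       json_parse(kinesis_data) AS payload
FROM kds."my_click_stream";

-- Shred after landing, using PartiQL navigation on the SUPER column.
SELECT payload.user_id::varchar    AS user_id,
       payload.event_type::varchar AS event_type
FROM clickstream_mv;
```

Consistent with the second recommendation, if several consumers need different shapes of the same stream, one pattern is to ingest once and layer ordinary views on top, so that no additional shard or partition consumers are created. The view names and filters below are illustrative only.

```sql
-- Derived views read from the single streaming materialized view, not from the stream.
CREATE VIEW clickstream_purchases AS
SELECT * FROM clickstream_mv WHERE payload.event_type::varchar = 'purchase';

CREATE VIEW clickstream_pageviews AS
SELECT * FROM clickstream_mv WHERE payload.event_type::varchar = 'pageview';
```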

## Streaming ingestion behavior and data types
<a name="materialized-view-streaming-ingestion-limitations"></a>

The following table describes technical behavior details and size limits for various data types. We recommend that you become familiar with these before configuring a materialized view for streaming ingestion.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html)