Key considerations while building streaming analytics - Build Modern Data Streaming Architectures on AWS

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Key considerations while building streaming analytics

When you build a streaming data pipeline on a modern data architecture, whether to stream log and event data to live dashboards, deliver data into data lakes, or power real-time analytics, event-driven applications, and machine learning (ML), first understand the ideal usage patterns of AWS streaming data solutions, your user personas, and your specific use case. That understanding lets you choose the right service for the job.

Choosing the right Kinesis service for your use case

The following table illustrates the ideal usage patterns of various Kinesis data streaming and processing services.

Table 1: Amazon Kinesis usage patterns

| | Kinesis Data Streams | Firehose | Managed Service for Apache Flink |
|---|---|---|---|
| Usage | Capture stream log and event data, run real-time analytics, and build event-driven applications | Load data streams into AWS data stores | Analyze data streams with Managed Service for Apache Flink Studio and Apache Flink |
| Data sources | Mobile apps, application logs, web clickstream/social, IoT sensors, connected products, smart buildings | Connected devices such as consumer appliances, embedded sensors, TV set-top boxes, clickstream data, application logs | Analyze streaming data from Kinesis Data Streams, Amazon MSK, Amazon MQ, custom connectors |
| Stream ingestion | AWS SDKs, Kinesis Producer Library, AWS Mobile SDKs, Kinesis Agent, AWS IoT, Amazon CloudWatch Events, Amazon DynamoDB, AWS DMS | AWS SDKs, Kinesis Producer Library, Kinesis Data Streams, Kinesis Agent, AWS IoT, Amazon CloudWatch Events | Analyze streaming data from Kinesis Data Streams, Amazon MSK, Amazon MQ, custom connectors |
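
Table 1 lists the AWS SDKs as one ingestion path into Kinesis Data Streams. The following is a minimal sketch of that path in Python, assuming the boto3 SDK and its `put_record` call. The stream name `example-stream`, the event fields, and the helper names `build_put_record_args` and `send_event` are hypothetical and chosen only for illustration.

```python
import json


def build_put_record_args(stream_name: str, event: dict, key_field: str) -> dict:
    """Build the keyword arguments for a Kinesis Data Streams PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis stores opaque bytes; JSON is one common encoding (an assumption here).
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


def send_event(stream_name: str, event: dict, key_field: str = "device_id") -> dict:
    """Send one event. Needs boto3 and AWS credentials when it is called."""
    import boto3  # imported lazily so the helper above runs without the SDK

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_args(stream_name, event, key_field))


# Hypothetical event, used only to show the shape of the request.
args = build_put_record_args(
    "example-stream", {"device_id": 42, "temp_c": 21.5}, "device_id"
)
print(args["PartitionKey"], args["Data"])
```

Keeping the request construction separate from the network call makes the encoding and partition key choice easy to check locally before any data is sent.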

Choosing the right streaming service for your use case

The following table compares Apache Kafka, Kinesis Data Streams, and Amazon MSK.

Table 2: Streaming services

| Attribute | Apache Kafka | Kinesis Data Streams | Amazon MSK |
|---|---|---|---|
| Ease of use | Advanced setup required | Get started in minutes | Get started in minutes |
| Management overhead | High | Low | Low (Amazon MSK Serverless) to Medium (Amazon MSK Provisioned) |
| Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click |
| Throughput | Very large | Scale with Kinesis Data Streams on-demand | Very large |
| Infrastructure | You manage | AWS manages | AWS manages |
| Open source? | Yes | No | Yes (managed service for Apache Kafka) |
| Data retention | You can retain data for a longer duration, and it is configurable. | You can retain data for up to 365 days. | You can retain data for a longer duration, and it is configurable. With the tiered storage feature of Amazon MSK, you can cost-efficiently store vast amounts of data in Amazon S3. |
| Latency | Low | Low (70 ms with enhanced fan-out) | Lowest |
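
A few rows of Table 2 can be read as simple filters. The sketch below encodes three of them (data retention, who manages the infrastructure, and Apache Kafka compatibility) as a rule-of-thumb shortlist. The `Requirements` type, its field names, and the `candidate_services` function are hypothetical, and the rules are a simplification of the table rather than official selection guidance.

```python
from dataclasses import dataclass
from typing import List

KINESIS_MAX_RETENTION_DAYS = 365  # "up to 365 days" in Table 2


@dataclass(frozen=True)
class Requirements:
    retention_days: int     # how long records must stay readable in the stream
    aws_managed: bool       # True if AWS must run the infrastructure
    kafka_compatible: bool  # True if the workload needs Apache Kafka APIs


def candidate_services(req: Requirements) -> List[str]:
    """Return the services from Table 2 that pass these three filters."""
    candidates = []
    # Self-managed Apache Kafka only fits when you accept running it yourself.
    if not req.aws_managed:
        candidates.append("Apache Kafka")
    # Kinesis Data Streams is not Kafka-based and caps retention at 365 days.
    if not req.kafka_compatible and req.retention_days <= KINESIS_MAX_RETENTION_DAYS:
        candidates.append("Kinesis Data Streams")
    # Amazon MSK is AWS managed, Kafka-based, and has configurable retention.
    candidates.append("Amazon MSK")
    return candidates


print(candidate_services(Requirements(30, aws_managed=True, kafka_compatible=False)))
print(candidate_services(Requirements(730, aws_managed=True, kafka_compatible=False)))
```

The other rows (ease of use, scaling speed, throughput, latency) are qualitative and still need to be weighed by hand against the use case.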

Choosing the right streaming data processing technology

Streaming data processing technologies support many use cases, including event-driven applications, data analytics applications, and data pipeline applications. Commonly used options include Apache Kafka Streams, Apache Flink, KSQL, and Amazon Managed Service for Apache Flink. Apache Kafka Streams, Apache Flink, and KSQL are open source, while Amazon Managed Service for Apache Flink offers fully managed Apache Flink.

The following table compares Apache Kafka Streams, Managed Service for Apache Flink, the Kinesis Client Library, and AWS Lambda.

Table 3: Comparison between data stream processing technologies

| Feature | Apache Kafka Streams | Managed Service for Apache Flink | Kinesis Client Library | Lambda |
|---|---|---|---|---|
| Open source | Yes | Based on open-source Apache Flink | Based on the open-source Kinesis Client Library for Java | No, based on proprietary engine |
| Sources | Kafka only | Kinesis Data Streams, Amazon MSK, DynamoDB, custom sources, RabbitMQ | Kinesis Data Streams | Kinesis Data Streams, Firehose, Amazon MSK |
| Destinations/sinks | Kafka only; over 10 connectors supported with Kafka Connect | Amazon MSK, Kinesis Data Streams, Firehose, Amazon S3, Apache Cassandra, Amazon DynamoDB, OpenSearch Service, custom sinks supported by open-source Flink | Multi-stream processing | Use AWS Lambda to respond to or adjust immediate occurrences within event-driven applications. AWS Lambda can read records from Kinesis Data Streams and invoke your function. |
| Development languages | Java and Scala | Java, Scala, SQL, and Python | Java; support for languages other than Java is provided using a multi-language interface called the MultiLangDaemon | Java, .NET Core, Go, PowerShell, Node.js, Python, Ruby; it supports multiple languages through the use of Lambda runtimes |
| Development process | Develop on any integrated development environment (IDE) using Java or Scala. The application is separate from the Kafka broker and needs to be scaled independently. | Develop on any IDE and build a JAR file. Create a Managed Service for Apache Flink application and upload the application JAR. | The Kinesis Client Library (KCL) is a Java library. KCL helps you consume and process data from a Kinesis data stream by taking care of many of the complex tasks, such as load balancing across multiple consumer application instances, responding to consumer application instance failures, checkpointing processed records, and reacting to resharding. | Develop on any IDE that supports the respective programming language |
| Exactly-once processing support | Yes | Yes | Not built in | Not built in |
| Per-record processing latency | Sub-second | Sub-second | Seconds | Seconds |
| Batch support | No | Yes, supported by Flink | No | Yes, with Amazon EventBridge |
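
Table 3 notes that AWS Lambda can read records from Kinesis Data Streams and invoke your function. The sketch below shows roughly what such a function could look like in Python. It assumes the standard Kinesis event shape that Lambda delivers, in which each record's `data` field is base64-encoded, and it assumes the producer wrote JSON payloads. The `decode_records` helper and the sample event are hypothetical.

```python
import base64
import json


def decode_records(event: dict) -> list:
    """Decode every Kinesis record in the batch passed to the function."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        decoded.append(
            {
                "partition_key": kinesis["partitionKey"],
                "sequence_number": kinesis["sequenceNumber"],
                # The payload arrives base64-encoded; JSON content is an assumption.
                "payload": json.loads(base64.b64decode(kinesis["data"])),
            }
        )
    return decoded


def handler(event, context):
    """Lambda entry point: process each decoded record in the batch."""
    for item in decode_records(event):
        # Replace this print with real processing logic.
        print(item["sequence_number"], item["payload"])


# Hypothetical event for local testing, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "device-1",
                "sequenceNumber": "1",
                "data": base64.b64encode(json.dumps({"temp_c": 21.5}).encode()).decode(),
            }
        }
    ]
}
handler(sample_event, None)
```

Because exactly-once processing is not built in for Lambda, a batch can be delivered more than once after a failure. Processing logic that is idempotent, for example keyed on the sequence number, tolerates those retries.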