What is Amazon Kinesis?
Amazon Kinesis is a platform for streaming data on AWS. It allows you to collect, process, and analyze real-time, streaming data such as video, audio, application logs, website clickstreams, and IoT telemetry data.
Key Components
1. Kinesis Data Streams
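- Ingest massive amounts of data in real time.
- Shards: Throughput scales with the number of shards. Each shard accepts up to 1 MB/s or 1,000 records/s of writes and serves up to 2 MB/s of reads. In Provisioned mode you manage the shard count yourself; On-Demand mode scales capacity automatically.
- Retention: Data is stored for 24 hours by default and can be extended up to 365 days.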
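Since throughput is tied to shard count, a quick sizing estimate can help with "how many shards?" questions. The sketch below assumes the standard provisioned-mode per-shard limits (1 MB/s or 1,000 records/s in, 2 MB/s out with shared-throughput consumers). The function name `estimate_shards` and the sample workload numbers are illustrative only and are not part of any AWS SDK.

```python
import math

# Per-shard limits in provisioned mode (assumed values, see lead-in).
WRITE_MB_PER_SHARD = 1.0         # ingest limit, MB/s
WRITE_RECORDS_PER_SHARD = 1000   # ingest limit, records/s
READ_MB_PER_SHARD = 2.0          # shared-throughput read limit, MB/s


def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    needed = max(
        write_mb_per_sec / WRITE_MB_PER_SHARD,
        records_per_sec / WRITE_RECORDS_PER_SHARD,
        read_mb_per_sec / READ_MB_PER_SHARD,
    )
    # A stream always has at least one shard.
    return max(1, math.ceil(needed))


# Hypothetical workload: 5 MB/s in, 3,000 records/s, 15 MB/s read out.
print(estimate_shards(5.0, 3000, 15.0))  # 8 (the read side is the bottleneck)
```

Whichever limit demands the most shards wins, which is why a read-heavy workload can need more shards than its ingest rate alone suggests.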
2. Kinesis Data Firehose (Now Amazon Data Firehose)
- Load streaming data into data stores.
- Easiest way to capture, transform, and load data into Amazon S3, Redshift, OpenSearch, and Splunk.
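- Near Real-time: Delivery is not instantaneous. Firehose buffers incoming records and delivers a batch when a size or time threshold is reached (the classic exam figure is a 60-second minimum buffer interval).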
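To see why buffering makes delivery "near real-time", here is a toy model of size-or-time batching. It is a simplified illustration and not the Firehose API: the class `BufferedDelivery`, its parameters, and the tiny thresholds are all hypothetical, and the time check only runs when a new record arrives, whereas a real service would also flush on a timer.

```python
class BufferedDelivery:
    """Toy model of size-or-time buffering; not the Firehose API."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes      # hypothetical size threshold
        self.max_seconds = max_seconds  # hypothetical time threshold
        self.pending = []               # records waiting for delivery
        self.pending_bytes = 0
        self.opened_at = None           # arrival time of first pending record
        self.batches = []               # delivered batches, oldest first

    def _flush(self):
        self.batches.append(self.pending)
        self.pending, self.pending_bytes, self.opened_at = [], 0, None

    def put(self, record, now):
        # Deliver the waiting batch first if its time limit has passed.
        if self.pending and now - self.opened_at >= self.max_seconds:
            self._flush()
        if self.opened_at is None:
            self.opened_at = now
        self.pending.append(record)
        self.pending_bytes += len(record)
        # Deliver immediately once the size limit is reached.
        if self.pending_bytes >= self.max_bytes:
            self._flush()


buf = BufferedDelivery(max_bytes=10, max_seconds=60)
buf.put(b"aaaa", now=0)    # 4 bytes pending
buf.put(b"bbbbbb", now=5)  # 10 bytes reached: size-based flush
buf.put(b"cc", now=10)     # starts a new batch
buf.put(b"dd", now=75)     # 65 s elapsed: time-based flush of the "cc" batch
print(len(buf.batches))    # 2
```

The record `b"cc"` waits over a minute before delivery, which is the latency trade-off that separates Firehose from a custom Data Streams consumer.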
3. Kinesis Data Analytics (Now Managed Service for Apache Flink)
- Analyze streaming data with SQL or Apache Flink.
Exam Tips
[!IMPORTANT] Streams vs Firehose:
- Kinesis Data Streams: "I need to write custom code to process data in real time." (Think: Custom Consumers).
- Kinesis Data Firehose: "I need to SAVE data to S3/Redshift/OpenSearch." (Think: Delivery service).
[!NOTE] Real-time: Kinesis is the go-to answer for "Real-time streaming data ingestion".
Common Use Cases
- Log and Event Data Collection: Collecting logs from servers and applications in real time.
- Real-time Analytics: Calculating metrics such as leaderboard scores or stock prices as the data arrives.
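As a rough illustration of the real-time analytics use case, the sketch below totals leaderboard scores in fixed, non-overlapping (tumbling) time windows. It is plain Python run over an in-memory list, not Flink or Kinesis code; the function `tumbling_window_totals` and the sample events are made up for the example. In practice this logic would live in a stream processor or a custom consumer.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds):
    """Sum scores per player within fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, player, score) tuples.
    Returns {window_start: {player: total_score}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, player, score in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][player] += score
    return {start: dict(totals) for start, totals in windows.items()}


# Hypothetical score events: (timestamp, player, points).
events = [(1, "ana", 10), (4, "bo", 7), (9, "ana", 5), (12, "bo", 20)]
totals = tumbling_window_totals(events, window_seconds=10)

for start in sorted(totals):
    leader = max(totals[start], key=totals[start].get)
    print(start, leader, totals[start][leader])
# 0 ana 15
# 10 bo 20
```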