
In this lab, we will create a PyFlink application that exports Kafka topic messages into an S3 bucket. The app enriches the records by adding a new column with a user-defined function and writes them via the FileSystem SQL connector. This gives us a simpler architecture than the original lab, where the records are sent to Amazon Kinesis Data Firehose, enriched by a separate Lambda function, and then written to an S3 bucket. While the records are being written to the S3 bucket, a Glue table will be created so that they can be queried with Amazon Athena.
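
Below is a minimal sketch of that pipeline using the PyFlink Table API. The topic name, bucket path, schema, and UDF are illustrative assumptions rather than the lab's actual definitions, and the connector options shown are the standard Kafka and FileSystem SQL connector settings.

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A simple user-defined function that derives a new column from an existing one.
@udf(result_type=DataTypes.STRING())
def to_upper(name: str) -> str:
    return name.upper() if name else name

t_env.create_temporary_function("to_upper", to_upper)

# Kafka source table (topic, brokers and schema are hypothetical).
t_env.execute_sql("""
    CREATE TABLE orders_source (
        order_id STRING,
        customer STRING,
        amount DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# FileSystem sink writing Parquet files to S3; a Glue table over this path
# can later be queried with Athena.
t_env.execute_sql("""
    CREATE TABLE orders_sink (
        order_id STRING,
        customer STRING,
        customer_upper STRING,
        amount DOUBLE
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3://my-bucket/orders/',
        'format' = 'parquet'
    )
""")

# Enrich the records with the UDF and write them out.
t_env.execute_sql("""
    INSERT INTO orders_sink
    SELECT order_id, customer, to_upper(customer) AS customer_upper, amount
    FROM orders_source
""").wait()
```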


In this lab, we will create a PyFlink application that reads records from S3 and sends them into a Kafka topic. Because the Kafka cluster is authenticated by IAM, a custom pipeline JAR file will be created, and we will demonstrate how to execute the app both in a Flink cluster deployed on Docker and locally as a typical Python app. We can assume the S3 data is static metadata that needs to be joined into another stream, which makes this exercise useful for data enrichment.
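
A minimal sketch of such a pipeline is shown below. The bucket path, schema, topic, and broker address are hypothetical; the IAM client properties are the ones defined by the aws-msk-iam-auth library, which is why the app needs a custom pipeline JAR that bundles those classes alongside the Kafka connector.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register the combined connector/auth JAR (the path is illustrative).
t_env.get_config().set("pipeline.jars", "file:///opt/flink/lib/pipeline-deps.jar")

# FileSystem source reading the static metadata from S3 (hypothetical path and schema).
t_env.execute_sql("""
    CREATE TABLE metadata_source (
        id STRING,
        category STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3://my-bucket/metadata/',
        'format' = 'csv'
    )
""")

# Kafka sink on an IAM-authenticated MSK cluster (broker address is illustrative).
t_env.execute_sql("""
    CREATE TABLE metadata_sink (
        id STRING,
        category STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'metadata',
        'properties.bootstrap.servers' = 'b-1.mycluster.kafka.us-east-1.amazonaws.com:9098',
        'properties.security.protocol' = 'SASL_SSL',
        'properties.sasl.mechanism' = 'AWS_MSK_IAM',
        'properties.sasl.jaas.config' = 'software.amazon.msk.auth.iam.IAMLoginModule required;',
        'properties.sasl.client.callback.handler.class' = 'software.amazon.msk.auth.iam.IAMClientCallbackHandler',
        'format' = 'json'
    )
""")

# Copy the metadata records into the Kafka topic.
t_env.execute_sql("""
    INSERT INTO metadata_sink
    SELECT id, category FROM metadata_source
""").wait()
```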