Data Streaming

Real Time Streaming With Kafka and Flink - Introduction

October 5, 2023 | 6 min read | Data Streaming | Real Time Streaming With Kafka and Flink | Amazon Athena, Amazon DynamoDB, Amazon MSK, Amazon MSK Connect, Amazon OpenSearch Service, Amazon S3, Apache Camel, Apache Flink, Apache Kafka, AWS, AWS Glue, AWS Lambda, Docker, Docker Compose, Kafka Connect, OpenSearch, Pyflink, Python

This series updates a real-time analytics app from an AWS workshop that was originally built on Amazon Kinesis. In the updated version, data is ingested from multiple sources into a Kafka cluster instead, and Flink (Pyflink) apps are used extensively for data ingestion and processing. As an introduction, this post compares the original architecture with the new one; the app will be implemented in subsequent posts.

September 14, 2023 | 17 min read | Data Streaming | Kafka, Flink and DynamoDB for Real Time Fraud Detection | Amazon DynamoDB, Amazon Managed Flink, Amazon Managed Service for Apache Flink, Amazon MSK, Amazon MSK Connect, Apache Flink, Apache Kafka, Fraud Detection, Kafka Connect, Pyflink, Python

This series aims to help those who are new to Apache Flink and Amazon Managed Service for Apache Flink by re-implementing a simple fraud detection application that is discussed in an AWS workshop titled AWS Kafka and DynamoDB for real time fraud detection. In part 1, I demonstrated how to develop the application locally, and the app will be deployed via Amazon Managed Service for Apache Flink in this post.

September 4, 2023 | 13 min read | Data Streaming | Getting Started With Pyflink on AWS | Amazon Managed Flink, Amazon Managed Service for Apache Flink, Amazon MSK, Apache Flink, Apache Kafka, Docker, Docker Compose, Pyflink, Python

In this series of posts, we discuss a Flink (Pyflink) application that reads/writes from/to Kafka topics. In the previous posts, I demonstrated a Pyflink app that targets a local Kafka cluster as well as a Kafka cluster on Amazon MSK. The app was executed in a virtual environment as well as in a local Flink cluster for improved monitoring. In this post, the app will be deployed via Amazon Managed Service for Apache Flink.

August 28, 2023 | 20 min read | Data Streaming | Getting Started With Pyflink on AWS | Amazon Managed Flink, Amazon Managed Service for Apache Flink, Amazon MSK, Apache Flink, Apache Kafka, Docker, Docker Compose, Pyflink, Python

In this series of posts, we discuss a Flink (Pyflink) application that reads/writes from/to Kafka topics. In part 1, an app that targets a local Kafka cluster was created. In this post, we will update the app by connecting it to a Kafka cluster on Amazon MSK. The Kafka cluster is authenticated by IAM, so the app has an additional jar dependency. As Amazon Managed Service for Apache Flink does not allow you to specify multiple pipeline jar files, we have to build a custom Uber Jar that combines multiple jar files. As in part 1, the app will be executed in a virtual environment as well as in a local Flink cluster for improved monitoring, this time with the updated pipeline jar file.
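As a rough illustration of what IAM authentication adds to the app, the sketch below builds a Flink SQL `CREATE TABLE` statement for a Kafka sink table with the client properties documented by the `aws-msk-iam-auth` library. The summary above does not show the actual configuration, so the helper names (`msk_iam_properties`, `kafka_table_ddl`), the table, topic, columns, and broker address are all hypothetical; only the Kafka connector option names and the IAM property names are taken from the Flink and MSK documentation. The code only builds a string and does not require Pyflink to run.

```python
def msk_iam_properties():
    """Kafka client properties for MSK IAM auth (names from aws-msk-iam-auth)."""
    return {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
    }


def kafka_table_ddl(table, topic, bootstrap_servers, columns, client_props):
    """Render a Flink SQL DDL string for a Kafka-backed table (hypothetical helper).

    Kafka client settings are passed through the connector's 'properties.*' options.
    """
    options = {
        "connector": "kafka",
        "topic": topic,
        "properties.bootstrap.servers": bootstrap_servers,
        "format": "json",
    }
    # Prefix every Kafka client property so the Flink Kafka connector forwards it.
    options.update({f"properties.{key}": value for key, value in client_props.items()})

    column_lines = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    option_lines = ",\n  ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {table} (\n  {column_lines}\n) WITH (\n  {option_lines}\n)"


# Hypothetical table, topic, and broker address, used only for illustration.
ddl = kafka_table_ddl(
    table="sink_table",
    topic="example-topic",
    bootstrap_servers="b-1.example.kafka.amazonaws.com:9098",
    columns=[("id", "STRING"), ("value", "DOUBLE")],
    client_props=msk_iam_properties(),
)
print(ddl)
```

In a Pyflink app, a string like this would typically be passed to the table environment's `execute_sql` method, and the classes named in the properties are the reason the IAM authentication library has to be bundled into the pipeline jar.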

August 17, 2023 | 16 min read | Data Streaming | Getting Started With Pyflink on AWS | Amazon Managed Flink, Amazon Managed Service for Apache Flink, Amazon MSK, Apache Flink, Apache Kafka, Docker, Docker Compose, Pyflink, Python

Apache Flink is widely used for building real-time stream processing applications. On AWS, Amazon Managed Service for Apache Flink is the easiest option for developing a Flink app as it provides the underlying infrastructure. Updating a guide from AWS, this series of posts discusses how to develop and deploy a Flink (Pyflink) application via Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics, KDA), where the data source and sink are Kafka topics. In part 1, the app will be developed locally, targeting a Kafka cluster created by Docker. Furthermore, it will be executed in a virtual environment as well as in a local Flink cluster for improved monitoring.

August 10, 2023 | 16 min read | Data Streaming | Kafka, Flink and DynamoDB for Real Time Fraud Detection | Amazon DynamoDB, Apache Flink, Apache Kafka, Docker, Docker Compose, Fraud Detection, Kafka Connect, Pyflink, Python

Apache Flink is widely used for building real-time stream processing applications. On AWS, Amazon Managed Service for Apache Flink is the easiest option for developing a Flink app as it provides the underlying infrastructure. Re-implementing a solution from an AWS workshop, this series of posts discusses how to develop and deploy a fraud detection app using Kafka, Flink and DynamoDB. Part 1 covers local development using Docker, while deployment via Amazon Managed Service for Apache Flink will be discussed in part 2.

July 3, 2023 | 14 min read | Data Streaming | Kafka Connect for AWS Services Integration | Amazon DynamoDB, Amazon MSK, Amazon MSK Connect, Apache Camel, Apache Kafka, AWS, Kafka Connect

As part of investigating how to utilize Kafka Connect effectively for AWS services integration, I demonstrated how to develop the Camel DynamoDB sink connector using Docker in Part 2. Fake order data was generated using the MSK Data Generator source connector, and the sink connector was configured to consume the topic messages to ingest them into a DynamoDB table. In this post, I will illustrate how to deploy the data ingestion applications using Amazon MSK and MSK Connect.

June 4, 2023 | 13 min read | Data Streaming | Kafka Connect for AWS Services Integration | Amazon DynamoDB, Apache Camel, Apache Kafka, AWS, Docker, Docker Compose, Kafka Connect

The suite of Apache Camel Kafka connectors and the Kinesis Kafka connector from AWS Labs can be effective for building data ingestion pipelines that integrate AWS services. In this post, I will illustrate how to develop the Camel DynamoDB sink connector using Docker. Fake order data will be generated using the MSK Data Generator source connector, and the sink connector will be configured to consume the topic messages and ingest them into a DynamoDB table.

May 3, 2023 | 4 min read | Data Streaming | Kafka Connect for AWS Services Integration | Apache Kafka, AWS, Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It can be used to build real-time data pipelines on AWS effectively. In this post, I will introduce the Kafka connectors that are available, mainly for AWS services integration. Developing and deploying some of them will be covered in later posts.
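For readers who have not used Kafka Connect before, a connector is usually created by sending a JSON document with a `name` and a `config` map to the Connect worker's REST API (`POST /connectors`). The sketch below only builds such a document; the `connector_request` helper, the connector class `com.example.HypotheticalSinkConnector`, and the `order` topic are placeholders for illustration and do not refer to any of the connectors introduced in the post.

```python
import json


def connector_request(name, connector_class, topics, tasks_max=1, extra=None):
    """Build the JSON body accepted by the Kafka Connect REST API on POST /connectors."""
    config = {
        "connector.class": connector_class,
        "tasks.max": str(tasks_max),
        # Sink connectors take a comma-separated list of topics to consume.
        "topics": ",".join(topics),
    }
    # Connector-specific settings (hypothetical here) are merged in as strings.
    for key, value in (extra or {}).items():
        config[key] = str(value)
    return {"name": name, "config": config}


# Placeholder connector class and topic, used only for illustration.
payload = connector_request(
    name="order-sink",
    connector_class="com.example.HypotheticalSinkConnector",
    topics=["order"],
    tasks_max=2,
)
print(json.dumps(payload, indent=2))
```

The resulting document could then be posted to a Connect worker, which listens on port 8083 by default. Real connectors add their own required keys to `config`, so the connector's documentation remains the reference for what belongs there.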

April 12, 2023 | 26 min read | Data Streaming | Amazon MSK, Apache Kafka, AWS, AWS Glue Schema Registry, AWS Lambda, AWS Serverless Application Model, Docker, Docker Compose, Python, Terraform

Glue Schema Registry provides a centralized repository for managing and validating schemas for topic message data. Its features can be utilized by many AWS services when building data streaming applications. In this post, we will discuss how to integrate Python Kafka producer and consumer apps in AWS Lambda with the Glue Schema Registry.

Real Time Streaming With Kafka and Flink - Introduction

Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 2 Deployment via AWS Managed Flink

Getting Started With Pyflink on AWS - Part 3 AWS Managed Flink and MSK

Getting Started With Pyflink on AWS - Part 2 Local Flink and MSK

Getting Started With Pyflink on AWS - Part 1 Local Flink and Local Kafka

Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 1 Local Development

Kafka Connect for AWS Services Integration - Part 3 Deploy Camel DynamoDB Sink Connector

Kafka Connect for AWS Services Integration - Part 2 Develop Camel DynamoDB Sink Connector

Kafka Connect for AWS Services Integration - Part 1 Introduction

Integrate Glue Schema Registry With Your Python Kafka App