Apache Kafka

Kafka Development on Kubernetes - Part 3 Kafka Connect

January 11, 2024 - 7 min read Apache Kafka Data Streaming Kafka Development on Kubernetes Apache Kafka Docker Kafka Connect Kubernetes Minikube Strimzi

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. In this post, we discuss how to set up a data ingestion pipeline using Kafka connectors. Fake customer and order data is ingested into Kafka topics using the MSK Data Generator. We also use the Confluent S3 sink connector to save the topic messages to an S3 bucket. The Kafka Connect servers and individual connectors are deployed using the custom resources of Strimzi on Kubernetes.

January 4, 2024 - 8 min read Apache Kafka Data Streaming Kafka Development on Kubernetes Apache Kafka Docker Kubernetes Minikube Python Strimzi

Apache Kafka has five core APIs, and we can develop applications to send/read streams of data to/from topics in a Kafka cluster using the producer and consumer APIs. While the main Kafka project maintains only the Java APIs, there are several open source projects that provide the Kafka client APIs in Python. In this post, we discuss how to develop Kafka client applications using the kafka-python package on Kubernetes.
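As a rough illustration of the producer side, the sketch below sends JSON-encoded records with kafka-python. It is not taken from the post: the `order_id` field and the helper names (`serialize`, `send_orders`, `make_producer`) are hypothetical, and the only library calls relied on are `KafkaProducer(bootstrap_servers=...)`, `send` and `flush`.

```python
import json


def serialize(value):
    """Encode a Python object as UTF-8 JSON bytes."""
    return json.dumps(value).encode("utf-8")


def send_orders(producer, topic, orders):
    """Send each order keyed by its (hypothetical) order_id, then flush."""
    for order in orders:
        producer.send(topic, key=serialize(order["order_id"]), value=serialize(order))
    # Block until all buffered records have been handed to the brokers.
    producer.flush()
    return len(orders)


def make_producer(bootstrap_servers):
    """Create a kafka-python producer for the given broker address(es)."""
    # Imported lazily so the helpers above can be used without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

With a reachable cluster, a call could look like `send_orders(make_producer("localhost:9092"), "orders", [{"order_id": 1}])`, where both the broker address and the topic name are placeholders. Keys and values are serialized by hand here rather than through the producer's serializer options, which keeps the helper usable with any object that exposes `send` and `flush`.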

December 21, 2023 - 7 min read Apache Kafka Data Streaming Kafka Development on Kubernetes Apache Kafka Docker Kubernetes Minikube Strimzi

Apache Kafka is one of the key technologies for implementing data streaming architectures. Strimzi provides a way to run an Apache Kafka cluster and related resources on Kubernetes in various deployment configurations. In this series of posts, we will discuss how to create a Kafka cluster, develop Kafka client applications in Python, and build a data pipeline using Kafka connectors on Kubernetes.

December 14, 2023 - 6 min read Apache Kafka Data Streaming Real Time Streaming With Kafka and Flink Amazon MSK Apache Kafka AWS AWS Lambda Docker Docker Compose Python

Amazon MSK can be configured as an event source of a Lambda function. Lambda internally polls for new messages from the event source and then synchronously invokes the target Lambda function. With this feature, we can develop a Kafka consumer application in a serverless environment where developers can focus on application logic. In this lab, we will discuss how to create a Kafka consumer using a Lambda function.
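To make the shape of such a consumer concrete, here is a minimal sketch of a handler for an MSK event. It relies on the documented event layout, where `records` maps a topic-partition name to a list of records with base64-encoded keys and values. The assumption that message values are JSON, and the helper name `decode_records`, are illustrative and not taken from the lab.

```python
import base64
import json


def decode_records(event):
    """Flatten an MSK event into a list of decoded messages."""
    messages = []
    # "records" maps "<topic>-<partition>" to a list of Kafka records.
    for records in event.get("records", {}).values():
        for record in records:
            key = record.get("key")
            messages.append(
                {
                    "topic": record["topic"],
                    "partition": record["partition"],
                    "offset": record["offset"],
                    # Keys and values arrive base64-encoded.
                    "key": base64.b64decode(key).decode("utf-8") if key else None,
                    # Assumption: message values are JSON documents.
                    "value": json.loads(base64.b64decode(record["value"])),
                }
            )
    return messages


def lambda_handler(event, context):
    """Entry point invoked by Lambda with a batch of polled records."""
    messages = decode_records(event)
    for message in messages:
        print(message)
    return {"processed": len(messages)}
```

Keeping the decoding in a separate function means it can be exercised locally with a hand-built event before the function is attached to a cluster.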

November 30, 2023 - 9 min read Apache Kafka Data Streaming Real Time Streaming With Kafka and Flink Amazon DynamoDB Amazon MSK Amazon MSK Connect Apache Kafka AWS Docker Docker Compose Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. In this lab, we will discuss how to create a data pipeline that ingests data from a Kafka topic into a DynamoDB table using the Camel DynamoDB sink connector.

November 16, 2023 - 16 min read Apache Flink Apache Kafka Data Streaming Real Time Streaming With Kafka and Flink Amazon Athena Amazon MSK Amazon S3 Apache Flink Apache Kafka AWS Docker Docker Compose Pyflink Python

In this lab, we will create a Pyflink application that exports Kafka topic messages into an S3 bucket. The app enriches the records by adding a new column using a user defined function and writes them via the FileSystem SQL connector. This allows us to achieve a simpler architecture compared to the original lab, where the records are sent into Amazon Kinesis Data Firehose, enriched by a separate Lambda function and then written to an S3 bucket. While the records are being written to the S3 bucket, a Glue table will be created to query them on Amazon Athena.

November 9, 2023 - 15 min read Apache Flink Apache Kafka Data Streaming Real Time Streaming With Kafka and Flink Amazon MSK Apache Flink Apache Kafka AWS Docker Docker Compose Pyflink Python

In this lab, we will create a Pyflink application that reads records from S3 and sends them into a Kafka topic. Because the Kafka cluster is authenticated by IAM, we will build a custom pipeline Jar file, and we will demonstrate how to execute the app both in a Flink cluster deployed on Docker and locally as a typical Python app. We can assume the S3 data is static metadata that needs to be joined into another stream, so this exercise can be useful for data enrichment.

October 30, 2023 - 18 min read Apache Kafka Data Streaming Kafka Connect for AWS Services Integration Amazon MSK Amazon OpenSearch Service Apache Kafka AWS Docker Docker Compose Kafka Connect MSK Connect OpenSearch

In the previous post, we discussed how to develop a data pipeline from Apache Kafka into OpenSearch locally using Docker. In this post, the pipeline will be deployed on AWS with Amazon MSK, Amazon MSK Connect and Amazon OpenSearch Service, using Terraform. First, we will deploy the infrastructure, which covers a VPC, VPN server, MSK cluster and OpenSearch domain. Then Kafka source and sink connectors will be deployed on MSK Connect, followed by a quick data analysis.

October 26, 2023 - 14 min read Apache Kafka Data Streaming Real Time Streaming With Kafka and Flink Amazon MSK Apache Kafka AWS AWS Lambda Docker Docker Compose Python

In this lab, we will create a Kafka producer application using AWS Lambda, which sends fake taxi ride data into a Kafka topic on Amazon MSK. A configurable number of instances of the producer Lambda function will be invoked by an Amazon EventBridge schedule rule. In this way, we are able to generate test data concurrently based on the desired volume of messages.

October 23, 2023 - 12 min read Apache Kafka Data Streaming Kafka Connect for AWS Services Integration Apache Kafka AWS Docker Docker Compose Kafka Connect OpenSearch

Kafka Connect can be an effective tool to ingest data from Apache Kafka into OpenSearch. In this post, we will discuss how to develop a data pipeline from Apache Kafka into OpenSearch locally using Docker; the pipeline will be deployed on AWS in the next post. Fake impressions and clicks data will be pushed into Kafka topics using a Kafka source connector, and those records will be ingested into OpenSearch indexes using a sink connector for near-real-time analytics.

Kafka Development on Kubernetes - Part 3 Kafka Connect

Kafka Development on Kubernetes - Part 2 Producer and Consumer

Kafka Development on Kubernetes - Part 1 Cluster Setup

Real Time Streaming With Kafka and Flink - Lab 6 Consume Data From Kafka Using Lambda

Real Time Streaming With Kafka and Flink - Lab 5 Write Data to DynamoDB Using Kafka Connect

Real Time Streaming With Kafka and Flink - Lab 3 Transform and Write Data to S3 From Kafka Using Flink

Real Time Streaming With Kafka and Flink - Lab 2 Write Data to Kafka From S3 Using Flink

Kafka Connect for AWS Services Integration - Part 5 Deploy Aiven OpenSearch Sink Connector

Real Time Streaming With Kafka and Flink - Lab 1 Produce Data to Kafka Using Lambda

Kafka Connect for AWS Services Integration - Part 4 Develop Aiven OpenSearch Sink Connector