Apache Kafka

Kafka Development With Docker - Part 3 Kafka Connect

May 25, 2023 | 9 min read | Apache Kafka | Kafka Development With Docker | Apache Kafka, Docker, Docker Compose, Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. In this post, I will illustrate how to set up a data ingestion pipeline using Kafka connectors. Fake customer and order data will be ingested into the corresponding topics using the MSK Data Generator source connector. The topic messages will then be saved into an S3 bucket using the Confluent S3 sink connector.

May 18, 2023 | 8 min read | Apache Kafka | Kafka Development With Docker | Apache Kafka, Docker, Docker Compose

A Kafka management app can be a good companion for development, as it helps monitor and manage resources through an easy-to-use user interface. Such an app is more useful if it supports features that are desirable for Kafka development on AWS, namely IAM access control and integration with MSK Connect and Glue Schema Registry. In this post, I'll introduce several management apps that meet those requirements.

May 11, 2023 | 3 min read | General | Apache Kafka, Certification, Confluent

I recently obtained the Confluent Certified Developer for Apache Kafka (CCDAK) certification. It focuses on knowledge of developing applications that work with Kafka, and is targeted at developers and solutions architects. As it assumes Java APIs for development and testing, I am asked from time to time how I prepared for it as a non-Java developer. I thought it would be better to write a post summarising how I did it rather than answering each request individually.

May 4, 2023 | 9 min read | Apache Kafka | Kafka Development With Docker | Apache Kafka, Docker, Docker Compose

Apache Kafka is one of the key technologies for modern data streaming architectures on AWS. Developing and testing Kafka-related applications can be easier using Docker and Docker Compose. In this series of posts, I will demonstrate reference implementations of those applications in Dockerized environments.

May 3, 2023 | 4 min read | Data Streaming | Kafka Connect for AWS Services Integration | Apache Kafka, AWS, Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It can be used to build real-time data pipelines on AWS effectively. In this post, I will introduce the available Kafka connectors, mainly those for AWS services integration. Developing and deploying some of them will be covered in later posts.

April 12, 2023 | 26 min read | Data Streaming | Amazon MSK, Apache Kafka, AWS, AWS Glue Schema Registry, AWS Lambda, AWS Serverless Application Model, Docker, Docker Compose, Python, Terraform

Glue Schema Registry provides a centralized repository for managing and validating schemas for topic message data. Its features can be utilized by many AWS services when building data streaming applications. In this post, we will discuss how to integrate Python Kafka producer and consumer apps in AWS Lambda with the Glue Schema Registry.

March 14, 2023 | 12 min read | Data Streaming | Simplify Streaming Ingestion on AWS | Amazon Athena, Amazon EventBridge, Amazon MSK, Apache Kafka, AWS, AWS Lambda, AWS SAM, Python, Terraform

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In part 2, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Athena. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

February 8, 2023 | 18 min read | Data Streaming | Simplify Streaming Ingestion on AWS | Amazon EventBridge, Amazon MSK, Amazon Redshift, Apache Kafka, AWS, AWS Lambda, AWS SAM, Python, Terraform

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In part 1, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Redshift. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

January 10, 2023 | 9 min read | Data Streaming | Apache Kafka, Docker, Docker Compose, Python

We will discuss how to configure the Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned by subscription. Docker Compose is used for building a single-node Kafka cluster and running multiple consumer instances.

April 3, 2022 | 7 min read | Data Streaming | Integrate Schema Registry With MSK Connect | Amazon ECS, Amazon MSK, Amazon MSK Connect, Apache Kafka, AWS, Docker, Docker Compose, Kafka Connect, Terraform

We'll continue the discussion of a Change Data Capture (CDC) solution with a schema registry and its deployment to AWS. All major resources are deployed in private subnets, and a VPN is used to access them in order to improve the developer experience. The Apicurio registry is used as the schema registry service and is deployed as an ECS service. For the connectors to have access to the registry, the Confluent Avro Converter is packaged together with the connector sources. The post ends by illustrating how schema evolution is managed by the schema registry.

Kafka Development With Docker - Part 3 Kafka Connect

Kafka Development With Docker - Part 2 Management App

How I Prepared for Confluent Certified Developer for Apache Kafka as a Non-Java Developer

Kafka Development With Docker - Part 1 Cluster Setup

Kafka Connect for AWS Services Integration - Part 1 Introduction

Integrate Glue Schema Registry With Your Python Kafka App

Simplify Streaming Ingestion on AWS – Part 2 MSK and Athena

Simplify Streaming Ingestion on AWS – Part 1 MSK and Redshift

How to Configure Kafka Consumers to Seek Offsets by Timestamp

Use External Schema Registry With MSK Connect – Part 2 MSK Deployment