Apache Kafka is one of the key technologies for modern data streaming architectures on AWS. Developing and testing Kafka-related applications becomes easier with Docker and Docker Compose. In this series of posts, I will demonstrate reference implementations of such applications in Dockerized environments.
Glue Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and its features can be utilized by many AWS services when building data streaming applications. In this post, we will discuss how to integrate Python Kafka producer and consumer apps running on AWS Lambda with the Glue Schema Registry.
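As a rough illustration of the producer side, the sketch below shows a Lambda handler that sends Avro records with kafka-python. It assumes the third-party aws-glue-schema-registry package (import name aws_schema_registry), whose KafkaSerializer wraps values in the registry wire format; the registry, topic, broker names and event shape are placeholders.

```python
# Sketch only: assumes the third-party aws-glue-schema-registry package
# (import name: aws_schema_registry) and kafka-python are bundled with the Lambda.
import json
import boto3
from kafka import KafkaProducer
from aws_schema_registry import SchemaRegistryClient
from aws_schema_registry.avro import AvroSchema
from aws_schema_registry.adapter.kafka import KafkaSerializer

# Placeholder values - replace with real registry/topic/broker names.
REGISTRY_NAME = "customer"
TOPIC_NAME = "orders"
BOOTSTRAP_SERVERS = "b-1.mycluster.kafka.ap-southeast-2.amazonaws.com:9092"

SCHEMA = AvroSchema(json.dumps({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "quantity", "type": "int"},
    ],
}))

glue_client = boto3.client("glue")
registry_client = SchemaRegistryClient(glue_client, registry_name=REGISTRY_NAME)
serializer = KafkaSerializer(registry_client)

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=serializer,  # encodes (data, schema) pairs via the registry
)


def lambda_handler(event, context):
    # The event shape is illustrative; each value is a (data, schema) pair that
    # the serializer registers/validates against the Glue Schema Registry.
    for record in event.get("records", []):
        producer.send(TOPIC_NAME, value=(record, SCHEMA))
    producer.flush()
    return {"status": "ok"}
```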
We will discuss how to configure a Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned via subscription. Docker Compose is used to build a single-node Kafka cluster and to run multiple consumer instances.
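For reference, here is a minimal sketch of the idea with kafka-python: a rebalance listener seeks each newly assigned partition to the first offset at or after a target timestamp. The topic, group and broker names are placeholders.

```python
# Minimal sketch with kafka-python; topic/group/broker names are placeholders.
import datetime
from kafka import KafkaConsumer, ConsumerRebalanceListener


class SeekToTimestampListener(ConsumerRebalanceListener):
    """Seeks dynamically assigned partitions to offsets at/after a timestamp."""

    def __init__(self, consumer, timestamp_ms):
        self.consumer = consumer
        self.timestamp_ms = timestamp_ms

    def on_partitions_revoked(self, revoked):
        pass

    def on_partitions_assigned(self, assigned):
        # offsets_for_times returns an OffsetAndTimestamp (or None) per partition.
        offsets = self.consumer.offsets_for_times(
            {tp: self.timestamp_ms for tp in assigned}
        )
        for tp, offset_ts in offsets.items():
            if offset_ts is not None:
                self.consumer.seek(tp, offset_ts.offset)


one_hour_ago = datetime.datetime.now() - datetime.timedelta(hours=1)
timestamp_ms = int(one_hour_ago.timestamp() * 1000)

consumer = KafkaConsumer(bootstrap_servers="localhost:9093", group_id="orders-group")
consumer.subscribe(
    ["orders"], listener=SeekToTimestampListener(consumer, timestamp_ms)
)

for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```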
We'll discuss the limitations of Apache Airflow's Lambda invoke function operator and create a custom Lambda operator. The custom operator extends the existing one so that it reports the invocation result of a function correctly and records the exact error message on failure.
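A stripped-down sketch of such an operator is shown below. It calls Lambda directly via boto3 rather than going through the provider's hook, fails the task when the response contains a FunctionError, and surfaces the tail of the execution log; the operator name and arguments are illustrative only, not the provider's API.

```python
# Illustrative sketch of a custom operator; the class name and arguments are
# not from the Amazon provider package. It invokes Lambda with boto3 and fails
# the task when the response reports a FunctionError.
import base64
import json

import boto3
from airflow.exceptions import AirflowException
from airflow.models.baseoperator import BaseOperator


class LambdaInvokeOperator(BaseOperator):
    def __init__(self, *, function_name, payload=None, aws_region=None, **kwargs):
        super().__init__(**kwargs)
        self.function_name = function_name
        self.payload = payload or {}
        self.aws_region = aws_region

    def execute(self, context):
        client = boto3.client("lambda", region_name=self.aws_region)
        response = client.invoke(
            FunctionName=self.function_name,
            InvocationType="RequestResponse",
            LogType="Tail",  # returns the last 4 KB of the execution log
            Payload=json.dumps(self.payload),
        )
        log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8")
        self.log.info("Lambda execution log tail:\n%s", log_tail)

        payload = response["Payload"].read().decode("utf-8")
        if "FunctionError" in response:
            # The function raised an error; the payload holds the error details.
            raise AirflowException(f"Lambda function failed: {payload}")
        self.log.info("Lambda invocation succeeded: %s", payload)
        return payload
```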
We'll discuss how to implement data warehousing ETL using Iceberg for data storage/management and Spark for data processing. A PySpark ETL app is used for demonstration in an EMR local environment. Finally, the ETL results are queried with Athena for verification.
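As a rough sketch, a PySpark session for this kind of job can be configured with an Iceberg catalog backed by the Glue Data Catalog and an S3 warehouse. The catalog, database, table and bucket names below are placeholders, and the matching Iceberg Spark runtime and AWS bundle JARs are assumed to be on the classpath.

```python
# Sketch only: catalog/database/table/bucket names are placeholders, and the
# Iceberg Spark runtime + AWS bundle JARs are assumed to be available.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("iceberg-etl")
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-datalake-bucket/warehouse")
    .getOrCreate()
)

# Create an Iceberg table and append a small batch of records.
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS glue.demo.orders (
        order_id string, quantity int, created_at timestamp
    ) USING iceberg
    """
)
src = (
    spark.createDataFrame([("ord-1", 2), ("ord-2", 5)], ["order_id", "quantity"])
    .withColumn("created_at", F.current_timestamp())
)
src.writeTo("glue.demo.orders").append()
```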
We'll discuss how to create a local Spark development environment for EMR using Docker and/or VSCode. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.
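For the Glue Catalog part, a local SparkSession can be pointed at the Glue Data Catalog as its Hive metastore. This is a sketch only: it assumes the AWS Glue Data Catalog Hive client JARs (as shipped on EMR, or an equivalent build) are on the classpath and that AWS credentials/region are configured; the database and table names are placeholders.

```python
# Sketch only: assumes the AWS Glue Data Catalog Hive client JARs are on the
# classpath (as on EMR) and that AWS credentials/region are configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("local-emr-dev")
    .config(
        "spark.hadoop.hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

# Glue databases and tables now show up through the regular catalog APIs.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM mydb.mytable LIMIT 10").show()  # placeholder names
```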
We'll continue the discussion of a Change Data Capture (CDC) solution with a schema registry and its deployment to AWS. All major resources are deployed in private subnets, and a VPN is used to access them in order to improve the developer experience. The Apicurio registry is used as the schema registry service, and it is deployed as an ECS service. In order for the connectors to have access to the registry, the Confluent Avro converter is packaged together with the connector sources. The post ends by illustrating how schema evolution is managed by the schema registry.
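To give a sense of the schema-evolution side, compatibility rules can be managed through the registry's Confluent-compatible REST API. The sketch below assumes an Apicurio 2.x registry exposing the ccompat endpoint under /apis/ccompat/v6; the host and subject names are placeholders.

```python
# Sketch only: assumes an Apicurio 2.x registry exposing the Confluent-compatible
# API under /apis/ccompat/v6; host and subject names are placeholders.
import requests

REGISTRY_URL = "http://apicurio.internal:8080/apis/ccompat/v6"
SUBJECT = "orders-value"

# Enforce backward compatibility for new schema versions of the subject.
resp = requests.put(
    f"{REGISTRY_URL}/config/{SUBJECT}",
    json={"compatibility": "BACKWARD"},
)
resp.raise_for_status()

# List the registered versions to see how the schema has evolved.
versions = requests.get(f"{REGISTRY_URL}/subjects/{SUBJECT}/versions").json()
print(versions)
```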
We'll discuss a Change Data Capture (CDC) architecture with a schema registry. As a starting point, a local development environment is set up using Docker Compose. The Debezium and Confluent S3 connectors are deployed with the Confluent Avro converter, and the Apicurio registry is used as the schema registry service. A quick example is shown to illustrate how schema evolution can be managed by the schema registry.
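As a minimal illustration of the local setup, a Debezium source connector can be registered against the Kafka Connect REST API with the Confluent Avro converter pointing at the Apicurio registry's Confluent-compatible endpoint. Hostnames and credentials below are placeholders, and some keys (e.g. topic.prefix vs database.server.name) depend on the Debezium version in use.

```python
# Sketch only: hostnames/credentials are placeholders, and some config keys
# (e.g. topic.prefix vs database.server.name) depend on the Debezium version.
import requests

CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "orders-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "devuser",
        "database.password": "password",
        "database.dbname": "devdb",
        "topic.prefix": "cdc",
        # Serialize keys/values as Avro, with schemas stored in Apicurio via
        # its Confluent-compatible API.
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://apicurio:8080/apis/ccompat/v6",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://apicurio:8080/apis/ccompat/v6",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(resp.json())
```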
Change data capture (CDC) on Amazon MSK, combined with data ingestion using Apache Hudi on Amazon EMR, can be used to build an efficient data lake solution. As a starting point, we'll discuss the source database and CDC streaming infrastructure in the local environment.
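Downstream of the CDC stream, the ingestion side typically upserts change records into a Hudi table. The PySpark sketch below shows the common write options; it assumes the Hudi Spark bundle JAR is on the classpath, and the table, field and path names are placeholders.

```python
# Sketch only: assumes the Hudi Spark bundle JAR is on the classpath; table,
# field and path names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hudi-upsert")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Stand-in for change records consumed from the CDC topic.
cdc_df = spark.createDataFrame(
    [("ord-1", "2024-01-01", "2024-01-01 10:00:00", 2)],
    ["order_id", "order_date", "updated_at", "quantity"],
)

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "order_date",
}

(
    cdc_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")  # append performs upserts keyed on the record key field
    .save("s3://my-datalake-bucket/hudi/orders")
)
```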
This post demonstrates how AWS Lambda can be integrated with Apache Airflow using a custom operator inspired by the ECS Operator.
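For orientation, a DAG wiring up such a custom operator might look like the sketch below. LambdaInvokeOperator refers to the hypothetical operator sketched earlier in this list, assumed to live in a local module; the DAG id, module path, function name and payload are placeholders.

```python
# Sketch only: LambdaInvokeOperator is the hypothetical custom operator sketched
# above; the DAG id, module path, function name and payload are placeholders.
import pendulum
from airflow import DAG

from plugins.lambda_operator import LambdaInvokeOperator  # hypothetical module path

with DAG(
    dag_id="invoke_lambda_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule_interval=None,  # run on manual trigger only
    catchup=False,
) as dag:
    invoke_fn = LambdaInvokeOperator(
        task_id="invoke_fn",
        function_name="my-data-loader",
        payload={"run_date": "{{ ds }}"},
    )
```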