Docker

Kafka Development With Docker - Part 7 Producer and Consumer With Glue Schema Registry

June 22, 2023 12 min read Data Streaming Kafka Development With Docker Apache Kafka AWS AWS Glue Schema Registry Docker Kpow Python

In Part 4, we developed Kafka producer and consumer applications using the kafka-python package without integrating a schema registry. Later, in Part 5, we discussed the benefits of a schema registry when developing Kafka applications. In this post, I'll demonstrate how to enhance the existing applications by integrating AWS Glue Schema Registry.

June 15, 2023 12 min read Data Integration Data Streaming Kafka Development With Docker Apache Kafka AWS AWS Glue Schema Registry Docker Kafka Connect Kpow

In Part 3, we developed a data ingestion pipeline using Kafka Connect source and sink connectors without enabling schemas. Later, in Part 5, we discussed the benefits of a schema registry when developing Kafka applications. In this post, I'll demonstrate how to enhance the existing data ingestion pipeline by integrating AWS Glue Schema Registry.

June 4, 2023 13 min read Data Streaming Kafka Connect for AWS Services Integration Amazon DynamoDB Apache Camel Apache Kafka AWS Docker Kafka Connect

The suite of Apache Camel Kafka connectors and the Kinesis Kafka connector from AWS Labs can be effective for building data ingestion pipelines that integrate AWS services. In this post, I will illustrate how to develop the Camel DynamoDB sink connector using Docker. Fake order data will be generated using the MSK Data Generator source connector, and the sink connector will be configured to consume the topic messages and ingest them into a DynamoDB table.

June 1, 2023 7 min read Data Streaming Kafka Development With Docker Apache Kafka Docker Python

Kafka includes the Producer/Consumer APIs that allow client applications to send/read streams of data to/from topics in a Kafka cluster. While the main Kafka project maintains only the Java clients, there are several open source projects that provide the Kafka client APIs in Python. In this post, I'll demonstrate how to develop producer/consumer applications using the kafka-python package.

May 25, 2023 9 min read Data Integration Data Streaming Kafka Development With Docker Apache Kafka Docker Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. In this post, I will illustrate how to set up a data ingestion pipeline using Kafka connectors. Fake customer and order data will be ingested into the corresponding topics using the MSK Data Generator source connector. The topic messages will then be saved into an S3 bucket using the Confluent S3 sink connector.

May 18, 2023 8 min read Data Streaming Kafka Development With Docker Apache Kafka Docker Kafka-Ui Kpow

A Kafka management app can be a good companion for development, as it helps monitor and manage resources through an easy-to-use user interface. An app is more useful if it supports features that are desirable for Kafka development on AWS, namely IAM access control and integration with MSK Connect and Glue Schema Registry. In this post, I'll introduce several management apps that meet those requirements.

May 4, 2023 9 min read Data Streaming Kafka Development With Docker Apache Kafka Data Streaming Docker

Apache Kafka is one of the key technologies for modern data streaming architectures on AWS. Developing and testing Kafka-related applications can be easier using Docker and Docker Compose. In this series of posts, I will demonstrate reference implementations of those applications in Dockerized environments.

January 10, 2023 10 min read Data Streaming Apache Kafka Docker Python

We will discuss how to configure the Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned by subscription. Docker Compose is used for building a single-node Kafka cluster and running multiple consumer instances.

August 6, 2022 14 min read Data Engineering Apache Airflow AWS AWS Lambda Docker Python

We'll discuss the limitations of the Lambda invoke function operator of Apache Airflow and create a custom Lambda operator. The custom operator extends the existing one, reports the invocation result of a function correctly, and records the exact error message on failure.

May 8, 2022 17 min read Data Engineering Amazon EMR Apache Spark AWS Docker PySpark

We'll discuss how to create a Spark local dev environment for EMR using Docker and/or VSCode. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.

Kafka Development With Docker - Part 7 Producer and Consumer With Glue Schema Registry

Kafka Development With Docker - Part 6 Kafka Connect With Glue Schema Registry

Kafka Connect for AWS Services Integration - Part 2 Develop Camel DynamoDB Sink Connector

Kafka Development With Docker - Part 4 Producer and Consumer

Kafka Development With Docker - Part 3 Kafka Connect

Kafka Development With Docker - Part 2 Management App

Kafka Development With Docker - Part 1 Cluster Setup

How to Configure Kafka Consumers to Seek Offsets by Timestamp

Revisit AWS Lambda Invoke Function Operator of Apache Airflow

Develop and Test Apache Spark Apps for EMR Locally Using Docker