Amazon MSK Connect

Real Time Streaming With Kafka and Flink - Lab 5 Write Data to DynamoDB Using Kafka Connect

November 30, 20239 min read Data Streaming Real Time Streaming With Kafka and Flink Amazon DynamoDB Amazon MSK Amazon MSK Connect Apache Kafka AWS Docker Docker Compose Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. In this lab, we will discuss how to create a data pipeline that ingests data from a Kafka topic into a DynamoDB table using the Camel DynamoDB sink connector.

October 5, 20236 min read Data Streaming Real Time Streaming With Kafka and Flink Amazon Athena Amazon DyanmoDB Amazon MSK Amazon MSK Connect Amazon OpenSearch Service Amazon S3 Apache Camel Apache Flink Apache Kafka AWS AWS Glue AWS Lambda Docker Docker Compose Kafka Connect OpenSearch Pyflink Python

This series updates a real time analytics app based on Amazon Kinesis from an AWS workshop. Data is ingested from multiple sources into a Kafka cluster instead and Flink (Pyflink) apps are used extensively for data ingesting and processing. As an introduction, this post compares the original architecture with the new architecture, and the app will be implemented in subsequent posts.

September 14, 202317 min read Data Streaming Kafka, Flink and DynamoDB for Real Time Fraud Detection Amazon DynamoDB Amazon Managed Flink Amazon Managed Service for Apache Flink Amazon MSK Amazon MSK Connect Apache Flink Apache Kafka Fraud Detection Kafka Connect Pyflink Python

This series aims to help those who are new to Apache Flink and Amazon Managed Service for Apache Flink by re-implementing a simple fraud detection application that is discussed in an AWS workshop titled AWS Kafka and DynamoDB for real time fraud detection. In part 1, I demonstrated how to develop the application locally, and the app will be deployed via Amazon Managed Service for Apache Flink in this post.

July 3, 202314 min read Data Streaming Kafka Connect for AWS Services Integration Amazon DynamoDB Amazon MSK Amazon MSK Connect Apache Camel Apache Kafka AWS Kafka Connect

As part of investigating how to utilize Kafka Connect effectively for AWS services integration, I demonstrated how to develop the Camel DynamoDB sink connector using Docker in Part 2. Fake order data was generated using the MSK Data Generator source connector, and the sink connector was configured to consume the topic messages to ingest them into a DynamoDB table. In this post, I will illustrate how to deploy the data ingestion applications using Amazon MSK and MSK Connect.

April 3, 20227 min read Data Streaming Integrate Schema Registry With MSK Connect Amazon ECS Amazon MSK Amazon MSK Connect Apache Kafka AWS Docker Docker Compose Kafka Connect Terraform

We'll continue the discussion of a Change Data Capture (CDC) solution with a schema registry and its deployment to AWS. All major resources are deployed in private subnets and VPN is used to access them in order to improve developer experience. The Apicurio registry is used as the schema registry service and it is deployed as an ECS service. In order for the connectors to have access to the registry, the Confluent Avro Converter is packaged together with the connector sources. The post ends with illustrating how schema evolution is managed by the schema registry.

March 7, 202210 min read Data Streaming Integrate Schema Registry With MSK Connect Amazon MSK Amazon MSK Connect Apache Kafka AWS Docker Docker Compose Kafka Connect

We'll discuss a Change Data Capture (CDC) architecture with a schema registry. As a starting point, a local development environment is set up using Docker Compose. The Debezium and Confluent S3 connectors are deployed with the Confluent Avro converter and the Apicurio registry is used as the schema registry service. A quick example is shown to illustrate how schema evolution can be managed by the schema registry.

December 19, 202111 min read Data Engineering Data Lake Demo Using Change Data Capture Amazon EMR Amazon MSK Amazon MSK Connect Apache Hudi Apache Kafka Apache Spark AWS Change Data Capture Data Lake Docker Kafka Connect Terraform

Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. In this post, we'll build a Hudi DeltaStramer app on Amazon EMR and use the resulting Hudi table with Athena and Quicksight to build a dashboard.

December 12, 202117 min read Data Engineering Data Lake Demo Using Change Data Capture Amazon EMR Amazon MSK Amazon MSK Connect Apache Hudi Apache Kafka Apache Spark AWS Change Data Capture Data Lake Docker Kafka Connect Terraform

Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. In this post, we'll build CDC with Amazon MSK and MSK Connect.

December 5, 202118 min read Data Engineering Data Lake Demo Using Change Data Capture Amazon EMR Amazon MSK Amazon MSK Connect Apache Hudi Apache Kafka Apache Spark AWS Change Data Capture Data Lake Docker Docker Compose Kafka Connect

Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. As a starting point, we’ll discuss the source database and CDC streaming infrastructure in the local environment.

Real Time Streaming With Kafka and Flink - Lab 5 Write Data to DynamoDB Using Kafka Connect

Real Time Streaming With Kafka and Flink - Introduction

Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 2 Deployment via AWS Managed Flink

Kafka Connect for AWS Services Integration - Part 3 Deploy Camel DynamoDB Sink Connector

Use External Schema Registry With MSK Connect – Part 2 MSK Deployment

Use External Schema Registry With MSK Connect – Part 1 Local Development

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 3 Implement Data Lake

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 2 Implement CDC

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 1 Local Development