<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Change Data Capture (CDC) on Jaehyeon Kim</title><link>https://jaehyeon.me/tags/change-data-capture-cdc/</link><description>Recent content in Change Data Capture (CDC) on Jaehyeon Kim</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Copyright © 2023-2026 Jaehyeon Kim. All Rights Reserved.</copyright><lastBuildDate>Thu, 07 Nov 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://jaehyeon.me/tags/change-data-capture-cdc/index.xml" rel="self" type="application/rss+xml"/><item><title>Change Data Capture (CDC) Local Development with PostgreSQL, Debezium Server and Pub/Sub Emulator</title><link>https://jaehyeon.me/blog/2024-11-07-cdc-local-dev/</link><pubDate>Thu, 07 Nov 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-11-07-cdc-local-dev/</guid><description><![CDATA[<p><em>Change data capture</em> (CDC) is a data integration pattern to track changes in a database so that actions can be taken using the changed data. <a href="https://debezium.io/" target="_blank" rel="noopener noreferrer"><em>Debezium</em><i class="fas fa-external-link-square-alt ms-1"></i></a> is probably the most popular open source platform for CDC. Originally providing Kafka source connectors, it also supports a ready-to-use application called <a href="https://debezium.io/documentation/reference/stable/operations/debezium-server.html" target="_blank" rel="noopener noreferrer">Debezium server<i class="fas fa-external-link-square-alt ms-1"></i></a>. The standalone application can be used to stream change events to other messaging infrastructure such as Google Cloud Pub/Sub, Amazon Kinesis and Apache Pulsar. In this post, we develop a CDC solution locally using Docker. The source of the <a href="https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce" target="_blank" rel="noopener noreferrer">theLook eCommerce<i class="fas fa-external-link-square-alt ms-1"></i></a> is modified to generate data continuously, and the data is inserted into multiple tables of a PostgreSQL database. Among those tables, two of them are tracked by the Debezium server, and it pushes row-level changes of those tables into Pub/Sub topics on the <a href="https://cloud.google.com/pubsub/docs/emulator" target="_blank" rel="noopener noreferrer">Pub/Sub emulator<i class="fas fa-external-link-square-alt ms-1"></i></a>. Finally, messages of the topics are read by a Python application.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2024-11-07-cdc-local-dev/featured.png" length="83605" type="image/png"/></item><item><title>Use External Schema Registry with MSK Connect – Part 2 MSK Deployment</title><link>https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/</link><pubDate>Sun, 03 Apr 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/</guid><description>In the previous post, we discussed a Change Data Capture (CDC) solution with a schema registry. A local development environment is set up using Docker Compose. The Debezium and Confluent S3 connectors are deployed with the Confluent Avro converter and the Apicurio registry is used as the schema registry service. A quick example is shown to illustrate how schema evolution can be managed by the schema registry. In this post, we&amp;rsquo;ll build the solution on AWS using MSK, MSK Connect, Aurora PostgreSQL and ECS.</description><enclosure url="https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/featured.png" length="59689" type="image/png"/></item><item><title>Use External Schema Registry with MSK Connect – Part 1 Local Development</title><link>https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/</link><pubDate>Mon, 07 Mar 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 When we discussed a Change Data Capture (CDC) solution in one of the earlier posts, we used the JSON converter that comes with Kafka Connect. We optionally enabled the key and value schemas and the topic messages include those schemas together with payload.</description><enclosure url="https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/featured.png" length="59689" type="image/png"/></item><item><title>Data Lake Demo using Change Data Capture (CDC) on AWS – Part 3 Implement Data Lake</title><link>https://jaehyeon.me/blog/2021-12-19-datalake-demo-part3/</link><pubDate>Sun, 19 Dec 2021 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2021-12-19-datalake-demo-part3/</guid><description>In the previous post, we created a VPC that has private and public subnets in 2 availability zones in order to build and deploy the data lake solution on AWS. NAT instances are created to forward outbound traffic to the internet and a VPN bastion host is set up to facilitate deployment. An Aurora PostgreSQL cluster is deployed to host the source database and a Python command line app is used to create the database.</description><enclosure url="https://jaehyeon.me/blog/2021-12-19-datalake-demo-part3/featured.png" length="164526" type="image/png"/></item><item><title>Data Lake Demo using Change Data Capture (CDC) on AWS – Part 2 Implement CDC</title><link>https://jaehyeon.me/blog/2021-12-12-datalake-demo-part2/</link><pubDate>Sun, 12 Dec 2021 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2021-12-12-datalake-demo-part2/</guid><description>In the previous post, we discussed a data lake solution where data ingestion is performed using change data capture (CDC) and the output files are upserted to an Apache Hudi table. Being registered to Glue Data Catalog, it can be used for ad-hoc queries and report/dashboard creation. The Northwind database is used as the source database and, following the transactional outbox pattern, order-related changes are _upserted _to an outbox table by triggers.</description><enclosure url="https://jaehyeon.me/blog/2021-12-12-datalake-demo-part2/featured.png" length="164526" type="image/png"/></item><item><title>Data Lake Demo using Change Data Capture (CDC) on AWS – Part 1 Local Development</title><link>https://jaehyeon.me/blog/2021-12-05-datalake-demo-part1/</link><pubDate>Sun, 05 Dec 2021 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2021-12-05-datalake-demo-part1/</guid><description>Change data capture (CDC) is a proven data integration pattern that has a wide range of applications. Among those, data replication to data lakes is a good use case in data engineering. Coupled with best-in-breed data lake formats such as Apache Hudi, we can build an efficient data replication solution. This is the first post of the data lake demo series. Over time, we&amp;rsquo;ll build a data lake that uses CDC.</description><enclosure url="https://jaehyeon.me/blog/2021-12-05-datalake-demo-part1/featured.png" length="164526" type="image/png"/></item></channel></rss>