We'll discuss a Change Data Capture (CDC) architecture with a schema registry. As a starting point, a local development environment is set up using Docker Compose. The Debezium and Confluent S3 connectors are deployed with the Confluent Avro converter, and the Apicurio registry is used as the schema registry service. A quick example illustrates how schema evolution can be managed by the schema registry.
Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. In this post, we'll build a Hudi DeltaStreamer app on Amazon EMR and use the resulting Hudi table with Athena and QuickSight to build a dashboard.
Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. In this post, we'll build CDC with Amazon MSK and MSK Connect.
Change data capture (CDC) on Amazon MSK and ingesting data using Apache Hudi on Amazon EMR can be used to build an efficient data lake solution. As a starting point, we’ll discuss the source database and CDC streaming infrastructure in the local environment.
AWS Glue 3.0 was released recently, but a Docker image for this version has not been published. In this post, I’ll illustrate how to create a development environment for AWS Glue 3.0 (and later versions) by building a custom Docker image.
In this post, I'll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Docker image and the Visual Studio Code Remote - Containers extension.
In this post, I'll demonstrate how AWS Lambda can be integrated with Apache Airflow using a custom operator inspired by the ECS Operator.
Traefik is a modern HTTP reverse proxy and load balancer. In this post, I'll demonstrate how to set up path-based routing with Traefik and Docker. Centralized authentication is also illustrated using Traefik's Forward Authentication feature.
In this post, I'll illustrate how to create a web service with the FastAPI framework that sends tasks to multiple workers. The workers are built with Celery and Rserve. Redis is used as a message broker/result backend for Celery and a key-value store for Rserve. Demos can be run with both Docker Compose and Kubernetes.
In this post, I'll demonstrate how to create a Linux development environment on Windows using WSL. An example app (an Rserve web service with a sidecar container) running on Minikube is also demonstrated.