Docker

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 3 Implement Data Lake

December 19, 2021 · 11 min read · Data Engineering · Data Lake Demo Using Change Data Capture · Amazon EMR, Amazon MSK, Amazon MSK Connect, Apache Hudi, Apache Kafka, Apache Spark, AWS, Change Data Capture, Data Lake, Docker, Kafka Connect, Terraform

Change data capture (CDC) on Amazon MSK, combined with data ingestion using Apache Hudi on Amazon EMR, can be used to build an efficient data lake solution. In this post, we'll build a Hudi DeltaStreamer app on Amazon EMR and use the resulting Hudi table with Athena and QuickSight to build a dashboard.

December 12, 2021 · 17 min read · Data Engineering · Data Lake Demo Using Change Data Capture · Amazon EMR, Amazon MSK, Amazon MSK Connect, Apache Hudi, Apache Kafka, Apache Spark, AWS, Change Data Capture, Data Lake, Docker, Kafka Connect, Terraform

Change data capture (CDC) on Amazon MSK, combined with data ingestion using Apache Hudi on Amazon EMR, can be used to build an efficient data lake solution. In this post, we'll implement CDC with Amazon MSK and MSK Connect.

December 5, 2021 · 18 min read · Data Engineering · Data Lake Demo Using Change Data Capture · Amazon EMR, Amazon MSK, Amazon MSK Connect, Apache Hudi, Apache Kafka, Apache Spark, AWS, Change Data Capture, Data Lake, Docker, Docker Compose, Kafka Connect

Change data capture (CDC) on Amazon MSK, combined with data ingestion using Apache Hudi on Amazon EMR, can be used to build an efficient data lake solution. As a starting point, we’ll discuss the source database and the CDC streaming infrastructure in a local environment.

November 14, 2021 · 8 min read · Data Engineering · Apache Spark, AWS, AWS Glue, Docker, PySpark, Python, Visual Studio Code

AWS Glue 3.0 was released recently, but a Docker image for this version has not been published. In this post, I’ll illustrate how to create a development environment for AWS Glue 3.0 (and later versions) by building a custom Docker image.

August 20, 2021 · 9 min read · Data Engineering · Apache Spark, AWS, AWS Glue, Docker, PySpark, Python, Visual Studio Code

In this post, I'll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Docker image and the Visual Studio Code Remote - Containers extension.

April 13, 2020 · 9 min read · Data Engineering · Apache Airflow, AWS, AWS Lambda, Docker, Docker Compose, Python

This post demonstrates how AWS Lambda can be integrated with Apache Airflow using a custom operator inspired by the ECS Operator.

November 29, 2019 · 9 min read · Development · Docker, Docker Compose, FastAPI, Python, R, Rserve, Traefik

Traefik is a modern HTTP reverse proxy and load balancer. This post demonstrates how to set up path-based routing with Traefik and Docker. It also illustrates centralized authentication using Traefik's Forward Authentication feature.

November 15, 2019 · 8 min read · Development · Celery, Docker, Docker Compose, FastAPI, Kubernetes, Python, R, Redis, Rserve

In this post, I'll illustrate how to create a web service with the FastAPI framework that sends tasks to multiple workers. The workers are built with Celery and Rserve. Redis is used as the message broker and result backend for Celery and as a key-value store for Rserve. The demos can be run with both Docker Compose and Kubernetes.

November 1, 2019 · 12 min read · Development · Docker, Docker Compose, FastAPI, Kubernetes, Minikube, Python, R, Rserve, VSCode, WSL

In this post, I'll demonstrate how to create a Linux development environment on Windows using WSL. I'll also demonstrate an example app (an Rserve web service with a sidecar container) running on Minikube.

July 20, 2019 · 6 min read · Development · AWS, AWS Lambda, Docker, Docker Compose, Flask, Flask-RestPlus, LocalStack, Python, S3, SQS

LocalStack provides an easy-to-use test/mocking framework for developing AWS applications. In this post, I'll demonstrate how to use LocalStack for development with an example web service.

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 3 Implement Data Lake

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 2 Implement CDC

Data Lake Demo Using Change Data Capture (CDC) on AWS – Part 1 Local Development

Local Development of AWS Glue 3.0 and Later

AWS Glue Local Development With Docker and Visual Studio Code

Thoughts on Apache Airflow AWS Lambda Operator

Dynamic Routing and Centralized Auth With Traefik, Python and R Example

Distributed Task Queue With Python and R Example

Linux Dev Environment on Windows

AWS Local Development With LocalStack