<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Docker on Jaehyeon Kim</title><link>https://jaehyeon.me/tags/docker/</link><description>Recent content in Docker on Jaehyeon Kim</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Copyright © 2023-2026 Jaehyeon Kim. All Rights Reserved.</copyright><lastBuildDate>Wed, 19 Nov 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://jaehyeon.me/tags/docker/index.xml" rel="self" type="application/rss+xml"/><item><title>Guide to Building Integrated Web Applications with FastAPI and NiceGUI</title><link>https://jaehyeon.me/blog/2025-11-19-fastapi-nicegui-template/</link><pubDate>Wed, 19 Nov 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-11-19-fastapi-nicegui-template/</guid><description>&lt;p>The standard architecture for modern web applications involves a decoupled frontend, typically built with a JavaScript framework, and a backend API. This pattern is powerful but introduces complexity in managing two separate codebases, development environments, and the API contract between them.&lt;/p>
&lt;p>This article explores an alternative approach: an integrated architecture where the backend API and the frontend UI are served from a single, cohesive Python application.&lt;/p></description><enclosure url="https://jaehyeon.me/blog/2025-11-19-fastapi-nicegui-template/featured.gif" length="2611548" type="image/gif"/></item><item><title>Flink Table API - Declarative Analytics for Supplier Stats in Real Time</title><link>https://jaehyeon.me/blog/2025-06-17-kotlin-getting-started-flink-table/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-06-17-kotlin-getting-started-flink-table/</guid><description><![CDATA[<p>In the last post, we explored the fine-grained control of Flink&rsquo;s DataStream API. Now, we&rsquo;ll approach the same problem from a higher level of abstraction using the <strong>Flink Table API</strong>. This post demonstrates how to build a declarative analytics pipeline that processes our continuous stream of Avro-formatted order events. We will define a <code>Table</code> on top of a <code>DataStream</code> and use SQL-like expressions to perform windowed aggregations. 
This example highlights the power and simplicity of the Table API for analytical tasks and showcases Flink&rsquo;s seamless integration between its different API layers to handle complex requirements like late data.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-06-17-kotlin-getting-started-flink-table/featured.png" length="144113" type="image/png"/></item><item><title>Flink DataStream API - Scalable Event Processing for Supplier Stats</title><link>https://jaehyeon.me/blog/2025-06-10-kotlin-getting-started-flink-datastream/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-06-10-kotlin-getting-started-flink-datastream/</guid><description><![CDATA[<p>Building on our exploration of stream processing, we now transition from Kafka&rsquo;s native library to <strong>Apache Flink</strong>, a powerful, general-purpose distributed processing engine. In this post, we&rsquo;ll dive into Flink&rsquo;s foundational <strong>DataStream API</strong>. We will tackle the same supplier statistics problem - analyzing a stream of Avro-formatted order events - but this time using Flink&rsquo;s robust features for stateful computation. 
This example will highlight Flink&rsquo;s sophisticated event-time processing with watermarks and its elegant, built-in mechanisms for handling late-arriving data through side outputs.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-06-10-kotlin-getting-started-flink-datastream/featured.png" length="142918" type="image/png"/></item><item><title>Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</title><link>https://jaehyeon.me/blog/2025-06-03-kotlin-getting-started-kafka-streams/</link><pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-06-03-kotlin-getting-started-kafka-streams/</guid><description><![CDATA[<p>In this post, we shift our focus from basic Kafka clients to real-time stream processing with <strong>Kafka Streams</strong>. We&rsquo;ll explore a Kotlin application designed to analyze a continuous stream of Avro-formatted order events, calculate supplier statistics in tumbling windows, and intelligently handle late-arriving data. This example demonstrates the power of Kafka Streams for building lightweight, yet robust, stream processing applications directly within your Kafka ecosystem, leveraging event-time processing and custom logic.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-06-03-kotlin-getting-started-kafka-streams/featured.png" length="131804" type="image/png"/></item><item><title>Kafka Clients with Avro - Schema Registry and Order Events</title><link>https://jaehyeon.me/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/</guid><description><![CDATA[<p>In this post, we&rsquo;ll explore a practical example of building Kafka client applications using Kotlin, Apache Avro for data serialization, and Gradle for build management. 
We&rsquo;ll walk through the setup of a Kafka producer that generates mock order data and a consumer that processes these orders. This example highlights best practices such as schema management with Avro, robust error handling, and graceful shutdown, providing a solid foundation for your own Kafka-based projects. We&rsquo;ll dive into the build configuration, the Avro schema definition, utility functions for Kafka administration, and the core logic of both the producer and consumer applications.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/featured.png" length="73988" type="image/png"/></item><item><title>Kafka Clients with JSON - Producing and Consuming Order Events</title><link>https://jaehyeon.me/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/</link><pubDate>Tue, 20 May 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/</guid><description>&lt;p>This post explores a Kotlin-based Kafka project, meticulously detailing the construction and operation of both a Kafka producer application, responsible for generating and sending order data, and a Kafka consumer application, designed to receive and process these orders. 
We&amp;rsquo;ll delve into each component, from build configuration to message handling, to understand how they work together in an event-driven system.&lt;/p></description><enclosure url="https://jaehyeon.me/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/featured.png" length="97922" type="image/png"/></item><item><title>Run Flink SQL Cookbook in Docker</title><link>https://jaehyeon.me/blog/2025-04-15-sql-cookbook/</link><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-04-15-sql-cookbook/</guid><description><![CDATA[<p>The <a href="https://github.com/ververica/flink-sql-cookbook" target="_blank" rel="noopener noreferrer">Flink SQL Cookbook<i class="fas fa-external-link-square-alt ms-1"></i></a> by Ververica is a hands-on, example-rich guide to mastering <a href="https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/overview/" target="_blank" rel="noopener noreferrer">Apache Flink SQL<i class="fas fa-external-link-square-alt ms-1"></i></a> for real-time stream processing. It offers a wide range of self-contained recipes, from basic queries and table operations to more advanced use cases like windowed aggregations, complex joins, user-defined functions (UDFs), and pattern detection. These examples are designed to be run on the Ververica Platform, and as such, the cookbook doesn&rsquo;t include instructions for setting up a Flink cluster.</p>
<p>To help you run these recipes locally and explore Flink SQL without external dependencies, this post walks through setting up a fully functional local Flink cluster using Docker Compose. With this setup, you can experiment with the cookbook examples right on your machine.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-04-15-sql-cookbook/featured.gif" length="319243" type="image/gif"/></item><item><title>Realtime Dashboard with FastAPI, Streamlit and Next.js - Part 1 Data Producer</title><link>https://jaehyeon.me/blog/2025-02-18-realtime-dashboard-1/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2025-02-18-realtime-dashboard-1/</guid><description><![CDATA[<p>In this series, we develop real-time monitoring dashboard applications. A data generating app is created with Python, and it ingests the <a href="https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce" target="_blank" rel="noopener noreferrer">theLook eCommerce<i class="fas fa-external-link-square-alt ms-1"></i></a> data continuously into a PostgreSQL database. A WebSocket server, built by <a href="https://fastapi.tiangolo.com/" target="_blank" rel="noopener noreferrer">FastAPI<i class="fas fa-external-link-square-alt ms-1"></i></a>, periodically queries the data to serve its clients. The monitoring dashboards will be developed using <a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer">Streamlit<i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://nextjs.org/" target="_blank" rel="noopener noreferrer">Next.js<i class="fas fa-external-link-square-alt ms-1"></i></a>, with <a href="https://echarts.apache.org/en/index.html" target="_blank" rel="noopener noreferrer">Apache ECharts<i class="fas fa-external-link-square-alt ms-1"></i></a> for visualization. 
In this post, we walk through the data generation app and backend API, while the monitoring dashboards will be discussed in later posts.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2025-02-18-realtime-dashboard-1/featured.gif" length="1207440" type="image/gif"/></item><item><title>Deploy Python Stream Processing App on Kubernetes - Part 2 Beam Pipeline on Flink Runner</title><link>https://jaehyeon.me/blog/2024-06-06-beam-deploy-2/</link><pubDate>Thu, 06 Jun 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-06-06-beam-deploy-2/</guid><description><![CDATA[<p>In this post, we develop an <a href="https://beam.apache.org/" target="_blank" rel="noopener noreferrer">Apache Beam<i class="fas fa-external-link-square-alt ms-1"></i></a> pipeline using the <a href="https://beam.apache.org/documentation/sdks/python/" target="_blank" rel="noopener noreferrer">Python SDK<i class="fas fa-external-link-square-alt ms-1"></i></a> and deploy it on an <a href="https://flink.apache.org/" target="_blank" rel="noopener noreferrer">Apache Flink<i class="fas fa-external-link-square-alt ms-1"></i></a> cluster via the <a href="https://beam.apache.org/documentation/runners/flink/" target="_blank" rel="noopener noreferrer">Apache Flink Runner<i class="fas fa-external-link-square-alt ms-1"></i></a>. Same as <a href="/blog/2024-05-30-beam-deploy-1">Part I</a>, we deploy a Kafka cluster using the <a href="https://strimzi.io/" target="_blank" rel="noopener noreferrer">Strimzi Operator<i class="fas fa-external-link-square-alt ms-1"></i></a> on a <a href="https://minikube.sigs.k8s.io/docs/" target="_blank" rel="noopener noreferrer">minikube<i class="fas fa-external-link-square-alt ms-1"></i></a> cluster as the pipeline uses <a href="https://kafka.apache.org/" target="_blank" rel="noopener noreferrer">Apache Kafka<i class="fas fa-external-link-square-alt ms-1"></i></a> topics for its data source and sink. 
Then, we develop the pipeline as a Python package and add the package to a custom Docker image so that Python user code can be executed externally. For deployment, we create a Flink session cluster via the <a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/" target="_blank" rel="noopener noreferrer">Flink Kubernetes Operator<i class="fas fa-external-link-square-alt ms-1"></i></a>, and deploy the pipeline using a Kubernetes job. Finally, we check the output of the application by sending messages to the input Kafka topic using a Python producer application.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2024-06-06-beam-deploy-2/featured.png" length="58020" type="image/png"/></item><item><title>Deploy Python Stream Processing App on Kubernetes - Part 1 PyFlink Application</title><link>https://jaehyeon.me/blog/2024-05-30-beam-deploy-1/</link><pubDate>Thu, 30 May 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-05-30-beam-deploy-1/</guid><description><![CDATA[<p><a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/" target="_blank" rel="noopener noreferrer">Flink Kubernetes Operator<i class="fas fa-external-link-square-alt ms-1"></i></a> acts as a control plane to manage the complete deployment lifecycle of Apache Flink applications. With the operator, we can simplify deployment and management of Python stream processing applications. In this series, we discuss how to deploy a PyFlink application and Python Apache Beam pipeline on the <a href="https://beam.apache.org/documentation/runners/flink/" target="_blank" rel="noopener noreferrer">Flink Runner<i class="fas fa-external-link-square-alt ms-1"></i></a> on Kubernetes. 
In Part 1, we first deploy a Kafka cluster on a <a href="https://minikube.sigs.k8s.io/docs/" target="_blank" rel="noopener noreferrer">minikube<i class="fas fa-external-link-square-alt ms-1"></i></a> cluster as the source and sink of the PyFlink application are Kafka topics. Then, the application source is packaged in a custom Docker image and deployed on the minikube cluster using the Flink Kubernetes Operator. Finally, the output of the application is checked by sending messages to the input Kafka topic using a Python producer application.</p>]]></description><enclosure url="https://jaehyeon.me/blog/2024-05-30-beam-deploy-1/featured.png" length="64457" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 6 ETL on Amazon Athena via Airflow</title><link>https://jaehyeon.me/blog/2024-03-14-dbt-pizza-shop-6/</link><pubDate>Thu, 14 Mar 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-03-14-dbt-pizza-shop-6/</guid><description>In Part 5, we developed a dbt project that targets Apache Iceberg where transformations are performed on Amazon Athena. Two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. 
To improve query performance, the fact table is denormalized to pre-join records from the dimension tables using the array and struct data types.</description><enclosure url="https://jaehyeon.me/blog/2024-03-14-dbt-pizza-shop-6/featured.png" length="82921" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 5 Modelling on Amazon Athena</title><link>https://jaehyeon.me/blog/2024-03-07-dbt-pizza-shop-5/</link><pubDate>Thu, 07 Mar 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-03-07-dbt-pizza-shop-5/</guid><description>In Part 1 and Part 3, we developed data build tool (dbt) projects that target PostgreSQL and BigQuery using fictional pizza shop data. The data is modelled by SCD type 2 dimension tables and one transactional fact table. While the order records should be joined with dimension tables to get complete details for PostgreSQL, the fact table is denormalized using nested and repeated fields to improve query performance for BigQuery.</description><enclosure url="https://jaehyeon.me/blog/2024-03-07-dbt-pizza-shop-5/featured.png" length="61499" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 4 ETL on BigQuery via Airflow</title><link>https://jaehyeon.me/blog/2024-02-22-dbt-pizza-shop-4/</link><pubDate>Thu, 22 Feb 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-02-22-dbt-pizza-shop-4/</guid><description>In Part 3, we developed a dbt project that targets Google BigQuery with fictional pizza shop data. Two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. The fact table is denormalized using nested and repeated fields for improving query performance. 
In this post, we discuss how to set up an ETL process on the project using Apache Airflow.</description><enclosure url="https://jaehyeon.me/blog/2024-02-22-dbt-pizza-shop-4/featured.png" length="89588" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 3 Modelling on BigQuery</title><link>https://jaehyeon.me/blog/2024-02-08-dbt-pizza-shop-3/</link><pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-02-08-dbt-pizza-shop-3/</guid><description>In this series, we discuss practical examples of data warehouse and lakehouse development where data transformation is performed by the data build tool (dbt) and ETL is managed by Apache Airflow. In Part 1, we developed a dbt project on PostgreSQL using fictional pizza shop data. At the end, the data sets are modelled by two SCD type 2 dimension tables and one transactional fact table. In this post, we create a new dbt project that targets Google BigQuery.</description><enclosure url="https://jaehyeon.me/blog/2024-02-08-dbt-pizza-shop-3/featured.png" length="70297" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 2 ETL on PostgreSQL via Airflow</title><link>https://jaehyeon.me/blog/2024-01-25-dbt-pizza-shop-2/</link><pubDate>Thu, 25 Jan 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-01-25-dbt-pizza-shop-2/</guid><description>In this series of posts, we discuss data warehouse/lakehouse examples using data build tool (dbt) including ETL orchestration with Apache Airflow. In Part 1, we developed a dbt project on PostgreSQL with fictional pizza shop data. Two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. 
In this post, we discuss how to set up an ETL process on the project using Apache Airflow.</description><enclosure url="https://jaehyeon.me/blog/2024-01-25-dbt-pizza-shop-2/featured.png" length="77355" type="image/png"/></item><item><title>Data Build Tool (dbt) Pizza Shop Demo - Part 1 Modelling on PostgreSQL</title><link>https://jaehyeon.me/blog/2024-01-18-dbt-pizza-shop-1/</link><pubDate>Thu, 18 Jan 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-01-18-dbt-pizza-shop-1/</guid><description>The data build tool (dbt) is a popular data transformation tool for data warehouse development. Moreover, it can be used for data lakehouse development thanks to open table formats such as Apache Iceberg, Apache Hudi and Delta Lake. dbt supports key AWS analytics services and I wrote a series of posts that discuss how to utilise dbt with Redshift, Glue, EMR on EC2, EMR on EKS, and Athena. Those posts focus on platform integration; however, they do not show realistic ETL scenarios.</description><enclosure url="https://jaehyeon.me/blog/2024-01-18-dbt-pizza-shop-1/featured.png" length="85093" type="image/png"/></item><item><title>Kafka Development on Kubernetes - Part 3 Kafka Connect</title><link>https://jaehyeon.me/blog/2024-01-11-kafka-development-on-k8s-part-3/</link><pubDate>Thu, 11 Jan 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-01-11-kafka-development-on-k8s-part-3/</guid><description>Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. In this post, we discuss how to set up a data ingestion pipeline using Kafka connectors. Fake customer and order data is ingested into Kafka topics using the MSK Data Generator. 
Also, we use the Confluent S3 sink connector to save the messages of the topics into an S3 bucket.</description><enclosure url="https://jaehyeon.me/blog/2024-01-11-kafka-development-on-k8s-part-3/featured.png" length="97270" type="image/png"/></item><item><title>Kafka Development on Kubernetes - Part 2 Producer and Consumer</title><link>https://jaehyeon.me/blog/2024-01-04-kafka-development-on-k8s-part-2/</link><pubDate>Thu, 04 Jan 2024 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2024-01-04-kafka-development-on-k8s-part-2/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 Apache Kafka has five core APIs, and we can develop applications to send/read streams of data to/from topics in a Kafka cluster using the producer and consumer APIs. While the main Kafka project maintains only the Java APIs, there are several open source projects that provide the Kafka client APIs in Python.</description><enclosure url="https://jaehyeon.me/blog/2024-01-04-kafka-development-on-k8s-part-2/featured.png" length="75889" type="image/png"/></item><item><title>Kafka Development on Kubernetes - Part 1 Cluster Setup</title><link>https://jaehyeon.me/blog/2023-12-21-kafka-development-on-k8s-part-1/</link><pubDate>Thu, 21 Dec 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-12-21-kafka-development-on-k8s-part-1/</guid><description>Apache Kafka is one of the key technologies for implementing data streaming architectures. Strimzi provides a way to run an Apache Kafka cluster and related resources on Kubernetes in various deployment configurations. In this series of posts, we will discuss how to create a Kafka cluster, to develop Kafka client applications in Python and to build a data pipeline using Kafka connectors on Kubernetes.
Part 1 Cluster Setup (this post) Part 2 Producer and Consumer Part 3 Kafka Connect Setup Kafka Cluster The Kafka cluster is deployed using the Strimzi Operator on a Minikube cluster.</description><enclosure url="https://jaehyeon.me/blog/2023-12-21-kafka-development-on-k8s-part-1/featured.png" length="108975" type="image/png"/></item><item><title>Setup Local Development Environment for Apache Flink and Spark Using EMR Container Images</title><link>https://jaehyeon.me/blog/2023-12-07-flink-spark-local-dev/</link><pubDate>Thu, 07 Dec 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-12-07-flink-spark-local-dev/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 Apache Flink became generally available for Amazon EMR on EKS from the EMR 6.15.0 releases, and we are able to pull the Flink (as well as Spark) container images from the ECR Public Gallery.</description><enclosure url="https://jaehyeon.me/blog/2023-12-07-flink-spark-local-dev/featured.png" length="133053" type="image/png"/></item><item><title>Real Time Streaming with Kafka and Flink - Lab 2 Write data to Kafka from S3 using Flink</title><link>https://jaehyeon.me/blog/2023-11-09-real-time-streaming-with-kafka-and-flink-3/</link><pubDate>Thu, 09 Nov 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-11-09-real-time-streaming-with-kafka-and-flink-3/</guid><description>In this lab, we will create a Pyflink application that reads records from S3 and sends them into a Kafka topic. A custom pipeline Jar file will be created as the Kafka cluster is authenticated by IAM, and it will be demonstrated how to execute the app in a Flink cluster deployed on Docker as well as locally as a typical Python app. We can assume the S3 data is static metadata that needs to be joined into another stream, and this exercise can be useful for data enrichment.</description><enclosure url="https://jaehyeon.me/blog/2023-11-09-real-time-streaming-with-kafka-and-flink-3/featured.png" length="139114" type="image/png"/></item><item><title>Kafka Connect for AWS Services Integration - Part 4 Develop Aiven OpenSearch Sink Connector</title><link>https://jaehyeon.me/blog/2023-10-23-kafka-connect-for-aws-part-4/</link><pubDate>Mon, 23 Oct 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-10-23-kafka-connect-for-aws-part-4/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 OpenSearch is a popular search and analytics engine and its use cases cover log analytics, real-time application monitoring, and clickstream analysis. OpenSearch can be deployed on its own or via Amazon OpenSearch Service.</description><enclosure url="https://jaehyeon.me/blog/2023-10-23-kafka-connect-for-aws-part-4/featured.png" length="61820" type="image/png"/></item><item><title>Building Apache Flink Applications in Python</title><link>https://jaehyeon.me/blog/2023-10-19-build-pyflink-apps/</link><pubDate>Thu, 19 Oct 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-10-19-build-pyflink-apps/</guid><description>Building Apache Flink Applications in Java is a course to introduce Apache Flink through a series of hands-on exercises, and it is provided by Confluent. Utilising the Flink DataStream API, the course develops three Flink applications that populate multiple source data sets, collect them into a standardised data set, and aggregate it to produce usage statistics. As part of learning the Flink DataStream API in Pyflink, I converted the Java apps into Python equivalents while performing the course exercises in Pyflink.</description><enclosure url="https://jaehyeon.me/blog/2023-10-19-build-pyflink-apps/featured.png" length="154736" type="image/png"/></item><item><title>Getting Started with Pyflink on AWS - Part 3 AWS Managed Flink and MSK</title><link>https://jaehyeon.me/blog/2023-09-04-getting-started-with-pyflink-on-aws-part-3/</link><pubDate>Mon, 04 Sep 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-09-04-getting-started-with-pyflink-on-aws-part-3/</guid><description>In this series of posts, we discuss a Flink (Pyflink) application that reads/writes from/to Kafka topics. 
In the previous posts, I demonstrated a Pyflink app that targets a local Kafka cluster as well as a Kafka cluster on Amazon MSK. The app was executed in a virtual environment as well as in a local Flink cluster for improved monitoring. In this post, the app will be deployed via Amazon Managed Service for Apache Flink, which is the easiest option to run Flink applications on AWS.</description><enclosure url="https://jaehyeon.me/blog/2023-09-04-getting-started-with-pyflink-on-aws-part-3/featured.png" length="74618" type="image/png"/></item><item><title>Getting Started with Pyflink on AWS - Part 1 Local Flink and Local Kafka</title><link>https://jaehyeon.me/blog/2023-08-17-getting-started-with-pyflink-on-aws-part-1/</link><pubDate>Thu, 17 Aug 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-08-17-getting-started-with-pyflink-on-aws-part-1/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 Apache Flink is an open-source, unified stream-processing and batch-processing framework. Its core is a distributed streaming data-flow engine that you can use to run real-time stream processing on high-throughput data sources. Currently, it is widely used to build applications for fraud/anomaly detection, rule-based alerting, business process monitoring, and continuous ETL to name a few.</description><enclosure url="https://jaehyeon.me/blog/2023-08-17-getting-started-with-pyflink-on-aws-part-1/featured.png" length="55960" type="image/png"/></item><item><title>Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 1 Local Development</title><link>https://jaehyeon.me/blog/2023-08-10-fraud-detection-part-1/</link><pubDate>Thu, 10 Aug 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-08-10-fraud-detection-part-1/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 Apache Flink is an open-source, unified stream-processing and batch-processing framework. Its core is a distributed streaming data-flow engine that you can use to run real-time stream processing on high-throughput data sources. Currently, it is widely used to build applications for fraud/anomaly detection, rule-based alerting, business process monitoring, and continuous ETL to name a few.</description><enclosure url="https://jaehyeon.me/blog/2023-08-10-fraud-detection-part-1/featured.png" length="72929" type="image/png"/></item><item><title>Kafka Development with Docker - Part 11 Kafka Authorization</title><link>https://jaehyeon.me/blog/2023-07-20-kafka-development-with-docker-part-11/</link><pubDate>Thu, 20 Jul 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-07-20-kafka-development-with-docker-part-11/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In the previous posts, we discussed how to implement client authentication by TLS (SSL or TLS/SSL) and SASL authentication. One of the key benefits of client authentication is achieving user access control. Kafka ships with a pluggable, out-of-the-box authorization framework, which is configured with the authorizer.</description><enclosure url="https://jaehyeon.me/blog/2023-07-20-kafka-development-with-docker-part-11/featured.png" length="458848" type="image/png"/></item><item><title>Kafka Development with Docker - Part 10 SASL Authentication</title><link>https://jaehyeon.me/blog/2023-07-13-kafka-development-with-docker-part-10/</link><pubDate>Thu, 13 Jul 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-07-13-kafka-development-with-docker-part-10/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In the previous post, we discussed TLS (SSL or TLS/SSL) authentication to improve security. It enforces two-way verification where a client certificate is verified by Kafka brokers. Client authentication can also be enabled by Simple Authentication and Security Layer (SASL), and we will discuss how to implement SASL authentication with Java and Python client examples in this post.</description><enclosure url="https://jaehyeon.me/blog/2023-07-13-kafka-development-with-docker-part-10/featured.png" length="471947" type="image/png"/></item><item><title>Kafka Development with Docker - Part 9 SSL Authentication</title><link>https://jaehyeon.me/blog/2023-07-06-kafka-development-with-docker-part-9/</link><pubDate>Thu, 06 Jul 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-07-06-kafka-development-with-docker-part-9/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In the previous post, we discussed how to configure TLS (SSL or TLS/SSL) encryption with Java and Python client examples. SSL encryption is a one-way verification process where a server certificate is verified by a client via SSL Handshake.</description><enclosure url="https://jaehyeon.me/blog/2023-07-06-kafka-development-with-docker-part-9/featured.png" length="471471" type="image/png"/></item><item><title>Kafka Development with Docker - Part 8 SSL Encryption</title><link>https://jaehyeon.me/blog/2023-06-29-kafka-development-with-docker-part-8/</link><pubDate>Thu, 29 Jun 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-06-29-kafka-development-with-docker-part-8/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 By default, Apache Kafka communicates in PLAINTEXT, which means that all data is sent without being encrypted. To secure communication, we can configure Kafka clients and other components to use Transport Layer Security (TLS) encryption.</description><enclosure url="https://jaehyeon.me/blog/2023-06-29-kafka-development-with-docker-part-8/featured.png" length="469311" type="image/png"/></item><item><title>Kafka Development with Docker - Part 7 Producer and Consumer with Glue Schema Registry</title><link>https://jaehyeon.me/blog/2023-06-22-kafka-development-with-docker-part-7/</link><pubDate>Thu, 22 Jun 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-06-22-kafka-development-with-docker-part-7/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In Part 4, we developed Kafka producer and consumer applications using the kafka-python package. The Kafka messages are serialized as Json, but are not associated with a schema as there was not an integrated schema registry.</description><enclosure url="https://jaehyeon.me/blog/2023-06-22-kafka-development-with-docker-part-7/featured.png" length="57175" type="image/png"/></item><item><title>Kafka Development with Docker - Part 6 Kafka Connect with Glue Schema Registry</title><link>https://jaehyeon.me/blog/2023-06-15-kafka-development-with-docker-part-6/</link><pubDate>Thu, 15 Jun 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-06-15-kafka-development-with-docker-part-6/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In Part 3, we developed a data ingestion pipeline with fake online order data using Kafka Connect source and sink connectors. Schemas are not enabled on both of them as there was not an integrated schema registry.</description><enclosure url="https://jaehyeon.me/blog/2023-06-15-kafka-development-with-docker-part-6/featured.png" length="60354" type="image/png"/></item><item><title>Kafka Connect for AWS Services Integration - Part 2 Develop Camel DynamoDB Sink Connector</title><link>https://jaehyeon.me/blog/2023-06-04-kafka-connect-for-aws-part-2/</link><pubDate>Sun, 04 Jun 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-06-04-kafka-connect-for-aws-part-2/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In Part 1, we reviewed Kafka connectors focusing on AWS services integration. Among the available connectors, the suite of Apache Camel Kafka connectors and the Kinesis Kafka connector from the AWS Labs can be effective for building data ingestion pipelines on AWS.</description><enclosure url="https://jaehyeon.me/blog/2023-06-04-kafka-connect-for-aws-part-2/featured.png" length="87044" type="image/png"/></item><item><title>Kafka Development with Docker - Part 4 Producer and Consumer</title><link>https://jaehyeon.me/blog/2023-06-01-kafka-development-with-docker-part-4/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-06-01-kafka-development-with-docker-part-4/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 In the previous post, we discussed Kafka Connect to stream data to/from a Kafka cluster. Kafka also includes the Producer/Consumer APIs that allow client applications to send/read streams of data to/from topics in a Kafka cluster.</description><enclosure url="https://jaehyeon.me/blog/2023-06-01-kafka-development-with-docker-part-4/featured.png" length="75255" type="image/png"/></item><item><title>Kafka Development with Docker - Part 3 Kafka Connect</title><link>https://jaehyeon.me/blog/2023-05-25-kafka-development-with-docker-part-3/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-05-25-kafka-development-with-docker-part-3/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 According to the documentation of Apache Kafka, Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka.</description><enclosure url="https://jaehyeon.me/blog/2023-05-25-kafka-development-with-docker-part-3/featured.png" length="69998" type="image/png"/></item><item><title>Kafka Development with Docker - Part 2 Management App</title><link>https://jaehyeon.me/blog/2023-05-18-kafka-development-with-docker-part-2/</link><pubDate>Thu, 18 May 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-05-18-kafka-development-with-docker-part-2/</guid><description>In the previous post, I illustrated how to create a topic and to produce/consume messages using the command utilities provided by Apache Kafka. It is not convenient, however, for example, when you consume serialised messages where their schemas are stored in a schema registry. Also, the utilities don&amp;rsquo;t support to browse or manage related resources such as connectors and schemas. Therefore, a Kafka management app can be a good companion for development, which helps monitor and manage resources on an easy-to-use user interface.</description><enclosure url="https://jaehyeon.me/blog/2023-05-18-kafka-development-with-docker-part-2/featured.png" length="59675" type="image/png"/></item><item><title>Kafka Development with Docker - Part 1 Cluster Setup</title><link>https://jaehyeon.me/blog/2023-05-04-kafka-development-with-docker-part-1/</link><pubDate>Thu, 04 May 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-05-04-kafka-development-with-docker-part-1/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 I&amp;rsquo;m teaching myself modern data streaming architectures on AWS, and Apache Kafka is one of the key technologies, which can be used for messaging, activity tracking, stream processing and so on. While applications tend to be deployed to cloud, it can be much easier if we develop and test those with Docker and Docker Compose locally.</description><enclosure url="https://jaehyeon.me/blog/2023-05-04-kafka-development-with-docker-part-1/featured.png" length="98355" type="image/png"/></item><item><title>How to configure Kafka consumers to seek offsets by timestamp</title><link>https://jaehyeon.me/blog/2023-01-10-kafka-consumer-seek-offsets/</link><pubDate>Tue, 10 Jan 2023 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2023-01-10-kafka-consumer-seek-offsets/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 Normally we consume Kafka messages from the beginning/end of a topic or last committed offsets. For backfilling or troubleshooting, however, we need to consume messages from a certain timestamp occasionally. If we know which topic partition to choose e.</description><enclosure url="https://jaehyeon.me/blog/2023-01-10-kafka-consumer-seek-offsets/featured.png" length="47217" type="image/png"/></item><item><title>Revisit AWS Lambda Invoke Function Operator of Apache Airflow</title><link>https://jaehyeon.me/blog/2022-08-06-revisit-lambda-operator/</link><pubDate>Sat, 06 Aug 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-08-06-revisit-lambda-operator/</guid><description>Apache Airflow is a popular workflow management platform. A wide range of AWS services are integrated with the platform by Amazon AWS Operators. AWS Lambda is one of the integrated services, and it can be used to develop workflows efficiently. The current Lambda Operator, however, just invokes a Lambda function, and it can fail to report the invocation result of a function correctly and to record the exact error message from failure.</description><enclosure url="https://jaehyeon.me/blog/2022-08-06-revisit-lambda-operator/featured.png" length="24814" type="image/png"/></item><item><title>Develop and Test Apache Spark Apps for EMR Locally Using Docker</title><link>https://jaehyeon.me/blog/2022-05-08-emr-local-dev/</link><pubDate>Sun, 08 May 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-05-08-emr-local-dev/</guid><description>[UPDATE 2023-12-07]
I wrote a new post that simplifies the Spark configuration dramatically. Besides, the log configuration is based on Log4J2, which applies to newer Spark versions. Moreover, the container is configured to run the Spark History Server, and it allows us to debug and diagnose completed and running Spark applications. I recommend referring to the new post. [UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository.</description><enclosure url="https://jaehyeon.me/blog/2022-05-08-emr-local-dev/featured.png" length="25693" type="image/png"/></item><item><title>Use External Schema Registry with MSK Connect – Part 2 MSK Deployment</title><link>https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/</link><pubDate>Sun, 03 Apr 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/</guid><description>In the previous post, we discussed a Change Data Capture (CDC) solution with a schema registry. A local development environment is set up using Docker Compose. The Debezium and Confluent S3 connectors are deployed with the Confluent Avro converter and the Apicurio registry is used as the schema registry service. A quick example is shown to illustrate how schema evolution can be managed by the schema registry. In this post, we&amp;rsquo;ll build the solution on AWS using MSK, MSK Connect, Aurora PostgreSQL and ECS.</description><enclosure url="https://jaehyeon.me/blog/2022-04-03-schema-registry-part2/featured.png" length="59689" type="image/png"/></item><item><title>Use External Schema Registry with MSK Connect – Part 1 Local Development</title><link>https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/</link><pubDate>Mon, 07 Mar 2022 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/</guid><description>[UPDATE 2025-10-01]
Bitnami&amp;rsquo;s public Docker images have been moved to the Bitnami Legacy repository. To ensure continued access and compatibility, please update your Docker image references accordingly.
For example:
bitnami/kafka:2.8.1 → bitnamilegacy/kafka:2.8.1 bitnami/zookeeper:3.7.0 → bitnamilegacy/zookeeper:3.7.0 bitnami/python:3.9.0 → bitnamilegacy/python:3.9.0 When we discussed a Change Data Capture (CDC) solution in one of the earlier posts, we used the JSON converter that comes with Kafka Connect. We optionally enabled the key and value schemas and the topic messages include those schemas together with payload.</description><enclosure url="https://jaehyeon.me/blog/2022-03-07-schema-registry-part1/featured.png" length="59689" type="image/png"/></item><item><title>Local Development of AWS Glue 3.0 and Later</title><link>https://jaehyeon.me/blog/2021-11-14-glue-3-local-development/</link><pubDate>Sun, 14 Nov 2021 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2021-11-14-glue-3-local-development/</guid><description>In an earlier post, I demonstrated how to set up a local development environment for AWS Glue 1.0 and 2.0 using a docker image that is published by the AWS Glue team and the Visual Studio Code Remote – Containers extension. Recently AWS Glue 3.0 was released, but a docker image for this version is not published. In this post, I&amp;rsquo;ll illustrate how to create a development environment for AWS Glue 3.</description><enclosure url="https://jaehyeon.me/blog/2021-11-14-glue-3-local-development/featured.png" length="30923" type="image/png"/></item><item><title>AWS Glue Local Development with Docker and Visual Studio Code</title><link>https://jaehyeon.me/blog/2021-08-20-glue-local-development/</link><pubDate>Fri, 20 Aug 2021 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2021-08-20-glue-local-development/</guid><description>As described in the product page, AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. For development, a development endpoint is recommended, but it can be costly, inconvenient or unavailable (for Glue 2.0). 
The AWS Glue team published a Docker image that includes the AWS Glue binaries and all the dependencies packaged together. After inspecting it, I found that some modifications are necessary in order to build a development environment on it.</description><enclosure url="https://jaehyeon.me/blog/2021-08-20-glue-local-development/featured.png" length="19535" type="image/png"/></item><item><title>Thoughts on Apache Airflow AWS Lambda Operator</title><link>https://jaehyeon.me/blog/2020-04-13-airflow-lambda-operator/</link><pubDate>Mon, 13 Apr 2020 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2020-04-13-airflow-lambda-operator/</guid><description>Apache Airflow is a popular open-source workflow management platform. Typically, tasks are run remotely by Celery workers for scalability. In AWS, however, scalability can also be achieved more simply using serverless computing services. For example, the ECS Operator allows dockerized tasks to be run and, with the Fargate launch type, they can run in a serverless environment.
The ECS Operator alone is not sufficient because it can take up to several minutes to pull a Docker image and to set up a network interface (in the case of the Fargate launch type).</description><enclosure url="https://jaehyeon.me/blog/2020-04-13-airflow-lambda-operator/featured.png" length="44994" type="image/png"/></item><item><title>Dynamic Routing and Centralized Auth with Traefik, Python and R Example</title><link>https://jaehyeon.me/blog/2019-11-29-traefik-example/</link><pubDate>Fri, 29 Nov 2019 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2019-11-29-traefik-example/</guid><description>Ingress in Kubernetes exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. By setting rules, it routes requests to appropriate services (more precisely, requests are sent to individual Pods by the Ingress Controller). Rules can be set up dynamically, and I find this more efficient than a traditional reverse proxy.
Traefik is a modern HTTP reverse proxy and load balancer, and it can be used as a Kubernetes Ingress Controller.</description><enclosure url="https://jaehyeon.me/blog/2019-11-29-traefik-example/featured.png" length="139790" type="image/png"/></item><item><title>Linux Dev Environment on Windows</title><link>https://jaehyeon.me/blog/2019-11-01-linux-on-windows/</link><pubDate>Fri, 01 Nov 2019 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2019-11-01-linux-on-windows/</guid><description>I use Linux containers a lot for development. Having Windows computers at home and work, I used to use Linux VMs on VirtualBox or VMware Workstation. It&amp;rsquo;s not a bad option but it requires a lot of resources. Recently, after my home computer was updated, I was not able to start my hypervisor anymore. Also, I didn&amp;rsquo;t like its heavy resource consumption, so I began to look for a different development environment.</description><enclosure url="https://jaehyeon.me/blog/2019-11-01-linux-on-windows/featured.png" length="187978" type="image/png"/></item><item><title>AWS Local Development with LocalStack</title><link>https://jaehyeon.me/blog/2019-07-20-aws-localstack/</link><pubDate>Sat, 20 Jul 2019 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2019-07-20-aws-localstack/</guid><description>LocalStack provides an easy-to-use test/mocking framework for developing AWS applications. In this post, I&amp;rsquo;ll demonstrate how to utilize LocalStack for development using a web service.
Specifically, a simple web service built with Flask-RestPlus is used. It supports simple CRUD operations against a database table. SQS and Lambda are used for creating and updating a record. When a POST or PUT request is made, the service sends a message to an SQS queue and immediately returns a 204 response.</description><enclosure url="https://jaehyeon.me/blog/2019-07-20-aws-localstack/featured.png" length="164886" type="image/png"/></item><item><title>Cronicle Multi Server Setup</title><link>https://jaehyeon.me/blog/2019-07-19-cronicle-multi-server-setup/</link><pubDate>Fri, 19 Jul 2019 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2019-07-19-cronicle-multi-server-setup/</guid><description>According to the project GitHub repository,
Cronicle is a multi-server task scheduler and runner, with a web based front-end UI. It handles both scheduled, repeating and on-demand jobs, targeting any number of slave servers, with real-time stats and live log viewer.
By default, Cronicle is configured to launch a single master server, which controls task scheduling. For high availability, it is important that another server takes the role of master when the existing master server fails.</description><enclosure url="https://jaehyeon.me/blog/2019-07-19-cronicle-multi-server-setup/featured.png" length="64396" type="image/png"/></item><item><title>API Development with R Part II</title><link>https://jaehyeon.me/blog/2017-11-19-api-development-with-r-2/</link><pubDate>Sun, 19 Nov 2017 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2017-11-19-api-development-with-r-2/</guid><description>Part I discussed how to serve an R function with plumber, Rserve and rApache. In this post, the APIs are deployed in a Docker container and, after showing example requests, their performance is compared. The rocker/r-ver:3.4 image is used as the base image and each of the APIs is added to it. For simplicity, the APIs are served by Supervisor. For performance testing, Locust is used. The source of this post can be found in this GitHub repository.</description><enclosure url="https://jaehyeon.me/blog/2017-11-19-api-development-with-r-2/featured.png" length="367256" type="image/png"/></item><item><title>API Development with R Part I</title><link>https://jaehyeon.me/blog/2017-11-18-api-development-with-r-1/</link><pubDate>Sat, 18 Nov 2017 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2017-11-18-api-development-with-r-1/</guid><description>An API is an effective way of distributing analysis outputs to external clients. When it comes to API development with R, however, there are not many choices. Development would most likely be done with plumber, Rserve, rApache or OpenCPU if a client or bridge layer to R is not considered.
This is a two-part series on API development with R. This post discusses serving an R function with plumber, Rserve and rApache.</description><enclosure url="https://jaehyeon.me/blog/2017-11-18-api-development-with-r-1/featured.png" length="367256" type="image/png"/></item></channel></rss>