Jaehyeon Kim Page 2

Flink DataStream API - Scalable Event Processing for Supplier Stats

June 10, 2025 · 17 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka, Docker, Factor House Local, Kafka Streams, Kotlin, Kpow

Building on our exploration of stream processing, we now transition from Kafka’s native library to Apache Flink, a powerful, general-purpose distributed processing engine. In this post, we’ll dive into Flink’s foundational DataStream API. We will tackle the same supplier statistics problem - analyzing a stream of Avro-formatted order events - but this time using Flink’s robust features for stateful computation. This example will highlight Flink’s sophisticated event-time processing with watermarks and its elegant, built-in mechanisms for handling late-arriving data through side outputs.

June 3, 2025 · 18 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka, Docker, Factor House Local, Kafka Streams, Kotlin, Kpow

In this post, we shift our focus from basic Kafka clients to real-time stream processing with Kafka Streams. We’ll explore a Kotlin application designed to analyze a continuous stream of Avro-formatted order events, calculate supplier statistics in tumbling windows, and intelligently handle late-arriving data. This example demonstrates the power of Kafka Streams for building lightweight, yet robust, stream processing applications directly within your Kafka ecosystem, leveraging event-time processing and custom logic.

May 27, 2025 · 15 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka, Docker, Factor House Local, Kotlin, Kpow

In this post, we’ll explore a practical example of building Kafka client applications using Kotlin, Apache Avro for data serialization, and Gradle for build management. We’ll walk through the setup of a Kafka producer that generates mock order data and a consumer that processes these orders. This example highlights best practices such as schema management with Avro, robust error handling, and graceful shutdown, providing a solid foundation for your own Kafka-based projects. We’ll dive into the build configuration, the Avro schema definition, utility functions for Kafka administration, and the core logic of both the producer and consumer applications.

May 20, 2025 · 14 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka, Docker, Factor House Local, Kotlin, Kpow

This post explores a Kotlin-based Kafka project, meticulously detailing the construction and operation of both a Kafka producer application, responsible for generating and sending order data, and a Kafka consumer application, designed to receive and process these orders. We’ll delve into each component, from build configuration to message handling, to understand how they work together in an event-driven system.

May 6, 2025 · 6 min read · Big Data, Data Architecture, Data Engineering, Data Streaming · Apache Flink, Apache Iceberg, Apache Paimon, Fluss

The world of data is converging. The traditional divide between batch processing for historical analytics and stream processing for real-time insights is becoming increasingly blurry. Businesses demand architectures that handle both seamlessly. Enter the “Streamhouse” - an evolution of the Lakehouse concept, designed with streaming as a first-class citizen.

Today, we’ll introduce three key open-source technologies shaping this space: Apache Paimon™, Fluss, and Apache Iceberg. While each has unique strengths, their true power lies in how they can be integrated to build robust, flexible, and performant data platforms.

April 15, 2025 · 5 min read · Data Streaming · Apache Flink, Docker, Docker Compose, Flink SQL, Flink SQL Client

The Flink SQL Cookbook by Ververica is a hands-on, example-rich guide to mastering Apache Flink SQL for real-time stream processing. It offers a wide range of self-contained recipes, from basic queries and table operations to more advanced use cases like windowed aggregations, complex joins, user-defined functions (UDFs), and pattern detection. These examples are designed to be run on the Ververica Platform, and as such, the cookbook doesn’t include instructions for setting up a Flink cluster.

To help you run these recipes locally and explore Flink SQL without external dependencies, this post walks through setting up a fully functional local Flink cluster using Docker Compose. With this setup, you can experiment with the cookbook examples right on your machine.

March 4, 2025 · 8 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Apache ECharts, Next.js, React, WebSocket

In this post, we build a real-time monitoring dashboard using Next.js, a React framework that supports server-side rendering, static site generation, and full-stack capabilities with built-in performance optimizations. Similar to the Streamlit app we developed in Part 2, this dashboard connects to the WebSocket server from Part 1 to continuously fetch and visualize key metrics such as order counts, sales data, and revenue by traffic source and country. With interactive bar charts and dynamic metrics, users can monitor sales trends and other critical business KPIs in real time.

February 25, 2025 · 7 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Apache ECharts, Python, Streamlit, WebSocket

In this post, we develop a real-time monitoring dashboard using Streamlit, an open-source Python framework that allows data scientists and AI/ML engineers to create interactive data apps. The app connects to the WebSocket server we developed in Part 1 and continuously fetches data to visualize key metrics such as order counts, sales data, and revenue by traffic source and country. With interactive bar charts and dynamic metrics, users can monitor sales trends and other important business KPIs in real time.

February 18, 2025 · 10 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Docker, FastAPI, PostgreSQL, Python, WebSocket

In this series, we develop real-time monitoring dashboard applications. A data-generating app, written in Python, continuously ingests theLook eCommerce data into a PostgreSQL database. A WebSocket server, built with FastAPI, periodically queries the data and serves it to its clients. The monitoring dashboards will be developed using Streamlit and Next.js, with Apache ECharts for visualization. In this post, we walk through the data generation app and backend API, while the monitoring dashboards will be discussed in later posts.

December 19, 2024 · 12 min read · Data Streaming · Apache Beam Python Examples · Apache Beam, Apache Flink, Python, Splittable DoFn

In Part 9, we developed two Apache Beam pipelines using Splittable DoFn (SDF). One of them is a batch file reader, which lists the files in an input folder and then processes them in parallel. We can extend the I/O connector so that, instead of listing files once at the beginning, it scans the input folder periodically and processes new files as they are created. The techniques used in this post can also be applied to developing I/O connectors that target other unbounded (or streaming) data sources (e.g., Kafka) using the Python SDK.

Flink DataStream API - Scalable Event Processing for Supplier Stats

Kafka Streams - Lightweight Real-Time Processing for Supplier Stats

Kafka Clients With Avro - Schema Registry and Order Events

Kafka Clients With JSON - Producing and Consuming Order Events

Meet the Streamhouse Trio - Paimon, Fluss, and Iceberg for Unified Data Architectures

Run Flink SQL Cookbook in Docker

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 3 Next.js Dashboard

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 2 Streamlit Dashboard

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 1 Data Producer

Apache Beam Python Examples - Part 10 Develop Streaming File Reader Using Splittable DoFn