Jaehyeon Kim

Self-Service Data Platform via a Multi-Tenant SQL Gateway

July 17, 2025 · 15 min read · Big Data · Data Architecture · Data Engineering · Data Platform · Data Streaming · Apache Flink · Apache Kyuubi · Apache Ranger · Apache Spark · Data Governance · Data Lakehouse · Data Lineage · Hive Metastore · Marquez · Multi-Tenancy · OpenLineage · Self-Service Analytics · SQL Gateway · Trino

Providing direct access to big data engines like Spark and Flink often creates chaos. A gateway-centric architecture solves this by introducing a robust control plane. This article presents a detailed blueprint using Apache Kyuubi, a multi-tenant SQL gateway, to provision and manage on-demand Spark, Flink, and Trino engines. Learn how this model delivers true self-service analytics with centralized governance, finally resolving the conflict between user empowerment and platform stability.

Flink Table API - Declarative Analytics for Supplier Stats in Real Time

June 17, 2025 · 19 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka · Docker · Factor House Local · Kafka Streams · Kotlin · Kpow

In the last post, we explored the fine-grained control of Flink’s DataStream API. Now, we’ll approach the same problem from a higher level of abstraction using the Flink Table API. This post demonstrates how to build a declarative analytics pipeline that processes our continuous stream of Avro-formatted order events. We will define a Table on top of a DataStream and use SQL-like expressions to perform windowed aggregations. This example highlights the power and simplicity of the Table API for analytical tasks and showcases Flink’s seamless integration between its different API layers to handle complex requirements like late data.

Flink DataStream API - Scalable Event Processing for Supplier Stats

June 10, 2025 · 17 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka · Docker · Factor House Local · Kafka Streams · Kotlin · Kpow

Building on our exploration of stream processing, we now transition from Kafka’s native library to Apache Flink, a powerful, general-purpose distributed processing engine. In this post, we’ll dive into Flink’s foundational DataStream API. We will tackle the same supplier statistics problem - analyzing a stream of Avro-formatted order events - but this time using Flink’s robust features for stateful computation. This example will highlight Flink’s sophisticated event-time processing with watermarks and its elegant, built-in mechanisms for handling late-arriving data through side outputs.

Kafka Streams - Lightweight Real-Time Processing for Supplier Stats

June 3, 2025 · 18 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka · Docker · Factor House Local · Kafka Streams · Kotlin · Kpow

In this post, we shift our focus from basic Kafka clients to real-time stream processing with Kafka Streams. We’ll explore a Kotlin application designed to analyze a continuous stream of Avro-formatted order events, calculate supplier statistics in tumbling windows, and intelligently handle late-arriving data. This example demonstrates the power of Kafka Streams for building lightweight, yet robust, stream processing applications directly within your Kafka ecosystem, leveraging event-time processing and custom logic.

Kafka Clients With Avro - Schema Registry and Order Events

May 27, 2025 · 15 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka · Docker · Factor House Local · Kotlin · Kpow

In this post, we’ll explore a practical example of building Kafka client applications using Kotlin, Apache Avro for data serialization, and Gradle for build management. We’ll walk through the setup of a Kafka producer that generates mock order data and a consumer that processes these orders. This example highlights best practices such as schema management with Avro, robust error handling, and graceful shutdown, providing a solid foundation for your own Kafka-based projects. We’ll dive into the build configuration, the Avro schema definition, utility functions for Kafka administration, and the core logic of both the producer and consumer applications.

Kafka Clients With JSON - Producing and Consuming Order Events

May 20, 2025 · 14 min read · Data Streaming · Getting Started With Real-Time Streaming in Kotlin · Apache Kafka · Docker · Factor House Local · Kotlin · Kpow

This post explores a Kotlin-based Kafka project, detailing how to build and run a Kafka producer application that generates and sends order data and a Kafka consumer application that receives and processes those orders. We’ll walk through each component, from build configuration to message handling, to understand how they work together in an event-driven system.

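Meet the Streamhouse Trio - Paimon, Fluss, and Iceberg for Unified Data Architectures

May 6, 2025 · 6 min read · Big Data · Data Architecture · Data Engineering · Data Streaming · Apache Flink · Apache Iceberg · Apache Paimon · Fluss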

The world of data is converging. The traditional divide between batch processing for historical analytics and stream processing for real-time insights is becoming increasingly blurry. Businesses demand architectures that handle both seamlessly. Enter the “Streamhouse” - an evolution of the Lakehouse concept, designed with streaming as a first-class citizen.

Today, we’ll introduce three key open-source technologies shaping this space: Apache Paimon™, Fluss, and Apache Iceberg. While each has unique strengths, their true power lies in how they can be integrated to build robust, flexible, and performant data platforms.

Run Flink SQL Cookbook in Docker

April 15, 2025 · 5 min read · Data Streaming · Apache Flink · Docker · Docker Compose · Flink SQL · Flink SQL Client

The Flink SQL Cookbook by Ververica is a hands-on, example-rich guide to mastering Apache Flink SQL for real-time stream processing. It offers a wide range of self-contained recipes, from basic queries and table operations to more advanced use cases like windowed aggregations, complex joins, user-defined functions (UDFs), and pattern detection. These examples are designed to be run on the Ververica Platform, and as such, the cookbook doesn’t include instructions for setting up a Flink cluster.

To help you run these recipes locally and explore Flink SQL without external dependencies, this post walks through setting up a fully functional local Flink cluster using Docker Compose. With this setup, you can experiment with the cookbook examples right on your machine.

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 3 Next.js Dashboard

March 4, 2025 · 8 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Apache ECharts · Next.js · React · WebSocket

In this post, we build a real-time monitoring dashboard using Next.js, a React framework that supports server-side rendering, static site generation, and full-stack capabilities with built-in performance optimizations. Similar to the Streamlit app we developed in Part 2, this dashboard connects to the WebSocket server from Part 1 to continuously fetch and visualize key metrics such as order counts, sales data, and revenue by traffic source and country. With interactive bar charts and dynamic metrics, users can monitor sales trends and other critical business KPIs in real time.

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 2 Streamlit Dashboard

February 25, 2025 · 7 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Apache ECharts · Python · Streamlit · WebSocket

In this post, we develop a real-time monitoring dashboard using Streamlit, an open-source Python framework that allows data scientists and AI/ML engineers to create interactive data apps. The app connects to the WebSocket server we developed in Part 1 and continuously fetches data to visualize key metrics such as order counts, sales data, and revenue by traffic source and country. With interactive bar charts and dynamic metrics, users can monitor sales trends and other important business KPIs in real time.

Self-Service Data Platform via a Multi-Tenant SQL Gateway

Flink Table API - Declarative Analytics for Supplier Stats in Real Time

Flink DataStream API - Scalable Event Processing for Supplier Stats

Kafka Streams - Lightweight Real-Time Processing for Supplier Stats

Kafka Clients With Avro - Schema Registry and Order Events

Kafka Clients With JSON - Producing and Consuming Order Events

Meet the Streamhouse Trio - Paimon, Fluss, and Iceberg for Unified Data Architectures

Run Flink SQL Cookbook in Docker

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 3 Next.js Dashboard

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 2 Streamlit Dashboard