<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Jaehyeon Kim</title><link>https://jaehyeon.me/</link><description>Developer Experience at Factor House</description><language>en</language><item><title>Why Digital Twins Are Rewiring Industry 4.0</title><link>https://jaehyeon.me/blog/2026-04-23-digital-twin-industry-4-0/</link><guid>https://jaehyeon.me/blog/2026-04-23-digital-twin-industry-4-0/</guid><pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate><description>
Beyond CAD Models There is a project by Dassault Systèmes called the Living Heart that illustrates the trajectory of this technology. Instead of relying on standard 2D scans, surgeons can pull up a 3D model of a patient&amp;rsquo;s heart that simulates blood flow, mechanics, and electricity based on real patient data. It allows doctors to test therapeutic interventions before surgery begins.
This highlights what a modern digital twin looks like in practice.</description><content:encoded><![CDATA[
        
<h2 id="beyond-cad-models" data-numberify>Beyond CAD Models<a class="anchor ms-1" href="#beyond-cad-models"></a></h2>
<p>There is a project by Dassault Systèmes called the <a href="https://www.3ds.com/3dexperiencelab/portfolio/living-heart" target="_blank" rel="noopener noreferrer"><strong>Living Heart</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> that illustrates the trajectory of this technology. Instead of relying on standard 2D scans, surgeons can pull up a 3D model of a patient&rsquo;s heart that simulates blood flow, mechanics, and electricity based on real patient data. It allows doctors to test therapeutic interventions before surgery begins.</p>
<p>This highlights what a modern digital twin looks like in practice. It has moved past being a static 3D CAD file and serves as a persistent link between physical and digital environments, updating continuously based on operational data.</p>

<h2 id="industry-applications" data-numberify>Industry Applications<a class="anchor ms-1" href="#industry-applications"></a></h2>
<p>While manufacturing initially drove this technology, digital twins are now expanding across a wider spectrum of advanced applications.</p>
<ul>
<li><strong>Smart Manufacturing:</strong> On automated factory floors, robotic arms stream live telemetry to virtual models. This allows plant managers to predict maintenance needs and test routing scenarios without halting physical production lines.</li>
<li><strong>Supply Chain and Logistics:</strong> Global supply chains use digital twins to track everything from warehouse inventory levels to fleet movements, achieving end-to-end visibility. Operators can simulate supply chain shocks and dynamically resolve bottlenecks before they impact customers.</li>
<li><strong>Cybersecurity and Infrastructure:</strong> Network digital twins are increasingly used to replicate API gateways and load balancers. Subjecting a live production server to a simulated DDoS attack carries unacceptable downtime risks. Instead, security teams use virtual replicas as isolated environments to safely test rate-limiting and AI behavioral rules.</li>
<li><strong>Next-Generation Telecommunications:</strong> Modern 5G networks are incredibly complex. Telecom operators build digital twins of their networks to manage dynamic bandwidth allocation and network slicing, simulating massive traffic spikes across thousands of nodes before they cause cellular outages.</li>
</ul>

<h2 id="reality-gap-simulations-vs-digital-twins" data-numberify>Reality Gap: Simulations vs. Digital Twins<a class="anchor ms-1" href="#reality-gap-simulations-vs-digital-twins"></a></h2>
<p>The tech industry frequently uses the terms &ldquo;simulation&rdquo; and &ldquo;digital twin&rdquo; interchangeably. Clarifying the technical difference between the two is critical for evaluating modern control architectures. To understand this difference practically, let&rsquo;s look at a factory floor where &ldquo;Machine B&rdquo; suddenly breaks down and its capacity drops to zero.</p>
<p>Here is how three different architectural systems handle that exact event:</p>

<h3 id="traditional-simulation-sealed-box" data-numberify>Traditional Simulation (Sealed Box)<a class="anchor ms-1" href="#traditional-simulation-sealed-box"></a></h3>
<p>A <strong>simulation</strong> is a predictive model designed to explore &ldquo;what if&rdquo; scenarios within bounded parameters, relying heavily on static inputs and batch processing. Many traditional simulation environments, including standard <em>Discrete-Event Simulation (DES)</em> tools, were not designed for continuous live operational updates.</p>
<ul>
<li><strong>Scenario:</strong> On Monday morning, an engineer runs a DES model of the factory. On Tuesday, Machine B physically breaks. Because the traditional simulation operates as a sealed box, it is unaware of real-world changes once the mathematical execution begins. Unless an engineer manually stops the simulation, rewrites the parameters, and restarts it, the model will incorrectly continue assuming Machine B is running at full capacity.</li>
</ul>

<h3 id="operational-digital-twin-live-mirror" data-numberify>Operational Digital Twin (Live Mirror)<a class="anchor ms-1" href="#operational-digital-twin-live-mirror"></a></h3>
<p>At the advanced end of the spectrum, an <strong>operational digital twin</strong> maintains a continuous, two-way connection to reality. Instead of relying on static snapshots, it utilizes distributed computing, time-series databases, and IoT connectivity to assess the exact current operational state.</p>
<ul>
<li><strong>Scenario:</strong> The moment Machine B breaks, sensors ping the cloud database. The digital twin dashboard immediately turns red, updating the machine&rsquo;s live status to &ldquo;Offline&rdquo; and its current capacity to zero. However, it only tells you what is happening <em>right now</em>. It does not automatically recalculate the mathematical impact to simulate the future consequences of that breakdown.</li>
</ul>

<h3 id="hybrid-execution-dynamic-adaptation" data-numberify>Hybrid Execution (Dynamic Adaptation)<a class="anchor ms-1" href="#hybrid-execution-dynamic-adaptation"></a></h3>
<p>The <strong>hybrid approach</strong> bridges this gap by wiring the live state of the digital twin directly into a running simulation engine.</p>
<ul>
<li><strong>Scenario:</strong> The operational twin registers that Machine B is broken and emits a live Kafka event. A background hybrid simulation engine ingests that event <em>while it is still running</em>. It dynamically mutates Machine B&rsquo;s capacity parameter to zero and recalculates the logic on the fly. The system now automatically outputs simulated telemetry showing the immediate cascading consequences: &ldquo;Machine B&rsquo;s queue is backing up, routing logic must shift to Machine D, and a critical bottleneck will form at Machine C in exactly 45 minutes if no intervention is taken.&rdquo;</li>
</ul>
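<p>The hybrid scenario can be sketched with a toy, minute-by-minute model. This is a minimal illustration only: the <code>Machine</code> class, the <code>simulate</code> function, and the <code>events</code> mapping (a stand-in for live messages arriving mid-run) are hypothetical names and are not part of any package mentioned in this post.</p>

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    capacity: int   # parts processed per minute
    queue: int = 0  # parts waiting in front of the machine


def simulate(minutes, arrival_rate, machine, events):
    """Advance a one-machine line minute by minute.

    `events` maps a minute to a new capacity. It stands in for live
    messages that reach the engine while the run is in progress.
    """
    history = []
    for minute in range(minutes):
        if minute in events:
            # Mutate the parameter mid-run instead of restarting the model.
            machine.capacity = events[minute]
        machine.queue += arrival_rate
        processed = min(machine.queue, machine.capacity)
        machine.queue -= processed
        history.append(machine.queue)
    return history


if __name__ == "__main__":
    machine_b = Machine("B", capacity=5)
    # Machine B breaks at minute 3: its capacity drops to zero.
    print(simulate(6, 5, machine_b, {3: 0}))
```

<p>The queue stays empty until the breakdown and then grows by the arrival rate every minute, which is the kind of forward-looking consequence a status dashboard alone does not compute.</p>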

<h2 id="upgraded-simulation-fallacy" data-numberify>Upgraded Simulation Fallacy<a class="anchor ms-1" href="#upgraded-simulation-fallacy"></a></h2>
<p>Because of this market confusion, organizations often fall into the &ldquo;Upgraded Simulation Fallacy.&rdquo; They purchase simulation software, attach a 3D dashboard, and classify the project as a digital twin. However, if the system cannot process continuous data streams or synchronize with live states, it fundamentally remains a static simulation.</p>
<p>Transitioning to an operational digital twin introduces strict engineering realities:</p>
<ul>
<li><strong>Cost and Complexity:</strong> Building them requires heavy investments in sensors, edge computing, and complex system architectures.</li>
<li><strong>Cybersecurity Risks:</strong> Connecting physical infrastructure to cloud-based models expands the attack surface.</li>
<li><strong>Data Pipelines and Standardization:</strong> Getting disparate sensors to speak the same language and pipe into a central engine is an architectural challenge.</li>
</ul>
<p>This last point is where many projects stall. If you force real-time updates into a legacy simulation engine, the simulation clock often stalls while waiting for network requests, or the system fails on malformed or unexpected data. To build a functional digital twin, your simulation engine must handle network I/O asynchronously so the underlying math model runs uninterrupted, and it must validate incoming data against strict schemas (for example, with Pydantic or Avro) to maintain system stability.</p>
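<p>A rough sketch of both requirements, using only the Python standard library: an <code>asyncio.Queue</code> stands in for a streaming consumer, and a hand-written check on a dataclass stands in for a schema library such as Pydantic or Avro. The names <code>CapacityUpdate</code>, <code>ingest</code>, and <code>run_simulation</code> are assumptions made for illustration.</p>

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityUpdate:
    machine_id: str
    capacity: int

    @staticmethod
    def parse(raw):
        """Reject anything that does not match the expected shape."""
        machine_id = raw.get("machine_id")
        capacity = raw.get("capacity")
        if not isinstance(machine_id, str) or not isinstance(capacity, int) or capacity < 0:
            raise ValueError(f"invalid capacity update: {raw!r}")
        return CapacityUpdate(machine_id, capacity)


async def ingest(raw_messages, inbox):
    """Validate raw messages and hand the good ones to the simulation."""
    rejected = 0
    for raw in raw_messages:
        await asyncio.sleep(0)  # stands in for awaiting network I/O
        try:
            inbox.put_nowait(CapacityUpdate.parse(raw))
        except ValueError:
            rejected += 1
    return rejected


async def run_simulation(steps, capacities, inbox):
    """Step the model without ever blocking on the network."""
    for _ in range(steps):
        while not inbox.empty():  # apply only updates that have already arrived
            update = inbox.get_nowait()
            capacities[update.machine_id] = update.capacity
        await asyncio.sleep(0)  # yield so the ingest task can make progress
    return capacities


async def main():
    inbox = asyncio.Queue()
    raw = [{"machine_id": "B", "capacity": 0}, {"machine_id": "C", "capacity": "fast"}]
    return await asyncio.gather(
        ingest(raw, inbox),
        run_simulation(10, {"B": 5, "C": 5}, inbox),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

<p>The simulation loop never waits on the producer: it drains whatever is already in the queue and moves on, while the malformed message is counted and dropped before it can reach the model.</p>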

<h2 id="taking-control-with-a-hybrid-approach" data-numberify>Taking Control with a Hybrid Approach<a class="anchor ms-1" href="#taking-control-with-a-hybrid-approach"></a></h2>
<p>Ultimately, organizations shouldn&rsquo;t have to choose between a static simulation and a live twin. The future of Industry 4.0 is a hybrid approach: running &ldquo;what-if&rdquo; simulations against live digital twin states.</p>
<p>By feeding the current, real-time state of a digital twin directly into a simulation engine, engineers can forecast the next four hours of production or network traffic based on exact current conditions.</p>
<p>But this introduces a major architectural problem: how do you take a live digital twin state, ingest an asynchronous Kafka stream, and dynamically alter the capacity of a simulated machine while the model is running, without lagging behind real time?</p>
<p>One approach to solving this class of problem is the <a href="https://github.com/jaehyeon-kim/dynamic-des" target="_blank" rel="noopener noreferrer"><strong><code>dynamic-des</code></strong><i class="fas fa-external-link-square-alt ms-1"></i></a> package. In Part 2 of this series, we will look at how the package&rsquo;s Switchboard pattern and dynamic resources can help turn a static mathematical model into a synchronized, event-driven digital twin.</p>
<hr>

<h3 id="references" data-numberify>References<a class="anchor ms-1" href="#references"></a></h3>
<ol>
<li>Attaran, M., &amp; Celik, B. G. (2023). Digital Twin: Benefits, use cases, challenges, and opportunities. <em>Decision Analytics Journal</em>, 6, 100165.</li>
<li>Javaid, M., Haleem, A., &amp; Suman, R. (2023). Digital Twin applications toward Industry 4.0: A Review. <em>Cognitive Robotics</em>, 3, 71-92.</li>
<li>Niantic Spatial. (2025). Simulations vs. Digital Twins. <em>Niantic Spatial Campaigns</em>.</li>
</ol>

      ]]></content:encoded></item><item><title>Building a Real-Time Industrial Digital Twin with Apache Flink and Online Machine Learning</title><link>https://jaehyeon.me/blog/2026-04-21-digital-twin-online-machine-learning/</link><guid>https://jaehyeon.me/blog/2026-04-21-digital-twin-online-machine-learning/</guid><pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate><description>
Overview Imagine using a rolling pin to flatten out a thick piece of dough. A Hot Strip Mill does the exact same thing, but with glowing red-hot steel slabs (often heated over 1000°C) and massive mechanical rollers. The steel is passed through a series of these rollers, crushing it down from a thick block into a long, thin sheet.
Calculating the exact Rolling Force required to crush the steel is critical.</description><content:encoded><![CDATA[
        
<h2 id="overview" data-numberify>Overview<a class="anchor ms-1" href="#overview"></a></h2>
<p>Imagine using a rolling pin to flatten out a thick piece of dough. A Hot Strip Mill does the exact same thing, but with glowing red-hot steel slabs (often heated over 1000°C) and massive mechanical rollers. The steel is passed through a series of these rollers, crushing it down from a thick block into a long, thin sheet.</p>
<p>Calculating the exact <strong>Rolling Force</strong> required to crush the steel is critical. If the machine pushes too hard, it can severely damage the rollers; if it doesn&rsquo;t push hard enough, the steel doesn&rsquo;t reach the target thickness. Because the rollers are constantly grinding against raw steel, their physical shape slowly degrades over time. As the machinery wears down, the legacy mathematical formulas used to predict that perfect force slowly become inaccurate. This physical degradation is the root of the <strong>Concept Drift</strong> our real-time ML pipeline is solving.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Hot Rolling Process" src="/blog/2026-04-21-digital-twin-online-machine-learning/hot-rolling-process.png" loading="lazy" width="1340" height="295" />
</picture>

</p>
<p>In heavy industrial manufacturing, such as steel hot strip rolling, deterministic physics formulas are the traditional standard for calculating the exact force required to deform a slab of steel. However, these pure physics models share a fatal flaw: they assume a pristine factory state. As physical rollers grind against red-hot steel over hours of production, they experience mechanical wear.</p>
<p>As the machinery degrades, the actual physical force required drifts away from the theoretical prediction. In data science, this is a classic manifestation of <strong>Concept Drift</strong>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Drift and Convergence Lifecycle" src="/blog/2026-04-21-digital-twin-online-machine-learning/drift-convergence-lifecycle.png" loading="lazy" width="1034" height="588" />
</picture>

</p>
<p>To tackle this, I recently built a real-time, fault-tolerant Online Machine Learning (OML) pipeline and Digital Twin. By combining Apache Kafka, Apache Flink (written in Kotlin), and the Massive Online Analysis (MOA) framework, the system learns the <em>new</em> physical reality of the worn machinery on the fly, autonomously correcting the physics baseline safely behind a deterministic Shadow Mode router.</p>
<p>Here is an overview of the architecture and the engineering challenges solved along the way. The complete source code for this project, including the Flink stream processor and the Python Digital Twin simulation, is available on <a href="https://github.com/jaehyeon-kim/oml-digital-twin-hotrolling" target="_blank" rel="noopener noreferrer">GitHub<i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h2 id="architecture-at-a-glance" data-numberify>Architecture at a Glance<a class="anchor ms-1" href="#architecture-at-a-glance"></a></h2>
<p><picture><img class="img-fluid mx-auto d-block" alt="High-Level System Architecture" src="/blog/2026-04-21-digital-twin-online-machine-learning/featured.png" loading="lazy" width="1082" height="460" />
</picture>

</p>
<p>The project is split into three highly decoupled domains:</p>
<ol>
<li><strong>Digital Twin (Python):</strong> Utilizing the <a href="https://github.com/jaehyeon-kim/dynamic-des" target="_blank" rel="noopener noreferrer">Dynamic DES<i class="fas fa-external-link-square-alt ms-1"></i></a> Python package, this layer generates synthetic rolling events, applies simulated mechanical wear, calculates theoretical/actual physics, and pushes the data to Kafka.</li>
<li><strong>Message Broker (Kafka):</strong> Handles the asynchronous, high-throughput streaming of prediction requests and delayed ground-truth target forces.</li>
<li><strong>Stream Processor (Flink/Kotlin):</strong> The core engine. It aligns asynchronous streams, trains the machine learning models dynamically, evaluates safety guardrails, and sinks metrics to ClickHouse for evaluation in real-time.</li>
</ol>
<p>Moving data from the Digital Twin through Kafka to Flink looks straightforward in a diagram, but real-world industrial physics and distributed systems are highly complex. To make this pipeline robust enough for the unpredictability of a live factory floor, we had to overcome four major engineering challenges.</p>

<h2 id="challenges" data-numberify>Challenges<a class="anchor ms-1" href="#challenges"></a></h2>
<p>Taking an online machine learning model out of the lab and deploying it into a live industrial environment introduces a unique set of hurdles. From managing the physical delays of factory sensors to ensuring the model never commands a physically unsafe action, here is a breakdown of the four primary engineering challenges we tackled to make this architecture more robust.</p>

<h3 id="challenge-1-asynchronous-industrial-streams" data-numberify>Challenge 1: Asynchronous Industrial Streams<a class="anchor ms-1" href="#challenge-1-asynchronous-industrial-streams"></a></h3>
<p>Industrial data streams are inherently asynchronous. The factory floor requests a prediction <em>before</em> the steel enters the rolling stand (Event A), but the ground-truth sensor data confirming the actual required force arrives <em>after</em> the steel is crushed (Event B).</p>
<p>To execute machine learning, these two events must be perfectly joined. However, network latency and partition skew in Kafka mean that these events might arrive out of order.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Flink Application DAG" src="/blog/2026-04-21-digital-twin-online-machine-learning/stream-processing-flink.png" loading="lazy" width="1365" height="613" />
</picture>

</p>
<p>In Flink, this is solved using a <code>KeyedCoProcessFunction</code>. By keying the streams on a composite <code>Slab ID</code> + <code>Pass Number</code>, Flink guarantees that both events are routed to the same parallel operator instance, where they share keyed state. The <code>EventMatchProcessFunction</code> uses Flink&rsquo;s <code>ValueState</code> to buffer whichever event arrives first. It then registers a processing-time timer that acts as a time-to-live (TTL). If the matching event arrives before the timer fires, the two are joined and emitted. If the timer expires first (e.g., because of a physical sensor failure), the orphaned state is purged so that state does not grow without bound.</p>
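<p>The buffer-and-join logic can be sketched in plain Python, independent of Flink. The <code>EventMatcher</code> class below is a hypothetical stand-in: a dictionary plays the role of keyed state, and an explicit <code>now</code> argument plays the role of the processing-time clock. It is not the project&rsquo;s Kotlin implementation.</p>

```python
from dataclasses import dataclass, field


@dataclass
class EventMatcher:
    ttl: float
    # key -> (side, payload, deadline); stands in for per-key state
    pending: dict = field(default_factory=dict)

    def on_event(self, key, side, payload, now):
        """Buffer the first event for a key; emit a joined tuple on the second."""
        self.expire(now)
        held = self.pending.get(key)
        if held is None or held[0] == side:
            self.pending[key] = (side, payload, now + self.ttl)
            return None
        del self.pending[key]
        if held[0] == "request":
            request, truth = held[1], payload
        else:
            request, truth = payload, held[1]
        return key, request, truth

    def expire(self, now):
        """Purge orphans whose partner never arrived (the timer callback)."""
        expired = [k for k, entry in self.pending.items() if entry[2] <= now]
        for k in expired:
            del self.pending[k]
        return expired


if __name__ == "__main__":
    matcher = EventMatcher(ttl=10.0)
    key = ("slab-1", 3)  # composite key: slab ID + pass number
    matcher.on_event(key, "truth", 1250.0, now=0.0)  # ground truth arrives first
    print(matcher.on_event(key, "request", {"theory": 1200.0}, now=2.0))
```

<p>Whichever side arrives first is held, so out-of-order delivery produces the same joined record as in-order delivery.</p>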

<h3 id="challenge-2-online-residual-learning" data-numberify>Challenge 2: Online Residual Learning<a class="anchor ms-1" href="#challenge-2-online-residual-learning"></a></h3>
<p>Instead of training a batch model offline on historical data, which would instantly become obsolete the moment the rollers wore down further, the Flink pipeline trains on the continuous stream of slabs using a strict <strong>Test-then-Train</strong> (prequential) paradigm.</p>
<p>Crucially, the ML models do not predict the absolute rolling force from scratch. Instead, they utilize <strong>Residual Learning</strong>. They predict the residual error between the theoretical formula and the actual physical force.</p>
<p>To execute this, the pipeline integrates the Java MOA framework. The primary production model is <strong>AMRules</strong> (Adaptive Model Rules), a streaming rule-learning algorithm. It builds an ensemble of rules that calculate a linear combination of physical attributes, continuously updating weights via the Delta rule.</p>
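<p>To make the combination of test-then-train and residual learning concrete, here is a deliberately simplified sketch. It uses a single linear model updated with the delta rule, not AMRules or MOA, and the function name, stream layout, and learning rate are assumptions chosen for illustration.</p>

```python
def prequential_residual(stream, learning_rate=0.1):
    """Test-then-train on residuals: predict first, then learn from the truth.

    `stream` yields (features, physics_force, actual_force) tuples.
    Returns the absolute error of each corrected prediction, in order.
    """
    weights = None
    bias = 0.0
    errors = []
    for features, physics_force, actual_force in stream:
        if weights is None:
            weights = [0.0] * len(features)
        # 1. Test: predict the residual before looking at the ground truth.
        predicted_residual = bias + sum(w * x for w, x in zip(weights, features))
        corrected = physics_force + predicted_residual
        errors.append(abs(actual_force - corrected))
        # 2. Train: learn from the residual that was actually observed.
        delta = (actual_force - physics_force) - predicted_residual
        bias += learning_rate * delta
        weights = [w + learning_rate * delta * x for w, x in zip(weights, features)]
    return errors


if __name__ == "__main__":
    # Worn rollers: the real force is consistently 50 units above the formula.
    worn_mill = [([1.0], 1000.0, 1050.0)] * 50
    errors = prequential_residual(worn_mill)
    print(errors[0], errors[-1])
```

<p>Because the model only has to learn the gap between formula and reality, it starts from the physics baseline rather than from zero, and every prediction is scored before the corresponding label is used for training.</p>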
<p>Unlike static models, AMRules runs an internal Page-Hinkley test to detect sudden physical shocks (like a roller bearing breaking) and instantly prunes obsolete rules, allowing rapid convergence to new physical realities. Furthermore, because ML models require normalized inputs, the pipeline implements <strong>Welford&rsquo;s Online Algorithm</strong> in a custom Flink state object to calculate streaming Z-scores on the fly without needing to load full datasets into memory.</p>
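<p>Welford&rsquo;s algorithm itself fits in a few lines. The sketch below is a generic Python version with a hypothetical <code>RunningStats</code> class; it uses the population standard deviation, which is an assumption, and it is not the project&rsquo;s Flink state object.</p>

```python
import math


class RunningStats:
    """Streaming mean and variance using Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 0.0

    def z_score(self, x):
        std = self.std
        return (x - self.mean) / std if std > 0 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:
        stats.update(value)
    print(stats.mean, stats.std, stats.z_score(9))
```

<p>Only three numbers are kept per feature, so the normalizer is cheap to hold in keyed state and to checkpoint.</p>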

<h3 id="challenge-3-shadow-mode-router" data-numberify>Challenge 3: Shadow Mode Router<a class="anchor ms-1" href="#challenge-3-shadow-mode-router"></a></h3>
<p>Industrial Machine Learning cannot operate without strict safety boundaries. A model error that generates excessive rolling force could severely damage a multi-million-dollar rolling stand or create a major production bottleneck in the factory.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Shadow Mode Router Decision Logic" src="/blog/2026-04-21-digital-twin-online-machine-learning/shadow-mode-router.png" loading="lazy" width="557" height="747" />
</picture>

</p>
<p>Before any ML-adjusted prediction is allowed to influence the factory floor, it must pass through a deterministic <strong>Shadow Mode Router</strong> consisting of two guardrails:</p>
<ol>
<li><strong>Stateless Mechanical Limits:</strong> Calculates the residual difference between the model&rsquo;s requested force and the physics baseline. If the model requests a deviation outside of physical safety bounds (e.g., &gt; +25% or &lt; -20%), the prediction is instantly rejected.</li>
<li><strong>Stateful Trust Score:</strong> Using an Exponentially Weighted Moving Average (EWMA), the router continuously tracks the Absolute Percentage Error (APE) of both the model and the pure physics baseline. If the model&rsquo;s average error exceeds that of the physics baseline by more than a defined &ldquo;Trust Deficit&rdquo; margin, the model is benched.</li>
</ol>
<p>If either guardrail is triggered, the router safely falls back to the deterministic Physics Baseline. The ML model continues to train in the background (Shadow Mode) until it re-learns the physical reality, improves its EWMA trust score, and is autonomously promoted back to active control.</p>
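<p>The two guardrails and the fallback can be sketched as follows. The deviation limits reuse the example bounds quoted above; the <code>trust_deficit</code> margin, the EWMA smoothing factor <code>alpha</code>, and the class and method names are hypothetical values and names chosen for illustration, not the project&rsquo;s actual configuration.</p>

```python
from dataclasses import dataclass


@dataclass
class ShadowModeRouter:
    upper_limit: float = 0.25    # largest allowed deviation above the baseline
    lower_limit: float = -0.20   # largest allowed deviation below the baseline
    trust_deficit: float = 0.01  # hypothetical margin on the EWMA of APE
    alpha: float = 0.2           # hypothetical EWMA smoothing factor
    model_ewma: float = 0.0
    physics_ewma: float = 0.0

    def route(self, physics_force, model_force):
        """Return the force to use and which source it came from."""
        deviation = (model_force - physics_force) / physics_force
        within_limits = self.lower_limit <= deviation <= self.upper_limit
        trusted = self.model_ewma <= self.physics_ewma + self.trust_deficit
        if within_limits and trusted:
            return model_force, "model"
        return physics_force, "physics"

    def observe(self, physics_force, model_force, actual_force):
        """Update both trust scores once the ground truth arrives.

        This runs for every slab, including while the model is benched,
        so the model can earn its way back to active control.
        """
        model_ape = abs(actual_force - model_force) / actual_force
        physics_ape = abs(actual_force - physics_force) / actual_force
        self.model_ewma += self.alpha * (model_ape - self.model_ewma)
        self.physics_ewma += self.alpha * (physics_ape - self.physics_ewma)


if __name__ == "__main__":
    router = ShadowModeRouter()
    print(router.route(1000.0, 1100.0))  # within limits and trusted
    print(router.route(1000.0, 1300.0))  # +30% deviation is rejected
```

<p>Both checks are plain arithmetic on a handful of numbers, which keeps the routing decision deterministic and easy to audit regardless of how complex the model behind it becomes.</p>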

<h3 id="challenge-4-checkpointing-complex-ml-state" data-numberify>Challenge 4: Checkpointing Complex ML State<a class="anchor ms-1" href="#challenge-4-checkpointing-complex-ml-state"></a></h3>
<p>For Flink to provide exactly-once state consistency, it must asynchronously snapshot operator state, held in a state backend such as RocksDB, to durable checkpoint storage.</p>
<p>While simple tracking states serialize perfectly into basic Flink <code>ValueState&lt;Double&gt;</code>, the AMRules model is a deeply complex, dynamic tree structure. Allowing Flink&rsquo;s default Kryo serializers to traverse this massive object graph during checkpointing causes severe performance degradation and frequent serialization crashes.</p>
<p>To bypass this, the <code>MoaEvaluationProcessFunction</code> interacts with the MOA model as a standard Java object in memory for high-performance execution. However, upon every state update, it manually serializes the model into a raw byte array (<code>ValueState&lt;ByteArray&gt;</code>) using Java&rsquo;s native <code>ObjectOutputStream</code>. When Flink triggers a checkpoint, it simply flushes these pre-serialized byte arrays to disk. Upon recovery, Flink deserializes the bytes, instantly restoring the exact computational brain-state of the algorithm.</p>
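<p>The pattern of keeping a live object in memory while storing only pre-serialized bytes in managed state can be illustrated in Python. Everything here is a stand-in: a plain dictionary replaces Flink state, JSON bytes replace Java serialization, and <code>RunningMean</code> is a toy model rather than AMRules.</p>

```python
import json


class RunningMean:
    """Tiny stand-in for a stateful online model."""

    def __init__(self, count=0, mean=0.0):
        self.count = count
        self.mean = mean

    def learn(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count

    def to_bytes(self):
        return json.dumps({"count": self.count, "mean": self.mean}).encode("utf-8")

    @classmethod
    def from_bytes(cls, blob):
        return cls(**json.loads(blob.decode("utf-8")))


class ModelOperator:
    """Works on the in-memory model; the state store only ever sees bytes."""

    def __init__(self, state):
        self.state = state  # stand-in for a byte-array value state
        blob = state.get("model_bytes")
        self.model = RunningMean.from_bytes(blob) if blob is not None else RunningMean()

    def process(self, value):
        self.model.learn(value)
        # Serialize after every update so a snapshot only has to copy bytes.
        self.state["model_bytes"] = self.model.to_bytes()


if __name__ == "__main__":
    state = {}
    operator = ModelOperator(state)
    for force in (10.0, 20.0, 30.0):
        operator.process(force)
    restored = ModelOperator(dict(state))  # simulate recovery from a snapshot
    print(restored.model.count, restored.model.mean)
```

<p>The trade-off is extra serialization work on every update in exchange for snapshots that never have to walk the model&rsquo;s object graph.</p>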

<h2 id="system-in-action" data-numberify>System in Action<a class="anchor ms-1" href="#system-in-action"></a></h2>
<p>You can easily spin up the entire architecture locally to see the system in action. The project repository includes a complete Docker Compose environment that provisions Kafka, Flink, and ClickHouse.</p>
<p>To get started, simply clone the repository, build the Flink application using the provided Gradle wrapper, and bring up the infrastructure. Once the environment is running, you can launch the Python-based data generator to simulate the steel rolling process and open the NiceGUI control dashboard to monitor the machine learning metrics in real-time.</p>
<p>For detailed, step-by-step instructions on cloning the repository, configuring licenses, and deploying the Flink job, refer to the <a href="https://github.com/jaehyeon-kim/oml-digital-twin-hotrolling/tree/main?tab=readme-ov-file#-getting-started" target="_blank" rel="noopener noreferrer">Getting Started<i class="fas fa-external-link-square-alt ms-1"></i></a> section in the repository.</p>

<h2 id="simulation-scenarios" data-numberify>Simulation Scenarios<a class="anchor ms-1" href="#simulation-scenarios"></a></h2>
<p>Using the NiceGUI Dashboard, you can actively manipulate the physical state of the Digital Twin to observe the Flink pipeline&rsquo;s reaction in real-time.</p>
<p>Here is how the pipeline handles different industrial scenarios:</p>

<h3 id="abrupt-drift-mechanical-shock" data-numberify>Abrupt Drift (Mechanical Shock)<a class="anchor ms-1" href="#abrupt-drift-mechanical-shock"></a></h3>
<p>Simulates a sudden mechanical failure (e.g., a roller bearing breaking), instantly altering the physics of the mill.</p>
<ul>
<li><strong>Simulation Settings:</strong> Trigger Abrupt Shock (Wear Level: 60.0).</li>
<li><strong>Observation:</strong> The pure physics baseline error instantly spikes and remains high (often &gt;10% APE) because the physical reality no longer matches the math. The <strong>AMRules</strong> model initially spikes alongside it, but its Page-Hinkley change detector immediately drops obsolete rules, allowing it to rapidly converge back to lower error as it learns the new broken state.</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="Abrupt Drift" src="/blog/2026-04-21-digital-twin-online-machine-learning/abrupt-drift.gif" loading="lazy" width="1770" height="1166" />
</picture>

</p>

<h3 id="gradual-drift-standard-wear" data-numberify>Gradual Drift (Standard Wear)<a class="anchor ms-1" href="#gradual-drift-standard-wear"></a></h3>
<p>Simulates the continuous, bi-directional cycle of slow roller degradation and subsequent maintenance recovery over hours of production.</p>
<ul>
<li><strong>Simulation Settings:</strong> Gradual Wear (Step Size: 5.0 units, Frequency: 30 seconds).</li>
<li><strong>Observation:</strong> The physics baseline error slowly and persistently creeps upward or downward over time (e.g., ranging from 2% to 7% APE) as the wear level drifts gradually. The <strong>AMRules</strong> model gracefully tracks this changing reality, updating its linear weights incrementally to maintain a smooth error rate.</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="Gradual Drift" src="/blog/2026-04-21-digital-twin-online-machine-learning/gradual-drift.gif" loading="lazy" width="1770" height="1166" />
</picture>

</p>

<h3 id="no-drift-pristine-state" data-numberify>No Drift (Pristine State)<a class="anchor ms-1" href="#no-drift-pristine-state"></a></h3>
<p>Simulates a pristine factory state, such as immediately after a maintenance shift replaces the rollers.</p>
<ul>
<li><strong>Simulation Settings:</strong> Wear Level: 0.0.</li>
<li><strong>Observation:</strong> The physical reality of the factory floor perfectly aligns with the deterministic mathematical formulas. The physics baseline maintains a highly accurate, near-zero error rate (&lt; 0.3% APE). <strong>AMRules</strong> remains stable under this condition.</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="No Drift" src="/blog/2026-04-21-digital-twin-online-machine-learning/no-drift.gif" loading="lazy" width="1770" height="1166" />
</picture>

</p>

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>Combining Apache Flink with Online Machine Learning bridges the gap between theoretical physics and the harsh reality of a degrading factory floor. To make this safe and effective, the architecture relies on three core ideas: predicting residual errors instead of absolute forces, isolating models by product line to prevent forgetting, and enforcing strict safety guardrails. Together, these techniques ensure that heavy machinery operates optimally, even as it breaks down over time.</p>
<p>You can explore the complete codebase, run the simulation locally, and dive deeper into the architecture on the project&rsquo;s <a href="https://github.com/jaehyeon-kim/oml-digital-twin-hotrolling" target="_blank" rel="noopener noreferrer">GitHub repository<i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

      ]]></content:encoded></item><item><title>Building a Real-Time Recommender: Contextual Bandits &amp; Event-Driven Architecture</title><link>https://jaehyeon.me/slides/2026-03-21-product-recommender/</link><guid>https://jaehyeon.me/slides/2026-03-21-product-recommender/</guid><pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate><description>
Building a Real-Time Product Recommender Contextual Bandits &amp;amp; Event-Driven Architecture
Why Contextual Bandits? Problem: Conventional recommenders (e.g., Collaborative Filtering) Ignore situational context (e.g., Time of Day, Location, Device). Struggle with &amp;ldquo;Cold Starts&amp;rdquo; for new items/users. Solution: Contextual Multi-Armed Bandits (CMAB). Exploitation: Maximize immediate reward using current knowledge. Exploration: Gather information on uncertain items to improve future performance. Part 1: Prototype | Part 2: Productionization
Part 1: Prototype Prototype an online product recommender with Python</description><content:encoded><![CDATA[

<h2 id="building-a-real-time-product-recommender" data-numberify>Building a Real-Time Product Recommender<a class="anchor ms-1" href="#building-a-real-time-product-recommender"></a></h2>
<p>Contextual Bandits &amp; Event-Driven Architecture</p>
<hr>

<h2 id="why-contextual-bandits" data-numberify>Why Contextual Bandits?<a class="anchor ms-1" href="#why-contextual-bandits"></a></h2>
<ul>
<li><strong>Problem:</strong> Conventional recommenders (e.g., Collaborative Filtering)
<ul>
<li>Ignore situational context (e.g., Time of Day, Location, Device).</li>
<li>Struggle with &ldquo;Cold Starts&rdquo; for new items/users.</li>
</ul>
</li>
<li><strong>Solution:</strong> Contextual Multi-Armed Bandits (CMAB).
<ul>
<li><strong>Exploitation:</strong> Maximize immediate reward using current knowledge.</li>
<li><strong>Exploration:</strong> Gather information on uncertain items to improve future performance.</li>
</ul>
</li>
</ul>
<p><a href="/slides/2026-03-21-product-recommender/#prototype">Part 1: Prototype</a> | <a href="/slides/2026-03-21-product-recommender/#production">Part 2: Productionization</a></p>
<hr>

<h2 id="part-1-prototype" data-numberify>Part 1: Prototype<a class="anchor ms-1" href="#part-1-prototype"></a></h2>
<p>Prototype an online product recommender with Python</p>
<p>&ndash;</p>

<h2 id="python-ecosystem" data-numberify>Python Ecosystem<a class="anchor ms-1" href="#python-ecosystem"></a></h2>
<ul>
<li><a href="https://vowpalwabbit.org/" target="_blank" rel="noopener noreferrer">Vowpal Wabbit<i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://riverml.xyz/latest/" target="_blank" rel="noopener noreferrer">River ML<i class="fas fa-external-link-square-alt ms-1"></i></a> are well-known for CMAB.
<ul>
<li><em>Gap:</em> Lack of end-to-end examples integrating feature engineering and offline policy evaluation.</li>
</ul>
</li>
<li><a href="https://github.com/fidelity" target="_blank" rel="noopener noreferrer">Fidelity Investments Open Source<i class="fas fa-external-link-square-alt ms-1"></i></a>
<ul>
<li><strong>MABWiser:</strong> Algorithm implementation.</li>
<li><strong>Mab2Rec:</strong> Offline policy evaluation.</li>
<li><strong>TextWiser:</strong> Text featurization.</li>
</ul>
</li>
</ul>
<p>&ndash;</p>

<h2 id="prototyping-workflow" data-numberify>Prototyping Workflow<a class="anchor ms-1" href="#prototyping-workflow"></a></h2>
<p>From synthetic data generation to live simulation.</p>
<p>&ndash;</p>

<h2 id="live-demo--walkthrough" data-numberify>Live Demo &amp; Walkthrough<a class="anchor ms-1" href="#live-demo--walkthrough"></a></h2>
<p>Let&rsquo;s dive into the code.</p>
<p>&ndash;</p>

<h2 id="offline-policy-evaluation" data-numberify>Offline Policy Evaluation<a class="anchor ms-1" href="#offline-policy-evaluation"></a></h2>
<table>
<thead>
<tr>
<th style="text-align:left">Model</th>
<th style="text-align:left">AUC(score)@5</th>
<th style="text-align:left">CTR(score)@5</th>
<th style="text-align:left">Precision@5</th>
<th style="text-align:left">Recall@5</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">Random</td>
<td style="text-align:left">0.550</td>
<td style="text-align:left">0.102</td>
<td style="text-align:left">0.003</td>
<td style="text-align:left">0.019</td>
</tr>
<tr>
<td style="text-align:left">Popularity</td>
<td style="text-align:left">0.592</td>
<td style="text-align:left">0.192</td>
<td style="text-align:left">0.007</td>
<td style="text-align:left">0.038</td>
</tr>
<tr>
<td style="text-align:left">LinGreedy</td>
<td style="text-align:left"><strong>0.885</strong></td>
<td style="text-align:left">0.117</td>
<td style="text-align:left">0.004</td>
<td style="text-align:left">0.023</td>
</tr>
<tr>
<td style="text-align:left"><strong>🏆 LinUCB</strong></td>
<td style="text-align:left">0.860</td>
<td style="text-align:left"><strong>0.204</strong></td>
<td style="text-align:left">0.006</td>
<td style="text-align:left">0.034</td>
</tr>
<tr>
<td style="text-align:left">LinTS</td>
<td style="text-align:left">0.640</td>
<td style="text-align:left">0.211</td>
<td style="text-align:left"><strong>0.008</strong></td>
<td style="text-align:left"><strong>0.042</strong></td>
</tr>
<tr>
<td style="text-align:left">ClustersTS</td>
<td style="text-align:left">0.550</td>
<td style="text-align:left">0.153</td>
<td style="text-align:left">0.004</td>
<td style="text-align:left">0.023</td>
</tr>
</tbody>
</table>
<p>Why LinUCB?</p>
<ul>
<li><strong>Best Trade-off:</strong> High Ranking (AUC) + High Engagement (CTR).</li>
<li><strong>Beats LinGreedy:</strong> Explores effectively (CTR 0.204 vs 0.117).</li>
<li><strong>Beats LinTS:</strong> Ranks accurately (AUC 0.860 vs 0.640).</li>
</ul>

<h2 id="linucb-algorithm" data-numberify>LinUCB Algorithm<a class="anchor ms-1" href="#linucb-algorithm"></a></h2>
<p>Balancing Exploitation and Exploration</p>
<p>$$ \text{Score}_a = \color{cyan}{x^T \theta_a} \mathbin{\color{white}{+}} \color{orange}{\alpha \sqrt{x^T A_a^{-1} x}} \color{white}{, \quad \text{where } \theta_a = A_a^{-1} b_a} $$</p>
<ul>
<li><strong>Exploitation:</strong> Predicted reward ($x^T \theta_a$).</li>
<li><strong>Exploration:</strong> Uncertainty bonus ($\alpha \sqrt{x^T A_a^{-1} x}$), the upper confidence bound (UCB) term.</li>
<li><strong>$\theta_a = A_a^{-1} b_a$</strong>: <strong>Model weights</strong> estimated via Ridge Regression.</li>
</ul>
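<p>As a rough illustration, the score for a single arm can be computed with NumPy as below. This is a minimal sketch rather than the demo code: the function name <code>linucb_score</code> and the toy values for $A$, $b$, and $x$ are assumptions chosen so the arithmetic is easy to follow.</p>

```python
import numpy as np


def linucb_score(x, A_inv, b, alpha):
    """Return (exploitation, exploration, total) for one arm."""
    theta = A_inv @ b                                  # ridge-regression weights
    exploit = float(x @ theta)                         # predicted reward x^T theta
    explore = alpha * float(np.sqrt(x @ A_inv @ x))    # uncertainty bonus
    return exploit, explore, exploit + explore


# Toy arm with d = 2 (hypothetical values): A = diag(2, 4), b = (1, 2)
A_inv = np.linalg.inv(np.diag([2.0, 4.0]))
b = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])

exploit, explore, score = linucb_score(x, A_inv, b, alpha=0.1)
```

<p>With these values $\theta_a = (0.5, 0.5)$, so the exploitation term is 1.0, and the exploration term is $0.1 \sqrt{0.75} \approx 0.087$.</p>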

<h2 id="limitations" data-numberify>Limitations<a class="anchor ms-1" href="#limitations"></a></h2>
<p>A monolithic Python script isn&rsquo;t built for scale.</p>
<ul>
<li><strong>Latency:</strong> Training blocks inference.</li>
<li><strong>Scalability:</strong> Matrix math in memory limits the catalog size.</li>
<li><strong>Fault Tolerance:</strong> If the script crashes, the learned state is lost.</li>
</ul>
<p><a href="/slides/2026-03-21-product-recommender/#/">Back to Start</a> | <strong><a href="/slides/2026-03-21-product-recommender/#production">Jump to Productionization</a></strong></p>
<hr>

<h2 id="part-2-productionization" data-numberify>Part 2: Productionization<a class="anchor ms-1" href="#part-2-productionization"></a></h2>
<p>Scaling with an Event-Driven Architecture</p>

<h2 id="architecture" data-numberify>Architecture<a class="anchor ms-1" href="#architecture"></a></h2>

<h3 id="training-apache-flink" data-numberify>Training (Apache Flink)<a class="anchor ms-1" href="#training-apache-flink"></a></h3>
<ul>
<li>
<p><strong>Stateful Processing:</strong></p>
<ul>
<li>Flink acts as &ldquo;Online Memory&rdquo;.</li>
</ul>
</li>
<li>
<p><strong>Asynchronous Updates:</strong></p>
<ul>
<li><strong>Fast Path:</strong> Updates $A$ and $b$.
<ul>
<li>($A \leftarrow A + x x^T, b \leftarrow b + r x$)</li>
</ul>
</li>
<li><strong>Slow Path:</strong> Every 5s, computes $A^{-1}$.</li>
</ul>
</li>
<li>
<p><strong>Sync to Redis:</strong> Emits $A^{-1}$ and $b$.</p>
</li>
</ul>

<h2 id="architecture-1" data-numberify>Architecture<a class="anchor ms-1" href="#architecture-1"></a></h2>

<h3 id="serving-python--redis" data-numberify>Serving (Python & Redis)<a class="anchor ms-1" href="#serving-python--redis"></a></h3>
<ul>
<li>
<p><strong>Stateless Inference:</strong></p>
<ul>
<li>The client does <em>not</em> train.</li>
</ul>
</li>
<li>
<p><strong>Low Latency:</strong></p>
<ul>
<li>Fetches pre-computed LinUCB parameters from Redis.</li>
</ul>
</li>
<li>
<p><strong>Action:</strong></p>
<ul>
<li>Calculates scores,</li>
<li>Ranks items, and</li>
<li>Sends feedback to Kafka.
</li>
</ul>
</li>
</ul>
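<p>The serving steps above can be sketched roughly as follows. This is an illustration under assumptions, not the demo client: a plain dictionary stands in for the parameters fetched from Redis, and the product IDs, the <code>rank_items</code> helper, and the parameter values are hypothetical.</p>

```python
import numpy as np


def rank_items(x, params, alpha=0.1, top_k=2):
    """Score every product with LinUCB and return the top_k product IDs."""
    scores = {}
    for item_id, (A_inv, b) in params.items():
        theta = A_inv @ b                               # weights from stored parameters
        bonus = alpha * np.sqrt(x @ A_inv @ x)          # exploration bonus
        scores[item_id] = float(x @ theta + bonus)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical pre-computed parameters (A_inv, b) per product
params = {
    "p1": (np.eye(2), np.array([0.0, 0.0])),
    "p2": (np.eye(2) * 0.5, np.array([1.0, 0.0])),
    "p3": (np.eye(2) * 0.5, np.array([0.0, 1.0])),
}

ranked = rank_items(np.array([1.0, 0.0]), params)
```

<p>No training happens here: the client only multiplies small matrices, which is what keeps inference stateless and fast.</p>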

<h2 id="architecture-2" data-numberify>Architecture<a class="anchor ms-1" href="#architecture-2"></a></h2>

<h3 id="transport-apache-kafka" data-numberify>Transport (Apache Kafka)<a class="anchor ms-1" href="#transport-apache-kafka"></a></h3>
<ul>
<li>
<p><strong>Asynchronous Buffer:</strong></p>
<ul>
<li>Decouples user-facing app speed from backend training.</li>
</ul>
</li>
<li>
<p><strong>Durability:</strong></p>
<ul>
<li>Stores &ldquo;Ground Truth&rdquo; events safely for replay or analytics.
</li>
</ul>
</li>
</ul>

<h2 id="live-demo--walkthrough-1" data-numberify>Live Demo & Walkthrough<a class="anchor ms-1" href="#live-demo--walkthrough-1"></a></h2>
<p>Let&rsquo;s dive into the code.</p>
<p><a href="/slides/2026-03-21-product-recommender/#/">Back to Start</a> | <strong><a href="/slides/2026-03-21-product-recommender/#takeaways">Jump to Takeaways</a></strong></p>
<hr>

<h2 id="key-takeaways" data-numberify>Key Takeaways<a class="anchor ms-1" href="#key-takeaways"></a></h2>
<p>Bridging the gap between Data Science and Data Engineering.</p>

<h2 id="start-small-evaluate-offline" data-numberify>Start Small, Evaluate Offline<a class="anchor ms-1" href="#start-small-evaluate-offline"></a></h2>
<ul>
<li>Before touching infrastructure, <strong>evaluate policies</strong>.</li>
<li>Using tools like <em>MABWiser</em> and <em>Mab2Rec</em> allows you to simulate user behavior and validate algorithms on historical data safely.</li>
</ul>

<h2 id="decouple-to-scale" data-numberify>Decouple to Scale<a class="anchor ms-1" href="#decouple-to-scale"></a></h2>
<ul>
<li>A monolithic architecture forces a trade-off between model accuracy and user latency.</li>
<li><strong>Event-Driven Architecture (EDA)</strong> solves this by separating high-speed inference (Redis) from heavy stateful training (Flink).</li>
</ul>

<h2 id="real-time-adaptability" data-numberify>Real-Time Adaptability<a class="anchor ms-1" href="#real-time-adaptability"></a></h2>
<ul>
<li>By integrating Kafka and Flink, the system learns from user behavior <em>within seconds</em>.
<ul>
<li><strong>Dynamic Personalization:</strong> Optimizes for specific real-time user context with every click.</li>
<li><strong>Continuous Learning:</strong> Mitigates the &ldquo;Cold Start&rdquo; problem for new items without waiting for batch retraining.</li>
</ul>
</li>
</ul>
<hr>

<h1 id="thank-you" data-numberify>Thank You!<a class="anchor ms-1" href="#thank-you"></a></h1>
<p><strong>Code &amp; Resources:</strong></p>
<ul>
<li><a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/product-recommender" target="_blank" rel="noopener noreferrer">GitHub Repository<i class="fas fa-external-link-square-alt ms-1"></i></a></li>
<li>Blog Posts: <a href="https://jaehyeon.me/blog/2026-01-29-prototype-recommender-with-python/">Part 1: Prototype</a> | <a href="https://jaehyeon.me/blog/2026-02-23-productionize-recommender-with-eda/">Part 2: Productionization</a></li>
<li><a href="https://youtube.com/playlist?list=PLrISYKWzp0eTTAbkhahnuyLOBlesOY5vN&amp;si=ML-G-oYqJaMD9fnY" target="_blank" rel="noopener noreferrer">YouTube Playlist<i class="fas fa-external-link-square-alt ms-1"></i></a></li>
</ul>
<p><a href="/slides/2026-03-21-product-recommender/#/">Back to Start</a></p>

      ]]></content:encoded></item><item><title>Reveal.js Features</title><link>https://jaehyeon.me/slides/2026-03-09-reavealjs-demo/</link><guid>https://jaehyeon.me/slides/2026-03-09-reavealjs-demo/</guid><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><description>
Reveal.js Features Demonstrating Reveal.js Features for Engineering Presentation
1. Image Embedding Standard Markdown or HTML backgrounds
Full Background Slide (Use -- for vertical slides)
Cosmic Background This slide has a remote image background.
2. LaTeX Math Rendered via MathJax
The Probability Density Function for a Normal Distribution:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 } $$
3. HTML Tables Markdown and HTML mix
Tool Purpose Status Kafka Streaming ✅ Flink Processing ✅ Pinot Analytics 🚀 4.</description><content:encoded><![CDATA[
        
<h2 id="revealjs-features" data-numberify>Reveal.js Features<a class="anchor ms-1" href="#revealjs-features"></a></h2>
<p>Demonstrating Reveal.js Features for Engineering Presentation</p>
<hr>

<h2 id="1-image-embedding" data-numberify>1. Image Embedding<a class="anchor ms-1" href="#1-image-embedding"></a></h2>
<p>Standard Markdown or HTML backgrounds</p>
<p><picture><img class="img-fluid " alt="Local Image" src="/slides/2026-03-09-reavealjs-demo/featured.jpg" loading="lazy" width="2048" height="339" />
</picture>

</p>

<h3 id="full-background-slide" data-numberify>Full Background Slide<a class="anchor ms-1" href="#full-background-slide"></a></h3>
<p>(Use <code>--</code> for vertical slides)</p>

<h2 id="cosmic-background" data-numberify>Cosmic Background<a class="anchor ms-1" href="#cosmic-background"></a></h2>
<p>This slide has a remote image background.</p>
<hr>

<h2 id="2-latex-math" data-numberify>2. LaTeX Math<a class="anchor ms-1" href="#2-latex-math"></a></h2>
<p>Rendered via MathJax</p>
<p>The Probability Density Function for a Normal Distribution:</p>
<p>$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }
$$</p>
<hr>

<h2 id="3-html-tables" data-numberify>3. HTML Tables<a class="anchor ms-1" href="#3-html-tables"></a></h2>
<p>Markdown and HTML mix</p>
<table>
<thead>
<tr>
<th style="text-align:left">Tool</th>
<th style="text-align:left">Purpose</th>
<th style="text-align:left">Status</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">Kafka</td>
<td style="text-align:left">Streaming</td>
<td style="text-align:left">✅</td>
</tr>
<tr>
<td style="text-align:left">Flink</td>
<td style="text-align:left">Processing</td>
<td style="text-align:left">✅</td>
</tr>
<tr>
<td style="text-align:left">Pinot</td>
<td style="text-align:left">Analytics</td>
<td style="text-align:left">🚀</td>
</tr>
</tbody>
</table>
<hr>

<h2 id="4-charts-mermaidjs" data-numberify>4. Charts (Mermaid.js)<a class="anchor ms-1" href="#4-charts-mermaidjs"></a></h2>
<p>Diagrams from Code</p>
<hr>

<h2 id="5-code-highlighting" data-numberify>5. Code Highlighting<a class="anchor ms-1" href="#5-code-highlighting"></a></h2>
<p>Highlight specific lines (Click next)</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">def</span> <span class="nf">process_stream</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="c1"># Step 1: Clean</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="c1"># Step 2: Return</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="k">return</span> <span class="n">data</span>
</span></span></code></pre></div><hr>

<h2 id="6-fragments" data-numberify>6. Fragments<a class="anchor ms-1" href="#6-fragments"></a></h2>

<h3 id="step-by-step-visibility" data-numberify>Step-by-step visibility<a class="anchor ms-1" href="#step-by-step-visibility"></a></h3>
<ul>
<li>Item 1 (Fades in)</li>
<li>Item 2 (Fades in second)</li>
<li>Item 3 (Grows larger)</li>
<li>Item 4 (Turns red)</li>
<li>Item 5 (Fades out)</li>
</ul>

      ]]></content:encoded></item><item><title>Slides as Code: Integrating Reveal.js into my Hugo Blog</title><link>https://jaehyeon.me/blog/2026-03-09-integrate-revealjs/</link><guid>https://jaehyeon.me/blog/2026-03-09-integrate-revealjs/</guid><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><description><![CDATA[
        <p>For a long time, I wanted a way to host my technical presentations directly on my website without relying on external platforms or bulky PDF exports. I wanted a <strong>&ldquo;Slides as Code&rdquo;</strong> approach: version-controlled Markdown files that live natively alongside my blog posts.</p>
      ]]></description><content:encoded><![CDATA[
        <p>For a long time, I wanted a way to host my technical presentations directly on my website without relying on external platforms or bulky PDF exports. I wanted a <strong>&ldquo;Slides as Code&rdquo;</strong> approach: version-controlled Markdown files that live natively alongside my blog posts.</p>
<h2 id="choosing-between-marp-and-revealjs" data-numberify>Choosing between Marp and Reveal.js<a class="anchor ms-1" href="#choosing-between-marp-and-revealjs"></a></h2>
<p>When looking for a solution, I initially considered <strong>Marp</strong>.</p>
<p>Marp is undeniably easier to author initially; its VS Code extension with live preview makes designing slides feel like magic. However, it presented a hurdle for my Hugo integration.</p>
<p>Marp requires a separate build step to convert Markdown into standalone HTML. To integrate it, I would have to manually export each deck and move the resulting HTML into my <code>static/</code> folder. This felt &ldquo;disconnected&rdquo; from the Hugo ecosystem.</p>
<p><a href="https://revealjs.com/" target="_blank" rel="noopener noreferrer"><strong>Reveal.js</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>, on the other hand, is a browser-based engine. By using Reveal.js, I can:</p>
<ol>
<li><strong>Stay Native:</strong> Write pure Markdown in my <code>content/</code> folder just like a blog post.</li>
<li><strong>Zero Manual Export:</strong> Hugo handles the routing, and the browser handles the rendering at runtime.</li>
<li><strong>Automatic Updates:</strong> When I <code>git push</code> a Markdown file, the slides go live with the next site deploy. No separate compilation, no manual file moving.</li>
</ol>

<h2 id="see-it-in-action" data-numberify>See it in Action<a class="anchor ms-1" href="#see-it-in-action"></a></h2>
<p>Before diving into the configuration, you can interact with the live demo below. This is the actual slide deck, served by Hugo and embedded directly into this post:</p>
<div class="ratio ratio-16x9 my-4 shadow-sm rounded overflow-hidden">
  <iframe 
    src="/slides/2026-03-09-reavealjs-demo/" 
    title="Presentation" 
    allowfullscreen="true" 
    style="border: 0;">
  </iframe>
</div>
<div class="text-center mb-4">
    <a href="/slides/2026-03-09-reavealjs-demo/" target="_blank" class="btn btn-sm btn-outline-primary">
        <i class="fas fa-external-link-alt"></i> View Full Screen
    </a>
</div>
<p><em>You can use your arrow keys to navigate the slides above, or click the &ldquo;View Full Screen&rdquo; button for the complete experience.</em></p>

<h2 id="how-it-works" data-numberify>How It Works<a class="anchor ms-1" href="#how-it-works"></a></h2>
<p>I chose to keep slides independent from my main blog posts to avoid cluttering my main feed. Using Hugo&rsquo;s <strong>Leaf Bundles</strong>, each presentation gets its own folder for local assets like images.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">content/slides/
</span></span><span class="line"><span class="ln">2</span><span class="cl">└── 2026-03-06-revealjs-demo/            
</span></span><span class="line"><span class="ln">3</span><span class="cl">    ├── index.md                          # The slide content
</span></span><span class="line"><span class="ln">4</span><span class="cl">    └── featured.jpg                      # Thumbnail for list view
</span></span></code></pre></div>
<h3 id="custom-shell-layout" data-numberify>Custom &ldquo;Shell&rdquo; Layout<a class="anchor ms-1" href="#custom-shell-layout"></a></h3>
<p>The secret is a custom layout at <a href="https://github.com/jaehyeon-kim/jaehyeon-kim.github.io/blob/master/layouts/slides/single.html" target="_blank" rel="noopener noreferrer"><code>layouts/slides/single.html</code><i class="fas fa-external-link-square-alt ms-1"></i></a>. Instead of using my theme&rsquo;s standard blog layout (which includes headers and sidebars), I created a clean &ldquo;shell&rdquo; that loads the Reveal.js engine from a CDN.</p>
<p>Crucially, I use <code>{{ .RawContent }}</code> instead of <code>{{ .Content }}</code>. This is because Reveal.js needs the raw Markdown syntax (like <code>---</code> for slide breaks) to split the content into slides in the browser, rather than the pre-rendered HTML Hugo usually provides.</p>

<h3 id="advanced-technical-features" data-numberify>Advanced Technical Features<a class="anchor ms-1" href="#advanced-technical-features"></a></h3>
<p>Because Reveal.js is a web-based runtime, it allows for high-end technical storytelling that traditional slide software can&rsquo;t easily handle:</p>

<h4 id="interactive-code-storytelling" data-numberify>Interactive Code Storytelling<a class="anchor ms-1" href="#interactive-code-storytelling"></a></h4>
<p>Standard syntax highlighting is good, but Reveal.js allows for <strong>directed walkthroughs</strong>. Using a specific syntax like <code>[1|3-5|7]</code>, I can guide the audience through a specific block of code:</p>
<ul>
<li><strong>Workflow:</strong> First, I highlight the function definition, then step into the processing logic, and finally focus on the return statement. This &ldquo;guided tour&rdquo; of the code prevents the audience from getting overwhelmed by a large block of text.</li>
</ul>
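<p>To make the step syntax concrete, here is a small Python sketch that expands a spec such as <code>[1|3-5|7]</code> into the lines highlighted at each step. It is only an illustration of how the notation reads, not the parser Reveal.js uses, and <code>parse_highlight_steps</code> is a hypothetical helper.</p>

```python
def parse_highlight_steps(spec):
    """Expand a spec like "[1|3-5|7]" into one list of line numbers per step."""
    steps = []
    for part in spec.strip("[]").split("|"):        # "|" separates steps
        lines = []
        for chunk in part.split(","):               # "," separates lines in a step
            if "-" in chunk:                        # "3-5" is an inclusive range
                start, end = chunk.split("-")
                lines.extend(range(int(start), int(end) + 1))
            else:
                lines.append(int(chunk))
        steps.append(lines)
    return steps


steps = parse_highlight_steps("[1|3-5|7]")
```

<p>For the spec above this yields three steps: line 1, then lines 3 to 5, then line 7.</p>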

<h4 id="precision-media-styling" data-numberify>Precision Media Styling<a class="anchor ms-1" href="#precision-media-styling"></a></h4>
<p>Using the <code>.element</code> comment syntax, I have granular control over CSS directly within the Markdown.</p>
<ul>
<li><strong>Dynamic Backgrounds:</strong> I can switch the entire slide context using the <code>data-background</code> attribute (like a high-impact &ldquo;Cosmic&rdquo; space background).</li>
<li><strong>CSS Overrides:</strong> I can apply borders, rounded corners, and specific alignments, for example, centering a gold-bordered image without ever leaving the Markdown file.</li>
</ul>

<h4 id="managing-cognitive-load-with-fragments" data-numberify>Managing Cognitive Load with Fragments<a class="anchor ms-1" href="#managing-cognitive-load-with-fragments"></a></h4>
<p>To keep the audience focused on the current talking point, I use <strong>Fragments</strong>. This provides progressive disclosure of information. Beyond just &ldquo;appearing,&rdquo; I can use classes like <code>grow</code> to emphasize a specific item or <code>highlight-red</code> to signal a warning, all triggered by a simple keypress.</p>

<h4 id="mathematical-precision-latexmathjax" data-numberify>Mathematical Precision (LaTeX/MathJax)<a class="anchor ms-1" href="#mathematical-precision-latexmathjax"></a></h4>
<p>For engineering-heavy talks, raw text equations look unprofessional. By integrating <strong>MathJax</strong>, I can render complex mathematical models, such as Probability Density Functions, with the same precision found in a LaTeX paper. They remain sharp at any zoom level because they are rendered as vectors in the browser.</p>

<h4 id="diagrams-as-code-mermaidjs" data-numberify>Diagrams as Code (Mermaid.js)<a class="anchor ms-1" href="#diagrams-as-code-mermaidjs"></a></h4>
<p>Instead of embedding static PNGs of architecture diagrams that inevitably become outdated, I use <strong>Mermaid.js</strong>. This allows me to define data pipelines, like a Kafka-to-Flink-to-Redis flow, directly in the Markdown text.</p>
<ul>
<li><strong>Engineering Value:</strong> Diagrams are now version-controlled and searchable. If the architecture changes, I update a few lines of text, and the visual layout re-renders automatically.</li>
</ul>

<h2 id="road-ahead-professional-diagrams-with-drawio" data-numberify>Road Ahead: Professional Diagrams with Draw.io<a class="anchor ms-1" href="#road-ahead-professional-diagrams-with-drawio"></a></h2>
<p>While Mermaid.js is excellent for rapid, text-based flowcharts, complex system architectures often require the precision of a dedicated design tool. I am currently investigating the best way to integrate <strong>Draw.io</strong> into this &ldquo;Slides-as-Code&rdquo; workflow without losing the benefits of version control.</p>
<p>I am evaluating two paths:</p>
<ul>
<li><strong>Editable SVGs:</strong> Exporting diagrams as SVGs with embedded XML. This allows the slides to treat the diagram as a standard image while keeping the source file fully editable and version-controlled.</li>
<li><strong>The Draw.io JS Viewer:</strong> Utilizing the official Diagrams.net JavaScript library to render <code>.drawio</code> XML files dynamically. This would enable interactive features like zooming and layer toggling directly during a presentation.</li>
</ul>
<p>Moving forward, the goal is to ensure that even the most complex architectural designs remain as easy to maintain as a line of Markdown.</p>

<h2 id="why-this-works-for-me" data-numberify>Why this works for me<a class="anchor ms-1" href="#why-this-works-for-me"></a></h2>
<p>By leveraging Hugo&rsquo;s <strong>Directory Inference</strong>, any file I drop into <code>content/slides/</code> automatically inherits this powerful shell. I no longer &ldquo;design&rdquo; slides; I <strong>engineer</strong> them.</p>
<p>While Marp might be slightly easier to <strong>write</strong>, the Reveal.js integration is much easier to <strong>maintain</strong>. It turns my site into a single source of truth for both my articles and my presentations.</p>
<p>You can check out my latest presentations in the <a href="/slides/">Slides</a> section!</p>
      ]]></content:encoded></item><item><title>Productionizing an Online Product Recommender using Event Driven Architecture</title><link>https://jaehyeon.me/blog/2026-02-23-productionize-recommender-with-eda/</link><guid>https://jaehyeon.me/blog/2026-02-23-productionize-recommender-with-eda/</guid><pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In <a href="/blog/2026-01-29-prototype-recommender-with-python/"><strong>Part 1</strong></a>, we built a contextual bandit prototype using Python and <a href="https://github.com/fidelity/mab2rec" target="_blank" rel="noopener noreferrer"><code>Mab2Rec</code><i class="fas fa-external-link-square-alt ms-1"></i></a>. While effective for testing algorithms locally, a monolithic script cannot handle production scale. Real-world recommendation systems require low-latency inference for users and high-throughput training for model updates.</p>
<p>This post demonstrates how to decouple these concerns using an event-driven architecture with Apache Flink, Kafka, and Redis.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In <a href="/blog/2026-01-29-prototype-recommender-with-python/"><strong>Part 1</strong></a>, we built a contextual bandit prototype using Python and <a href="https://github.com/fidelity/mab2rec" target="_blank" rel="noopener noreferrer"><code>Mab2Rec</code><i class="fas fa-external-link-square-alt ms-1"></i></a>. While effective for testing algorithms locally, a monolithic script cannot handle production scale. Real-world recommendation systems require low-latency inference for users and high-throughput training for model updates.</p>
<p>This post demonstrates how to decouple these concerns using an event-driven architecture with Apache Flink, Kafka, and Redis.</p>
<h2 id="system-architecture" data-numberify>System Architecture<a class="anchor ms-1" href="#system-architecture"></a></h2>
<p>To move from prototype to production, we split the application into two distinct layers: Serving and Training.</p>
<ul>
<li><strong>Python Client (Serving):</strong> A lightweight, stateless client responsible for inference. It fetches pre-calculated model parameters from Redis, computes scores locally to make product recommendations, and captures user feedback.</li>
<li><strong>Kafka (Transport):</strong> Buffers feedback events asynchronously, decoupling the speed of serving from the speed of training.</li>
<li><strong>Flink (Training):</strong> A stateful streaming application. It consumes feedback events, updates the model parameters (LinUCB matrices $A$ and $b$), and pushes the inverted matrices back to Redis.
<ul>
<li>❗ Unlike <em>Part 1</em>, where training relied on <a href="https://github.com/fidelity/mabwiser" target="_blank" rel="noopener noreferrer"><code>MABWiser</code><i class="fas fa-external-link-square-alt ms-1"></i></a>, here it is performed via explicit matrix operations.</li>
</ul>
</li>
<li><strong>Redis (Model Store):</strong> Stores the latest model parameters ($A^{-1}$ and $b$) for low-latency access by the client.</li>
</ul>
<blockquote>
<p><strong>📂 Source Code for the Post</strong></p>
<p>The source code for this post is available in the <strong>product-recommender</strong> folder of the <a href="https://github.com/jaehyeon-kim/streaming-demos" target="_blank" rel="noopener noreferrer">streaming-demos<i class="fas fa-external-link-square-alt ms-1"></i></a> GitHub repository.</p>
</blockquote>
<p><picture><img class="img-fluid mx-auto d-block" alt="Architecture" src="/blog/2026-02-23-productionize-recommender-with-eda/featured.gif" loading="lazy" width="1698" height="1268" />
</picture>

</p>

<h2 id="flink-application-design" data-numberify>Flink Application Design<a class="anchor ms-1" href="#flink-application-design"></a></h2>
<p>The <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/product-recommender/recsys-trainer" target="_blank" rel="noopener noreferrer">Flink job (<code>recsys-trainer</code>)<i class="fas fa-external-link-square-alt ms-1"></i></a> ties these concepts together using a few specific patterns.</p>

<h3 id="stateful-model-training" data-numberify>Stateful Model Training<a class="anchor ms-1" href="#stateful-model-training"></a></h3>
<p>The core challenge in distributed online learning is managing state. The <a href="https://github.com/jaehyeon-kim/streaming-demos/blob/main/product-recommender/recsys-trainer/src/main/kotlin/me/jaehyeon/topology/processing/LinUCBUpdater.kt" target="_blank" rel="noopener noreferrer"><code>LinUCBUpdater</code> function<i class="fas fa-external-link-square-alt ms-1"></i></a> in the Flink trainer acts as the system&rsquo;s memory. It implements a <strong>disjoint LinUCB</strong> model, meaning it maintains a completely independent set of matrices for <strong>each unique product</strong>.</p>
<p>❗ These matrices are used to calculate scores for making recommendations.</p>
<p>For each <code>product_id</code>, Flink maintains two pieces of state in RocksDB:</p>
<ol>
<li><strong>Matrix $A$ ($d \times d$):</strong> Represents the covariance of features seen so far. It tracks <strong>Exposure</strong>, recording how many times specific user contexts (e.g., &ldquo;Morning Users&rdquo; or &ldquo;Weekend Users&rdquo;) have been seen for a specific product.</li>
<li><strong>Vector $b$ ($d \times 1$):</strong> Represents the accumulated reward. It tracks <strong>Success</strong>, recording which features actually led to a click.</li>
</ol>
<p>The matrix $A$ is initialized as a scaled identity matrix $A_0 = \lambda I$ to ensure invertibility and to encode an initial prior of uniform uncertainty across feature dimensions.</p>
<p>When a feedback event arrives (Context $x$, Reward $r$), Flink performs the updates:</p>
<ul>
<li><strong>Update A:</strong> $A \leftarrow A + x x^T$. The outer product $x x^T$ increases covariance along the observed feature directions. As similar contexts repeat, $A$ grows in those directions, reflecting increased confidence.</li>
<li><strong>Update b:</strong> $b \leftarrow b + r x$. If the user clicked ($r=1$), we add their feature vector to $b$, reinforcing that preference pattern.</li>
</ul>
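<p>The initialization and the two update rules can be sketched in a few lines of NumPy. This is a simplified in-memory illustration, not the Kotlin <code>LinUCBUpdater</code>: the <code>ArmState</code> class is hypothetical, and $\lambda = 1$ is an assumed value.</p>

```python
import numpy as np


class ArmState:
    """Per-product LinUCB state: matrix A and vector b."""

    def __init__(self, d, lam=1.0):
        self.A = lam * np.eye(d)    # A_0 = lambda * I (lam = 1.0 is an assumption)
        self.b = np.zeros(d)

    def update(self, x, r):
        self.A += np.outer(x, x)    # A <- A + x x^T
        self.b += r * x             # b <- b + r x


arm = ArmState(d=2)
arm.update(np.array([1.0, 0.0]), r=1.0)   # a click
arm.update(np.array([1.0, 1.0]), r=0.0)   # no click: A grows, b is unchanged
```

<p>After these two events $A = \begin{pmatrix} 3 &amp; 1 \\ 1 &amp; 2 \end{pmatrix}$ and $b = (1, 0)$: both contexts add to the exposure in $A$, but only the clicked one contributes to $b$.</p>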
<p>The updated $A$ and $b$ are stored immediately in <strong>Flink keyed state (backed by RocksDB)</strong>. With checkpointing enabled, this state is durably persisted and recovered in case of failure.</p>
<p><strong>Optimization 1: Inversion on Write</strong></p>
<p>To generate a score, we need the inverse matrix $A^{-1}$, and computing it is expensive: inverting a $d \times d$ matrix costs $O(d^3)$. If we performed this inversion inside the Python client for every recommendation request, latency would increase significantly. Instead, the Flink training job periodically loads $A$ from state, factorizes it using <strong>LU decomposition</strong>, computes $A^{-1}$, and stores the inverse in Redis. Because the contextual feature dimension in this demo is small, recomputing the inverse periodically remains efficient while keeping the serving layer lightweight.</p>
<p><strong>Optimization 2: Batched Updates</strong></p>
<p>In a high-traffic environment, a popular product might receive thousands of clicks per second. Inverting the matrix and writing to Redis for <em>every single click</em> would be inefficient.</p>
<p>To solve this, we use Flink timers to buffer updates. The model state ($A$ and $b$) is updated immediately for every event, while the expensive inversion and Redis write are triggered periodically (e.g., every 5 seconds). This drastically reduces CPU load and network traffic while keeping the model fresh.</p>
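<p>The interplay between the two paths can be sketched in plain Python. This is a simplified single-process simulation, not Flink code: timestamps are passed in by hand instead of coming from Flink timers, a dictionary stands in for Redis, and the <code>BufferedArm</code> class, the key name, and $\lambda = 1$ are assumptions.</p>

```python
import numpy as np


class BufferedArm:
    """Cheap update on every event; costly inversion and publish on a timer."""

    def __init__(self, d, flush_interval=5.0, lam=1.0):
        self.A = lam * np.eye(d)      # A_0 = lambda * I (lam is an assumed value)
        self.b = np.zeros(d)
        self.flush_interval = flush_interval
        self.next_flush = None        # time of the pending timer, None when idle
        self.inversions = 0           # counts how often the costly path ran

    def on_event(self, x, r, now):
        # Fast path: runs for every feedback event.
        self.A += np.outer(x, x)
        self.b += r * x
        if self.next_flush is None:   # register one timer per batch
            self.next_flush = now + self.flush_interval

    def on_timer(self, now, store, key):
        # Slow path: invert and publish only when the timer is due.
        due = self.next_flush is not None and now >= self.next_flush
        if not due:
            return False
        store[key] = {"A_inv": np.linalg.inv(self.A), "b": self.b.copy()}
        self.inversions += 1
        self.next_flush = None
        return True


store = {}                            # plain dict standing in for Redis
arm = BufferedArm(d=2)
x = np.array([1.0, 0.5])
for i in range(1000):                 # 1000 clicks within one second
    arm.on_event(x, r=1.0, now=i / 1000)

early = arm.on_timer(now=4.0, store=store, key="product-1")   # not due yet
fired = arm.on_timer(now=5.0, store=store, key="product-1")   # due: one inversion
```

<p>In this simulation, 1000 events trigger a single inversion and a single write, while $A$ and $b$ still reflect every event.</p>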

<h3 id="scalable-inference-logic" data-numberify>Scalable Inference Logic<a class="anchor ms-1" href="#scalable-inference-logic"></a></h3>
<p>The Python client (<a href="https://github.com/jaehyeon-kim/streaming-demos/blob/main/product-recommender/recsys-engine/eda_recommender.py" target="_blank" rel="noopener noreferrer"><code>eda_recommender.py</code><i class="fas fa-external-link-square-alt ms-1"></i></a>) is responsible for ranking items. It uses the <strong>Upper Confidence Bound (UCB)</strong> formula to balance exploiting known good items and exploring uncertain ones.</p>
<p>For a given user context vector $x$ and product $a$, the score is calculated as:</p>
<p>$$ \text{Score}_a = x^T \theta_a + \alpha \sqrt{x^T A_a^{-1} x}, \quad \text{where } \theta_a = A_a^{-1} b_a $$</p>
<p><strong>Prediction ($x^T \theta_a$)</strong><br>
This is a ridge regression prediction, since $A_a$ starts from a regularized identity matrix. It asks: <em>&ldquo;Based on historical data, how likely is this user to click?&rdquo;</em> If the user&rsquo;s context aligns with the features accumulated in vector $b$ (features that previously led to clicks), this term is high.</p>
<p><strong>Exploration ($\alpha \sqrt{x^T A_a^{-1} x}$)</strong></p>
<ul>
<li><strong>Familiar User:</strong> If we have seen this user type many times, the matrix $A$ accumulates repeated contributions of $x x^T$. This increases the magnitude of $A$ in those feature directions. Because the exploration term depends on $x^T A^{-1} x$, a larger $A$ leads to a smaller quadratic form, shrinking the confidence bound. The model therefore relies more on exploitation.</li>
<li><strong>Cold Start:</strong> If we have rarely (or never) observed this feature pattern, $A$ remains close to its initial regularized identity matrix. After inversion, these directions yield larger values of $x^T A^{-1} x$, increasing the confidence bound and encouraging exploration to reduce uncertainty.</li>
<li>❗ $\alpha$ is a hyperparameter that scales the exploration term; it is set to 0.1, as determined in <em>Part 1</em>.</li>
</ul>
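<p>To make the two terms concrete, here is a small Python sketch of the score for a single product. The helper names and the toy two-dimensional values are assumptions for illustration; only the formula and $\alpha = 0.1$ come from the text.</p>

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))


def ucb_score(A_inv, b, x, alpha=0.1):
    """Return (prediction, exploration bonus, total score) for one product."""
    theta = matvec(A_inv, b)  # theta = A^{-1} b
    prediction = dot(x, theta)  # x^T theta
    bonus = alpha * math.sqrt(dot(x, matvec(A_inv, x)))  # alpha * sqrt(x^T A^{-1} x)
    return prediction, bonus, prediction + bonus


x = [1.0, 0.0]

# Cold start: A is still the identity, so A^{-1} is the identity as well.
cold = ucb_score([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], x)

# Familiar context: 99 observations of x with 30 clicks, so A = diag(100, 1).
warm = ucb_score([[0.01, 0.0], [0.0, 1.0]], [30.0, 0.0], x)
```

<p>In the cold-start case the prediction is 0 and the whole score is the exploration bonus of 0.1. After 99 observations the bonus shrinks to roughly 0.01, and the score is dominated by the learned prediction of about 0.3.</p>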

<h3 id="hybrid-source-for-warm-start" data-numberify>Hybrid Source for Warm Start<a class="anchor ms-1" href="#hybrid-source-for-warm-start"></a></h3>
<p>Contextual bandits suffer from the &ldquo;Cold Start&rdquo; problem. To mitigate this, we implement a <strong>Hybrid Source</strong>.</p>
<ol>
<li><strong>File Source:</strong> Reads the historical CSV (<code>training_log.csv</code>) generated in <em>Part 1</em> to bootstrap the model state.</li>
<li><strong>Kafka Source:</strong> Automatically switches to the live <code>feedback-events</code> topic once the historical data is processed.</li>
</ol>
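<p>The ordering guarantee can be pictured with plain Python iterators. This is only an analogy and does not use Flink&rsquo;s API: <code>bounded_history</code> and <code>live_feed</code> are hypothetical stand-ins for the file and Kafka sources, and the event fields are invented for the example.</p>

```python
from itertools import chain


def bounded_history(rows):
    """Stand-in for the file source: a finite list of historical events."""
    for row in rows:
        yield {"origin": "file", **row}


def live_feed(events):
    """Stand-in for the Kafka source: in practice this would be unbounded."""
    for event in events:
        yield {"origin": "kafka", **event}


history = [{"product_id": 8, "reward": 1}, {"product_id": 42, "reward": 0}]
live = [{"product_id": 61, "reward": 1}]

# The live feed is only consumed after the history is exhausted.
stream = list(chain(bounded_history(history), live_feed(live)))
origins = [event["origin"] for event in stream]
```

<p>Because the downstream operator sees one continuous stream, the same update logic handles both the warm-up and the live phase.</p>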

<h3 id="custom-redis-sink" data-numberify>Custom Redis Sink<a class="anchor ms-1" href="#custom-redis-sink"></a></h3>
<p>We implement a custom Sink using the Sink V2 API and Jedis. This allows us to perform efficient <code>SET</code> operations to update the model parameters in Redis directly from the Flink stream. Because each update overwrites a product&rsquo;s full set of parameters, the writes are idempotent and repeated writes remain safe under at-least-once delivery semantics. In addition, because the upstream <code>LinUCBUpdater</code> batches its emissions, this sink receives aggregated model updates, which keeps Redis from being overwhelmed by write operations.</p>
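<p>Why overwriting is safe under duplicate delivery can be shown with a small Python sketch. <code>FakeRedis</code> is an in-memory stand-in rather than a real client, and the key format and JSON encoding are assumptions made for the example, not the format used by the sink.</p>

```python
import json


class FakeRedis:
    """In-memory stand-in for a Redis client that only supports SET and GET."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def write_model(client, product_id, A_inv, b):
    """Overwrite the whole parameter payload for one product under a single key."""
    payload = json.dumps({"A_inv": A_inv, "b": b})
    client.set(f"linucb:{product_id}", payload)  # hypothetical key format


client = FakeRedis()
update = (8, [[0.5, 0.0], [0.0, 1.0]], [1.0, 0.0])

write_model(client, *update)
once = dict(client.data)

write_model(client, *update)  # duplicate delivery, e.g. after a restart
twice = dict(client.data)
```

<p>The store is identical after one or two deliveries of the same update. An incremental write (for example, adding a delta to a stored value) would not have this property.</p>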

<h2 id="recommender-simulation-design" data-numberify>Recommender Simulation Design<a class="anchor ms-1" href="#recommender-simulation-design"></a></h2>
<p>To validate the architecture without live user traffic, we designed a Python client (<code>eda_recommender.py</code>) that simulates the entire lifecycle of a recommendation request. This script plays two roles simultaneously: it acts as the <strong>Recommendation Service</strong> (serving predictions) and the <strong>User</strong> (providing feedback).</p>

<h3 id="serving-logic" data-numberify>Serving Logic<a class="anchor ms-1" href="#serving-logic"></a></h3>
<p>In a production environment, this logic would live in a high-performance API. For this simulation, the Python client:</p>
<ol>
<li><strong>Context Generation:</strong> Creates a synthetic user profile (Age, Gender) and derives key temporal features (e.g., <em>Morning</em>, <em>Weekend</em>) from a simulated timestamp to form the full context.</li>
<li><strong>Model Retrieval:</strong> Fetches the latest LinUCB parameters ($A^{-1}$ and $b$) for all active products directly from Redis.</li>
<li><strong>Scoring and Ranking:</strong> Calculates the UCB score for every product, ranks them in descending order, and returns the <strong>top 5 highest-scoring items</strong> as the recommendation set.</li>
</ol>
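<p>The retrieval and ranking steps might look like the following Python sketch. The <code>store</code> dict is a hypothetical stand-in for the parameters fetched from Redis, and the seven toy products are invented so that the ordering is easy to follow.</p>

```python
import heapq
import math


def score(params, x, alpha=0.1):
    """UCB score computed from the precomputed A^{-1} and b of one product."""
    A_inv, b = params["A_inv"], params["b"]
    A_inv_x = [sum(m * v for m, v in zip(row, x)) for row in A_inv]
    theta = [sum(m * v for m, v in zip(row, b)) for row in A_inv]
    prediction = sum(xi * ti for xi, ti in zip(x, theta))
    return prediction + alpha * math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))


def recommend(store, x, k=5):
    """Return the k product ids with the highest UCB score, best first."""
    return heapq.nlargest(k, store, key=lambda pid: score(store[pid], x))


identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical store: seven products whose b vectors differ only in the first feature.
store = {pid: {"A_inv": identity, "b": [pid / 10, 0.0]} for pid in range(1, 8)}

top5 = recommend(store, x=[1.0, 0.0])
```

<p>Because the inverse is already computed, each product costs only a few vector products at request time. <code>heapq.nlargest</code> avoids sorting the full catalogue when only five items are needed.</p>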

<h3 id="feedback-generation" data-numberify>Feedback Generation<a class="anchor ms-1" href="#feedback-generation"></a></h3>
<p>To prove the model is learning, the simulation follows the same <strong>&ldquo;Ground Truth&rdquo;</strong> logic used in <em>Part 1</em>:</p>
<ul>
<li><strong>Morning Routine:</strong> Users click &ldquo;Drinks &amp; Desserts&rdquo; (Coffee) between 6 AM and 11 AM.</li>
<li><strong>Weekend Treats:</strong> On Saturdays and Sundays, users prefer &ldquo;Pizzas&rdquo; or &ldquo;Burgers.&rdquo;</li>
<li><strong>Price Sensitivity:</strong> Users under 25 avoid expensive items.</li>
</ul>
<p>If any of the recommended top 5 items matches the user&rsquo;s current context (e.g., showing a Latte on a Tuesday morning), the script generates a Reward (1). Otherwise, it generates no reward (0). This feedback is serialized to Avro and produced to Kafka, completing the loop.</p>
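<p>A simplified Python sketch of this reward rule is shown below. The field names, the shortened category labels, and the price threshold of 30.0 are assumptions for illustration; the client may encode the rules differently.</p>

```python
def matches_context(item, ctx):
    """Return True when an item fits the simplified ground-truth rules."""
    # Price sensitivity: users under 25 avoid expensive items (threshold is hypothetical).
    if 25 > ctx["age"] and item["price"] > 30.0:
        return False
    # Morning routine: coffee between 6 AM and 11 AM.
    morning_coffee = ctx["hour"] in range(6, 11) and item["is_coffee"]
    # Weekend treats: Saturday is 5 and Sunday is 6 when Monday is 0.
    weekend_treat = ctx["weekday"] >= 5 and item["category"] in ("Pizzas", "Burgers")
    return morning_coffee or weekend_treat


def reward(top5, ctx):
    """1 if any recommended item matches the context, otherwise 0."""
    return int(any(matches_context(item, ctx) for item in top5))


latte = {"name": "Latte", "category": "Drinks", "is_coffee": True, "price": 4.5}
pizza = {"name": "The Aussie Pizza", "category": "Pizzas", "is_coffee": False, "price": 23.99}

tuesday_morning = {"age": 38, "hour": 8, "weekday": 1}
tuesday_afternoon = {"age": 38, "hour": 15, "weekday": 1}
saturday_afternoon = {"age": 38, "hour": 15, "weekday": 5}

r_morning = reward([latte, pizza], tuesday_morning)
r_afternoon = reward([latte, pizza], tuesday_afternoon)
r_weekend = reward([latte, pizza], saturday_afternoon)
```

<p>Here the Tuesday morning and Saturday afternoon requests earn a reward of 1 (latte and pizza respectively), while the Tuesday afternoon request earns 0 because neither rule applies.</p>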

<h2 id="environment-setup" data-numberify>Environment Setup<a class="anchor ms-1" href="#environment-setup"></a></h2>
<p>We use Docker Compose to orchestrate the infrastructure (Kafka, Flink, Redis) and Gradle to build the Kotlin application.</p>

<h3 id="prerequisites" data-numberify>Prerequisites<a class="anchor ms-1" href="#prerequisites"></a></h3>
<p>Clone the repository and infrastructure utilities, then download the required connectors (Kafka, Flink, Avro).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">git clone https://github.com/jaehyeon-kim/streaming-demos.git
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nb">cd</span> streaming-demos
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Clone Factor House Local for infrastructure definitions</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Download Kafka/Flink Dependencies</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">./factorhouse-local/resources/setup-env.sh
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nb">cd</span> product-recommender
</span></span></code></pre></div>
<h3 id="build-and-launch" data-numberify>Build and Launch<a class="anchor ms-1" href="#build-and-launch"></a></h3>
<p>We bootstrap the environment by generating training data, building the Flink JAR, and launching the cluster. We use Kpow and Flex to monitor the Kafka and Flink clusters; these tools require a Factor House community license. Visit the <a href="https://account.factorhouse.io/auth/getting_started" target="_blank" rel="noopener noreferrer">Factor House License Portal<i class="fas fa-external-link-square-alt ms-1"></i></a> to generate your license, save the details in a file (e.g., <code>license.env</code>), and export the associated environment variables (<code>KPOW_LICENSE</code> and <code>FLEX_LICENSE</code>).</p>
<p>With the license configured, launch the Docker Compose services as shown below.</p>
<p><em>❗ You do not need Kotlin or Gradle installed locally. The <code>./gradlew</code> script handles all build dependencies.</em></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Setup Python and Generate Bootstrap Data</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">python3 -m venv venv
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">pip install -r requirements.txt
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">python recsys-engine/prepare_data.py
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Build Flink Application (Shadow Jar)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="nb">cd</span> recsys-trainer
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">./gradlew shadowJar
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nb">cd</span> ..
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># Launch Infrastructure (Kafka, Flink, Redis, Kpow)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="nb">export</span> <span class="nv">KPOW_SUFFIX</span><span class="o">=</span><span class="s2">&#34;-ce&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="nb">export</span> <span class="nv">FLEX_SUFFIX</span><span class="o">=</span><span class="s2">&#34;-ce&#34;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="nb">export</span> <span class="nv">KPOW_LICENSE</span><span class="o">=</span>&lt;path-to-license-file&gt;
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="nb">export</span> <span class="nv">FLEX_LICENSE</span><span class="o">=</span>&lt;path-to-license-file&gt;
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl">docker compose -p kpow -f ../factorhouse-local/compose-kpow.yml up -d <span class="se">\
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="se"></span>  <span class="o">&amp;&amp;</span> docker compose -p stripped -f ./compose-stripped.yml up -d <span class="se">\
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="se"></span>  <span class="o">&amp;&amp;</span> docker compose -p flex -f ./compose-flex.yml up -d
</span></span></code></pre></div>
<h2 id="live-recommender-simulation" data-numberify>Live Recommender Simulation<a class="anchor ms-1" href="#live-recommender-simulation"></a></h2>
<p>Once the infrastructure is running, Flink first processes the historical events to warm up the model. When that processing is complete, run the Python client to simulate live traffic.</p>
<p>To visualize the system in action, open two terminals.</p>
<p><strong>Terminal 1: Client</strong></p>
<p>Run the Python script. It acts as the user, receiving recommendations and sending feedback (clicks) to Kafka.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python recsys-engine/eda_recommender.py
</span></span></code></pre></div><p><strong>Terminal 2: Trainer</strong></p>
<p>Watch the Flink TaskManager logs. You will see the application reacting to the events sent to the <code>feedback-events</code> topic in real time.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker logs taskmanager -f
</span></span></code></pre></div><p><strong>Result</strong></p>
<p>In the recording below, the left-hand terminal shows the feedback events generated by simulated users, and the right-hand terminal shows the logs confirming that model parameters are being sent to Redis in batches.</p>
<p>This confirms the closed loop: <strong>Read (Redis) -&gt; Act (Kafka) -&gt; Learn (Flink) -&gt; Write (Redis)</strong>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Live Simulation Result" src="/blog/2026-02-23-productionize-recommender-with-eda/recommender-output.gif" loading="lazy" width="1665" height="1476" />
</picture>

</p>
<p>You can inspect feedback events on Kpow at <a href="http://localhost:3000" target="_blank" rel="noopener noreferrer">http://localhost:3000<i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Feedback Events" src="/blog/2026-02-23-productionize-recommender-with-eda/feedback-events.png" loading="lazy" width="981" height="1123" />
</picture>

</p>

<h2 id="teardown" data-numberify>Teardown<a class="anchor ms-1" href="#teardown"></a></h2>
<p>To stop the cluster and remove resources:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose -p flex -f ./compose-flex.yml down <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  <span class="o">&amp;&amp;</span> docker compose -p stripped -f ./compose-stripped.yml down <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  <span class="o">&amp;&amp;</span> docker compose -p kpow -f ../factorhouse-local/compose-kpow.yml down
</span></span></code></pre></div>
<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>Traditional recommendation systems such as Collaborative Filtering rely on long-term interaction history and often treat user preferences as static. As a result, they struggle to incorporate <strong>immediate context</strong>, missing situational shifts like a user preferring <strong>coffee in the morning but pizza in the evening</strong>.</p>
<p>To overcome this, we use <strong>Contextual Multi-Armed Bandits (CMAB)</strong>, an online learning approach that balances <strong>exploitation</strong> and <strong>exploration</strong> using real-time contextual signals. While our Python prototype in <em>Part 1</em> validated the concept, it was not built for scale.</p>
<p>We then evolved it into a production-ready <strong>event-driven architecture</strong>: <strong>Kafka</strong> streams feedback events, <strong>Flink</strong> handles distributed stateful training, and <strong>Redis</strong> serves precomputed parameters for low-latency inference. This design enables horizontal scalability and real-time adaptation to user behavior.</p>
      ]]></content:encoded></item><item><title>Prototyping an Online Product Recommender in Python</title><link>https://jaehyeon.me/blog/2026-01-29-prototype-recommender-with-python/</link><guid>https://jaehyeon.me/blog/2026-01-29-prototype-recommender-with-python/</guid><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><description>
Overview Traditional recommendation approaches such as Collaborative Filtering remain widely adopted, yet they come with notable constraints. They are particularly vulnerable to the cold-start problem, where new users lack sufficient interaction history, and they depend heavily on long-term behavioral data. As a result, they frequently overlook real-time contextual signals, including time of day, device type, location, or session intent. This can prevent them from capturing situational preferences, such as someone preferring coffee in the morning but pizza in the evening.</description><content:encoded><![CDATA[
        
<h2 id="overview" data-numberify>Overview<a class="anchor ms-1" href="#overview"></a></h2>
<p>Traditional recommendation approaches such as <a href="https://en.wikipedia.org/wiki/Collaborative_filtering" target="_blank" rel="noopener noreferrer">Collaborative Filtering<i class="fas fa-external-link-square-alt ms-1"></i></a> remain widely adopted, yet they come with notable constraints. They are particularly vulnerable to the <strong>cold-start problem</strong>, where new users lack sufficient interaction history, and they depend heavily on long-term behavioral data. As a result, they frequently overlook <strong>real-time contextual signals</strong>, including time of day, device type, location, or session intent. This can prevent them from capturing situational preferences, such as someone preferring <strong>coffee in the morning but pizza in the evening</strong>.</p>
<p><a href="https://en.wikipedia.org/wiki/Multi-armed_bandit#Contextual_bandit" target="_blank" rel="noopener noreferrer"><strong>Contextual Multi-Armed Bandits (CMAB)</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> address these gaps through <strong>online learning</strong>.</p>
<p>As a practical form of <a href="https://en.wikipedia.org/wiki/Reinforcement_learning" target="_blank" rel="noopener noreferrer">reinforcement learning<i class="fas fa-external-link-square-alt ms-1"></i></a>, CMAB balances two goals in real time:</p>
<ol>
<li><strong>Exploitation:</strong> Recommending what is known to work.</li>
<li><strong>Exploration:</strong> Trying less-tested options to discover new favorites.</li>
</ol>
<p>By conditioning decisions on live context, CMAB adapts quickly to changing user behavior.</p>

<h3 id="why-cmab" data-numberify>Why CMAB?<a class="anchor ms-1" href="#why-cmab"></a></h3>
<ul>
<li><strong>Beyond A/B Testing:</strong> Instead of finding a single global winner, CMAB enables <strong>1:1 personalization</strong>, selecting the best option for this user in this context.</li>
<li><strong>Real-Time Adaptation:</strong> Unlike batch-trained models that quickly become stale, CMAB updates incrementally, making it well suited to news or product recommendation, dynamic pricing, or inventory-aware ranking.</li>
</ul>
<p>Several CMAB implementations exist, including <a href="https://vowpalwabbit.org/" target="_blank" rel="noopener noreferrer"><strong>Vowpal Wabbit</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://riverml.xyz/latest/" target="_blank" rel="noopener noreferrer"><strong>River ML</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>. In this post, we use <a href="https://github.com/fidelity/mab2rec" target="_blank" rel="noopener noreferrer"><strong>Mab2Rec</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> for offline policy evaluation and <a href="https://github.com/fidelity/mabwiser" target="_blank" rel="noopener noreferrer"><strong>MABWiser</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> to build the product recommender prototype.</p>

<h3 id="data-streaming-opportunity" data-numberify>Data Streaming Opportunity<a class="anchor ms-1" href="#data-streaming-opportunity"></a></h3>
<p>CMAB performs well in <strong>data streaming environments</strong>. Integrated with platforms like <strong>Kafka</strong> and <strong>Flink</strong>, it learns directly from event streams, creating a feedback loop that responds to trends and shifts in user intent with sub-second latency.</p>
<p>In this series, <strong>Part 1</strong> (<em>this post</em>) builds a complete <strong>Python prototype</strong> to validate the algorithm and simulate user behavior. <a href="/blog/2026-02-23-productionize-recommender-with-eda/"><strong>Part 2</strong></a> will scale this to a distributed, event-driven architecture.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="Architecture" src="/blog/2026-01-29-prototype-recommender-with-python/featured.gif" loading="lazy" width="1774" height="1080" />
</picture>

</p>

<h2 id="tech-stack" data-numberify>Tech Stack<a class="anchor ms-1" href="#tech-stack"></a></h2>
<p>We are building this prototype using <strong>Python 3.11</strong>.</p>
<blockquote>
<p><strong>Engineering Note:</strong> We explicitly chose Python 3.11 because parts of our stack (specifically <code>mabwiser</code> dependencies) rely on older versions of <code>pandas</code> (&lt; 2.0). On Python 3.12+, installing these dependencies often triggers long compilation times or failures due to missing binary wheels.</p>
</blockquote>
<p>We use <a href="https://docs.astral.sh/uv/" target="_blank" rel="noopener noreferrer"><strong>uv</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> for Python environment management. The core libraries include:</p>
<ul>
<li><a href="https://github.com/fidelity/mabwiser" target="_blank" rel="noopener noreferrer"><strong>MABWiser:</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> The engine. It implements the core Contextual Bandit algorithms.</li>
<li><a href="https://github.com/fidelity/mab2rec" target="_blank" rel="noopener noreferrer"><strong>Mab2Rec:</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> The vehicle. A high-level wrapper that streamlines Recommender System pipelines.</li>
<li><a href="https://github.com/fidelity/textwiser" target="_blank" rel="noopener noreferrer"><strong>TextWiser:</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> For converting raw text features into numerical embeddings.</li>
<li><strong>scikit-learn:</strong> For feature scaling and encoding.</li>
<li><strong>Faker &amp; Pandas:</strong> For synthetic data generation and simulation.</li>
</ul>
<p>The development environment can be constructed as follows:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">$ git clone https://github.com/jaehyeon-kim/streaming-demos.git
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">$ <span class="nb">cd</span> streaming-demos
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">$ uv python install 3.11
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">$ uv venv --python 3.11 venv
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">$ <span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="o">(</span>venv<span class="o">)</span> $ uv pip install -r product-recommender/requirements.txt
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="o">(</span>venv<span class="o">)</span> $ uv pip list <span class="p">|</span> grep -E <span class="s2">&#34;mab|wiser|panda|numpy|scikit|faker&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># Using Python 3.11.14 environment at: venv</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># faker                              40.1.2</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># mab2rec                            1.3.1</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># mabwiser                           2.7.4</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># numpy                              1.26.4</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># pandas                             1.5.3</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="c1"># scikit-learn                       1.8.0</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="c1"># textwiser                          2.0.2</span>
</span></span></code></pre></div><blockquote>
<p><strong>📂 Source Code for the Post</strong></p>
<p>The source code for this post is available in the <strong>product-recommender</strong> folder of the <a href="https://github.com/jaehyeon-kim/streaming-demos" target="_blank" rel="noopener noreferrer">streaming-demos<i class="fas fa-external-link-square-alt ms-1"></i></a> GitHub repository.</p>
</blockquote>

<h2 id="data-generation" data-numberify>Data Generation<a class="anchor ms-1" href="#data-generation"></a></h2>
<p>We first need product and user data to generate the required features.</p>

<h3 id="products" data-numberify>Products<a class="anchor ms-1" href="#products"></a></h3>
<p>We utilize a set of <strong>200 raw products</strong>, each containing a product ID, name, text description, price, and high-level category.</p>
<p>Here is a list of sample products:</p>
<table>
<thead>
<tr>
<th>product_id</th>
<th>name</th>
<th>description</th>
<th>price</th>
<th>category</th>
</tr>
</thead>
<tbody>
<tr>
<td>8</td>
<td>The Aussie Burger</td>
<td>A true classic with beetroot, a fried egg, pineapple, bacon, cheese, lettuce, and tomato.</td>
<td>16.99</td>
<td>Burgers &amp; Sandwiches</td>
</tr>
<tr>
<td>42</td>
<td>The Aussie Pizza</td>
<td>Tomato base topped with ham, bacon, onions, and a cracked egg in the center.</td>
<td>23.99</td>
<td>Pizzas</td>
</tr>
<tr>
<td>61</td>
<td>Chicken Parma</td>
<td>Classic crumbed chicken breast topped with napoli, ham, and cheese. Served with chips &amp; salad.</td>
<td>24.99</td>
<td>Aussie Pub Classics</td>
</tr>
<tr>
<td>101</td>
<td>Fish Tacos (Baja Style)</td>
<td>Three tortillas with battered fish, cabbage, and creamy sauce.</td>
<td>12.95</td>
<td>Mexican Specialties</td>
</tr>
</tbody>
</table>

<h3 id="users" data-numberify>Users<a class="anchor ms-1" href="#users"></a></h3>
<p>We generate <strong>1,000 synthetic users</strong> using <code>Faker</code>. Each user is assigned static attributes such as Age, Gender, Location, and Traffic Source. These attributes serve as the &ldquo;Context&rdquo; for our Bandit.</p>
<p>Here is a sample of our user base:</p>
<p><em>Note that street address, postal code, city, state, and country are omitted, as only latitude and longitude are used for feature generation.</em></p>
<table>
<thead>
<tr>
<th>user_id</th>
<th>first_name</th>
<th>last_name</th>
<th>email</th>
<th>&hellip;</th>
<th>age</th>
<th>gender</th>
<th>latitude</th>
<th>longitude</th>
<th>traffic_source</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Stephen</td>
<td>Parker</td>
<td><a href="mailto:stephen.parker@example.net">stephen.parker@example.net</a></td>
<td>&hellip;</td>
<td>38</td>
<td>M</td>
<td>-37.78525508</td>
<td>144.94969</td>
<td>Search</td>
</tr>
<tr>
<td>2</td>
<td>Brianna</td>
<td>Williams</td>
<td><a href="mailto:brianna.williams@example.net">brianna.williams@example.net</a></td>
<td>&hellip;</td>
<td>60</td>
<td>F</td>
<td>-37.82290733</td>
<td>145.0040437</td>
<td>Search</td>
</tr>
<tr>
<td>3</td>
<td>Carlos</td>
<td>Hunt</td>
<td><a href="mailto:carlos.hunt@example.com">carlos.hunt@example.com</a></td>
<td>&hellip;</td>
<td>46</td>
<td>M</td>
<td>-37.74295704</td>
<td>144.8004261</td>
<td>Search</td>
</tr>
<tr>
<td>4</td>
<td>Charles</td>
<td>Martin</td>
<td><a href="mailto:charles.martin@example.com">charles.martin@example.com</a></td>
<td>&hellip;</td>
<td>41</td>
<td>M</td>
<td>-37.80480003</td>
<td>145.1229819</td>
<td>Organic</td>
</tr>
</tbody>
</table>

<h2 id="feature-engineering" data-numberify>Feature Engineering<a class="anchor ms-1" href="#feature-engineering"></a></h2>
<p>Bandit algorithms operate on numerical vectors, not raw text. In other words, they cannot interpret <code>&quot;Burger&quot;</code> unless it is converted into numbers. To address this, we developed a transformation pipeline to properly prepare our data:</p>
<ol>
<li><strong>Product Features:</strong> We use <code>TextWiser</code> to convert raw product descriptions into vector embeddings. This lets the model treat &ldquo;Burger&rdquo; and &ldquo;Sandwich&rdquo; as semantically closer than &ldquo;Burger&rdquo; and &ldquo;Headphones&rdquo;. We also apply One-Hot Encoding to categories (<em>Product Category</em>) and MinMax scaling to the <em>price</em>. Finally, we add a binary feature, <code>is_coffee</code>, which is set to 1 for coffee products (e.g., espresso, cappuccino) and 0 otherwise.</li>
<li><strong>User Features:</strong> As with the product features, we apply One-Hot Encoding to categories (<em>Gender</em> and <em>Traffic Source</em>) and MinMax scaling to numerical fields (<em>Age</em>, <em>Latitude</em>, and <em>Longitude</em>).</li>
<li><strong>Pipeline Artifacts:</strong> We save these transformers as <code>preprocessing_artifacts.pkl</code>. This allows our system to instantly transform any new user/product record into a compatible feature vector during inference.</li>
</ol>
<p><strong>Sample Processed Product Features:</strong></p>
<p><em>Notice how the description is now represented by <code>txt_0</code>&hellip;<code>txt_9</code> embeddings.</em></p>
<table>
<thead>
<tr>
<th>product_id</th>
<th>txt_0</th>
<th>txt_1</th>
<th>txt_2</th>
<th>txt_3</th>
<th>txt_4</th>
<th>txt_5</th>
<th>txt_6</th>
<th>txt_7</th>
<th>txt_8</th>
<th>txt_9</th>
<th>cat_Appetizers &amp; Sides</th>
<th>cat_Aussie Pub Classics</th>
<th>cat_Burgers &amp; Sandwiches</th>
<th>cat_Drinks &amp; Desserts</th>
<th>cat_Mexican Specialties</th>
<th>cat_Pasta &amp; Risotto</th>
<th>cat_Pizzas</th>
<th>cat_Salads &amp; Healthy Options</th>
<th>is_coffee</th>
<th>price</th>
</tr>
</thead>
<tbody>
<tr>
<td>8</td>
<td>0.3354452</td>
<td>0.36037982</td>
<td>-0.04443971</td>
<td>0.14370468</td>
<td>-0.19956689</td>
<td>-0.17493485</td>
<td>-0.18741444</td>
<td>-0.02776922</td>
<td>-0.07173516</td>
<td>-0.11751403</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0.3887</td>
</tr>
<tr>
<td>42</td>
<td>0.3015529</td>
<td>0.28032377</td>
<td>0.03035132</td>
<td>0.21287075</td>
<td>0.04236558</td>
<td>-0.054545</td>
<td>-0.10349114</td>
<td>-0.13550489</td>
<td>-0.04504355</td>
<td>-0.22817583</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0.5832</td>
</tr>
<tr>
<td>61</td>
<td>0.53950787</td>
<td>-0.020039</td>
<td>-0.36858445</td>
<td>-0.10636957</td>
<td>0.00259933</td>
<td>0.15990224</td>
<td>0.04153050</td>
<td>0.11348728</td>
<td>-0.02482079</td>
<td>-0.23463035</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0.6110</td>
</tr>
<tr>
<td>101</td>
<td>0.20630628</td>
<td>-0.04121789</td>
<td>0.11134595</td>
<td>-0.2160106</td>
<td>0.00511632</td>
<td>-0.20131038</td>
<td>0.05482014</td>
<td>-0.19734132</td>
<td>0.35356910</td>
<td>0.23985470</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0.2765</td>
</tr>
</tbody>
</table>
<p><strong>Sample Processed User Features:</strong></p>
<p><em>Notice that Age, Latitude, and Longitude are scaled to the range 0 to 1, and categorical fields are one-hot encoded as binary columns.</em></p>
<table>
<thead>
<tr>
<th>user_id</th>
<th>age</th>
<th>latitude</th>
<th>longitude</th>
<th>gender_F</th>
<th>gender_M</th>
<th>traffic_source_Display</th>
<th>traffic_source_Email</th>
<th>traffic_source_Facebook</th>
<th>traffic_source_Organic</th>
<th>traffic_source_Search</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0.4074074</td>
<td>0.82048548</td>
<td>0.32804966</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>0.8148148</td>
<td>0.76928412</td>
<td>0.41833646</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>0.5555556</td>
<td>0.87800441</td>
<td>0.08010776</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>0.4629630</td>
<td>0.79390730</td>
<td>0.61590441</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
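<p>The age and categorical columns in the user table above can be reproduced with a few lines of Python. This is a sketch rather than the pipeline itself: the age range of 16 to 70 is an assumption inferred from the sample rows, the helper names are hypothetical, and latitude and longitude are left out because their fitted ranges are not shown.</p>

```python
def min_max(value, lo, hi):
    """Scale value into [0, 1] given the fitted minimum and maximum."""
    return (value - lo) / (hi - lo)


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    return [1 if value == c else 0 for c in categories]


AGE_RANGE = (16, 70)  # assumption: inferred from the sample rows, not stated in the post
GENDERS = ["F", "M"]
SOURCES = ["Display", "Email", "Facebook", "Organic", "Search"]


def user_vector(user):
    """Build [age, gender_F, gender_M, traffic_source_*] for one user."""
    return (
        [min_max(user["age"], *AGE_RANGE)]
        + one_hot(user["gender"], GENDERS)
        + one_hot(user["traffic_source"], SOURCES)
    )


vec = user_vector({"age": 38, "gender": "M", "traffic_source": "Search"})
```

<p>For user 1 (age 38, male, Search) the scaled age rounds to 0.4074074, and the one-hot columns line up with the <code>gender_M</code> and <code>traffic_source_Search</code> flags in the table.</p>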

<h2 id="bandit-history-simulation" data-numberify>Bandit History Simulation<a class="anchor ms-1" href="#bandit-history-simulation"></a></h2>
<p>To evaluate whether our model can <strong>truly learn user behavior</strong>, we need a controlled <strong>Ground Truth</strong>, which is an <em>Oracle</em> that determines the likelihood of a simulated user clicking on a recommendation.</p>
<p>Crucially, <strong>this Oracle is hidden from the model</strong>. The model&rsquo;s task is to infer these patterns purely from trial and error.</p>
<p>We also inject <strong>Dynamic Context</strong> features like <strong>Time of Day</strong> and <strong>Day of Week</strong> into the user profile at the moment of interaction. These temporal signals create realistic, fluctuating patterns that the model must adapt to.</p>
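<p>A minimal sketch of how such temporal flags could be derived from a timestamp and merged into a user profile follows. The function name <code>dynamic_context</code> and the morning window (06:00 to 10:59) are assumptions made for illustration; the post does not state the exact hours it uses.</p>

```python
from datetime import datetime

# Assumed window for "morning"; the post does not state the exact hours.
MORNING_START_HOUR = 6
MORNING_END_HOUR = 11


def dynamic_context(ts: datetime) -> dict:
    """Derive binary time-of-day and day-of-week flags from a timestamp."""
    is_weekend = int(ts.weekday() >= 5)  # Monday is 0, so 5 and 6 are Sat/Sun
    is_morning = int(ts.hour in range(MORNING_START_HOUR, MORNING_END_HOUR))
    return {
        "is_morning": is_morning,
        "is_weekend": is_weekend,
        "is_weekday": 1 - is_weekend,
    }


user_features = {"age": 0.41, "traffic_source_Search": 1}  # hypothetical user
visit = datetime(2026, 1, 24, 8, 30)  # a Saturday morning
context = {**user_features, **dynamic_context(visit)}
print(context)
```

<p>Because the flags are computed at interaction time, the same user can present a different context vector on every visit.</p>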

<h3 id="simulation-logic" data-numberify>Simulation Logic<a class="anchor ms-1" href="#simulation-logic"></a></h3>
<p>The simulation is implemented in the <code>GroundTruth</code> class, which encodes the following rules governing user behavior:</p>
<ul>
<li>Start from a low base logit (−2.5) to model generally low click probability.</li>
<li><strong>Rule 1: Morning coffee preference:</strong> if the user is browsing in the morning and the item is a <em>coffee</em> product, add a strong positive boost to the score.</li>
<li><strong>Rule 2: Weekend comfort food:</strong> if the session is on a weekend and the item is <em>Pizza</em> or <em>Burgers &amp; Sandwiches</em>, add a moderate positive boost.</li>
<li><strong>Rule 3: Budget sensitivity:</strong> if the user is young (normalized age &lt; 0.25) and the item is expensive (normalized price &gt; 0.8), apply a strong negative penalty.</li>
<li><strong>Rule 4: Traffic source bias:</strong> if the user arrived via Search, add a small intent-based boost.</li>
<li>Convert the final logit score into a click probability using a sigmoid function, then sample a Bernoulli trial to simulate whether a click occurs.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># product-recommender/recsys-engine/src/bandit_simulator.py</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">class</span> <span class="nc">GroundTruth</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s2">    The HIDDEN FORMULA (Ground Truth) for click simulation.
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s2">    Determines user click behavior based on context and item features.
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s2">    &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="nd">@staticmethod</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">def</span> <span class="nf">calculate_probability</span><span class="p">(</span><span class="n">user_ctx</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">item_ctx</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s2">        Computes the probability that a user clicks an item.
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s2">        Uses logistic regression-style scoring with domain-specific rules.
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s2">        &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="n">score</span> <span class="o">=</span> <span class="o">-</span><span class="mf">2.5</span>  <span class="c1"># Base logit: starts with a low probability</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="c1"># Rule 1: Morning Coffee</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="c1"># Users are more likely to click coffee in the morning</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="k">if</span> <span class="n">user_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;is_morning&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">item_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;is_coffee&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">            <span class="n">score</span> <span class="o">+=</span> <span class="mf">2.5</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="c1"># Rule 2: Weekend Comfort Food</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="c1"># Users tend to choose Pizza or Burgers on weekends</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="k">if</span> <span class="n">user_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;is_weekend&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="k">if</span> <span class="n">item_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;cat_Pizzas&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">or</span> <span class="n">item_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;cat_Burgers &amp; Sandwiches&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">                <span class="n">score</span> <span class="o">+=</span> <span class="mf">1.8</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl">        <span class="c1"># Rule 3: Budget Constraint</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="c1"># Young users (normalized age &lt; 0.25) avoid expensive items (normalized price &gt; 0.8)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="n">user_age</span> <span class="o">=</span> <span class="n">user_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;age&#34;</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>  <span class="c1"># normalized age 0-1</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">        <span class="n">item_price</span> <span class="o">=</span> <span class="n">item_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;price&#34;</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>  <span class="c1"># normalized price 0-1</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="k">if</span> <span class="n">user_age</span> <span class="o">&lt;</span> <span class="mf">0.25</span> <span class="ow">and</span> <span class="n">item_price</span> <span class="o">&gt;</span> <span class="mf">0.8</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">            <span class="n">score</span> <span class="o">-=</span> <span class="mf">3.0</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">
</span></span><span class="line"><span class="ln">34</span><span class="cl">        <span class="c1"># Rule 4: Traffic Bias</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">        <span class="c1"># Users arriving via Search have a slightly higher propensity to click</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">        <span class="k">if</span> <span class="n">user_ctx</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;traffic_source_Search&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">            <span class="n">score</span> <span class="o">+=</span> <span class="mf">0.5</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">
</span></span><span class="line"><span class="ln">39</span><span class="cl">        <span class="c1"># Convert logit score to probability using sigmoid function</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">        <span class="k">return</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">score</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">
</span></span><span class="line"><span class="ln">42</span><span class="cl">    <span class="k">def</span> <span class="nf">will_click</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">user_ctx</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">item_ctx</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">fake</span><span class="p">:</span> <span class="n">Faker</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">int</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">        <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="s2">        Simulates a Bernoulli trial (click = 1, no click = 0) based on probability.
</span></span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="s2">        &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">        <span class="n">prob</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">calculate_probability</span><span class="p">(</span><span class="n">user_ctx</span><span class="p">,</span> <span class="n">item_ctx</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="k">return</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">fake</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">prob</span> <span class="k">else</span> <span class="mi">0</span>
</span></span></code></pre></div>
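<p>To get a feel for what these logits mean in practice, the short sketch below converts a few rule combinations into click probabilities. It reuses only the constants shown in <code>GroundTruth</code>; the scenario labels are illustrative.</p>

```python
import math


def sigmoid(logit: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


BASE = -2.5  # base logit from the class above

scenarios = {
    "no rule fires": BASE,
    "morning + coffee": BASE + 2.5,
    "weekend + pizza": BASE + 1.8,
    "young user + expensive item": BASE - 3.0,
    "search traffic only": BASE + 0.5,
    "morning + coffee + search": BASE + 2.5 + 0.5,
}

probabilities = {name: sigmoid(logit) for name, logit in scenarios.items()}
for name, prob in probabilities.items():
    print(f"{name:30s} logit={scenarios[name]:+.1f}  p={prob:.3f}")
```

<p>With no rule firing, the click probability is roughly 7.6%. The morning coffee rule lifts it to 50%, the weekend comfort food rule to about 33%, and the budget penalty pushes it below 1%.</p>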
<h3 id="data-preparation" data-numberify>Data Preparation<a class="anchor ms-1" href="#data-preparation"></a></h3>
<p>We generate 10,000 historical events to serve as our &ldquo;Offline Training&rdquo; dataset. This process involves picking a random user and a random product, then asking the Oracle &ldquo;Did they click?&rdquo;.</p>
<p>Because the user and product are matched randomly (not by a recommender), the <strong>average click-through rate (CTR)</strong> is naturally low. In this run, it is <strong>13.65%</strong>, and this serves as our baseline.</p>
<p>💡 There are three main scripts for this post: <code>prepare_data.py</code> for feature engineering and bandit history simulation, <code>evaluate.py</code> for offline policy evaluation, and <code>local_recommender.py</code> for running product recommendation locally. Each script accepts a <code>--seed</code> argument, which defaults to <em>1237</em>. As long as the seed remains the same, running the scripts will produce identical outputs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="o">(</span>venv<span class="o">)</span> $ python product-recommender/recsys-engine/prepare_data.py
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:09<span class="o">]</span> INFO    : Generating <span class="m">1000</span> synthetic users...
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:09<span class="o">]</span> INFO    : Saved raw users to: .../users.csv
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:09<span class="o">]</span> INFO    : Starting Feature Engineering...
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:09<span class="o">]</span> INFO    : Saved User Features: <span class="o">(</span>1000, 11<span class="o">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Saved Product Features: <span class="o">(</span>200, 21<span class="o">)</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Saved Pipeline Artifacts to: .../preprocessing_artifacts.pkl
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Loaded <span class="m">1000</span> users and <span class="m">200</span> products.
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Generating <span class="m">10000</span> events...
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Done. Saved Training Log to .../training_log.csv
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Avg Click Rate: 13.65%
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="o">[</span>2026-01-26 19:16:10<span class="o">]</span> INFO    : Data Preparation Complete.
</span></span></code></pre></div><p>The main dataset (<code>training_log.csv</code>) combines <em>user features</em>, <em>dynamic context</em> (e.g., <code>is_morning</code>), <em>product ID</em>, and the <em>interaction result</em> (<code>response</code>):</p>
<table>
<thead>
<tr>
<th>event_id</th>
<th>age</th>
<th>&hellip;</th>
<th>traffic_source_Search</th>
<th>is_morning</th>
<th>is_weekend</th>
<th>is_weekday</th>
<th>product_id</th>
<th>response</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0.5925926</td>
<td>&hellip;</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>182</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>0.6111111</td>
<td>&hellip;</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>101</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>0.6296296</td>
<td>&hellip;</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>34</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>0.4814815</td>
<td>&hellip;</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>101</td>
<td>0</td>
</tr>
</tbody>
</table>

<h2 id="offline-policy-evaluation" data-numberify>Offline Policy Evaluation<a class="anchor ms-1" href="#offline-policy-evaluation"></a></h2>
<p>We benchmarked several policies using <code>Mab2Rec</code> on the 10,000 historical events.</p>

<h3 id="the-candidates" data-numberify>The Candidates<a class="anchor ms-1" href="#the-candidates"></a></h3>
<ul>
<li><strong>Random:</strong> The baseline. Recommends items blindly.</li>
<li><strong>Popularity:</strong> Recommends items with the highest <em>global</em> click rate.
<ul>
<li><em>Result:</em> Mediocre (AUC ~0.59). While better than random, it still fails to capture specific rules, such as &ldquo;Morning Coffee&rdquo; vs. &ldquo;Weekend Pizza.&rdquo;</li>
</ul>
</li>
<li><strong>LinGreedy:</strong> Disjoint Linear Regression with $\epsilon$-greedy exploration.</li>
<li><strong>LinUCB (The Winner):</strong> Disjoint Linear Regression with <strong>Upper Confidence Bound</strong>.</li>
<li><strong>LinTS (Thompson Sampling):</strong> Bayesian regression that samples from a probability distribution.</li>
</ul>

<h3 id="winner-linucb" data-numberify>Winner: LinUCB<a class="anchor ms-1" href="#winner-linucb"></a></h3>
<p>While <strong>LinGreedy</strong> achieved the highest ranking accuracy (AUC ~0.89), it suffered from a low click rate (CTR ~11.8%) because it exploited &ldquo;safe&rdquo; choices too early.</p>
<p><strong>LinUCB</strong> is the practical winner. It achieved comparable ranking accuracy (<strong>AUC ~0.86</strong>) with a much higher click rate (<strong>CTR ~20.5%</strong>), roughly 1.7 times that of <strong>LinGreedy</strong>.</p>
<p>This algorithm excels because it balances two competing goals:</p>
<ol>
<li><strong>Exploitation:</strong> It uses the predicted reward ($x^T \theta$), the linear model&rsquo;s estimate of how likely a click is, to find good items.</li>
<li><strong>Exploration:</strong> It adds an upper-confidence bonus ($\alpha \sqrt{x^T A^{-1} x}$) to the score. If the model is uncertain about a specific context (e.g., &ldquo;I haven&rsquo;t seen a user drink Coffee at 8 PM before&rdquo;), the bonus grows, boosting the score and forcing the model to test that hypothesis.</li>
</ol>
<p>This allows LinUCB to discover high-value opportunities that the conservative LinGreedy model misses.</p>
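<p>The two terms can be sketched for a single arm with plain NumPy. This is a minimal illustration of the standard disjoint LinUCB formula, not the library&rsquo;s internal code; the function name <code>linucb_score</code> and the toy context vector are assumptions.</p>

```python
import numpy as np


def linucb_score(A: np.ndarray, b: np.ndarray, x: np.ndarray, alpha: float) -> float:
    """Score one arm: ridge-regression estimate plus an uncertainty bonus."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # exploitation: estimated weights
    mean = float(x @ theta)                # x^T theta
    bonus = alpha * float(np.sqrt(x @ A_inv @ x))  # alpha * sqrt(x^T A^-1 x)
    return mean + bonus


d = 2
x = np.array([1.0, 0.0])  # hypothetical context vector

# A fresh arm: A is the identity and b is zero, so only the bonus contributes.
fresh = linucb_score(np.eye(d), np.zeros(d), x, alpha=1.0)

# The same arm after 99 no-click observations of this context: A grows by 99 * x x^T.
A_seen = np.eye(d) + 99 * np.outer(x, x)
seen = linucb_score(A_seen, np.zeros(d), x, alpha=1.0)

print(fresh, seen)
```

<p>The unseen context scores 1.0 purely from uncertainty, while the well-observed one drops to 0.1. That shrinking bonus is what lets exploration fade once the model has enough evidence.</p>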
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="o">(</span>venv<span class="o">)</span> $ python product-recommender/recsys-engine/evaluate.py
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">Running Benchmark... <span class="o">(</span>This trains and scores all models automatically<span class="o">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">--------------------------------------------------------------------------------
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">Available Metrics: <span class="o">[</span><span class="s1">&#39;AUC(score)@5&#39;</span>, <span class="s1">&#39;CTR(score)@5&#39;</span>, <span class="s1">&#39;Precision@5&#39;</span>, <span class="s1">&#39;Recall@5&#39;</span><span class="o">]</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            AUC<span class="o">(</span>score<span class="o">)</span>@5  CTR<span class="o">(</span>score<span class="o">)</span>@5  Precision@5  Recall@5
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Random          0.550000      0.102041     0.003876  0.019380
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">Popularity      0.592857      0.192308     0.007752  0.038760
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">LinGreedy       0.885185      0.117647     0.004651  0.023256
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">LinUCB          0.860317      0.204545     0.006977  0.034884
</span></span><span class="line"><span class="ln">10</span><span class="cl">LinTS           0.640798      0.211538     0.008527  0.042636
</span></span><span class="line"><span class="ln">11</span><span class="cl">ClustersTS      0.550505      0.153846     0.004651  0.023256
</span></span><span class="line"><span class="ln">12</span><span class="cl">--------------------------------------------------------------------------------
</span></span></code></pre></div>
<h3 id="why-linucb-outperforms-the-baseline-in-ctr" data-numberify>Why LinUCB Outperforms the Baseline in CTR<a class="anchor ms-1" href="#why-linucb-outperforms-the-baseline-in-ctr"></a></h3>
<p>This is the core concept of <strong>Offline Policy Evaluation</strong>.</p>
<p>The benchmark does <strong>not</strong> score every single row of the history. It uses a technique called <strong>Rejection Sampling</strong> (or simply &ldquo;Matching&rdquo;).</p>
<p>Here is exactly how <code>mab2rec</code> calculates that <strong>20.5%</strong>:</p>
<ol>
<li>
<p><strong>The Log (History):</strong> Contains a mix of &ldquo;Good Decisions&rdquo; and &ldquo;Bad Decisions&rdquo; because it was generated randomly.</p>
<ul>
<li>Row A: Morning User $\to$ Show <strong>Pizza</strong> $\to$ <strong>No Click</strong> (Bad Random Choice)</li>
<li>Row B: Morning User $\to$ Show <strong>Coffee</strong> $\to$ <strong>Click</strong> (Lucky Random Choice)</li>
</ul>
</li>
<li>
<p><strong>The Test (LinUCB):</strong> The model is smart. It knows Morning users want Coffee.</p>
<ul>
<li>For Row A, LinUCB says: <em>&ldquo;I would recommend <strong>Coffee</strong>.&rdquo;</em>
<ul>
<li><strong>Mismatch!</strong> The history shows Pizza. We cannot know what would have happened if we showed Coffee. <strong>This row is IGNORED.</strong></li>
</ul>
</li>
<li>For Row B, LinUCB says: <em>&ldquo;I would recommend <strong>Coffee</strong>.&rdquo;</em>
<ul>
<li><strong>Match!</strong> The history shows Coffee. We know the result (Click). <strong>This row is COUNTED.</strong></li>
</ul>
</li>
</ul>
</li>
</ol>
<p>The dataset average (<strong>13.65%</strong>) includes all the &ldquo;Bad Random Choices&rdquo; (Row A). The LinUCB score (<strong>20.5%</strong>) <strong>filters out</strong> the bad choices. It effectively asks: <em>&ldquo;On the rare occasions where the random history actually showed the right product (Row B), did the user click?&rdquo;</em> Since LinUCB is scored only on the rows where its pick matches the logged product, the click rate for those matches is much higher than the average of the random pile.</p>
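<p>The matching idea can be sketched in a few lines of plain Python. This is a toy illustration of replay-style evaluation, not the library&rsquo;s actual code: the two contexts, two items, click probabilities (0.5 and 0.08), and the names <code>replay_ctr</code> and <code>coffee_in_morning</code> are all hypothetical.</p>

```python
import random


def replay_ctr(log, policy):
    """Estimate a policy's CTR from randomly logged events via replay matching."""
    matched = clicks = 0
    for context, shown_item, clicked in log:
        if policy(context) == shown_item:  # keep only rows the policy agrees with
            matched += 1
            clicks += clicked
    return (clicks / matched if matched else float("nan")), matched


# Hypothetical toy world: two contexts, two items, clicks follow a hidden rule.
rng = random.Random(0)
log = []
for _ in range(10_000):
    context = rng.choice(["morning", "evening"])
    shown = rng.choice(["coffee", "pizza"])       # logged uniformly at random
    p_click = 0.5 if (context, shown) == ("morning", "coffee") else 0.08
    log.append((context, shown, int(p_click > rng.random())))


def coffee_in_morning(context):
    """A context-aware policy: coffee in the morning, pizza otherwise."""
    return "coffee" if context == "morning" else "pizza"


log_ctr = sum(clicked for _, _, clicked in log) / len(log)
policy_ctr, n_matched = replay_ctr(log, coffee_in_morning)
print(f"log CTR={log_ctr:.3f}  policy CTR={policy_ctr:.3f}  matched rows={n_matched}")
```

<p>Only about half of the logged rows match the policy here, and the CTR measured on that subset is noticeably higher than the overall log CTR, which mirrors the gap between the dataset average and the LinUCB score.</p>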

<h2 id="simulation-of-the-selected-product-recommender-locally" data-numberify>Simulation of the Selected Product Recommender Locally<a class="anchor ms-1" href="#simulation-of-the-selected-product-recommender-locally"></a></h2>
<p>With the model selected, we built a script to simulate the product recommender locally. This script acts as the Server, the User, and the Trainer simultaneously in a continuous loop.</p>

<h3 id="step-1-pre-training-offline-replay" data-numberify>Step 1: Pre-training (Offline Replay)<a class="anchor ms-1" href="#step-1-pre-training-offline-replay"></a></h3>
<p>We don&rsquo;t want to start with a &ldquo;dumb&rdquo; model. We load the 10,000 historical events (<code>training_log.csv</code>) and run <code>model.fit()</code>. This gives the bandit a baseline knowledge of the world before the live loop begins.</p>

<h3 id="step-2-the-online-loop" data-numberify>Step 2: The Online Loop<a class="anchor ms-1" href="#step-2-the-online-loop"></a></h3>
<p>We simulate a sequence of user visits:</p>
<ol>
<li><strong>User Arrival:</strong> Pick a random user from the pool.</li>
<li><strong>Contextualize:</strong> Inject a simulated timestamp (e.g., varying between Mon 08:00 AM and Sat 09:00 PM). This is the key &ldquo;Context&rdquo; the model must react to.</li>
<li><strong>Recommend:</strong> LinUCB calculates scores for all 200 products and returns the Top 5.</li>
<li><strong>Reaction:</strong> The <code>GroundTruth</code> Oracle decides if the user clicks.</li>
<li><strong>Online Update:</strong> We call <code>model.partial_fit()</code>. <strong>This updates the matrices ($A$ and $b$) instantly.</strong> The very next recommendation will reflect this new learning.</li>
</ol>
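<p>To make the online update step concrete, here is a minimal NumPy sketch of a per-arm LinUCB state with a rank-one update. The class <code>TinyLinUCBArm</code> is a hypothetical stand-in written for this illustration; it is not how <code>partial_fit()</code> is implemented in the library, only the textbook update it corresponds to.</p>

```python
import numpy as np


class TinyLinUCBArm:
    """Per-arm LinUCB state: A = I + sum(x x^T), b = sum(reward * x)."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.A = np.eye(dim)
        self.b = np.zeros(dim)
        self.alpha = alpha

    def score(self, x: np.ndarray) -> float:
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x: np.ndarray, reward: float) -> None:
        # Rank-one update; the next call to score() already reflects it.
        self.A += np.outer(x, x)
        self.b += reward * x


arm = TinyLinUCBArm(dim=2)
x = np.array([1.0, 0.0])  # hypothetical context, e.g. "is_morning" only

before = arm.score(x)
arm.update(x, reward=1.0)  # the simulated user clicked
after = arm.score(x)
print(before, after)
```

<p>A single click raises the estimated reward for this context from 0 to 0.5 while the uncertainty bonus shrinks from 1.0 to about 0.71, so the score changes on the very next call without any retraining pass.</p>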
<p>Here is a sample of 30 recommendation records from the local simulation.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="o">(</span>venv<span class="o">)</span> $ python product-recommender/recsys-engine/local_recommender.py 
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="o">[</span>2026-02-05 15:47:43<span class="o">]</span> INFO    : Loaded <span class="m">1000</span> users
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="o">[</span>2026-02-05 15:47:43<span class="o">]</span> INFO    : Loading artifacts...
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="o">[</span>2026-02-05 15:47:48<span class="o">]</span> INFO    : Loaded <span class="m">200</span> products.
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="o">[</span>2026-02-05 15:47:48<span class="o">]</span> INFO    : Pre-training model from history...
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="o">[</span>2026-02-05 15:47:48<span class="o">]</span> INFO    : Model pre-trained on <span class="m">10000</span> events.
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">--- STARTING LIVE LOOP <span class="o">(</span><span class="m">30</span> visits<span class="o">)</span> ---
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">User <span class="m">0153</span> <span class="o">(</span><span class="m">56</span> yo<span class="o">)</span> @ Fri 00:58 -&gt; Recs: <span class="o">[</span>200, 124, 015, 058, 011<span class="o">]</span> -&gt; Clicked: <span class="m">200</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">User <span class="m">0909</span> <span class="o">(</span><span class="m">21</span> yo<span class="o">)</span> @ Sat 15:53 -&gt; Recs: <span class="o">[</span>038, 040, 017, 020, 046<span class="o">]</span> -&gt; Clicked: <span class="m">038</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">User <span class="m">0406</span> <span class="o">(</span><span class="m">30</span> yo<span class="o">)</span> @ Sat 05:24 -&gt; Recs: <span class="o">[</span>020, 041, 008, 055, 040<span class="o">]</span> -&gt; Clicked: <span class="m">020</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">User <span class="m">0317</span> <span class="o">(</span><span class="m">31</span> yo<span class="o">)</span> @ Sat 05:38 -&gt; Recs: <span class="o">[</span>008, 055, 057, 059, 139<span class="o">]</span> -&gt; Clicked: <span class="m">055</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">User <span class="m">0246</span> <span class="o">(</span><span class="m">44</span> yo<span class="o">)</span> @ Mon 02:04 -&gt; Recs: <span class="o">[</span>015, 058, 057, 011, 124<span class="o">]</span> -&gt; Clicked: <span class="m">015</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">User <span class="m">0974</span> <span class="o">(</span><span class="m">61</span> yo<span class="o">)</span> @ Fri 01:16 -&gt; Recs: <span class="o">[</span>058, 073, 124, 074, 051<span class="o">]</span> -&gt; Clicked: <span class="m">058</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">User <span class="m">0234</span> <span class="o">(</span><span class="m">26</span> yo<span class="o">)</span> @ Thu 12:16 -&gt; Recs: <span class="o">[</span>036, 103, 002, 186, 070<span class="o">]</span> -&gt; Clicked: <span class="m">036</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">User <span class="m">0360</span> <span class="o">(</span><span class="m">35</span> yo<span class="o">)</span> @ Sat 20:23 -&gt; Recs: <span class="o">[</span>058, 051, 008, 042, 018<span class="o">]</span> -&gt; Clicked: <span class="m">042</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">User <span class="m">0513</span> <span class="o">(</span><span class="m">51</span> yo<span class="o">)</span> @ Sun 05:37 -&gt; Recs: <span class="o">[</span>051, 059, 043, 014, 020<span class="o">]</span> -&gt; Clicked: <span class="m">051</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">User <span class="m">0640</span> <span class="o">(</span><span class="m">33</span> yo<span class="o">)</span> @ Mon 00:49 -&gt; Recs: <span class="o">[</span>073, 124, 023, 147, 074<span class="o">]</span> -&gt; Clicked: <span class="m">073</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">User <span class="m">0363</span> <span class="o">(</span><span class="m">31</span> yo<span class="o">)</span> @ Fri 23:35 -&gt; Recs: <span class="o">[</span>200, 126, 085, 058, 018<span class="o">]</span> -&gt; Clicked: <span class="m">018</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">User <span class="m">0718</span> <span class="o">(</span><span class="m">58</span> yo<span class="o">)</span> @ Sat 23:05 -&gt; Recs: <span class="o">[</span>018, 036, 040, 042, 020<span class="o">]</span> -&gt; Clicked: <span class="m">018</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">User <span class="m">0390</span> <span class="o">(</span><span class="m">49</span> yo<span class="o">)</span> @ Tue 00:56 -&gt; Recs: <span class="o">[</span>147, 165, 020, 089, 047<span class="o">]</span> -&gt; Clicked: <span class="m">147</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">User <span class="m">0425</span> <span class="o">(</span><span class="m">39</span> yo<span class="o">)</span> @ Thu 03:59 -&gt; Recs: <span class="o">[</span>147, 165, 062, 028, 055<span class="o">]</span> -&gt; Clicked: <span class="m">147</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">User <span class="m">0792</span> <span class="o">(</span><span class="m">21</span> yo<span class="o">)</span> @ Sun 23:28 -&gt; Recs: <span class="o">[</span>042, 056, 018, 043, 046<span class="o">]</span> -&gt; Clicked: <span class="m">042</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">User <span class="m">0190</span> <span class="o">(</span><span class="m">41</span> yo<span class="o">)</span> @ Sat 10:54 -&gt; Recs: <span class="o">[</span>192, 139, 189, 008, 055<span class="o">]</span> -&gt; Clicked: <span class="m">192</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">User <span class="m">0544</span> <span class="o">(</span><span class="m">41</span> yo<span class="o">)</span> @ Tue 20:42 -&gt; Recs: <span class="o">[</span>018, 058, 090, 043, 147<span class="o">]</span> -&gt; Clicked: <span class="m">018</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">User <span class="m">0192</span> <span class="o">(</span><span class="m">17</span> yo<span class="o">)</span> @ Sat 18:38 -&gt; Recs: <span class="o">[</span>042, 056, 018, 046, 043<span class="o">]</span> -&gt; Clicked: <span class="m">042</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">User <span class="m">0757</span> <span class="o">(</span><span class="m">55</span> yo<span class="o">)</span> @ Thu 16:08 -&gt; Recs: <span class="o">[</span>015, 171, 165, 037, 126<span class="o">]</span> -&gt; Clicked: <span class="m">126</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">User <span class="m">0904</span> <span class="o">(</span><span class="m">60</span> yo<span class="o">)</span> @ Sat 02:07 -&gt; Recs: <span class="o">[</span>103, 041, 017, 042, 057<span class="o">]</span> -&gt; Clicked: <span class="m">042</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">User <span class="m">0552</span> <span class="o">(</span><span class="m">39</span> yo<span class="o">)</span> @ Tue 11:33 -&gt; Recs: <span class="o">[</span>192, 190, 189, 194, 193<span class="o">]</span> -&gt; Clicked: <span class="m">192</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">User <span class="m">0540</span> <span class="o">(</span><span class="m">36</span> yo<span class="o">)</span> @ Sat 08:56 -&gt; Recs: <span class="o">[</span>043, 041, 192, 014, 073<span class="o">]</span> -&gt; Clicked: <span class="m">043</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">User <span class="m">0326</span> <span class="o">(</span><span class="m">26</span> yo<span class="o">)</span> @ Wed 13:04 -&gt; Recs: <span class="o">[</span>015, 171, 165, 023, 126<span class="o">]</span> -&gt; Clicked: <span class="m">015</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">User <span class="m">0834</span> <span class="o">(</span><span class="m">29</span> yo<span class="o">)</span> @ Sat 22:58 -&gt; Recs: <span class="o">[</span>051, 002, 042, 058, 036<span class="o">]</span> -&gt; Clicked: <span class="m">051</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">User <span class="m">0290</span> <span class="o">(</span><span class="m">21</span> yo<span class="o">)</span> @ Mon 19:09 -&gt; Recs: <span class="o">[</span>058, 200, 004, 018, 085<span class="o">]</span> -&gt; Clicked: <span class="m">018</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">User <span class="m">0275</span> <span class="o">(</span><span class="m">18</span> yo<span class="o">)</span> @ Wed 11:10 -&gt; Recs: <span class="o">[</span>189, 002, 160, 078, 103<span class="o">]</span> -&gt; Clicked: <span class="m">189</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">User <span class="m">0327</span> <span class="o">(</span><span class="m">23</span> yo<span class="o">)</span> @ Wed 19:54 -&gt; Recs: <span class="o">[</span>200, 126, 018, 085, 058<span class="o">]</span> -&gt; Clicked: <span class="m">200</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">User <span class="m">0144</span> <span class="o">(</span><span class="m">67</span> yo<span class="o">)</span> @ Thu 12:31 -&gt; Recs: <span class="o">[</span>087, 126, 047, 103, 034<span class="o">]</span> -&gt; Clicked: <span class="m">087</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">User <span class="m">0497</span> <span class="o">(</span><span class="m">60</span> yo<span class="o">)</span> @ Sun 08:26 -&gt; Recs: <span class="o">[</span>192, 139, 008, 189, 059<span class="o">]</span> -&gt; Clicked: <span class="m">192</span> <span class="o">(</span>✅<span class="o">)</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">User <span class="m">0508</span> <span class="o">(</span><span class="m">64</span> yo<span class="o">)</span> @ Tue 12:41 -&gt; Recs: <span class="o">[</span>165, 087, 026, 171, 037<span class="o">]</span> -&gt; Clicked: <span class="m">165</span> <span class="o">(</span>❌<span class="o">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">
</span></span><span class="line"><span class="ln">41</span><span class="cl">--- END LOOP ---
</span></span></code></pre></div>
<h3 id="evaluation-of-simulation" data-numberify>Evaluation of Simulation<a class="anchor ms-1" href="#evaluation-of-simulation"></a></h3>
<p>The system is behaving exactly as a Contextual Bandit should. It is aggressively exploiting known high-probability zones while struggling (realistically) in neutral zones.</p>

<h4 id="the-weekend-pizza-strategy-is-dominant" data-numberify>The &ldquo;Weekend Pizza&rdquo; Strategy is Dominant<a class="anchor ms-1" href="#the-weekend-pizza-strategy-is-dominant"></a></h4>
<p>The model has learned that Weekends (Sat/Sun) are for <strong>Pizzas</strong> (for example, items <code>042</code>, <code>051</code>, and <code>059</code>).</p>
<ul>
<li><strong>User 0360 (Sat 20:23):</strong> Recommended <code>[..., 042 (Aussie Pizza), ...]</code> $\to$ Clicked ✅.</li>
<li><strong>User 0513 (Sun 05:37):</strong> Recommended <code>[051 (Buffalo Pizza), 059 (Lamb Pizza)...]</code> $\to$ Clicked ✅.</li>
<li><strong>User 0834 (Sat 22:58):</strong> Recommended <code>[051 (Buffalo Pizza)... 042 (Aussie Pizza)]</code> $\to$ Clicked ✅.</li>
<li><strong>Insight:</strong> The model pushes Pizzas hard on weekends across a wide range of hours (05:37, 20:23, and 22:58 in these examples), and all three of these visits converted.</li>
</ul>

<h4 id="the-morning-coffee-precision" data-numberify>The &ldquo;Morning Coffee&rdquo; Precision<a class="anchor ms-1" href="#the-morning-coffee-precision"></a></h4>
<p>The model correctly switches strategies based on the hour, even distinguishing &ldquo;Weekend Morning&rdquo; from &ldquo;Weekend Night&rdquo;.</p>
<ul>
<li><strong>User 0552 (Tue 11:33):</strong> It is a Weekday Morning. The model recommended <strong>5 Coffees</strong> <code>[192, 190, 189, 194, 193]</code>. The user clicked <code>192</code> (Long Black). ✅</li>
<li><strong>User 0497 (Sun 08:26):</strong> It is a Weekend, but it is Morning. The model prioritized <strong>Coffee (192, 189)</strong> over Pizza. The user clicked <code>192</code>. ✅</li>
<li><strong>User 0508 (Tue 12:41):</strong> This is a useful edge case. It is <strong>41 minutes past</strong> the &ldquo;Morning&rdquo; cutoff (12:00). The model stopped recommending Coffee and switched to Lunch items (Steamed Veggies, Burritos). The user did not click ❌, but the change in <em>behavior</em> suggests that the time-of-day feature is driving the recommendations.</li>
</ul>
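<p>To make the time-based switch concrete, here is a minimal sketch of how a log timestamp could be turned into binary context features. The function name, the feature names, and the 06:00 start of the morning window are assumptions for illustration. Only the 12:00 cutoff and the Sat/Sun weekend definition come from the discussion above, and the feature code used by the actual model may differ.</p>

```python
WEEKEND_DAYS = {"Sat", "Sun"}
MORNING_START_HOUR = 6   # assumption: the lower bound is not stated in the post
MORNING_END_HOUR = 12    # the "Morning" cutoff mentioned above


def time_features(day: str, hhmm: str) -> dict[str, int]:
    """Turn a log timestamp such as ("Tue", "12:41") into binary context features."""
    hour = int(hhmm.split(":")[0])
    return {
        "is_weekend": int(day in WEEKEND_DAYS),
        "is_morning": int(MORNING_START_HOUR <= hour < MORNING_END_HOUR),
    }


if __name__ == "__main__":
    # The three visits discussed in the list above.
    for day, hhmm in [("Tue", "11:33"), ("Sun", "08:26"), ("Tue", "12:41")]:
        print(day, hhmm, time_features(day, hhmm))
```

<p>Under these assumptions, the 12:41 visit falls just outside the morning window, which is the kind of hard boundary that would make the recommendations flip from Coffee to Lunch items.</p>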

<h4 id="perfect-recommendation-no-interaction-realism" data-numberify>Perfect Recommendation, No Interaction (Realism)<a class="anchor ms-1" href="#perfect-recommendation-no-interaction-realism"></a></h4>
<ul>
<li><strong>User 0275 (Wed 11:10):</strong> The model recommended <code>189</code> (Flat White). However, the user ignored it, and the result was ❌ (No Reward).</li>
<li><strong>Why?</strong> This mimics real life. Even if the recommendation is a good match, users don&rsquo;t always convert. In the <code>GroundTruth</code>, the probability caps at ~50% (sigmoid of 0), so a well-matched recommendation still goes unclicked about half the time. This &ldquo;Bad Luck&rdquo; outcome indicates that the evaluation pipeline is not inflating results.</li>
</ul>
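<p>The ~50% figure can be checked with a few lines of Python. This is a rough sketch rather than the post&rsquo;s <code>GroundTruth</code> code: <code>simulate_clicks</code> and its <code>score</code> parameter are hypothetical names, and the only fact taken from the text is that a score of 0 passed through a sigmoid gives a click probability of 0.5.</p>

```python
import math
import random


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_clicks(score: float, n_visits: int, seed: int = 0) -> int:
    """Count clicks over n_visits, each an independent draw with p = sigmoid(score)."""
    rng = random.Random(seed)
    p_click = sigmoid(score)
    return sum(rng.random() < p_click for _ in range(n_visits))


if __name__ == "__main__":
    print(sigmoid(0.0))  # 0.5
    n = 10_000
    print(simulate_clicks(score=0.0, n_visits=n) / n)  # close to 0.5
```

<p>With a click probability of 0.5, a single ❌ on a sensible recommendation is expected noise rather than evidence of a bad model.</p>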

<h3 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h3>
<p>Across 30 simulated visits, CTR reached <strong>53%</strong>.</p>
<p>This elevated CTR indicates that the model has internalized dominant temporal and demographic patterns encoded in the simulation. It aggressively exploits strong signals such as weekend pizza and morning coffee while still exploring less-certain regions.</p>
<p>Importantly, the simulation includes realistic noise. Even strong recommendations do not always convert, reflecting probabilistic user behavior.</p>

<h2 id="whats-next" data-numberify>What&rsquo;s Next?<a class="anchor ms-1" href="#whats-next"></a></h2>
<p>We have successfully prototyped a Contextual Bandit that learns time-based preferences. However, this Python script has major limitations for a production environment:</p>
<ol>
<li><strong>Scalability:</strong> <code>Disjoint LinUCB</code> maintains a separate matrix for <em>every</em> product, so memory grows linearly with the catalog. With 10 million products, a single server is likely to run out of memory.</li>
<li><strong>Latency:</strong> The training (<code>partial_fit</code>) blocks the inference (<code>recommend</code>). In a real system, you cannot make a user wait for the model to update.</li>
<li><strong>Fault Tolerance:</strong> If the script crashes, the learned state is lost.</li>
<li><strong>Concurrency:</strong> A single Python process cannot handle thousands of concurrent requests.</li>
</ol>
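<p>A back-of-the-envelope estimate makes the scalability point tangible. The sketch below assumes the textbook disjoint LinUCB layout, where each arm keeps one <code>d x d</code> matrix and one <code>d</code>-vector of 64-bit floats. The feature dimension <code>d = 20</code> and the function name are assumptions chosen for illustration, not values from this post.</p>

```python
def linucb_state_bytes(n_arms: int, n_features: int, bytes_per_float: int = 8) -> int:
    """Bytes needed for one d x d matrix and one d-vector per arm."""
    per_arm = (n_features * n_features + n_features) * bytes_per_float
    return n_arms * per_arm


if __name__ == "__main__":
    d = 20  # assumption: illustrative feature dimension
    for n_arms in (200, 10_000_000):  # 200 is only a small reference point
        gb = linucb_state_bytes(n_arms, d) / 1e9
        print(f"{n_arms:>10,} arms, d={d}: {gb:,.4f} GB")
```

<p>Under these assumptions, 10 million products need about 33.6 GB for the raw state alone, before any cached inverses or serving overhead, and the cost grows quadratically with the number of features.</p>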
<p>In <a href="/blog/2026-02-23-productionize-recommender-with-eda/"><strong>Part 2</strong></a>, we will transform this prototype into an <em>Event-Driven Architecture</em>:</p>
<ul>
<li><strong>Kafka</strong> will transport click events asynchronously.</li>
<li><strong>Flink</strong> will handle distributed, stateful model training.</li>
<li><strong>Redis</strong> will serve the model matrices for sub-millisecond inference.</li>
</ul>
<p>Stay tuned!</p>

      ]]></content:encoded></item><item><title>Stream Processing with Flink in Kotlin</title><link>https://jaehyeon.me/blog/2025-12-10-streaming-processing-with-flink-in-kotlin/</link><guid>https://jaehyeon.me/blog/2025-12-10-streaming-processing-with-flink-in-kotlin/</guid><pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>A couple of years ago, I read <a href="https://www.oreilly.com/library/view/stream-processing-with/9781491974285/" target="_blank" rel="noopener noreferrer">Stream Processing with Apache Flink<i class="fas fa-external-link-square-alt ms-1"></i></a> and worked through the examples using PyFlink. While the book offered a solid introduction to Flink, I frequently hit limitations with the Python API, as many features from the book weren&rsquo;t supported. This time, I decided to revisit the material, but using Kotlin. The experience has been much more rewarding and fun.</p>
<p>In porting the examples to Kotlin, I also took the opportunity to align the code with modern Flink practices. The complete source for this post is available in the <a href="https://github.com/jaehyeon-kim/flink-demos/tree/master/stream-processing-with-flink" target="_blank" rel="noopener noreferrer"><code>stream-processing-with-flink</code><i class="fas fa-external-link-square-alt ms-1"></i></a> directory of the <code>flink-demos</code> GitHub repository.</p>
      ]]></description><content:encoded><![CDATA[
        <p>A couple of years ago, I read <a href="https://www.oreilly.com/library/view/stream-processing-with/9781491974285/" target="_blank" rel="noopener noreferrer">Stream Processing with Apache Flink<i class="fas fa-external-link-square-alt ms-1"></i></a> and worked through the examples using PyFlink. While the book offered a solid introduction to Flink, I frequently hit limitations with the Python API, as many features from the book weren&rsquo;t supported. This time, I decided to revisit the material, but using Kotlin. The experience has been much more rewarding and fun.</p>
<p>In porting the examples to Kotlin, I also took the opportunity to align the code with modern Flink practices. The complete source for this post is available in the <a href="https://github.com/jaehyeon-kim/flink-demos/tree/master/stream-processing-with-flink" target="_blank" rel="noopener noreferrer"><code>stream-processing-with-flink</code><i class="fas fa-external-link-square-alt ms-1"></i></a> directory of the <code>flink-demos</code> GitHub repository.</p>
<h3 id="updating-the-code-and-apis" data-numberify>Updating the Code and APIs<a class="anchor ms-1" href="#updating-the-code-and-apis"></a></h3>
<p>The book, while conceptually valuable, is a bit dated. As I worked through the examples, I updated several deprecated features to use their modern equivalents.</p>
<ul>
<li><strong><code>ListCheckpointed</code> to <code>CheckpointedFunction</code></strong>: Replaced the older checkpointing interface with the more flexible <code>CheckpointedFunction</code>.</li>
<li><strong><code>SourceFunction</code> to Source API</strong>: Migrated from the legacy <code>SourceFunction</code> to the newer, more robust Source API.</li>
<li><strong><code>SinkFunction</code> to Sink V2 API</strong>: Updated the <code>SinkFunction</code> to the Sink V2 API.</li>
<li><strong>Queryable State</strong>: Skipped, as it has been deprecated since Flink 1.18. These examples are built using Flink 1.20.1.</li>
</ul>

<h3 id="optimizing-the-build-with-gradle" data-numberify>Optimizing the Build with Gradle<a class="anchor ms-1" href="#optimizing-the-build-with-gradle"></a></h3>
<p>Figuring out the Gradle build was a valuable lesson in itself. I learned how to create a single <code>build.gradle.kts</code> to handle two different scenarios: producing a lean production JAR and keeping local execution simple.</p>
<p>For the production JAR, Flink dependencies are declared with <code>compileOnly</code>. This correctly excludes them from the final artifact, as the Flink cluster provides these libraries at runtime.</p>
<blockquote>
<p>❗The test environment also needs the Flink APIs to compile and run. This is handled by <code>testImplementation</code>. This standard Gradle configuration provides the Flink libraries <em>only</em> to the test classpath, keeping them completely separate from the production JAR and the local <code>run</code> task.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="c1">// Flink Dependencies are not bundled into the JAR
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"></span>    <span class="n">compileOnly</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-streaming-java:</span><span class="si">$flinkVersion</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="n">compileOnly</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-clients:</span><span class="si">$flinkVersion</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="c1">// Flink is available for test compilation and execution
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-streaming-java:</span><span class="si">$flinkVersion</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="n">testImplementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-clients:</span><span class="si">$flinkVersion</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>However, this creates a problem for local development, as the Flink libraries are now missing from the default runtime classpath. The fix I settled on is a custom configuration, <code>localRunClasspath</code>.</p>
<p>This configuration rebuilds the full classpath specifically for the <code>run</code> task by adding the <code>compileOnly</code> dependencies back in, along with the standard <code>implementation</code> and <code>runtimeOnly</code> scopes. This makes local development seamless.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">val</span> <span class="py">localRunClasspath</span> <span class="k">by</span> <span class="n">configurations</span><span class="p">.</span><span class="n">creating</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">extendsFrom</span><span class="p">(</span><span class="n">configurations</span><span class="p">.</span><span class="n">implementation</span><span class="p">.</span><span class="k">get</span><span class="p">(),</span> <span class="n">configurations</span><span class="p">.</span><span class="n">compileOnly</span><span class="p">.</span><span class="k">get</span><span class="p">(),</span> <span class="n">configurations</span><span class="p">.</span><span class="n">runtimeOnly</span><span class="p">.</span><span class="k">get</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1">// ...
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">&lt;</span><span class="n">JavaExec</span><span class="p">&gt;(</span><span class="s2">&#34;run&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="n">classpath</span> <span class="p">=</span> <span class="n">localRunClasspath</span> <span class="p">+</span> <span class="n">sourceSets</span><span class="p">.</span><span class="n">main</span><span class="p">.</span><span class="k">get</span><span class="p">().</span><span class="n">output</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="main-chapters" data-numberify>Main Chapters<a class="anchor ms-1" href="#main-chapters"></a></h3>
<p>So far, I have translated the examples from the following main chapters:</p>
<ul>
<li><strong>Chapter 5: Basic and Keyed Transformations</strong>: Covers fundamental data manipulation, including <code>map</code>, <code>filter</code>, <code>keyBy</code>, and rolling sum aggregations, as well as multi-stream transformations.</li>
<li><strong>Chapter 6: Event Time and Windowing</strong>: Focuses on time-based operations, including <code>ProcessFunction</code> timers, watermark generation strategies, window functions, custom window logic, and side outputs for late data handling.</li>
<li><strong>Chapter 7: State Management</strong>: Explores different types of state in Flink, such as <code>ValueState</code>, <code>ListState</code>, <code>MapState</code>, and <code>BroadcastState</code>, along with operator state.</li>
<li><strong>Chapter 8: Asynchronous I/O and Custom Connectors</strong>: Demonstrates how to interact with external systems asynchronously and build custom sources and sinks.</li>
</ul>

<h3 id="how-to-build-and-run-the-examples" data-numberify>How to Build and Run the Examples<a class="anchor ms-1" href="#how-to-build-and-run-the-examples"></a></h3>
<p>The Flink applications can be run directly from the command line for local testing and development. This is useful for quick debugging without needing a full Flink cluster. Each Flink app also includes a detailed documentation comment that describes its pipeline, for example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="cm">/**
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="cm"> * This Flink job demonstrates transformations on a `KeyedStream`.
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="cm"> *
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="cm"> * It showcases the `reduce` operator, a powerful tool for maintaining running aggregates
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="cm"> * for each key in a stream.
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="cm"> *
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="cm"> * The pipeline is as follows:
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="cm"> * 1. **Source**: Ingests a stream of `SensorReading` events.
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="cm"> * 2. **KeyBy**: Partitions the stream by the `id` of each sensor. All subsequent
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="cm"> *    operations will run independently for each sensor.
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="cm"> * 3. **Reduce**: For each key, this operator maintains a running state of the `SensorReading`
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="cm"> *    with the maximum temperature seen so far. For every new reading that arrives, it
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="cm"> *    compares it to the current maximum and emits the new maximum downstream.
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="cm"> * 4. **Sink**: Prints the continuous stream of running maximums for each sensor to the console.
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="cm"> */</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">object</span> <span class="nc">KeyedTransformations</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// ...
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span><span class="p">}</span>
</span></span></code></pre></div><p>To launch the apps, use the <code>run</code> task and set the desired main class with the <code>-PmainClass</code> project property. Here are the full examples:</p>
<p><strong>1. Run the Chapter 5 examples.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter5.BasicTransformations
</span></span><span class="line"><span class="ln">2</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter5.KeyedTransformations
</span></span><span class="line"><span class="ln">3</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter5.RollingSum
</span></span><span class="line"><span class="ln">4</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter5.MultiStreamTransformations
</span></span></code></pre></div><p><strong>2. Run the Chapter 6 examples.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.ProcessFunctionTimers
</span></span><span class="line"><span class="ln">2</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.PeriodicWatermarkGeneration
</span></span><span class="line"><span class="ln">3</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.MarkerBasedWatermarkGeneration
</span></span><span class="line"><span class="ln">4</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.CoProcessFunctionTimers
</span></span><span class="line"><span class="ln">5</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.WindowFunctions --args<span class="o">=</span><span class="s2">&#34;min1&#34;</span> <span class="c1"># min2, avg, minmax1, or minmax2</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.CustomWindows
</span></span><span class="line"><span class="ln">7</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.SideOutputs
</span></span><span class="line"><span class="ln">8</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter6.LateDataHandling --args<span class="o">=</span><span class="s2">&#34;filter&#34;</span> <span class="c1"># sideout or update</span>
</span></span></code></pre></div><p><strong>3. Run the Chapter 7 examples.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter7.KeyedStateFunction
</span></span><span class="line"><span class="ln">2</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter7.StatefulProcessFunction
</span></span><span class="line"><span class="ln">3</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter7.BroadcastStateFunction
</span></span><span class="line"><span class="ln">4</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter7.OperatorStateFunction
</span></span><span class="line"><span class="ln">5</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter7.KeyedAndOperatorStateFunction
</span></span></code></pre></div><p><strong>4. Run the Chapter 8 examples.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter8.AsyncFunction
</span></span><span class="line"><span class="ln">2</span><span class="cl">./gradlew run -PmainClass<span class="o">=</span>me.jaehyeon.chapter8.CustomConnectors
</span></span></code></pre></div>
<h3 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h3>
<p>Working through the examples in Kotlin has been an effective way to dive deeper into Apache Flink. Translating them not only forced me to understand the concepts more thoroughly but also gave me hands-on experience with the latest APIs and build practices. For those looking to learn Apache Flink through up-to-date examples, I hope sharing my experience and code proves helpful.</p>
      ]]></content:encoded></item><item><title>Guide to Building Integrated Web Applications with FastAPI and NiceGUI</title><link>https://jaehyeon.me/blog/2025-11-19-fastapi-nicegui-template/</link><guid>https://jaehyeon.me/blog/2025-11-19-fastapi-nicegui-template/</guid><pubDate>Wed, 19 Nov 2025 00:00:00 +0000</pubDate><description>
&lt;p>The standard architecture for modern web applications involves a decoupled frontend, typically built with a JavaScript framework, and a backend API. This pattern is powerful but introduces complexity in managing two separate codebases, development environments, and the API contract between them.&lt;/p>
&lt;p>This article explores an alternative approach: an integrated architecture where the backend API and the frontend UI are served from a single, cohesive Python application.&lt;/p></description><content:encoded><![CDATA[
        <p>The standard architecture for modern web applications involves a decoupled frontend, typically built with a JavaScript framework, and a backend API. This pattern is powerful but introduces complexity in managing two separate codebases, development environments, and the API contract between them.</p>
<p>This article explores an alternative approach: an integrated architecture where the backend API and the frontend UI are served from a single, cohesive Python application.</p>
<p>We will provide a technical analysis of this architecture&rsquo;s implementation by integrating FastAPI and NiceGUI, referencing the complete project template available at <a href="https://github.com/jaehyeon-kim/nicegui-fastapi-template" target="_blank" rel="noopener noreferrer">jaehyeon-kim/nicegui-fastapi-template<i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
<hr>

<h2 id="-version-20-unified-application-architecture" data-numberify>💡 Version 2.0: Unified Application Architecture<a class="anchor ms-1" href="#-version-20-unified-application-architecture"></a></h2>
<p>The initial version was designed with a distinct separation between a FastAPI backend and a NiceGUI frontend, which communicated over HTTP. This new version consolidates the application, leveraging the fact that NiceGUI is built on top of FastAPI. The result is a more tightly integrated structure that allows the UI and API logic to coexist in the same process.</p>

<h3 id="key-architectural-changes" data-numberify>Key Architectural Changes<a class="anchor ms-1" href="#key-architectural-changes"></a></h3>
<ul>
<li><strong>Single FastAPI Instance:</strong> The separate FastAPI server process has been removed. The application now operates on the single FastAPI instance provided by <code>nicegui.app</code>.</li>
<li><strong>Direct Function Calls:</strong> UI event handlers no longer make HTTP requests to the backend with <code>httpx</code>. They now import and call the necessary Python functions from the repository layer directly, removing the network layer for UI-to-backend communication.</li>
<li><strong>Preserved API Endpoints:</strong> The original API, intended for external clients, is maintained. Its <code>APIRouter</code> instances are registered on the main NiceGUI application with <code>include_router</code>, so the JSON endpoints remain available.</li>
<li><strong>Consolidated Codebase:</strong> The <code>frontend</code> and <code>backend</code> directories now live inside a single application package, <code>src</code>. An <code>app.py</code> script at the project root serves as the single entry point.</li>
<li><strong>Shared Logic:</strong> Business logic, such as permission checks and database operations, has been centralized in the repository layer, where it is called by both the UI event handlers and the API endpoints.</li>
</ul>
<p>This updated architecture provides a more direct and cohesive way to build full-stack applications where the UI and backend logic are tightly coupled.</p>
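<p>The shared-logic idea can be illustrated without either framework. The following is a minimal sketch, not the template&rsquo;s actual code: <code>ItemRepository</code>, <code>api_list_items</code>, and <code>ui_refresh_items</code> are hypothetical names, and plain functions stand in for a FastAPI endpoint and a NiceGUI event handler.</p>

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    is_superuser: bool = False


@dataclass
class Item:
    id: int
    owner_id: int
    title: str


@dataclass
class ItemRepository:
    """Hypothetical repository: the one place that holds data access and permission rules."""

    items: list[Item] = field(default_factory=list)

    def list_for(self, user: User) -> list[Item]:
        # Superusers see every item; standard users see only their own.
        if user.is_superuser:
            return list(self.items)
        return [item for item in self.items if item.owner_id == user.id]


repo = ItemRepository([Item(1, 1, "alpha"), Item(2, 2, "beta")])


def api_list_items(current_user: User) -> list[dict]:
    """Stand-in for a JSON endpoint: serialises the repository result."""
    return [vars(item) for item in repo.list_for(current_user)]


def ui_refresh_items(current_user: User) -> list[str]:
    """Stand-in for a UI event handler: the same call, with no HTTP hop."""
    return [item.title for item in repo.list_for(current_user)]
```

<p>Because both callers go through <code>list_for</code>, the permission rule is written once and cannot drift between the API and the UI.</p>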

<h3 id="project-structure" data-numberify>Project Structure<a class="anchor ms-1" href="#project-structure"></a></h3>
<p>The application is structured with a single entry point, <strong><code>app.py</code></strong>, at the project root, and a main source package, <strong><code>src/</code></strong>, which contains all the application&rsquo;s logic. This design provides a clear separation between the runnable script and the installable source code.</p>
<ul>
<li>
<p><strong><code>app.py</code></strong>: This script is the single entry point for the application. It is responsible for creating the main NiceGUI <code>app</code> instance, including all the API routers from <code>src/backend/</code>, importing the UI pages from <code>src/frontend/</code> to register their routes, and starting the web server.</p>
</li>
<li>
<p><strong><code>src/</code></strong>: This directory is the main Python package for the application. It contains all the core logic, API definitions, and UI code, organized into the following modules:</p>
<ul>
<li><strong><code>backend/</code></strong>: Contains the code for the data-only API, intended for external clients.
<ul>
<li><code>endpoints/</code>: Each file defines a set of related API routes (e.g., for items, users, login) using FastAPI&rsquo;s <code>APIRouter</code>.</li>
<li><code>deps.py</code>: Manages FastAPI&rsquo;s dependency injection system for the API, such as providing database sessions or the current authenticated user.</li>
</ul>
</li>
<li><strong><code>core/</code></strong>: Holds application-wide configuration (<code>config.py</code>) and security-related functions like password hashing and token creation (<code>security.py</code>).</li>
<li><strong><code>db/</code></strong>: Manages all database interactions, including engine creation, session management (<code>session.py</code>), and initial database setup (<code>init_db.py</code>).</li>
<li><strong><code>frontend/</code></strong>: Contains all the NiceGUI code for the user interface.
<ul>
<li><code>components/</code>: Holds reusable UI elements and utilities, such as <code>header.py</code>, <code>footer.py</code>, <code>notifications.py</code>, and authentication helpers in <code>auth_utils.py</code>.</li>
<li><code>layouts/</code>: Defines the overall page structure, like the main dashboard frame, ensuring a consistent look and feel.</li>
<li><code>pages/</code>: Each file represents a specific UI view, such as the login screen (<code>login.py</code>) or the item management page (<code>items.py</code>), using the <code>@ui.page</code> decorator.</li>
<li><code>state.py</code>: A module for managing UI-specific state, like the user&rsquo;s authentication token.</li>
</ul>
</li>
<li><strong><code>models/</code></strong>: Contains the SQLModel (and Pydantic) schemas that define database tables and data structures used across the entire application.</li>
<li><strong><code>repositories/</code></strong>: This is the core business logic and data access layer. It abstracts all database queries and contains functions for data manipulation. In this unified architecture, its functions are now called directly by <strong>both</strong> the API endpoints in <code>src/backend/</code> and the UI event handlers in <code>src/frontend/</code>.</li>
</ul>
</li>
</ul>
<hr>

<h2 id="fastapi-backend" data-numberify>FastAPI Backend<a class="anchor ms-1" href="#fastapi-backend"></a></h2>
<p>FastAPI provides a robust foundation for the backend due to several key technical features:</p>
<ul>
<li><strong>ASGI Foundation:</strong> Built on the Asynchronous Server Gateway Interface (ASGI), FastAPI natively supports asynchronous operations. This allows it to handle high-concurrency, I/O-bound tasks, such as network requests and database queries, efficiently without blocking the server.</li>
<li><strong>Schema Generation and Data Validation:</strong> FastAPI uses Pydantic models for strict, type-hint-based data validation and serialization. These models automatically generate OpenAPI schemas, which power the interactive API documentation (via Swagger UI and ReDoc) and ensure that the API contract is clearly defined and enforced.</li>
<li><strong>Dependency Injection System:</strong> Its dependency injection system is a core feature that enhances modularity and testability. It allows for the management of dependencies like database sessions and authentication credentials, ensuring that resources are correctly provisioned and cleaned up for each request.</li>
</ul>
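<p>The benefit of the asynchronous model for I/O-bound work can be seen with the standard library alone. This is a simplified sketch rather than FastAPI code: <code>fake_db_query</code> is a hypothetical coroutine in which <code>asyncio.sleep</code> stands in for a non-blocking database or network call.</p>

```python
import asyncio
import time


async def fake_db_query(name: str, delay: float = 0.2) -> str:
    # asyncio.sleep stands in for awaiting a database or network response.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def handle_requests() -> list[str]:
    # Three "requests" wait on I/O concurrently on a single thread.
    return await asyncio.gather(*(fake_db_query(f"req-{i}") for i in range(3)))


start = time.perf_counter()
results = asyncio.run(handle_requests())
elapsed = time.perf_counter() - start  # close to one delay, not the sum of three
```

<p>An ASGI server applies the same principle to real requests: while one handler awaits I/O, the event loop is free to serve others.</p>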

<h2 id="nicegui-frontend" data-numberify>NiceGUI Frontend<a class="anchor ms-1" href="#nicegui-frontend"></a></h2>
<p>NiceGUI serves as the frontend component, allowing for UI development entirely within Python.</p>
<ul>
<li><strong>Pythonic Abstraction of Web Technologies:</strong> NiceGUI functions as an abstraction layer that generates the necessary HTML, CSS, and JavaScript from Python objects and methods. This allows developers to define complex user interfaces without writing client-side code directly.</li>
<li><strong>Server-Side Event Handling:</strong> The framework employs an event-driven model where UI components are bound to Python callback functions. User interactions (e.g., button clicks, form submissions) trigger these functions, which execute on the server. This creates a direct and clear link between a UI event and its corresponding backend logic.</li>
<li><strong>Server-Maintained State:</strong> Unlike JavaScript frontend frameworks that manage state on the client, NiceGUI maintains the UI state within the server&rsquo;s Python process. This simplifies application logic, as there is no need for complex state synchronization mechanisms between client and server.</li>
</ul>

<h2 id="architectural-advantages-and-framework-comparisons" data-numberify>Architectural Advantages and Framework Comparisons<a class="anchor ms-1" href="#architectural-advantages-and-framework-comparisons"></a></h2>

<h3 id="the-integrated-server-approach" data-numberify>The Integrated Server Approach<a class="anchor ms-1" href="#the-integrated-server-approach"></a></h3>
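<p>In this architecture, NiceGUI and FastAPI share one application object. NiceGUI can be attached to an existing FastAPI instance with a single call to <code>ui.run_with</code>, or, as in version 2.0 of the template, the API routers can be included in the FastAPI instance that NiceGUI already provides. Either way, one process serves both the API endpoints and the user interface.</p>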
<p>The primary benefit is the ability to use shared data models (e.g., SQLModel or Pydantic) across the entire stack. A model defined once can be used to structure a database table, validate an API request payload, and define the data contract for the UI. This ensures end-to-end data consistency and reduces code duplication, as all parts of the application are built around the same data structures.</p>
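<p>To make the &ldquo;define once, use everywhere&rdquo; idea concrete, here is a small sketch that uses only the standard library. It is an approximation under stated assumptions: the template relies on SQLModel and Pydantic, whereas this example substitutes a dataclass with hand-written validation and an in-memory SQLite table. <code>ItemCreate</code> and <code>save</code> are hypothetical names.</p>

```python
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class ItemCreate:
    """Hypothetical shared model (a stand-in for a SQLModel/Pydantic schema)."""

    title: str
    description: str = ""

    def __post_init__(self) -> None:
        # Minimal validation standing in for Pydantic's type-driven checks.
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")


def save(conn: sqlite3.Connection, item: ItemCreate) -> int:
    # Database layer: column names are derived from the model's fields.
    cols = [f.name for f in fields(item)]
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO item ({', '.join(cols)}) VALUES ({placeholders})"
    cursor = conn.execute(sql, tuple(asdict(item).values()))
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")

# API layer: an incoming JSON payload is validated by constructing the model.
payload = {"title": "First item", "description": "created via API"}
api_id = save(conn, ItemCreate(**payload))

# UI layer: a form handler builds the very same model from widget values.
ui_id = save(conn, ItemCreate(title="Second item"))

rows = conn.execute("SELECT id, title FROM item ORDER BY id").fetchall()
```

<p>Both entry points fail in the same way on bad input, because the validation lives in the model rather than in either caller.</p>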

<h3 id="nicegui-vs-javascript-frameworks-eg-react-vue" data-numberify>NiceGUI vs. JavaScript Frameworks (e.g., React, Vue)<a class="anchor ms-1" href="#nicegui-vs-javascript-frameworks-eg-react-vue"></a></h3>
<ul>
<li><strong>Bridging the Expertise Gap:</strong> Acquiring deep expertise in both backend Python and a modern JavaScript frontend framework is a significant undertaking. NiceGUI directly addresses this by enabling Python developers to build sophisticated user interfaces without leaving the Python ecosystem. This allows them to leverage their existing skills rather than learning a new language and its complex toolchain.</li>
<li><strong>Leveraging Mature Frontend Technologies:</strong> NiceGUI is not a proprietary UI system built from scratch. It is built on top of the robust and widely-used Vue and Quasar frameworks. This provides the best of both worlds: developers interact with a simple Pythonic API while benefiting from the power, performance, and rich component library of a mature frontend technology.</li>
<li><strong>Trade-offs:</strong> The server-side rendering approach is highly efficient for internal tools and data-heavy applications. However, because every interaction requires a round-trip to the server, it may introduce latency on highly interactive UIs compared to a client-side Single Page Application (SPA), which can handle many state changes without network requests.</li>
</ul>

<h3 id="nicegui-vs-streamlit" data-numberify>NiceGUI vs. Streamlit<a class="anchor ms-1" href="#nicegui-vs-streamlit"></a></h3>
<ul>
<li><strong>Control and Layout:</strong> NiceGUI provides more granular control over UI component placement and application layout, using rows, columns, and grids. This makes it well-suited for building applications with a structured, traditional design. Streamlit is more opinionated, favoring a simple, top-to-bottom script execution model that is excellent for linear data narratives but offers less layout flexibility.</li>
<li><strong>Event Handling and Execution Model:</strong> Both frameworks use callbacks, but their underlying execution models differ significantly. NiceGUI uses a persistent component model where an event, such as a button click, executes a specific callback function (<code>on_click=handle_click</code>). This function is the <em>only</em> code that runs and is responsible for explicitly updating any UI elements. This aligns closely with traditional GUI programming paradigms. In contrast, Streamlit uses a script re-run model. While it also has an <code>on_click</code> callback, this function typically modifies a session state object. After the callback completes, Streamlit <strong>re-runs the entire application script from top to bottom</strong>. The UI is then re-rendered based on the new values in the session state. This model simplifies the creation of linear, data-centric apps but can be less direct and potentially less performant for managing complex, multi-state interfaces compared to NiceGUI&rsquo;s explicit event-driven approach.</li>
<li><strong>Integration:</strong> NiceGUI is designed to be a component that can be integrated with standard web frameworks like FastAPI. Streamlit is generally used as a self-contained application server and is less straightforward to embed within another ASGI application.</li>
</ul>
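<p>The difference between the two execution models can be shown with a toy simulation. Neither class below uses the real NiceGUI or Streamlit API; <code>PersistentCounter</code> and <code>RerunCounter</code> are hypothetical stand-ins that only count how often the page-building code executes.</p>

```python
class PersistentCounter:
    """NiceGUI-style: build the UI once, then run only the clicked handler."""

    def __init__(self) -> None:
        self.builds = 0
        self.count = 0
        self.label = ""
        self.build()

    def build(self) -> None:
        self.builds += 1
        self.label = f"count = {self.count}"

    def on_click(self) -> None:
        # Only this handler runs; it updates the label explicitly.
        self.count += 1
        self.label = f"count = {self.count}"


class RerunCounter:
    """Streamlit-style: a callback mutates session state, then the script re-runs."""

    def __init__(self) -> None:
        self.session_state = {"count": 0}
        self.script_runs = 0
        self.label = ""
        self.run_script()

    def run_script(self) -> None:
        # The whole "script" executes again and re-renders from session state.
        self.script_runs += 1
        self.label = f"count = {self.session_state['count']}"

    def on_click(self) -> None:
        self.session_state["count"] += 1
        self.run_script()


persistent, rerun = PersistentCounter(), RerunCounter()
for _ in range(3):
    persistent.on_click()
    rerun.on_click()
```

<p>After three clicks both labels show the same value, but the persistent model has built its page once while the re-run model has executed its script four times. For a short script that cost is negligible; for a page with expensive setup it is the trade-off described above.</p>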

<h2 id="implementation-overview" data-numberify>Implementation Overview<a class="anchor ms-1" href="#implementation-overview"></a></h2>
<p>To provide a concrete example of this architecture, the reference repository at <a href="https://github.com/jaehyeon-kim/nicegui-fastapi-template" target="_blank" rel="noopener noreferrer">jaehyeon-kim/nicegui-fastapi-template<i class="fas fa-external-link-square-alt ms-1"></i></a> contains a fully functional application. Let&rsquo;s examine its structure and key components.</p>

<h3 id="project-structure-1" data-numberify>Project Structure<a class="anchor ms-1" href="#project-structure-1"></a></h3>
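<p>The initial version of the project is organized into two main directories, ensuring a clear separation of concerns between the backend logic and the user interface (version 2.0 moves both under <code>src/</code>, as described in the earlier Project Structure section):</p>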
<ul>
<li><strong><code>backend/</code></strong>: This directory contains all the FastAPI source code. It is further subdivided into modules for handling specific responsibilities:
<ul>
<li><code>api/</code>: Defines the API endpoints for resources like users and items.</li>
<li><code>db/</code>: Manages the database session and engine configuration.</li>
<li><code>models/</code>: Contains the SQLModel (and Pydantic) data schemas that define our database tables and API data structures.</li>
<li><code>repositories/</code>: Implements the data access layer, abstracting the database queries from the API endpoints.</li>
</ul>
</li>
<li><strong><code>frontend/</code></strong>: This directory holds all the NiceGUI code for the user interface, organized by function into the following subdirectories and modules:
<ul>
<li><strong><code>components/</code></strong>: Contains reusable UI elements and helper functions. This includes visual components like <code>header.py</code> and <code>footer.py</code>, as well as utility modules like <code>notifications.py</code> for displaying messages to the user and <code>form_helpers.py</code> for handling common form logic.</li>
<li><strong><code>layouts/</code></strong>: Defines the overall structure of the application&rsquo;s pages. The <code>default.py</code> file, for instance, assembles the header, drawer, and footer to ensure a consistent look and feel across different views.</li>
<li><strong><code>pages/</code></strong>: Each file in this directory represents a specific page or view within the application, such as the login screen (<code>login.py</code>), the item management page (<code>items.py</code>), and the user creation form (<code>create_user.py</code>).</li>
<li><strong><code>state.py</code></strong>: A dedicated module for managing the application&rsquo;s client-side state, such as user authentication status. This allows different parts of the UI to react consistently to changes in state.</li>
</ul>
</li>
<li><strong><code>backend/main.py</code></strong>: This file serves as the central integration point. It initializes the FastAPI application, includes the API routers from the <code>backend/api/</code> directory, and finally, mounts the entire NiceGUI frontend. This is where the two parts of the application become one.</li>
</ul>

<h3 id="application-demo" data-numberify>Application Demo<a class="anchor ms-1" href="#application-demo"></a></h3>
<p>The repository provides a complete user and item management application that showcases role-based access control. The functionality is divided between two user roles:</p>
<ul>
<li>
<p><strong>Standard User:</strong> After logging in, a standard user has full CRUD (Create, Read, Update, Delete) permissions over their own items. They can add new items, view their list, and edit or delete them as needed.</p>
</li>
<li>
<p><strong>Superuser:</strong> A superuser has elevated privileges. In addition to managing their own items, they can also create new user accounts. Crucially, they have a global view of the system and can manage the items belonging to any user, making this role suitable for administrative purposes.</p>
</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-11-19-fastapi-nicegui-template/featured.gif" loading="lazy" width="2614" height="1794" />
</picture>

</p>
<p>The demo showcases these distinct workflows, illustrating how the NiceGUI frontend dynamically adapts to the user&rsquo;s permissions, which are enforced by the FastAPI backend.</p>

<h3 id="automatic-api-documentation" data-numberify>Automatic API Documentation<a class="anchor ms-1" href="#automatic-api-documentation"></a></h3>
<p>One of the most powerful features of FastAPI is its ability to automatically generate interactive API documentation from the Pydantic models and endpoint definitions. The application provides two documentation interfaces out of the box:</p>
<ul>
<li><strong>Swagger UI (<code>/docs</code>)</strong>: A feature-rich, interactive interface that allows developers to not only view the API endpoints but also test them directly from the browser by sending live requests.</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-11-19-fastapi-nicegui-template/docs.png" loading="lazy" width="1312" height="881" />
</picture>

</p>
<ul>
<li><strong>ReDoc (<code>/redoc</code>)</strong>: A clean, read-only documentation page that presents the API in a more traditional, hierarchical format. It is excellent for quickly referencing endpoints and their schemas.</li>
</ul>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-11-19-fastapi-nicegui-template/redoc.png" loading="lazy" width="1314" height="837" />
</picture>

</p>
<p>These auto-generated documents are invaluable for development, testing, and collaboration, and they are created without any extra effort, thanks to FastAPI&rsquo;s adherence to the OpenAPI standard.</p>

<h2 id="summary-and-use-cases" data-numberify>Summary and Use Cases<a class="anchor ms-1" href="#summary-and-use-cases"></a></h2>
<p>The integration of FastAPI and NiceGUI provides a robust architecture for building web applications entirely in Python. It streamlines development by creating a unified environment, simplifies deployment to a single process, and ensures strong data consistency through the use of shared models.</p>
<p>This architecture is exceptionally well-suited for:</p>
<ul>
<li><strong>Internal Tools and Administrative Dashboards:</strong> Where rapid development and ease of maintenance are critical.</li>
<li><strong>Rapid Prototyping and MVPs:</strong> To quickly build and validate a functional application.</li>
<li><strong>Machine Learning and Data Science Demos:</strong> To create interactive interfaces for models without requiring frontend expertise.</li>
</ul>
<p>While it may not be the optimal choice for every project, particularly those requiring complex client-side interactivity, it offers a powerful and efficient alternative for a significant class of web applications. For a practical implementation, refer to the project code at <a href="https://github.com/jaehyeon-kim/nicegui-fastapi-template" target="_blank" rel="noopener noreferrer">jaehyeon-kim/nicegui-fastapi-template<i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
      ]]></content:encoded></item><item><title>Self-service Data Platform via a Multi-tenant SQL Gateway</title><link>https://jaehyeon.me/blog/2025-07-17-self-service-data-platform-via-sql-gateway/</link><guid>https://jaehyeon.me/blog/2025-07-17-self-service-data-platform-via-sql-gateway/</guid><pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate><description>
In the modern data stack, providing direct access to powerful engines like Apache Spark and Flink is a double-edged sword. While it empowers users, it often leads to chaos: resource contention from &amp;ldquo;noisy neighbors,&amp;rdquo; inconsistent security enforcement, and operational fragility. The core problem is the lack of a robust control plane between users and the raw compute power. The solution, therefore, isn&amp;rsquo;t to take power away from users, but to manage it through an intelligent intermediary.</description><content:encoded><![CDATA[
        <p>In the modern data stack, providing direct access to powerful engines like Apache Spark and Flink is a double-edged sword. While it empowers users, it often leads to chaos: resource contention from &ldquo;noisy neighbors,&rdquo; inconsistent security enforcement, and operational fragility. The core problem is the lack of a robust control plane between users and the raw compute power. The solution, therefore, isn&rsquo;t to take power away from users, but to manage it through an intelligent intermediary.</p>
<p>This is where a gateway-centric architecture comes in. By placing a specialized gateway at the heart of the platform, we can transform this chaos into a stable, governed, and self-service system. The Apache Kyuubi project provides a perfect blueprint for this approach, defining its role as:</p>
<blockquote>
<p><strong>Apache Kyuubi is a distributed, multi-tenant gateway providing unified access to big data engines like Spark and Flink via JDBC/ODBC for interactive queries and REST APIs for programmatic submissions.</strong></p>
</blockquote>
<p>This definition is more than just a product description; it&rsquo;s a strategic vision. By embracing this model, we can finally deliver on the promise of self-service data access without sacrificing stability or governance. Let&rsquo;s explore exactly how each layer of the platform is designed to support this powerful concept.</p>

<h2 id="architecture-a-layer-by-layer-breakdown" data-numberify>Architecture: A Layer-by-Layer Breakdown<a class="anchor ms-1" href="#architecture-a-layer-by-layer-breakdown"></a></h2>
<p><picture><img class="img-fluid mx-auto d-block" alt="Self-Service Data Platform" src="/blog/2025-07-17-self-service-data-platform-via-sql-gateway/featured.png" loading="lazy" width="915" height="766" />
</picture>

</p>

<h3 id="1-client-layer-simplicity-through-standardization" data-numberify>1. Client Layer: Simplicity Through Standardization<a class="anchor ms-1" href="#1-client-layer-simplicity-through-standardization"></a></h3>
<p><em>How do users connect? Via interactive SQL clients or programmatic APIs.</em></p>
<p>The primary goal of a self-service platform is to meet users where they are. The gateway makes this possible by providing multiple, standardized entry points, ensuring that both interactive users and automated systems can leverage the platform&rsquo;s power. Kyuubi provides two primary connection methods: the Thrift API for interactive SQL and the REST API for programmatic batch workloads.</p>

<h4 id="thrift-api-gateway-for-interactive-analytics" data-numberify>Thrift API: Gateway for Interactive Analytics<a class="anchor ms-1" href="#thrift-api-gateway-for-interactive-analytics"></a></h4>
<p>This is the most established and widely used connection method, ideal for analysts and data scientists using standard BI and SQL tools.</p>
<ul>
<li><strong>Primary Use Case:</strong> Interactive, ad-hoc queries and data exploration.</li>
<li><strong>How it Works:</strong> Kyuubi implements the HiveServer2 Thrift protocol, a widely adopted wire protocol for SQL on big data engines. This allows any tool that can communicate with Hive or Spark SQL to connect to Kyuubi using a standard JDBC/ODBC driver.</li>
<li><strong>Tools:</strong> Tableau, Power BI, DBeaver, Jupyter Notebooks (with PyHive), and other SQL-native clients.</li>
<li><strong>Gateway&rsquo;s Role:</strong> From the user&rsquo;s perspective, they are connecting to a traditional database. They don&rsquo;t need engine-specific drivers or complex connection strings. The immense complexity of the underlying Spark or Flink engines is completely abstracted away behind one familiar JDBC/ODBC endpoint. The Thrift interface uses long-lived connections, making it well-suited for the back-and-forth nature of an interactive session.</li>
</ul>
<blockquote>
<p>💡 <strong>Note:</strong> While tools typically use JDBC or ODBC drivers, these drivers communicate over the Thrift protocol, which is why this interface is called the &ldquo;Thrift API&rdquo;.</p>
</blockquote>

<h4 id="rest-api-interface-for-programmatic-workflows" data-numberify>REST API: Interface for Programmatic Workflows<a class="anchor ms-1" href="#rest-api-interface-for-programmatic-workflows"></a></h4>
<p>This interface is designed for automation, batch job submission, and integration into larger data pipelines.</p>
<ul>
<li><strong>Primary Use Case:</strong> Submitting and managing batch jobs (e.g., SQL, Scala, Python, or JAR files) programmatically. This is perfect for ETL/ELT workflows, scheduled reports, and machine learning model training.</li>
<li><strong>How it Works:</strong> The REST API operates over standard, short-lived HTTP connections. Clients can submit a job definition (including the code and execution parameters) to a REST endpoint. Kyuubi then manages the entire lifecycle of the batch job on the user&rsquo;s behalf. This stateless approach is ideal for automation and works well with load balancers in a high-availability setup.</li>
<li><strong>Tools:</strong>
<ul>
<li><strong>Programmatic Clients:</strong> Custom applications, workflow orchestrators like Apache Airflow, or scripts using tools like <code>curl</code>.</li>
<li><strong>Command-Line Tool:</strong> Kyuubi provides <code>kyuubi-ctl</code>, a command-line interface that uses the REST API for managing batch jobs via YAML files.</li>
</ul>
</li>
</ul>
<blockquote>
<p>💡 <strong>Tip:</strong> Because of its stateless nature, the REST API is particularly well-suited for distributed environments and works seamlessly with CI/CD systems and job schedulers.</p>
</blockquote>
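<p>As a rough illustration of a programmatic submission, the sketch below builds (but does not send) a batch request with the standard library. It assumes Kyuubi&rsquo;s batch endpoint at <code>/api/v1/batches</code> and the REST frontend&rsquo;s default port of 10099; the host name, credentials, job name, JAR path, and class name are all hypothetical placeholders, and the exact fields should be checked against the REST API documentation for your Kyuubi version.</p>

```python
import base64
import json
import urllib.request


def build_batch_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a POST request describing a Spark batch job (all values are illustrative)."""
    body = {
        "batchType": "SPARK",
        "name": "nightly-report",                    # hypothetical job name
        "resource": "local:///opt/jobs/report.jar",  # hypothetical artifact path
        "className": "com.example.Report",           # hypothetical main class
        "conf": {"spark.executor.memory": "2g"},
        "args": ["--date", "2025-07-17"],
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_batch_request("http://kyuubi.example.internal:10099", "analyst", "secret")
# Sending it is one short-lived HTTP call:
#     with urllib.request.urlopen(request) as response: ...
```

<p>Because each call carries everything the server needs, the same request can be issued from a scheduler, a CI job, or a script without holding a session open.</p>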

<h3 id="2-gateway-layer-multi-tenant-control-plane" data-numberify>2. Gateway Layer: Multi-Tenant Control Plane<a class="anchor ms-1" href="#2-gateway-layer-multi-tenant-control-plane"></a></h3>
<p><em>This is the heart of the architecture, where raw compute power is transformed into a stable, governed service.</em></p>
<p>This layer is where Apache Kyuubi lives. It is not merely a pass-through proxy; it is an intelligent, distributed control plane responsible for security, stability, and resource management. It acts as the indispensable intermediary between a diverse set of users and a powerful but complex set of backend engines.</p>

<h4 id="high-availability-and-fault-tolerance" data-numberify>High Availability and Fault Tolerance<a class="anchor ms-1" href="#high-availability-and-fault-tolerance"></a></h4>
<p>A gateway cannot be a single point of failure. Kyuubi is designed for resilience and can be deployed in a high-availability (HA) configuration. You run multiple stateless Kyuubi server instances behind a load balancer. Session information and operational metadata are stored in a shared, external state store (like ZooKeeper or etcd). If one Kyuubi server goes down, the load balancer redirects clients to a healthy instance, which can then recover the user&rsquo;s session state from the shared store, providing seamless failover for end-users.</p>

<h4 id="centralized-authentication-hub" data-numberify>Centralized Authentication Hub<a class="anchor ms-1" href="#centralized-authentication-hub"></a></h4>
<p>Before any query is processed, the user must be authenticated. The gateway serves as the single, secure entry point for the entire platform. It integrates directly with standard enterprise authentication protocols like <strong>LDAP</strong> and <strong>Kerberos</strong>. This centralizes all authentication logic. Instead of exposing multiple engines and configuring security on each one, you secure one endpoint: the gateway. This simplifies client configuration and dramatically reduces the security attack surface.</p>

<h4 id="true-multi-tenancy-through-dynamic-resource-isolation" data-numberify>True Multi-Tenancy through Dynamic Resource Isolation<a class="anchor ms-1" href="#true-multi-tenancy-through-dynamic-resource-isolation"></a></h4>
<p>This is Kyuubi&rsquo;s most critical feature. It enforces multi-tenancy not just logically, but physically.</p>
<ul>
<li>
<p><strong>Engine-per-User Isolation:</strong> When a user connects and runs a query, Kyuubi does not send it to a shared, monolithic cluster. Instead, it dynamically provisions a dedicated compute engine (e.g., a Spark application on Kubernetes) scoped <strong>specifically to that user or their session</strong>. This is the key to true multi-tenancy: one user&rsquo;s poorly written query consuming 100% of its engine&rsquo;s CPU has zero impact on the performance or stability of another user&rsquo;s engine.</p>
</li>
<li>
<p><strong>Configurable Sharing Levels:</strong> Kyuubi also offers configurable engine share levels (<code>kyuubi.engine.share.level</code>) to balance isolation and performance. The default, <code>USER</code>, shares one engine across all of a user&rsquo;s sessions. <code>CONNECTION</code> gives every connection its own engine for the strongest isolation, while <code>GROUP</code> and <code>SERVER</code> share an engine more broadly for specific use cases, giving architects fine-grained control over resource trade-offs.</p>
</li>
</ul>
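<p>One way to reason about share levels is to ask under which identity an engine is reused. The sketch below is a simplified model, not Kyuubi&rsquo;s implementation: <code>engine_key</code> is a hypothetical helper, and the level names follow the values of the <code>kyuubi.engine.share.level</code> setting (<code>CONNECTION</code>, <code>USER</code>, <code>GROUP</code>, <code>SERVER</code>).</p>

```python
from enum import Enum


class ShareLevel(Enum):
    CONNECTION = "CONNECTION"
    USER = "USER"
    GROUP = "GROUP"
    SERVER = "SERVER"


def engine_key(level: ShareLevel, user: str, group: str, connection_id: str) -> str:
    """Return the identity under which an engine would be reused (illustrative only)."""
    if level is ShareLevel.CONNECTION:
        return f"connection:{connection_id}"
    if level is ShareLevel.USER:
        return f"user:{user}"
    if level is ShareLevel.GROUP:
        return f"group:{group}"
    return "server"


# Three sessions: two from alice, one from bob, all in the same group.
sessions = [("alice", "analytics", "c1"), ("alice", "analytics", "c2"), ("bob", "analytics", "c3")]

# Number of distinct engines each level would need for these sessions.
engines_per_level = {
    level.name: len({engine_key(level, *session) for session in sessions})
    for level in ShareLevel
}
```

<p>For these three sessions the model needs three engines at <code>CONNECTION</code>, two at <code>USER</code>, and one at <code>GROUP</code> or <code>SERVER</code>, which is the isolation-versus-cost trade-off in miniature.</p>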

<h4 id="unified-interface-and-intelligent-routing" data-numberify>Unified Interface and Intelligent Routing<a class="anchor ms-1" href="#unified-interface-and-intelligent-routing"></a></h4>
<p>The gateway abstracts away the diversity of the backend. A user connects to a single JDBC/ODBC endpoint and speaks standard SQL. They don&rsquo;t need to know which engine is best for their job.</p>
<p>Based on session properties set by the user or administrator (e.g., <code>kyuubi.engine.type=FLINK_SQL</code>), the gateway intelligently routes the request. It accepts the standard SQL, identifies the target engine type, manages the entire lifecycle of that engine (provisioning, query submission, termination), and streams the results back. This turns a complex ecosystem of Spark, Flink, and Trino into a single, unified, and easy-to-use SQL service for the entire organization.</p>

<h3 id="3-compute-layer-powerhouse-engines" data-numberify>3. Compute Layer: Powerhouse Engines<a class="anchor ms-1" href="#3-compute-layer-powerhouse-engines"></a></h3>
<p><em>What does the gateway connect to? Isolated, on-demand Spark, Flink, and Trino engines.</em></p>
<p>This is where the actual data processing occurs. The compute layer consists of powerful, specialized engines, but they are no longer directly exposed to users. Instead, they are treated as a backend resource, managed entirely by the gateway based on the workload.</p>
<ul>
<li><strong>Engines:</strong>
<ul>
<li><strong>Apache Spark:</strong> The workhorse for large-scale batch SQL processing and ETL.</li>
<li><strong>Apache Flink:</strong> The engine for real-time, continuous SQL queries on streaming data.</li>
<li><strong>Trino:</strong> The engine for high-performance, interactive federated queries across disparate data sources.</li>
</ul>
</li>
</ul>

<h4 id="gateways-role-a-lifecycle-manager-for-compute" data-numberify>Gateway&rsquo;s Role: A Lifecycle Manager for Compute<a class="anchor ms-1" href="#gateways-role-a-lifecycle-manager-for-compute"></a></h4>
<p>Kyuubi&rsquo;s most important function is to act as an intelligent and dynamic lifecycle manager for these powerful engines. It doesn&rsquo;t just proxy queries; it completely abstracts the complexity of resource management away from the end-user. Here’s how:</p>
<ul>
<li>
<p><strong>On-Demand, User-Scoped Provisioning:</strong> When a user connects and runs their first query, Kyuubi intercepts the request. It authenticates the user and, based on predefined rules, submits a request to a resource manager (like Kubernetes or YARN) to launch a brand new engine instance <strong>specifically for that user or session</strong>. This is the core of multi-tenancy: the engine is created on behalf of the user, runs with their permissions, and is completely isolated from other users&rsquo; engines. This eliminates the &ldquo;noisy neighbor&rdquo; problem, where one user&rsquo;s heavy query can destabilize the entire platform.</p>
</li>
<li>
<p><strong>Intelligent Caching and Sharing:</strong> While isolating engines is key, constantly launching new ones can introduce latency. Kyuubi manages this intelligently. It can be configured to keep a warm pool of engines ready or to share an engine across multiple queries from the <strong>same user</strong> within a single session. This provides the performance of a long-running session without sacrificing the isolation between different users.</p>
</li>
<li>
<p><strong>Automatic Termination and Cost Control:</strong> Kyuubi is also responsible for cleanup. It constantly monitors the engines it has launched. If an engine sits idle for a configured period (e.g., 30 minutes), Kyuubi will automatically terminate it and release its resources back to the cluster. This is absolutely critical for cost-efficiency in a cloud environment, ensuring you only pay for compute resources when they are actively being used.</p>
</li>
<li>
<p><strong>Seamless Abstraction:</strong> From the user&rsquo;s perspective, this entire lifecycle is invisible. They write and execute standard SQL. They don&rsquo;t need to know how to write a <code>spark-submit</code> command, craft a Kubernetes pod YAML file, or worry about memory allocation. Kyuubi handles the translation from a simple SQL query to a complex, resource-managed job on a distributed engine, ensuring the right tool is used for the right job without exposing any of the underlying complexity.</p>
</li>
</ul>
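<p>The lifecycle described above can be sketched as a small in-memory model. This is a conceptual toy, not Kyuubi&rsquo;s implementation: <code>Engine</code> and <code>EngineLifecycleManager</code> are hypothetical names, and a real gateway would talk to Kubernetes or YARN instead of a map. It only shows the three behaviours together: launch on first use, reuse for the same user, and release after an idle timeout.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Duration
import java.time.Instant

// Hypothetical stand-in for a running Spark or Flink engine; not a Kyuubi class.
data class Engine(val id: Int, val owner: String, var lastUsed: Instant)

// Toy model of the lifecycle above: one engine per user, reused while warm,
// released once it has been idle for the configured timeout.
class EngineLifecycleManager(private val idleTimeout: Duration) {
    private val enginesByUser = mutableMapOf&lt;String, Engine&gt;()
    private var nextId = 1

    // On-demand, user-scoped provisioning: launch on first use, reuse afterwards.
    fun engineFor(user: String, now: Instant): Engine {
        val engine = enginesByUser.getOrPut(user) { Engine(nextId++, user, now) }
        engine.lastUsed = now
        return engine
    }

    // Automatic termination: remove and return every engine idle for at least the timeout.
    fun reapIdle(now: Instant): List&lt;Engine&gt; {
        val idle = enginesByUser.values.filter { Duration.between(it.lastUsed, now) &gt;= idleTimeout }
        idle.forEach { enginesByUser.remove(it.owner) }
        return idle
    }
}

fun main() {
    val manager = EngineLifecycleManager(idleTimeout = Duration.ofMinutes(30))
    val t0 = Instant.parse(&#34;2025-01-01T00:00:00Z&#34;)

    val first = manager.engineFor(&#34;alice&#34;, t0)
    val again = manager.engineFor(&#34;alice&#34;, t0.plusSeconds(60))   // reuses alice&#39;s engine
    val other = manager.engineFor(&#34;bob&#34;, t0.plusSeconds(1200))   // separate engine for bob
    println(first.id == again.id)  // true
    println(first.id == other.id)  // false

    // 31 minutes after t0, alice&#39;s engine has been idle for 30 minutes and bob&#39;s for 11.
    val reaped = manager.reapIdle(t0.plusSeconds(31 * 60))
    println(reaped.map { it.owner })  // [alice]
}
</code></pre></div>
<p>Passing the clock in as a parameter keeps the sketch deterministic; a real manager would run the idle check on a timer.</p>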

<h3 id="4-governance-layer-centralized-control-at-the-gateway" data-numberify>4. Governance Layer: Centralized Control at the Gateway<a class="anchor ms-1" href="#4-governance-layer-centralized-control-at-the-gateway"></a></h3>
<p><em>How do we enforce rules? The gateway is the natural &ldquo;chokepoint&rdquo; for governance.</em></p>
<p>In a self-service world, governance must be automated. The gateway architecture makes this feasible because every single query and every user session must pass through it. This creates a natural control point for applying rules, though the specific capabilities can vary by the backend engine.</p>

<h4 id="authorization-an-engine-specific-approach" data-numberify>Authorization: An Engine-Specific Approach<a class="anchor ms-1" href="#authorization-an-engine-specific-approach"></a></h4>
<p>The gateway acts as the central point for authenticating users and integrating with an authorization engine like <strong>Apache Ranger</strong>. However, the actual <em>enforcement</em> of security policies is delegated to the backend compute engines. This approach leverages the native security features of each engine, meaning the depth of integration differs across the platform, which is a critical architectural consideration.</p>
<ul>
<li><strong>For Apache Spark:</strong> The integration is the most mature and powerful. Kyuubi provides a specialized <strong>Kyuubi Spark Authz Plugin</strong> that deeply integrates with Apache Ranger. This plugin allows Ranger to enforce fine-grained policies—including <strong>row-level filtering and column-level masking</strong>—directly within the Spark engine for all Spark SQL queries. The gateway authenticates the user, launches a Spark engine on their behalf, and the engine then uses the Authz plugin to check with Ranger and apply policies before execution. This provides robust, centralized security for all Spark-based workloads.</li>
<li><strong>For Apache Flink:</strong> The story is more about isolation and coarse-grained control. Kyuubi provides essential security for Flink through <strong>robust authentication and session isolation</strong>. This guarantees that only authenticated users can submit jobs and that their Flink sessions are isolated from others. However, fine-grained, policy-based authorization for Flink jobs is not handled by a Kyuubi plugin. Instead, security must be enforced at the data source level. A common pattern is to use Flink&rsquo;s <code>HiveCatalog</code> to read data from Hive tables, where Apache Ranger&rsquo;s existing Hive-level policies can be applied for access control.</li>
<li><strong>For Trino:</strong> Authorization is handled by delegating to Trino&rsquo;s own powerful security model. Trino has a native <strong>Apache Ranger plugin</strong> that provides comprehensive, fine-grained access control. This plugin supports policies for catalogs, schemas, tables, and columns, as well as <strong>row-level filtering and column masking</strong>. In this architecture, Kyuubi&rsquo;s role is to authenticate the user and then pass the user&rsquo;s identity securely to the Trino engine. Trino then uses its own Ranger plugin to enforce the centralized policies. This allows organizations to manage Spark and Trino permissions within the same Ranger UI, while the enforcement happens natively within each respective engine.</li>
</ul>

<h4 id="lineage-a-universal-automated-view-of-your-datas-journey" data-numberify>Lineage: A Universal, Automated View of Your Data&rsquo;s Journey<a class="anchor ms-1" href="#lineage-a-universal-automated-view-of-your-datas-journey"></a></h4>
<p>Unlike authorization, which can be engine-specific, data lineage is applied universally and automatically across the platform. This creates a complete, trustworthy audit trail for every query, which is essential for data governance, impact analysis, and debugging complex data flows.</p>
<ul>
<li><strong>How It Works: Automated Agent Injection</strong>
The gateway-centric design is the key to automating lineage collection. Kyuubi is configured to automatically inject a lightweight <strong>OpenLineage</strong> agent into the runtime of every single Spark and Flink engine it provisions. Users do not need to add any special libraries or modify their code; lineage capture is a guaranteed part of the platform&rsquo;s execution process. For Trino, lineage is captured via a separate, native OpenLineage plugin configured directly within the Trino coordinator.</li>
<li><strong>What is Captured? Rich, Actionable Metadata</strong>
As jobs run, the OpenLineage agent observes the execution plan and collects detailed metadata. This isn&rsquo;t just a high-level overview. The agent captures:
<ul>
<li><strong>Job Information:</strong> The name of the job, its start and end times, and whether it succeeded or failed.</li>
<li><strong>Dataset Information (Inputs and Outputs):</strong> The physical sources (e.g., S3 paths, Kafka topics, database tables) and destinations of the data.</li>
<li><strong>Column-Level Lineage (for Spark and Trino):</strong> For supported engines, the agent can trace the dependencies between specific columns, showing exactly how an output column was derived from one or more input columns. This is invaluable for tracking sensitive data and understanding complex transformations.</li>
<li><strong>Operational Statistics:</strong> Metadata about the run itself, such as the number of rows written or bytes processed.</li>
</ul>
</li>
<li><strong>Governance Backend: Visualizing the Flow</strong>
The agent sends this stream of standardized JSON-formatted lineage events to a compatible metadata service, with <strong>Marquez</strong> being the reference implementation for OpenLineage. Marquez consumes this metadata and builds a comprehensive, interactive graph of your entire data ecosystem. This provides a single, unified map showing how all datasets are created and consumed across Spark, Flink, and Trino, answering critical questions:
<ul>
<li>&ldquo;If I change this table schema, what downstream jobs and dashboards will break?&rdquo;</li>
<li>&ldquo;This report looks wrong. Where did the data it uses come from?&rdquo;</li>
<li>&ldquo;Which jobs are processing PII data?&rdquo;</li>
</ul>
</li>
</ul>
<p>By making lineage collection an automatic, non-negotiable part of the platform architecture, governance is no longer an afterthought. It becomes a reliable, built-in feature that provides a complete and auditable view of how data moves and transforms across the entire organization.</p>
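<p>To make the first question concrete, here is a minimal sketch of impact analysis over collected lineage. The <code>LineageEdge</code> shape, the job names, and the dataset names are assumptions for illustration; real OpenLineage events carry much more detail, and Marquez exposes this kind of traversal through its own UI and API instead of code like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Simplified lineage record: one job with the datasets it reads and writes.
// Hypothetical shape, far smaller than a real OpenLineage event.
data class LineageEdge(val job: String, val inputs: List&lt;String&gt;, val outputs: List&lt;String&gt;)

// Breadth-first walk: which jobs consume `dataset`, directly or transitively?
fun downstreamJobs(dataset: String, edges: List&lt;LineageEdge&gt;): Set&lt;String&gt; {
    val affectedJobs = linkedSetOf&lt;String&gt;()
    val visited = mutableSetOf(dataset)
    val queue = ArrayDeque(listOf(dataset))
    while (queue.isNotEmpty()) {
        val current = queue.removeFirst()
        for (edge in edges) {
            if (current in edge.inputs &amp;&amp; affectedJobs.add(edge.job)) {
                // Everything this job writes may be affected as well.
                edge.outputs.filter { visited.add(it) }.forEach { queue.addLast(it) }
            }
        }
    }
    return affectedJobs
}

fun main() {
    val edges = listOf(
        LineageEdge(&#34;spark_daily_orders&#34;, listOf(&#34;lake.orders_raw&#34;), listOf(&#34;lake.orders_daily&#34;)),
        LineageEdge(&#34;trino_revenue_report&#34;, listOf(&#34;lake.orders_daily&#34;, &#34;lake.customers&#34;), listOf(&#34;reports.revenue&#34;)),
        LineageEdge(&#34;flink_click_counts&#34;, listOf(&#34;kafka.clickstream_events&#34;), listOf(&#34;lake.click_counts&#34;)),
    )
    // Changing lake.orders_raw touches the Spark job and, through its output, the Trino report.
    println(downstreamJobs(&#34;lake.orders_raw&#34;, edges))  // [spark_daily_orders, trino_revenue_report]
}
</code></pre></div>
<p>The Flink job is not reported because nothing it reads descends from the changed table, which is exactly the distinction a lineage graph is meant to surface.</p>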

<h3 id="5-catalog--storage-layer-foundation-of-truth" data-numberify>5. Catalog &amp; Storage Layer: Foundation of Truth<a class="anchor ms-1" href="#5-catalog--storage-layer-foundation-of-truth"></a></h3>
<p><em>Where does the data live? In a standard data lake, accessed via the gateway&rsquo;s managed engines.</em></p>
<p>This layer is the physical and logical foundation of the platform. The gateway itself doesn&rsquo;t store data, but it manages the engines that access it.</p>
<ul>
<li><strong>Storage:</strong> Cloud object stores like Amazon S3, GCS, or ADLS.</li>
<li><strong>Table Formats:</strong> Open table formats like <strong>Apache Iceberg</strong>, <strong>Apache Hudi</strong>, <strong>Delta Lake</strong>, and <strong>Apache Paimon</strong>, which provide transactional (ACID) capabilities on top of raw data files in your data lake.</li>
<li><strong>Catalog:</strong> A Hive Metastore shared by the engines, which maps table names to their storage locations.</li>
<li><strong>Gateway&rsquo;s Role:</strong> The gateway ensures that all access to the data lake is mediated. A user cannot simply spin up their own Spark session to bypass rules. For example, to read an Iceberg table, they must submit a SQL query through Kyuubi. The gateway authenticates them and provisions an engine on their behalf; for Spark, that engine&rsquo;s Authz plugin then checks Ranger policies before the read executes.</li>
</ul>

<h2 id="use-case-1-interactive-batch-query-spark" data-numberify>Use Case 1: Interactive Batch Query (Spark)<a class="anchor ms-1" href="#use-case-1-interactive-batch-query-spark"></a></h2>
<p>Let&rsquo;s trace a classic analytics workflow to see the architecture in action.</p>
<ol>
<li>An analyst connects <strong>Tableau</strong> to the <strong>Kyuubi JDBC</strong> endpoint.</li>
<li>They write a SQL query joining several large Iceberg tables. Tableau sends this standard SQL to the <strong>Kyuubi Gateway</strong>.</li>
<li><strong>Kyuubi</strong> receives the query and authenticates the user.</li>
<li>Seeing this is the user&rsquo;s first query, Kyuubi provisions a new, dedicated <strong>Spark application</strong> for them on Kubernetes, injecting the OpenLineage agent.</li>
<li>Kyuubi forwards the SQL to the user&rsquo;s isolated Spark engine, where the Authz plugin checks the user&rsquo;s permissions with <strong>Apache Ranger</strong> and applies any row filters or column masks.</li>
<li>The Spark engine uses the <strong>Hive Metastore</strong> to find the data&rsquo;s location on <strong>S3</strong> and executes the query.</li>
<li>Results flow back <em>from the Spark engine, through the Kyuubi gateway, and finally to Tableau</em>. The entire process is isolated, governed, and transparent to the user.</li>
</ol>

<h2 id="use-case-2-real-time-streaming-query-flink--kafka" data-numberify>Use Case 2: Real-Time Streaming Query (Flink &amp; Kafka)<a class="anchor ms-1" href="#use-case-2-real-time-streaming-query-flink--kafka"></a></h2>
<p>The platform&rsquo;s true power is its ability to handle more than just batch analytics. Let&rsquo;s see how an analyst can query a live stream of clickstream data from <strong>Apache Kafka</strong> using Flink SQL, all through the same gateway.</p>
<p>The user connects their SQL client to the <em>exact same</em> Kyuubi JDBC endpoint. They then define a table backed by a Kafka topic and query it continuously:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- Step 1: Map a table structure onto a live Kafka topic.
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1">-- The underlying Flink engine uses its Kafka connector to handle this.
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">ClickStream</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="o">`</span><span class="n">user_id</span><span class="o">`</span><span class="w"> </span><span class="nb">BIGINT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="o">`</span><span class="n">url</span><span class="o">`</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="o">`</span><span class="n">event_timestamp</span><span class="o">`</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="n">METADATA</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="s1">&#39;timestamp&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="n">WATERMARK</span><span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="o">`</span><span class="n">event_timestamp</span><span class="o">`</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="o">`</span><span class="n">event_timestamp</span><span class="o">`</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">&#39;5&#39;</span><span class="w"> </span><span class="k">SECOND</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="s1">&#39;connector&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;kafka&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">    </span><span class="s1">&#39;topic&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;clickstream_events&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="s1">&#39;properties.bootstrap.servers&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;kafka-broker-1:9092,...&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">    </span><span class="s1">&#39;scan.startup.mode&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;latest-offset&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">    </span><span class="s1">&#39;format&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;json&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w"></span><span class="c1">-- Step 2: Run a continuous query against the live stream.
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1">-- This query will run indefinitely, pushing new results to the client as they arrive.
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">    </span><span class="n">TUMBLE_START</span><span class="p">(</span><span class="n">event_timestamp</span><span class="p">,</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">&#39;1&#39;</span><span class="w"> </span><span class="k">MINUTE</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">window_start</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">    </span><span class="n">url</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">    </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">clicks</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">    </span><span class="n">ClickStream</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w"></span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">    </span><span class="n">TUMBLE</span><span class="p">(</span><span class="n">event_timestamp</span><span class="p">,</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">&#39;1&#39;</span><span class="w"> </span><span class="k">MINUTE</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">    </span><span class="n">url</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><p><strong>How it Works:</strong></p>
<ol>
<li>The user sends the <code>CREATE TABLE</code> and <code>SELECT</code> statements to the <strong>Kyuubi JDBC</strong> endpoint.</li>
<li><strong>Kyuubi</strong> authenticates the user. Based on session properties, it recognizes this as a Flink workload and provisions a dedicated <strong>Flink session cluster</strong> on Kubernetes, again injecting the OpenLineage agent.</li>
<li>Kyuubi submits the SQL to the Flink cluster.</li>
<li>Flink&rsquo;s SQL engine parses the query. It uses its Kafka connector to connect to the <code>clickstream_events</code> topic and begins consuming the JSON data stream.</li>
<li>The <code>SELECT</code> query runs continuously. As new data arrives in Kafka, Flink processes it, calculates the tumbling window aggregates, and streams the updated results back <em>through the Kyuubi gateway to the analyst&rsquo;s SQL client</em>.</li>
<li>Simultaneously, the <strong>OpenLineage</strong> agent reports to <strong>Marquez</strong> that a new data flow has been established, drawing a lineage graph from the Kafka topic to this Flink SQL job.</li>
</ol>
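<p>Step 2 depends on session properties to pick the engine. The sketch below shows one way a client might pass them: as key-value pairs appended to the JDBC URL. Treat the details as assumptions to verify against your setup: the host name is a placeholder, <code>kyuubiJdbcUrl</code> and <code>printFirstColumn</code> are hypothetical helpers, and the exact URL grammar depends on the JDBC driver you use. <code>kyuubi.engine.type</code> is the Kyuubi setting that selects the engine, with values such as <code>SPARK_SQL</code> and <code>FLINK_SQL</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.sql.DriverManager

// Builds a HiveServer2-style JDBC URL and appends session properties after &#39;?&#39;.
// Assumption: the driver accepts properties in this position; check its documentation.
fun kyuubiJdbcUrl(
    host: String,
    port: Int,
    database: String,
    sessionConf: Map&lt;String, String&gt; = emptyMap(),
): String {
    val base = &#34;jdbc:hive2://$host:$port/$database&#34;
    if (sessionConf.isEmpty()) return base
    return base + &#34;?&#34; + sessionConf.entries.joinToString(&#34;;&#34;) { &#34;${it.key}=${it.value}&#34; }
}

// Runs one statement and prints the first column of each row.
// Needs a Hive-compatible JDBC driver on the classpath and a reachable gateway.
fun printFirstColumn(url: String, user: String, sql: String) {
    DriverManager.getConnection(url, user, &#34;&#34;).use { connection -&gt;
        connection.createStatement().use { statement -&gt;
            statement.executeQuery(sql).use { rows -&gt;
                while (rows.next()) println(rows.getString(1))
            }
        }
    }
}

fun main() {
    // Same endpoint for both use cases; only the session property differs.
    val endpoint = &#34;kyuubi.example.com&#34;  // placeholder host
    println(kyuubiJdbcUrl(endpoint, 10009, &#34;default&#34;, mapOf(&#34;kyuubi.engine.type&#34; to &#34;SPARK_SQL&#34;)))
    println(kyuubiJdbcUrl(endpoint, 10009, &#34;default&#34;, mapOf(&#34;kyuubi.engine.type&#34; to &#34;FLINK_SQL&#34;)))
}
</code></pre></div>
<p><code>main</code> only prints the two URLs so it runs without a cluster. Against a live gateway, <code>printFirstColumn</code> could be called with the Flink URL and a bounded query; a continuous query like the one above keeps the result set open until the client cancels it.</p>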

<h2 id="conclusion-the-power-of-a-gateway-centric-design" data-numberify>Conclusion: The Power of a Gateway-Centric Design<a class="anchor ms-1" href="#conclusion-the-power-of-a-gateway-centric-design"></a></h2>
<p>Building a successful self-service platform is not about exposing raw power, but about providing controlled, stable, and governed access. A gateway-centric architecture built around the principles embodied by Apache Kyuubi resolves the conflict between user empowerment and platform stability. The gateway acts as the control plane, turning a potential &ldquo;Wild West&rdquo; of big data engines into a well-regulated, multi-tenant SQL service for the entire organization, covering both batch and real-time workloads.</p>

      ]]></content:encoded></item><item><title>Flink Table API - Declarative Analytics for Supplier Stats in Real Time</title><link>https://jaehyeon.me/blog/2025-06-17-kotlin-getting-started-flink-table/</link><guid>https://jaehyeon.me/blog/2025-06-17-kotlin-getting-started-flink-table/</guid><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In the last post, we explored the fine-grained control of Flink&rsquo;s DataStream API. Now, we&rsquo;ll approach the same problem from a higher level of abstraction using the <strong>Flink Table API</strong>. This post demonstrates how to build a declarative analytics pipeline that processes our continuous stream of Avro-formatted order events. We will define a <code>Table</code> on top of a <code>DataStream</code> and use SQL-like expressions to perform windowed aggregations. This example highlights the power and simplicity of the Table API for analytical tasks and showcases Flink&rsquo;s seamless integration between its different API layers to handle complex requirements like late data.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In the last post, we explored the fine-grained control of Flink&rsquo;s DataStream API. Now, we&rsquo;ll approach the same problem from a higher level of abstraction using the <strong>Flink Table API</strong>. This post demonstrates how to build a declarative analytics pipeline that processes our continuous stream of Avro-formatted order events. We will define a <code>Table</code> on top of a <code>DataStream</code> and use SQL-like expressions to perform windowed aggregations. This example highlights the power and simplicity of the Table API for analytical tasks and showcases Flink&rsquo;s seamless integration between its different API layers to handle complex requirements like late data.</p>
<ul>
<li><a href="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients">Kafka Clients with JSON - Producing and Consuming Order Events</a></li>
<li><a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients">Kafka Clients with Avro - Schema Registry and Order Events</a></li>
<li><a href="/blog/2025-06-03-kotlin-getting-started-kafka-streams">Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">Flink DataStream API - Scalable Event Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-17-kotlin-getting-started-flink-table/#">Flink Table API - Declarative Analytics for Supplier Stats in Real Time</a> (this post)</li>
</ul>

<h2 id="flink-table-application" data-numberify>Flink Table Application<a class="anchor ms-1" href="#flink-table-application"></a></h2>
<p>We develop a Flink application that uses Flink&rsquo;s Table API and SQL-like expressions to perform real-time analytics. This application:</p>
<ul>
<li>Consumes Avro-formatted order events from a Kafka topic.</li>
<li>Uses a mix of the DataStream and Table APIs to prepare data, handle late events, and define watermarks.</li>
<li>Defines a table over the streaming data, complete with an event-time attribute and watermarks.</li>
<li>Runs a declarative, SQL-like query to compute supplier statistics (total price and count) in 5-second tumbling windows.</li>
<li>Splits the stream to route late-arriving records to a separate &ldquo;skipped&rdquo; topic for analysis.</li>
<li>Sinks the aggregated results to a Kafka topic using the built-in <code>avro-confluent</code> format connector.</li>
</ul>
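<p>Before looking at the Flink code, it can help to see what the windowed aggregation computes. The sketch below is plain Kotlin with no Flink dependency, and its <code>Order</code> and <code>SupplierStats</code> classes are simplified, hypothetical stand-ins for the Avro records. It assigns each event to a 5-second tumbling window by event time and sums price and count per supplier. Flink does this incrementally over an unbounded stream and uses watermarks to decide when a window is complete; this version works on a finite list.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical, simplified order event; the real events in this post are Avro records.
data class Order(val supplier: String, val price: Double, val eventTimeMs: Long)

data class SupplierStats(val windowStartMs: Long, val supplier: String, val totalPrice: Double, val count: Int)

// Assigns each order to the tumbling window that contains its event time,
// then aggregates per (window, supplier). Assumes non-negative epoch milliseconds.
fun tumblingStats(orders: List&lt;Order&gt;, windowSizeMs: Long = 5_000): List&lt;SupplierStats&gt; =
    orders
        .groupBy { (it.eventTimeMs - it.eventTimeMs % windowSizeMs) to it.supplier }
        .map { (key, group) -&gt; SupplierStats(key.first, key.second, group.sumOf { it.price }, group.size) }
        .sortedWith(compareBy({ it.windowStartMs }, { it.supplier }))

fun main() {
    val orders = listOf(
        Order(&#34;A&#34;, 10.0, 1_000),
        Order(&#34;B&#34;, 7.0, 2_000),
        Order(&#34;A&#34;, 5.5, 4_999),  // still in the window starting at 0
        Order(&#34;A&#34;, 3.0, 5_000),  // first event of the window starting at 5000
    )
    tumblingStats(orders).forEach(::println)
}
</code></pre></div>
<p>Windows are half-open intervals, so an event at exactly 5000 ms belongs to the second window, not the first.</p>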
<p>The source code for the application discussed in this post can be found in the <em>orders-stats-flink</em> folder of this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h3 id="the-build-configuration" data-numberify>The Build Configuration<a class="anchor ms-1" href="#the-build-configuration"></a></h3>
<p>The <code>build.gradle.kts</code> file sets up the project, its dependencies, and packaging. It&rsquo;s shared between the DataStream and Table API applications - see <a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">the previous post</a> for the Flink application that uses the DataStream API.</p>
<ul>
<li><strong>Plugins:</strong>
<ul>
<li><code>kotlin(&quot;jvm&quot;)</code>: Enables Kotlin language support.</li>
<li><code>com.github.davidmc24.gradle.plugin.avro</code>: Compiles Avro schemas into Java classes.</li>
<li><code>com.github.johnrengelman.shadow</code>: Creates an executable &ldquo;fat JAR&rdquo; with all dependencies.</li>
<li><code>application</code>: Configures the project to be runnable via Gradle.</li>
</ul>
</li>
<li><strong>Dependencies:</strong>
<ul>
<li><strong>Flink Core &amp; Table APIs:</strong> <code>flink-streaming-java</code>, <code>flink-clients</code>, and crucially, <code>flink-table-api-java-bridge</code>, <code>flink-table-planner-loader</code>, and <code>flink-table-runtime</code> for the Table API.</li>
<li><strong>Flink Connectors:</strong> <code>flink-connector-kafka</code> for Kafka integration.</li>
<li><strong>Flink Formats:</strong> <code>flink-avro</code> and <code>flink-avro-confluent-registry</code> for handling Avro data with Confluent Schema Registry.</li>
<li><strong>Note on Dependency Scope:</strong> The Flink dependencies are declared with <code>implementation</code>. This allows the application to be run directly with <code>./gradlew run</code>. For production deployments on a Flink cluster (where the Flink runtime is already provided), these dependencies should be changed to <code>compileOnly</code> to significantly reduce the size of the final JAR.</li>
</ul>
</li>
<li><strong>Application Configuration:</strong>
<ul>
<li>The <code>application</code> block sets the <code>mainClass</code> and passes necessary JVM arguments. The <code>run</code> task is configured with environment variables to specify Kafka and Schema Registry connection details.</li>
</ul>
</li>
<li><strong>Avro &amp; Shadow JAR:</strong>
<ul>
<li>The <code>avro</code> block configures code generation.</li>
<li>The <code>shadowJar</code> task configures the output JAR name and merges service files, which is crucial for Flink connectors to work correctly.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">plugins</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;jvm&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;2.1.20&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.davidmc24.gradle.plugin.avro&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;1.9.1&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.johnrengelman.shadow&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;8.1.1&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">application</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">group</span> <span class="p">=</span> <span class="s2">&#34;me.jaehyeon&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">version</span> <span class="p">=</span> <span class="s2">&#34;1.0-SNAPSHOT&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">repositories</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">mavenCentral</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">maven</span><span class="p">(</span><span class="s2">&#34;https://packages.confluent.io/maven&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// Flink Core and APIs
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-streaming-java:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-api-java:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-api-java-bridge:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-planner-loader:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-runtime:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-clients:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-connector-base:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="c1">// Flink Kafka and Avro
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-connector-kafka:3.4.0-1.20&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-avro:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-avro-confluent-registry:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">    <span class="c1">// Json
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;com.fasterxml.jackson.module:jackson-module-kotlin:2.13.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">    <span class="c1">// Logging
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.github.microutils:kotlin-logging-jvm:3.0.5&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;ch.qos.logback:logback-classic:1.5.13&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="c1">// Kotlin test
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;test&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">
</span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="n">kotlin</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="n">jvmToolchain</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="n">application</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">    <span class="n">mainClass</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;me.jaehyeon.MainKt&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">applicationDefaultJvmArgs</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">        <span class="n">listOf</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">            <span class="s2">&#34;--add-opens=java.base/java.util=ALL-UNNAMED&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">
</span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="n">avro</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">    <span class="n">setCreateSetters</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">    <span class="n">setFieldVisibility</span><span class="p">(</span><span class="s2">&#34;PUBLIC&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">
</span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;compileKotlin&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;generateAvroJava&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">
</span></span><span class="line"><span class="ln">59</span><span class="cl"><span class="n">sourceSets</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="n">named</span><span class="p">(</span><span class="s2">&#34;main&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">        <span class="n">java</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;build/generated/avro/main&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">        <span class="n">kotlin</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;src/main/kotlin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">
</span></span><span class="line"><span class="ln">66</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">withType</span><span class="p">&lt;</span><span class="n">com</span><span class="p">.</span><span class="n">github</span><span class="p">.</span><span class="n">jengelman</span><span class="p">.</span><span class="n">gradle</span><span class="p">.</span><span class="n">plugins</span><span class="p">.</span><span class="n">shadow</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">ShadowJar</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">    <span class="n">archiveBaseName</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;orders-stats-flink&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">    <span class="n">archiveClassifier</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">    <span class="n">archiveVersion</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">    <span class="n">mergeServiceFiles</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">
</span></span><span class="line"><span class="ln">73</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;build&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">76</span><span class="cl">
</span></span><span class="line"><span class="ln">77</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">&lt;</span><span class="n">JavaExec</span><span class="p">&gt;(</span><span class="s2">&#34;run&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">78</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;TO_SKIP_PRINT&#34;</span><span class="p">,</span> <span class="s2">&#34;false&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">79</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">,</span> <span class="s2">&#34;localhost:9092&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">,</span> <span class="s2">&#34;http://localhost:8081&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">81</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">
</span></span><span class="line"><span class="ln">83</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">test</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl">    <span class="n">useJUnitPlatform</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">85</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="avro-schema-for-supplier-statistics" data-numberify>Avro Schema for Supplier Statistics<a class="anchor ms-1" href="#avro-schema-for-supplier-statistics"></a></h3>
<p>The <code>SupplierStats.avsc</code> file defines the structure of the aggregated output data. The Flink Table API&rsquo;s Kafka connector uses this schema with the <code>avro-confluent</code> format to serialize the final <code>Row</code> results into Avro, so downstream consumers read records with known field names and types.</p>
<ul>
<li><strong>Type:</strong> A <code>record</code> named <code>SupplierStats</code> in the <code>me.jaehyeon.avro</code> namespace.</li>
<li><strong>Fields:</strong>
<ul>
<li><code>window_start</code> and <code>window_end</code> (string): The start and end times of the aggregation window.</li>
<li><code>supplier</code> (string): The supplier being aggregated.</li>
<li><code>total_price</code> (double): The sum of order prices within the window.</li>
<li><code>count</code> (long): The total number of orders within the window.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;record&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;SupplierStats&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="nt">&#34;namespace&#34;</span><span class="p">:</span> <span class="s2">&#34;me.jaehyeon.avro&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nt">&#34;fields&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_start&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_end&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;total_price&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;double&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;count&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;long&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
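<p>To make the field semantics concrete, the following plain-Kotlin sketch shows how one row per supplier could be derived for a single window. It is only an illustration and not the Flink job itself: <code>Order</code>, <code>SupplierStatsRow</code> and <code>aggregate</code> are hypothetical names, and the window bounds are passed in as already-formatted strings.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical input type: only the two fields the aggregation needs.
data class Order(
    val supplier: String,
    val price: Double,
)

// Mirrors the fields of SupplierStats.avsc; this is NOT the Avro-generated class.
data class SupplierStatsRow(
    val windowStart: String,
    val windowEnd: String,
    val supplier: String,
    val totalPrice: Double,
    val count: Long,
)

// Aggregates the orders that fall into one window into one row per supplier.
fun aggregate(
    windowStart: String,
    windowEnd: String,
    orders: List&lt;Order&gt;,
): List&lt;SupplierStatsRow&gt; =
    orders
        .groupBy { it.supplier }
        .map { (supplier, group) -&gt;
            SupplierStatsRow(
                windowStart = windowStart,
                windowEnd = windowEnd,
                supplier = supplier,
                totalPrice = group.sumOf { it.price }, // total_price: sum of order prices
                count = group.size.toLong(), // count: number of orders
            )
        }
</code></pre></div>
<p>In the actual application the grouping and windowing are done by Flink; the sketch only shows which value ends up in which schema field.</p>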
<h3 id="shared-utilities" data-numberify>Shared Utilities<a class="anchor ms-1" href="#shared-utilities"></a></h3>
<p>These utility files provide common functionality used by both the DataStream and Table API applications.</p>

<h4 id="kafka-admin-utilities" data-numberify>Kafka Admin Utilities<a class="anchor ms-1" href="#kafka-admin-utilities"></a></h4>
<p>This file provides two key helper functions for interacting with the Kafka ecosystem:</p>
<ul>
<li><strong><code>createTopicIfNotExists(...)</code></strong>: Uses Kafka&rsquo;s <code>AdminClient</code> to programmatically create topics. It is designed to be idempotent: if the topic already exists, the resulting <code>TopicExistsException</code> is logged as a warning instead of failing application startup, while any other error is rethrown.</li>
<li><strong><code>getLatestSchema(...)</code></strong>: Connects to the Confluent Schema Registry using <code>CachedSchemaRegistryClient</code> to fetch the latest Avro schema for a given subject. This lets the Flink source deserialize incoming Avro records correctly without hardcoding the schema in the application.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.Schema</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClient</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClientConfig</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.NewTopic</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.TopicExistsException</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.use</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">fun</span> <span class="nf">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">topicName</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">numPartitions</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">replicationFactor</span><span class="p">:</span> <span class="n">Short</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">
</span></span><span class="line"><span class="ln">30</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="k">val</span> <span class="py">newTopic</span> <span class="p">=</span> <span class="n">NewTopic</span><span class="p">(</span><span class="n">topicName</span><span class="p">,</span> <span class="n">numPartitions</span><span class="p">,</span> <span class="n">replicationFactor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">val</span> <span class="py">result</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">createTopics</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">newTopic</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">
</span></span><span class="line"><span class="ln">34</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Attempting to create topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">            <span class="n">result</span><span class="p">.</span><span class="n">all</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; created successfully!&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">cause</span> <span class="k">is</span> <span class="n">TopicExistsException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; was created concurrently or already existed. Continuing...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">                <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while creating a topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="k">fun</span> <span class="nf">getLatestSchema</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">    <span class="n">schemaSubject</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="p">):</span> <span class="n">Schema</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">    <span class="k">val</span> <span class="py">schemaRegistryClient</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">        <span class="n">CachedSchemaRegistryClient</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">            <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">            <span class="m">100</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">            <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">    <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Fetching latest schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39; from </span><span class="si">$registryUrl</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">        <span class="k">val</span> <span class="py">latestSchemaMetadata</span> <span class="p">=</span> <span class="n">schemaRegistryClient</span><span class="p">.</span><span class="n">getLatestSchemaMetadata</span><span class="p">(</span><span class="n">schemaSubject</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">            <span class="s2">&#34;Successfully fetched schema ID </span><span class="si">${latestSchemaMetadata.id}</span><span class="s2"> version </span><span class="si">${latestSchemaMetadata.version}</span><span class="s2"> for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39;&#34;</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">        <span class="k">return</span> <span class="nc">Schema</span><span class="p">.</span><span class="n">Parser</span><span class="p">().</span><span class="n">parse</span><span class="p">(</span><span class="n">latestSchemaMetadata</span><span class="p">.</span><span class="n">schema</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to retrieve schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39; from registry </span><span class="si">$registryUrl</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">        <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Failed to retrieve schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39;&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
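<p>The <code>catch</code> branch in <code>createTopicIfNotExists</code> checks <code>e.cause</code> rather than the exception itself because <code>Future.get()</code> wraps the underlying failure in an <code>ExecutionException</code>. The sketch below isolates that pattern using only the JDK, so it runs without a Kafka broker. <code>AlreadyExistsException</code> and <code>awaitCreation</code> are hypothetical stand-ins introduced for illustration; they are not part of the Kafka API.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.util.concurrent.ExecutionException
import java.util.concurrent.Future

// Hypothetical stand-in for org.apache.kafka.common.errors.TopicExistsException,
// so the sketch has no Kafka dependency.
class AlreadyExistsException(
    message: String,
) : RuntimeException(message)

// Waits for an asynchronous "create" call to finish.
// Returns true if the resource was created and false if it already existed.
// Any other failure is rethrown, wrapped in a RuntimeException.
fun awaitCreation(future: Future&lt;*&gt;): Boolean =
    try {
        future.get() // a failed future surfaces here as ExecutionException
        true
    } catch (e: ExecutionException) {
        if (e.cause is AlreadyExistsException) {
            false // benign: someone else created it first
        } else {
            throw RuntimeException(&#34;Unrecoverable error while creating the resource.&#34;, e)
        }
    }
</code></pre></div>
<p>Passing a future that completed normally returns <code>true</code>, a future that failed with <code>AlreadyExistsException</code> returns <code>false</code>, and a future that failed with any other exception results in a <code>RuntimeException</code>.</p>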
<h4 id="flink-kafka-connectors" data-numberify>Flink Kafka Connectors<a class="anchor ms-1" href="#flink-kafka-connectors"></a></h4>
<p>This file centralizes the creation of Flink&rsquo;s Kafka sources and sinks. The Table API application uses <code>createOrdersSource</code> and <code>createSkippedSink</code> from this file. The sink for aggregated statistics is defined declaratively using a <code>TableDescriptor</code> instead of the <code>createStatsSink</code> function.</p>
<ul>
<li><strong><code>createOrdersSource(...)</code>:</strong> Configures a <code>KafkaSource</code> to consume <code>GenericRecord</code> Avro data.</li>
<li><strong><code>createSkippedSink(...)</code>:</strong> Creates a generic <code>KafkaSink</code> for late records, which are handled as simple key-value string pairs.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.SupplierStats</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.Schema</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.base.DeliveryGuarantee</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.sink.KafkaSink</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.source.KafkaSource</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerConfig</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerConfig</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">import</span> <span class="nn">java.nio.charset.StandardCharsets</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">
</span></span><span class="line"><span class="ln"> 18</span><span class="cl"><span class="k">fun</span> <span class="nf">createOrdersSource</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">    <span class="n">groupId</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">    <span class="n">schema</span><span class="p">:</span> <span class="n">Schema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSource</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">    <span class="n">KafkaSource</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="p">.</span><span class="n">setTopics</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">        <span class="p">.</span><span class="n">setGroupId</span><span class="p">(</span><span class="n">groupId</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">        <span class="p">.</span><span class="n">setStartingOffsets</span><span class="p">(</span><span class="nc">OffsetsInitializer</span><span class="p">.</span><span class="n">earliest</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">        <span class="p">.</span><span class="n">setValueOnlyDeserializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">            <span class="nc">ConfluentRegistryAvroDeserializationSchema</span><span class="p">.</span><span class="n">forGeneric</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">                <span class="n">schema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">                <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">                <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">            <span class="p">),</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">        <span class="p">).</span><span class="n">setProperties</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">FETCH_MAX_WAIT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;500&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">
</span></span><span class="line"><span class="ln"> 44</span><span class="cl"><span class="k">fun</span> <span class="nf">createStatsSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">    <span class="n">outputSubject</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSink</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">    <span class="n">KafkaSink</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">        <span class="p">.</span><span class="n">setKafkaProducerConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">ENABLE_IDEMPOTENCE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;true&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">LINGER_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;100&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">BATCH_SIZE_CONFIG</span><span class="p">,</span> <span class="p">(</span><span class="m">64</span> <span class="p">*</span> <span class="m">1024</span><span class="p">).</span><span class="n">toString</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">COMPRESSION_TYPE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;lz4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">        <span class="p">).</span><span class="n">setDeliveryGuarantee</span><span class="p">(</span><span class="nc">DeliveryGuarantee</span><span class="p">.</span><span class="n">AT_LEAST_ONCE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">        <span class="p">.</span><span class="n">setRecordSerializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">            <span class="n">KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">                <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">                <span class="p">.</span><span class="n">setTopic</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">                <span class="p">.</span><span class="n">setKeySerializationSchema</span> <span class="p">{</span> <span class="k">value</span><span class="p">:</span> <span class="n">SupplierStats</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">                    <span class="k">value</span><span class="p">.</span><span class="n">supplier</span><span class="p">.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">                <span class="p">}.</span><span class="n">setValueSerializationSchema</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">                    <span class="nc">ConfluentRegistryAvroSerializationSchema</span><span class="p">.</span><span class="n">forSpecific</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;(</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">                        <span class="n">SupplierStats</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">                        <span class="n">outputSubject</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">                        <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">                        <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">                    <span class="p">),</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">                <span class="p">).</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">
</span></span><span class="line"><span class="ln"> 78</span><span class="cl"><span class="k">fun</span> <span class="nf">createSkippedSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSink</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">    <span class="n">KafkaSink</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">        <span class="p">.</span><span class="n">setKafkaProducerConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">ENABLE_IDEMPOTENCE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;true&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">COMPRESSION_TYPE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;lz4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">KEY_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.ByteArraySerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">VALUE_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.ByteArraySerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">        <span class="p">).</span><span class="n">setDeliveryGuarantee</span><span class="p">(</span><span class="nc">DeliveryGuarantee</span><span class="p">.</span><span class="n">AT_LEAST_ONCE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">        <span class="p">.</span><span class="n">setRecordSerializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">            <span class="n">KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                <span class="p">.</span><span class="n">setTopic</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">                <span class="p">.</span><span class="n">setKeySerializationSchema</span> <span class="p">{</span> <span class="n">pair</span><span class="p">:</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">                    <span class="n">pair</span><span class="p">.</span><span class="n">first</span><span class="o">?.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">                <span class="p">}.</span><span class="n">setValueSerializationSchema</span> <span class="p">{</span> <span class="n">pair</span><span class="p">:</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">                    <span class="n">pair</span><span class="p">.</span><span class="n">second</span><span class="p">.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">                <span class="p">}.</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span></code></pre></div>
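<p>To make the key-value handling in <code>createSkippedSink</code> concrete, the following minimal sketch pulls the two serialization lambdas out into standalone functions. The names <code>serializeKey</code> and <code>serializeValue</code> are hypothetical and are not part of the project. They only mirror the lambdas shown above so that the null-key case can be tried without a Flink or Kafka dependency.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.nio.charset.StandardCharsets

// Hypothetical helpers mirroring the lambdas passed to
// setKeySerializationSchema / setValueSerializationSchema in createSkippedSink.
fun serializeKey(pair: Pair&lt;String?, String&gt;): ByteArray? =
    pair.first?.toByteArray(StandardCharsets.UTF_8) // a null key stays null

fun serializeValue(pair: Pair&lt;String?, String&gt;): ByteArray =
    pair.second.toByteArray(StandardCharsets.UTF_8)

fun main() {
    // Illustrative payloads only; real late records carry the order data.
    val keyed: Pair&lt;String?, String&gt; = &#34;supplier-1&#34; to &#34;late order payload&#34;
    val unkeyed: Pair&lt;String?, String&gt; = null to &#34;late order payload&#34;

    println(serializeKey(keyed)?.toString(StandardCharsets.UTF_8))
    println(serializeKey(unkeyed)) // null
    println(serializeValue(unkeyed).toString(StandardCharsets.UTF_8))
}
</code></pre></div>
<p>Because the key type is <code>String?</code>, a late record without a key is still written: the key serializer simply returns <code>null</code> and only the value bytes are produced.</p>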
<h3 id="table-api-processing-logic" data-numberify>Table API Processing Logic<a class="anchor ms-1" href="#table-api-processing-logic"></a></h3>
<p>The following files contain the core logic specific to the Table API implementation.</p>

<h4 id="manual-late-data-routing" data-numberify>Manual Late Data Routing<a class="anchor ms-1" href="#manual-late-data-routing"></a></h4>
<p>Because the Table API does not have a direct equivalent to the DataStream API&rsquo;s <code>.sideOutputLateData()</code>, we must handle late records manually with the <code>LateDataRouter</code> <code>ProcessFunction</code>. It compares each record&rsquo;s timestamp with the current watermark minus an <code>allowedLatenessMillis</code> threshold. Records older than that bound are deemed &ldquo;too late&rdquo; and routed to a side output. On-time records, along with records that have no timestamp or arrive before the first watermark, are passed downstream to be converted into a <code>Table</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.flink.processing</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.functions.ProcessFunction</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.util.Collector</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.util.OutputTag</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">class</span> <span class="nc">LateDataRouter</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">lateOutputTag</span><span class="p">:</span> <span class="n">OutputTag</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">allowedLatenessMillis</span><span class="p">:</span> <span class="n">Long</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">)</span> <span class="p">:</span> <span class="n">ProcessFunction</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">,</span> <span class="n">RecordMap</span><span class="p">&gt;()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">init</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">require</span><span class="p">(</span><span class="n">allowedLatenessMillis</span> <span class="o">&gt;=</span> <span class="m">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="s2">&#34;allowedLatenessMillis cannot be negative. Got: </span><span class="si">$allowedLatenessMillis</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="nd">@Throws</span><span class="p">(</span><span class="n">Exception</span><span class="o">::</span><span class="k">class</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">processElement</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="k">value</span><span class="p">:</span> <span class="n">RecordMap</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="n">ctx</span><span class="p">:</span> <span class="n">ProcessFunction</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">,</span> <span class="n">RecordMap</span><span class="p">&gt;.</span><span class="n">Context</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="k">out</span><span class="p">:</span> <span class="n">Collector</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">        <span class="k">val</span> <span class="py">elementTimestamp</span><span class="p">:</span> <span class="n">Long</span><span class="p">?</span> <span class="p">=</span> <span class="n">ctx</span><span class="p">.</span><span class="n">timestamp</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">        <span class="k">val</span> <span class="py">currentWatermark</span><span class="p">:</span> <span class="n">Long</span> <span class="p">=</span> <span class="n">ctx</span><span class="p">.</span><span class="n">timerService</span><span class="p">().</span><span class="n">currentWatermark</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl">        <span class="c1">// Element has no timestamp or watermark is still at its initial value
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="c1"></span>        <span class="k">if</span> <span class="p">(</span><span class="n">elementTimestamp</span> <span class="o">==</span> <span class="k">null</span> <span class="o">||</span> <span class="n">currentWatermark</span> <span class="o">==</span> <span class="nc">Long</span><span class="p">.</span><span class="n">MIN_VALUE</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">            <span class="k">out</span><span class="p">.</span><span class="n">collect</span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">
</span></span><span class="line"><span class="ln">33</span><span class="cl">        <span class="c1">// Element has a timestamp and watermark is active.
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="c1"></span>        <span class="c1">// An element is &#34;too late&#34; if its timestamp is older than current watermark - allowed lateness.
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="c1"></span>        <span class="k">if</span> <span class="p">(</span><span class="n">elementTimestamp</span> <span class="p">&lt;</span> <span class="n">currentWatermark</span> <span class="p">-</span> <span class="n">allowedLatenessMillis</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">            <span class="n">ctx</span><span class="p">.</span><span class="n">output</span><span class="p">(</span><span class="n">lateOutputTag</span><span class="p">,</span> <span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">            <span class="k">out</span><span class="p">.</span><span class="n">collect</span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
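<p>The routing rule can be checked in isolation with a few concrete numbers. The sketch below is not part of the project: <code>isTooLate</code> is a hypothetical pure function that reproduces the branch logic of <code>processElement</code>, and the watermark and lateness values are made up for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical pure function mirroring the decision made in LateDataRouter.processElement.
fun isTooLate(
    elementTimestamp: Long?,
    currentWatermark: Long,
    allowedLatenessMillis: Long,
): Boolean {
    // No timestamp, or no watermark emitted yet: treat the record as on time.
    if (elementTimestamp == null || currentWatermark == Long.MIN_VALUE) return false
    // Too late only if strictly older than (watermark - allowed lateness).
    return elementTimestamp &lt; currentWatermark - allowedLatenessMillis
}

fun main() {
    val watermark = 10_000L
    val allowedLateness = 2_000L // the cut-off is 10_000 - 2_000 = 8_000

    println(isTooLate(7_999L, watermark, allowedLateness))      // true: older than the cut-off
    println(isTooLate(8_000L, watermark, allowedLateness))      // false: exactly at the cut-off
    println(isTooLate(null, watermark, allowedLateness))        // false: no timestamp
    println(isTooLate(7_999L, Long.MIN_VALUE, allowedLateness)) // false: watermark not set yet
}
</code></pre></div>
<p>The comparison is strict, so a record whose timestamp equals the cut-off is still forwarded downstream rather than sent to the side output.</p>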
<h4 id="timestamp-and-watermark-strategy-for-rows" data-numberify>Timestamp and Watermark Strategy for Rows<a class="anchor ms-1" href="#timestamp-and-watermark-strategy-for-rows"></a></h4>
<p>After the initial stream processing and before converting the <code>DataStream</code> to a <code>Table</code>, we need a watermark strategy that operates on Flink&rsquo;s <code>Row</code> type. This strategy reads the event timestamp from a fixed field index in the <code>Row</code> (here field 1, which holds <code>bid_time</code> as a <code>Long</code>) and generates watermarks that tolerate up to 5 seconds of out-of-order data, allowing the Table API to perform event-time windowing correctly.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.flink.watermark</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.eventtime.WatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.types.Row</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">object</span> <span class="nc">RowWatermarkStrategy</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">val</span> <span class="py">strategy</span><span class="p">:</span> <span class="n">WatermarkStrategy</span><span class="p">&lt;</span><span class="n">Row</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="n">WatermarkStrategy</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="p">.</span><span class="n">forBoundedOutOfOrderness</span><span class="p">&lt;</span><span class="n">Row</span><span class="p">&gt;(</span><span class="n">java</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="p">.</span><span class="n">withTimestampAssigner</span> <span class="p">{</span> <span class="n">row</span><span class="p">:</span> <span class="n">Row</span><span class="p">,</span> <span class="n">_</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">                <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">                    <span class="c1">// Get the field by index. Assumes bid_time is at index 1 and is Long.
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"></span>                    <span class="k">val</span> <span class="py">timestamp</span> <span class="p">=</span> <span class="n">row</span><span class="p">.</span><span class="n">getField</span><span class="p">(</span><span class="m">1</span><span class="p">)</span> <span class="k">as</span><span class="p">?</span> <span class="n">Long</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">                    <span class="k">if</span> <span class="p">(</span><span class="n">timestamp</span> <span class="o">!=</span> <span class="k">null</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">                        <span class="n">timestamp</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">                    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Null or invalid timestamp at index 1 in Row: </span><span class="si">$row</span><span class="s2">. Using current time.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">                        <span class="nc">System</span><span class="p">.</span><span class="n">currentTimeMillis</span><span class="p">()</span> <span class="c1">// Fallback
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="c1"></span>                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">                    <span class="c1">// The safe cast (as?) does not throw; this guards against other failures, e.g. an out-of-range field index
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"></span>                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error accessing timestamp at index 1 in Row: </span><span class="si">$row</span><span class="s2">. Using current time.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">                    <span class="nc">System</span><span class="p">.</span><span class="n">currentTimeMillis</span><span class="p">()</span> <span class="c1">// Fallback
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="c1"></span>                <span class="p">}</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">            <span class="p">}.</span><span class="n">withIdleness</span><span class="p">(</span><span class="n">java</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">10</span><span class="p">))</span> <span class="c1">// Same idleness
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="c1"></span><span class="p">}</span>
</span></span></code></pre></div>
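<p>To make the watermark arithmetic concrete, the sketch below mimics in plain Kotlin how a bounded-out-of-orderness generator derives its watermark from the largest timestamp seen so far. <code>BoundedOutOfOrdernessSketch</code> is a hypothetical stand-in written for illustration, not a Flink class, and it ignores idleness and periodic emission.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Duration

// Simplified, hypothetical stand-in for a bounded-out-of-orderness watermark generator.
class BoundedOutOfOrdernessSketch(maxOutOfOrderness: Duration) {
    private val delayMillis = maxOutOfOrderness.toMillis()

    // Start high enough that the first watermark computation cannot underflow.
    private var maxTimestamp = Long.MIN_VALUE + delayMillis + 1

    // Track the largest event timestamp observed so far.
    fun onEvent(eventTimestamp: Long) {
        maxTimestamp = maxOf(maxTimestamp, eventTimestamp)
    }

    // The watermark trails the largest timestamp by the allowed delay, minus 1 ms.
    fun currentWatermark(): Long = maxTimestamp - delayMillis - 1
}

fun main() {
    val generator = BoundedOutOfOrdernessSketch(Duration.ofSeconds(5))
    // The third event arrives out of order and does not move the watermark.
    listOf(10_000L, 12_000L, 11_000L).forEach(generator::onEvent)
    println(generator.currentWatermark()) // prints 6999
}
</code></pre></div>
<p>With a 5-second bound, events stamped 10,000, 12,000 and 11,000 ms leave the watermark at 6,999 ms, so a window ending at 5,000 ms could already fire while one ending at 10,000 ms would still be open.</p>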
<h4 id="not-applicable-source-code" data-numberify>Not Applicable Source Code<a class="anchor ms-1" href="#not-applicable-source-code"></a></h4>
<p>The files <code>SupplierStatsAggregator</code> and <code>SupplierStatsFunction</code> are used exclusively by the Flink DataStream API application for its aggregation logic and are not relevant to this Table API implementation. <code>SupplierWatermarkStrategy</code>, by contrast, is still imported by the Table API application and applied to the <code>RecordMap</code> stream ahead of the <code>LateDataRouter</code>.</p>

<h3 id="core-table-api-application" data-numberify>Core Table API Application<a class="anchor ms-1" href="#core-table-api-application"></a></h3>
<p>This is the main driver for the Table API application. It combines the two APIs: the DataStream API handles ingestion and late-data routing, while the Table API handles the windowed aggregation and the Kafka sink.</p>
<ol>
<li><strong>Environment Setup:</strong> It initializes both a <code>StreamExecutionEnvironment</code> and a <code>StreamTableEnvironment</code>.</li>
<li><strong>Data Ingestion and Preparation:</strong>
<ul>
<li>It consumes Avro <code>GenericRecord</code>s using a <code>KafkaSource</code> and maps them to a <code>DataStream&lt;RecordMap&gt;</code>.</li>
<li>It applies a <code>WatermarkStrategy</code> (<code>SupplierWatermarkStrategy</code>) to the <code>RecordMap</code> stream so that the subsequent <code>LateDataRouter</code> can function correctly based on event time.</li>
</ul>
</li>
<li><strong>Late Data Splitting:</strong> It uses the custom <code>LateDataRouter</code> <code>ProcessFunction</code> to split the stream into an on-time stream and a late-data side output.</li>
<li><strong>DataStream-to-Table Conversion:</strong>
<ul>
<li>The on-time <code>DataStream&lt;RecordMap&gt;</code> is converted to a <code>DataStream&lt;Row&gt;</code>. This step transforms the data into the structured, typed row format that the Table API expects.</li>
<li>A second <code>WatermarkStrategy</code> (<code>RowWatermarkStrategy</code>) is applied to the <code>DataStream&lt;Row&gt;</code>.</li>
<li><code>tEnv.createTemporaryView</code> registers the <code>DataStream&lt;Row&gt;</code> as a table named &ldquo;orders&rdquo;. A <code>Schema</code> is defined, crucially marking the <code>bid_time</code> column as the event-time attribute (<code>TIMESTAMP_LTZ(3)</code>) and telling Flink to use the watermarks generated by the DataStream (<code>SOURCE_WATERMARK()</code>).</li>
</ul>
</li>
<li><strong>Declarative Query:</strong> A high-level, declarative query is executed on the &ldquo;orders&rdquo; table. It uses <code>Tumble</code> to define 5-second windows and performs <code>groupBy</code> and <code>select</code> operations with aggregate functions (<code>sum</code>, <code>count</code>) to calculate the statistics.</li>
<li><strong>Sinking:</strong>
<ul>
<li>A <code>TableDescriptor</code> is defined for the Kafka sink. It specifies the sink schema and, most importantly, the <code>avro-confluent</code> format, which handles serialization to Avro and integration with Schema Registry automatically.</li>
<li><code>statsTable.executeInsert()</code> writes the results of the query to the sink.</li>
<li>The separate late data stream is processed and sunk to its own topic.</li>
</ul>
</li>
<li><strong>Execution:</strong> <code>env.execute()</code> starts the Flink job.</li>
</ol>
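<p>As a rough mental model of the declarative query in step 5, the plain-Kotlin sketch below assigns each record to a 5-second tumbling window and computes a sum and a count per group. The <code>Bid</code>, <code>WindowKey</code> and <code>Stats</code> types and their field names are assumptions made for illustration; the actual job expresses this through the Table API on the &ldquo;orders&rdquo; table, and Flink additionally waits for the watermark to pass a window&rsquo;s end before emitting its result.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical record shape for illustration; the real job works on Flink Rows.
data class Bid(val supplier: String, val price: Double, val bidTime: Long)

// Grouping key: the window a record falls into plus the supplier.
data class WindowKey(val windowStart: Long, val supplier: String)

// Aggregates computed per group.
data class Stats(val totalPrice: Double, val count: Long)

const val WINDOW_SIZE_MILLIS = 5_000L

// Align a non-negative epoch-millisecond timestamp to the start of its window.
fun windowStart(timestamp: Long): Long = timestamp - (timestamp % WINDOW_SIZE_MILLIS)

// Group by (window, supplier), then compute the sum and count for each group.
fun aggregate(bids: List&lt;Bid&gt;): Map&lt;WindowKey, Stats&gt; =
    bids
        .groupBy { WindowKey(windowStart(it.bidTime), it.supplier) }
        .mapValues { (_, group) -&gt; Stats(group.sumOf { it.price }, group.size.toLong()) }

fun main() {
    val bids = listOf(
        Bid("A", 10.0, 1_000L),
        Bid("A", 5.0, 4_999L),
        Bid("A", 2.0, 5_000L),
        Bid("B", 7.0, 6_500L),
    )
    aggregate(bids).forEach { (key, stats) -&gt; println("$key -&gt; $stats") }
}
</code></pre></div>
<p>The first two bids for supplier A fall into the window starting at 0 ms, while the third belongs to the window starting at 5,000 ms because tumbling windows include their start and exclude their end.</p>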
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="nd">@file</span><span class="p">:</span><span class="n">Suppress</span><span class="p">(</span><span class="s2">&#34;ktlint:standard:no-wildcard-imports&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl">
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.ObjectMapper</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.module.kotlin.registerKotlinModule</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.LateDataRouter</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.RecordMap</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.watermark.RowWatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.watermark.SupplierWatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createOrdersSource</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createSkippedSink</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createTopicIfNotExists</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.getLatestSchema</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.eventtime.WatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.typeinfo.TypeHint</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.typeinfo.TypeInformation</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.java.typeutils.RowTypeInfo</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.datastream.DataStream</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.environment.StreamExecutionEnvironment</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.connectors.kafka.table.KafkaConnectorOptions</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.DataTypes</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.Expressions.*</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.Expressions.lit</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.FormatDescriptor</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.Schema</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.Table</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.TableDescriptor</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.Tumble</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.table.api.bridge.java.StreamTableEnvironment</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.types.Row</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.util.OutputTag</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Instant</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.LocalDateTime</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeParseException</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">
</span></span><span class="line"><span class="ln"> 42</span><span class="cl"><span class="k">object</span> <span class="nc">TableApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">toSkipPrint</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TO_SKIP_PRINT&#34;</span><span class="p">)</span><span class="o">?.</span><span class="n">toBoolean</span><span class="p">()</span> <span class="o">?:</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;kafka-1:19092&#34;</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">inputTopicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-avro&#34;</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryUrl</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;http://schema:8081&#34;</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryConfig</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">        <span class="n">mapOf</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">            <span class="s2">&#34;basic.auth.credentials.source&#34;</span> <span class="n">to</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">            <span class="s2">&#34;basic.auth.user.info&#34;</span> <span class="n">to</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">INPUT</span><span class="n">_SCHEMA_SUBJECT</span> <span class="p">=</span> <span class="s2">&#34;orders-avro-value&#34;</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">NUM</span><span class="n">_PARTITIONS</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">REPLICATION</span><span class="n">_FACTOR</span><span class="p">:</span> <span class="n">Short</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">    <span class="c1">// ObjectMapper for converting late data Map to JSON
</span></span></span><span class="line"><span class="ln"> 58</span><span class="cl"><span class="c1"></span>    <span class="k">private</span> <span class="k">val</span> <span class="py">objectMapper</span><span class="p">:</span> <span class="n">ObjectMapper</span> <span class="k">by</span> <span class="n">lazy</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">        <span class="n">ObjectMapper</span><span class="p">().</span><span class="n">registerKotlinModule</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">        <span class="c1">// Create output topics if not existing
</span></span></span><span class="line"><span class="ln"> 64</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">outputTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-ktl-stats&#34;</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">        <span class="k">val</span> <span class="py">skippedTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-ktl-skipped&#34;</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">        <span class="n">listOf</span><span class="p">(</span><span class="n">outputTopicName</span><span class="p">,</span> <span class="n">skippedTopicName</span><span class="p">).</span><span class="n">forEach</span> <span class="p">{</span> <span class="n">name</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">            <span class="n">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">                <span class="n">name</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">                <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">                <span class="n">NUM_PARTITIONS</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">                <span class="n">REPLICATION_FACTOR</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">        <span class="k">val</span> <span class="py">env</span> <span class="p">=</span> <span class="nc">StreamExecutionEnvironment</span><span class="p">.</span><span class="n">getExecutionEnvironment</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">        <span class="n">env</span><span class="p">.</span><span class="n">parallelism</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">        <span class="k">val</span> <span class="py">tEnv</span> <span class="p">=</span> <span class="nc">StreamTableEnvironment</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">        <span class="k">val</span> <span class="py">inputAvroSchema</span> <span class="p">=</span> <span class="n">getLatestSchema</span><span class="p">(</span><span class="n">INPUT_SCHEMA_SUBJECT</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">,</span> <span class="n">registryConfig</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">        <span class="k">val</span> <span class="py">ordersGenericRecordSource</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">            <span class="n">createOrdersSource</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                <span class="n">topic</span> <span class="p">=</span> <span class="n">inputTopicName</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">                <span class="n">groupId</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-flink-tl&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">                <span class="n">bootstrapAddress</span> <span class="p">=</span> <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">                <span class="n">registryUrl</span> <span class="p">=</span> <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">                <span class="n">registryConfig</span> <span class="p">=</span> <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">                <span class="n">schema</span> <span class="p">=</span> <span class="n">inputAvroSchema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">        <span class="c1">// 1. Stream of GenericRecords from Kafka
</span></span></span><span class="line"><span class="ln"> 91</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">genericRecordStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">            <span class="n">env</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">                <span class="p">.</span><span class="n">fromSource</span><span class="p">(</span><span class="n">ordersGenericRecordSource</span><span class="p">,</span> <span class="nc">WatermarkStrategy</span><span class="p">.</span><span class="n">noWatermarks</span><span class="p">(),</span> <span class="s2">&#34;KafkaGenericRecordSource&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">        <span class="c1">// 2. Convert GenericRecord to Map&lt;String, Any?&gt; (RecordMap)
</span></span></span><span class="line"><span class="ln"> 96</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">recordMapStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">            <span class="n">genericRecordStream</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">genericRecord</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">                    <span class="k">val</span> <span class="py">map</span> <span class="p">=</span> <span class="n">mutableMapOf</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Any</span><span class="p">?&gt;()</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">                    <span class="n">genericRecord</span><span class="p">.</span><span class="n">schema</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">forEach</span> <span class="p">{</span> <span class="k">field</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">                        <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">genericRecord</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">                        <span class="n">map</span><span class="p">[</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">()]</span> <span class="p">=</span> <span class="k">if</span> <span class="p">(</span><span class="k">value</span> <span class="k">is</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">avro</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">Utf8</span><span class="p">)</span> <span class="k">value</span><span class="p">.</span><span class="n">toString</span><span class="p">()</span> <span class="k">else</span> <span class="k">value</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">                    <span class="n">map</span> <span class="k">as</span> <span class="n">RecordMap</span> <span class="c1">// Cast to type alias
</span></span></span><span class="line"><span class="ln">105</span><span class="cl"><span class="c1"></span>                <span class="p">}.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;GenericRecordToMapConverter&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">                <span class="p">.</span><span class="n">returns</span><span class="p">(</span><span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;()</span> <span class="p">{}))</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">
</span></span><span class="line"><span class="ln">108</span><span class="cl">        <span class="c1">// 3. Define OutputTag for late data (now carrying RecordMap)
</span></span></span><span class="line"><span class="ln">109</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">lateMapOutputTag</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">            <span class="n">OutputTag</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">111</span><span class="cl">                <span class="s2">&#34;late-order-records&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">                <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;()</span> <span class="p">{}),</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">
</span></span><span class="line"><span class="ln">115</span><span class="cl">        <span class="c1">// 4. Split late records from on-time ones
</span></span></span><span class="line"><span class="ln">116</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">statsStreamOperator</span><span class="p">:</span> <span class="n">SingleOutputStreamOperator</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl">            <span class="n">recordMapStream</span>
</span></span><span class="line"><span class="ln">118</span><span class="cl">                <span class="p">.</span><span class="n">assignTimestampsAndWatermarks</span><span class="p">(</span><span class="nc">SupplierWatermarkStrategy</span><span class="p">.</span><span class="n">strategy</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">119</span><span class="cl">                <span class="p">.</span><span class="n">process</span><span class="p">(</span><span class="n">LateDataRouter</span><span class="p">(</span><span class="n">lateMapOutputTag</span><span class="p">,</span> <span class="n">allowedLatenessMillis</span> <span class="p">=</span> <span class="m">5000</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">120</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;LateDataRouter&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">121</span><span class="cl">
</span></span><span class="line"><span class="ln">122</span><span class="cl">        <span class="c1">// 5. Create source table (statsTable)
</span></span></span><span class="line"><span class="ln">123</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">statsStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span> <span class="n">statsStreamOperator</span>
</span></span><span class="line"><span class="ln">124</span><span class="cl">        <span class="k">val</span> <span class="py">rowStatsStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">Row</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">125</span><span class="cl">            <span class="n">statsStream</span>
</span></span><span class="line"><span class="ln">126</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">recordMap</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">127</span><span class="cl">                    <span class="k">val</span> <span class="py">orderId</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;order_id&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">128</span><span class="cl">                    <span class="k">val</span> <span class="py">price</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;price&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">Double</span>
</span></span><span class="line"><span class="ln">129</span><span class="cl">                    <span class="k">val</span> <span class="py">item</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;item&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">130</span><span class="cl">                    <span class="k">val</span> <span class="py">supplier</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;supplier&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">131</span><span class="cl">
</span></span><span class="line"><span class="ln">132</span><span class="cl">                    <span class="k">val</span> <span class="py">bidTimeString</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;bid_time&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">133</span><span class="cl">                    <span class="k">var</span> <span class="py">bidTimeInstant</span><span class="p">:</span> <span class="n">Instant</span><span class="p">?</span> <span class="p">=</span> <span class="k">null</span> <span class="c1">// Stays null if bid_time is missing or fails to parse
</span></span></span><span class="line"><span class="ln">134</span><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="ln">135</span><span class="cl">                    <span class="k">if</span> <span class="p">(</span><span class="n">bidTimeString</span> <span class="o">!=</span> <span class="k">null</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">136</span><span class="cl">                        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">137</span><span class="cl">                            <span class="k">val</span> <span class="py">formatter</span> <span class="p">=</span> <span class="nc">DateTimeFormatter</span><span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">138</span><span class="cl">                            <span class="k">val</span> <span class="py">localDateTime</span> <span class="p">=</span> <span class="nc">LocalDateTime</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">bidTimeString</span><span class="p">,</span> <span class="n">formatter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">139</span><span class="cl">                            <span class="c1">// Convert to Instant
</span></span></span><span class="line"><span class="ln">140</span><span class="cl"><span class="c1"></span>                            <span class="n">bidTimeInstant</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">141</span><span class="cl">                                <span class="n">localDateTime</span>
</span></span><span class="line"><span class="ln">142</span><span class="cl">                                    <span class="p">.</span><span class="n">atZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">())</span> <span class="c1">// Or ZoneOffset.UTC
</span></span></span><span class="line"><span class="ln">143</span><span class="cl"><span class="c1"></span>                                    <span class="p">.</span><span class="n">toInstant</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">144</span><span class="cl">                        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">DateTimeParseException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">145</span><span class="cl">                            <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to parse bid_time string &#39;</span><span class="si">$bidTimeString</span><span class="s2">&#39;. RecordMap: </span><span class="si">$recordMap</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">146</span><span class="cl">                        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">147</span><span class="cl">                            <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Unexpected error parsing bid_time string &#39;</span><span class="si">$bidTimeString</span><span class="s2">&#39;. RecordMap: </span><span class="si">$recordMap</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">148</span><span class="cl">                        <span class="p">}</span>
</span></span><span class="line"><span class="ln">149</span><span class="cl">                    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">150</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;bid_time string is null in RecordMap: </span><span class="si">$recordMap</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">151</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">152</span><span class="cl">
</span></span><span class="line"><span class="ln">153</span><span class="cl">                    <span class="nc">Row</span><span class="p">.</span><span class="n">of</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">154</span><span class="cl">                        <span class="n">orderId</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">155</span><span class="cl">                        <span class="n">bidTimeInstant</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">156</span><span class="cl">                        <span class="n">price</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">157</span><span class="cl">                        <span class="n">item</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">158</span><span class="cl">                        <span class="n">supplier</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">159</span><span class="cl">                    <span class="p">)</span>
</span></span><span class="line"><span class="ln">160</span><span class="cl">                <span class="p">}.</span><span class="n">returns</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">161</span><span class="cl">                    <span class="n">RowTypeInfo</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">162</span><span class="cl">                        <span class="n">arrayOf</span><span class="p">&lt;</span><span class="n">TypeInformation</span><span class="p">&lt;*&gt;&gt;(</span>
</span></span><span class="line"><span class="ln">163</span><span class="cl">                            <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">164</span><span class="cl">                            <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="n">Instant</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span> <span class="c1">// bid_time (as Instant, mapped to TIMESTAMP_LTZ)
</span></span></span><span class="line"><span class="ln">165</span><span class="cl"><span class="c1"></span>                            <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="n">Double</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">166</span><span class="cl">                            <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">167</span><span class="cl">                            <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">168</span><span class="cl">                        <span class="p">),</span>
</span></span><span class="line"><span class="ln">169</span><span class="cl">                        <span class="n">arrayOf</span><span class="p">(</span><span class="s2">&#34;order_id&#34;</span><span class="p">,</span> <span class="s2">&#34;bid_time&#34;</span><span class="p">,</span> <span class="s2">&#34;price&#34;</span><span class="p">,</span> <span class="s2">&#34;item&#34;</span><span class="p">,</span> <span class="s2">&#34;supplier&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">170</span><span class="cl">                    <span class="p">),</span>
</span></span><span class="line"><span class="ln">171</span><span class="cl">                <span class="p">).</span><span class="n">assignTimestampsAndWatermarks</span><span class="p">(</span><span class="nc">RowWatermarkStrategy</span><span class="p">.</span><span class="n">strategy</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">172</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;MapToRowConverter&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">173</span><span class="cl">        <span class="k">val</span> <span class="py">tableSchema</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">174</span><span class="cl">            <span class="n">Schema</span>
</span></span><span class="line"><span class="ln">175</span><span class="cl">                <span class="p">.</span><span class="n">newBuilder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">176</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;order_id&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">177</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;bid_time&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">TIMESTAMP_LTZ</span><span class="p">(</span><span class="m">3</span><span class="p">))</span> <span class="c1">// Event time attribute
</span></span></span><span class="line"><span class="ln">178</span><span class="cl"><span class="c1"></span>                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;price&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">DOUBLE</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">179</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;item&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">180</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">181</span><span class="cl">                <span class="p">.</span><span class="n">watermark</span><span class="p">(</span><span class="s2">&#34;bid_time&#34;</span><span class="p">,</span> <span class="s2">&#34;SOURCE_WATERMARK()&#34;</span><span class="p">)</span> <span class="c1">// Use watermarks from DataStream
</span></span></span><span class="line"><span class="ln">182</span><span class="cl"><span class="c1"></span>                <span class="p">.</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">183</span><span class="cl">        <span class="n">tEnv</span><span class="p">.</span><span class="n">createTemporaryView</span><span class="p">(</span><span class="s2">&#34;orders&#34;</span><span class="p">,</span> <span class="n">rowStatsStream</span><span class="p">,</span> <span class="n">tableSchema</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">184</span><span class="cl">
</span></span><span class="line"><span class="ln">185</span><span class="cl">        <span class="k">val</span> <span class="py">statsTable</span><span class="p">:</span> <span class="n">Table</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">186</span><span class="cl">            <span class="n">tEnv</span>
</span></span><span class="line"><span class="ln">187</span><span class="cl">                <span class="p">.</span><span class="n">from</span><span class="p">(</span><span class="s2">&#34;orders&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">188</span><span class="cl">                <span class="p">.</span><span class="n">window</span><span class="p">(</span><span class="nc">Tumble</span><span class="p">.</span><span class="n">over</span><span class="p">(</span><span class="n">lit</span><span class="p">(</span><span class="m">5</span><span class="p">).</span><span class="n">seconds</span><span class="p">()).</span><span class="n">on</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s2">&#34;bid_time&#34;</span><span class="p">)).</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;w&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">189</span><span class="cl">                <span class="p">.</span><span class="n">groupBy</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s2">&#34;supplier&#34;</span><span class="p">),</span> <span class="n">col</span><span class="p">(</span><span class="s2">&#34;w&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">190</span><span class="cl">                <span class="p">.</span><span class="n">select</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">191</span><span class="cl">                    <span class="n">col</span><span class="p">(</span><span class="s2">&#34;supplier&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">192</span><span class="cl">                    <span class="n">col</span><span class="p">(</span><span class="s2">&#34;w&#34;</span><span class="p">).</span><span class="n">start</span><span class="p">().</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;window_start&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">193</span><span class="cl">                    <span class="n">col</span><span class="p">(</span><span class="s2">&#34;w&#34;</span><span class="p">).</span><span class="n">end</span><span class="p">().</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;window_end&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">194</span><span class="cl">                    <span class="n">col</span><span class="p">(</span><span class="s2">&#34;price&#34;</span><span class="p">).</span><span class="n">sum</span><span class="p">().</span><span class="n">round</span><span class="p">(</span><span class="m">2</span><span class="p">).</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;total_price&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">195</span><span class="cl">                    <span class="n">col</span><span class="p">(</span><span class="s2">&#34;order_id&#34;</span><span class="p">).</span><span class="n">count</span><span class="p">().</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;count&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">196</span><span class="cl">                <span class="p">)</span>
</span></span><span class="line"><span class="ln">197</span><span class="cl">
</span></span><span class="line"><span class="ln">198</span><span class="cl">        <span class="c1">// 6. Create sink table
</span></span></span><span class="line"><span class="ln">199</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">sinkSchema</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">200</span><span class="cl">            <span class="n">Schema</span>
</span></span><span class="line"><span class="ln">201</span><span class="cl">                <span class="p">.</span><span class="n">newBuilder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">202</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">203</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;window_start&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">TIMESTAMP</span><span class="p">(</span><span class="m">3</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">204</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;window_end&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">TIMESTAMP</span><span class="p">(</span><span class="m">3</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">205</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;total_price&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">DOUBLE</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">206</span><span class="cl">                <span class="p">.</span><span class="n">column</span><span class="p">(</span><span class="s2">&#34;count&#34;</span><span class="p">,</span> <span class="nc">DataTypes</span><span class="p">.</span><span class="n">BIGINT</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">207</span><span class="cl">                <span class="p">.</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">208</span><span class="cl">        <span class="k">val</span> <span class="py">kafkaSinkDescriptor</span><span class="p">:</span> <span class="n">TableDescriptor</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">209</span><span class="cl">            <span class="n">TableDescriptor</span>
</span></span><span class="line"><span class="ln">210</span><span class="cl">                <span class="p">.</span><span class="n">forConnector</span><span class="p">(</span><span class="s2">&#34;kafka&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">211</span><span class="cl">                <span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="n">sinkSchema</span><span class="p">)</span> <span class="c1">// Set the schema for the sink
</span></span></span><span class="line"><span class="ln">212</span><span class="cl"><span class="c1"></span>                <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="nc">KafkaConnectorOptions</span><span class="p">.</span><span class="n">TOPIC</span><span class="p">,</span> <span class="n">listOf</span><span class="p">(</span><span class="n">outputTopicName</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">213</span><span class="cl">                <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="nc">KafkaConnectorOptions</span><span class="p">.</span><span class="n">PROPS_BOOTSTRAP_SERVERS</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">214</span><span class="cl">                <span class="p">.</span><span class="n">format</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">215</span><span class="cl">                    <span class="n">FormatDescriptor</span>
</span></span><span class="line"><span class="ln">216</span><span class="cl">                        <span class="p">.</span><span class="n">forFormat</span><span class="p">(</span><span class="s2">&#34;avro-confluent&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">217</span><span class="cl">                        <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&#34;url&#34;</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">218</span><span class="cl">                        <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&#34;basic-auth.credentials-source&#34;</span><span class="p">,</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">219</span><span class="cl">                        <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&#34;basic-auth.user-info&#34;</span><span class="p">,</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">220</span><span class="cl">                        <span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&#34;subject&#34;</span><span class="p">,</span> <span class="s2">&#34;</span><span class="si">$outputTopicName</span><span class="s2">-value&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">221</span><span class="cl">                        <span class="p">.</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">222</span><span class="cl">                <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">223</span><span class="cl">
</span></span><span class="line"><span class="ln">224</span><span class="cl">        <span class="c1">// 7. Handle late data as a pair of key and value
</span></span></span><span class="line"><span class="ln">225</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">lateDataMapStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span> <span class="n">statsStreamOperator</span><span class="p">.</span><span class="n">getSideOutput</span><span class="p">(</span><span class="n">lateMapOutputTag</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">226</span><span class="cl">        <span class="k">val</span> <span class="py">lateKeyPairStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">227</span><span class="cl">            <span class="n">lateDataMapStream</span>
</span></span><span class="line"><span class="ln">228</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">recordMap</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">229</span><span class="cl">                    <span class="k">val</span> <span class="py">mutableMap</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">.</span><span class="n">toMutableMap</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">230</span><span class="cl">                    <span class="n">mutableMap</span><span class="p">[</span><span class="s2">&#34;late&#34;</span><span class="p">]</span> <span class="p">=</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln">231</span><span class="cl">                    <span class="k">val</span> <span class="py">orderId</span> <span class="p">=</span> <span class="n">mutableMap</span><span class="p">[</span><span class="s2">&#34;order_id&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">232</span><span class="cl">                    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">233</span><span class="cl">                        <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">objectMapper</span><span class="p">.</span><span class="n">writeValueAsString</span><span class="p">(</span><span class="n">mutableMap</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">234</span><span class="cl">                        <span class="n">Pair</span><span class="p">(</span><span class="n">orderId</span><span class="p">,</span> <span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">235</span><span class="cl">                    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">236</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error serializing late RecordMap to JSON: </span><span class="si">$mutableMap</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">237</span><span class="cl">                        <span class="k">val</span> <span class="py">errorJson</span> <span class="p">=</span> <span class="s2">&#34;{ </span><span class="se">\&#34;</span><span class="s2">error</span><span class="se">\&#34;</span><span class="s2">: </span><span class="se">\&#34;</span><span class="s2">json_serialization_failed</span><span class="se">\&#34;</span><span class="s2">, </span><span class="se">\&#34;</span><span class="s2">data_keys</span><span class="se">\&#34;</span><span class="s2">: </span><span class="se">\&#34;</span><span class="s2">${</span>
</span></span><span class="line"><span class="ln">238</span><span class="cl">                            <span class="n">mutableMap</span><span class="p">.</span><span class="n">keys</span><span class="p">.</span><span class="n">joinToString</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">239</span><span class="cl">                                <span class="s2">&#34;,&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">240</span><span class="cl">                            <span class="p">)}</span><span class="err">\</span><span class="s2">&#34; }&#34;</span>
</span></span><span class="line"><span class="ln">241</span><span class="cl">                        <span class="n">Pair</span><span class="p">(</span><span class="n">orderId</span><span class="p">,</span> <span class="n">errorJson</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">242</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">243</span><span class="cl">                <span class="p">}.</span><span class="n">returns</span><span class="p">(</span><span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span> <span class="p">{}))</span>
</span></span><span class="line"><span class="ln">244</span><span class="cl">        <span class="k">val</span> <span class="py">skippedSink</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">245</span><span class="cl">            <span class="n">createSkippedSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">246</span><span class="cl">                <span class="n">topic</span> <span class="p">=</span> <span class="n">skippedTopicName</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">247</span><span class="cl">                <span class="n">bootstrapAddress</span> <span class="p">=</span> <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">248</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln">249</span><span class="cl">
</span></span><span class="line"><span class="ln">250</span><span class="cl">        <span class="k">if</span> <span class="p">(!</span><span class="n">toSkipPrint</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">251</span><span class="cl">            <span class="n">tEnv</span>
</span></span><span class="line"><span class="ln">252</span><span class="cl">                <span class="p">.</span><span class="n">toDataStream</span><span class="p">(</span><span class="n">statsTable</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">253</span><span class="cl">                <span class="p">.</span><span class="n">print</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">254</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;SupplierStatsPrint&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">255</span><span class="cl">            <span class="n">lateKeyPairStream</span>
</span></span><span class="line"><span class="ln">256</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="k">it</span><span class="p">.</span><span class="n">second</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">257</span><span class="cl">                <span class="p">.</span><span class="n">print</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">258</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;LateDataPrint&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">259</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">260</span><span class="cl">
</span></span><span class="line"><span class="ln">261</span><span class="cl">        <span class="n">statsTable</span><span class="p">.</span><span class="n">executeInsert</span><span class="p">(</span><span class="n">kafkaSinkDescriptor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">262</span><span class="cl">        <span class="n">lateKeyPairStream</span><span class="p">.</span><span class="n">sinkTo</span><span class="p">(</span><span class="n">skippedSink</span><span class="p">).</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;LateDataSink&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">263</span><span class="cl">        <span class="n">env</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&#34;SupplierStats&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">264</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">265</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="application-entry-point" data-numberify>Application Entry Point<a class="anchor ms-1" href="#application-entry-point"></a></h3>
<p>The <code>Main.kt</code> file serves as the entry point for the application. It parses a command-line argument (<code>datastream</code> or <code>table</code>, matched case-insensitively) to determine which Flink application to run, and prints a usage message for anything else. A <code>try-catch</code> block ensures that any fatal error during execution is logged before the application exits with status code 1.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.system.exitProcess</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">fun</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span> <span class="n">Array</span><span class="p">&lt;</span><span class="n">String</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">when</span> <span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">getOrNull</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="o">?.</span><span class="n">lowercase</span><span class="p">())</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;datastream&#34;</span> <span class="o">-&gt;</span> <span class="nc">DataStreamApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;table&#34;</span> <span class="o">-&gt;</span> <span class="nc">TableApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="k">else</span> <span class="o">-&gt;</span> <span class="n">println</span><span class="p">(</span><span class="s2">&#34;Usage: &lt;datastream | table&gt;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Fatal error in </span><span class="si">${args.getOrNull(0) ?: &#34;app&#34;}</span><span class="s2">. Shutting down.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">exitProcess</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h2 id="run-flink-application" data-numberify>Run Flink Application<a class="anchor ms-1" href="#run-flink-application"></a></h2>
<p>As with the DataStream job in the <a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">previous post</a>, running the Table API application involves setting up a local Kafka environment from the <a href="https://github.com/factorhouse/factorhouse-local" target="_blank" rel="noopener noreferrer">Factor House Local<i class="fas fa-external-link-square-alt ms-1"></i></a> project, starting the data producer, and then launching the Flink job with the correct argument.</p>

<h3 id="factor-house-local-setup" data-numberify>Factor House Local Setup<a class="anchor ms-1" href="#factor-house-local-setup"></a></h3>
<p>To set up your local Kafka environment, follow these steps:</p>
<ol>
<li>Clone the Factor House Local repository:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> factorhouse-local
</span></span></code></pre></div></li>
<li>Ensure your Kpow community license is configured (see the <a href="https://github.com/factorhouse/factorhouse-local?tab=readme-ov-file#update-kpow-and-flex-licenses" target="_blank" rel="noopener noreferrer">README<i class="fas fa-external-link-square-alt ms-1"></i></a> for details).</li>
<li>Start the services:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose -f compose-kpow-community.yml up -d
</span></span></code></pre></div></li>
</ol>
<p>Once initialized, Kpow will be accessible at <code>http://localhost:3000</code>, showing Kafka brokers, schema registry, and other components.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/kpow-overview.png" loading="lazy" width="1414" height="818" />
</picture>

</p>

<h3 id="start-the-kafka-order-producer" data-numberify>Start the Kafka Order Producer<a class="anchor ms-1" href="#start-the-kafka-order-producer"></a></h3>
<p>Next, start the Kafka data producer from the <code>orders-avro-clients</code> project (developed in <a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/">Part 2 of this series</a>) to populate the <code>orders-avro</code> topic. To properly test the application&rsquo;s handling of late events, it&rsquo;s crucial to run the producer with a randomized delay (up to 30 seconds).</p>
<p>Navigate to the producer&rsquo;s project directory (<code>orders-avro-clients</code>) in the <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a> and execute the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Assuming you are in the root of the &#39;orders-avro-clients&#39; project</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nv">DELAY_SECONDS</span><span class="o">=</span><span class="m">30</span> ./gradlew run --args<span class="o">=</span><span class="s2">&#34;producer&#34;</span>
</span></span></code></pre></div><p>This will start populating the <code>orders-avro</code> topic with Avro-encoded order messages. You can inspect these messages in Kpow. Ensure Kpow is configured with Key Deserializer: <em>String</em>, Value Deserializer: <em>AVRO</em>, and Schema Registry: <em>Local Schema Registry</em>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/orders-01.png" loading="lazy" width="1200" height="670" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/orders-02.png" loading="lazy" width="1199" height="663" />
</picture>

</p>
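<p>To see why the delay matters, here is a minimal sketch of how a producer might back-date event timestamps so that some records arrive after their window has closed. It is not the code from <code>orders-avro-clients</code>: the helper name <code>delayedEventTime</code> and the way <code>DELAY_SECONDS</code> is read are assumptions made for illustration.</p>

```kotlin
import java.time.Instant
import kotlin.random.Random

// Hypothetical helper: shifts an event timestamp into the past by a random
// number of seconds between 0 and maxDelaySeconds (inclusive).
fun delayedEventTime(
    now: Instant,
    maxDelaySeconds: Long,
    random: Random = Random.Default,
): Instant {
    require(maxDelaySeconds >= 0) { "maxDelaySeconds must not be negative" }
    val delay = random.nextLong(0, maxDelaySeconds + 1)
    return now.minusSeconds(delay)
}

fun main() {
    // Assumption: the delay bound comes from the DELAY_SECONDS environment variable.
    val maxDelay = System.getenv("DELAY_SECONDS")?.toLongOrNull() ?: 0L
    val now = Instant.now()
    repeat(5) {
        val eventTime = delayedEventTime(now, maxDelay)
        println("event time: $eventTime (lag ${now.epochSecond - eventTime.epochSecond}s)")
    }
}
```

<p>With a 30-second bound and 5-second windows, many generated timestamps fall into windows that have already been emitted, which is what exercises the late-data path.</p>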

<h3 id="launch-the-flink-application" data-numberify>Launch the Flink Application<a class="anchor ms-1" href="#launch-the-flink-application"></a></h3>
<p>With the data pipeline ready, navigate to the <code>orders-stats-flink</code> project directory of the <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>. This time, we&rsquo;ll launch the job by providing <code>table</code> as the command-line argument to trigger the declarative, Table API-based logic.</p>
<p>The application can be run in two main ways:</p>
<ol>
<li><strong>With Gradle (Development Mode)</strong>: Ideal for development and quick testing.
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;table&#34;</span>
</span></span></code></pre></div></li>
<li><strong>With the Shadow JAR (Deployment Mode)</strong>: For deploying the application as a standalone unit. First, build the fat JAR:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew shadowJar
</span></span></code></pre></div>This creates <code>build/libs/orders-stats-flink-1.0.jar</code>. Then run it:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">java --add-opens<span class="o">=</span>java.base/java.util<span class="o">=</span>ALL-UNNAMED <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  -jar build/libs/orders-stats-flink-1.0.jar table
</span></span></code></pre></div></li>
</ol>
<blockquote>
<p>💡 To build and run the application locally, ensure that <strong>JDK 17</strong> is installed.</p>
</blockquote>
<p>For this demonstration, we&rsquo;ll use Gradle to run the application in development mode. Upon starting, you&rsquo;ll see logs indicating the Flink application has initialized and is processing records from the <code>orders-avro</code> topic.</p>

<h3 id="observing-the-output" data-numberify>Observing the Output<a class="anchor ms-1" href="#observing-the-output"></a></h3>
<p>Our Flink application produces results to two topics:</p>
<ul>
<li><code>orders-avro-ktl-stats</code>: Contains the aggregated supplier statistics as Avro records.</li>
<li><code>orders-avro-ktl-skipped</code>: Contains records identified as &ldquo;late,&rdquo; serialized as JSON.</li>
</ul>
<p><strong>1. Supplier Statistics (<code>orders-avro-ktl-stats</code>):</strong></p>
<p>In Kpow, navigate to the <code>orders-avro-ktl-stats</code> topic. Configure Kpow to view these messages:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>AVRO</em></li>
<li><strong>Schema Registry:</strong> <em>Local Schema Registry</em></li>
</ul>
<p>You should see <code>SupplierStats</code> messages, each representing the total price and count of orders for a supplier within a 5-second (5,000 ms) tumbling window. The <code>window_start</code> and <code>window_end</code> fields mark the boundaries of each window.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/stats-01.png" loading="lazy" width="1198" height="674" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/stats-02.png" loading="lazy" width="1198" height="623" />
</picture>

</p>
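<p>To make the <code>window_start</code> and <code>window_end</code> fields concrete, the following sketch reproduces the arithmetic of a 5-second tumbling window in plain Kotlin, without Flink. The <code>Order</code> and <code>Stats</code> classes and their field names are simplified stand-ins, not the Avro classes used by the application.</p>

```kotlin
// Simplified stand-ins for the order event and the aggregated result (assumptions).
data class Order(val supplier: String, val price: Double, val eventTimeMs: Long)

data class Stats(
    val windowStart: Long,
    val windowEnd: Long,
    val supplier: String,
    val totalPrice: Double,
    val count: Long,
)

const val WINDOW_SIZE_MS = 5_000L

// Start of the tumbling window containing a non-negative timestamp.
// The window covers [start, start + WINDOW_SIZE_MS).
fun windowStart(eventTimeMs: Long): Long = eventTimeMs - (eventTimeMs % WINDOW_SIZE_MS)

// Groups orders by (window start, supplier) and sums price and count.
fun aggregate(orders: List<Order>): List<Stats> =
    orders
        .groupBy { windowStart(it.eventTimeMs) to it.supplier }
        .map { (key, group) ->
            val (start, supplier) = key
            Stats(start, start + WINDOW_SIZE_MS, supplier, group.sumOf { it.price }, group.size.toLong())
        }
        .sortedWith(compareBy({ it.windowStart }, { it.supplier }))

fun main() {
    val orders = listOf(
        Order("A", 10.0, 1_000L),
        Order("A", 5.5, 4_999L),
        Order("B", 2.0, 4_000L),
        Order("A", 1.0, 5_000L),
    )
    aggregate(orders).forEach(::println)
}
```

<p>An order stamped at exactly 5,000 ms belongs to the next window, because the end of each window is exclusive.</p>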
<p><strong>2. Skipped (Late) Records (<code>orders-avro-ktl-skipped</code>):</strong></p>
<p>Next, inspect the <code>orders-avro-ktl-skipped</code> topic in Kpow. Configure Kpow as follows:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>JSON</em></li>
</ul>
<p>These records were intercepted and rerouted by our custom <code>LateDataRouter</code> <code>ProcessFunction</code>. This manual step was necessary to separate late data before converting the stream to a <code>Table</code>, demonstrating a powerful pattern of blending Flink&rsquo;s APIs to solve complex requirements.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/skipped-01.png" loading="lazy" width="1198" height="607" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-17-kotlin-getting-started-flink-table/skipped-02.png" loading="lazy" width="1201" height="703" />
</picture>

</p>
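<p>As a rough illustration of what &ldquo;late&rdquo; means here, the sketch below mirrors the rule applied by Flink&rsquo;s event-time windows: a record is late when the window it belongs to has already been closed by the watermark. It is not the <code>LateDataRouter</code> implementation from the application; the function names, the 5-second window size, and the allowed-lateness parameter are assumptions.</p>

```kotlin
const val WINDOW_SIZE_MS = 5_000L

// End (exclusive) of the tumbling window containing a non-negative timestamp.
fun windowEnd(eventTimeMs: Long): Long =
    eventTimeMs - (eventTimeMs % WINDOW_SIZE_MS) + WINDOW_SIZE_MS

// A record is late when the last timestamp of its window (end - 1), plus any
// allowed lateness, is at or behind the current watermark.
fun isLate(eventTimeMs: Long, watermarkMs: Long, allowedLatenessMs: Long = 0L): Boolean =
    windowEnd(eventTimeMs) - 1 + allowedLatenessMs <= watermarkMs

fun main() {
    val watermark = 9_999L // the window [5000, 10000) has just closed
    println(isLate(eventTimeMs = 4_200L, watermarkMs = watermark))
    println(isLate(eventTimeMs = 9_500L, watermarkMs = watermark))
    println(isLate(eventTimeMs = 10_100L, watermarkMs = watermark))
}
```

<p>Records for which such a check is true can no longer contribute to a window result, which is why they are diverted to the skipped topic instead of being dropped silently.</p>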

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>In this final post of our series, we&rsquo;ve demonstrated the power and simplicity of Flink&rsquo;s Table API for real-time analytics. We successfully built a pipeline that produced the same supplier statistics as our previous examples, but with a more concise and declarative query. We&rsquo;ve seen how to define a table schema, apply event-time windowing with SQL-like expressions, and seamlessly bridge between the DataStream and Table APIs to implement custom logic like late-data routing. This journey, from basic Kafka clients to Kafka Streams and finally to the versatile APIs of Flink, illustrates the rich ecosystem available for building modern, real-time data applications in Kotlin. Flink&rsquo;s Table API, in particular, proves to be an invaluable tool for analysts and developers who need to perform complex analytics on data in motion.</p>
      ]]></content:encoded></item><item><title>Flink DataStream API - Scalable Event Processing for Supplier Stats</title><link>https://jaehyeon.me/blog/2025-06-10-kotlin-getting-started-flink-datastream/</link><guid>https://jaehyeon.me/blog/2025-06-10-kotlin-getting-started-flink-datastream/</guid><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>Building on our exploration of stream processing, we now transition from Kafka&rsquo;s native library to <strong>Apache Flink</strong>, a powerful, general-purpose distributed processing engine. In this post, we&rsquo;ll dive into Flink&rsquo;s foundational <strong>DataStream API</strong>. We will tackle the same supplier statistics problem - analyzing a stream of Avro-formatted order events - but this time using Flink&rsquo;s robust features for stateful computation. This example will highlight Flink&rsquo;s sophisticated event-time processing with watermarks and its elegant, built-in mechanisms for handling late-arriving data through side outputs.</p>
      ]]></description><content:encoded><![CDATA[
        <p>Building on our exploration of stream processing, we now transition from Kafka&rsquo;s native library to <strong>Apache Flink</strong>, a powerful, general-purpose distributed processing engine. In this post, we&rsquo;ll dive into Flink&rsquo;s foundational <strong>DataStream API</strong>. We will tackle the same supplier statistics problem - analyzing a stream of Avro-formatted order events - but this time using Flink&rsquo;s robust features for stateful computation. This example will highlight Flink&rsquo;s sophisticated event-time processing with watermarks and its elegant, built-in mechanisms for handling late-arriving data through side outputs.</p>
<ul>
<li><a href="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients">Kafka Clients with JSON - Producing and Consuming Order Events</a></li>
<li><a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients">Kafka Clients with Avro - Schema Registry and Order Events</a></li>
<li><a href="/blog/2025-06-03-kotlin-getting-started-kafka-streams">Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream/#">Flink DataStream API - Scalable Event Processing for Supplier Stats</a> (this post)</li>
<li><a href="/blog/2025-06-17-kotlin-getting-started-flink-table">Flink Table API - Declarative Analytics for Supplier Stats in Real Time</a></li>
</ul>

<h2 id="flink-datastream-application" data-numberify>Flink DataStream Application<a class="anchor ms-1" href="#flink-datastream-application"></a></h2>
<p>We develop a Flink DataStream application designed for scalable, real-time event processing. The application:</p>
<ul>
<li>Consumes Avro-formatted order events from a Kafka topic.</li>
<li>Assigns event-time timestamps and watermarks to handle out-of-order data.</li>
<li>Aggregates order data into 5-second tumbling windows to calculate total price and order counts for each supplier.</li>
<li>Leverages Flink&rsquo;s side-output mechanism to gracefully handle and route late-arriving records to a separate topic.</li>
<li>Serializes the resulting supplier statistics and late records back to Kafka, using Avro and JSON respectively.</li>
</ul>
<p>The source code for the application discussed in this post can be found in the <em>orders-stats-flink</em> folder of this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
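<p>Watermarks are probably the least intuitive item in the list above. The sketch below mimics, in plain Kotlin, how a bounded-out-of-orderness watermark advances: it trails the highest event timestamp seen so far by a fixed bound. The class name <code>BoundedWatermarkTracker</code> and the 5-second bound are assumptions for illustration; in a Flink job this work is normally delegated to a <code>WatermarkStrategy</code>.</p>

```kotlin
// Hypothetical stand-in for a bounded-out-of-orderness watermark generator.
class BoundedWatermarkTracker(private val maxOutOfOrdernessMs: Long) {
    // Start high enough that the first watermark does not underflow.
    private var maxTimestampMs: Long = Long.MIN_VALUE + maxOutOfOrdernessMs + 1

    // Records an event timestamp; only a new maximum moves the watermark forward.
    fun onEvent(eventTimeMs: Long) {
        if (eventTimeMs > maxTimestampMs) maxTimestampMs = eventTimeMs
    }

    // The watermark asserts that no event with a timestamp at or below it is still expected.
    fun currentWatermark(): Long = maxTimestampMs - maxOutOfOrdernessMs - 1
}

fun main() {
    val tracker = BoundedWatermarkTracker(maxOutOfOrdernessMs = 5_000L)
    for (ts in listOf(10_000L, 12_000L, 9_000L, 16_000L)) {
        tracker.onEvent(ts)
        println("event=$ts watermark=${tracker.currentWatermark()}")
    }
}
```

<p>The out-of-order event at 9,000 ms does not move the watermark backwards; it is simply absorbed because it is still within the bound.</p>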

<h3 id="the-build-configuration" data-numberify>The Build Configuration<a class="anchor ms-1" href="#the-build-configuration"></a></h3>
<p>The <code>build.gradle.kts</code> file sets up the project, its dependencies, and packaging. It&rsquo;s shared between the DataStream and Table API applications; the Table API application is covered in the next post.</p>
<ul>
<li><strong>Plugins:</strong>
<ul>
<li><code>kotlin(&quot;jvm&quot;)</code>: Enables Kotlin language support.</li>
<li><code>com.github.davidmc24.gradle.plugin.avro</code>: Compiles Avro schemas into Java classes.</li>
<li><code>com.github.johnrengelman.shadow</code>: Creates an executable &ldquo;fat JAR&rdquo; with all dependencies.</li>
<li><code>application</code>: Configures the project to be runnable via Gradle.</li>
</ul>
</li>
<li><strong>Dependencies:</strong>
<ul>
<li><strong>Flink Core &amp; APIs:</strong> <code>flink-streaming-java</code>, <code>flink-clients</code>.</li>
<li><strong>Flink Connectors:</strong> <code>flink-connector-kafka</code> for Kafka integration.</li>
<li><strong>Flink Formats:</strong> <code>flink-avro</code> and <code>flink-avro-confluent-registry</code> for handling Avro data with Confluent Schema Registry.</li>
<li><strong>Note on Dependency Scope:</strong> The Flink dependencies are declared with <code>implementation</code>. This allows the application to be run directly with <code>./gradlew run</code>. For production deployments on a Flink cluster (where the Flink runtime is already provided), these dependencies should be changed to <code>compileOnly</code> to significantly reduce the size of the final JAR.</li>
</ul>
</li>
<li><strong>Application Configuration:</strong>
<ul>
<li>The <code>application</code> block sets the <code>mainClass</code> and passes necessary JVM arguments for Flink&rsquo;s runtime. The <code>run</code> task is configured with environment variables to specify Kafka and Schema Registry connection details.</li>
</ul>
</li>
<li><strong>Avro &amp; Shadow JAR:</strong>
<ul>
<li>The <code>avro</code> block configures code generation.</li>
<li>The <code>shadowJar</code> task configures the output JAR name and merges service files, which is crucial for Flink connectors to work correctly.</li>
</ul>
</li>
</ul>
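<p>The dependency-scope note above could look like the fragment below when targeting a Flink cluster. It is a sketch, not part of the repository&rsquo;s build file: the <code>flinkVersion</code> variable is introduced here for brevity, and only a few of the dependencies are shown.</p>

```kotlin
// build.gradle.kts (fragment), assuming deployment to a cluster that ships the Flink runtime
val flinkVersion = "1.20.1" // hypothetical variable, not in the original build file

dependencies {
    // Provided by the cluster, so kept out of the fat JAR
    compileOnly("org.apache.flink:flink-streaming-java:$flinkVersion")
    compileOnly("org.apache.flink:flink-clients:$flinkVersion")
    // Connectors are not part of the Flink distribution and still need to be bundled
    implementation("org.apache.flink:flink-connector-kafka:3.4.0-1.20")
}
```

<p>With <code>compileOnly</code>, these libraries are no longer on the runtime classpath, so <code>./gradlew run</code> stops working unless they are added back for local runs. The complete build file used for this post, which keeps <code>implementation</code> scope, is shown next.</p>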
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">plugins</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;jvm&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;2.1.20&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.davidmc24.gradle.plugin.avro&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;1.9.1&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.johnrengelman.shadow&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;8.1.1&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">application</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">group</span> <span class="p">=</span> <span class="s2">&#34;me.jaehyeon&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">version</span> <span class="p">=</span> <span class="s2">&#34;1.0-SNAPSHOT&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">repositories</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">mavenCentral</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">maven</span><span class="p">(</span><span class="s2">&#34;https://packages.confluent.io/maven&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// Flink Core and APIs
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-streaming-java:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-api-java:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-api-java-bridge:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-planner-loader:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-table-runtime:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-clients:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-connector-base:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="c1">// Flink Kafka and Avro
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-connector-kafka:3.4.0-1.20&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-avro:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.flink:flink-avro-confluent-registry:1.20.1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">    <span class="c1">// Json
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;com.fasterxml.jackson.module:jackson-module-kotlin:2.13.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">    <span class="c1">// Logging
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.github.microutils:kotlin-logging-jvm:3.0.5&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;ch.qos.logback:logback-classic:1.5.13&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="c1">// Kotlin test
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;test&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">
</span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="n">kotlin</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="n">jvmToolchain</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="n">application</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">    <span class="n">mainClass</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;me.jaehyeon.MainKt&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">applicationDefaultJvmArgs</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">        <span class="n">listOf</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">            <span class="s2">&#34;--add-opens=java.base/java.util=ALL-UNNAMED&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">
</span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="n">avro</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">    <span class="n">setCreateSetters</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">    <span class="n">setFieldVisibility</span><span class="p">(</span><span class="s2">&#34;PUBLIC&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">
</span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;compileKotlin&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;generateAvroJava&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">
</span></span><span class="line"><span class="ln">59</span><span class="cl"><span class="n">sourceSets</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="n">named</span><span class="p">(</span><span class="s2">&#34;main&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">        <span class="n">java</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;build/generated/avro/main&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">        <span class="n">kotlin</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;src/main/kotlin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">
</span></span><span class="line"><span class="ln">66</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">withType</span><span class="p">&lt;</span><span class="n">com</span><span class="p">.</span><span class="n">github</span><span class="p">.</span><span class="n">jengelman</span><span class="p">.</span><span class="n">gradle</span><span class="p">.</span><span class="n">plugins</span><span class="p">.</span><span class="n">shadow</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">ShadowJar</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">    <span class="n">archiveBaseName</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;orders-stats-flink&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">    <span class="n">archiveClassifier</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">    <span class="n">archiveVersion</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">    <span class="n">mergeServiceFiles</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">
</span></span><span class="line"><span class="ln">73</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;build&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">76</span><span class="cl">
</span></span><span class="line"><span class="ln">77</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">&lt;</span><span class="n">JavaExec</span><span class="p">&gt;(</span><span class="s2">&#34;run&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">78</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;TO_SKIP_PRINT&#34;</span><span class="p">,</span> <span class="s2">&#34;false&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">79</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">,</span> <span class="s2">&#34;localhost:9092&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">,</span> <span class="s2">&#34;http://localhost:8081&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">81</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">
</span></span><span class="line"><span class="ln">83</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">test</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl">    <span class="n">useJUnitPlatform</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">85</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="avro-schema-for-supplier-statistics" data-numberify>Avro Schema for Supplier Statistics<a class="anchor ms-1" href="#avro-schema-for-supplier-statistics"></a></h3>
<p>The <code>SupplierStats.avsc</code> file defines the structure of the aggregated output data. The Flink Kafka sink uses this schema to serialize <code>SupplierStats</code> objects to Avro, which gives downstream consumers a typed contract and leaves room for schema evolution.</p>
<ul>
<li><strong>Type:</strong> A <code>record</code> named <code>SupplierStats</code> in the <code>me.jaehyeon.avro</code> namespace.</li>
<li><strong>Fields:</strong>
<ul>
<li><code>window_start</code> and <code>window_end</code> (string): The start and end times of the aggregation window.</li>
<li><code>supplier</code> (string): The supplier being aggregated.</li>
<li><code>total_price</code> (double): The sum of order prices within the window.</li>
<li><code>count</code> (long): The total number of orders within the window.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;record&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;SupplierStats&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="nt">&#34;namespace&#34;</span><span class="p">:</span> <span class="s2">&#34;me.jaehyeon.avro&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nt">&#34;fields&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_start&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_end&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;total_price&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;double&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;count&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;long&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
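<p>To make the field types concrete, here is a minimal sketch that builds one record against the same schema and validates it. It uses Avro&rsquo;s <code>GenericRecordBuilder</code> rather than the generated <code>SupplierStats</code> class so that it runs with only the Avro library on the classpath. The helper name <code>buildSampleStats</code>, the timestamp format, and all field values are assumptions made for illustration and are not taken from the project.</p>

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericRecord
import org.apache.avro.generic.GenericRecordBuilder

// Same definition as SupplierStats.avsc, inlined so the sketch is self-contained.
val supplierStatsSchema: Schema =
    Schema.Parser().parse(
        """
        {
          "type": "record",
          "name": "SupplierStats",
          "namespace": "me.jaehyeon.avro",
          "fields": [
            { "name": "window_start", "type": "string" },
            { "name": "window_end", "type": "string" },
            { "name": "supplier", "type": "string" },
            { "name": "total_price", "type": "double" },
            { "name": "count", "type": "long" }
          ]
        }
        """.trimIndent(),
    )

// Hypothetical helper: every value below is made up for illustration.
fun buildSampleStats(): GenericRecord =
    GenericRecordBuilder(supplierStatsSchema)
        .set("window_start", "2025-01-01 00:00:00") // assumed timestamp format
        .set("window_end", "2025-01-01 00:00:05")
        .set("supplier", "supplier-a")
        .set("total_price", 42.5) // Kotlin Double maps to Avro double
        .set("count", 2L) // Kotlin Long maps to Avro long
        .build()

fun main() {
    val record = buildSampleStats()
    // validate(...) checks that each field value matches its declared Avro type.
    println(GenericData.get().validate(supplierStatsSchema, record))
    println(record)
}
```

<p>None of the fields is nullable or has a default, so the builder rejects a record that leaves any of them unset.</p>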
<h3 id="shared-utilities" data-numberify>Shared Utilities<a class="anchor ms-1" href="#shared-utilities"></a></h3>
<p>These utility files provide common functionality used by both the DataStream and Table API applications.</p>

<h4 id="kafka-admin-utilities" data-numberify>Kafka Admin Utilities<a class="anchor ms-1" href="#kafka-admin-utilities"></a></h4>
<p>This file provides two key helper functions for interacting with the Kafka ecosystem:</p>
<ul>
<li><strong><code>createTopicIfNotExists(...)</code></strong>: Uses Kafka&rsquo;s <code>AdminClient</code> to create a topic programmatically. The call is idempotent: if the topic already exists, the resulting <code>TopicExistsException</code> is logged as a warning and startup continues, while any other failure is rethrown as a <code>RuntimeException</code>.</li>
<li><strong><code>getLatestSchema(...)</code></strong>: Connects to the Confluent Schema Registry using <code>CachedSchemaRegistryClient</code> and fetches the latest Avro schema for a given subject. This lets the Flink source deserialize incoming Avro records without hardcoding the schema in the application.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.Schema</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClient</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClientConfig</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.NewTopic</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.TopicExistsException</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.use</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">fun</span> <span class="nf">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">topicName</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">numPartitions</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">replicationFactor</span><span class="p">:</span> <span class="n">Short</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">
</span></span><span class="line"><span class="ln">30</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="k">val</span> <span class="py">newTopic</span> <span class="p">=</span> <span class="n">NewTopic</span><span class="p">(</span><span class="n">topicName</span><span class="p">,</span> <span class="n">numPartitions</span><span class="p">,</span> <span class="n">replicationFactor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">val</span> <span class="py">result</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">createTopics</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">newTopic</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">
</span></span><span class="line"><span class="ln">34</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Attempting to create topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">            <span class="n">result</span><span class="p">.</span><span class="n">all</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; created successfully!&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">cause</span> <span class="k">is</span> <span class="n">TopicExistsException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; was created concurrently or already existed. Continuing...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">                <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while creating a topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="k">fun</span> <span class="nf">getLatestSchema</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">    <span class="n">schemaSubject</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="p">):</span> <span class="n">Schema</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">    <span class="k">val</span> <span class="py">schemaRegistryClient</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">        <span class="n">CachedSchemaRegistryClient</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">            <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">            <span class="m">100</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">            <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">    <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Fetching latest schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39; from </span><span class="si">$registryUrl</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">        <span class="k">val</span> <span class="py">latestSchemaMetadata</span> <span class="p">=</span> <span class="n">schemaRegistryClient</span><span class="p">.</span><span class="n">getLatestSchemaMetadata</span><span class="p">(</span><span class="n">schemaSubject</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">            <span class="s2">&#34;Successfully fetched schema ID </span><span class="si">${latestSchemaMetadata.id}</span><span class="s2"> version </span><span class="si">${latestSchemaMetadata.version}</span><span class="s2"> for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39;&#34;</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">        <span class="k">return</span> <span class="nc">Schema</span><span class="p">.</span><span class="n">Parser</span><span class="p">().</span><span class="n">parse</span><span class="p">(</span><span class="n">latestSchemaMetadata</span><span class="p">.</span><span class="n">schema</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to retrieve schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39; from registry </span><span class="si">$registryUrl</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">        <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Failed to retrieve schema for subject &#39;</span><span class="si">$schemaSubject</span><span class="s2">&#39;&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h4 id="flink-kafka-connectors" data-numberify>Flink Kafka Connectors<a class="anchor ms-1" href="#flink-kafka-connectors"></a></h4>
<p>This file centralizes the creation of Flink&rsquo;s Kafka sources and sinks.</p>
<ul>
<li><strong><code>createOrdersSource(...)</code>:</strong> Configures a <code>KafkaSource</code> that consumes Avro data as <code>GenericRecord</code>, starting from the earliest offsets. It uses <code>ConfluentRegistryAvroDeserializationSchema</code> to decode each message into the supplied reader schema, looking up the writer schema in Confluent Schema Registry.</li>
<li><strong><code>createStatsSink(...)</code>:</strong> Configures a <code>KafkaSink</code> for the aggregated <code>SupplierStats</code> with at-least-once delivery. It uses <code>ConfluentRegistryAvroSerializationSchema</code> to serialize the specific <code>SupplierStats</code> type under the given output subject and sets the Kafka message key to the supplier&rsquo;s name.</li>
<li><strong><code>createSkippedSink(...)</code>:</strong> Creates a <code>KafkaSink</code> for late records, which are written as UTF-8 encoded key-value string pairs. The key is nullable, and delivery is again at-least-once.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.SupplierStats</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.Schema</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.base.DeliveryGuarantee</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.sink.KafkaSink</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.source.KafkaSource</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerConfig</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerConfig</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">import</span> <span class="nn">java.nio.charset.StandardCharsets</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">
</span></span><span class="line"><span class="ln"> 18</span><span class="cl"><span class="k">fun</span> <span class="nf">createOrdersSource</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">    <span class="n">groupId</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">    <span class="n">schema</span><span class="p">:</span> <span class="n">Schema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSource</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">    <span class="n">KafkaSource</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="p">.</span><span class="n">setTopics</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">        <span class="p">.</span><span class="n">setGroupId</span><span class="p">(</span><span class="n">groupId</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">        <span class="p">.</span><span class="n">setStartingOffsets</span><span class="p">(</span><span class="nc">OffsetsInitializer</span><span class="p">.</span><span class="n">earliest</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">        <span class="p">.</span><span class="n">setValueOnlyDeserializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">            <span class="nc">ConfluentRegistryAvroDeserializationSchema</span><span class="p">.</span><span class="n">forGeneric</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">                <span class="n">schema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">                <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">                <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">            <span class="p">),</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">        <span class="p">).</span><span class="n">setProperties</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">FETCH_MAX_WAIT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;500&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">
</span></span><span class="line"><span class="ln"> 44</span><span class="cl"><span class="k">fun</span> <span class="nf">createStatsSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">    <span class="n">registryUrl</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">    <span class="n">registryConfig</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">    <span class="n">outputSubject</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSink</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">    <span class="n">KafkaSink</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">        <span class="p">.</span><span class="n">setKafkaProducerConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">ENABLE_IDEMPOTENCE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;true&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">LINGER_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;100&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">BATCH_SIZE_CONFIG</span><span class="p">,</span> <span class="p">(</span><span class="m">64</span> <span class="p">*</span> <span class="m">1024</span><span class="p">).</span><span class="n">toString</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">COMPRESSION_TYPE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;lz4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">        <span class="p">).</span><span class="n">setDeliveryGuarantee</span><span class="p">(</span><span class="nc">DeliveryGuarantee</span><span class="p">.</span><span class="n">AT_LEAST_ONCE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">        <span class="p">.</span><span class="n">setRecordSerializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">            <span class="n">KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">                <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;()</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">                <span class="p">.</span><span class="n">setTopic</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">                <span class="p">.</span><span class="n">setKeySerializationSchema</span> <span class="p">{</span> <span class="k">value</span><span class="p">:</span> <span class="n">SupplierStats</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">                    <span class="k">value</span><span class="p">.</span><span class="n">supplier</span><span class="p">.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">                <span class="p">}.</span><span class="n">setValueSerializationSchema</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">                    <span class="nc">ConfluentRegistryAvroSerializationSchema</span><span class="p">.</span><span class="n">forSpecific</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;(</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">                        <span class="n">SupplierStats</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">                        <span class="n">outputSubject</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">                        <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">                        <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">                    <span class="p">),</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">                <span class="p">).</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">
</span></span><span class="line"><span class="ln"> 78</span><span class="cl"><span class="k">fun</span> <span class="nf">createSkippedSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">    <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl"><span class="p">):</span> <span class="n">KafkaSink</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">    <span class="n">KafkaSink</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">        <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">        <span class="p">.</span><span class="n">setBootstrapServers</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">        <span class="p">.</span><span class="n">setKafkaProducerConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">ENABLE_IDEMPOTENCE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;true&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">COMPRESSION_TYPE_CONFIG</span><span class="p">,</span> <span class="s2">&#34;lz4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">KEY_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.ByteArraySerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">                <span class="n">setProperty</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">VALUE_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.ByteArraySerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">        <span class="p">).</span><span class="n">setDeliveryGuarantee</span><span class="p">(</span><span class="nc">DeliveryGuarantee</span><span class="p">.</span><span class="n">AT_LEAST_ONCE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">        <span class="p">.</span><span class="n">setRecordSerializer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">            <span class="n">KafkaRecordSerializationSchema</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                <span class="p">.</span><span class="n">builder</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                <span class="p">.</span><span class="n">setTopic</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">                <span class="p">.</span><span class="n">setKeySerializationSchema</span> <span class="p">{</span> <span class="n">pair</span><span class="p">:</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">                    <span class="n">pair</span><span class="p">.</span><span class="n">first</span><span class="o">?.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">                <span class="p">}.</span><span class="n">setValueSerializationSchema</span> <span class="p">{</span> <span class="n">pair</span><span class="p">:</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">                    <span class="n">pair</span><span class="p">.</span><span class="n">second</span><span class="p">.</span><span class="n">toByteArray</span><span class="p">(</span><span class="nc">StandardCharsets</span><span class="p">.</span><span class="n">UTF_8</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">                <span class="p">}.</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">        <span class="p">).</span><span class="n">build</span><span class="p">()</span>
</span></span></code></pre></div>
<h3 id="datastream-processing-logic" data-numberify>DataStream Processing Logic<a class="anchor ms-1" href="#datastream-processing-logic"></a></h3>
<p>The following files contain the core logic specific to the DataStream API implementation.</p>

<h4 id="timestamp-and-watermark-strategy" data-numberify>Timestamp and Watermark Strategy<a class="anchor ms-1" href="#timestamp-and-watermark-strategy"></a></h4>
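<p>For event-time processing, Flink needs to know each event&rsquo;s timestamp and how much out-of-order data to tolerate. This <code>WatermarkStrategy</code> extracts the timestamp from the <code>bid_time</code> field of the incoming <code>RecordMap</code>, parsing it as a local date-time in the JVM&rsquo;s default time zone. It uses <code>forBoundedOutOfOrderness</code> with a 5-second duration, so the watermark trails the highest timestamp seen so far by 5 seconds and records arriving up to 5 seconds out of order are still assigned to their windows. If <code>bid_time</code> is missing or cannot be parsed, the record falls back to the current processing time, and <code>withIdleness</code> marks a source partition as idle after 10 seconds without data so that it does not hold back the watermark.</p>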
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.flink.watermark</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.RecordMap</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.eventtime.WatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.LocalDateTime</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="k">object</span> <span class="nc">SupplierWatermarkStrategy</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">val</span> <span class="py">strategy</span><span class="p">:</span> <span class="n">WatermarkStrategy</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="n">WatermarkStrategy</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">            <span class="p">.</span><span class="n">forBoundedOutOfOrderness</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">))</span> <span class="c1">// Operates on RecordMap
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"></span>            <span class="p">.</span><span class="n">withTimestampAssigner</span> <span class="p">{</span> <span class="n">recordMap</span><span class="p">,</span> <span class="n">_</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">                <span class="k">val</span> <span class="py">formatter</span> <span class="p">=</span> <span class="nc">DateTimeFormatter</span><span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">                <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">                    <span class="k">val</span> <span class="py">bidTimeString</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;bid_time&#34;</span><span class="p">]</span><span class="o">?.</span><span class="n">toString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">                    <span class="k">if</span> <span class="p">(</span><span class="n">bidTimeString</span> <span class="o">!=</span> <span class="k">null</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">                        <span class="k">val</span> <span class="py">ldt</span> <span class="p">=</span> <span class="nc">LocalDateTime</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">bidTimeString</span><span class="p">,</span> <span class="n">formatter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">                        <span class="n">ldt</span><span class="p">.</span><span class="n">atZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">()).</span><span class="n">toInstant</span><span class="p">().</span><span class="n">toEpochMilli</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">                    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Missing &#39;bid_time&#39; field in RecordMap: </span><span class="si">$recordMap</span><span class="s2">. Using processing time.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">                        <span class="nc">System</span><span class="p">.</span><span class="n">currentTimeMillis</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error parsing &#39;bid_time&#39; from RecordMap: </span><span class="si">$recordMap</span><span class="s2">. Using processing time.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">                    <span class="nc">System</span><span class="p">.</span><span class="n">currentTimeMillis</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">            <span class="p">}.</span><span class="n">withIdleness</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">10</span><span class="p">))</span> <span class="c1">// Optional: if partitions can be idle
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="c1"></span><span class="p">}</span>
</span></span></code></pre></div>
<h4 id="custom-aggregation-and-windowing-functions" data-numberify>Custom Aggregation and Windowing Functions<a class="anchor ms-1" href="#custom-aggregation-and-windowing-functions"></a></h4>
<p>Flink&rsquo;s DataStream API provides fine-grained control over windowed aggregations using a combination of an <code>AggregateFunction</code> and a <code>WindowFunction</code>.</p>
<ul>
<li><strong><code>SupplierStatsAggregator</code>:</strong> This <code>AggregateFunction</code> performs incremental aggregation. For each record in a window, it returns an updated accumulator with the record&rsquo;s price added to <code>totalPrice</code> and <code>count</code> incremented by one. Because only the accumulator is kept in window state, Flink doesn&rsquo;t need to buffer every record in the window.</li>
<li><strong><code>SupplierStatsFunction</code>:</strong> This <code>WindowFunction</code> is applied when the window fires. It receives the accumulator returned by the <code>AggregateFunction</code> and has access to the window&rsquo;s metadata (key, start time, end time). It uses this information to construct the final <code>SupplierStats</code> Avro object.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.flink.processing</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.functions.AggregateFunction</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">typealias</span> <span class="n">RecordMap</span> <span class="p">=</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Any</span><span class="p">?&gt;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">data</span> <span class="k">class</span> <span class="nc">SupplierStatsAccumulator</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">var</span> <span class="py">totalPrice</span><span class="p">:</span> <span class="n">Double</span> <span class="p">=</span> <span class="m">0.0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">var</span> <span class="py">count</span><span class="p">:</span> <span class="n">Long</span> <span class="p">=</span> <span class="m">0L</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">class</span> <span class="nc">SupplierStatsAggregator</span> <span class="p">:</span> <span class="n">AggregateFunction</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">,</span> <span class="n">SupplierStatsAccumulator</span><span class="p">,</span> <span class="n">SupplierStatsAccumulator</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">createAccumulator</span><span class="p">():</span> <span class="n">SupplierStatsAccumulator</span> <span class="p">=</span> <span class="n">SupplierStatsAccumulator</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">add</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">value</span><span class="p">:</span> <span class="n">RecordMap</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">accumulator</span><span class="p">:</span> <span class="n">SupplierStatsAccumulator</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">):</span> <span class="n">SupplierStatsAccumulator</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="n">SupplierStatsAccumulator</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">            <span class="n">accumulator</span><span class="p">.</span><span class="n">totalPrice</span> <span class="p">+</span> <span class="k">value</span><span class="p">[</span><span class="s2">&#34;price&#34;</span><span class="p">]</span> <span class="k">as</span> <span class="n">Double</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="n">accumulator</span><span class="p">.</span><span class="n">count</span> <span class="p">+</span> <span class="m">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">getResult</span><span class="p">(</span><span class="n">accumulator</span><span class="p">:</span> <span class="n">SupplierStatsAccumulator</span><span class="p">):</span> <span class="n">SupplierStatsAccumulator</span> <span class="p">=</span> <span class="n">accumulator</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">merge</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">        <span class="n">a</span><span class="p">:</span> <span class="n">SupplierStatsAccumulator</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="n">b</span><span class="p">:</span> <span class="n">SupplierStatsAccumulator</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">    <span class="p">):</span> <span class="n">SupplierStatsAccumulator</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">        <span class="n">SupplierStatsAccumulator</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">            <span class="n">totalPrice</span> <span class="p">=</span> <span class="n">a</span><span class="p">.</span><span class="n">totalPrice</span> <span class="p">+</span> <span class="n">b</span><span class="p">.</span><span class="n">totalPrice</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">            <span class="n">count</span> <span class="p">=</span> <span class="n">a</span><span class="p">.</span><span class="n">count</span> <span class="p">+</span> <span class="n">b</span><span class="p">.</span><span class="n">count</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.flink.processing</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.SupplierStats</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.functions.windowing.WindowFunction</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.windowing.windows.TimeWindow</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.util.Collector</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Instant</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">class</span> <span class="nc">SupplierStatsFunction</span> <span class="p">:</span> <span class="n">WindowFunction</span><span class="p">&lt;</span><span class="n">SupplierStatsAccumulator</span><span class="p">,</span> <span class="n">SupplierStats</span><span class="p">,</span> <span class="n">String</span><span class="p">,</span> <span class="n">TimeWindow</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">companion</span> <span class="k">object</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="k">private</span> <span class="k">val</span> <span class="py">formatter</span><span class="p">:</span> <span class="n">DateTimeFormatter</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="nc">DateTimeFormatter</span><span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">).</span><span class="n">withZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">apply</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="n">supplierKey</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="n">window</span><span class="p">:</span> <span class="n">TimeWindow</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="n">input</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">&lt;</span><span class="n">SupplierStatsAccumulator</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="k">out</span><span class="p">:</span> <span class="n">Collector</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="k">val</span> <span class="py">accumulator</span> <span class="p">=</span> <span class="n">input</span><span class="p">.</span><span class="n">firstOrNull</span><span class="p">()</span> <span class="o">?:</span> <span class="k">return</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">        <span class="k">val</span> <span class="py">windowStartStr</span> <span class="p">=</span> <span class="n">formatter</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="nc">Instant</span><span class="p">.</span><span class="n">ofEpochMilli</span><span class="p">(</span><span class="n">window</span><span class="p">.</span><span class="n">start</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">        <span class="k">val</span> <span class="py">windowEndStr</span> <span class="p">=</span> <span class="n">formatter</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="nc">Instant</span><span class="p">.</span><span class="n">ofEpochMilli</span><span class="p">(</span><span class="n">window</span><span class="p">.</span><span class="n">end</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl">        <span class="k">out</span><span class="p">.</span><span class="n">collect</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">            <span class="n">SupplierStats</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">                <span class="p">.</span><span class="n">newBuilder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">                <span class="p">.</span><span class="n">setWindowStart</span><span class="p">(</span><span class="n">windowStartStr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">                <span class="p">.</span><span class="n">setWindowEnd</span><span class="p">(</span><span class="n">windowEndStr</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">                <span class="p">.</span><span class="n">setSupplier</span><span class="p">(</span><span class="n">supplierKey</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">                <span class="p">.</span><span class="n">setTotalPrice</span><span class="p">(</span><span class="nc">String</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&#34;%.2f&#34;</span><span class="p">,</span> <span class="n">accumulator</span><span class="p">.</span><span class="n">totalPrice</span><span class="p">).</span><span class="n">toDouble</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">                <span class="p">.</span><span class="n">setCount</span><span class="p">(</span><span class="n">accumulator</span><span class="p">.</span><span class="n">count</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">                <span class="p">.</span><span class="n">build</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
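<p>To make the incremental approach concrete, the following is a minimal plain-Kotlin sketch with no Flink dependency. The <code>Acc</code> class, the free functions <code>add</code> and <code>merge</code>, and the sample prices are illustrative stand-ins rather than part of the project. It shows that folding records one at a time, or merging two partial accumulators, yields the same totals. Flink calls <code>merge</code> when windows themselves are merged (for example, session windows).</p>

```kotlin
// Plain-Kotlin stand-in for SupplierStatsAccumulator (hypothetical, for illustration only).
data class Acc(val totalPrice: Double = 0.0, val count: Long = 0L)

// Mirrors AggregateFunction.add(): fold one price into the running accumulator.
fun add(acc: Acc, price: Double): Acc = Acc(acc.totalPrice + price, acc.count + 1)

// Mirrors AggregateFunction.merge(): combine two partial accumulators.
fun merge(a: Acc, b: Acc): Acc = Acc(a.totalPrice + b.totalPrice, a.count + b.count)

fun main() {
    // Hypothetical prices for one supplier within one window.
    val prices = listOf(10.0, 20.5, 4.5, 15.0)

    // A single pass over all records keeps only one accumulator in memory.
    val whole = prices.fold(Acc(), ::add)

    // Two partial passes, merged afterwards.
    val left = prices.take(2).fold(Acc(), ::add)
    val right = prices.drop(2).fold(Acc(), ::add)
    val merged = merge(left, right)

    println(whole)           // Acc(totalPrice=50.0, count=4)
    println(merged == whole) // true
}
```

<p>The state held per key and window is a single small object regardless of how many records arrive, which is the reason for pairing an <code>AggregateFunction</code> with a <code>WindowFunction</code> instead of iterating over buffered records.</p>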
<h4 id="not-applicable-source-code" data-numberify>Not Applicable Source Code<a class="anchor ms-1" href="#not-applicable-source-code"></a></h4>
<p><code>RowWatermarkStrategy</code> and <code>LateDataRouter</code> are used exclusively by the Flink Table API application and are not relevant to this DataStream implementation. The DataStream API handles late data using the built-in <code>.sideOutputLateData()</code> method, making a custom router unnecessary.</p>
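<p>As a rough illustration of what the built-in late-data handling decides for each record, here is a plain-Kotlin sketch with no Flink dependency. It assumes a 5-second tumbling window with zero offset and 5 seconds of allowed lateness, the values this application uses. The names <code>Arrival</code>, <code>windowStart</code>, and <code>classify</code> are hypothetical, and the comparison follows my understanding of Flink&rsquo;s event-time window operator rather than quoting its source.</p>

```kotlin
const val WINDOW_SIZE_MS = 5_000L      // assumed tumbling window size
const val ALLOWED_LATENESS_MS = 5_000L // assumed allowed lateness

enum class Arrival { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

// Start of the tumbling window (zero offset) that contains the timestamp.
fun windowStart(timestamp: Long): Long = timestamp - Math.floorMod(timestamp, WINDOW_SIZE_MS)

// Classify a record by comparing the current watermark with its window's bounds.
fun classify(eventTime: Long, watermark: Long): Arrival {
    // Last timestamp that still belongs to the window: window end minus 1 ms.
    val maxTimestamp = windowStart(eventTime) + WINDOW_SIZE_MS - 1
    return when {
        // The window has not fired yet.
        watermark < maxTimestamp -> Arrival.ON_TIME
        // The window has fired, but its state is still retained.
        watermark < maxTimestamp + ALLOWED_LATENESS_MS -> Arrival.LATE_BUT_ACCEPTED
        // The window state has been cleaned up; the record goes to the side output.
        else -> Arrival.SIDE_OUTPUT
    }
}

fun main() {
    val eventTime = 12_000L // falls in window [10000, 15000)
    println(classify(eventTime, watermark = 13_000L)) // ON_TIME
    println(classify(eventTime, watermark = 17_000L)) // LATE_BUT_ACCEPTED
    println(classify(eventTime, watermark = 20_000L)) // SIDE_OUTPUT
}
```

<p>Only records in the last category reach the stream exposed by <code>.sideOutputLateData()</code>; records in the middle category still update the window result.</p>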

<h3 id="core-datastream-application" data-numberify>Core DataStream Application<a class="anchor ms-1" href="#core-datastream-application"></a></h3>
<p>This is the main driver for the DataStream application. It defines and executes the Flink job topology.</p>
<ol>
<li><strong>Environment Setup:</strong> It initializes the <code>StreamExecutionEnvironment</code> and creates the necessary output Kafka topics.</li>
<li><strong>Source and Transformation:</strong> It creates a Kafka source for Avro <code>GenericRecord</code>s and then maps them to a more convenient <code>DataStream&lt;RecordMap&gt;</code>.</li>
<li><strong>Timestamping and Windowing:</strong>
<ul>
<li><code>assignTimestampsAndWatermarks</code> applies the custom <code>SupplierWatermarkStrategy</code>.</li>
<li>The stream is keyed by the <code>supplier</code> field.</li>
<li>A 5-second <code>TumblingEventTimeWindows</code> assigner is applied.</li>
<li><code>allowedLateness</code> is set to 5 seconds, so the window state is kept for an additional 5 seconds after the watermark passes the end of the window. Events that arrive within this period are late but still update the window result.</li>
</ul>
</li>
<li><strong>Late Data Handling:</strong> <code>sideOutputLateData</code> directs any record that arrives after the <code>allowedLateness</code> period has expired, and that would otherwise be dropped silently, to a separate stream identified by an <code>OutputTag</code>.</li>
<li><strong>Aggregation:</strong> The <code>.aggregate()</code> call combines the efficient <code>SupplierStatsAggregator</code> with the final <code>SupplierStatsFunction</code> to produce the statistics.</li>
<li><strong>Sinking:</strong>
<ul>
<li>The main <code>statsStream</code> is sent to the <code>statsSink</code>.</li>
<li>The late data stream, retrieved via <code>getSideOutput</code>, is processed (converted to JSON with a &ldquo;late&rdquo; flag) and sent to the <code>skippedSink</code>.</li>
</ul>
</li>
<li><strong>Execution:</strong> <code>env.execute()</code> starts the Flink job.</li>
</ol>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.ObjectMapper</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.module.kotlin.registerKotlinModule</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.SupplierStats</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.RecordMap</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.SupplierStatsAggregator</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.processing.SupplierStatsFunction</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.flink.watermark.SupplierWatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createOrdersSource</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createSkippedSink</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createStatsSink</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createTopicIfNotExists</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.getLatestSchema</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.eventtime.WatermarkStrategy</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.typeinfo.TypeHint</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.api.common.typeinfo.TypeInformation</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.datastream.DataStream</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.environment.StreamExecutionEnvironment</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.flink.util.OutputTag</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">
</span></span><span class="line"><span class="ln"> 27</span><span class="cl"><span class="k">object</span> <span class="nc">DataStreamApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">toSkipPrint</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TO_SKIP_PRINT&#34;</span><span class="p">)</span><span class="o">?.</span><span class="n">toBoolean</span><span class="p">()</span> <span class="o">?:</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;kafka-1:19092&#34;</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">inputTopicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-avro&#34;</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryUrl</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;http://schema:8081&#34;</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryConfig</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">        <span class="n">mapOf</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">            <span class="s2">&#34;basic.auth.credentials.source&#34;</span> <span class="n">to</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">            <span class="s2">&#34;basic.auth.user.info&#34;</span> <span class="n">to</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">INPUT</span><span class="n">_SCHEMA_SUBJECT</span> <span class="p">=</span> <span class="s2">&#34;orders-avro-value&#34;</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">NUM</span><span class="n">_PARTITIONS</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">REPLICATION</span><span class="n">_FACTOR</span><span class="p">:</span> <span class="n">Short</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">    <span class="c1">// ObjectMapper for converting late data Map to JSON
</span></span></span><span class="line"><span class="ln"> 43</span><span class="cl"><span class="c1"></span>    <span class="k">private</span> <span class="k">val</span> <span class="py">objectMapper</span><span class="p">:</span> <span class="n">ObjectMapper</span> <span class="k">by</span> <span class="n">lazy</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">        <span class="n">ObjectMapper</span><span class="p">().</span><span class="n">registerKotlinModule</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">        <span class="c1">// Create output topics if not existing
</span></span></span><span class="line"><span class="ln"> 49</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">outputTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-kds-stats&#34;</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">        <span class="k">val</span> <span class="py">skippedTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-kds-skipped&#34;</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">        <span class="n">listOf</span><span class="p">(</span><span class="n">outputTopicName</span><span class="p">,</span> <span class="n">skippedTopicName</span><span class="p">).</span><span class="n">forEach</span> <span class="p">{</span> <span class="n">name</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">            <span class="n">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">                <span class="n">name</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">                <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">                <span class="n">NUM_PARTITIONS</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">                <span class="n">REPLICATION_FACTOR</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">        <span class="k">val</span> <span class="py">env</span> <span class="p">=</span> <span class="nc">StreamExecutionEnvironment</span><span class="p">.</span><span class="n">getExecutionEnvironment</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">        <span class="n">env</span><span class="p">.</span><span class="n">parallelism</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">        <span class="k">val</span> <span class="py">inputAvroSchema</span> <span class="p">=</span> <span class="n">getLatestSchema</span><span class="p">(</span><span class="n">INPUT_SCHEMA_SUBJECT</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">,</span> <span class="n">registryConfig</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">        <span class="k">val</span> <span class="py">ordersGenericRecordSource</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">            <span class="n">createOrdersSource</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">                <span class="n">topic</span> <span class="p">=</span> <span class="n">inputTopicName</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">                <span class="n">groupId</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-flink-ds&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">                <span class="n">bootstrapAddress</span> <span class="p">=</span> <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">                <span class="n">registryUrl</span> <span class="p">=</span> <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">                <span class="n">registryConfig</span> <span class="p">=</span> <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">                <span class="n">schema</span> <span class="p">=</span> <span class="n">inputAvroSchema</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">        <span class="c1">// 1. Stream of GenericRecords from Kafka
</span></span></span><span class="line"><span class="ln"> 75</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">genericRecordStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">            <span class="n">env</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">                <span class="p">.</span><span class="n">fromSource</span><span class="p">(</span><span class="n">ordersGenericRecordSource</span><span class="p">,</span> <span class="nc">WatermarkStrategy</span><span class="p">.</span><span class="n">noWatermarks</span><span class="p">(),</span> <span class="s2">&#34;KafkaGenericRecordSource&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">        <span class="c1">// 2. Convert GenericRecord to Map&lt;String, Any?&gt; (RecordMap)
</span></span></span><span class="line"><span class="ln"> 80</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">recordMapStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">            <span class="n">genericRecordStream</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">genericRecord</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">                    <span class="k">val</span> <span class="py">map</span> <span class="p">=</span> <span class="n">mutableMapOf</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Any</span><span class="p">?&gt;()</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">                    <span class="n">genericRecord</span><span class="p">.</span><span class="n">schema</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">forEach</span> <span class="p">{</span> <span class="k">field</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">                        <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">genericRecord</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">                        <span class="n">map</span><span class="p">[</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">()]</span> <span class="p">=</span> <span class="k">if</span> <span class="p">(</span><span class="k">value</span> <span class="k">is</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">avro</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">Utf8</span><span class="p">)</span> <span class="k">value</span><span class="p">.</span><span class="n">toString</span><span class="p">()</span> <span class="k">else</span> <span class="k">value</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">                    <span class="n">map</span> <span class="k">as</span> <span class="n">RecordMap</span> <span class="c1">// Cast to type alias
</span></span></span><span class="line"><span class="ln"> 89</span><span class="cl"><span class="c1"></span>                <span class="p">}.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;GenericRecordToMapConverter&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">                <span class="p">.</span><span class="n">returns</span><span class="p">(</span><span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;()</span> <span class="p">{}))</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">        <span class="c1">// 3. Define OutputTag for late data (now carrying RecordMap)
</span></span></span><span class="line"><span class="ln"> 93</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">lateMapOutputTag</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">            <span class="n">OutputTag</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                <span class="s2">&#34;late-order-records&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                <span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;()</span> <span class="p">{}),</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">        <span class="c1">// 4. Process the RecordMap stream
</span></span></span><span class="line"><span class="ln">100</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">statsStreamOperator</span><span class="p">:</span> <span class="n">SingleOutputStreamOperator</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">            <span class="n">recordMapStream</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">                <span class="p">.</span><span class="n">assignTimestampsAndWatermarks</span><span class="p">(</span><span class="nc">SupplierWatermarkStrategy</span><span class="p">.</span><span class="n">strategy</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">                <span class="p">.</span><span class="n">keyBy</span> <span class="p">{</span> <span class="n">recordMap</span> <span class="o">-&gt;</span> <span class="n">recordMap</span><span class="p">[</span><span class="s2">&#34;supplier&#34;</span><span class="p">].</span><span class="n">toString</span><span class="p">()</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">                <span class="p">.</span><span class="n">window</span><span class="p">(</span><span class="nc">TumblingEventTimeWindows</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">)))</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">                <span class="p">.</span><span class="n">allowedLateness</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">                <span class="p">.</span><span class="n">sideOutputLateData</span><span class="p">(</span><span class="n">lateMapOutputTag</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">                <span class="p">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">SupplierStatsAggregator</span><span class="p">(),</span> <span class="n">SupplierStatsFunction</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">        <span class="k">val</span> <span class="py">statsStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;</span> <span class="p">=</span> <span class="n">statsStreamOperator</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">
</span></span><span class="line"><span class="ln">110</span><span class="cl">        <span class="c1">// 5. Handle late data as a pair of key and value
</span></span></span><span class="line"><span class="ln">111</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">lateDataMapStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">RecordMap</span><span class="p">&gt;</span> <span class="p">=</span> <span class="n">statsStreamOperator</span><span class="p">.</span><span class="n">getSideOutput</span><span class="p">(</span><span class="n">lateMapOutputTag</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">        <span class="k">val</span> <span class="py">lateKeyPairStream</span><span class="p">:</span> <span class="n">DataStream</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">            <span class="n">lateDataMapStream</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">recordMap</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">115</span><span class="cl">                    <span class="k">val</span> <span class="py">mutableMap</span> <span class="p">=</span> <span class="n">recordMap</span><span class="p">.</span><span class="n">toMutableMap</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">116</span><span class="cl">                    <span class="n">mutableMap</span><span class="p">[</span><span class="s2">&#34;late&#34;</span><span class="p">]</span> <span class="p">=</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl">                    <span class="k">val</span> <span class="py">orderId</span> <span class="p">=</span> <span class="n">mutableMap</span><span class="p">[</span><span class="s2">&#34;order_id&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">String</span>
</span></span><span class="line"><span class="ln">118</span><span class="cl">                    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">119</span><span class="cl">                        <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">objectMapper</span><span class="p">.</span><span class="n">writeValueAsString</span><span class="p">(</span><span class="n">mutableMap</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">120</span><span class="cl">                        <span class="n">Pair</span><span class="p">(</span><span class="n">orderId</span><span class="p">,</span> <span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">121</span><span class="cl">                    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">122</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error serializing late RecordMap to JSON: </span><span class="si">$mutableMap</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">123</span><span class="cl">                        <span class="k">val</span> <span class="py">errorJson</span> <span class="p">=</span> <span class="s2">&#34;{ </span><span class="se">\&#34;</span><span class="s2">error</span><span class="se">\&#34;</span><span class="s2">: </span><span class="se">\&#34;</span><span class="s2">json_serialization_failed</span><span class="se">\&#34;</span><span class="s2">, </span><span class="se">\&#34;</span><span class="s2">data_keys</span><span class="se">\&#34;</span><span class="s2">: </span><span class="se">\&#34;</span><span class="s2">${</span>
</span></span><span class="line"><span class="ln">124</span><span class="cl">                            <span class="n">mutableMap</span><span class="p">.</span><span class="n">keys</span><span class="p">.</span><span class="n">joinToString</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">125</span><span class="cl">                                <span class="s2">&#34;,&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">126</span><span class="cl">                            <span class="p">)}</span><span class="err">\</span><span class="s2">&#34; }&#34;</span>
</span></span><span class="line"><span class="ln">127</span><span class="cl">                        <span class="n">Pair</span><span class="p">(</span><span class="n">orderId</span><span class="p">,</span> <span class="n">errorJson</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">128</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">129</span><span class="cl">                <span class="p">}.</span><span class="n">returns</span><span class="p">(</span><span class="nc">TypeInformation</span><span class="p">.</span><span class="n">of</span><span class="p">(</span><span class="k">object</span> <span class="err">: </span><span class="nc">TypeHint</span><span class="p">&lt;</span><span class="n">Pair</span><span class="p">&lt;</span><span class="n">String</span><span class="p">?,</span> <span class="n">String</span><span class="p">&gt;&gt;()</span> <span class="p">{}))</span>
</span></span><span class="line"><span class="ln">130</span><span class="cl">
</span></span><span class="line"><span class="ln">131</span><span class="cl">        <span class="k">if</span> <span class="p">(!</span><span class="n">toSkipPrint</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">132</span><span class="cl">            <span class="n">statsStream</span>
</span></span><span class="line"><span class="ln">133</span><span class="cl">                <span class="p">.</span><span class="n">print</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">134</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;SupplierStatsPrint&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">135</span><span class="cl">            <span class="n">lateKeyPairStream</span>
</span></span><span class="line"><span class="ln">136</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="k">it</span><span class="p">.</span><span class="n">second</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">137</span><span class="cl">                <span class="p">.</span><span class="n">print</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">138</span><span class="cl">                <span class="p">.</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;LateDataPrint&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">139</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">140</span><span class="cl">
</span></span><span class="line"><span class="ln">141</span><span class="cl">        <span class="k">val</span> <span class="py">statsSink</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">142</span><span class="cl">            <span class="n">createStatsSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">143</span><span class="cl">                <span class="n">topic</span> <span class="p">=</span> <span class="n">outputTopicName</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">144</span><span class="cl">                <span class="n">bootstrapAddress</span> <span class="p">=</span> <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">145</span><span class="cl">                <span class="n">registryUrl</span> <span class="p">=</span> <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">146</span><span class="cl">                <span class="n">registryConfig</span> <span class="p">=</span> <span class="n">registryConfig</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">147</span><span class="cl">                <span class="n">outputSubject</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$outputTopicName</span><span class="s2">-value&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">148</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln">149</span><span class="cl">
</span></span><span class="line"><span class="ln">150</span><span class="cl">        <span class="k">val</span> <span class="py">skippedSink</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">151</span><span class="cl">            <span class="n">createSkippedSink</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">152</span><span class="cl">                <span class="n">topic</span> <span class="p">=</span> <span class="n">skippedTopicName</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">153</span><span class="cl">                <span class="n">bootstrapAddress</span> <span class="p">=</span> <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">154</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln">155</span><span class="cl">
</span></span><span class="line"><span class="ln">156</span><span class="cl">        <span class="n">statsStream</span><span class="p">.</span><span class="n">sinkTo</span><span class="p">(</span><span class="n">statsSink</span><span class="p">).</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;SupplierStatsSink&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">157</span><span class="cl">        <span class="n">lateKeyPairStream</span><span class="p">.</span><span class="n">sinkTo</span><span class="p">(</span><span class="n">skippedSink</span><span class="p">).</span><span class="n">name</span><span class="p">(</span><span class="s2">&#34;LateDataSink&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">158</span><span class="cl">        <span class="n">env</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&#34;SupplierStats&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">159</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">160</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
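<p>To make the late-data path above more concrete, here is a minimal, dependency-free sketch of the arithmetic behind a 5-second tumbling event-time window with 5 seconds of allowed lateness. It is an illustration rather than Flink&rsquo;s own code: the names <code>windowStart</code> and <code>isLate</code> are hypothetical, and the sketch assumes a zero window offset and non-negative timestamps. A record is treated as late once the watermark has reached its window&rsquo;s last timestamp plus the allowed lateness, which is when the job routes it to the <code>late-order-records</code> side output instead of the aggregation.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Illustrative only: hypothetical helpers that mirror the window arithmetic, not Flink classes.
const val WINDOW_SIZE_MS = 5_000L      // matches TumblingEventTimeWindows.of(Duration.ofSeconds(5))
const val ALLOWED_LATENESS_MS = 5_000L // matches allowedLateness(Duration.ofSeconds(5))

// Start of the tumbling window that contains the timestamp (zero offset assumed).
fun windowStart(timestampMs: Long): Long = timestampMs - (timestampMs % WINDOW_SIZE_MS)

// True once the watermark has reached the window's last timestamp plus the allowed lateness.
fun isLate(timestampMs: Long, watermarkMs: Long): Boolean {
    val maxTimestamp = windowStart(timestampMs) + WINDOW_SIZE_MS - 1
    return maxTimestamp + ALLOWED_LATENESS_MS &lt;= watermarkMs
}

fun main() {
    val eventTime = 12_000L // falls into the window [10000, 15000)
    println(windowStart(eventTime))
    println(isLate(eventTime, watermarkMs = 19_998L))
    println(isLate(eventTime, watermarkMs = 19_999L))
}
</code></pre></div>
<p>Run as-is, this prints <code>10000</code>, <code>false</code> and <code>true</code>: the record is still accepted while the watermark is at 19998 and becomes late once the watermark reaches 19999.</p>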
<h3 id="application-entry-point" data-numberify>Application Entry Point<a class="anchor ms-1" href="#application-entry-point"></a></h3>
<p>The <code>Main.kt</code> file serves as the entry point for the application. It reads the first command-line argument (<code>datastream</code> or <code>table</code>, matched case-insensitively) to decide which Flink application to run, and prints a usage message for anything else. A <code>try-catch</code> block logs any exception thrown during execution and then exits with status code 1.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.system.exitProcess</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">fun</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span> <span class="n">Array</span><span class="p">&lt;</span><span class="n">String</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">when</span> <span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">getOrNull</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="o">?.</span><span class="n">lowercase</span><span class="p">())</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;datastream&#34;</span> <span class="o">-&gt;</span> <span class="nc">DataStreamApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;table&#34;</span> <span class="o">-&gt;</span> <span class="nc">TableApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="k">else</span> <span class="o">-&gt;</span> <span class="n">println</span><span class="p">(</span><span class="s2">&#34;Usage: &lt;datastream | table&gt;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Fatal error in </span><span class="si">${args.getOrNull(0) ?: &#34;app&#34;}</span><span class="s2">. Shutting down.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">exitProcess</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h2 id="run-flink-application" data-numberify>Run Flink Application<a class="anchor ms-1" href="#run-flink-application"></a></h2>
<p>To observe our Flink DataStream application in action, we&rsquo;ll follow three steps: setting up a local Kafka environment, generating a stream of test data, and then executing the Flink job.</p>

<h3 id="factor-house-local-setup" data-numberify>Factor House Local Setup<a class="anchor ms-1" href="#factor-house-local-setup"></a></h3>
<p>A local Kafka environment is a prerequisite. If you don&rsquo;t have one running, use the <a href="https://github.com/factorhouse/factorhouse-local" target="_blank" rel="noopener noreferrer">Factor House Local<i class="fas fa-external-link-square-alt ms-1"></i></a> project to quickly get started:</p>
<ol>
<li>Clone the repository:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> factorhouse-local
</span></span></code></pre></div></li>
<li>Configure your Kpow community license as detailed in the project&rsquo;s <a href="https://github.com/factorhouse/factorhouse-local?tab=readme-ov-file#update-kpow-and-flex-licenses" target="_blank" rel="noopener noreferrer">README<i class="fas fa-external-link-square-alt ms-1"></i></a>.</li>
<li>Start the Docker services:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose -f compose-kpow-community.yml up -d
</span></span></code></pre></div></li>
</ol>
<p>Once running, the Kpow UI at <code>http://localhost:3000</code> will provide visibility into your Kafka cluster.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/kpow-overview.png" loading="lazy" width="1414" height="818" />
</picture>

</p>

<h3 id="start-the-kafka-order-producer" data-numberify>Start the Kafka Order Producer<a class="anchor ms-1" href="#start-the-kafka-order-producer"></a></h3>
<p>Our Flink application consumes order data from the <code>orders-avro</code> topic. We&rsquo;ll generate this data with the Kafka producer developed in <a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/">Part 2 of this series</a>. To exercise Flink&rsquo;s event-time windowing, including the late-data side output, we&rsquo;ll configure the producer to add a randomized delay (up to 30 seconds) to the <code>bid_time</code> field. Because the job uses 5-second windows with 5 seconds of allowed lateness, delays of this size should push some records past their window&rsquo;s cleanup time.</p>
<p>Navigate to the directory of the producer application (<em>orders-avro-clients</em> from the <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>) and run:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Assuming you are in the root of the &#39;orders-avro-clients&#39; project</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nv">DELAY_SECONDS</span><span class="o">=</span><span class="m">30</span> ./gradlew run --args<span class="o">=</span><span class="s2">&#34;producer&#34;</span>
</span></span></code></pre></div><p>This will start populating the <code>orders-avro</code> topic with Avro-encoded order messages. You can inspect these messages in Kpow. Ensure Kpow is configured with Key Deserializer: <em>String</em>, Value Deserializer: <em>AVRO</em>, and Schema Registry: <em>Local Schema Registry</em>.</p>
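<p>For readers who have not seen the producer code, the delay could be applied along the following lines. This is only a sketch under stated assumptions, not the producer&rsquo;s actual implementation: the helper name <code>delayedBidTime</code> is hypothetical, and it assumes the delay is applied by shifting <code>bid_time</code> back by a random whole number of seconds between 0 and <code>DELAY_SECONDS</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Instant
import kotlin.random.Random

// Hypothetical helper: an assumption about how a delayed bid_time could be derived.
fun delayedBidTime(
    now: Instant,
    maxDelaySeconds: Long,
    random: Random = Random.Default,
): Instant {
    require(maxDelaySeconds &gt;= 0) { &#34;maxDelaySeconds must not be negative&#34; }
    // nextLong(until) excludes the upper bound, so add 1 to allow the full delay.
    val delaySeconds = random.nextLong(maxDelaySeconds + 1)
    return now.minusSeconds(delaySeconds)
}

fun main() {
    // Fall back to no delay when DELAY_SECONDS is unset or not a number.
    val maxDelay = System.getenv(&#34;DELAY_SECONDS&#34;)?.toLongOrNull() ?: 0L
    val now = Instant.now()
    repeat(3) { println(delayedBidTime(now, maxDelay)) }
}
</code></pre></div>
<p>With <code>DELAY_SECONDS=30</code>, each generated <code>bid_time</code> lies between 30 seconds ago and now, so the events reach Flink out of order with respect to event time.</p>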
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/orders-01.png" loading="lazy" width="1200" height="670" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/orders-02.png" loading="lazy" width="1199" height="663" />
</picture>

</p>

<h3 id="launch-the-flink-application" data-numberify>Launch the Flink Application<a class="anchor ms-1" href="#launch-the-flink-application"></a></h3>
<p>With a steady stream of order events being produced, we can now launch our <code>orders-stats-flink</code> application. Navigate to its project directory. The application&rsquo;s entry point is designed to run different jobs based on a command-line argument; for this post, we&rsquo;ll use <code>datastream</code>.</p>
<p>The application can be run in two main ways:</p>
<ol>
<li><strong>With Gradle (Development Mode)</strong>:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;datastream&#34;</span>
</span></span></code></pre></div></li>
<li><strong>With the Shadow JAR (Deployment Mode)</strong>:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># First, build the fat JAR</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">./gradlew shadowJar
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Then run it</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">java --add-opens<span class="o">=</span>java.base/java.util<span class="o">=</span>ALL-UNNAMED <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  -jar build/libs/orders-stats-flink-1.0.jar datastream
</span></span></code></pre></div></li>
</ol>
<blockquote>
<p>💡 To build and run the application locally, ensure that <strong>JDK 17</strong> is installed.</p>
</blockquote>
<p>For this demonstration, we&rsquo;ll use Gradle to run the application in development mode. Upon starting, you&rsquo;ll see logs indicating the Flink application has initialized and is processing records from the <code>orders-avro</code> topic.</p>

<h3 id="observing-the-output" data-numberify>Observing the Output<a class="anchor ms-1" href="#observing-the-output"></a></h3>
<p>Our Flink DataStream job writes its results to two distinct Kafka topics:</p>
<ul>
<li><code>orders-avro-kds-stats</code>: Contains the aggregated supplier statistics as Avro records.</li>
<li><code>orders-avro-kds-skipped</code>: Contains records identified as &ldquo;late,&rdquo; serialized as JSON.</li>
</ul>
<p><strong>1. Supplier Statistics (<code>orders-avro-kds-stats</code>):</strong></p>
<p>In Kpow, navigate to the <code>orders-avro-kds-stats</code> topic. Configure Kpow to view these messages:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>AVRO</em></li>
<li><strong>Schema Registry:</strong> <em>Local Schema Registry</em></li>
</ul>
<p>You should see <code>SupplierStats</code> messages, each representing the total price and count of orders for a supplier within a 5-second window. Notice the <code>window_start</code> and <code>window_end</code> fields.</p>
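<p>To make the shape of these records concrete, here is a minimal plain-Kotlin sketch of the same grouping, with no Flink dependency. The <code>Order</code> and <code>Stats</code> classes and the <code>aggregate</code> function are hypothetical stand-ins for illustration, not the application&rsquo;s actual classes. The sketch assumes windows aligned to the epoch, which is how Flink&rsquo;s tumbling windows behave when no offset is set.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Instant

// Hypothetical, simplified order: only the fields the aggregation needs.
data class Order(val supplier: String, val price: Double, val bidTimeMs: Long)

// Hypothetical stand-in that mirrors the fields of a SupplierStats record.
data class Stats(
    val windowStart: Instant,
    val windowEnd: Instant,
    val supplier: String,
    val totalPrice: Double,
    val count: Long,
)

const val WINDOW_SIZE_MS = 5_000L

// Start of the epoch-aligned tumbling window that contains the timestamp.
fun windowStart(timestampMs: Long): Long =
    timestampMs - Math.floorMod(timestampMs, WINDOW_SIZE_MS)

// Group orders by (window, supplier), then sum prices and count orders per group.
fun aggregate(orders: List&lt;Order&gt;): List&lt;Stats&gt; =
    orders
        .groupBy { windowStart(it.bidTimeMs) to it.supplier }
        .map { (key, group) -&gt;
            val (start, supplier) = key
            Stats(
                windowStart = Instant.ofEpochMilli(start),
                windowEnd = Instant.ofEpochMilli(start + WINDOW_SIZE_MS), // exclusive end
                supplier = supplier,
                totalPrice = group.sumOf { it.price },
                count = group.size.toLong(),
            )
        }

fun main() {
    val orders =
        listOf(
            Order(&#34;A&#34;, 10.0, 1_000L),
            Order(&#34;A&#34;, 5.5, 4_999L),
            Order(&#34;A&#34;, 2.0, 5_000L), // falls into the next window
            Order(&#34;B&#34;, 7.0, 1_200L),
        )
    aggregate(orders).forEach(::println)
}
</code></pre></div>
<p>Unlike this batch-style sketch, the Flink job computes the same per-window totals incrementally as events arrive and emits each result once the watermark passes the end of the window.</p>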
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/stats-01.png" loading="lazy" width="1201" height="669" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/stats-02.png" loading="lazy" width="1200" height="665" />
</picture>

</p>
<p><strong>2. Skipped (Late) Records (<code>orders-avro-kds-skipped</code>):</strong></p>
<p>Next, inspect the <code>orders-avro-kds-skipped</code> topic in Kpow. Configure Kpow as follows:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>JSON</em></li>
</ul>
<p>These are the records that arrived too late to be included in their windows, even after the <code>allowedLateness</code> period had passed. The job captured them with <code>.sideOutputLateData()</code> on the windowed stream and converted them to JSON with a <code>&quot;late&quot;: true</code> field so they are easy to identify.</p>
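<p>The rule that decides whether a record ends up in this topic can be sketched in plain Kotlin. This is a simplified model of Flink&rsquo;s event-time lateness check as we understand it, not the application&rsquo;s code: <code>isLate</code> and <code>windowEnd</code> are hypothetical helpers, and the 5-second allowed lateness is an assumed value for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">const val WINDOW_SIZE_MS = 5_000L

// Exclusive end of the epoch-aligned tumbling window that contains the event time.
fun windowEnd(eventTimeMs: Long): Long =
    eventTimeMs - Math.floorMod(eventTimeMs, WINDOW_SIZE_MS) + WINDOW_SIZE_MS

// A record is late when the watermark has already reached the last timestamp
// of its window plus the allowed lateness. At that point the window state has
// been cleaned up, so the record can only go to the side output.
fun isLate(
    eventTimeMs: Long,
    watermarkMs: Long,
    allowedLatenessMs: Long,
): Boolean {
    val maxTimestamp = windowEnd(eventTimeMs) - 1
    return maxTimestamp + allowedLatenessMs &lt;= watermarkMs
}

fun main() {
    val allowedLatenessMs = 5_000L // assumed value, for illustration only

    // Event at 12s belongs to window [10s, 15s); cleanup time is 19_999 ms.
    println(isLate(12_000L, 16_000L, allowedLatenessMs)) // false: still accepted
    println(isLate(12_000L, 20_000L, allowedLatenessMs)) // true: sent to the side output
}
</code></pre></div>
<p>Records that arrive after the window has fired but before this cutoff are not skipped. They update the window result and trigger another emission instead.</p>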
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/skipped-01.png" loading="lazy" width="1196" height="607" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-10-kotlin-getting-started-flink-datastream/skipped-02.png" loading="lazy" width="1201" height="703" />
</picture>

</p>

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>In this post, we built a real-time analytics job with Apache Flink&rsquo;s DataStream API. We implemented a complete stateful pipeline in Kotlin, from consuming Avro records to performing windowed aggregations with a custom <code>AggregateFunction</code> and <code>WindowFunction</code>. We saw how Flink&rsquo;s <code>WatermarkStrategy</code> provides the foundation for event-time processing and how <code>.sideOutputLateData()</code> offers a clean, built-in way to isolate late records. The DataStream API gives fine-grained control over each of these steps. Next, we will solve the same problem more declaratively with Flink&rsquo;s Table API.</p>
      ]]></content:encoded></item><item><title>Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</title><link>https://jaehyeon.me/blog/2025-06-03-kotlin-getting-started-kafka-streams/</link><guid>https://jaehyeon.me/blog/2025-06-03-kotlin-getting-started-kafka-streams/</guid><pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In this post, we shift our focus from basic Kafka clients to real-time stream processing with <strong>Kafka Streams</strong>. We&rsquo;ll explore a Kotlin application designed to analyze a continuous stream of Avro-formatted order events, calculate supplier statistics in tumbling windows, and intelligently handle late-arriving data. This example demonstrates the power of Kafka Streams for building lightweight, yet robust, stream processing applications directly within your Kafka ecosystem, leveraging event-time processing and custom logic.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In this post, we shift our focus from basic Kafka clients to real-time stream processing with <strong>Kafka Streams</strong>. We&rsquo;ll explore a Kotlin application designed to analyze a continuous stream of Avro-formatted order events, calculate supplier statistics in tumbling windows, and intelligently handle late-arriving data. This example demonstrates the power of Kafka Streams for building lightweight, yet robust, stream processing applications directly within your Kafka ecosystem, leveraging event-time processing and custom logic.</p>
<ul>
<li><a href="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients">Kafka Clients with JSON - Producing and Consuming Order Events</a></li>
<li><a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients">Kafka Clients with Avro - Schema Registry and Order Events</a></li>
<li><a href="/blog/2025-06-03-kotlin-getting-started-kafka-streams/#">Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</a> (this post)</li>
<li><a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">Flink DataStream API - Scalable Event Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-17-kotlin-getting-started-flink-table">Flink Table API - Declarative Analytics for Supplier Stats in Real Time</a></li>
</ul>

<h2 id="kafka-streams-application-for-supplier-statistics" data-numberify>Kafka Streams Application for Supplier Statistics<a class="anchor ms-1" href="#kafka-streams-application-for-supplier-statistics"></a></h2>
<p>This project showcases a Kafka Streams application that:</p>
<ul>
<li>Consumes Avro-formatted order data from an input Kafka topic.</li>
<li>Extracts event timestamps from the order data to enable accurate time-based processing.</li>
<li>Proactively identifies and separates late-arriving records.</li>
<li>Aggregates order data to compute supplier statistics (total price and count) within defined time windows.</li>
<li>Outputs the calculated statistics and late records to separate Kafka topics.</li>
</ul>
<p>The source code for the application discussed in this post can be found in the <em>orders-stats-streams</em> folder of this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h3 id="the-build-configuration" data-numberify>The Build Configuration<a class="anchor ms-1" href="#the-build-configuration"></a></h3>
<p>The <code>build.gradle.kts</code> file orchestrates the build, dependencies, and packaging of our Kafka Streams application.</p>
<ul>
<li><strong>Plugins:</strong>
<ul>
<li><code>kotlin(&quot;jvm&quot;)</code>: Enables Kotlin language support for the JVM.</li>
<li><code>com.github.davidmc24.gradle.plugin.avro</code>: Manages Avro schema compilation into Java classes.</li>
<li><code>com.github.johnrengelman.shadow</code>: Creates a &ldquo;fat JAR&rdquo; containing all dependencies.</li>
<li><code>application</code>: Configures the project as a runnable application.</li>
</ul>
</li>
<li><strong>Repositories:</strong>
<ul>
<li><code>mavenCentral()</code>: Standard Maven repository.</li>
<li><code>maven(&quot;https://packages.confluent.io/maven/&quot;)</code>: Confluent repository for Kafka Streams Avro SerDes and other Confluent components.</li>
</ul>
</li>
<li><strong>Dependencies:</strong>
<ul>
<li><strong>Kafka:</strong> <code>org.apache.kafka:kafka-clients</code> and, importantly, <code>org.apache.kafka:kafka-streams</code>, which provides the stream processing DSL and the Processor API.</li>
<li><strong>Avro:</strong>
<ul>
<li><code>org.apache.avro:avro</code> for the core Avro library.</li>
<li><code>io.confluent:kafka-streams-avro-serde</code> for Confluent&rsquo;s Kafka Streams Avro SerDes, which integrate with Schema Registry.</li>
</ul>
</li>
<li><strong>JSON:</strong> <code>com.fasterxml.jackson.module:jackson-module-kotlin</code> for serializing late records to JSON.</li>
<li><strong>Logging:</strong> <code>io.github.microutils:kotlin-logging-jvm</code> and <code>ch.qos.logback:logback-classic</code>.</li>
</ul>
</li>
<li><strong>Application Configuration:</strong>
<ul>
<li><code>mainClass.set(&quot;me.jaehyeon.MainKt&quot;)</code>: Defines the application&rsquo;s entry point.</li>
<li>The <code>run</code> task is configured with environment variables (<code>BOOTSTRAP</code>, <code>TOPIC</code>, <code>REGISTRY_URL</code>) for Kafka connection details, simplifying local execution.</li>
</ul>
</li>
<li><strong>Avro Configuration:</strong>
<ul>
<li>The <code>avro</code> block customizes Avro code generation (e.g., <code>setCreateSetters(false)</code>).</li>
<li><code>tasks.named(&quot;compileKotlin&quot;) { dependsOn(&quot;generateAvroJava&quot;) }</code> ensures Avro classes are generated before Kotlin compilation.</li>
<li>Generated Avro Java sources are added to the <code>main</code> source set.</li>
</ul>
</li>
<li><strong>Shadow JAR Configuration:</strong>
<ul>
<li>Configures the output fat JAR name (<code>orders-stats-streams</code>) and version.</li>
<li><code>mergeServiceFiles()</code> handles merging service provider files from dependencies.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">plugins</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;jvm&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;2.1.20&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.davidmc24.gradle.plugin.avro&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;1.9.1&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.johnrengelman.shadow&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;8.1.1&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">application</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">group</span> <span class="p">=</span> <span class="s2">&#34;me.jaehyeon&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">version</span> <span class="p">=</span> <span class="s2">&#34;1.0-SNAPSHOT&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">repositories</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">mavenCentral</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">maven</span><span class="p">(</span><span class="s2">&#34;https://packages.confluent.io/maven/&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// Kafka
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.kafka:kafka-clients:3.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.kafka:kafka-streams:3.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="c1">// AVRO
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.avro:avro:1.11.4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.confluent:kafka-streams-avro-serde:7.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="c1">// Json
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;com.fasterxml.jackson.module:jackson-module-kotlin:2.13.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="c1">// Logging
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.github.microutils:kotlin-logging-jvm:3.0.5&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;ch.qos.logback:logback-classic:1.5.13&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="c1">// Test
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;test&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="n">kotlin</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">jvmToolchain</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">
</span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="n">application</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">    <span class="n">mainClass</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;me.jaehyeon.MainKt&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">
</span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="n">avro</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="n">setCreateSetters</span><span class="p">(</span><span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">    <span class="n">setFieldVisibility</span><span class="p">(</span><span class="s2">&#34;PRIVATE&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">
</span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;compileKotlin&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;generateAvroJava&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">
</span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="n">sourceSets</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">    <span class="n">named</span><span class="p">(</span><span class="s2">&#34;main&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">        <span class="n">java</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;build/generated/avro/main&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">        <span class="n">kotlin</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;src/main/kotlin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">
</span></span><span class="line"><span class="ln">56</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">withType</span><span class="p">&lt;</span><span class="n">com</span><span class="p">.</span><span class="n">github</span><span class="p">.</span><span class="n">jengelman</span><span class="p">.</span><span class="n">gradle</span><span class="p">.</span><span class="n">plugins</span><span class="p">.</span><span class="n">shadow</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">ShadowJar</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">    <span class="n">archiveBaseName</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;orders-stats-streams&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">    <span class="n">archiveClassifier</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">    <span class="n">archiveVersion</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="n">mergeServiceFiles</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">
</span></span><span class="line"><span class="ln">63</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;build&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">
</span></span><span class="line"><span class="ln">67</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">&lt;</span><span class="n">JavaExec</span><span class="p">&gt;(</span><span class="s2">&#34;run&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">,</span> <span class="s2">&#34;localhost:9092&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;TOPIC&#34;</span><span class="p">,</span> <span class="s2">&#34;orders-avro&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">    <span class="n">environment</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">,</span> <span class="s2">&#34;http://localhost:8081&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">
</span></span><span class="line"><span class="ln">73</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">test</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">    <span class="n">useJUnitPlatform</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="avro-schema-for-aggregated-statistics" data-numberify>Avro Schema for Aggregated Statistics<a class="anchor ms-1" href="#avro-schema-for-aggregated-statistics"></a></h3>
<p>The <code>SupplierStats.avsc</code> file defines the Avro schema for the output of our stream aggregation. It gives the processed statistics a typed structure and a basis for schema evolution.</p>
<ul>
<li><strong>Type:</strong> A <code>record</code> named <code>SupplierStats</code> within the <code>me.jaehyeon.avro</code> namespace.</li>
<li><strong>Fields:</strong>
<ul>
<li><code>window_start</code> (string): Marks the beginning of the aggregation window.</li>
<li><code>window_end</code> (string): Marks the end of the aggregation window.</li>
<li><code>supplier</code> (string): The identifier for the supplier.</li>
<li><code>total_price</code> (double): The sum of order prices for the supplier within the window.</li>
<li><code>count</code> (long): The number of orders for the supplier within the window.</li>
</ul>
</li>
<li><strong>Usage:</strong> This schema is used by the <code>SpecificAvroSerde</code> to serialize the aggregated <code>SupplierStats</code> objects before they are written to the output Kafka topic.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;record&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;SupplierStats&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="nt">&#34;namespace&#34;</span><span class="p">:</span> <span class="s2">&#34;me.jaehyeon.avro&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nt">&#34;fields&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_start&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;window_end&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;total_price&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;double&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;count&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;long&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="kafka-admin-utilities" data-numberify>Kafka Admin Utilities<a class="anchor ms-1" href="#kafka-admin-utilities"></a></h3>
<p>The <code>Utils.kt</code> file provides a helper function for Kafka topic management, ensuring that necessary topics exist before the stream processing begins.</p>
<ul>
<li><strong><code>createTopicIfNotExists(...)</code>:</strong>
<ul>
<li>This function uses Kafka&rsquo;s <code>AdminClient</code> to programmatically create Kafka topics.</li>
<li>It takes the topic name, bootstrap server address, number of partitions, and replication factor as parameters.</li>
<li>It&rsquo;s designed to be idempotent: if the topic already exists (due to prior creation or concurrent attempts), it logs a warning and proceeds without error, preventing application startup failures.</li>
<li>For other errors during topic creation, it throws a runtime exception.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClient</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClientConfig</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.NewTopic</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.TopicExistsException</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.use</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">fun</span> <span class="nf">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">topicName</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">numPartitions</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="n">replicationFactor</span><span class="p">:</span> <span class="n">Short</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="k">val</span> <span class="py">newTopic</span> <span class="p">=</span> <span class="n">NewTopic</span><span class="p">(</span><span class="n">topicName</span><span class="p">,</span> <span class="n">numPartitions</span><span class="p">,</span> <span class="n">replicationFactor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">        <span class="k">val</span> <span class="py">result</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">createTopics</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">newTopic</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Attempting to create topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">            <span class="n">result</span><span class="p">.</span><span class="n">all</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; created successfully!&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">cause</span> <span class="k">is</span> <span class="n">TopicExistsException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; was created concurrently or already existed. Continuing...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while creating a topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
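<p>The <code>e.cause</code> check in the function above matters because <code>Future.get()</code> reports a failed operation as an <code>ExecutionException</code> that wraps the real error. The following is a minimal, broker-free sketch of that unwrapping pattern. <code>TopicAlreadyExists</code> and <code>awaitCreation</code> are hypothetical stand-ins introduced only for illustration, and a plain <code>CompletableFuture</code> plays the role of the admin client result.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException

// Hypothetical stand-in for Kafka's TopicExistsException, so the sketch runs without a broker.
class TopicAlreadyExists(message: String) : RuntimeException(message)

// Returns true if the operation succeeded and false if the topic already existed.
// Any other failure is wrapped and rethrown, mirroring createTopicIfNotExists.
fun awaitCreation(result: CompletableFuture&lt;Unit&gt;): Boolean =
    try {
        result.get() // blocks; a failed future surfaces as ExecutionException
        true
    } catch (e: ExecutionException) {
        if (e.cause is TopicAlreadyExists) {
            false
        } else {
            throw RuntimeException("Unrecoverable error while creating a topic.", e)
        }
    }

fun main() {
    val created = CompletableFuture.completedFuture(Unit)
    val alreadyThere = CompletableFuture&lt;Unit&gt;().apply {
        completeExceptionally(TopicAlreadyExists("topic exists"))
    }
    println(awaitCreation(created))      // true
    println(awaitCreation(alreadyThere)) // false
}
</code></pre></div>
<p>Matching on the cause rather than on the <code>ExecutionException</code> itself is what lets the "already exists" case be treated as a benign outcome while everything else still stops startup.</p>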
<h3 id="custom-timestamp-extraction" data-numberify>Custom Timestamp Extraction<a class="anchor ms-1" href="#custom-timestamp-extraction"></a></h3>
<p>For accurate event-time processing, Kafka Streams needs to know the actual time an event occurred, not just when it arrived at Kafka. The <code>BidTimeTimestampExtractor</code> customizes this logic.</p>
<ul>
<li><strong>Implementation:</strong> The class implements the Kafka Streams <code>TimestampExtractor</code> interface.</li>
<li><strong>Logic:</strong>
<ul>
<li>It attempts to parse a <code>bid_time</code> field (expected format: &ldquo;yyyy-MM-dd HH:mm:ss&rdquo;) from the incoming Avro <code>GenericRecord</code>.</li>
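<li>The parsed value is interpreted in the JVM&rsquo;s default time zone (<code>ZoneId.systemDefault()</code>) and converted to epoch milliseconds, so producers and the Streams application need to agree on that zone.</li>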
</ul>
</li>
<li><strong>Error Handling:</strong>
<ul>
<li>If the <code>bid_time</code> field is missing, blank, or cannot be parsed (e.g., due to <code>DateTimeParseException</code>), the extractor logs the issue and gracefully falls back to using the <code>partitionTime</code> (the timestamp assigned by Kafka, typically close to ingestion time). This ensures the stream doesn&rsquo;t halt due to malformed data.</li>
</ul>
</li>
<li><strong>Significance:</strong> Using event time extracted from the data payload allows windowed operations to be based on when events truly happened, leading to more meaningful aggregations, especially in systems where data might arrive out of order or with delays.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.streams.extractor</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.processor.TimestampExtractor</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.LocalDateTime</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeParseException</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">class</span> <span class="nc">BidTimeTimestampExtractor</span> <span class="p">:</span> <span class="n">TimestampExtractor</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">formatter</span> <span class="p">=</span> <span class="nc">DateTimeFormatter</span><span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">extract</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">record</span><span class="p">:</span> <span class="n">ConsumerRecord</span><span class="p">&lt;</span><span class="n">Any</span><span class="p">,</span> <span class="n">Any</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="n">partitionTime</span><span class="p">:</span> <span class="n">Long</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="p">):</span> <span class="n">Long</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">record</span><span class="p">.</span><span class="k">value</span><span class="p">()</span> <span class="k">as</span><span class="p">?</span> <span class="n">GenericRecord</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">            <span class="k">val</span> <span class="py">bidTime</span> <span class="p">=</span> <span class="k">value</span><span class="o">?.</span><span class="k">get</span><span class="p">(</span><span class="s2">&#34;bid_time&#34;</span><span class="p">)</span><span class="o">?.</span><span class="n">toString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">            <span class="k">when</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">                <span class="n">bidTime</span><span class="p">.</span><span class="n">isNullOrBlank</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Missing or blank &#39;bid_time&#39;. Falling back to partitionTime: </span><span class="si">$partitionTime</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">                    <span class="n">partitionTime</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">                <span class="k">else</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">                    <span class="k">val</span> <span class="py">parsedTimestamp</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">                        <span class="n">LocalDateTime</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">                            <span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">bidTime</span><span class="p">,</span> <span class="n">formatter</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">                            <span class="p">.</span><span class="n">atZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">                            <span class="p">.</span><span class="n">toInstant</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">                            <span class="p">.</span><span class="n">toEpochMilli</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">debug</span> <span class="p">{</span> <span class="s2">&#34;Extracted timestamp </span><span class="si">$parsedTimestamp</span><span class="s2"> from bid_time &#39;</span><span class="si">$bidTime</span><span class="s2">&#39;&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">                    <span class="n">parsedTimestamp</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">            <span class="k">when</span> <span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">                <span class="k">is</span> <span class="n">DateTimeParseException</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to parse &#39;bid_time&#39;. Falling back to partitionTime: </span><span class="si">$partitionTime</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">                    <span class="n">partitionTime</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">                <span class="k">else</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Unexpected error extracting timestamp. Falling back to partitionTime: </span><span class="si">$partitionTime</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">                    <span class="n">partitionTime</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
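<p>The parse-or-fall-back logic can also be tried in isolation, without Kafka on the classpath. The sketch below is a simplified illustration, not the extractor itself: <code>parseBidTimeOrFallback</code> is a hypothetical helper, and it pins the zone to UTC (an assumption made here so the result is reproducible, whereas the extractor above uses the system default zone).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException

private val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Hypothetical helper: parses bid_time as UTC (assumption) and returns epoch milliseconds,
// or returns the fallback when the value is missing, blank, or malformed.
fun parseBidTimeOrFallback(bidTime: String?, fallback: Long): Long {
    if (bidTime.isNullOrBlank()) return fallback
    return try {
        LocalDateTime.parse(bidTime, formatter).toInstant(ZoneOffset.UTC).toEpochMilli()
    } catch (e: DateTimeParseException) {
        fallback
    }
}

fun main() {
    println(parseBidTimeOrFallback("1970-01-01 00:00:12", -1L)) // 12000
    println(parseBidTimeOrFallback("12 seconds past", 99L))     // 99
    println(parseBidTimeOrFallback(null, 99L))                  // 99
}
</code></pre></div>
<p>Pinning an explicit zone like this is one way to keep extracted timestamps stable across machines with different locale settings. Whether that suits a given pipeline depends on how the producers format <code>bid_time</code>.</p>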
<h3 id="proactive-late-record-handling" data-numberify>Proactive Late Record Handling<a class="anchor ms-1" href="#proactive-late-record-handling"></a></h3>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/late-record-processor.png" loading="lazy" width="612" height="576" />
</picture>

</p>
<p>The <code>LateRecordProcessor</code> is a custom Kafka Streams <code>Processor</code> (built on the lower-level Processor API) that identifies records arriving too late to be included in their intended time windows.</p>
<ul>
<li><strong>Parameters:</strong> Initialized with the <code>windowSize</code> and <code>gracePeriod</code> durations used by downstream windowed aggregations.</li>
<li><strong>Logic:</strong> For each incoming record:
<ol>
<li>It retrieves the record&rsquo;s event timestamp (as assigned by <code>BidTimeTimestampExtractor</code>).</li>
<li>It calculates the <code>windowEnd</code> time for the window this record <em>should</em> belong to.</li>
<li>It then determines the <code>windowCloseTime</code> (window end + grace period), which is the deadline for records to be accepted into that window.</li>
<li>It compares the current <code>streamTime</code> (the maximum event time seen so far by this processing task) against the record&rsquo;s <code>windowCloseTime</code>.</li>
<li>If <code>streamTime</code> is already past <code>windowCloseTime</code>, the record is considered &ldquo;late.&rdquo;</li>
</ol>
</li>
<li><strong>Output:</strong> The processor forwards a <code>Pair</code> containing the original <code>GenericRecord</code> and a <code>Boolean</code> flag indicating whether the record is late (<code>true</code>) or not (<code>false</code>).</li>
<li><strong>Purpose:</strong> This allows the application to explicitly route late records to a separate processing path (e.g., a &ldquo;skipped&rdquo; topic) <em>before</em> they are simply dropped by downstream stateful windowed operators. This provides visibility into late data and allows for alternative handling strategies.</li>
</ul>
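<p>Stripped of the Kafka Streams plumbing, the lateness check described above is a few lines of integer arithmetic. As a rough standalone sketch, <code>isLate</code> below is a hypothetical pure function that takes the stream time as an argument instead of reading it from the processor context. The strict <code>&gt;</code> comparison is an assumption based on the phrase "already past".</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical pure function mirroring the steps above.
// All arguments are epoch milliseconds or durations in milliseconds.
fun isLate(
    recordTimestamp: Long,
    streamTime: Long,
    windowSizeMs: Long,
    gracePeriodMs: Long,
): Boolean {
    // End of the tumbling window the record falls into, e.g. 12s with 5s windows gives 15s.
    val windowEnd = ((recordTimestamp / windowSizeMs) + 1) * windowSizeMs
    // Deadline for accepting records into that window.
    val windowCloseTime = windowEnd + gracePeriodMs
    // Assumption: a record is late only when stream time is strictly past the deadline.
    return streamTime &gt; windowCloseTime
}

fun main() {
    // Record at 12s, 5s windows, 2s grace: window [10s, 15s) closes at 17s.
    println(isLate(12_000, 16_000, 5_000, 2_000)) // false
    println(isLate(12_000, 20_000, 5_000, 2_000)) // true
}
</code></pre></div>
<p>Keeping the calculation in a pure function like this makes the window boundaries easy to unit test before wiring them into a processor.</p>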
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.streams.processor</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.processor.api.Processor</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.processor.api.ProcessorContext</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.processor.api.Record</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">class</span> <span class="nc">LateRecordProcessor</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">windowSize</span><span class="p">:</span> <span class="n">Duration</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">gracePeriod</span><span class="p">:</span> <span class="n">Duration</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="p">)</span> <span class="p">:</span> <span class="n">Processor</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">,</span> <span class="n">String</span><span class="p">,</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">,</span> <span class="n">Boolean</span><span class="p">&gt;&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">private</span> <span class="k">lateinit</span> <span class="k">var</span> <span class="py">context</span><span class="p">:</span> <span class="n">ProcessorContext</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">,</span> <span class="n">Boolean</span><span class="p">&gt;&gt;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">windowSizeMs</span> <span class="p">=</span> <span class="n">windowSize</span><span class="p">.</span><span class="n">toMillis</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">gracePeriodMs</span> <span class="p">=</span> <span class="n">gracePeriod</span><span class="p">.</span><span class="n">toMillis</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">init</span><span class="p">(</span><span class="n">context</span><span class="p">:</span> <span class="n">ProcessorContext</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">,</span> <span class="n">Boolean</span><span class="p">&gt;&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="k">this</span><span class="p">.</span><span class="n">context</span> <span class="p">=</span> <span class="n">context</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="c1">// The main processing method for the Processor API
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="c1"></span>    <span class="k">override</span> <span class="k">fun</span> <span class="nf">process</span><span class="p">(</span><span class="n">record</span><span class="p">:</span> <span class="n">Record</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">        <span class="k">val</span> <span class="py">key</span> <span class="p">=</span> <span class="n">record</span><span class="p">.</span><span class="n">key</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">        <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">record</span><span class="p">.</span><span class="k">value</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="c1">// 1. Get the timestamp assigned to this specific record.
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="c1"></span>        <span class="c1">//    This comes from the BidTimeTimestampExtractor.
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">recordTimestamp</span> <span class="p">=</span> <span class="n">record</span><span class="p">.</span><span class="n">timestamp</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="c1">// Handle cases where timestamp extraction might have failed.
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="c1"></span>        <span class="c1">// These records can&#39;t be placed in a window correctly anyway.
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="c1"></span>        <span class="k">if</span> <span class="p">(</span><span class="n">recordTimestamp</span> <span class="p">&lt;</span> <span class="m">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Record has invalid timestamp </span><span class="si">$recordTimestamp</span><span class="s2">. Cannot determine window. Forwarding as NOT LATE. Key=</span><span class="si">$key</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">            <span class="c1">// Explicitly forward the result using the context
</span></span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="c1"></span>            <span class="n">context</span><span class="p">.</span><span class="n">forward</span><span class="p">(</span><span class="n">Record</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">Pair</span><span class="p">(</span><span class="k">value</span><span class="p">,</span> <span class="k">false</span><span class="p">),</span> <span class="n">recordTimestamp</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">            <span class="k">return</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">
</span></span><span class="line"><span class="ln">41</span><span class="cl">        <span class="c1">// 2. Determine the time window this record *should* belong to based on its timestamp.
</span></span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="c1"></span>        <span class="c1">//    Calculate the END time of that window.
</span></span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="c1"></span>        <span class="c1">//    Example: If window size is 5s and recordTimestamp is 12s, it belongs to
</span></span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="c1"></span>        <span class="c1">//             window [10s, 15s). The windowEnd is 15s (15000ms).
</span></span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="c1"></span>        <span class="c1">//             Calculation: ((12000 / 5000) + 1) * 5000 = (2 + 1) * 5000 = 15000
</span></span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">windowEnd</span> <span class="p">=</span> <span class="p">((</span><span class="n">recordTimestamp</span> <span class="p">/</span> <span class="n">windowSizeMs</span><span class="p">)</span> <span class="p">+</span> <span class="m">1</span><span class="p">)</span> <span class="p">*</span> <span class="n">windowSizeMs</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">
</span></span><span class="line"><span class="ln">48</span><span class="cl">        <span class="c1">// 3. Calculate when this specific window &#34;closes&#34; for accepting late records.
</span></span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="c1"></span>        <span class="c1">//    This is the window&#39;s end time plus the allowed grace period.
</span></span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="c1"></span>        <span class="c1">//    Example: If windowEnd is 15s and gracePeriod is 0s, windowCloseTime is 15s.
</span></span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="c1"></span>        <span class="c1">//             If windowEnd is 15s and gracePeriod is 2s, windowCloseTime is 17s.
</span></span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">windowCloseTime</span> <span class="p">=</span> <span class="n">windowEnd</span> <span class="p">+</span> <span class="n">gracePeriodMs</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">
</span></span><span class="line"><span class="ln">54</span><span class="cl">        <span class="c1">// 4. Get the current &#34;Stream Time&#34;.
</span></span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="c1"></span>        <span class="c1">//    This represents the maximum record timestamp seen *so far* by this stream task.
</span></span></span><span class="line"><span class="ln">56</span><span class="cl"><span class="c1"></span>        <span class="c1">//    It indicates how far along the stream processing has progressed in event time.
</span></span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">streamTime</span> <span class="p">=</span> <span class="n">context</span><span class="p">.</span><span class="n">currentStreamTimeMs</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">
</span></span><span class="line"><span class="ln">59</span><span class="cl">        <span class="c1">// 5. THE CORE CHECK: Is the stream&#39;s progress (streamTime) already past
</span></span></span><span class="line"><span class="ln">60</span><span class="cl"><span class="c1"></span>        <span class="c1">//    the point where this record&#39;s window closed (windowCloseTime)?
</span></span></span><span class="line"><span class="ln">61</span><span class="cl"><span class="c1"></span>        <span class="c1">//    If yes, the record is considered &#34;late&#34; because the stream has moved on
</span></span></span><span class="line"><span class="ln">62</span><span class="cl"><span class="c1"></span>        <span class="c1">//    past the time it could have been included in its window (+ grace period).
</span></span></span><span class="line"><span class="ln">63</span><span class="cl"><span class="c1"></span>        <span class="c1">//    This mimics the logic the downstream aggregate operator uses to drop late records.
</span></span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">isLate</span> <span class="p">=</span> <span class="n">streamTime</span> <span class="p">&gt;</span> <span class="n">windowCloseTime</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">
</span></span><span class="line"><span class="ln">66</span><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">isLate</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">debug</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">                <span class="s2">&#34;Tagging record as LATE: RecordTime=</span><span class="si">$recordTimestamp</span><span class="s2"> belongs to window ending at </span><span class="si">$windowEnd</span><span class="s2"> (closes at </span><span class="si">$windowCloseTime</span><span class="s2">), but StreamTime is already </span><span class="si">$streamTime</span><span class="s2">. Key=</span><span class="si">$key</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">trace</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">                <span class="s2">&#34;Tagging record as NOT LATE: RecordTime=</span><span class="si">$recordTimestamp</span><span class="s2">, WindowCloseTime=</span><span class="si">$windowCloseTime</span><span class="s2">, StreamTime=</span><span class="si">$streamTime</span><span class="s2">. Key=</span><span class="si">$key</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">73</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl">
</span></span><span class="line"><span class="ln">76</span><span class="cl">        <span class="c1">// 6. Explicitly forward the result (key, tagged value, timestamp) using the context
</span></span></span><span class="line"><span class="ln">77</span><span class="cl"><span class="c1"></span>        <span class="c1">// Ensure you preserve the original timestamp if needed downstream
</span></span></span><span class="line"><span class="ln">78</span><span class="cl"><span class="c1"></span>        <span class="n">context</span><span class="p">.</span><span class="n">forward</span><span class="p">(</span><span class="n">Record</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">Pair</span><span class="p">(</span><span class="k">value</span><span class="p">,</span> <span class="n">isLate</span><span class="p">),</span> <span class="n">recordTimestamp</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">79</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">
</span></span><span class="line"><span class="ln">81</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">close</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">        <span class="c1">// No resources to close
</span></span></span><span class="line"><span class="ln">83</span><span class="cl"><span class="c1"></span>    <span class="p">}</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
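<p>The arithmetic in steps 2 to 5 of <code>LateRecordProcessor</code> can be tried in isolation. The following is a minimal standalone sketch, not part of the project code: the helper names <code>windowEnd</code> and <code>isLate</code> and the sample timestamps are hypothetical, and a 5-second window with a 5-second grace period is assumed.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Illustrative helpers (hypothetical names) that mirror the arithmetic in LateRecordProcessor.

// Exclusive end of the tumbling window that contains the given timestamp.
fun windowEnd(recordTimestampMs: Long, windowSizeMs: Long): Long =
    ((recordTimestampMs / windowSizeMs) + 1) * windowSizeMs

// A record is tagged late when stream time has moved past windowEnd + grace period.
fun isLate(
    recordTimestampMs: Long,
    streamTimeMs: Long,
    windowSizeMs: Long,
    gracePeriodMs: Long,
): Boolean = streamTimeMs &gt; windowEnd(recordTimestampMs, windowSizeMs) + gracePeriodMs

fun main() {
    // Assumed values: 5-second window, 5-second grace period.
    val windowSizeMs = 5_000L
    val gracePeriodMs = 5_000L

    println(windowEnd(12_000L, windowSizeMs))                      // 15000
    println(isLate(12_000L, 18_000L, windowSizeMs, gracePeriodMs)) // false
    println(isLate(12_000L, 21_000L, windowSizeMs, gracePeriodMs)) // true
}
</code></pre></div>
<p>With these values, a record stamped at 12s falls in the window [10s, 15s). It is tagged as late only once stream time has moved past 20s, which is the window end plus the grace period.</p>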
<h3 id="core-stream-processing-logic" data-numberify>Core Stream Processing Logic<a class="anchor ms-1" href="#core-stream-processing-logic"></a></h3>
<p>The <code>StreamsApp</code> object, defined in <code>StreamsApp.kt</code>, builds the Kafka Streams topology and controls how data flows from the input topic to the output topics.</p>
<ul>
<li><strong>Configuration:</strong>
<ul>
<li>Environment variables (<code>BOOTSTRAP</code>, <code>TOPIC</code>, <code>REGISTRY_URL</code>) configure Kafka and Schema Registry connections.</li>
<li><code>windowSize</code> (5 seconds) and <code>gracePeriod</code> (5 seconds) are defined for windowed aggregations.</li>
<li>Output topic names are derived from the input topic name by appending <code>-stats</code> and <code>-skipped</code>.</li>
</ul>
</li>
<li><strong>Setup:</strong>
<ul>
<li>Calls <code>createTopicIfNotExists</code> to ensure the statistics output topic and the late/skipped records topic are present.</li>
<li>Configures <code>StreamsConfig</code> properties, including the application ID, bootstrap servers, the default key SerDe, and the <code>earliest</code> offset reset policy. Importantly, it sets <code>BidTimeTimestampExtractor</code> as the default timestamp extractor.</li>
<li>Sets up Avro SerDes (<code>GenericAvroSerde</code> for input, <code>SpecificAvroSerde&lt;SupplierStats&gt;</code> for output) with Schema Registry configuration.</li>
</ul>
</li>
<li><strong>Topology Definition (<code>StreamsBuilder</code>):</strong>
<ol>
<li><strong>Source:</strong> Consumes <code>GenericRecord</code> Avro messages from the <code>inputTopicName</code> (<em>orders-avro</em>).</li>
<li><strong>Late Record Tagging:</strong> The stream is processed by <code>LateRecordProcessor</code> to tag each record with a boolean indicating if it&rsquo;s late.</li>
<li><strong>Branching:</strong> The stream is split based on the &ldquo;late&rdquo; flag:
<ul>
<li><code>validSource</code>: Records not marked as late.</li>
<li><code>lateSource</code>: Records marked as late.</li>
</ul>
</li>
<li><strong>Handling Late Records:</strong>
<ul>
<li>Records in <code>lateSource</code> are transformed: an extra <code>&quot;late&quot;: true</code> field is added to their content, and they are serialized to JSON.</li>
<li>These JSON strings are then sent to the <code>skippedTopicName</code> (<em>orders-avro-skipped</em>).</li>
</ul>
</li>
<li><strong>Aggregating Valid Records:</strong>
<ul>
<li>Records in <code>validSource</code> are re-keyed by <code>supplier</code> (extracted from the record) and their <code>price</code> becomes the value.</li>
<li>A <code>groupByKey</code> operation is performed.</li>
<li><code>windowedBy(TimeWindows.ofSizeAndGrace(windowSize, gracePeriod))</code> defines 5-second tumbling windows with a 5-second grace period.</li>
<li>An <code>aggregate</code> operation computes <code>SupplierStats</code> (total price and count) for each supplier within each window.</li>
</ul>
</li>
<li><strong>Outputting Statistics:</strong>
<ul>
<li>The aggregated <code>SupplierStats</code> stream is further processed to populate <code>window_start</code> and <code>window_end</code> fields from the window metadata.</li>
<li>These final <code>SupplierStats</code> objects are sent to the <code>outputTopicName</code> (<em>orders-avro-stats</em>).</li>
</ul>
</li>
</ol>
</li>
<li><strong>Execution:</strong> A <code>KafkaStreams</code> instance is created with the topology and properties, then started. A shutdown hook ensures graceful closing.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.ObjectMapper</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.module.kotlin.registerKotlinModule</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">io.confluent.kafka.streams.serdes.avro.GenericAvroSerde</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.SupplierStats</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createTopicIfNotExists</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.streams.extractor.BidTimeTimestampExtractor</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.streams.processor.LateRecordProcessor</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerConfig</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.KeyValue</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.StreamsBuilder</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Branched</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Consumed</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Grouped</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.KStream</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.KTable</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Materialized</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Named</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Produced</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.TimeWindows</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.kstream.Windowed</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.streams.processor.api.ProcessorSupplier</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">
</span></span><span class="line"><span class="ln"> 33</span><span class="cl"><span class="k">object</span> <span class="nc">StreamsApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;kafka-1:19092&#34;</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">inputTopicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-avro&#34;</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryUrl</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;http://schema:8081&#34;</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryConfig</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">        <span class="n">mapOf</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">            <span class="s2">&#34;schema.registry.url&#34;</span> <span class="n">to</span> <span class="n">registryUrl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">            <span class="s2">&#34;basic.auth.credentials.source&#34;</span> <span class="n">to</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">            <span class="s2">&#34;basic.auth.user.info&#34;</span> <span class="n">to</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">windowSize</span> <span class="p">=</span> <span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">gracePeriod</span> <span class="p">=</span> <span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">NUM</span><span class="n">_PARTITIONS</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">REPLICATION</span><span class="n">_FACTOR</span><span class="p">:</span> <span class="n">Short</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">    <span class="c1">// ObjectMapper for converting late records to JSON
</span></span></span><span class="line"><span class="ln"> 50</span><span class="cl"><span class="c1"></span>    <span class="k">private</span> <span class="k">val</span> <span class="py">objectMapper</span><span class="p">:</span> <span class="n">ObjectMapper</span> <span class="k">by</span> <span class="n">lazy</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">        <span class="n">ObjectMapper</span><span class="p">().</span><span class="n">registerKotlinModule</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">        <span class="c1">// Create the output topics if they do not already exist
</span></span></span><span class="line"><span class="ln"> 56</span><span class="cl"><span class="c1"></span>        <span class="k">val</span> <span class="py">outputTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-stats&#34;</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">        <span class="k">val</span> <span class="py">skippedTopicName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="si">$inputTopicName</span><span class="s2">-skipped&#34;</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">        <span class="n">listOf</span><span class="p">(</span><span class="n">outputTopicName</span><span class="p">,</span> <span class="n">skippedTopicName</span><span class="p">).</span><span class="n">forEach</span> <span class="p">{</span> <span class="n">name</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">            <span class="n">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">                <span class="n">name</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">                <span class="n">bootstrapAddress</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">                <span class="n">NUM_PARTITIONS</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">                <span class="n">REPLICATION_FACTOR</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">        <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">StreamsConfig</span><span class="p">.</span><span class="n">APPLICATION_ID_CONFIG</span><span class="p">,</span> <span class="s2">&#34;</span><span class="si">$outputTopicName</span><span class="s2">-kafka-streams&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">StreamsConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">StreamsConfig</span><span class="p">.</span><span class="n">DEFAULT_KEY_SERDE_CLASS_CONFIG</span><span class="p">,</span> <span class="nc">Serdes</span><span class="p">.</span><span class="n">String</span><span class="p">()</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">StreamsConfig</span><span class="p">.</span><span class="n">consumerPrefix</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">AUTO_OFFSET_RESET_CONFIG</span><span class="p">),</span> <span class="s2">&#34;earliest&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">StreamsConfig</span><span class="p">.</span><span class="n">DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG</span><span class="p">,</span> <span class="n">BidTimeTimestampExtractor</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;schema.registry.url&#34;</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.credentials.source&#34;</span><span class="p">,</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.user.info&#34;</span><span class="p">,</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">        <span class="k">val</span> <span class="py">keySerde</span> <span class="p">=</span> <span class="nc">Serdes</span><span class="p">.</span><span class="n">String</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">        <span class="k">val</span> <span class="py">valueSerde</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">            <span class="n">GenericAvroSerde</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                <span class="n">configure</span><span class="p">(</span><span class="n">registryConfig</span><span class="p">,</span> <span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">        <span class="k">val</span> <span class="py">supplierStatsSerde</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">            <span class="n">SpecificAvroSerde</span><span class="p">&lt;</span><span class="n">SupplierStats</span><span class="p">&gt;().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">                <span class="n">configure</span><span class="p">(</span><span class="n">registryConfig</span><span class="p">,</span> <span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">        <span class="k">val</span> <span class="py">builder</span> <span class="p">=</span> <span class="n">StreamsBuilder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">        <span class="k">val</span> <span class="py">source</span><span class="p">:</span> <span class="n">KStream</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span> <span class="n">builder</span><span class="p">.</span><span class="n">stream</span><span class="p">(</span><span class="n">inputTopicName</span><span class="p">,</span> <span class="nc">Consumed</span><span class="p">.</span><span class="n">with</span><span class="p">(</span><span class="n">keySerde</span><span class="p">,</span> <span class="n">valueSerde</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">        <span class="k">val</span> <span class="py">taggedStream</span><span class="p">:</span> <span class="n">KStream</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">,</span> <span class="n">Boolean</span><span class="p">&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">            <span class="n">source</span><span class="p">.</span><span class="n">process</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">                <span class="n">ProcessorSupplier</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                    <span class="n">LateRecordProcessor</span><span class="p">(</span><span class="n">windowSize</span><span class="p">,</span> <span class="n">gracePeriod</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">                <span class="nc">Named</span><span class="p">.</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;process-late-records&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">
</span></span><span class="line"><span class="ln">100</span><span class="cl">        <span class="k">val</span> <span class="py">branches</span><span class="p">:</span> <span class="n">Map</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">KStream</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Pair</span><span class="p">&lt;</span><span class="n">GenericRecord</span><span class="p">,</span> <span class="n">Boolean</span><span class="p">&gt;&gt;&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">            <span class="n">taggedStream</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">                <span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="nc">Named</span><span class="p">.</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;branch-&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">                <span class="p">.</span><span class="n">branch</span><span class="p">({</span> <span class="n">_</span><span class="p">,</span> <span class="k">value</span> <span class="o">-&gt;</span> <span class="p">!</span><span class="k">value</span><span class="p">.</span><span class="n">second</span> <span class="p">},</span> <span class="nc">Branched</span><span class="p">.</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;valid&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">                <span class="p">.</span><span class="n">branch</span><span class="p">({</span> <span class="n">_</span><span class="p">,</span> <span class="k">value</span> <span class="o">-&gt;</span> <span class="k">value</span><span class="p">.</span><span class="n">second</span> <span class="p">},</span> <span class="nc">Branched</span><span class="p">.</span><span class="n">`as`</span><span class="p">(</span><span class="s2">&#34;late&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">                <span class="p">.</span><span class="n">noDefaultBranch</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">
</span></span><span class="line"><span class="ln">107</span><span class="cl">        <span class="k">val</span> <span class="py">validSource</span><span class="p">:</span> <span class="n">KStream</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">            <span class="n">branches</span><span class="p">[</span><span class="s2">&#34;branch-valid&#34;</span><span class="p">]</span><span class="o">!!</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">                <span class="p">.</span><span class="n">mapValues</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="n">pair</span> <span class="o">-&gt;</span> <span class="n">pair</span><span class="p">.</span><span class="n">first</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">
</span></span><span class="line"><span class="ln">111</span><span class="cl">        <span class="k">val</span> <span class="py">lateSource</span><span class="p">:</span> <span class="n">KStream</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">            <span class="n">branches</span><span class="p">[</span><span class="s2">&#34;branch-late&#34;</span><span class="p">]</span><span class="o">!!</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">                <span class="p">.</span><span class="n">mapValues</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="n">pair</span> <span class="o">-&gt;</span> <span class="n">pair</span><span class="p">.</span><span class="n">first</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">
</span></span><span class="line"><span class="ln">115</span><span class="cl">        <span class="n">lateSource</span>
</span></span><span class="line"><span class="ln">116</span><span class="cl">            <span class="p">.</span><span class="n">mapValues</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="n">genericRecord</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl">                <span class="k">val</span> <span class="py">map</span> <span class="p">=</span> <span class="n">mutableMapOf</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Any</span><span class="p">?&gt;()</span>
</span></span><span class="line"><span class="ln">118</span><span class="cl">                <span class="n">genericRecord</span><span class="p">.</span><span class="n">schema</span><span class="p">.</span><span class="n">fields</span><span class="p">.</span><span class="n">forEach</span> <span class="p">{</span> <span class="k">field</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">119</span><span class="cl">                    <span class="k">val</span> <span class="py">value</span> <span class="p">=</span> <span class="n">genericRecord</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">120</span><span class="cl">                    <span class="n">map</span><span class="p">[</span><span class="k">field</span><span class="p">.</span><span class="n">name</span><span class="p">()]</span> <span class="p">=</span> <span class="k">if</span> <span class="p">(</span><span class="k">value</span> <span class="k">is</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">avro</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">Utf8</span><span class="p">)</span> <span class="k">value</span><span class="p">.</span><span class="n">toString</span><span class="p">()</span> <span class="k">else</span> <span class="k">value</span>
</span></span><span class="line"><span class="ln">121</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">122</span><span class="cl">                <span class="n">map</span><span class="p">[</span><span class="s2">&#34;late&#34;</span><span class="p">]</span> <span class="p">=</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln">123</span><span class="cl">                <span class="n">map</span>
</span></span><span class="line"><span class="ln">124</span><span class="cl">            <span class="p">}.</span><span class="n">peek</span> <span class="p">{</span> <span class="n">key</span><span class="p">,</span> <span class="n">mapValue</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">125</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Potentially late record - key=</span><span class="si">$key</span><span class="s2">, value=</span><span class="si">$mapValue</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">126</span><span class="cl">            <span class="p">}.</span><span class="n">mapValues</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="n">mapValue</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">127</span><span class="cl">                <span class="n">objectMapper</span><span class="p">.</span><span class="n">writeValueAsString</span><span class="p">(</span><span class="n">mapValue</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">128</span><span class="cl">            <span class="p">}.</span><span class="n">to</span><span class="p">(</span><span class="n">skippedTopicName</span><span class="p">,</span> <span class="nc">Produced</span><span class="p">.</span><span class="n">with</span><span class="p">(</span><span class="n">keySerde</span><span class="p">,</span> <span class="nc">Serdes</span><span class="p">.</span><span class="n">String</span><span class="p">()))</span>
</span></span><span class="line"><span class="ln">129</span><span class="cl">
</span></span><span class="line"><span class="ln">130</span><span class="cl">        <span class="k">val</span> <span class="py">aggregated</span><span class="p">:</span> <span class="n">KTable</span><span class="p">&lt;</span><span class="n">Windowed</span><span class="p">&lt;</span><span class="n">String</span><span class="p">&gt;,</span> <span class="n">SupplierStats</span><span class="p">&gt;</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">131</span><span class="cl">            <span class="n">validSource</span>
</span></span><span class="line"><span class="ln">132</span><span class="cl">                <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="k">value</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">133</span><span class="cl">                    <span class="k">val</span> <span class="py">supplier</span> <span class="p">=</span> <span class="k">value</span><span class="p">[</span><span class="s2">&#34;supplier&#34;</span><span class="p">]</span><span class="o">?.</span><span class="n">toString</span><span class="p">()</span> <span class="o">?:</span> <span class="s2">&#34;UNKNOWN&#34;</span>
</span></span><span class="line"><span class="ln">134</span><span class="cl">                    <span class="k">val</span> <span class="py">price</span> <span class="p">=</span> <span class="k">value</span><span class="p">[</span><span class="s2">&#34;price&#34;</span><span class="p">]</span> <span class="k">as</span><span class="p">?</span> <span class="n">Double</span> <span class="o">?:</span> <span class="m">0.0</span>
</span></span><span class="line"><span class="ln">135</span><span class="cl">                    <span class="n">KeyValue</span><span class="p">(</span><span class="n">supplier</span><span class="p">,</span> <span class="n">price</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">136</span><span class="cl">                <span class="p">}.</span><span class="n">groupByKey</span><span class="p">(</span><span class="nc">Grouped</span><span class="p">.</span><span class="n">with</span><span class="p">(</span><span class="n">keySerde</span><span class="p">,</span> <span class="nc">Serdes</span><span class="p">.</span><span class="n">Double</span><span class="p">()))</span>
</span></span><span class="line"><span class="ln">137</span><span class="cl">                <span class="p">.</span><span class="n">windowedBy</span><span class="p">(</span><span class="nc">TimeWindows</span><span class="p">.</span><span class="n">ofSizeAndGrace</span><span class="p">(</span><span class="n">windowSize</span><span class="p">,</span> <span class="n">gracePeriod</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">138</span><span class="cl">                <span class="p">.</span><span class="n">aggregate</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">139</span><span class="cl">                    <span class="p">{</span>
</span></span><span class="line"><span class="ln">140</span><span class="cl">                        <span class="n">SupplierStats</span>
</span></span><span class="line"><span class="ln">141</span><span class="cl">                            <span class="p">.</span><span class="n">newBuilder</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">142</span><span class="cl">                            <span class="p">.</span><span class="n">setWindowStart</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">143</span><span class="cl">                            <span class="p">.</span><span class="n">setWindowEnd</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">144</span><span class="cl">                            <span class="p">.</span><span class="n">setSupplier</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">145</span><span class="cl">                            <span class="p">.</span><span class="n">setTotalPrice</span><span class="p">(</span><span class="m">0.0</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">146</span><span class="cl">                            <span class="p">.</span><span class="n">setCount</span><span class="p">(</span><span class="m">0L</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">147</span><span class="cl">                            <span class="p">.</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">148</span><span class="cl">                    <span class="p">},</span>
</span></span><span class="line"><span class="ln">149</span><span class="cl">                    <span class="p">{</span> <span class="n">key</span><span class="p">,</span> <span class="k">value</span><span class="p">,</span> <span class="n">aggregate</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">150</span><span class="cl">                        <span class="k">val</span> <span class="py">updated</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">151</span><span class="cl">                            <span class="n">SupplierStats</span>
</span></span><span class="line"><span class="ln">152</span><span class="cl">                                <span class="p">.</span><span class="n">newBuilder</span><span class="p">(</span><span class="n">aggregate</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">153</span><span class="cl">                                <span class="p">.</span><span class="n">setSupplier</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">154</span><span class="cl">                                <span class="p">.</span><span class="n">setTotalPrice</span><span class="p">(</span><span class="n">aggregate</span><span class="p">.</span><span class="n">totalPrice</span> <span class="p">+</span> <span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">155</span><span class="cl">                                <span class="p">.</span><span class="n">setCount</span><span class="p">(</span><span class="n">aggregate</span><span class="p">.</span><span class="n">count</span> <span class="p">+</span> <span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">156</span><span class="cl">                        <span class="n">updated</span><span class="p">.</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">157</span><span class="cl">                    <span class="p">},</span>
</span></span><span class="line"><span class="ln">158</span><span class="cl">                    <span class="nc">Materialized</span><span class="p">.</span><span class="n">with</span><span class="p">(</span><span class="n">keySerde</span><span class="p">,</span> <span class="n">supplierStatsSerde</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">159</span><span class="cl">                <span class="p">)</span>
</span></span><span class="line"><span class="ln">160</span><span class="cl">        <span class="n">aggregated</span>
</span></span><span class="line"><span class="ln">161</span><span class="cl">            <span class="p">.</span><span class="n">toStream</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">162</span><span class="cl">            <span class="p">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">key</span><span class="p">,</span> <span class="k">value</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">163</span><span class="cl">                <span class="k">val</span> <span class="py">windowStart</span> <span class="p">=</span> <span class="n">key</span><span class="p">.</span><span class="n">window</span><span class="p">().</span><span class="n">startTime</span><span class="p">().</span><span class="n">toString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">164</span><span class="cl">                <span class="k">val</span> <span class="py">windowEnd</span> <span class="p">=</span> <span class="n">key</span><span class="p">.</span><span class="n">window</span><span class="p">().</span><span class="n">endTime</span><span class="p">().</span><span class="n">toString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">165</span><span class="cl">                <span class="k">val</span> <span class="py">updatedValue</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">166</span><span class="cl">                    <span class="n">SupplierStats</span>
</span></span><span class="line"><span class="ln">167</span><span class="cl">                        <span class="p">.</span><span class="n">newBuilder</span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">168</span><span class="cl">                        <span class="p">.</span><span class="n">setWindowStart</span><span class="p">(</span><span class="n">windowStart</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">169</span><span class="cl">                        <span class="p">.</span><span class="n">setWindowEnd</span><span class="p">(</span><span class="n">windowEnd</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">170</span><span class="cl">                        <span class="p">.</span><span class="n">build</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">171</span><span class="cl">                <span class="n">KeyValue</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">key</span><span class="p">(),</span> <span class="n">updatedValue</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">172</span><span class="cl">            <span class="p">}.</span><span class="n">peek</span> <span class="p">{</span> <span class="n">_</span><span class="p">,</span> <span class="k">value</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">173</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Supplier Stats: </span><span class="si">$value</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">174</span><span class="cl">            <span class="p">}.</span><span class="n">to</span><span class="p">(</span><span class="n">outputTopicName</span><span class="p">,</span> <span class="nc">Produced</span><span class="p">.</span><span class="n">with</span><span class="p">(</span><span class="n">keySerde</span><span class="p">,</span> <span class="n">supplierStatsSerde</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">175</span><span class="cl">
</span></span><span class="line"><span class="ln">176</span><span class="cl">        <span class="k">val</span> <span class="py">streams</span> <span class="p">=</span> <span class="n">KafkaStreams</span><span class="p">(</span><span class="n">builder</span><span class="p">.</span><span class="n">build</span><span class="p">(),</span> <span class="n">props</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">177</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">178</span><span class="cl">            <span class="n">streams</span><span class="p">.</span><span class="n">start</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">179</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Kafka Streams started successfully.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">180</span><span class="cl">
</span></span><span class="line"><span class="ln">181</span><span class="cl">            <span class="nc">Runtime</span><span class="p">.</span><span class="n">getRuntime</span><span class="p">().</span><span class="n">addShutdownHook</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">182</span><span class="cl">                <span class="n">Thread</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">183</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Shutting down Kafka Streams...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">184</span><span class="cl">                    <span class="n">streams</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">185</span><span class="cl">                <span class="p">},</span>
</span></span><span class="line"><span class="ln">186</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln">187</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">188</span><span class="cl">            <span class="n">streams</span><span class="p">.</span><span class="n">close</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofSeconds</span><span class="p">(</span><span class="m">5</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">189</span><span class="cl">            <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Error while running Kafka Streams&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">190</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">191</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">192</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
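<p>The roles of <code>windowSize</code> and <code>gracePeriod</code> in the topology above can be illustrated with a little arithmetic. The following is a minimal sketch in plain Kotlin, not the application&rsquo;s <code>LateRecordProcessor</code>: the <code>WindowCheck</code> class, the <code>checkRecord</code> function, and the 5-second values are assumptions made for illustration. It assumes epoch-aligned tumbling windows and treats a record as late once stream time has reached the end of its window plus the grace period.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Duration

// Hypothetical result type for illustration; not part of the application code.
data class WindowCheck(val windowStart: Long, val windowEnd: Long, val isLate: Boolean)

// Hypothetical helper: assigns a record to a tumbling window and checks lateness.
fun checkRecord(
    recordTimeMs: Long,
    streamTimeMs: Long,
    windowSize: Duration,
    gracePeriod: Duration,
): WindowCheck {
    val sizeMs = windowSize.toMillis()
    // Tumbling windows are aligned to the epoch: [windowStart, windowStart + size)
    val windowStart = recordTimeMs - (recordTimeMs % sizeMs)
    val windowEnd = windowStart + sizeMs
    // Assumption: the window is closed once stream time reaches windowEnd + grace
    val isLate = streamTimeMs &gt;= windowEnd + gracePeriod.toMillis()
    return WindowCheck(windowStart, windowEnd, isLate)
}

fun main() {
    val size = Duration.ofSeconds(5) // illustrative value
    val grace = Duration.ofSeconds(5) // illustrative value
    // A record at t=12s belongs to [10s, 15s), which closes at stream time 20s.
    println(checkRecord(12_000L, 19_999L, size, grace))
    // WindowCheck(windowStart=10000, windowEnd=15000, isLate=false)
    println(checkRecord(12_000L, 20_000L, size, grace))
    // WindowCheck(windowStart=10000, windowEnd=15000, isLate=true)
}
</code></pre></div>
<p>In the topology, records flagged this way go to the <code>late</code> branch and are written to the skipped topic as JSON, while the remaining records continue to the windowed aggregation.</p>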
<h3 id="application-entry-point" data-numberify>Application Entry Point<a class="anchor ms-1" href="#application-entry-point"></a></h3>
<p>The <code>Main.kt</code> file provides the <code>main</code> function, which is the starting point for the Kafka Streams application.</p>
<ul>
<li><strong>Execution:</strong> It calls <code>StreamsApp.run()</code> to build and start the stream processing topology.</li>
<li><strong>Error Handling:</strong> A top-level <code>try-catch</code> block wraps the call. Any exception that propagates out of <code>StreamsApp.run()</code> is caught here and logged as a fatal error, and the application exits with a non-zero status code (<code>exitProcess(1)</code>). Because <code>KafkaStreams.start()</code> returns without blocking, this mainly covers failures during setup and startup.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.system.exitProcess</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">fun</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="nc">StreamsApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Fatal error in the streams app. Shutting down.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">exitProcess</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h2 id="run-kafka-streams-application" data-numberify>Run Kafka Streams Application<a class="anchor ms-1" href="#run-kafka-streams-application"></a></h2>
<p>To see our Kafka Streams application in action, we first need a running Kafka environment. We&rsquo;ll use the <a href="https://github.com/factorhouse/factorhouse-local" target="_blank" rel="noopener noreferrer">Factor House Local<i class="fas fa-external-link-square-alt ms-1"></i></a> project, which provides a Docker Compose setup for a Kafka cluster and Kpow for monitoring. Next, we&rsquo;ll start the data producer from the previous post in this series to generate input order events. Finally, we&rsquo;ll launch the Kafka Streams application.</p>

<h3 id="factor-house-local-setup" data-numberify>Factor House Local Setup<a class="anchor ms-1" href="#factor-house-local-setup"></a></h3>
<p>If you haven&rsquo;t already, set up your local Kafka environment:</p>
<ol>
<li>Clone the Factor House Local repository:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> factorhouse-local
</span></span></code></pre></div></li>
<li>Ensure your Kpow community license is configured (see the <a href="https://github.com/factorhouse/factorhouse-local?tab=readme-ov-file#update-kpow-and-flex-licenses" target="_blank" rel="noopener noreferrer">README<i class="fas fa-external-link-square-alt ms-1"></i></a> for details).</li>
<li>Start the services:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">docker compose -f compose-kpow-community.yml up -d
</span></span></code></pre></div></li>
</ol>
<p>Once initialized, Kpow will be accessible at <code>http://localhost:3000</code>, showing Kafka brokers, schema registry, and other components.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/kpow-overview.png" loading="lazy" width="1414" height="818" />
</picture>

</p>

<h3 id="start-the-kafka-order-producer" data-numberify>Start the Kafka Order Producer<a class="anchor ms-1" href="#start-the-kafka-order-producer"></a></h3>
<p>Our Kafka Streams application consumes order data from the <code>orders-avro</code> topic. We&rsquo;ll use the Kafka producer developed in <a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/">Part 2 of this series</a> to generate this data. To effectively test our stream application&rsquo;s handling of event time and late records, we&rsquo;ll configure the producer to introduce a variable delay (up to 15 seconds) in the <code>bid_time</code> of the generated orders.</p>
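<p>As a rough illustration of what such a delay means for event time, the sketch below moves a timestamp into the past by a random number of seconds. It is an assumption about the general approach, not the producer&rsquo;s actual code: the <code>delayedBidTime</code> function is hypothetical, and reading <code>DELAY_SECONDS</code> from the environment mirrors the variable passed on the command line when the producer is started.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.time.Duration
import java.time.Instant
import kotlin.random.Random

// Hypothetical helper: the real producer&#39;s implementation may differ.
fun delayedBidTime(
    now: Instant,
    maxDelaySeconds: Long,
    random: Random = Random.Default,
): Instant {
    // Guard against negative input, then pick a delay in [0, maxDelaySeconds]
    val upperBound = maxDelaySeconds.coerceAtLeast(0L)
    val delaySeconds = random.nextLong(0L, upperBound + 1)
    // Moving the event time into the past makes some records arrive late
    return now.minus(Duration.ofSeconds(delaySeconds))
}

fun main() {
    // Assumption: DELAY_SECONDS is read from the environment, defaulting to 0
    val maxDelay = System.getenv(&#34;DELAY_SECONDS&#34;)?.toLongOrNull() ?: 0L
    val now = Instant.now()
    println(&#34;now=$now bid_time=${delayedBidTime(now, maxDelay)}&#34;)
}
</code></pre></div>
<p>With a maximum delay larger than the window size plus the grace period, some generated orders should fall into windows that have already closed, which is what exercises the late-record branch.</p>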
<p>Navigate to the directory of the producer application (<em>orders-avro-clients</em> from the <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>) and run:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Assuming you are in the root of the &#39;orders-avro-clients&#39; project</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nv">DELAY_SECONDS</span><span class="o">=</span><span class="m">15</span> ./gradlew run --args<span class="o">=</span><span class="s2">&#34;producer&#34;</span>
</span></span></code></pre></div><p>This will start populating the <code>orders-avro</code> topic with Avro-encoded order messages. You can inspect these messages in Kpow. For the <code>orders-avro</code> topic, ensure Kpow is configured with Key Deserializer: <em>String</em>, Value Deserializer: <em>AVRO</em>, and Schema Registry: <em>Local Schema Registry</em>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/orders-01.png" loading="lazy" width="1197" height="673" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/orders-02.png" loading="lazy" width="1197" height="671" />
</picture>

</p>
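<p>To make the effect of <code>DELAY_SECONDS</code> concrete, here is a minimal sketch of how a producer could push <code>bid_time</code> back by a random amount of up to the configured number of seconds. The function name <code>delayedBidTime</code>, the timestamp pattern, and the default of 0 when the variable is unset are assumptions for illustration; the producer from Part 2 may implement this differently.</p>

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import kotlin.random.Random

// Hypothetical pattern for bid_time; the real producer may format it differently.
private val bidTimeFormatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Returns a bid_time string that lags `now` by 0..maxDelaySeconds seconds (inclusive).
fun delayedBidTime(
    now: LocalDateTime,
    maxDelaySeconds: Long,
    random: Random = Random.Default,
): String {
    require(maxDelaySeconds >= 0) { "maxDelaySeconds must not be negative" }
    val delay = random.nextLong(0, maxDelaySeconds + 1)
    return now.minusSeconds(delay).format(bidTimeFormatter)
}

fun main() {
    // Assumption: the delay is read from DELAY_SECONDS and defaults to 0 when unset.
    val maxDelay = System.getenv("DELAY_SECONDS")?.toLongOrNull() ?: 0L
    println(delayedBidTime(LocalDateTime.now(), maxDelay))
}
```

<p>Because the event time can trail the moment a record is sent by up to 15 seconds, some orders arrive after their window has already moved on, which is what exercises the late-record handling in the streams application.</p>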

<h3 id="launch-the-kafka-streams-application" data-numberify>Launch the Kafka Streams Application<a class="anchor ms-1" href="#launch-the-kafka-streams-application"></a></h3>
<p>With input data flowing, we can now launch our <code>orders-stats-streams</code> Kafka Streams application. Navigate to its project directory (<em>orders-stats-streams</em> from the <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>).</p>
<p>The application can be run in two main ways:</p>
<ol>
<li><strong>With Gradle (Development Mode)</strong>: Ideal for development and quick testing.
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew run
</span></span></code></pre></div></li>
<li><strong>With the Shadow JAR (Deployment Mode)</strong>: For deploying the application as a standalone unit. First, build the fat JAR:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./gradlew shadowJar
</span></span></code></pre></div>This creates <code>build/libs/orders-stats-streams-1.0.jar</code>. Then run it:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">java -jar build/libs/orders-stats-streams-1.0.jar
</span></span></code></pre></div></li>
</ol>
<blockquote>
<p>💡 To build and run the application locally, ensure that <strong>JDK 17</strong> is installed.</p>
</blockquote>
<p>For this demonstration, we&rsquo;ll use Gradle to run the application in development mode. Upon starting, you&rsquo;ll see logs indicating the Kafka Streams application has initialized and is processing records from the <code>orders-avro</code> topic.</p>

<h3 id="observing-the-output" data-numberify>Observing the Output<a class="anchor ms-1" href="#observing-the-output"></a></h3>
<p>Our Kafka Streams application produces results to two topics:</p>
<ul>
<li><code>orders-avro-stats</code>: Contains the aggregated supplier statistics as Avro records.</li>
<li><code>orders-avro-skipped</code>: Contains records identified as &ldquo;late,&rdquo; serialized as JSON.</li>
</ul>
<p><strong>1. Supplier Statistics (<code>orders-avro-stats</code>):</strong></p>
<p>In Kpow, navigate to the <code>orders-avro-stats</code> topic. Configure Kpow to view these messages:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>AVRO</em></li>
<li><strong>Schema Registry:</strong> <em>Local Schema Registry</em></li>
</ul>
<p>You should see <code>SupplierStats</code> messages, each representing the total price and count of orders for a supplier within a 5-second window. Notice the <code>window_start</code> and <code>window_end</code> fields.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/stats-01.png" loading="lazy" width="1136" height="674" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/stats-02.png" loading="lazy" width="1137" height="671" />
</picture>

</p>
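<p>The <code>window_start</code> and <code>window_end</code> values follow from how tumbling windows are aligned to the epoch. The sketch below computes the bounds of the 5-second window that contains a given event time. <code>WindowBounds</code> and <code>tumblingWindowFor</code> are illustrative names, not part of the application.</p>

```kotlin
import java.time.Duration
import java.time.Instant

// Illustrative holder for window bounds; not a class from the application.
data class WindowBounds(val start: Instant, val end: Instant)

// Epoch-aligned tumbling window containing the timestamp: start is inclusive, end is exclusive.
fun tumblingWindowFor(
    timestamp: Instant,
    size: Duration = Duration.ofSeconds(5),
): WindowBounds {
    val sizeMs = size.toMillis()
    require(sizeMs > 0) { "window size must be positive" }
    val ts = timestamp.toEpochMilli()
    val startMs = ts - Math.floorMod(ts, sizeMs)
    return WindowBounds(Instant.ofEpochMilli(startMs), Instant.ofEpochMilli(startMs + sizeMs))
}

fun main() {
    val bounds = tumblingWindowFor(Instant.parse("2025-06-03T10:15:07.250Z"))
    println("window_start=${bounds.start}, window_end=${bounds.end}")
}
```

<p>An order with a <code>bid_time</code> of 10:15:07.250 therefore falls into the window that starts at 10:15:05 and ends at 10:15:10.</p>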
<p><strong>2. Skipped (Late) Records (<code>orders-avro-skipped</code>):</strong></p>
<p>Next, inspect the <code>orders-avro-skipped</code> topic in Kpow. Configure Kpow as follows:</p>
<ul>
<li><strong>Key Deserializer:</strong> <em>String</em></li>
<li><strong>Value Deserializer:</strong> <em>JSON</em></li>
</ul>
<p>Here, you&rsquo;ll find the original order records that were deemed &ldquo;late&rdquo; by our <code>LateRecordProcessor</code>. These messages have an additional <code>late: true</code> field, confirming they were routed by our custom logic.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/skipped-01.png" loading="lazy" width="1134" height="612" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/skipped-02.png" loading="lazy" width="1137" height="704" />
</picture>

</p>
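<p>One common way to decide whether a record is late is to compare the end of its window plus a grace period against the stream time observed so far. The sketch below shows that check together with the tagging step. The 5-second grace period and the function names are assumptions for illustration; the actual <code>LateRecordProcessor</code> may apply a different rule.</p>

```kotlin
// A record is treated as late once stream time has reached its window end plus the grace period.
// The grace period default here is hypothetical.
fun isLate(
    recordTsMs: Long,
    streamTimeMs: Long,
    windowSizeMs: Long = 5_000,
    graceMs: Long = 5_000,
): Boolean {
    val windowEndMs = recordTsMs - Math.floorMod(recordTsMs, windowSizeMs) + windowSizeMs
    return windowEndMs + graceMs <= streamTimeMs
}

// Copies the order fields and adds the extra `late: true` field seen in the skipped topic.
fun tagAsLate(order: Map<String, Any?>): Map<String, Any?> = order + ("late" to true)

fun main() {
    val streamTimeMs = 20_000L
    val order = mapOf("order_id" to "o-1001", "supplier" to "acme")
    for (ts in listOf(7_250L, 16_000L)) {
        if (isLate(ts, streamTimeMs)) println("skipped: ${tagAsLate(order)}") else println("aggregated: $order")
    }
}
```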
<p>We can also track the performance of the application by filtering its consumer group (<code>orders-avro-stats-kafka-streams</code>) in the <strong>Consumers</strong> section. This displays key metrics like group state, assigned members, read throughput, and lag:</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-06-03-kotlin-getting-started-kafka-streams/consumer-group-01.png" loading="lazy" width="1186" height="853" />
</picture>

</p>

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>In this post, we built a Kotlin Kafka Streams application that aggregates supplier order data in real time. We used a custom <code>TimestampExtractor</code> for event-time processing and a custom <code>LateRecordProcessor</code>, built on the Processor API, to identify late-arriving records. Routing those records to a separate topic keeps the windowed statistics clean while preserving the skipped data for inspection. Avro with Schema Registry keeps message formats consistent between the producer and the stream application, and Kpow makes it easy to inspect the topics and the consumer group.</p>
      ]]></content:encoded></item><item><title>Kafka Clients with Avro - Schema Registry and Order Events</title><link>https://jaehyeon.me/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/</link><guid>https://jaehyeon.me/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/</guid><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In this post, we&rsquo;ll explore a practical example of building Kafka client applications using Kotlin, Apache Avro for data serialization, and Gradle for build management. We&rsquo;ll walk through the setup of a Kafka producer that generates mock order data and a consumer that processes these orders. This example highlights best practices such as schema management with Avro, robust error handling, and graceful shutdown, providing a solid foundation for your own Kafka-based projects. We&rsquo;ll dive into the build configuration, the Avro schema definition, utility functions for Kafka administration, and the core logic of both the producer and consumer applications.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In this post, we&rsquo;ll explore a practical example of building Kafka client applications using Kotlin, Apache Avro for data serialization, and Gradle for build management. We&rsquo;ll walk through the setup of a Kafka producer that generates mock order data and a consumer that processes these orders. This example highlights best practices such as schema management with Avro, robust error handling, and graceful shutdown, providing a solid foundation for your own Kafka-based projects. We&rsquo;ll dive into the build configuration, the Avro schema definition, utility functions for Kafka administration, and the core logic of both the producer and consumer applications.</p>
<ul>
<li><a href="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients">Kafka Clients with JSON - Producing and Consuming Order Events</a></li>
<li><a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/#">Kafka Clients with Avro - Schema Registry and Order Events</a> (this post)</li>
<li><a href="/blog/2025-06-03-kotlin-getting-started-kafka-streams">Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">Flink DataStream API - Scalable Event Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-17-kotlin-getting-started-flink-table">Flink Table API - Declarative Analytics for Supplier Stats in Real Time</a></li>
</ul>

<h2 id="kafka-client-applications" data-numberify>Kafka Client Applications<a class="anchor ms-1" href="#kafka-client-applications"></a></h2>
<p>This project demonstrates two primary Kafka client applications:</p>
<ul>
<li>A <strong>Producer Application</strong> responsible for generating <code>Order</code> messages and publishing them to a Kafka topic using Avro serialization.</li>
<li>A <strong>Consumer Application</strong> designed to subscribe to the same Kafka topic, deserialize the Avro messages, and process them, including retry logic and graceful handling of shutdowns.</li>
</ul>
<p>Both applications are packaged into a single executable JAR, and their execution mode (producer or consumer) is determined by a command-line argument. The source code for the applications discussed in this post can be found in the <em>orders-avro-clients</em> folder of this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
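<p>As a minimal sketch of how a single entry point can switch between the two modes, the example below maps the first command-line argument to a mode. The <code>Mode</code> enum, <code>parseMode</code>, and the placeholder messages are assumptions for illustration and are not taken from the project&rsquo;s <code>Main.kt</code>.</p>

```kotlin
import kotlin.system.exitProcess

// Illustrative mode type; the names here are not taken from the project.
enum class Mode { PRODUCER, CONSUMER }

// Maps the first command-line argument to a mode, or null when it is missing or unrecognized.
fun parseMode(args: Array<String>): Mode? =
    when (args.firstOrNull()?.lowercase()) {
        "producer" -> Mode.PRODUCER
        "consumer" -> Mode.CONSUMER
        else -> null
    }

fun main(args: Array<String>) {
    when (parseMode(args)) {
        Mode.PRODUCER -> println("Starting producer...") // placeholder for the producer entry point
        Mode.CONSUMER -> println("Starting consumer...") // placeholder for the consumer entry point
        null -> {
            System.err.println("Usage: <producer|consumer>")
            exitProcess(1)
        }
    }
}
```

<p>With this shape, <code>./gradlew run --args=&quot;producer&quot;</code> selects the producer, and passing <code>consumer</code> selects the consumer.</p>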

<h3 id="the-build-configuration" data-numberify>The Build Configuration<a class="anchor ms-1" href="#the-build-configuration"></a></h3>
<p>The <code>build.gradle.kts</code> file is the heart of our project&rsquo;s build process, defining plugins, dependencies, and custom tasks.</p>
<ul>
<li><strong>Plugins:</strong>
<ul>
<li><code>kotlin(&quot;jvm&quot;)</code>: Enables Kotlin language support for the JVM.</li>
<li><code>com.github.davidmc24.gradle.plugin.avro</code>: Manages Avro schema compilation into Java classes.</li>
<li><code>com.github.johnrengelman.shadow</code>: Creates a &ldquo;fat JAR&rdquo; or &ldquo;uber JAR&rdquo; containing all dependencies, making the application easily deployable.</li>
<li><code>application</code>: Configures the project as a runnable application, specifying the main class.</li>
</ul>
</li>
<li><strong>Repositories:</strong>
<ul>
<li><code>mavenCentral()</code>: The standard Maven repository.</li>
<li><code>maven(&quot;https://packages.confluent.io/maven/&quot;)</code>: The Confluent repository, necessary for Confluent-specific dependencies like the Avro serializer.</li>
</ul>
</li>
<li><strong>Dependencies:</strong>
<ul>
<li><strong>Kafka:</strong> <code>org.apache.kafka:kafka-clients</code> for core Kafka producer/consumer APIs.</li>
<li><strong>Avro:</strong>
<ul>
<li><code>org.apache.avro:avro</code> for the Avro serialization library.</li>
<li><code>io.confluent:kafka-avro-serializer</code> for Confluent&rsquo;s Kafka Avro serializer/deserializer, which integrates with Schema Registry.</li>
</ul>
</li>
<li><strong>Logging:</strong> <code>io.github.microutils:kotlin-logging-jvm</code> (a Kotlin-friendly SLF4J wrapper) and <code>ch.qos.logback:logback-classic</code> (a popular SLF4J implementation).</li>
<li><strong>Faker:</strong> <code>net.datafaker:datafaker</code> for generating realistic mock data for our orders.</li>
<li><strong>Testing:</strong> <code>kotlin(&quot;test&quot;)</code> for unit testing with Kotlin.</li>
</ul>
</li>
<li><strong>Kotlin Configuration:</strong>
<ul>
<li><code>jvmToolchain(17)</code>: Configures a JDK 17 toolchain for compiling and running the code.</li>
</ul>
</li>
<li><strong>Application Configuration:</strong>
<ul>
<li><code>mainClass.set(&quot;me.jaehyeon.MainKt&quot;)</code>: Sets the entry point of the application.</li>
</ul>
</li>
<li><strong>Shadow JAR Configuration:</strong>
<ul>
<li>The <code>tasks.withType&lt;ShadowJar&gt;</code> block customizes the fat JAR output by setting its base name, an empty classifier, and its version, so the archive is named <code>orders-avro-clients-1.0.jar</code>.</li>
<li><code>mergeServiceFiles()</code>: Important for merging service provider configuration files (e.g., for SLF4J) from multiple dependencies.</li>
<li>The <code>build</code> task is configured to depend on <code>shadowJar</code>, ensuring the fat JAR is created during a standard build.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">plugins</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;jvm&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;2.1.20&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.davidmc24.gradle.plugin.avro&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;1.9.1&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.johnrengelman.shadow&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;8.1.1&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">application</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">group</span> <span class="p">=</span> <span class="s2">&#34;me.jaehyeon&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">version</span> <span class="p">=</span> <span class="s2">&#34;1.0-SNAPSHOT&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">repositories</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">mavenCentral</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">maven</span><span class="p">(</span><span class="s2">&#34;https://packages.confluent.io/maven/&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// Kafka
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.kafka:kafka-clients:3.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="c1">// AVRO
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.avro:avro:1.11.4&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.confluent:kafka-avro-serializer:7.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="c1">// Logging
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.github.microutils:kotlin-logging-jvm:3.0.5&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;ch.qos.logback:logback-classic:1.5.13&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="c1">// Faker
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;net.datafaker:datafaker:2.1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="c1">// Test
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;test&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="n">kotlin</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">    <span class="n">jvmToolchain</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">
</span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="n">application</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="n">mainClass</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;me.jaehyeon.MainKt&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">
</span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="n">avro</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="n">setCreateSetters</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="n">setFieldVisibility</span><span class="p">(</span><span class="s2">&#34;PRIVATE&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">
</span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;compileKotlin&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;generateAvroJava&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="n">sourceSets</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">    <span class="n">named</span><span class="p">(</span><span class="s2">&#34;main&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">        <span class="n">java</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;build/generated/avro/main&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">        <span class="n">kotlin</span><span class="p">.</span><span class="n">srcDirs</span><span class="p">(</span><span class="s2">&#34;src/main/kotlin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">
</span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">withType</span><span class="p">&lt;</span><span class="n">com</span><span class="p">.</span><span class="n">github</span><span class="p">.</span><span class="n">jengelman</span><span class="p">.</span><span class="n">gradle</span><span class="p">.</span><span class="n">plugins</span><span class="p">.</span><span class="n">shadow</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">ShadowJar</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">    <span class="n">archiveBaseName</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;orders-avro-clients&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">    <span class="n">archiveClassifier</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">    <span class="n">archiveVersion</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">    <span class="n">mergeServiceFiles</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">
</span></span><span class="line"><span class="ln">62</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;build&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">
</span></span><span class="line"><span class="ln">66</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">test</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">    <span class="n">useJUnitPlatform</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="avro-schema-and-code-generation" data-numberify>Avro Schema and Code Generation<a class="anchor ms-1" href="#avro-schema-and-code-generation"></a></h3>
<p>Apache Avro is used for data serialization, providing schema evolution and type safety.</p>
<ul>
<li><strong>Schema Definition (<code>Order.avsc</code>):</strong>
Located in <code>src/main/avro/Order.avsc</code>, this JSON file defines the structure of our <code>Order</code> messages:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;namespace&#34;</span><span class="p">:</span> <span class="s2">&#34;me.jaehyeon.avro&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;record&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;Order&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nt">&#34;fields&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;order_id&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;bid_time&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;price&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;double&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;item&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">{</span> <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;supplier&#34;</span><span class="p">,</span> <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>From this schema, the Avro plugin generates the Java class <code>me.jaehyeon.avro.Order</code>.</li>
<li><strong>Gradle Avro Plugin Configuration:</strong>
The <code>avro</code> block in <code>build.gradle.kts</code> configures the Avro code generation:
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">avro</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="n">setCreateSetters</span><span class="p">(</span><span class="k">true</span><span class="p">)</span> <span class="c1">// Generates setter methods for fields
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"></span>    <span class="n">setFieldVisibility</span><span class="p">(</span><span class="s2">&#34;PRIVATE&#34;</span><span class="p">)</span> <span class="c1">// Makes fields private
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="p">}</span>
</span></span></code></pre></div></li>
<li><strong>Integrating Generated Code:</strong>
<ul>
<li><code>tasks.named(&quot;compileKotlin&quot;) { dependsOn(&quot;generateAvroJava&quot;) }</code>: Ensures Avro Java classes are generated before Kotlin code is compiled.</li>
<li><code>sourceSets { named(&quot;main&quot;) { java.srcDirs(&quot;build/generated/avro/main&quot;) ... } }</code>: Adds the directory containing generated Avro Java classes to the main source set, making them available for Kotlin compilation.</li>
</ul>
</li>
</ul>
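<p>To see how the schema fields map onto a record, the sketch below builds an order with Avro&rsquo;s generic API instead of the generated <code>Order</code> class, so it runs without the code generation step. It assumes <code>org.apache.avro:avro</code> is on the classpath, and the sample values and the <code>sampleOrder</code> function are made up for illustration.</p>

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.generic.GenericRecordBuilder

// Same structure as Order.avsc, inlined so the sketch is self-contained.
val orderSchema: Schema = Schema.Parser().parse(
    """
    {
      "namespace": "me.jaehyeon.avro",
      "type": "record",
      "name": "Order",
      "fields": [
        { "name": "order_id", "type": "string" },
        { "name": "bid_time", "type": "string" },
        { "name": "price", "type": "double" },
        { "name": "item", "type": "string" },
        { "name": "supplier", "type": "string" }
      ]
    }
    """.trimIndent(),
)

// Builds a record against the schema; build() fails if a field without a default is left unset.
fun sampleOrder(): GenericRecord =
    GenericRecordBuilder(orderSchema)
        .set("order_id", "o-1001")
        .set("bid_time", "2025-05-27 10:15:07")
        .set("price", 12.5)
        .set("item", "widget")
        .set("supplier", "acme")
        .build()

fun main() {
    val order = sampleOrder()
    println("${order.schema.fullName}: $order")
}
```

<p>The generated <code>Order</code> class exposes the same five fields through typed accessors, which is generally more convenient in application code than looking fields up by name.</p>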

<h3 id="kafka-admin-utilities" data-numberify>Kafka Admin Utilities<a class="anchor ms-1" href="#kafka-admin-utilities"></a></h3>
<p>The <code>me.jaehyeon.kafka</code> package provides helper functions for interacting with Kafka&rsquo;s administrative features using the <code>AdminClient</code>.</p>
<ul>
<li><strong><code>createTopicIfNotExists(...)</code>:</strong>
<ul>
<li>Takes topic name, bootstrap server address, number of partitions, and replication factor as input.</li>
<li>Configures an <code>AdminClient</code> with appropriate timeouts and retries.</li>
<li>Attempts to create a new topic.</li>
<li>Gracefully handles <code>TopicExistsException</code> if the topic already exists or is created concurrently, logging a warning.</li>
<li>Throws a runtime exception for other unrecoverable errors.</li>
</ul>
</li>
<li><strong><code>verifyKafkaConnection(...)</code>:</strong>
<ul>
<li>Takes the bootstrap server address as input.</li>
<li>Configures an <code>AdminClient</code>.</li>
<li>Attempts to list topics as a simple way to check if the Kafka cluster is reachable.</li>
<li>Throws a runtime exception if the connection fails.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClient</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClientConfig</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.NewTopic</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.TopicExistsException</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.use</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">fun</span> <span class="nf">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">topicName</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">numPartitions</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="n">replicationFactor</span><span class="p">:</span> <span class="n">Short</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="k">val</span> <span class="py">newTopic</span> <span class="p">=</span> <span class="n">NewTopic</span><span class="p">(</span><span class="n">topicName</span><span class="p">,</span> <span class="n">numPartitions</span><span class="p">,</span> <span class="n">replicationFactor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">        <span class="k">val</span> <span class="py">result</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">createTopics</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">newTopic</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Attempting to create topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">            <span class="n">result</span><span class="p">.</span><span class="n">all</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; created successfully!&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">cause</span> <span class="k">is</span> <span class="n">TopicExistsException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; was created concurrently or already existed. Continuing...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while creating a topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">
</span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="k">fun</span> <span class="nf">verifyKafkaConnection</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">
</span></span><span class="line"><span class="ln">55</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">            <span class="n">client</span><span class="p">.</span><span class="n">listTopics</span><span class="p">().</span><span class="n">names</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">            <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Failed to connect to Kafka at &#39;</span><span class="si">$bootstrapAddress</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
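<p>The <code>e.cause</code> check in <code>createTopicIfNotExists</code> is needed because a blocking <code>get()</code> on a future reports failures as an <code>ExecutionException</code> that wraps the real error. The following is a minimal sketch of that unwrapping pattern using only the JDK, so it runs without a broker. <code>AlreadyExistsException</code> and <code>awaitCreation</code> are hypothetical names introduced for illustration, and <code>CompletableFuture</code> stands in for the result returned by the admin client.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException

// Hypothetical stand-in for Kafka's TopicExistsException.
class AlreadyExistsException(message: String) : RuntimeException(message)

// Returns true if the operation completed, false if the resource already existed.
// Any other failure is rethrown, wrapped so the original error stays in the cause chain.
fun awaitCreation(future: CompletableFuture&lt;Unit&gt;): Boolean =
    try {
        future.get() // blocks; a failed future surfaces as ExecutionException
        true
    } catch (e: ExecutionException) {
        if (e.cause is AlreadyExistsException) {
            false
        } else {
            throw RuntimeException("Unrecoverable error while awaiting creation.", e)
        }
    }
</code></pre></div>
<p>With this sketch, a future completed normally yields <code>true</code>, one completed exceptionally with <code>AlreadyExistsException</code> yields <code>false</code>, and any other failure propagates as a <code>RuntimeException</code>.</p>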
<h3 id="the-kafka-producer" data-numberify>The Kafka Producer<a class="anchor ms-1" href="#the-kafka-producer"></a></h3>
<p>The <code>ProducerApp</code> object is responsible for generating and sending <code>Order</code> messages to Kafka.</p>
<ul>
<li><strong>Configuration:</strong>
<ul>
<li>Reads <code>BOOTSTRAP</code> (Kafka brokers), <code>TOPIC_NAME</code>, and <code>REGISTRY_URL</code> (Schema Registry) from environment variables, defaulting to <code>localhost:9092</code>, <code>orders-avro</code>, and <code>http://localhost:8081</code> respectively.</li>
<li>Defines constants for <code>NUM_PARTITIONS</code> and <code>REPLICATION_FACTOR</code> for topic creation.</li>
</ul>
</li>
<li><strong>Initialization:</strong>
<ul>
<li>Calls <code>createTopicIfNotExists</code> to ensure the target topic exists before producing.</li>
</ul>
</li>
<li><strong>Producer Properties:</strong>
<ul>
<li><code>BOOTSTRAP_SERVERS_CONFIG</code>: Kafka broker addresses.</li>
<li><code>KEY_SERIALIZER_CLASS_CONFIG</code>: <code>StringSerializer</code> for message keys.</li>
<li><code>VALUE_SERIALIZER_CLASS_CONFIG</code>: <code>io.confluent.kafka.serializers.KafkaAvroSerializer</code> for Avro-serializing message values. By default, this serializer registers new schemas with the Schema Registry automatically.</li>
<li><code>schema.registry.url</code>: URL of the Confluent Schema Registry.</li>
<li><code>basic.auth.credentials.source</code> &amp; <code>basic.auth.user.info</code>: Configuration for basic authentication with Schema Registry.</li>
<li>Retry and timeout configurations (<code>RETRIES_CONFIG</code>, <code>REQUEST_TIMEOUT_MS_CONFIG</code>, etc.) for resilience.</li>
</ul>
</li>
<li><strong>Message Generation and Sending:</strong>
<ul>
<li>Enters an infinite loop to continuously produce messages.</li>
<li>Populates each <code>Order</code> with a random UUID as the order ID, a bid time from <code>generateBidTime()</code>, and <code>net.datafaker.Faker</code> values for the price, item, and supplier.
<ul>
<li>Note that the <em>bid time</em> is delayed by the number of seconds set in the <code>DELAY_SECONDS</code> environment variable (5 by default), which is useful for testing late data handling.</li>
</ul>
</li>
<li>Creates a <code>ProducerRecord</code> with the topic name, order ID as key, and the <code>Order</code> object as value.</li>
<li>Sends the record asynchronously using <code>producer.send()</code>.
<ul>
<li>A callback logs success (topic, partition, offset) or logs errors.</li>
<li><code>.get()</code> blocks on the returned future, making each send synchronous. This simplifies error handling for the example but would limit throughput in a high-volume scenario.</li>
</ul>
</li>
<li>Catches <code>ExecutionException</code> and <code>KafkaException</code> during sending and rethrows each as a <code>RuntimeException</code>, which ends the loop.</li>
<li>Pauses for 1 second between sends using <code>Thread.sleep()</code>.</li>
</ul>
</li>
</ul>
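<p>The environment lookups described above can be isolated in a small, testable function. This is only a sketch: <code>ProducerSettings</code> and <code>loadSettings</code> are hypothetical names that do not appear in the project, which keeps these values as private properties on <code>ProducerApp</code>. It also makes one detail of the <code>DELAY_SECONDS</code> handling visible: a missing or non-numeric value silently falls back to the default.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical holder for the four settings read by the producer.
data class ProducerSettings(
    val bootstrapAddress: String,
    val topicName: String,
    val registryUrl: String,
    val delaySeconds: Int,
)

// Reads settings from the given map (the process environment by default),
// falling back to the same defaults the producer uses.
fun loadSettings(env: Map&lt;String, String&gt; = System.getenv()): ProducerSettings =
    ProducerSettings(
        bootstrapAddress = env["BOOTSTRAP"] ?: "localhost:9092",
        topicName = env["TOPIC_NAME"] ?: "orders-avro",
        registryUrl = env["REGISTRY_URL"] ?: "http://localhost:8081",
        // toIntOrNull() returns null for missing or non-numeric input, so "abc" falls back to 5.
        delaySeconds = env["DELAY_SECONDS"]?.toIntOrNull() ?: 5,
    )
</code></pre></div>
<p>Passing the environment in as a map means the defaults can be checked without touching real environment variables, for example <code>loadSettings(mapOf("DELAY_SECONDS" to "10"))</code>.</p>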
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.avro.Order</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createTopicIfNotExists</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">net.datafaker.Faker</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.KafkaProducer</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerConfig</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerRecord</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.KafkaException</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.UUID</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.TimeUnit</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="k">object</span> <span class="nc">ProducerApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;localhost:9092&#34;</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">inputTopicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC_NAME&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-avro&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryUrl</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;http://localhost:8081&#34;</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">delaySeconds</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DELAY_SECONDS&#34;</span><span class="p">)</span><span class="o">?.</span><span class="n">toIntOrNull</span><span class="p">()</span> <span class="o">?:</span> <span class="m">5</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">NUM</span><span class="n">_PARTITIONS</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">REPLICATION</span><span class="n">_FACTOR</span><span class="p">:</span> <span class="n">Short</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">faker</span> <span class="p">=</span> <span class="n">Faker</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="c1">// Create the input topic if not existing
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="c1"></span>        <span class="n">createTopicIfNotExists</span><span class="p">(</span><span class="n">inputTopicName</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">,</span> <span class="n">NUM_PARTITIONS</span><span class="p">,</span> <span class="n">REPLICATION_FACTOR</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">KEY_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.StringSerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">VALUE_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;io.confluent.kafka.serializers.KafkaAvroSerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;schema.registry.url&#34;</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.credentials.source&#34;</span><span class="p">,</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.user.info&#34;</span><span class="p">,</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">DELIVERY_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;6000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">MAX_BLOCK_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">
</span></span><span class="line"><span class="ln">46</span><span class="cl">        <span class="n">KafkaProducer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Order</span><span class="p">&gt;(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">producer</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">            <span class="k">while</span> <span class="p">(</span><span class="k">true</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">                <span class="k">val</span> <span class="py">order</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">                    <span class="n">Order</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">                        <span class="n">orderId</span> <span class="p">=</span> <span class="nc">UUID</span><span class="p">.</span><span class="n">randomUUID</span><span class="p">().</span><span class="n">toString</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">                        <span class="n">bidTime</span> <span class="p">=</span> <span class="n">generateBidTime</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">                        <span class="n">price</span> <span class="p">=</span> <span class="n">faker</span><span class="p">.</span><span class="n">number</span><span class="p">().</span><span class="n">randomDouble</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">150</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">                        <span class="n">item</span> <span class="p">=</span> <span class="n">faker</span><span class="p">.</span><span class="n">commerce</span><span class="p">().</span><span class="n">productName</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">                        <span class="n">supplier</span> <span class="p">=</span> <span class="n">faker</span><span class="p">.</span><span class="n">regexify</span><span class="p">(</span><span class="s2">&#34;(Alice|Bob|Carol|Alex|Joe|James|Jane|Jack)&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">                <span class="k">val</span> <span class="py">record</span> <span class="p">=</span> <span class="n">ProducerRecord</span><span class="p">(</span><span class="n">inputTopicName</span><span class="p">,</span> <span class="n">order</span><span class="p">.</span><span class="n">orderId</span><span class="p">,</span> <span class="n">order</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">                <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">                    <span class="n">producer</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">                        <span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="n">record</span><span class="p">)</span> <span class="p">{</span> <span class="n">metadata</span><span class="p">,</span> <span class="n">exception</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">                            <span class="k">if</span> <span class="p">(</span><span class="n">exception</span> <span class="o">!=</span> <span class="k">null</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">                                <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">exception</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error sending record&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">                            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">                                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">                                    <span class="s2">&#34;Sent to </span><span class="si">${metadata.topic()}</span><span class="s2"> into partition </span><span class="si">${metadata.partition()}</span><span class="s2">, offset </span><span class="si">${metadata.offset()}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">                                <span class="p">}</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">                            <span class="p">}</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">                        <span class="p">}.</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while sending record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">KafkaException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">71</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Kafka error while sending record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">73</span><span class="cl">
</span></span><span class="line"><span class="ln">74</span><span class="cl">                <span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="m">1000L</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">76</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">77</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">78</span><span class="cl">
</span></span><span class="line"><span class="ln">79</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">generateBidTime</span><span class="p">():</span> <span class="n">String</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">        <span class="k">val</span> <span class="py">randomDate</span> <span class="p">=</span> <span class="n">faker</span><span class="p">.</span><span class="n">date</span><span class="p">().</span><span class="n">past</span><span class="p">(</span><span class="n">delaySeconds</span><span class="p">,</span> <span class="nc">TimeUnit</span><span class="p">.</span><span class="n">SECONDS</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">81</span><span class="cl">        <span class="k">val</span> <span class="py">formatter</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">            <span class="n">DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">83</span><span class="cl">                <span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl">                <span class="p">.</span><span class="n">withZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">85</span><span class="cl">        <span class="k">return</span> <span class="n">formatter</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="n">randomDate</span><span class="p">.</span><span class="n">toInstant</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">86</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">87</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="the-kafka-consumer" data-numberify>The Kafka Consumer<a class="anchor ms-1" href="#the-kafka-consumer"></a></h3>
<p>The <code>ConsumerApp</code> object consumes <code>Order</code> messages from Kafka, deserializes them, and processes them.</p>
<ul>
<li><strong>Configuration:</strong>
<ul>
<li>Reads <code>BOOTSTRAP</code> (Kafka brokers), <code>TOPIC</code> (input topic), and <code>REGISTRY_URL</code> (Schema Registry) from environment variables, with defaults.</li>
</ul>
</li>
<li><strong>Initialization:</strong>
<ul>
<li>Calls <code>verifyKafkaConnection</code> to check connectivity to Kafka brokers.</li>
</ul>
</li>
<li><strong>Consumer Properties:</strong>
<ul>
<li><code>BOOTSTRAP_SERVERS_CONFIG</code>: Kafka broker addresses.</li>
<li><code>GROUP_ID_CONFIG</code>: Consumer group ID, ensuring messages are distributed among instances of this consumer.</li>
<li><code>KEY_DESERIALIZER_CLASS_CONFIG</code>: <code>StringDeserializer</code> for message keys.</li>
<li><code>VALUE_DESERIALIZER_CLASS_CONFIG</code>: <code>io.confluent.kafka.serializers.KafkaAvroDeserializer</code> for deserializing Avro messages.</li>
<li><code>ENABLE_AUTO_COMMIT_CONFIG</code>: Set to <code>false</code> for manual offset management.</li>
<li><code>AUTO_OFFSET_RESET_CONFIG</code>: <code>earliest</code> to start consuming from the beginning of the topic if no offset is found.</li>
<li><code>specific.avro.reader</code>: Set to <code>false</code>, meaning the consumer will deserialize Avro messages into <code>GenericRecord</code> objects rather than specific generated Avro classes. This offers flexibility if the exact schema isn&rsquo;t compiled into the consumer.</li>
<li><code>schema.registry.url</code>: URL of the Confluent Schema Registry.</li>
<li><code>basic.auth.credentials.source</code> &amp; <code>basic.auth.user.info</code>: Configuration for basic authentication with Schema Registry.</li>
<li>Timeout configurations (<code>DEFAULT_API_TIMEOUT_MS_CONFIG</code>, <code>REQUEST_TIMEOUT_MS_CONFIG</code>).</li>
</ul>
</li>
<li><strong>Graceful Shutdown:</strong>
<ul>
<li>A shutdown hook thread is registered via <code>Runtime.getRuntime().addShutdownHook()</code>.</li>
<li>When the JVM begins shutting down (e.g., after Ctrl+C), the hook sets <code>keepConsuming</code> to <code>false</code> and calls <code>consumer.wakeup()</code>. This causes a blocked (or the next) <code>consumer.poll()</code> call to throw a <code>WakeupException</code>, allowing the consumer loop to terminate cleanly.</li>
</ul>
</li>
<li><strong>Consuming Loop:</strong>
<ul>
<li>Subscribes to the specified topic.</li>
<li>Enters a <code>while (keepConsuming)</code> loop.</li>
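<li><code>pollSafely()</code>: A helper function that calls <code>consumer.poll()</code> with a one-second timeout. On a <code>WakeupException</code> during shutdown (<code>keepConsuming</code> is <code>false</code>) it returns an empty batch so the loop can exit; if <code>keepConsuming</code> is still <code>true</code>, the exception is rethrown. Any other polling error is logged and also yields an empty batch.</li>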
<li>Iterates through received records.</li>
<li><code>processRecordWithRetry()</code>:
<ul>
<li>Processes each <code>GenericRecord</code>.</li>
<li>Includes a retry mechanism (<code>MAX_RETRIES = 3</code>).</li>
<li>Simulates processing errors using <code>(0..99).random() &lt; ERROR_THRESHOLD</code> (currently <code>ERROR_THRESHOLD = -1</code>, so no errors are simulated by default).</li>
<li>If an error occurs, it logs a warning and retries with a linearly increasing backoff (<code>Thread.sleep(500L * attempt.toLong())</code>, i.e. 500 ms after the first failure and 1000 ms after the second).</li>
<li>If all retries fail, it logs an error and skips the record; because offsets are committed per batch, the skipped record is not redelivered.</li>
<li>Successfully processed records are logged.</li>
</ul>
</li>
<li><code>consumer.commitSync()</code>: Manually commits offsets synchronously after a batch of records has been processed.</li>
</ul>
</li>
<li><strong>Error Handling:</strong>
<ul>
<li>A <code>try-catch</code> block around the main consumer loop catches any <code>Exception</code> that escapes polling, processing, or committing.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.verifyKafkaConnection</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.avro.generic.GenericRecord</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerConfig</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.KafkaConsumer</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.WakeupException</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.use</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl">
</span></span><span class="line"><span class="ln"> 14</span><span class="cl"><span class="k">object</span> <span class="nc">ConsumerApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;localhost:9092&#34;</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">topicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-avro&#34;</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">registryUrl</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REGISTRY_URL&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;http://localhost:8081&#34;</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">MAX</span><span class="n">_RETRIES</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">ERROR</span><span class="n">_THRESHOLD</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="nd">@Volatile</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="k">private</span> <span class="k">var</span> <span class="py">keepConsuming</span> <span class="p">=</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">
</span></span><span class="line"><span class="ln"> 25</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">        <span class="c1">// Verify kafka connection
</span></span></span><span class="line"><span class="ln"> 27</span><span class="cl"><span class="c1"></span>        <span class="n">verifyKafkaConnection</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">GROUP_ID_CONFIG</span><span class="p">,</span> <span class="s2">&#34;</span><span class="si">$topicName</span><span class="s2">-group&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">KEY_DESERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.StringDeserializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">VALUE_DESERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;io.confluent.kafka.serializers.KafkaAvroDeserializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">ENABLE_AUTO_COMMIT_CONFIG</span><span class="p">,</span> <span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">AUTO_OFFSET_RESET_CONFIG</span><span class="p">,</span> <span class="s2">&#34;earliest&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;specific.avro.reader&#34;</span><span class="p">,</span> <span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;schema.registry.url&#34;</span><span class="p">,</span> <span class="n">registryUrl</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.credentials.source&#34;</span><span class="p">,</span> <span class="s2">&#34;USER_INFO&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="s2">&#34;basic.auth.user.info&#34;</span><span class="p">,</span> <span class="s2">&#34;admin:admin&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">        <span class="k">val</span> <span class="py">consumer</span> <span class="p">=</span> <span class="n">KafkaConsumer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;(</span><span class="n">props</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">        <span class="nc">Runtime</span><span class="p">.</span><span class="n">getRuntime</span><span class="p">().</span><span class="n">addShutdownHook</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">            <span class="n">Thread</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Shutdown detected. Waking up Kafka consumer...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">                <span class="n">keepConsuming</span> <span class="p">=</span> <span class="k">false</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">                <span class="n">consumer</span><span class="p">.</span><span class="n">wakeup</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">            <span class="n">consumer</span><span class="p">.</span><span class="n">use</span> <span class="p">{</span> <span class="n">c</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">                <span class="n">c</span><span class="p">.</span><span class="n">subscribe</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">topicName</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">                <span class="k">while</span> <span class="p">(</span><span class="n">keepConsuming</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">                    <span class="k">val</span> <span class="py">records</span> <span class="p">=</span> <span class="n">pollSafely</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">                    <span class="k">for</span> <span class="p">(</span><span class="n">record</span> <span class="k">in</span> <span class="n">records</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">                        <span class="n">processRecordWithRetry</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">                    <span class="n">c</span><span class="p">.</span><span class="n">commitSync</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">            <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while consuming record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">pollSafely</span><span class="p">(</span><span class="n">consumer</span><span class="p">:</span> <span class="n">KafkaConsumer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;)</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">        <span class="n">runCatching</span> <span class="p">{</span> <span class="n">consumer</span><span class="p">.</span><span class="n">poll</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofMillis</span><span class="p">(</span><span class="m">1000</span><span class="p">))</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">            <span class="p">.</span><span class="n">getOrElse</span> <span class="p">{</span> <span class="n">e</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">                <span class="k">when</span> <span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">                    <span class="k">is</span> <span class="n">WakeupException</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">                        <span class="k">if</span> <span class="p">(</span><span class="n">keepConsuming</span><span class="p">)</span> <span class="k">throw</span> <span class="n">e</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;ConsumerApp wakeup for shutdown.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">                        <span class="n">emptyList</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">                    <span class="k">else</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Unexpected error while polling records&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                        <span class="n">emptyList</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">processRecordWithRetry</span><span class="p">(</span><span class="n">record</span><span class="p">:</span> <span class="n">ConsumerRecord</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">GenericRecord</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">        <span class="k">var</span> <span class="py">attempt</span> <span class="p">=</span> <span class="m">0</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">        <span class="k">while</span> <span class="p">(</span><span class="n">attempt</span> <span class="p">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">            <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">                <span class="n">attempt</span><span class="o">++</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">                <span class="k">if</span> <span class="p">((</span><span class="m">0.</span><span class="p">.</span><span class="m">99</span><span class="p">).</span><span class="n">random</span><span class="p">()</span> <span class="p">&lt;</span> <span class="n">ERROR_THRESHOLD</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">                        <span class="s2">&#34;Simulated error for </span><span class="si">${record.value()}</span><span class="s2"> from partition </span><span class="si">${record.partition()}</span><span class="s2">, offset </span><span class="si">${record.offset()}</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                    <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Received </span><span class="si">${record.value()}</span><span class="s2"> from partition </span><span class="si">${record.partition()}</span><span class="s2">, offset </span><span class="si">${record.offset()}</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">                <span class="k">return</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">            <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error processing record (attempt </span><span class="si">$attempt</span><span class="s2"> of </span><span class="si">$MAX</span><span class="s2">_RETRIES)&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="n">attempt</span> <span class="o">==</span> <span class="n">MAX_RETRIES</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to process record after </span><span class="si">$MAX</span><span class="s2">_RETRIES attempts, skipping...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">                    <span class="k">return</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">                <span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="m">500L</span> <span class="p">*</span> <span class="n">attempt</span><span class="p">.</span><span class="n">toLong</span><span class="p">())</span> <span class="c1">// linear backoff: 500 ms, then 1000 ms
</span></span></span><span class="line"><span class="ln">106</span><span class="cl"><span class="c1"></span>            <span class="p">}</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
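<p>The retry loop above sleeps for <code>500L * attempt</code> milliseconds, so the delay grows linearly with the attempt number. If exponential growth is preferred, one possible sketch is shown below. The helper name <code>exponentialBackoffMs</code>, the 8-second cap, and the assumption that <code>attempt</code> counts from 1 (as the log message suggests) are illustrative and not part of the original project.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import kotlin.math.min

// Hypothetical helper, not part of the original project.
// Doubles the delay on every attempt (base, 2x base, 4x base, ...) up to a cap.
fun exponentialBackoffMs(
    attempt: Int,
    baseMs: Long = 500L,
    capMs: Long = 8_000L,
): Long {
    require(attempt &gt;= 1) { &#34;attempt must be 1 or greater&#34; }
    // Limit the shift so the delay stays well within Long range for reasonable base values.
    val shift = min(attempt - 1, 20)
    return min(baseMs shl shift, capMs)
}

fun main() {
    for (attempt in 1..5) {
        println(&#34;attempt $attempt -&gt; ${exponentialBackoffMs(attempt)} ms&#34;)
    }
}
</code></pre></div>
<p>With the defaults, the delays for attempts 1 to 5 are 500, 1000, 2000, 4000 and 8000 ms. Replacing the <code>Thread.sleep</code> argument with a call like this would be enough to switch strategies.</p>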
<h3 id="the-application-entry-point" data-numberify>The Application Entry Point<a class="anchor ms-1" href="#the-application-entry-point"></a></h3>
<p>The <code>Main.kt</code> file contains the <code>main</code> function, which serves as the entry point for the packaged application.</p>
<ul>
<li>It checks the first command-line argument (<code>args.getOrNull(0)</code>).</li>
<li>If the argument is <code>&quot;producer&quot;</code> (case-insensitive), it runs <code>ProducerApp.run()</code>.</li>
<li>If the argument is <code>&quot;consumer&quot;</code> (case-insensitive), it runs <code>ConsumerApp.run()</code>.</li>
<li>If no argument or an invalid argument is provided, it prints usage instructions.</li>
<li>A top-level <code>try-catch</code> block catches any <code>Exception</code> that propagates from the producer or consumer, logs it as a fatal error, and exits the application with a non-zero status code (<code>exitProcess(1)</code>).</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.system.exitProcess</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">fun</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span> <span class="n">Array</span><span class="p">&lt;</span><span class="n">String</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">when</span> <span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">getOrNull</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="o">?.</span><span class="n">lowercase</span><span class="p">())</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;producer&#34;</span> <span class="o">-&gt;</span> <span class="nc">ProducerApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;consumer&#34;</span> <span class="o">-&gt;</span> <span class="nc">ConsumerApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="k">else</span> <span class="o">-&gt;</span> <span class="n">println</span><span class="p">(</span><span class="s2">&#34;Usage: &lt;producer|consumer&gt;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Fatal error in </span><span class="si">${args.getOrNull(0) ?: &#34;app&#34;}</span><span class="s2">. Shutting down.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">exitProcess</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
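<p>As written, a missing or invalid argument prints the usage line and the process still ends with exit status 0. If calling scripts need to detect a bad invocation, a possible variation is to separate argument parsing from execution and exit with a non-zero code. In the sketch below, <code>Mode</code>, <code>parseMode</code> and the exit code <code>2</code> are hypothetical choices, and the <code>println</code> stands in for the real <code>ProducerApp.run()</code> and <code>ConsumerApp.run()</code> calls.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import kotlin.system.exitProcess

// Hypothetical names: Mode and parseMode are not part of the original project.
enum class Mode { PRODUCER, CONSUMER }

// Returns the requested mode, or null when the argument is missing or unknown.
fun parseMode(args: Array&lt;String&gt;): Mode? =
    when (args.getOrNull(0)?.lowercase()) {
        &#34;producer&#34; -&gt; Mode.PRODUCER
        &#34;consumer&#34; -&gt; Mode.CONSUMER
        else -&gt; null
    }

fun main(args: Array&lt;String&gt;) {
    val mode = parseMode(args)
    if (mode == null) {
        System.err.println(&#34;Usage: &lt;producer|consumer&gt;&#34;)
        exitProcess(2) // non-zero so scripts can detect the bad invocation
    }
    // The real application would call ProducerApp.run() or ConsumerApp.run() here.
    println(&#34;Would start $mode&#34;)
}
</code></pre></div>
<p>Keeping <code>parseMode</code> as a pure function also makes the dispatch logic easy to unit test without starting a Kafka client.</p>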
<h2 id="run-kafka-applications" data-numberify>Run Kafka Applications<a class="anchor ms-1" href="#run-kafka-applications"></a></h2>
<p>We begin by setting up our local Kafka environment using the <a href="https://github.com/factorhouse/factorhouse-local" target="_blank" rel="noopener noreferrer">Factor House Local<i class="fas fa-external-link-square-alt ms-1"></i></a> project. This project conveniently provisions a Kafka cluster along with Kpow, a powerful tool for Kafka management and control, all managed via Docker Compose. Once our Kafka environment is running, we will start our Kotlin-based producer and consumer applications.</p>

<h3 id="factor-house-local" data-numberify>Factor House Local<a class="anchor ms-1" href="#factor-house-local"></a></h3>
<p>To get our Kafka cluster and Kpow up and running, we&rsquo;ll first need to clone the project repository and navigate into its directory. Then, we can start the services using Docker Compose as shown below. <strong>Note that we need to have a community license for Kpow to get started.</strong> See <a href="https://github.com/factorhouse/factorhouse-local?tab=readme-ov-file#update-kpow-and-flex-licenses" target="_blank" rel="noopener noreferrer">this section<i class="fas fa-external-link-square-alt ms-1"></i></a> of the project <em>README</em> for details on how to request a license and configure it before proceeding with the <code>docker compose</code> command.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> factorhouse-local
</span></span><span class="line"><span class="ln">3</span><span class="cl">docker compose -f compose-kpow-community.yml up -d
</span></span></code></pre></div><p>Once the services are initialized, we can access the Kpow user interface by navigating to <code>http://localhost:3000</code> in the web browser, where we observe the provisioned environment, including three Kafka brokers, one schema registry, and one Kafka Connect instance.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/kpow-overview.png" loading="lazy" width="1414" height="818" />
</picture>

</p>

<h3 id="launch-applications" data-numberify>Launch Applications<a class="anchor ms-1" href="#launch-applications"></a></h3>
<p>Our Kotlin Kafka applications can be launched in a couple of ways, catering to different stages of development and deployment:</p>
<ol>
<li><strong>With Gradle (Development Mode)</strong>: This method is convenient during development, allowing for quick iterations without needing to build a full JAR file each time.</li>
<li><strong>Running the Shadow JAR (Deployment Mode)</strong>: After building a &ldquo;fat&rdquo; JAR (also known as a shadow JAR) that includes all dependencies, the application can be run with <code>java -jar</code> on any machine with a compatible JDK, without Gradle. This is typical for deploying to non-development environments.</li>
</ol>
<blockquote>
<p>💡 To build and run the application locally, ensure that <strong>JDK 17</strong> is installed.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 👉 With Gradle (Dev Mode)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;producer&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;consumer&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 👉 Build Shadow (Fat) JAR:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">./gradlew shadowJar
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># Resulting JAR:</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># build/libs/orders-avro-clients-1.0.jar</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># 👉 Run the Fat JAR:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">java -jar build/libs/orders-avro-clients-1.0.jar producer
</span></span><span class="line"><span class="ln">13</span><span class="cl">java -jar build/libs/orders-avro-clients-1.0.jar consumer
</span></span></code></pre></div><p>For this post, we demonstrate starting the applications in development mode using Gradle. Once started, we see logs from both the producer sending messages and the consumer receiving them.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/kafka-avro-apps.gif" loading="lazy" width="2346" height="1654" />
</picture>

</p>
<p>Within the Kpow interface, we can check that a new schema, <code>orders-avro-value</code>, is now registered with the <em>Local Schema Registry</em>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/schema-registry.png" loading="lazy" width="1415" height="375" />
</picture>

</p>
<p>With the applications actively producing and consuming Avro data, Kpow enables inspection of messages on the <code>orders-avro</code> topic. In the Kpow UI, navigate to this topic. To correctly view the Avro messages, configure the deserialization settings as follows: set the <strong>Key Deserializer</strong> to <em>String</em>, choose <em>AVRO</em> for the <strong>Value Deserializer</strong>, and ensure the <strong>Schema Registry</strong> selection is set to <em>Local Schema Registry</em>. After applying these configurations, click the Search button to display the messages.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/message-view-01.png" loading="lazy" width="1427" height="670" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients/message-view-02.png" loading="lazy" width="1426" height="628" />
</picture>

</p>

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>In this post, we built Kafka producer and consumer applications in Kotlin, using Avro for schema-enforced data serialization and Gradle for the build. We ran them against a local Kafka cluster provisioned by the <em>Factor House Local</em> project and used Kpow to inspect the registered schema and the messages, covering a complete workflow for developing type-safe, resilient data pipelines with Kafka and a Schema Registry.</p>
      ]]></content:encoded></item><item><title>Kafka Clients with JSON - Producing and Consuming Order Events</title><link>https://jaehyeon.me/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/</link><guid>https://jaehyeon.me/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/</guid><pubDate>Tue, 20 May 2025 00:00:00 +0000</pubDate><description>
&lt;p>This post explores a Kotlin-based Kafka project, meticulously detailing the construction and operation of both a Kafka producer application, responsible for generating and sending order data, and a Kafka consumer application, designed to receive and process these orders. We&amp;rsquo;ll delve into each component, from build configuration to message handling, to understand how they work together in an event-driven system.&lt;/p></description><content:encoded><![CDATA[
        <p>This post explores a Kotlin-based Kafka project, meticulously detailing the construction and operation of both a Kafka producer application, responsible for generating and sending order data, and a Kafka consumer application, designed to receive and process these orders. We&rsquo;ll delve into each component, from build configuration to message handling, to understand how they work together in an event-driven system.</p>
<ul>
<li><a href="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/#">Kafka Clients with JSON - Producing and Consuming Order Events</a> (this post)</li>
<li><a href="/blog/2025-05-27-kotlin-getting-started-kafka-avro-clients">Kafka Clients with Avro - Schema Registry and Order Events</a></li>
<li><a href="/blog/2025-06-03-kotlin-getting-started-kafka-streams">Kafka Streams - Lightweight Real-Time Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-10-kotlin-getting-started-flink-datastream">Flink DataStream API - Scalable Event Processing for Supplier Stats</a></li>
<li><a href="/blog/2025-06-17-kotlin-getting-started-flink-table">Flink Table API - Declarative Analytics for Supplier Stats in Real Time</a></li>
</ul>

<h2 id="kafka-client-applications" data-numberify>Kafka Client Applications<a class="anchor ms-1" href="#kafka-client-applications"></a></h2>
<p>We will build producer and consumer apps using the <a href="https://www.jetbrains.com/idea/download/?section=windows" target="_blank" rel="noopener noreferrer">IntelliJ IDEA Community<i class="fas fa-external-link-square-alt ms-1"></i></a> edition. The source code for the applications discussed in this post can be found in the <em>orders-json-clients</em> folder of this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/kotlin-examples" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>. This project demonstrates a practical approach to developing event-driven systems with Kafka and Kotlin. Below, we&rsquo;ll explore the key components that make up these applications.</p>

<h3 id="build-configuration" data-numberify>Build Configuration<a class="anchor ms-1" href="#build-configuration"></a></h3>
<p>The <code>build.gradle.kts</code> file is the cornerstone of our project, defining how the Kotlin Kafka application is built and packaged using Gradle with its Kotlin DSL. It orchestrates several key aspects:</p>
<ul>
<li><strong>Plugins:</strong>
<ul>
<li><code>kotlin(&quot;jvm&quot;)</code>: Provides essential support for compiling Kotlin code for the Java Virtual Machine.</li>
<li><code>com.github.johnrengelman.shadow</code>: Creates a &ldquo;fat JAR&rdquo; (or &ldquo;uber JAR&rdquo;), bundling the application and all its dependencies into a single, easily deployable executable file.</li>
<li><code>application</code>: Configures the project as a runnable application, specifying its main entry point.</li>
</ul>
</li>
<li><strong>Dependencies:</strong>
<ul>
<li><code>org.apache.kafka:kafka-clients</code>: The official Kafka client library for interacting with Kafka brokers.</li>
<li><code>com.fasterxml.jackson.module:jackson-module-kotlin</code>: Enables seamless JSON serialization and deserialization for Kotlin data classes using the Jackson library.</li>
<li><code>io.github.microutils:kotlin-logging-jvm</code> &amp; <code>ch.qos.logback:logback-classic</code>: A combination for flexible and robust logging capabilities.</li>
<li><code>net.datafaker:datafaker</code>: Used to generate realistic mock data for the <code>Order</code> objects.</li>
<li><code>kotlin(&quot;test&quot;)</code>: Supports writing unit tests for the application.</li>
</ul>
</li>
<li><strong>Key Configurations:</strong>
<ul>
<li>Specifies Java 17 as the target JVM via <code>jvmToolchain(17)</code>.</li>
<li>Sets <code>me.jaehyeon.MainKt</code> as the <code>mainClass</code> for execution.</li>
<li>The <code>shadowJar</code> task is configured to name the output artifact <code>orders-json-clients-1.0.jar</code> and to correctly merge service files from dependencies.</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="n">plugins</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;jvm&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;2.1.20&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">id</span><span class="p">(</span><span class="s2">&#34;com.github.johnrengelman.shadow&#34;</span><span class="p">)</span> <span class="n">version</span> <span class="s2">&#34;8.1.1&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">application</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">group</span> <span class="p">=</span> <span class="s2">&#34;me.jaehyeon&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">version</span> <span class="p">=</span> <span class="s2">&#34;1.0-SNAPSHOT&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">repositories</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">mavenCentral</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">dependencies</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="c1">// Kafka
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;org.apache.kafka:kafka-clients:3.9.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="c1">// JSON (using Jackson)
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;com.fasterxml.jackson.module:jackson-module-kotlin:2.17.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="c1">// Logging
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;io.github.microutils:kotlin-logging-jvm:3.0.5&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;ch.qos.logback:logback-classic:1.5.13&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="c1">// Faker
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"></span>    <span class="n">implementation</span><span class="p">(</span><span class="s2">&#34;net.datafaker:datafaker:2.1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="c1">// Test
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"></span>    <span class="n">testImplementation</span><span class="p">(</span><span class="n">kotlin</span><span class="p">(</span><span class="s2">&#34;test&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="n">kotlin</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">    <span class="n">jvmToolchain</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="n">application</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">mainClass</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;me.jaehyeon.MainKt&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">
</span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">&lt;</span><span class="n">com</span><span class="p">.</span><span class="n">github</span><span class="p">.</span><span class="n">jengelman</span><span class="p">.</span><span class="n">gradle</span><span class="p">.</span><span class="n">plugins</span><span class="p">.</span><span class="n">shadow</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">ShadowJar</span><span class="p">&gt;(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">    <span class="n">archiveBaseName</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;orders-json-clients&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">    <span class="n">archiveClassifier</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="n">archiveVersion</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="s2">&#34;1.0&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="n">mergeServiceFiles</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">named</span><span class="p">(</span><span class="s2">&#34;build&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">dependsOn</span><span class="p">(</span><span class="s2">&#34;shadowJar&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">
</span></span><span class="line"><span class="ln">47</span><span class="cl"><span class="n">tasks</span><span class="p">.</span><span class="n">test</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">    <span class="n">useJUnitPlatform</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="data-model" data-numberify>Data Model<a class="anchor ms-1" href="#data-model"></a></h3>
<p>At the heart of the messages exchanged is the <code>me.jaehyeon.model.Order</code> data class. This Kotlin data class concisely defines the structure of an &ldquo;order&rdquo; event. It includes fields like <code>orderId</code> (a unique string), <code>bidTime</code> (a string timestamp), <code>price</code> (a double), <code>item</code> (a string for the product name), and <code>supplier</code> (a string). All properties are declared with default values (e.g., <code>&quot;&quot;</code> for strings, <code>0.0</code> for doubles). This matters for JSON deserialization: by default, a plain Jackson <code>ObjectMapper</code> instantiates objects through a no-argument constructor, and on the JVM Kotlin generates one only when every primary-constructor parameter has a default value.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.model</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1">// Java classes usually have a default constructor automatically, but not Kotlin data classes.
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1">// Jackson expects a default way to instantiate objects unless you give it detailed instructions.
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"></span><span class="k">data</span> <span class="k">class</span> <span class="nc">Order</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">val</span> <span class="py">orderId</span><span class="p">:</span> <span class="n">String</span> <span class="p">=</span> <span class="s2">&#34;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">val</span> <span class="py">bidTime</span><span class="p">:</span> <span class="n">String</span> <span class="p">=</span> <span class="s2">&#34;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">val</span> <span class="py">price</span><span class="p">:</span> <span class="n">Double</span> <span class="p">=</span> <span class="m">0.0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">val</span> <span class="py">item</span><span class="p">:</span> <span class="n">String</span> <span class="p">=</span> <span class="s2">&#34;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">val</span> <span class="py">supplier</span><span class="p">:</span> <span class="n">String</span> <span class="p">=</span> <span class="s2">&#34;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">)</span>
</span></span></code></pre></div>
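<p>The constructor behaviour described above can be checked with plain JVM reflection, without Jackson on the classpath. The following is a minimal sketch; <code>WithDefaults</code>, <code>WithoutDefaults</code> and <code>hasNoArgConstructor</code> are hypothetical names used only for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical classes for illustration; WithDefaults follows the same pattern as Order.
data class WithDefaults(
    val orderId: String = &#34;&#34;,
    val price: Double = 0.0,
)

data class WithoutDefaults(
    val orderId: String,
    val price: Double,
)

// True when the compiled class exposes a constructor that takes no arguments.
fun hasNoArgConstructor(clazz: Class&lt;*&gt;): Boolean =
    clazz.declaredConstructors.any { it.parameterCount == 0 }

fun main() {
    println(hasNoArgConstructor(WithDefaults::class.java))    // true
    println(hasNoArgConstructor(WithoutDefaults::class.java)) // false
}
</code></pre></div>
<p>Only the class whose parameters all have defaults gets the extra parameterless constructor, which is why <code>Order</code> declares a default for every property.</p>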
<h3 id="custom-json-deserializers" data-numberify>Custom JSON (De)Serializers<a class="anchor ms-1" href="#custom-json-deserializers"></a></h3>
<p>To convert our Kotlin <code>Order</code> objects into byte arrays for Kafka transmission and vice-versa, the <code>me.jaehyeon.serializer</code> package provides custom implementations.</p>
<p>The <code>JsonSerializer&lt;T&gt;</code> class implements Kafka&rsquo;s <code>Serializer&lt;T&gt;</code> interface. It uses Jackson&rsquo;s <code>ObjectMapper</code> to transform any given object <code>T</code> into a JSON byte array. This <code>ObjectMapper</code> is specifically configured with <code>PropertyNamingStrategies.SNAKE_CASE</code>, ensuring that Kotlin&rsquo;s camelCase property names (e.g., <code>orderId</code>) are serialized as snake_case (e.g., <code>order_id</code>) in the JSON output.</p>
<p>Complementing this, the <code>JsonDeserializer&lt;T&gt;</code> class implements Kafka&rsquo;s <code>Deserializer&lt;T&gt;</code> interface. It takes a <code>targetClass</code> (such as <code>Order::class.java</code>) during its instantiation and uses a similarly configured <code>ObjectMapper</code> (also with <code>SNAKE_CASE</code> strategy) to convert incoming JSON byte arrays back into objects of that specified type.</p>
<p><strong>JsonSerializer.kt</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.serializer</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.ObjectMapper</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.PropertyNamingStrategies</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.serialization.Serializer</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">class</span> <span class="nc">JsonSerializer</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;</span> <span class="p">:</span> <span class="n">Serializer</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">objectMapper</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="n">ObjectMapper</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="p">.</span><span class="n">setPropertyNamingStrategy</span><span class="p">(</span><span class="nc">PropertyNamingStrategies</span><span class="p">.</span><span class="n">SNAKE_CASE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">serialize</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">        <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">?,</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="k">data</span><span class="p">:</span> <span class="n">T</span><span class="p">?,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">):</span> <span class="n">ByteArray</span><span class="p">?</span> <span class="p">=</span> <span class="k">data</span><span class="o">?.</span><span class="n">let</span> <span class="p">{</span> <span class="n">objectMapper</span><span class="p">.</span><span class="n">writeValueAsBytes</span><span class="p">(</span><span class="k">it</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p><strong>JsonDeserializer.kt</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.serializer</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.ObjectMapper</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">com.fasterxml.jackson.databind.PropertyNamingStrategies</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.serialization.Deserializer</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">class</span> <span class="nc">JsonDeserializer</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;(</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">targetClass</span><span class="p">:</span> <span class="n">Class</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="p">)</span> <span class="p">:</span> <span class="n">Deserializer</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">objectMapper</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="n">ObjectMapper</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="p">.</span><span class="n">setPropertyNamingStrategy</span><span class="p">(</span><span class="nc">PropertyNamingStrategies</span><span class="p">.</span><span class="n">SNAKE_CASE</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="k">override</span> <span class="k">fun</span> <span class="nf">deserialize</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="n">topic</span><span class="p">:</span> <span class="n">String</span><span class="p">?,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="k">data</span><span class="p">:</span> <span class="n">ByteArray</span><span class="p">?,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="p">):</span> <span class="n">T</span><span class="p">?</span> <span class="p">=</span> <span class="k">data</span><span class="o">?.</span><span class="n">let</span> <span class="p">{</span> <span class="n">objectMapper</span><span class="p">.</span><span class="n">readValue</span><span class="p">(</span><span class="k">it</span><span class="p">,</span> <span class="n">targetClass</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
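<p>As a rough usage sketch, the two classes can be exercised together in a round trip. <code>SampleOrder</code> and <code>roundTrip</code> are hypothetical names introduced only for this illustration and are not part of the project. The sketch assumes Jackson and the Kafka client library are on the classpath. The sample class uses mutable properties with default values so that plain Jackson can instantiate it without extra modules.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import me.jaehyeon.serializer.JsonDeserializer
import me.jaehyeon.serializer.JsonSerializer

// Hypothetical sample type for illustration only; not the project's Order model.
// Default values give the class a no-arg constructor that plain Jackson can use.
class SampleOrder(
    var orderId: String = "",
    var bidPrice: Double = 0.0,
)

// Serializes the value to JSON bytes and reads it back, returning both forms.
fun roundTrip(order: SampleOrder): Pair&lt;String, SampleOrder?&gt; {
    val bytes = JsonSerializer&lt;SampleOrder&gt;().serialize("orders-json", order)
    val restored = JsonDeserializer(SampleOrder::class.java).deserialize("orders-json", bytes)
    return Pair(bytes?.toString(Charsets.UTF_8) ?: "", restored)
}

fun main() {
    val (json, restored) = roundTrip(SampleOrder("a1", 9.5))
    println(json) // field names appear in snake_case, e.g. order_id and bid_price
    println(restored?.orderId)
}
</code></pre></div>
<p>Because both classes return <code>null</code> for <code>null</code> input, the sketch falls back to an empty string when no bytes are produced.</p>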
<h3 id="kafka-admin-utilities" data-numberify>Kafka Admin Utilities<a class="anchor ms-1" href="#kafka-admin-utilities"></a></h3>
<p>The <code>me.jaehyeon.kafka</code> package houses utility functions for administrative Kafka tasks, primarily topic creation and connection verification.</p>
<p>The <code>createTopicIfNotExists</code> function ensures that the target Kafka topic (e.g., &ldquo;orders-json&rdquo;) is available before the application uses it. It uses Kafka&rsquo;s <code>AdminClient</code>, configured with the bootstrap server address, a 5-second API timeout, and a 3-second request timeout, to attempt topic creation with the specified number of partitions and replication factor. Because <code>result.all().get()</code> reports failures as an <code>ExecutionException</code>, the function inspects the exception&rsquo;s cause: a <code>TopicExistsException</code> is logged as a warning and the application continues, while any other cause is rethrown as a <code>RuntimeException</code>.</p>
<p>The <code>verifyKafkaConnection</code> function serves as a quick pre-flight check, especially for the consumer. It also employs an <code>AdminClient</code> to try listing topics on the cluster. If this fails, it throws a <code>RuntimeException</code>, signaling a connectivity issue with the Kafka brokers and preventing the application from starting in a potentially faulty state.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon.kafka</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClient</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.AdminClientConfig</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.admin.NewTopic</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.TopicExistsException</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="k">fun</span> <span class="nf">createTopicIfNotExists</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">topicName</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">numPartitions</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">replicationFactor</span><span class="p">:</span> <span class="n">Short</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">        <span class="k">val</span> <span class="py">newTopic</span> <span class="p">=</span> <span class="n">NewTopic</span><span class="p">(</span><span class="n">topicName</span><span class="p">,</span> <span class="n">numPartitions</span><span class="p">,</span> <span class="n">replicationFactor</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="k">val</span> <span class="py">result</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">createTopics</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">newTopic</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Attempting to create topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">            <span class="n">result</span><span class="p">.</span><span class="n">all</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">            <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; created successfully!&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">cause</span> <span class="k">is</span> <span class="n">TopicExistsException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span> <span class="p">{</span> <span class="s2">&#34;Topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39; was created concurrently or already existed. Continuing...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">                <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while creating topic &#39;</span><span class="si">$topicName</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">
</span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="k">fun</span> <span class="nf">verifyKafkaConnection</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">:</span> <span class="n">String</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">            <span class="n">put</span><span class="p">(</span><span class="nc">AdminClientConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;1&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">
</span></span><span class="line"><span class="ln">54</span><span class="cl">    <span class="nc">AdminClient</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">client</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">            <span class="n">client</span><span class="p">.</span><span class="n">listTopics</span><span class="p">().</span><span class="n">names</span><span class="p">().</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">            <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Failed to connect to Kafka at &#39;</span><span class="si">$bootstrapAddress</span><span class="s2">&#39;.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
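<p>A caller might combine the two helpers at startup roughly as sketched below. The wrapper function <code>prepareKafka</code>, its boolean return convention, and the partition and replication values are assumptions made for this illustration. Only <code>verifyKafkaConnection</code> and <code>createTopicIfNotExists</code> come from the code above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import me.jaehyeon.kafka.createTopicIfNotExists
import me.jaehyeon.kafka.verifyKafkaConnection

// Hypothetical startup helper: returns true when the cluster is reachable and the topic is ready.
fun prepareKafka(
    bootstrapAddress: String,
    topicName: String,
): Boolean =
    try {
        // Throws a RuntimeException if the brokers cannot be reached within the timeouts.
        verifyKafkaConnection(bootstrapAddress)
        // Partition count and replication factor are example values only.
        createTopicIfNotExists(topicName, bootstrapAddress, 3, 1.toShort())
        true
    } catch (e: RuntimeException) {
        System.err.println("Kafka is not ready: ${e.message}")
        false
    }

fun main() {
    val ready = prepareKafka("localhost:9092", "orders-json")
    println("Kafka ready: $ready")
}
</code></pre></div>
<p>Running the connectivity check first keeps the two failure modes apart: an unreachable cluster is reported before any topic creation is attempted.</p>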
<h3 id="kafka-producer" data-numberify>Kafka Producer<a class="anchor ms-1" href="#kafka-producer"></a></h3>
<p>The <code>me.jaehyeon.ProducerApp</code> object is responsible for generating <code>Order</code> messages and publishing them to a Kafka topic. Its operations include:</p>
<ul>
<li><strong>Configuration:</strong>
<ul>
<li>Reads the Kafka <code>BOOTSTRAP_ADDRESS</code> and target <code>TOPIC_NAME</code> (defaulting to &ldquo;orders-json&rdquo;) from environment variables, allowing for flexible deployment.</li>
<li>Defines constants like <code>NUM_PARTITIONS</code> and <code>REPLICATION_FACTOR</code> for topic creation if needed.</li>
</ul>
</li>
<li><strong>Initialization (<code>run</code> method):</strong>
<ul>
<li>First, it calls <code>createTopicIfNotExists</code> (from the admin utilities) to ensure the output topic is ready.</li>
<li>It then configures and instantiates a <code>KafkaProducer</code>, setting properties like bootstrap servers, using <code>StringSerializer</code> for message keys, and our custom <code>JsonSerializer</code> for the <code>Order</code> object values.</li>
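<li>It sets retries to 3 and bounds the request, delivery, and blocking timeouts (<code>REQUEST_TIMEOUT_MS_CONFIG</code>, <code>DELIVERY_TIMEOUT_MS_CONFIG</code>, <code>MAX_BLOCK_MS_CONFIG</code>) to a few seconds, so that an unreachable broker surfaces as an error quickly instead of stalling the producer.</li>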
</ul>
</li>
<li><strong>Message Production Loop:</strong>
<ul>
<li>Continuously generates new <code>Order</code> objects using <code>Datafaker</code> for random yet plausible data. This includes generating a UUID for <code>orderId</code> and a formatted recent timestamp via <code>generateBidTime()</code>.
<ul>
<li>Note that <code>bidTime</code> is delayed by a number of seconds set through the <code>DELAY_SECONDS</code> environment variable (5 if unset or not a valid integer), which is useful for testing late data handling.</li>
</ul>
</li>
<li>Wraps each <code>Order</code> in a <code>ProducerRecord</code>, using the <code>orderId</code> as the message key.</li>
<li>Sends the record using <code>producer.send()</code>. The call to <code>.get()</code> on the returned <code>Future</code> makes this send operation synchronous for this example, waiting for acknowledgment. A callback logs success (topic, partition, offset) or any exceptions.</li>
<li>Pauses for one second between messages to simulate a steady event stream.</li>
</ul>
</li>
<li><strong>Error Handling:</strong> Includes <code>try-catch</code> blocks to handle potential <code>ExecutionException</code> or <code>KafkaException</code> during the send process.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.createTopicIfNotExists</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.model.Order</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.serializer.JsonSerializer</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">import</span> <span class="nn">net.datafaker.Faker</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.KafkaProducer</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerConfig</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.producer.ProducerRecord</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.KafkaException</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.ZoneId</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.format.DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.UUID</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.ExecutionException</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.concurrent.TimeUnit</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="k">object</span> <span class="nc">ProducerApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP_ADDRESS&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;localhost:9092&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">inputTopicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC_NAME&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-json&#34;</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">delaySeconds</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DELAY_SECONDS&#34;</span><span class="p">)</span><span class="o">?.</span><span class="n">toIntOrNull</span><span class="p">()</span> <span class="o">?:</span> <span class="m">5</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">NUM</span><span class="n">_PARTITIONS</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">REPLICATION</span><span class="n">_FACTOR</span><span class="p">:</span> <span class="n">Short</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">faker</span> <span class="p">=</span> <span class="n">Faker</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">        <span class="c1">// Create the input topic if it does not exist
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="c1"></span>        <span class="n">createTopicIfNotExists</span><span class="p">(</span><span class="n">inputTopicName</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">,</span> <span class="n">NUM_PARTITIONS</span><span class="p">,</span> <span class="n">REPLICATION_FACTOR</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">KEY_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.StringSerializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">VALUE_SERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="n">JsonSerializer</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">RETRIES_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">DELIVERY_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;6000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ProducerConfig</span><span class="p">.</span><span class="n">MAX_BLOCK_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">
</span></span><span class="line"><span class="ln">43</span><span class="cl">        <span class="n">KafkaProducer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Order</span><span class="p">&gt;(</span><span class="n">props</span><span class="p">).</span><span class="n">use</span> <span class="p">{</span> <span class="n">producer</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">            <span class="k">while</span> <span class="p">(</span><span class="k">true</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">                <span class="k">val</span> <span class="py">order</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">                    <span class="n">Order</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">                        <span class="nc">UUID</span><span class="p">.</span><span class="n">randomUUID</span><span class="p">().</span><span class="n">toString</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">                        <span class="n">generateBidTime</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">                        <span class="n">faker</span><span class="p">.</span><span class="n">number</span><span class="p">().</span><span class="n">randomDouble</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">150</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">                        <span class="n">faker</span><span class="p">.</span><span class="n">commerce</span><span class="p">().</span><span class="n">productName</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">                        <span class="n">faker</span><span class="p">.</span><span class="n">regexify</span><span class="p">(</span><span class="s2">&#34;(Alice|Bob|Carol|Alex|Joe|James|Jane|Jack)&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">                    <span class="p">)</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">                <span class="k">val</span> <span class="py">record</span> <span class="p">=</span> <span class="n">ProducerRecord</span><span class="p">(</span><span class="n">inputTopicName</span><span class="p">,</span> <span class="n">order</span><span class="p">.</span><span class="n">orderId</span><span class="p">,</span> <span class="n">order</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">                <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">                    <span class="n">producer</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">                        <span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="n">record</span><span class="p">)</span> <span class="p">{</span> <span class="n">metadata</span><span class="p">,</span> <span class="n">exception</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">                            <span class="k">if</span> <span class="p">(</span><span class="n">exception</span> <span class="o">!=</span> <span class="k">null</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">58</span><span class="cl">                                <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">exception</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error sending record&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">                            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">                                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">                                    <span class="s2">&#34;Sent to </span><span class="si">${metadata.topic()}</span><span class="s2"> into partition </span><span class="si">${metadata.partition()}</span><span class="s2">, offset </span><span class="si">${metadata.offset()}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">                                <span class="p">}</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">                            <span class="p">}</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">                        <span class="p">}.</span><span class="k">get</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">ExecutionException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while sending record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">                <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">KafkaException</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Kafka error while sending record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">
</span></span><span class="line"><span class="ln">71</span><span class="cl">                <span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="m">1000L</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">73</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl">
</span></span><span class="line"><span class="ln">76</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">generateBidTime</span><span class="p">():</span> <span class="n">String</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">77</span><span class="cl">        <span class="k">val</span> <span class="py">randomDate</span> <span class="p">=</span> <span class="n">faker</span><span class="p">.</span><span class="n">date</span><span class="p">().</span><span class="n">past</span><span class="p">(</span><span class="n">delaySeconds</span><span class="p">,</span> <span class="nc">TimeUnit</span><span class="p">.</span><span class="n">SECONDS</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">78</span><span class="cl">        <span class="k">val</span> <span class="py">formatter</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln">79</span><span class="cl">            <span class="n">DateTimeFormatter</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">                <span class="p">.</span><span class="n">ofPattern</span><span class="p">(</span><span class="s2">&#34;yyyy-MM-dd HH:mm:ss&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">81</span><span class="cl">                <span class="p">.</span><span class="n">withZone</span><span class="p">(</span><span class="nc">ZoneId</span><span class="p">.</span><span class="n">systemDefault</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">        <span class="k">return</span> <span class="n">formatter</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="n">randomDate</span><span class="p">.</span><span class="n">toInstant</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">83</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h3 id="kafka-consumer" data-numberify>Kafka Consumer<a class="anchor ms-1" href="#kafka-consumer"></a></h3>
<p>The <code>me.jaehyeon.ConsumerApp</code> object is designed to subscribe to the Kafka topic, fetch the <code>Order</code> messages, and process them. Its key functionalities are:</p>
<ul>
<li><strong>Configuration:</strong>
<ul>
<li>Retrieves <code>BOOTSTRAP_ADDRESS</code> and <code>TOPIC_NAME</code> from environment variables, falling back to <code>localhost:9092</code> and <code>orders-json</code> when they are not set.</li>
</ul>
</li>
<li><strong>Initialization (<code>run</code> method):</strong>
<ul>
<li>Begins by calling <code>verifyKafkaConnection</code> (from admin utilities) to check Kafka cluster accessibility.</li>
<li>Configures and creates a <code>KafkaConsumer</code>. Essential properties include <code>GROUP_ID_CONFIG</code> (derived from the topic name, e.g., &ldquo;orders-json-group&rdquo;, for consumer group coordination). A <code>StringDeserializer</code> for keys and an instance of our custom <code>JsonDeserializer(Order::class.java)</code> for message values are passed directly to the <code>KafkaConsumer</code> constructor.</li>
<li>Disables auto-commit (<code>ENABLE_AUTO_COMMIT_CONFIG = false</code>) for manual offset control and sets <code>AUTO_OFFSET_RESET_CONFIG = &quot;earliest&quot;</code> so that a consumer group with no committed offsets starts reading from the beginning of the topic.</li>
</ul>
</li>
<li><strong>Graceful Shutdown:</strong>
<ul>
<li>A shutdown hook is registered with <code>Runtime.getRuntime().addShutdownHook</code>. On a shutdown signal (e.g., Ctrl+C), it sets the <code>keepConsuming</code> flag to <code>false</code> and calls <code>consumer.wakeup()</code>, which causes a blocked (or the next) <code>consumer.poll()</code> to throw a <code>WakeupException</code> so the consumption loop can terminate cleanly.</li>
</ul>
</li>
<li><strong>Message Consumption Loop:</strong>
<ul>
<li>The consumer subscribes to the specified topic.</li>
<li>In a <code>while (keepConsuming)</code> loop:
<ul>
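<li><code>pollSafely()</code> is called to fetch records with a one-second poll timeout. If <code>poll()</code> throws a <code>WakeupException</code> during shutdown, the wrapper logs it and returns an empty batch; a <code>WakeupException</code> raised while <code>keepConsuming</code> is still <code>true</code> is rethrown. Any other polling error is logged and also yields an empty batch.</li>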
<li>Each received <code>ConsumerRecord</code> is processed by <code>processRecordWithRetry()</code>. This method logs the <code>Order</code> details and includes a retry mechanism for simulated errors. A simulated error is thrown when a random integer in <code>0..99</code> is below <code>ERROR_THRESHOLD</code>; the threshold is currently -1, so no simulated errors occur. If an error occurs, processing is attempted up to <code>MAX_RETRIES</code> times in total, with exponential backoff between attempts. If all attempts fail, the error is logged and the message is skipped.</li>
<li>After processing a batch, <code>consumer.commitSync()</code> is called to manually commit offsets.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Error Handling:</strong> A general <code>try-catch</code> block surrounds the main consumption logic for unrecoverable errors.</li>
</ul>
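<p>The retry-with-backoff behaviour described for <code>processRecordWithRetry()</code> can also be looked at in isolation. The following is a minimal sketch rather than the code used in <code>ConsumerApp</code>: the helper names <code>retryWithBackoff</code> and <code>backoffDelayMs</code>, the 500 ms base delay, and the doubling schedule are assumptions chosen for illustration, and the injectable <code>sleep</code> parameter exists only to make the helper easy to test.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical helpers for illustration; they are not part of ConsumerApp.

// Delay before the next attempt: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
fun backoffDelayMs(attempt: Int, baseDelayMs: Long = 500L): Long =
    baseDelayMs * (1L shl (attempt - 1))

// Runs `action` up to `maxRetries` times in total.
// Returns the result of the first successful attempt, or null if every attempt fails.
fun &lt;T&gt; retryWithBackoff(
    maxRetries: Int = 3,
    baseDelayMs: Long = 500L,
    sleep: (Long) -&gt; Unit = { Thread.sleep(it) },
    action: (attempt: Int) -&gt; T,
): T? {
    for (attempt in 1..maxRetries) {
        try {
            return action(attempt)
        } catch (e: Exception) {
            if (attempt == maxRetries) {
                // Out of attempts: report the failure and let the caller skip the item.
                println(&#34;Giving up after $maxRetries attempts: ${e.message}&#34;)
                return null
            }
            val delay = backoffDelayMs(attempt, baseDelayMs)
            println(&#34;Attempt $attempt failed, retrying in $delay ms&#34;)
            sleep(delay)
        }
    }
    return null
}

fun main() {
    // Fails on the first two attempts and succeeds on the third.
    val result =
        retryWithBackoff(baseDelayMs = 10L) { attempt -&gt;
            if (attempt &lt; 3) throw RuntimeException(&#34;simulated failure&#34;)
            &#34;processed on attempt $attempt&#34;
        }
    println(result)
}
</code></pre></div>
<p>Returning <code>null</code> after the final failed attempt mirrors the &ldquo;log and skip&rdquo; behaviour described above, so one bad message does not block the rest of the batch. Whether skipping is acceptable depends on the application; a dead-letter topic is a common alternative.</p>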
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln">  1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl">
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.kafka.verifyKafkaConnection</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.model.Order</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="k">import</span> <span class="nn">me.jaehyeon.serializer.JsonDeserializer</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerConfig</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.KafkaConsumer</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.errors.WakeupException</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="k">import</span> <span class="nn">org.apache.kafka.common.serialization.StringDeserializer</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">import</span> <span class="nn">java.time.Duration</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl"><span class="k">import</span> <span class="nn">java.util.Properties</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl">
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">object</span> <span class="nc">ConsumerApp</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">bootstrapAddress</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;BOOTSTRAP_ADDRESS&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;localhost:9092&#34;</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">topicName</span> <span class="p">=</span> <span class="nc">System</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;TOPIC_NAME&#34;</span><span class="p">)</span> <span class="o">?:</span> <span class="s2">&#34;orders-json&#34;</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">    <span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">MAX</span><span class="n">_RETRIES</span> <span class="p">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">    <span class="k">private</span> <span class="k">const</span> <span class="k">val</span> <span class="py">ERROR</span><span class="n">_THRESHOLD</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="nd">@Volatile</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="k">private</span> <span class="k">var</span> <span class="py">keepConsuming</span> <span class="p">=</span> <span class="k">true</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">
</span></span><span class="line"><span class="ln"> 25</span><span class="cl">    <span class="k">fun</span> <span class="nf">run</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">        <span class="c1">// Verify kafka connection
</span></span></span><span class="line"><span class="ln"> 27</span><span class="cl"><span class="c1"></span>        <span class="n">verifyKafkaConnection</span><span class="p">(</span><span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="k">val</span> <span class="py">props</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">            <span class="n">Properties</span><span class="p">().</span><span class="n">apply</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">BOOTSTRAP_SERVERS_CONFIG</span><span class="p">,</span> <span class="n">bootstrapAddress</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">GROUP_ID_CONFIG</span><span class="p">,</span> <span class="s2">&#34;</span><span class="si">$topicName</span><span class="s2">-group&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">KEY_DESERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;org.apache.kafka.common.serialization.StringDeserializer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">VALUE_DESERIALIZER_CLASS_CONFIG</span><span class="p">,</span> <span class="n">JsonDeserializer</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">ENABLE_AUTO_COMMIT_CONFIG</span><span class="p">,</span> <span class="k">false</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">AUTO_OFFSET_RESET_CONFIG</span><span class="p">,</span> <span class="s2">&#34;earliest&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">DEFAULT_API_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;5000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">                <span class="n">put</span><span class="p">(</span><span class="nc">ConsumerConfig</span><span class="p">.</span><span class="n">REQUEST_TIMEOUT_MS_CONFIG</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">        <span class="k">val</span> <span class="py">consumer</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">            <span class="n">KafkaConsumer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Order</span><span class="p">&gt;(</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">                <span class="n">props</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">                <span class="n">StringDeserializer</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">                <span class="n">JsonDeserializer</span><span class="p">(</span><span class="n">Order</span><span class="o">::</span><span class="k">class</span><span class="p">.</span><span class="n">java</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">        <span class="nc">Runtime</span><span class="p">.</span><span class="n">getRuntime</span><span class="p">().</span><span class="n">addShutdownHook</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">            <span class="n">Thread</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&#34;Shutdown detected. Waking up Kafka consumer...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">                <span class="n">keepConsuming</span> <span class="p">=</span> <span class="k">false</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">                <span class="n">consumer</span><span class="p">.</span><span class="n">wakeup</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">        <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">            <span class="n">consumer</span><span class="p">.</span><span class="n">use</span> <span class="p">{</span> <span class="n">c</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">                <span class="n">c</span><span class="p">.</span><span class="n">subscribe</span><span class="p">(</span><span class="n">listOf</span><span class="p">(</span><span class="n">topicName</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">                <span class="k">while</span> <span class="p">(</span><span class="n">keepConsuming</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">                    <span class="k">val</span> <span class="py">records</span> <span class="p">=</span> <span class="n">pollSafely</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">                    <span class="k">for</span> <span class="p">(</span><span class="n">record</span> <span class="k">in</span> <span class="n">records</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">                        <span class="n">processRecordWithRetry</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">                    <span class="n">c</span><span class="p">.</span><span class="n">commitSync</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">        <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">            <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s2">&#34;Unrecoverable error while consuming record.&#34;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">pollSafely</span><span class="p">(</span><span class="n">consumer</span><span class="p">:</span> <span class="n">KafkaConsumer</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Order</span><span class="p">&gt;)</span> <span class="p">=</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">        <span class="n">runCatching</span> <span class="p">{</span> <span class="n">consumer</span><span class="p">.</span><span class="n">poll</span><span class="p">(</span><span class="nc">Duration</span><span class="p">.</span><span class="n">ofMillis</span><span class="p">(</span><span class="m">1000</span><span class="p">))</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">            <span class="p">.</span><span class="n">getOrElse</span> <span class="p">{</span> <span class="n">e</span> <span class="o">-&gt;</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">                <span class="k">when</span> <span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">                    <span class="k">is</span> <span class="n">WakeupException</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">                        <span class="k">if</span> <span class="p">(</span><span class="n">keepConsuming</span><span class="p">)</span> <span class="k">throw</span> <span class="n">e</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;ConsumerApp wakeup for shutdown.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">                        <span class="n">emptyList</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">                    <span class="k">else</span> <span class="o">-&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Unexpected error while polling records&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">                        <span class="n">emptyList</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">                    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">    <span class="k">private</span> <span class="k">fun</span> <span class="nf">processRecordWithRetry</span><span class="p">(</span><span class="n">record</span><span class="p">:</span> <span class="n">ConsumerRecord</span><span class="p">&lt;</span><span class="n">String</span><span class="p">,</span> <span class="n">Order</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">        <span class="k">var</span> <span class="py">attempt</span> <span class="p">=</span> <span class="m">0</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">        <span class="k">while</span> <span class="p">(</span><span class="n">attempt</span> <span class="p">&lt;</span> <span class="n">MAX_RETRIES</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">            <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">                <span class="n">attempt</span><span class="o">++</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">                <span class="k">if</span> <span class="p">((</span><span class="m">0.</span><span class="p">.</span><span class="m">99</span><span class="p">).</span><span class="n">random</span><span class="p">()</span> <span class="p">&lt;</span> <span class="n">ERROR_THRESHOLD</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">                    <span class="k">throw</span> <span class="n">RuntimeException</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">                        <span class="s2">&#34;Simulated error for </span><span class="si">${record.value()}</span><span class="s2"> from partition </span><span class="si">${record.partition()}</span><span class="s2">, offset </span><span class="si">${record.offset()}</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">                    <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">info</span> <span class="p">{</span> <span class="s2">&#34;Received </span><span class="si">${record.value()}</span><span class="s2"> from partition </span><span class="si">${record.partition()}</span><span class="s2">, offset </span><span class="si">${record.offset()}</span><span class="s2">&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">                <span class="k">return</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">            <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">                <span class="n">logger</span><span class="p">.</span><span class="n">warn</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Error processing record (attempt </span><span class="si">$attempt</span><span class="s2"> of </span><span class="si">$MAX</span><span class="s2">_RETRIES)&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="n">attempt</span> <span class="o">==</span> <span class="n">MAX_RETRIES</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">                    <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Failed to process record after </span><span class="si">$MAX</span><span class="s2">_RETRIES attempts, skipping...&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">                    <span class="k">return</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">                <span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="m">500L</span> <span class="p">*</span> <span class="n">attempt</span><span class="p">.</span><span class="n">toLong</span><span class="p">())</span> <span class="c1">// linear backoff: 500 ms * attempt
</span></span></span><span class="line"><span class="ln">107</span><span class="cl"><span class="c1"></span>            <span class="p">}</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
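<p>The retry loop above sleeps for <code>500L * attempt</code> milliseconds, so the wait grows linearly with each attempt. If a doubling delay is preferred, it could be computed as in the following minimal sketch. The <code>backoffDelayMs</code> helper, its default base delay, and the 10-second cap are assumptions made for illustration and are not part of the original consumer.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">import kotlin.math.min

// Hypothetical helper, not part of the original ConsumerApp.
// attempt is 1-based: attempt 1 waits baseMs, attempt 2 waits 2 * baseMs,
// attempt 3 waits 4 * baseMs, and so on, capped at maxDelayMs.
fun backoffDelayMs(attempt: Int, baseMs: Long = 500L, maxDelayMs: Long = 10_000L): Long {
    require(attempt &gt;= 1) { "attempt must be 1 or greater" }
    // Limit the shift so the power of two stays small for the default base delay.
    val exponent = min(attempt - 1, 20)
    return min(baseMs * (1L shl exponent), maxDelayMs)
}
</code></pre></div>
<p>Inside the <code>catch</code> block, the call would then become <code>Thread.sleep(backoffDelayMs(attempt))</code>, giving waits of 500 ms, 1 s, 2 s, and so on up to the cap.</p>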
<h3 id="application-entry-point" data-numberify>Application Entry Point<a class="anchor ms-1" href="#application-entry-point"></a></h3>
<p>The <code>main</code> function in <code>me.jaehyeon.MainKt</code> serves as the application&rsquo;s command-line dispatcher. It reads the first command-line argument (<code>args.getOrNull(0)</code>) and lowercases it, so matching is case-insensitive. If the argument is &ldquo;producer&rdquo;, <code>ProducerApp.run()</code> is executed; if it is &ldquo;consumer&rdquo;, <code>ConsumerApp.run()</code> is called. For any other input, or if no argument is provided, it prints a usage message. The dispatch is wrapped in a <code>try-catch</code> block that logs any exception escaping either application and then exits with a non-zero status via <code>exitProcess(1)</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">package</span> <span class="nn">me.jaehyeon</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">import</span> <span class="nn">mu.KotlinLogging</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">import</span> <span class="nn">kotlin.system.exitProcess</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">private</span> <span class="k">val</span> <span class="py">logger</span> <span class="p">=</span> <span class="nc">KotlinLogging</span><span class="p">.</span><span class="n">logger</span> <span class="p">{}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">fun</span> <span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span> <span class="n">Array</span><span class="p">&lt;</span><span class="n">String</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">when</span> <span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">getOrNull</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="o">?.</span><span class="n">lowercase</span><span class="p">())</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="s2">&#34;producer&#34;</span> <span class="o">-&gt;</span> <span class="nc">ProducerApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="s2">&#34;consumer&#34;</span> <span class="o">-&gt;</span> <span class="nc">ConsumerApp</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="k">else</span> <span class="o">-&gt;</span> <span class="n">println</span><span class="p">(</span><span class="s2">&#34;Usage: &lt;producer|consumer&gt;&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">:</span> <span class="n">Exception</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">logger</span><span class="p">.</span><span class="n">error</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="s2">&#34;Fatal error in </span><span class="si">${args.getOrNull(0) ?: &#34;app&#34;}</span><span class="s2">. Shutting down.&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="n">exitProcess</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h2 id="run-kafka-applications" data-numberify>Run Kafka Applications<a class="anchor ms-1" href="#run-kafka-applications"></a></h2>
<p>We begin by setting up our local Kafka environment using the <a href="https://github.com/factorhouse/factorhouse-local" target="_blank" rel="noopener noreferrer">Factor House Local<i class="fas fa-external-link-square-alt ms-1"></i></a> project. This project conveniently provisions a Kafka cluster along with Kpow, a powerful tool for Kafka management and control, all managed via Docker Compose. Once our Kafka environment is running, we will start our Kotlin-based producer and consumer applications.</p>

<h3 id="factor-house-local" data-numberify>Factor House Local<a class="anchor ms-1" href="#factor-house-local"></a></h3>
<p>To get our Kafka cluster and Kpow up and running, we&rsquo;ll first need to clone the project repository and navigate into its directory. Then, we can start the services using Docker Compose as shown below. <strong>Note that we need to have a community license for Kpow to get started.</strong> See <a href="https://github.com/factorhouse/factorhouse-local?tab=readme-ov-file#update-kpow-and-flex-licenses" target="_blank" rel="noopener noreferrer">this section<i class="fas fa-external-link-square-alt ms-1"></i></a> of the project <em>README</em> for details on how to request a license and configure it before proceeding with the <code>docker compose</code> command.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">git clone https://github.com/factorhouse/factorhouse-local.git
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> factorhouse-local
</span></span><span class="line"><span class="ln">3</span><span class="cl">docker compose -f compose-kpow-community.yml up -d
</span></span></code></pre></div><p>Once the services are initialized, we can access the Kpow user interface at <code>http://localhost:3000</code> in a web browser. There we can see the provisioned environment, including three Kafka brokers, one schema registry, and one Kafka Connect instance.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/kpow-overview.png" loading="lazy" width="1414" height="818" />
</picture>

</p>

<h3 id="launch-applications" data-numberify>Launch Applications<a class="anchor ms-1" href="#launch-applications"></a></h3>
<p>Our Kotlin Kafka applications can be launched in a couple of ways, catering to different stages of development and deployment:</p>
<ol>
<li><strong>With Gradle (Development Mode)</strong>: This method is convenient during development, allowing for quick iterations without needing to build a full JAR file each time.</li>
<li><strong>Running the Shadow JAR (Deployment Mode)</strong>: After building a &ldquo;fat&rdquo; JAR (also known as a shadow JAR) that includes all dependencies, the application can be run as a standalone executable. This is typical for deploying to non-development environments.</li>
</ol>
<blockquote>
<p>💡 To build and run the application locally, ensure that <strong>JDK 17</strong> is installed.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 👉 With Gradle (Dev Mode)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;producer&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">./gradlew run --args<span class="o">=</span><span class="s2">&#34;consumer&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 👉 Build Shadow (Fat) JAR:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">./gradlew shadowJar
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># Resulting JAR:</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># build/libs/orders-json-clients-1.0.jar</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># 👉 Run the Fat JAR:</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">java -jar build/libs/orders-json-clients-1.0.jar producer
</span></span><span class="line"><span class="ln">13</span><span class="cl">java -jar build/libs/orders-json-clients-1.0.jar consumer
</span></span></code></pre></div><p>For this post, we demonstrate starting the applications in development mode using Gradle. Once started, we see logs from both the producer sending messages and the consumer receiving them.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/kafka-json-apps.gif" loading="lazy" width="2364" height="1670" />
</picture>

</p>
<p>With the applications running and producing/consuming data, we can inspect the messages flowing through our <code>orders-json</code> topic using Kpow. In the Kpow UI, navigate to the topic. To view the messages correctly, we configure the deserializers: set the <strong>Key Deserializer</strong> to <em>String</em> and the <strong>Value Deserializer</strong> to <em>JSON</em>. After applying these settings, click the <em>Search</em> button to view the messages.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/message-view-01.png" loading="lazy" width="1427" height="612" />
</picture>


<picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-20-kotlin-getting-started-kafka-json-clients/message-view-02.png" loading="lazy" width="1424" height="627" />
</picture>

</p>

<h2 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h2>
<p>This post detailed the creation of Kotlin Kafka producer and consumer applications for handling JSON order data. We covered project setup, data modeling, custom serialization, client logic with error handling, and deployment against a local Kafka cluster using the <em>Factor House Local</em> project with <em>Kpow</em>.</p>
      ]]></content:encoded></item><item><title>Meet the Streamhouse Trio - Paimon, Fluss, and Iceberg for Unified Data Architectures</title><link>https://jaehyeon.me/blog/2025-05-06-streamhouse-trio/</link><guid>https://jaehyeon.me/blog/2025-05-06-streamhouse-trio/</guid><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>The world of data is converging. The traditional divide between batch processing for historical analytics and stream processing for real-time insights is becoming increasingly blurry. Businesses demand architectures that handle both seamlessly. Enter the &ldquo;Streamhouse&rdquo; - an evolution of the Lakehouse concept, designed with streaming as a first-class citizen.</p>
<p>Today, we&rsquo;ll introduce three key open-source technologies shaping this space: <a href="https://paimon.apache.org/" target="_blank" rel="noopener noreferrer"><strong>Apache Paimon™</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>, <a href="https://alibaba.github.io/fluss-docs/" target="_blank" rel="noopener noreferrer"><strong>Fluss</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>, and <a href="https://iceberg.apache.org/" target="_blank" rel="noopener noreferrer"><strong>Apache Iceberg</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>. While each has unique strengths, their true power lies in how they can be integrated to build robust, flexible, and performant data platforms.</p>
      ]]></description><content:encoded><![CDATA[
        <p>The world of data is converging. The traditional divide between batch processing for historical analytics and stream processing for real-time insights is becoming increasingly blurry. Businesses demand architectures that handle both seamlessly. Enter the &ldquo;Streamhouse&rdquo; - an evolution of the Lakehouse concept, designed with streaming as a first-class citizen.</p>
<p>Today, we&rsquo;ll introduce three key open-source technologies shaping this space: <a href="https://paimon.apache.org/" target="_blank" rel="noopener noreferrer"><strong>Apache Paimon™</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>, <a href="https://alibaba.github.io/fluss-docs/" target="_blank" rel="noopener noreferrer"><strong>Fluss</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>, and <a href="https://iceberg.apache.org/" target="_blank" rel="noopener noreferrer"><strong>Apache Iceberg</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>. While each has unique strengths, their true power lies in how they can be integrated to build robust, flexible, and performant data platforms.</p>
<p>Let&rsquo;s dive into each component:</p>

<h3 id="1-apache-paimon-the-streaming-lakehouse-table" data-numberify>1. Apache Paimon: The Streaming Lakehouse Table<a class="anchor ms-1" href="#1-apache-paimon-the-streaming-lakehouse-table"></a></h3>
<ul>
<li><strong>In Simple Terms:</strong> Think of Paimon as a specialized Lakehouse table format built from the ground up for <strong>unified streaming and batch processing</strong>. It excels where real-time updates meet analytical queries.</li>
<li><strong>Type:</strong> Stream-native Lakehouse Table Storage.</li>
<li><strong>Designed For:</strong> Managing dynamic tables that require frequent updates, deletions, or change data capture (CDC), while still being accessible for batch analytics.</li>
<li><strong>Use Cases:</strong>
<ul>
<li>Real-time data warehousing / OLAP.</li>
<li>Ingesting CDC streams (e.g., from databases).</li>
<li>Building machine learning feature stores that need fresh data.</li>
<li>Replacing Kappa architectures with a more manageable table abstraction.</li>
</ul>
</li>
<li><strong>Strengths:</strong>
<ul>
<li><strong>True Streaming &amp; Batch Unification:</strong> Natively handles both writers and readers.</li>
<li><strong>Efficient Upserts:</strong> Optimized for high-frequency inserts, updates, and deletes using merge-on-read strategies (similar to HBase/LSM-trees).</li>
<li><strong>Low Latency Updates:</strong> Achieves latency in the seconds-to-minutes range for data visibility.</li>
<li><strong>Strong Flink Integration:</strong> Developed with Apache Flink as a primary engine, offering first-class support via Flink SQL and DataStream API.</li>
<li><strong>ACID Transactions:</strong> Ensures data consistency across concurrent operations.</li>
</ul>
</li>
<li><strong>Drawbacks:</strong>
<ul>
<li>Newer ecosystem compared to Iceberg.</li>
<li>Latency, while good, is not in the millisecond range like pure streaming systems.</li>
</ul>
</li>
<li><strong>Origin:</strong> Originated at Alibaba, now at the Apache Software Foundation.</li>
</ul>
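<p>To make the merge-on-read idea behind the upsert strength more concrete, here is a deliberately simplified sketch. The <code>MergeOnReadTable</code> class is hypothetical and is not Paimon&rsquo;s API or file layout; it only illustrates that writers append cheap change batches while readers merge them, with the newest change per key winning.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical, simplified model of merge-on-read; not Paimon's actual API or storage format.
// A null value in a change batch marks a delete (tombstone) for that key.
class MergeOnReadTable {
    // Each write appends one immutable run; newer runs come later in the list.
    private val runs = mutableListOf&lt;Map&lt;String, String?&gt;&gt;()

    // Writing is cheap: no existing data is rewritten, the batch is simply appended.
    fun write(changes: Map&lt;String, String?&gt;) {
        runs.add(changes.toMap())
    }

    // Reading merges all runs from oldest to newest, so the latest change per key wins,
    // and then drops keys whose latest change is a tombstone.
    fun read(): Map&lt;String, String&gt; {
        val merged = mutableMapOf&lt;String, String?&gt;()
        for (run in runs) merged.putAll(run)
        return merged.filterValues { it != null }.mapValues { it.value!! }
    }
}
</code></pre></div>
<p>For example, writing <code>o1=CREATED, o2=CREATED</code> followed by <code>o1=SHIPPED</code> and a tombstone for <code>o2</code> leaves a reader with only <code>o1=SHIPPED</code>. A real implementation would also compact runs in the background to keep reads fast.</p>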

<h3 id="2-fluss-ultra-low-latency-streaming-storage" data-numberify>2. Fluss: Ultra-Low-Latency Streaming Storage<a class="anchor ms-1" href="#2-fluss-ultra-low-latency-streaming-storage"></a></h3>
<ul>
<li><strong>In Simple Terms:</strong> Fluss is purpose-built <strong>pure streaming storage</strong>, optimized for getting data in and out <em>extremely</em> fast. It&rsquo;s like a queryable, structured, columnar version of Kafka topics.</li>
<li><strong>Type:</strong> Real-time Optimized Streaming Storage.</li>
<li><strong>Designed For:</strong> Scenarios demanding the absolute lowest latency for stream ingestion and consumption.</li>
<li><strong>Use Cases:</strong>
<ul>
<li>Real-time monitoring and alerting.</li>
<li>Powering real-time dashboards and APIs.</li>
<li>Feeding data into stream processing jobs with minimal delay.</li>
<li>Serving as a high-speed buffer before data lands in Paimon or other storage.</li>
</ul>
</li>
<li><strong>Strengths:</strong>
<ul>
<li><strong>Very Low Latency:</strong> Achieves millisecond-level latency for reads and writes.</li>
<li><strong>Columnar Streaming:</strong> Allows for efficient data access, including projection pushdown (reading only needed columns) directly on the stream.</li>
<li><strong>Native Flink Support:</strong> Designed to work seamlessly as a Flink source/sink.</li>
<li><strong>Structured Streaming:</strong> Unlike message queues, data has a defined schema.</li>
</ul>
</li>
<li><strong>Drawbacks:</strong>
<ul>
<li>Not designed for analytical queries over long historical periods (it&rsquo;s optimized for the <em>stream</em>).</li>
<li>Doesn&rsquo;t natively support updates/deletes or ACID transactions like table formats.</li>
<li>Limited ecosystem support beyond Flink currently (though intended for ASF donation).</li>
<li>No built-in catalog integration; schema defined within the processing job (e.g., Flink SQL DDL).</li>
</ul>
</li>
<li><strong>Origin:</strong> Originated at Ververica (derived from work at Alibaba).</li>
</ul>
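<p>The columnar streaming point can be illustrated with a small in-memory model. The <code>ColumnarBatch</code> class below is a hypothetical sketch, not Fluss&rsquo;s storage format or client API. It shows why projection is cheap when data is laid out by column: only the requested columns are ever touched.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical in-memory model of a columnar batch; not Fluss's actual format or API.
// All columns are assumed to have the same length.
class ColumnarBatch(private val columns: Map&lt;String, List&lt;Any?&gt;&gt;) {
    // Number of rows, taken from the first column.
    val rowCount: Int = columns.values.firstOrNull()?.size ?: 0

    // Projection: rows are rebuilt from the requested columns only;
    // columns that are not named are never read.
    fun project(vararg names: String): List&lt;Map&lt;String, Any?&gt;&gt; =
        (0 until rowCount).map { row -&gt;
            names.associate { name -&gt; name to columns.getValue(name)[row] }
        }
}
</code></pre></div>
<p>With columns <code>user_id</code>, <code>event</code>, and <code>payload</code>, calling <code>project("user_id", "event")</code> returns rows without ever scanning the potentially large <code>payload</code> column.</p>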

<h3 id="3-apache-iceberg-the-battle-tested-batch-lakehouse-standard" data-numberify>3. Apache Iceberg: The Battle-Tested Batch Lakehouse Standard<a class="anchor ms-1" href="#3-apache-iceberg-the-battle-tested-batch-lakehouse-standard"></a></h3>
<ul>
<li><strong>In Simple Terms:</strong> Iceberg is a widely adopted open table format primarily designed for <strong>reliable, large-scale batch analytics</strong> on data lakes. It has been progressively adding streaming capabilities.</li>
<li><strong>Type:</strong> Lakehouse Table Storage (Batch-Optimized).</li>
<li><strong>Designed For:</strong> Managing large, relatively static datasets for SQL-based analytics and BI.</li>
<li><strong>Use Cases:</strong>
<ul>
<li>Building large-scale data lakes for BI and reporting (OLAP).</li>
<li>Replacing Hive tables with better reliability and performance.</li>
<li>Batch ETL/ELT processes.</li>
<li>Providing a stable, versioned dataset for diverse query engines.</li>
</ul>
</li>
<li><strong>Strengths:</strong>
<ul>
<li><strong>Massive Ecosystem:</strong> Supported by Spark, Flink, Trino, Presto, Dremio, Snowflake, major cloud vendors, and many more.</li>
<li><strong>Robustness &amp; Scalability:</strong> Handles petabyte-scale tables reliably.</li>
<li><strong>Advanced Features:</strong> Time travel, schema evolution guarantees, hidden partitioning, ACID transactions.</li>
<li><strong>Mature and Stable:</strong> Widely deployed in production environments.</li>
</ul>
</li>
<li><strong>Drawbacks:</strong>
<ul>
<li>Streaming support is an addition, not its core design; update/CDC latencies are typically higher (minutes to hours) compared to Paimon.</li>
<li>Less optimized for high-frequency concurrent writes compared to stream-native formats.</li>
</ul>
</li>
<li><strong>Origin:</strong> Created by Netflix, now a top-level Apache Software Foundation project.</li>
</ul>
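<p>Time travel follows from keeping every committed table state as an immutable snapshot. The sketch below is a hypothetical, heavily simplified model (the <code>SnapshotTable</code> and <code>Snapshot</code> names are illustrative, and this is not Iceberg&rsquo;s metadata format or API), but it shows the basic mechanism: commits add snapshots, and readers can pick an older one by id.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical, simplified model of snapshot-based tables; not Iceberg's actual metadata or API.
data class Snapshot(val id: Long, val rows: List&lt;String&gt;)

class SnapshotTable {
    private val snapshots = mutableListOf&lt;Snapshot&gt;()

    // Each commit creates a new immutable snapshot; earlier snapshots are left untouched.
    fun commitAppend(newRows: List&lt;String&gt;): Long {
        val current = snapshots.lastOrNull()?.rows ?: emptyList()
        val snapshot = Snapshot(id = snapshots.size + 1L, rows = current + newRows)
        snapshots.add(snapshot)
        return snapshot.id
    }

    // Reads the latest snapshot by default, or an older one when a snapshot id is given.
    fun read(asOfSnapshotId: Long? = null): List&lt;String&gt; {
        if (asOfSnapshotId == null) return snapshots.lastOrNull()?.rows ?: emptyList()
        val snapshot = snapshots.firstOrNull { it.id == asOfSnapshotId }
            ?: error("Unknown snapshot id $asOfSnapshotId")
        return snapshot.rows
    }
}
</code></pre></div>
<p>After two commits, <code>read()</code> returns the combined rows, while <code>read(firstSnapshotId)</code> still returns the table as it was after the first commit.</p>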

<h3 id="putting-it-all-together-the-streamhouse-architecture" data-numberify>Putting It All Together: The Streamhouse Architecture<a class="anchor ms-1" href="#putting-it-all-together-the-streamhouse-architecture"></a></h3>
<p>These three technologies aren&rsquo;t competitors; they are complementary components of a powerful, tiered Streamhouse architecture:</p>
<ol>
<li><strong>Ingestion &amp; Real-time Access (Hot Layer):</strong> <strong>Fluss</strong> acts as the high-speed ingestion point. Data arriving here is immediately available for millisecond-latency stream processing or real-time dashboards.</li>
<li><strong>Transactional &amp; Unified Access (Warm Layer):</strong> Data flows from Fluss (or directly ingested) into <strong>Paimon</strong>. Paimon provides durable, ACID-compliant tables supporting both real-time updates/CDC and batch/SQL queries with seconds-to-minutes freshness. This is the operational core for mixed workloads.</li>
<li><strong>Archival &amp; Batch Analytics (Cold Layer):</strong> For long-term retention and cost-effective, large-scale batch analysis, data snapshots from Paimon (or potentially older Fluss data) can be periodically moved or archived into <strong>Iceberg</strong>. Iceberg&rsquo;s broad ecosystem support makes it ideal for BI tools and large analytical jobs that don&rsquo;t require sub-minute freshness.</li>
</ol>
<p><strong>Benefits of Integration:</strong></p>
<ul>
<li><strong>Low Latency:</strong> Fluss ensures data is actionable instantly.</li>
<li><strong>Transactional Integrity:</strong> Paimon provides reliable, mutable tables for complex workloads.</li>
<li><strong>Scalable Analytics:</strong> Iceberg offers cost-effective, deep historical analysis with wide tooling support.</li>
<li><strong>Flexibility:</strong> Choose the right storage for the right job based on latency, query patterns, and cost.</li>
</ul>
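<p>The &ldquo;right storage for the right job&rdquo; idea can be expressed as a simple routing rule keyed on how stale the data is allowed to be. The sketch below is only an illustration: the <code>chooseTier</code> function and its thresholds (one second and one hour) are assumptions chosen to mirror the latency ranges described above, not figures published by any of the projects.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-kotlin" data-lang="kotlin">// Hypothetical routing rule; the thresholds are illustrative assumptions.
enum class Tier { FLUSS, PAIMON, ICEBERG }

fun chooseTier(maxStalenessMs: Long): Tier = when {
    // Sub-second freshness: hot layer.
    maxStalenessMs &lt; 1_000L -&gt; Tier.FLUSS
    // Seconds-to-minutes freshness: warm layer.
    maxStalenessMs &lt; 3_600_000L -&gt; Tier.PAIMON
    // An hour or more: cold layer.
    else -&gt; Tier.ICEBERG
}
</code></pre></div>
<p>Under these assumptions, a fraud check that tolerates 50 ms of staleness reads from Fluss, a dashboard that tolerates 30 seconds reads from Paimon, and a daily report reads from Iceberg.</p>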

<h3 id="integration-with-flink-and-spark" data-numberify>Integration with Flink and Spark<a class="anchor ms-1" href="#integration-with-flink-and-spark"></a></h3>
<ul>
<li><strong>Flink:</strong> Apache Flink shines in this architecture.
<ul>
<li><strong>Fluss &amp; Paimon:</strong> Have native, first-class support in Flink. Flink SQL can treat both Fluss streams and Paimon tables as regular tables, enabling powerful unified queries across real-time and operational data.</li>
<li><strong>Iceberg:</strong> Has good Flink connector support for both batch and streaming reads/writes.</li>
<li><strong>Catalog Integration:</strong> Flink&rsquo;s Catalog API works natively with Paimon (paimon catalog) and Iceberg (iceberg catalog), allowing unified metadata management. Fluss doesn&rsquo;t use a catalog; its schema is defined in the job.</li>
</ul>
</li>
<li><strong>Spark:</strong>
<ul>
<li><strong>Iceberg:</strong> Excellent, mature Spark support is a key strength of Iceberg.</li>
<li><strong>Paimon:</strong> Also provides Spark connectors for reading and writing Paimon tables, aiming for broad engine compatibility.</li>
<li><strong>Fluss:</strong> Less direct integration currently; data would typically flow <em>through</em> Flink <em>to</em> Spark or be read from Paimon/Iceberg by Spark.</li>
</ul>
</li>
</ul>

<h3 id="real-time-user-activity-tracking-for-an-e-commerce-platform-scenario" data-numberify>Real-time User Activity Tracking for an E-commerce Platform Scenario<a class="anchor ms-1" href="#real-time-user-activity-tracking-for-an-e-commerce-platform-scenario"></a></h3>
<p>Imagine an e-commerce website that wants to track user actions (such as clicks, page views, and adding items to the cart) in real time. It has several goals:</p>
<ol>
<li><strong>Immediate Reaction:</strong> Trigger actions <em>instantly</em> based on certain events (e.g., fraud detection on rapid clicks, real-time offer personalization).</li>
<li><strong>Operational Dashboard:</strong> Maintain an up-to-date view of user activity for the last few days, allowing analysts to query recent trends with reasonable latency (seconds to minutes). This data might need updates later (e.g., correcting event types).</li>
<li><strong>Long-term Analytics:</strong> Store all historical event data cost-effectively for large-scale batch analysis, BI reporting (e.g., monthly user engagement reports), and machine learning model training.</li>
</ol>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-05-06-streamhouse-trio/streamhouse.png" loading="lazy" width="1155" height="576" />
</picture>

</p>

<h3 id="conclusion" data-numberify>Conclusion<a class="anchor ms-1" href="#conclusion"></a></h3>
<p>The Streamhouse concept, powered by technologies like Paimon, Fluss, and Iceberg, represents a significant step towards truly unified data architectures. By leveraging the specific strengths of each component - Fluss for speed, Paimon for unified transactional tables, and Iceberg for scalable batch analytics and archival - organizations can build platforms that are performant, flexible, and ready for both real-time demands and deep historical insights. This tiered approach, already common in large tech companies, is now becoming accessible to a wider audience through these powerful open-source projects.</p>
      ]]></content:encoded></item><item><title>Run Flink SQL Cookbook in Docker</title><link>https://jaehyeon.me/blog/2025-04-15-sql-cookbook/</link><guid>https://jaehyeon.me/blog/2025-04-15-sql-cookbook/</guid><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>The <a href="https://github.com/ververica/flink-sql-cookbook" target="_blank" rel="noopener noreferrer">Flink SQL Cookbook<i class="fas fa-external-link-square-alt ms-1"></i></a> by Ververica is a hands-on, example-rich guide to mastering <a href="https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/overview/" target="_blank" rel="noopener noreferrer">Apache Flink SQL<i class="fas fa-external-link-square-alt ms-1"></i></a> for real-time stream processing. It offers a wide range of self-contained recipes, from basic queries and table operations to more advanced use cases like windowed aggregations, complex joins, user-defined functions (UDFs), and pattern detection. These examples are designed to be run on the Ververica Platform, and as such, the cookbook doesn&rsquo;t include instructions for setting up a Flink cluster.</p>
<p>To help you run these recipes locally and explore Flink SQL without external dependencies, this post walks through setting up a fully functional local Flink cluster using Docker Compose. With this setup, you can experiment with the cookbook examples right on your machine.</p>
      ]]></description><content:encoded><![CDATA[
        <p>The <a href="https://github.com/ververica/flink-sql-cookbook" target="_blank" rel="noopener noreferrer">Flink SQL Cookbook<i class="fas fa-external-link-square-alt ms-1"></i></a> by Ververica is a hands-on, example-rich guide to mastering <a href="https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/overview/" target="_blank" rel="noopener noreferrer">Apache Flink SQL<i class="fas fa-external-link-square-alt ms-1"></i></a> for real-time stream processing. It offers a wide range of self-contained recipes, from basic queries and table operations to more advanced use cases like windowed aggregations, complex joins, user-defined functions (UDFs), and pattern detection. These examples are designed to be run on the Ververica Platform, and as such, the cookbook doesn&rsquo;t include instructions for setting up a Flink cluster.</p>
<p>To help you run these recipes locally and explore Flink SQL without external dependencies, this post walks through setting up a fully functional local Flink cluster using Docker Compose. With this setup, you can experiment with the cookbook examples right on your machine.</p>
<h2 id="flink-cluster-on-docker" data-numberify>Flink Cluster on Docker<a class="anchor ms-1" href="#flink-cluster-on-docker"></a></h2>
<p>The cookbook generates sample data using the <a href="https://github.com/knaufk/flink-faker" target="_blank" rel="noopener noreferrer">Flink SQL Faker Connector<i class="fas fa-external-link-square-alt ms-1"></i></a>, which allows for realistic, randomized record generation. To streamline the setup, we use a custom Docker image where the connector&rsquo;s JAR file is downloaded into the <code>/opt/flink/lib/</code> directory. This approach eliminates the need to manually register the connector each time we launch the <a href="https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sqlclient/" target="_blank" rel="noopener noreferrer">Flink SQL client<i class="fas fa-external-link-square-alt ms-1"></i></a>, making it easier to jump straight into experimenting with the cookbook&rsquo;s examples. The source for this post is available in this <a href="https://github.com/jaehyeon-kim/flink-demos/tree/master/flink-sql-cookbook" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-Dockerfile" data-lang="Dockerfile"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">FROM</span><span class="s"> flink:1.20.1</span><span class="err">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="err">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="err"></span><span class="c"># add faker connector</span><span class="err">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="err"></span><span class="k">RUN</span> wget -P /opt/flink/lib/ <span class="se">\
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>  https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar<span class="err">
</span></span></span></code></pre></div><p>We deploy a local Apache Flink cluster using Docker Compose. The Compose file defines one <code>JobManager</code> and three <code>TaskManagers</code>, all using the custom image. The <code>JobManager</code> handles coordination and exposes the Flink web UI on port 8081, while each <code>TaskManager</code> provides 10 task slots, giving 30 slots in total for parallel processing. All components share a custom network and use a filesystem-based state backend, with checkpoint and savepoint directories configured for local testing. A health check ensures the <code>JobManager</code> is ready before the <code>TaskManagers</code> start.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;3&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">jobmanager</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">flink-sql-cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l">jobmanager</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">jobmanager</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;8081:8081&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="l">cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="sd">        FLINK_PROPERTIES=
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="sd">        jobmanager.rpc.address: jobmanager
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="sd">        state.backend: filesystem
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="sd">        state.checkpoints.dir: file:///tmp/flink-checkpoints
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="sd">        state.savepoints.dir: file:///tmp/flink-savepoints
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="sd">        heartbeat.interval: 1000
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="sd">        heartbeat.timeout: 5000
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="sd">        rest.flamegraph.enabled: true
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="sd">        web.backpressure.refresh-interval: 10000</span><span class="w">        
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">    </span><span class="nt">healthcheck</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">      </span><span class="nt">test</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;CMD&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;curl&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;-f&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;http://localhost:8081/config&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">      </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">      </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w">
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="w">      </span><span class="nt">retries</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="w">  </span><span class="nt">taskmanager-1</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">flink-sql-cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager</span><span class="w">
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager-1</span><span class="w">
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="w">      </span>- <span class="l">cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="w">      </span><span class="nt">jobmanager</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l">service_healthy</span><span class="w">
</span></span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="w">      </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="sd">        FLINK_PROPERTIES=
</span></span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="sd">        jobmanager.rpc.address: jobmanager
</span></span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="sd">        taskmanager.numberOfTaskSlots: 10
</span></span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="sd">        state.backend: filesystem
</span></span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="sd">        state.checkpoints.dir: file:///tmp/flink-checkpoints
</span></span></span><span class="line"><span class="ln">47</span><span class="cl"><span class="sd">        state.savepoints.dir: file:///tmp/flink-savepoints
</span></span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="sd">        heartbeat.interval: 1000
</span></span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="sd">        heartbeat.timeout: 5000</span><span class="w">        
</span></span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="w">  </span><span class="nt">taskmanager-2</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">flink-sql-cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln">54</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager</span><span class="w">
</span></span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager-2</span><span class="w">
</span></span></span><span class="line"><span class="ln">56</span><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="w">      </span>- <span class="l">cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">58</span><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">59</span><span class="cl"><span class="w">      </span><span class="nt">jobmanager</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">60</span><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l">service_healthy</span><span class="w">
</span></span></span><span class="line"><span class="ln">61</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">62</span><span class="cl"><span class="w">      </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">63</span><span class="cl"><span class="sd">        FLINK_PROPERTIES=
</span></span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="sd">        jobmanager.rpc.address: jobmanager
</span></span></span><span class="line"><span class="ln">65</span><span class="cl"><span class="sd">        taskmanager.numberOfTaskSlots: 10
</span></span></span><span class="line"><span class="ln">66</span><span class="cl"><span class="sd">        state.backend: filesystem
</span></span></span><span class="line"><span class="ln">67</span><span class="cl"><span class="sd">        state.checkpoints.dir: file:///tmp/flink-checkpoints
</span></span></span><span class="line"><span class="ln">68</span><span class="cl"><span class="sd">        state.savepoints.dir: file:///tmp/flink-savepoints
</span></span></span><span class="line"><span class="ln">69</span><span class="cl"><span class="sd">        heartbeat.interval: 1000
</span></span></span><span class="line"><span class="ln">70</span><span class="cl"><span class="sd">        heartbeat.timeout: 5000</span><span class="w">        
</span></span></span><span class="line"><span class="ln">71</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">72</span><span class="cl"><span class="w">  </span><span class="nt">taskmanager-3</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">73</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">flink-sql-cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">74</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln">75</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager</span><span class="w">
</span></span></span><span class="line"><span class="ln">76</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">taskmanager-3</span><span class="w">
</span></span></span><span class="line"><span class="ln">77</span><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">78</span><span class="cl"><span class="w">      </span>- <span class="l">cookbook</span><span class="w">
</span></span></span><span class="line"><span class="ln">79</span><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">80</span><span class="cl"><span class="w">      </span><span class="nt">jobmanager</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">81</span><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l">service_healthy</span><span class="w">
</span></span></span><span class="line"><span class="ln">82</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">83</span><span class="cl"><span class="w">      </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">84</span><span class="cl"><span class="sd">        FLINK_PROPERTIES=
</span></span></span><span class="line"><span class="ln">85</span><span class="cl"><span class="sd">        jobmanager.rpc.address: jobmanager
</span></span></span><span class="line"><span class="ln">86</span><span class="cl"><span class="sd">        taskmanager.numberOfTaskSlots: 10
</span></span></span><span class="line"><span class="ln">87</span><span class="cl"><span class="sd">        state.backend: filesystem
</span></span></span><span class="line"><span class="ln">88</span><span class="cl"><span class="sd">        state.checkpoints.dir: file:///tmp/flink-checkpoints
</span></span></span><span class="line"><span class="ln">89</span><span class="cl"><span class="sd">        state.savepoints.dir: file:///tmp/flink-savepoints
</span></span></span><span class="line"><span class="ln">90</span><span class="cl"><span class="sd">        heartbeat.interval: 1000
</span></span></span><span class="line"><span class="ln">91</span><span class="cl"><span class="sd">        heartbeat.timeout: 5000</span><span class="w">        
</span></span></span><span class="line"><span class="ln">92</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">93</span><span class="cl"><span class="w"></span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">94</span><span class="cl"><span class="w">  </span><span class="nt">cookbook</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">95</span><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">flink-sql-cookbook</span><span class="w">
</span></span></span></code></pre></div><p>The Flink cluster can be deployed as follows.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># start containers</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">$ docker compose up -d
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># list containers</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">$ docker-compose ps
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># NAME                COMMAND                  SERVICE             STATUS              PORTS</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># jobmanager          &#34;/docker-entrypoint.…&#34;   jobmanager          running (healthy)   6123/tcp, 0.0.0.0:8081-&gt;8081/tcp, :::8081-&gt;8081/tcp</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># taskmanager-1       &#34;/docker-entrypoint.…&#34;   taskmanager-1       running             6123/tcp, 8081/tcp</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># taskmanager-2       &#34;/docker-entrypoint.…&#34;   taskmanager-2       running             6123/tcp, 8081/tcp</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># taskmanager-3       &#34;/docker-entrypoint.…&#34;   taskmanager-3       running             6123/tcp, 8081/tcp</span>
</span></span></code></pre></div>
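<p>Beyond listing containers, it can be useful to confirm that all three <code>TaskManagers</code> registered with the <code>JobManager</code>. The following is a minimal sketch, assuming the cluster from the Compose file above is reachable at <code>http://localhost:8081</code> and that the Flink REST endpoint <code>/overview</code> reports the <code>taskmanagers</code> and <code>slots-total</code> fields. The function names are hypothetical helpers introduced for illustration and are not part of the cookbook or of Flink.</p>

```python
import json
from urllib.request import urlopen

# Assumed values, taken from the Compose file above: 3 TaskManagers with 10 slots each.
EXPECTED_TASK_MANAGERS = 3
SLOTS_PER_TASK_MANAGER = 10


def expected_total_slots(task_managers: int, slots_per_tm: int) -> int:
    """Total number of task slots the cluster should advertise."""
    return task_managers * slots_per_tm


def check_overview(overview: dict) -> list[str]:
    """Compare an /overview payload with the expected capacity and return any mismatches."""
    problems = []
    expected_slots = expected_total_slots(EXPECTED_TASK_MANAGERS, SLOTS_PER_TASK_MANAGER)
    if overview.get("taskmanagers") != EXPECTED_TASK_MANAGERS:
        problems.append(
            f"expected {EXPECTED_TASK_MANAGERS} TaskManagers, got {overview.get('taskmanagers')}"
        )
    if overview.get("slots-total") != expected_slots:
        problems.append(
            f"expected {expected_slots} slots, got {overview.get('slots-total')}"
        )
    return problems


def fetch_overview(base_url: str = "http://localhost:8081") -> dict:
    """Fetch the cluster overview from the JobManager REST endpoint (assumed URL)."""
    with urlopen(f"{base_url}/overview", timeout=5) as response:
        return json.load(response)
```

<p>With the cluster running, <code>check_overview(fetch_overview())</code> should return an empty list once every <code>TaskManager</code> has registered. A non-empty list suggests that a container is still starting or has failed to connect.</p>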
<h2 id="flink-sql-client" data-numberify>Flink SQL Client<a class="anchor ms-1" href="#flink-sql-client"></a></h2>
<p>We can start the SQL client from the <code>JobManager</code> container as shown below.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">$ docker <span class="nb">exec</span> -it jobmanager /opt/flink/bin/sql-client.sh
</span></span></code></pre></div><p>In the SQL shell, we can execute Flink SQL statements.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- // create a temporary table
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TEMPORARY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">heros</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="o">`</span><span class="n">name</span><span class="o">`</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="o">`</span><span class="n">power</span><span class="o">`</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span><span class="o">`</span><span class="n">age</span><span class="o">`</span><span class="w"> </span><span class="nb">INT</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">  </span><span class="s1">&#39;connector&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;faker&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.name.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{superhero.name}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.power.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{superhero.power}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.power.null-rate&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;0.05&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.age.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{number.numberBetween &#39;&#39;0&#39;&#39;,&#39;&#39;1000&#39;&#39;}&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w"></span><span class="c1">-- [INFO] Execute statement succeeded.
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w"></span><span class="c1">-- list tables
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">TABLES</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w"></span><span class="c1">-- +------------+
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1">-- | table name |
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="c1">-- +------------+
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1">-- |      heros |
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1">-- +------------+
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="c1">-- 1 row in set
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w"></span><span class="c1">-- query records from the heros table
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1">-- hit &#39;q&#39; to exit the record view
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">heros</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="w"></span><span class="c1">-- quit sql shell
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="c1"></span><span class="n">quit</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-04-15-sql-cookbook/featured.gif" loading="lazy" width="1811" height="826" />
</picture>

</p>
<p>The Flink job associated with the <code>SELECT</code> query can be found on the Flink Web UI at <code>http://localhost:8081</code>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-04-15-sql-cookbook/web-ui.png" loading="lazy" width="1179" height="492" />
</picture>

</p>
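<p>The same job can also be located without opening the web UI. The sketch below assumes the Flink REST endpoint <code>/jobs/overview</code> is reachable at <code>http://localhost:8081</code> and that each job entry carries the <code>jid</code>, <code>name</code> and <code>state</code> fields. The helper names are hypothetical, and the job name you see for a <code>SELECT</code> query depends on the SQL client.</p>

```python
import json
from urllib.request import urlopen


def running_jobs(jobs_overview: dict) -> list[tuple[str, str]]:
    """Return (job id, job name) pairs for jobs whose state is RUNNING."""
    return [
        (job["jid"], job["name"])
        for job in jobs_overview.get("jobs", [])
        if job.get("state") == "RUNNING"
    ]


def fetch_jobs_overview(base_url: str = "http://localhost:8081") -> dict:
    """Fetch the job list from the JobManager REST endpoint (assumed URL)."""
    with urlopen(f"{base_url}/jobs/overview", timeout=5) as response:
        return json.load(response)
```

<p>While the <code>SELECT</code> query is still open in the SQL client, <code>running_jobs(fetch_jobs_overview())</code> should include its job. After leaving the record view, the job should no longer appear in the list of running jobs.</p>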

<h2 id="caveat" data-numberify>Caveat<a class="anchor ms-1" href="#caveat"></a></h2>
<p>Some examples in the cookbook rely on an older version of the Faker connector, so certain directives used in their queries are no longer supported in the latest version and cause runtime errors. For instance, the following query fails because the <code>#{Internet.userAgentAny}</code> directive has been removed. To resolve this, either remove the <code>user_agent</code> field from the query or replace the outdated directive with a supported one, for example by using <code>regexify</code> to generate similar values.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">server_logs</span><span class="w"> </span><span class="p">(</span><span class="w"> 
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">    </span><span class="n">client_ip</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="n">client_identity</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="n">userid</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="n">user_agent</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="n">log_time</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">(</span><span class="mi">3</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="n">request_line</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="n">status_code</span><span class="w"> </span><span class="n">STRING</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="k">size</span><span class="w"> </span><span class="nb">INT</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">  </span><span class="s1">&#39;connector&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;faker&#39;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.client_ip.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{Internet.publicIpV4Address}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.client_identity.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w">  </span><span class="s1">&#39;-&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.userid.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w">  </span><span class="s1">&#39;-&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.user_agent.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{Internet.userAgentAny}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.log_time.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w">  </span><span class="s1">&#39;#{date.past &#39;&#39;15&#39;&#39;,&#39;&#39;5&#39;&#39;,&#39;&#39;SECONDS&#39;&#39;}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.request_line.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{regexify &#39;&#39;(GET|POST|PUT|PATCH){1}&#39;&#39;} #{regexify &#39;&#39;(/search\.html|/login\.html|/prod\.html|cart\.html|/order\.html){1}&#39;&#39;} #{regexify &#39;&#39;(HTTP/1\.1|HTTP/2|/HTTP/1\.0){1}&#39;&#39;}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.status_code.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{regexify &#39;&#39;(200|201|204|400|401|403|301){1}&#39;&#39;}&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">  </span><span class="s1">&#39;fields.size.expression&#39;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;#{number.numberBetween &#39;&#39;100&#39;&#39;,&#39;&#39;10000000&#39;&#39;}&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-04-15-sql-cookbook/sql-error.gif" loading="lazy" width="1811" height="826" />
</picture>

</p>
      ]]></content:encoded></item><item><title>Realtime Dashboard with FastAPI, Streamlit and Next.js - Part 3 Next.js Dashboard</title><link>https://jaehyeon.me/blog/2025-03-04-realtime-dashboard-3/</link><guid>https://jaehyeon.me/blog/2025-03-04-realtime-dashboard-3/</guid><pubDate>Tue, 04 Mar 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In this post, we build a real-time monitoring dashboard using <a href="https://nextjs.org/" target="_blank" rel="noopener noreferrer">Next.js<i class="fas fa-external-link-square-alt ms-1"></i></a>, a React framework that supports server-side rendering, static site generation, and full-stack capabilities with built-in performance optimizations. Similar to the <em>Streamlit</em> app we developed in <a href="/blog/2025-02-25-realtime-dashboard-2">Part 2</a>, this dashboard connects to the WebSocket server from <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a> to continuously fetch and visualize key metrics such as <strong>order counts</strong>, <strong>sales data</strong>, and <strong>revenue by traffic source and country</strong>. With interactive bar charts and dynamic metrics, users can monitor sales trends and other critical business KPIs in real time.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In this post, we build a real-time monitoring dashboard using <a href="https://nextjs.org/" target="_blank" rel="noopener noreferrer">Next.js<i class="fas fa-external-link-square-alt ms-1"></i></a>, a React framework that supports server-side rendering, static site generation, and full-stack capabilities with built-in performance optimizations. Similar to the <em>Streamlit</em> app we developed in <a href="/blog/2025-02-25-realtime-dashboard-2">Part 2</a>, this dashboard connects to the WebSocket server from <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a> to continuously fetch and visualize key metrics such as <strong>order counts</strong>, <strong>sales data</strong>, and <strong>revenue by traffic source and country</strong>. With interactive bar charts and dynamic metrics, users can monitor sales trends and other critical business KPIs in real time.</p>
<ul>
<li><a href="/blog/2025-02-18-realtime-dashboard-1">Part 1 Data Producer</a></li>
<li><a href="/blog/2025-02-25-realtime-dashboard-2">Part 2 Streamlit Dashboard</a></li>
<li><a href="/blog/2025-03-04-realtime-dashboard-3/#">Part 3 Next.js Dashboard</a> (this post)</li>
</ul>

<h2 id="nextjs-frontend" data-numberify>Next.js Frontend<a class="anchor ms-1" href="#nextjs-frontend"></a></h2>
<p>The Next.js dashboard processes and displays real-time <em>theLook eCommerce data</em>. It connects to the WebSocket server using the <a href="https://github.com/robtaussig/react-use-websocket" target="_blank" rel="noopener noreferrer"><em>React useWebSocket</em><i class="fas fa-external-link-square-alt ms-1"></i></a> package, while the UI is styled with <a href="https://www.heroui.com/" target="_blank" rel="noopener noreferrer">HeroUI (formerly NextUI)<i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://tailwindcss.com/" target="_blank" rel="noopener noreferrer">Tailwind CSS<i class="fas fa-external-link-square-alt ms-1"></i></a>. Visualizations are powered by <a href="https://github.com/hustcc/echarts-for-react" target="_blank" rel="noopener noreferrer">Apache ECharts<i class="fas fa-external-link-square-alt ms-1"></i></a>. The source code for this post is available in this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/product-demos" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h3 id="metric-component" data-numberify>Metric Component<a class="anchor ms-1" href="#metric-component"></a></h3>
<p>We use a React component called <code>Metric</code> that displays a metric card with the following props:</p>
<ul>
<li><code>label</code>: The title or name of the metric.</li>
<li><code>value</code>: The value of the metric (could represent a number or currency).</li>
<li><code>delta</code>: The change in the metric value (used to indicate increase or decrease).</li>
<li><code>is_currency</code>: A boolean flag to indicate whether the value should be formatted as a currency.</li>
</ul>
<p>The card&rsquo;s visual layout includes the label at the top, the formatted value in large text, and the delta change with an arrow beneath it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-jsx" data-lang="jsx"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// nextjs/src/components/metric.tsx
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="s2">&#34;use client&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kr">import</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="nx">Card</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="nx">CardHeader</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  <span class="nx">CardBody</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  <span class="nx">Divider</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="nx">CardFooter</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;@nextui-org/react&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="kr">export</span> <span class="kr">interface</span> <span class="nx">MetricProps</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">  <span class="nx">label</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="nx">value</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="nx">delta</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">  <span class="nx">is_currency</span><span class="o">:</span> <span class="kr">boolean</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="kr">export</span> <span class="k">default</span> <span class="kd">function</span> <span class="nx">Metric</span><span class="p">({</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">  <span class="nx">label</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">  <span class="nx">value</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">  <span class="nx">delta</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">  <span class="nx">is_currency</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="p">}</span><span class="o">:</span> <span class="nx">MetricProps</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">  <span class="kr">const</span> <span class="nx">formatted_value</span> <span class="o">=</span> <span class="nx">is_currency</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="o">?</span> <span class="s2">&#34;$ &#34;</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="o">:</span> <span class="nx">value</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">();</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">  <span class="kr">const</span> <span class="nx">arrowColor</span> <span class="o">=</span> <span class="nx">delta</span> <span class="o">===</span> <span class="mi">0</span> <span class="o">?</span> <span class="s2">&#34;black&#34;</span> <span class="o">:</span> <span class="nx">delta</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="s2">&#34;green&#34;</span> <span class="o">:</span> <span class="s2">&#34;red&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">  <span class="k">return</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">    <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;col-span-12 md:col-span-4&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">      <span class="p">&lt;</span><span class="nt">Card</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">        <span class="p">&lt;</span><span class="nt">CardHeader</span><span class="p">&gt;{</span><span class="nx">label</span><span class="p">}&lt;/</span><span class="nt">CardHeader</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">        <span class="p">&lt;</span><span class="nt">CardBody</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">          <span class="p">&lt;</span><span class="nt">h1</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;text-4xl font-bold&#34;</span><span class="p">&gt;{</span><span class="nx">formatted_value</span><span class="p">}&lt;/</span><span class="nt">h1</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">        <span class="p">&lt;/</span><span class="nt">CardBody</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">        <span class="p">&lt;</span><span class="nt">Divider</span> <span class="p">/&gt;</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">        <span class="p">&lt;</span><span class="nt">CardFooter</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">          <span class="p">&lt;</span><span class="nt">svg</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">            <span class="na">height</span><span class="o">=</span><span class="p">{</span><span class="mi">25</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">            <span class="na">viewBox</span><span class="o">=</span><span class="s">&#34;0 0 24 24&#34;</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">            <span class="na">aria</span><span class="err">-</span><span class="na">hidden</span><span class="o">=</span><span class="s">&#34;true&#34;</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">            <span class="na">focusable</span><span class="o">=</span><span class="s">&#34;false&#34;</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">            <span class="na">fill</span><span class="o">=</span><span class="p">{</span><span class="nx">arrowColor</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">            <span class="na">xmlns</span><span class="o">=</span><span class="s">&#34;http://www.w3.org/2000/svg&#34;</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">            <span class="na">color</span><span class="o">=</span><span class="s">&#34;inherit&#34;</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">            <span class="na">data</span><span class="err">-</span><span class="na">testid</span><span class="o">=</span><span class="s">&#34;stMetricDeltaIcon-Up&#34;</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">            <span class="na">className</span><span class="o">=</span><span class="s">&#34;e14lo1l1 st-emotion-cache-1ksdj5j ex0cdmw0&#34;</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">          <span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">            <span class="p">&lt;</span><span class="nt">path</span> <span class="na">fill</span><span class="o">=</span><span class="s">&#34;none&#34;</span> <span class="na">d</span><span class="o">=</span><span class="s">&#34;M0 0h24v24H0V0z&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">path</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">            <span class="p">&lt;</span><span class="nt">path</span> <span class="na">d</span><span class="o">=</span><span class="s">&#34;M4 12l1.41 1.41L11 7.83V20h2V7.83l5.58 5.59L20 12l-8-8-8 8z&#34;</span><span class="p">&gt;&lt;/</span><span class="nt">path</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">          <span class="p">&lt;/</span><span class="nt">svg</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">          <span class="p">&lt;</span><span class="nt">h1</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;text-xl&#34;</span><span class="p">&gt;{</span><span class="nx">delta</span><span class="p">.</span><span class="nx">toLocaleString</span><span class="p">()}&lt;/</span><span class="nt">h1</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">        <span class="p">&lt;/</span><span class="nt">CardFooter</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">      <span class="p">&lt;/</span><span class="nt">Card</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">    <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">  <span class="p">);</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
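<p>The formatting and color logic of the card can also be pulled out into small pure functions, which makes it easier to test. The sketch below is only an illustration: the helper names (<code>formatMetricValue</code>, <code>deltaDirection</code>, <code>deltaColor</code>) and the fixed <code>en-US</code> locale are assumptions and are not part of the original component. Passing an explicit locale may help keep the server-rendered and client-rendered text identical, and returning a direction would let a caller flip the arrow for negative deltas, since the component above draws the same upward path for every delta and only changes its color.</p>

```typescript
// Hypothetical helpers; the names are illustrative and not part of the original component.
type DeltaDirection = "up" | "down" | "flat";

// Format a metric value with an explicit locale (assumed "en-US" here)
// so the output does not depend on the runtime's default locale.
function formatMetricValue(value: number, isCurrency: boolean, locale = "en-US"): string {
  const formatted = value.toLocaleString(locale);
  return isCurrency ? `$ ${formatted}` : formatted;
}

// Classify the delta using strict comparisons.
function deltaDirection(delta: number): DeltaDirection {
  if (delta === 0) return "flat";
  return delta > 0 ? "up" : "down";
}

// Map the direction to the same fill colors the Metric card uses.
function deltaColor(delta: number): string {
  const colors = { up: "green", down: "red", flat: "black" };
  return colors[deltaDirection(delta)];
}
```

<p>For example, <code>formatMetricValue(1234567, true)</code> yields <code>$ 1,234,567</code>, and <code>deltaColor(-3)</code> yields <code>red</code>.</p>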
<h3 id="data-processing-utility" data-numberify>Data Processing Utility<a class="anchor ms-1" href="#data-processing-utility"></a></h3>
<p>Since I have yet to find an effective data manipulation library comparable to Python&rsquo;s Pandas, data processing is handled using custom objects and functions. The code primarily operates on arrays of <code>Record</code>s to compute sales metrics and generate visual representations. The <code>getMetrics</code> and <code>createMetricItems</code> functions calculate the current and delta metrics and construct an array of <code>MetricProps</code> objects that can be passed to the <em>Metric</em> component. The <code>createOptionsItems</code> function is responsible for generating data visualizations, specifically bar charts that show revenue by categories such as country and traffic source.</p>
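<p>Without a Pandas-style <em>group by</em>, the aggregation behind a revenue-by-category chart comes down to a loop that accumulates totals per key. The sketch below is a minimal illustration under stated assumptions: <code>SaleRecord</code> is a trimmed-down stand-in for the <code>Record</code> interface, and <code>revenueBy</code> is a hypothetical helper, not the post&rsquo;s <code>createOptionsItems</code> function.</p>

```typescript
// Hypothetical, trimmed-down record; the Record interface in the post has more fields.
interface SaleRecord {
  country: string;
  traffic_source: string;
  sale_price: number;
}

type GroupKey = "country" | "traffic_source";

// Sum sale_price per distinct value of the chosen key,
// round each total, and sort by revenue in descending order.
function revenueBy(records: SaleRecord[], key: GroupKey): { name: string; revenue: number }[] {
  const totals: { [name: string]: number } = {};
  for (const r of records) {
    totals[r[key]] = (totals[r[key]] ?? 0) + Number(r.sale_price);
  }
  return Object.keys(totals)
    .map((name) => ({ name, revenue: Math.round(totals[name]) }))
    .sort((a, b) => b.revenue - a.revenue);
}

// Made-up sample data for illustration only.
const sample: SaleRecord[] = [
  { country: "Australia", traffic_source: "Search", sale_price: 10.5 },
  { country: "United States", traffic_source: "Organic", sale_price: 5.25 },
  { country: "Australia", traffic_source: "Organic", sale_price: 20.25 },
];

const byCountry = revenueBy(sample, "country");
const bySource = revenueBy(sample, "traffic_source");
```

<p>The resulting <code>name</code> and <code>revenue</code> pairs could then be fed to the category axis and the series data of a bar chart.</p>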
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-jsx" data-lang="jsx"><span class="line"><span class="ln">  1</span><span class="cl"><span class="c1">// nextjs/src/lib/processing.tsx
</span></span></span><span class="line"><span class="ln">  2</span><span class="cl"><span class="c1"></span><span class="kr">import</span> <span class="p">{</span> <span class="nx">MetricProps</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;@/components/metric&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">  3</span><span class="cl">
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="kr">export</span> <span class="kr">interface</span> <span class="nx">Metrics</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl">  <span class="nx">num_orders</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl">  <span class="nx">num_order_items</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl">  <span class="nx">total_sales</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl">
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="kr">export</span> <span class="kr">interface</span> <span class="nx">Record</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl">  <span class="nx">user_id</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl">  <span class="nx">age</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl">  <span class="nx">gender</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl">  <span class="nx">country</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl">  <span class="nx">traffic_source</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">  <span class="nx">order_id</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">  <span class="nx">item_id</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">  <span class="nx">category</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">  <span class="nx">item_status</span><span class="o">:</span> <span class="nx">string</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">  <span class="nx">sale_price</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">  <span class="nx">created_at</span><span class="o">:</span> <span class="nx">number</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">
</span></span><span class="line"><span class="ln"> 24</span><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">defaultMetrics</span><span class="o">:</span> <span class="nx">Metrics</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl">  <span class="nx">num_orders</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">  <span class="nx">num_order_items</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">  <span class="nx">total_sales</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">
</span></span><span class="line"><span class="ln"> 30</span><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">defaultMetricItems</span><span class="o">:</span> <span class="nx">MetricProps</span><span class="p">[]</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">  <span class="p">{</span> <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Number of Orders&#34;</span><span class="p">,</span> <span class="nx">value</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">delta</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">false</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">  <span class="p">{</span> <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Number of Order Items&#34;</span><span class="p">,</span> <span class="nx">value</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">delta</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">false</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">  <span class="p">{</span> <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Total Sales&#34;</span><span class="p">,</span> <span class="nx">value</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">delta</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">true</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl"><span class="p">];</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">
</span></span><span class="line"><span class="ln"> 36</span><span class="cl"><span class="kr">export</span> <span class="kd">function</span> <span class="nx">getMetrics</span><span class="p">(</span><span class="nx">records</span><span class="o">:</span> <span class="nx">Record</span><span class="p">[])</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">  <span class="kr">const</span> <span class="nx">num_orders</span> <span class="o">=</span> <span class="p">[...</span><span class="k">new</span> <span class="nx">Set</span><span class="p">(</span><span class="nx">records</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">r</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">r</span><span class="p">.</span><span class="nx">order_id</span><span class="p">))].</span><span class="nx">length</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">  <span class="kr">const</span> <span class="nx">num_order_items</span> <span class="o">=</span> <span class="p">[...</span><span class="k">new</span> <span class="nx">Set</span><span class="p">(</span><span class="nx">records</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">r</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">r</span><span class="p">.</span><span class="nx">item_id</span><span class="p">))].</span><span class="nx">length</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">  <span class="kr">const</span> <span class="nx">total_sales</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">round</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">    <span class="nx">records</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">r</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nb">Number</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">sale_price</span><span class="p">)).</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">a</span> <span class="o">+</span> <span class="nx">b</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">  <span class="p">);</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">  <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">    <span class="nx">num_orders</span><span class="o">:</span> <span class="nx">num_orders</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">    <span class="nx">num_order_items</span><span class="o">:</span> <span class="nx">num_order_items</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">    <span class="nx">total_sales</span><span class="o">:</span> <span class="nx">total_sales</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">  <span class="p">};</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">
</span></span><span class="line"><span class="ln"> 49</span><span class="cl"><span class="kr">export</span> <span class="kd">function</span> <span class="nx">createMetricItems</span><span class="p">(</span><span class="nx">currMetrics</span><span class="o">:</span> <span class="nx">Metrics</span><span class="p">,</span> <span class="nx">prevMetrics</span><span class="o">:</span> <span class="nx">Metrics</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">  <span class="kr">const</span> <span class="nx">labels</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">    <span class="p">{</span> <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Number of Orders&#34;</span><span class="p">,</span> <span class="nx">metric</span><span class="o">:</span> <span class="s2">&#34;num_orders&#34;</span><span class="p">,</span> <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">false</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">      <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Number of Order Items&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">      <span class="nx">metric</span><span class="o">:</span> <span class="s2">&#34;num_order_items&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">      <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">    <span class="p">{</span> <span class="nx">label</span><span class="o">:</span> <span class="s2">&#34;Total Sales&#34;</span><span class="p">,</span> <span class="nx">metric</span><span class="o">:</span> <span class="s2">&#34;total_sales&#34;</span><span class="p">,</span> <span class="nx">is_currency</span><span class="o">:</span> <span class="kc">true</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">  <span class="p">];</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">  <span class="k">return</span> <span class="nx">labels</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">obj</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">    <span class="kr">const</span> <span class="nx">label</span> <span class="o">=</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">label</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">    <span class="kr">const</span> <span class="nx">value</span> <span class="o">=</span> <span class="nx">currMetrics</span><span class="p">[</span><span class="nx">obj</span><span class="p">.</span><span class="nx">metric</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Metrics</span><span class="p">];</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">    <span class="kr">const</span> <span class="nx">delta</span> <span class="o">=</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">      <span class="nx">currMetrics</span><span class="p">[</span><span class="nx">obj</span><span class="p">.</span><span class="nx">metric</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Metrics</span><span class="p">]</span> <span class="o">-</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">      <span class="nx">prevMetrics</span><span class="p">[</span><span class="nx">obj</span><span class="p">.</span><span class="nx">metric</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Metrics</span><span class="p">];</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">    <span class="kr">const</span> <span class="nx">is_currency</span> <span class="o">=</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">is_currency</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">      <span class="nx">label</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">      <span class="nx">value</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">      <span class="nx">delta</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">      <span class="nx">is_currency</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">    <span class="p">};</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">  <span class="p">});</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">
</span></span><span class="line"><span class="ln"> 75</span><span class="cl"><span class="kr">export</span> <span class="kd">function</span> <span class="nx">createOptionsItems</span><span class="p">(</span><span class="nx">records</span><span class="o">:</span> <span class="nx">Record</span><span class="p">[])</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">  <span class="kr">const</span> <span class="nx">chartCols</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">    <span class="p">{</span> <span class="nx">x</span><span class="o">:</span> <span class="s2">&#34;country&#34;</span><span class="p">,</span> <span class="nx">y</span><span class="o">:</span> <span class="s2">&#34;sale_price&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">    <span class="p">{</span> <span class="nx">x</span><span class="o">:</span> <span class="s2">&#34;traffic_source&#34;</span><span class="p">,</span> <span class="nx">y</span><span class="o">:</span> <span class="s2">&#34;sale_price&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">  <span class="p">];</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">  <span class="k">return</span> <span class="nx">chartCols</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">col</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">    <span class="c1">// The key is a string at runtime, but r[col.x as keyof Record] is typed &#39;string | number&#39;, so the Map key type is widened to avoid the following error.
</span></span></span><span class="line"><span class="ln"> 82</span><span class="cl"><span class="c1"></span>    <span class="c1">// Argument of type &#39;string | number&#39; is not assignable to parameter of type &#39;string&#39;.
</span></span></span><span class="line"><span class="ln"> 83</span><span class="cl"><span class="c1"></span>    <span class="c1">// Type &#39;number&#39; is not assignable to type &#39;string&#39;.ts(2345)
</span></span></span><span class="line"><span class="ln"> 84</span><span class="cl"><span class="c1"></span>    <span class="kr">const</span> <span class="nx">recordsMap</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Map</span><span class="p">&lt;</span><span class="kt">string</span> <span class="o">|</span> <span class="kt">number</span><span class="p">,</span> <span class="kt">number</span><span class="p">&gt;();</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">    <span class="k">for</span> <span class="p">(</span><span class="kr">const</span> <span class="nx">r</span> <span class="k">of</span> <span class="nx">records</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">      <span class="nx">recordsMap</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">        <span class="nx">r</span><span class="p">[</span><span class="nx">col</span><span class="p">.</span><span class="nx">x</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Record</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">        <span class="p">(</span><span class="nx">recordsMap</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">r</span><span class="p">[</span><span class="nx">col</span><span class="p">.</span><span class="nx">x</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Record</span><span class="p">])</span> <span class="o">||</span> <span class="mi">0</span><span class="p">)</span> <span class="o">+</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">          <span class="nb">Number</span><span class="p">(</span><span class="nx">r</span><span class="p">[</span><span class="nx">col</span><span class="p">.</span><span class="nx">y</span> <span class="nx">as</span> <span class="nx">keyof</span> <span class="nx">Record</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">      <span class="p">);</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">    <span class="kr">const</span> <span class="nx">recordsItems</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="nx">from</span><span class="p">(</span><span class="nx">recordsMap</span><span class="p">,</span> <span class="p">([</span><span class="nx">x</span><span class="p">,</span> <span class="nx">y</span><span class="p">])</span> <span class="p">=&gt;</span> <span class="p">({</span> <span class="nx">x</span><span class="p">,</span> <span class="nx">y</span> <span class="p">})).</span><span class="nx">sort</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">      <span class="p">(</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">b</span><span class="p">.</span><span class="nx">y</span> <span class="o">-</span> <span class="nx">a</span><span class="p">.</span><span class="nx">y</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">    <span class="p">);</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">    <span class="kr">const</span> <span class="nx">suffix</span> <span class="o">=</span> <span class="nx">col</span><span class="p">.</span><span class="nx">x</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">      <span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s2">&#34;_&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">      <span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">w</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">w</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="nx">toUpperCase</span><span class="p">()</span> <span class="o">+</span> <span class="nx">w</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">      <span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s2">&#34; &#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">      <span class="nx">title</span><span class="o">:</span> <span class="p">{</span> <span class="nx">text</span><span class="o">:</span> <span class="s2">&#34;Revenue by &#34;</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="nx">suffix</span><span class="p">)</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">      <span class="nx">yAxis</span><span class="o">:</span> <span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="s2">&#34;value&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">      <span class="nx">xAxis</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">        <span class="nx">type</span><span class="o">:</span> <span class="s2">&#34;category&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">        <span class="nx">data</span><span class="o">:</span> <span class="nx">recordsItems</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">r</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nx">r</span><span class="p">.</span><span class="nx">x</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">        <span class="nx">axisLabel</span><span class="o">:</span> <span class="p">{</span> <span class="nx">show</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nx">rotate</span><span class="o">:</span> <span class="mi">75</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">      <span class="p">},</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">      <span class="nx">series</span><span class="o">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">          <span class="nx">data</span><span class="o">:</span> <span class="nx">recordsItems</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">r</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">round</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">y</span><span class="p">)),</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">          <span class="nx">type</span><span class="o">:</span> <span class="s2">&#34;bar&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">111</span><span class="cl">          <span class="nx">colorBy</span><span class="o">:</span> <span class="s2">&#34;data&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">      <span class="p">],</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">      <span class="nx">tooltip</span><span class="o">:</span> <span class="p">{</span> <span class="nx">trigger</span><span class="o">:</span> <span class="s2">&#34;axis&#34;</span><span class="p">,</span> <span class="nx">axisPointer</span><span class="o">:</span> <span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="s2">&#34;shadow&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">115</span><span class="cl">    <span class="p">};</span>
</span></span><span class="line"><span class="ln">116</span><span class="cl">  <span class="p">});</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
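<p>The group-and-sum step inside <code>createOptionsItems</code> can be tried on its own. The following is a minimal sketch, not the code used in the project: the <code>SaleRecord</code> type, the <code>sumBy</code> helper, and the sample values are hypothetical, and a plain object is used for the totals in place of the <code>Map</code> in the listing above.</p>

```typescript
// Hypothetical, trimmed-down record shape; the real Record type has more fields.
type SaleRecord = {
  country: string;
  traffic_source: string;
  sale_price: number;
};

// Sum sale_price per distinct value of `key`, largest total first.
function sumBy(records: SaleRecord[], key: "country" | "traffic_source") {
  const totals: { [group: string]: number } = {};
  for (const r of records) {
    totals[r[key]] = (totals[r[key]] ?? 0) + r.sale_price;
  }
  return Object.entries(totals)
    .map(([x, y]) => ({ x, y }))
    .sort((a, b) => b.y - a.y);
}

// Made-up sample data for illustration only.
const sample: SaleRecord[] = [
  { country: "Australia", traffic_source: "Search", sale_price: 20 },
  { country: "Korea", traffic_source: "Email", sale_price: 50 },
  { country: "Australia", traffic_source: "Email", sale_price: 15 },
];

const byCountry = sumBy(sample, "country");
console.log(byCountry);
// [ { x: "Korea", y: 50 }, { x: "Australia", y: 35 } ]
```

<p>The <code>x</code> values become the category axis labels and the <code>y</code> values the bar heights, which is how the chart options above consume the aggregated items.</p>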
<h3 id="application" data-numberify>Application<a class="anchor ms-1" href="#application"></a></h3>
<p>The main component renders a real-time eCommerce dashboard. It connects to the WebSocket server at <code>ws://localhost:8000/ws</code> through the <em>React useWebSocket</em> package (<code>react-use-websocket</code>), and each time a message arrives it updates the component state with the latest metrics and chart options. The helper functions (<code>getMetrics</code>, <code>createMetricItems</code>, and <code>createOptionsItems</code>) compute the summary metrics and prepare the chart data. The key business metrics are displayed with the <em>Metric</em> component, and revenue is plotted as interactive bar charts with <em>Apache ECharts</em> (<code>echarts-for-react</code>). A checkbox toggles the WebSocket connection on or off, so users control whether the dashboard receives real-time updates.</p>
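<p>Before the full listing, the metric update that runs on each message can be looked at in isolation. This is a minimal sketch under stated assumptions: the <code>Metrics</code> fields mirror the object returned by <code>getMetrics</code>, while the <code>Snapshots</code> type, the <code>applyMessage</code> and <code>deltas</code> helpers, and the sample numbers are hypothetical and do not appear in the project.</p>

```typescript
// Mirrors the object returned by getMetrics in the listing above.
type Metrics = {
  num_orders: number;
  num_order_items: number;
  total_sales: number;
};

// Hypothetical container for the two snapshots the dashboard keeps.
type Snapshots = { curr: Metrics; prev: Metrics };

const zero: Metrics = { num_orders: 0, num_order_items: 0, total_sales: 0 };

// Shift the current snapshot to "previous" and store the incoming one.
function applyMessage(state: Snapshots, incoming: Metrics): Snapshots {
  return { prev: state.curr, curr: incoming };
}

// Per-metric change between the two snapshots (curr minus prev).
function deltas(state: Snapshots): Metrics {
  return {
    num_orders: state.curr.num_orders - state.prev.num_orders,
    num_order_items: state.curr.num_order_items - state.prev.num_order_items,
    total_sales: state.curr.total_sales - state.prev.total_sales,
  };
}

// Two made-up messages arriving in sequence.
let state: Snapshots = { curr: zero, prev: zero };
state = applyMessage(state, { num_orders: 10, num_order_items: 25, total_sales: 1200.5 });
state = applyMessage(state, { num_orders: 12, num_order_items: 31, total_sales: 1500.5 });

console.log(deltas(state));
// { num_orders: 2, num_order_items: 6, total_sales: 300 }
```

<p>Keeping this step as a pure function makes it easy to check that each metric card shows the newest value together with its change since the previous message.</p>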
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-jsx" data-lang="jsx"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// nextjs/src/app/page.tsx
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="s2">&#34;use client&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kr">import</span> <span class="p">{</span> <span class="nx">useEffect</span><span class="p">,</span> <span class="nx">useState</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;react&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kr">import</span> <span class="p">{</span> <span class="nx">Checkbox</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;@nextui-org/react&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kr">import</span> <span class="nx">ReactECharts</span><span class="p">,</span> <span class="p">{</span> <span class="nx">EChartsOption</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;echarts-for-react&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kr">import</span> <span class="nx">useWebSocket</span> <span class="nx">from</span> <span class="s2">&#34;react-use-websocket&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kr">import</span> <span class="nx">Metric</span><span class="p">,</span> <span class="p">{</span> <span class="nx">MetricProps</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;@/components/metric&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="kr">import</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="nx">getMetrics</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  <span class="nx">createMetricItems</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">  <span class="nx">defaultMetrics</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="nx">defaultMetricItems</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="nx">createOptionsItems</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;@/lib/processing&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="kr">export</span> <span class="k">default</span> <span class="kd">function</span> <span class="nx">Home</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">  <span class="kr">const</span> <span class="p">[</span><span class="nx">toConnect</span><span class="p">,</span> <span class="nx">toggleToConnect</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">  <span class="kr">const</span> <span class="p">[</span><span class="nx">currMetrics</span><span class="p">,</span> <span class="nx">setCurrMetrics</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">(</span><span class="nx">defaultMetrics</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">  <span class="kr">const</span> <span class="p">[</span><span class="nx">prevMetrics</span><span class="p">,</span> <span class="nx">setPrevMetrics</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">(</span><span class="nx">defaultMetrics</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">  <span class="kr">const</span> <span class="p">[</span><span class="nx">metricItems</span><span class="p">,</span> <span class="nx">setMetricItems</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">(</span><span class="nx">defaultMetricItems</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">  <span class="kr">const</span> <span class="p">[</span><span class="nx">chartOptions</span><span class="p">,</span> <span class="nx">setChartOptions</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">([]</span> <span class="nx">as</span> <span class="nx">EChartsOption</span><span class="p">[]);</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">
</span></span><span class="line"><span class="ln">25</span><span class="cl">  <span class="kr">const</span> <span class="p">{</span> <span class="nx">lastJsonMessage</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">useWebSocket</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">    <span class="s2">&#34;ws://localhost:8000/ws&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">      <span class="nx">share</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">      <span class="nx">shouldReconnect</span><span class="o">:</span> <span class="p">()</span> <span class="p">=&gt;</span> <span class="kc">true</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">    <span class="nx">toConnect</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl">  <span class="p">);</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">
</span></span><span class="line"><span class="ln">34</span><span class="cl">  <span class="nx">useEffect</span><span class="p">(()</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="kr">const</span> <span class="nx">records</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">lastJsonMessage</span> <span class="nx">as</span> <span class="nx">string</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="o">!!</span><span class="nx">records</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">      <span class="nx">setPrevMetrics</span><span class="p">(</span><span class="nx">currMetrics</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">      <span class="nx">setCurrMetrics</span><span class="p">(</span><span class="nx">getMetrics</span><span class="p">(</span><span class="nx">records</span><span class="p">));</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">      <span class="nx">setMetricItems</span><span class="p">(</span><span class="nx">createMetricItems</span><span class="p">(</span><span class="nx">getMetrics</span><span class="p">(</span><span class="nx">records</span><span class="p">),</span> <span class="nx">currMetrics</span><span class="p">));</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">      <span class="nx">setChartOptions</span><span class="p">(</span><span class="nx">createOptionsItems</span><span class="p">(</span><span class="nx">records</span><span class="p">));</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl">  <span class="p">},</span> <span class="p">[</span><span class="nx">lastJsonMessage</span><span class="p">]);</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl">
</span></span><span class="line"><span class="ln">44</span><span class="cl">  <span class="kr">const</span> <span class="nx">createMetrics</span> <span class="o">=</span> <span class="p">(</span><span class="nx">metricItems</span><span class="o">:</span> <span class="nx">MetricProps</span><span class="p">[])</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="k">return</span> <span class="nx">metricItems</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">item</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">      <span class="k">return</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="p">&lt;</span><span class="nt">Metric</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">          <span class="na">key</span><span class="o">=</span><span class="p">{</span><span class="nx">i</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">          <span class="na">label</span><span class="o">=</span><span class="p">{</span><span class="nx">item</span><span class="p">.</span><span class="nx">label</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">          <span class="na">value</span><span class="o">=</span><span class="p">{</span><span class="nx">item</span><span class="p">.</span><span class="nx">value</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl">          <span class="na">delta</span><span class="o">=</span><span class="p">{</span><span class="nx">item</span><span class="p">.</span><span class="nx">delta</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">          <span class="na">is_currency</span><span class="o">=</span><span class="p">{</span><span class="nx">item</span><span class="p">.</span><span class="nx">is_currency</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">53</span><span class="cl">        <span class="p">&gt;&lt;/</span><span class="nt">Metric</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">54</span><span class="cl">      <span class="p">);</span>
</span></span><span class="line"><span class="ln">55</span><span class="cl">    <span class="p">});</span>
</span></span><span class="line"><span class="ln">56</span><span class="cl">  <span class="p">};</span>
</span></span><span class="line"><span class="ln">57</span><span class="cl">
</span></span><span class="line"><span class="ln">58</span><span class="cl">  <span class="kr">const</span> <span class="nx">createCharts</span> <span class="o">=</span> <span class="p">(</span><span class="nx">chartOptions</span><span class="o">:</span> <span class="nx">EChartsOption</span><span class="p">[])</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">59</span><span class="cl">    <span class="k">return</span> <span class="nx">chartOptions</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">option</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">60</span><span class="cl">      <span class="k">return</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">61</span><span class="cl">        <span class="p">&lt;</span><span class="nt">ReactECharts</span>
</span></span><span class="line"><span class="ln">62</span><span class="cl">          <span class="na">key</span><span class="o">=</span><span class="p">{</span><span class="nx">i</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">63</span><span class="cl">          <span class="na">className</span><span class="o">=</span><span class="s">&#34;col-span-12 md:col-span-6&#34;</span>
</span></span><span class="line"><span class="ln">64</span><span class="cl">          <span class="na">option</span><span class="o">=</span><span class="p">{</span><span class="nx">option</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">65</span><span class="cl">          <span class="na">style</span><span class="o">=</span><span class="p">{{</span> <span class="nx">height</span><span class="o">:</span> <span class="s2">&#34;500px&#34;</span> <span class="p">}}</span>
</span></span><span class="line"><span class="ln">66</span><span class="cl">        <span class="p">/&gt;</span>
</span></span><span class="line"><span class="ln">67</span><span class="cl">      <span class="p">);</span>
</span></span><span class="line"><span class="ln">68</span><span class="cl">    <span class="p">});</span>
</span></span><span class="line"><span class="ln">69</span><span class="cl">  <span class="p">};</span>
</span></span><span class="line"><span class="ln">70</span><span class="cl">
</span></span><span class="line"><span class="ln">71</span><span class="cl">  <span class="k">return</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">72</span><span class="cl">    <span class="p">&lt;</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">73</span><span class="cl">      <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;mt-20&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">74</span><span class="cl">        <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;flex m-2 justify-between items-center&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">75</span><span class="cl">          <span class="p">&lt;</span><span class="nt">h1</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;text-4xl font-bold&#34;</span><span class="p">&gt;</span><span class="nx">theLook</span> <span class="nx">eCommerce</span> <span class="nx">Dashboard</span><span class="p">&lt;/</span><span class="nt">h1</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">76</span><span class="cl">        <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">77</span><span class="cl">        <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;flex m-2 mt-5 justify-between items-center mt-5&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">78</span><span class="cl">          <span class="p">&lt;</span><span class="nt">Checkbox</span>
</span></span><span class="line"><span class="ln">79</span><span class="cl">            <span class="na">color</span><span class="o">=</span><span class="s">&#34;primary&#34;</span>
</span></span><span class="line"><span class="ln">80</span><span class="cl">            <span class="na">onChange</span><span class="o">=</span><span class="p">{()</span> <span class="p">=&gt;</span> <span class="nx">toggleToConnect</span><span class="p">(</span><span class="o">!</span><span class="nx">toConnect</span><span class="p">)}</span>
</span></span><span class="line"><span class="ln">81</span><span class="cl">          <span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">82</span><span class="cl">            <span class="nx">Connect</span> <span class="nx">to</span> <span class="nx">WS</span> <span class="nx">Server</span>
</span></span><span class="line"><span class="ln">83</span><span class="cl">          <span class="p">&lt;/</span><span class="nt">Checkbox</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">84</span><span class="cl">          <span class="p">;</span>
</span></span><span class="line"><span class="ln">85</span><span class="cl">        <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">86</span><span class="cl">      <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">87</span><span class="cl">      <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;grid grid-cols-12 gap-4 mt-5&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">88</span><span class="cl">        <span class="p">{</span><span class="nx">createMetrics</span><span class="p">(</span><span class="nx">metricItems</span><span class="p">)}</span>
</span></span><span class="line"><span class="ln">89</span><span class="cl">      <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">90</span><span class="cl">      <span class="p">&lt;</span><span class="nt">div</span> <span class="na">className</span><span class="o">=</span><span class="s">&#34;grid grid-cols-12 gap-4 mt-5&#34;</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">91</span><span class="cl">        <span class="p">{</span><span class="nx">createCharts</span><span class="p">(</span><span class="nx">chartOptions</span><span class="p">)}</span>
</span></span><span class="line"><span class="ln">92</span><span class="cl">      <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">93</span><span class="cl">    <span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</span></span><span class="line"><span class="ln">94</span><span class="cl">  <span class="p">);</span>
</span></span><span class="line"><span class="ln">95</span><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>
<h2 id="deployment" data-numberify>Deployment<a class="anchor ms-1" href="#deployment"></a></h2>

<h3 id="data-producer-and-websocket-server" data-numberify>Data Producer and WebSocket Server<a class="anchor ms-1" href="#data-producer-and-websocket-server"></a></h3>
<p>As discussed in <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a>, the data generator and WebSocket server can be deployed using Docker Compose with the command <code>docker-compose -f producer/docker-compose.yml up -d</code>. Once started, the server can be checked with a <a href="https://github.com/lewoudar/ws/" target="_blank" rel="noopener noreferrer">WebSocket client<i class="fas fa-external-link-square-alt ms-1"></i></a> by executing <code>ws listen ws://localhost:8000/ws</code>, and its logs can be monitored by running <code>docker logs -f producer</code>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-03-04-realtime-dashboard-3/backend.gif" loading="lazy" width="1835" height="776" />
</picture>

</p>

<h3 id="frontend-dashboard" data-numberify>Frontend Dashboard<a class="anchor ms-1" href="#frontend-dashboard"></a></h3>
<p>The dashboard can be started in development mode as shown below. Once started, it can be accessed in a browser at <em>http://localhost:3000</em>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">## install pnpm if not done</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># https://pnpm.io/installation</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1">## install dependent packages</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">$ pnpm install
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1">## start the app</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">$ pnpm dev
</span></span></code></pre></div><p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-03-04-realtime-dashboard-3/featured.gif" loading="lazy" width="1573" height="753" />
</picture>

</p>
      ]]></content:encoded></item><item><title>Realtime Dashboard with FastAPI, Streamlit and Next.js - Part 2 Streamlit Dashboard</title><link>https://jaehyeon.me/blog/2025-02-25-realtime-dashboard-2/</link><guid>https://jaehyeon.me/blog/2025-02-25-realtime-dashboard-2/</guid><pubDate>Tue, 25 Feb 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In this post, we develop a real-time monitoring dashboard using <a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer">Streamlit<i class="fas fa-external-link-square-alt ms-1"></i></a>, an open-source Python framework that allows data scientists and AI/ML engineers to create interactive data apps. The app connects to the WebSocket server we developed in <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a> and continuously fetches data to visualize key metrics such as <strong>order counts</strong>, <strong>sales data</strong>, and <strong>revenue by traffic source and country</strong>. With interactive bar charts and dynamic metrics, users can monitor sales trends and other important business KPIs in real time.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In this post, we develop a real-time monitoring dashboard using <a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer">Streamlit<i class="fas fa-external-link-square-alt ms-1"></i></a>, an open-source Python framework that allows data scientists and AI/ML engineers to create interactive data apps. The app connects to the WebSocket server we developed in <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a> and continuously fetches data to visualize key metrics such as <strong>order counts</strong>, <strong>sales data</strong>, and <strong>revenue by traffic source and country</strong>. With interactive bar charts and dynamic metrics, users can monitor sales trends and other important business KPIs in real time.</p>
<ul>
<li><a href="/blog/2025-02-18-realtime-dashboard-1">Part 1 Data Producer</a></li>
<li><a href="/blog/2025-02-25-realtime-dashboard-2/#">Part 2 Streamlit Dashboard</a> (this post)</li>
<li><a href="/blog/2025-03-04-realtime-dashboard-3">Part 3 Next.js Dashboard</a></li>
</ul>

<h2 id="streamlit-frontend" data-numberify>Streamlit Frontend<a class="anchor ms-1" href="#streamlit-frontend"></a></h2>
<p>This Streamlit dashboard is designed to process and display real-time <em>theLook eCommerce data</em> using <em>pandas</em> for data manipulation, Streamlit&rsquo;s built-in <em>metric</em> component for KPIs, and <em>Apache ECharts</em> for visualizations. The source code for this post can be found in this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/product-demos" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h3 id="components" data-numberify>Components<a class="anchor ms-1" href="#components"></a></h3>
<p>The dashboard components and data processing logic are managed by functions in <code>streamlit/utils.py</code>. Below are the details of those functions.</p>
<ol>
<li>
<p>Loading Records:</p>
<ul>
<li>The <code>load_records()</code> function accepts a list of records, converts them into a <strong>pandas DataFrame</strong>, and ensures that the necessary columns are present.</li>
<li>It then processes data by converting the <strong>age</strong> column to integers and the <strong>cost</strong> and <strong>sale_price</strong> columns to floating-point values, rounding the sale price to one decimal place.</li>
<li>Key metrics like the <strong>number of orders</strong>, <strong>number of order items</strong>, and <strong>total sales</strong> are calculated from the DataFrame and returned for display.</li>
</ul>
</li>
<li>
<p>Generating Metrics:</p>
<ul>
<li>The <code>create_metric_items()</code> function creates a list of dictionaries containing metrics and their respective changes (<code>delta</code>), comparing the current and previous values of <strong>orders</strong>, <strong>order items</strong>, and <strong>total sales</strong>.</li>
<li>The <code>generate_metrics()</code> function takes the calculated metrics and displays them in Streamlit’s <strong>metric components</strong> within the <code>placeholder</code> container. It uses a column layout to show the metrics side-by-side.</li>
</ul>
</li>
<li>
<p>Creating Chart Options:</p>
<ul>
<li>The <code>create_options_items()</code> function generates configuration data for bar charts that display <strong>revenue by country</strong> and <strong>revenue by traffic source</strong>.</li>
<li>It groups the data by <strong>country</strong> and <strong>traffic source</strong>, sums the <strong>sale_price</strong> for each, and sorts it in descending order.</li>
<li><strong>ECharts options</strong> are defined for each chart with custom colors, titles, and axis settings. The charts are configured to show <strong>tooltips</strong> when hovering over the bars.</li>
</ul>
</li>
<li>
<p>Displaying Charts:</p>
<ul>
<li>The <code>generate_charts()</code> function takes the <strong>ECharts options</strong> and renders each chart within a container using the <code>st_echarts</code> component of the <a href="https://pypi.org/project/streamlit-echarts/" target="_blank" rel="noopener noreferrer">streamlit-echarts<i class="fas fa-external-link-square-alt ms-1"></i></a> package.</li>
<li>Each chart is placed in its own column, with dynamic layout adjustments based on the number of charts being displayed. The charts are rendered with a fixed height of <strong>500px</strong>.</li>
</ul>
</li>
</ol>
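<p>Before reading the full module, it may help to see the shape of the data at each step. The following is a minimal, dependency-free sketch that uses only the Python standard library rather than pandas; it is not the code from <code>utils.py</code>. The sample records and the helper names <code>summarize</code>, <code>deltas</code>, <code>revenue_by</code>, and <code>bar_option</code> are hypothetical and exist only for illustration, and the ECharts option is reduced to the fields needed for a basic bar chart.</p>

```python
from collections import defaultdict

# Hypothetical sample batches. Field names follow the columns that
# load_records() checks for; the values are made up for illustration.
prev_records = [
    {"order_id": "o1", "item_id": "i1", "country": "Australia",
     "traffic_source": "Search", "sale_price": 20.0},
    {"order_id": "o1", "item_id": "i2", "country": "Australia",
     "traffic_source": "Search", "sale_price": 35.25},
]
curr_records = prev_records + [
    {"order_id": "o2", "item_id": "i3", "country": "Japan",
     "traffic_source": "Email", "sale_price": 80.0},
    {"order_id": "o3", "item_id": "i4", "country": "Australia",
     "traffic_source": "Email", "sale_price": 10.0},
]


def summarize(records):
    """Count distinct orders and items, and total the sale prices."""
    return {
        "num_orders": len({r["order_id"] for r in records}),
        "num_order_items": len({r["item_id"] for r in records}),
        "total_sales": round(sum(r["sale_price"] for r in records)),
    }


def deltas(current, previous):
    """Change in each metric since the previous batch."""
    return {key: current[key] - previous[key] for key in current}


def revenue_by(records, key):
    """Sum sale_price per group, sorted from highest to lowest revenue."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r["sale_price"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def bar_option(title, pairs):
    """Build a minimal ECharts bar chart option from (label, value) pairs."""
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "xAxis": {"type": "category", "data": [label for label, _ in pairs]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [value for _, value in pairs]}],
    }


prev_metrics = summarize(prev_records)
curr_metrics = summarize(curr_records)
metric_deltas = deltas(curr_metrics, prev_metrics)
country_option = bar_option(
    "Revenue by Country", revenue_by(curr_records, "country")
)

print(curr_metrics, metric_deltas)
print(country_option["xAxis"]["data"], country_option["series"][0]["data"])
```

<p>Each new batch is summarized, compared with the previous summary to obtain the <code>delta</code> values, and aggregated per category to fill the chart axes. The module listed next performs the same work on a pandas DataFrame and adds the colors, titles, and layout used by the dashboard.</p>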
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">  1</span><span class="cl"><span class="c1">## producer/streamlit/utils.py</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl"><span class="kn">from</span> <span class="nn">uuid</span> <span class="kn">import</span> <span class="n">uuid4</span>
</span></span><span class="line"><span class="ln">  3</span><span class="cl">
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl"><span class="kn">import</span> <span class="nn">streamlit</span> <span class="k">as</span> <span class="nn">st</span>
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="kn">from</span> <span class="nn">streamlit.delta_generator</span> <span class="kn">import</span> <span class="n">DeltaGenerator</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="kn">from</span> <span class="nn">streamlit_echarts</span> <span class="kn">import</span> <span class="n">st_echarts</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl">
</span></span><span class="line"><span class="ln">  9</span><span class="cl">
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="k">def</span> <span class="nf">load_records</span><span class="p">(</span><span class="n">records</span><span class="p">:</span> <span class="nb">list</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl">    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">records</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl">    <span class="k">assert</span> <span class="nb">set</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl">        <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl">            <span class="s2">&#34;order_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl">            <span class="s2">&#34;item_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">            <span class="s2">&#34;country&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">            <span class="s2">&#34;traffic_source&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">            <span class="s2">&#34;age&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">            <span class="s2">&#34;cost&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl">            <span class="s2">&#34;sale_price&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">        <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="p">)</span><span class="o">.</span><span class="n">issubset</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="n">df</span><span class="p">[</span><span class="s2">&#34;age&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s2">&#34;age&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">    <span class="n">df</span><span class="p">[</span><span class="s2">&#34;cost&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s2">&#34;cost&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl">    <span class="n">df</span><span class="p">[</span><span class="s2">&#34;sale_price&#34;</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s2">&#34;sale_price&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">    <span class="n">metric_values</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">        <span class="s2">&#34;num_orders&#34;</span><span class="p">:</span> <span class="n">df</span><span class="p">[</span><span class="s2">&#34;order_id&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">nunique</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">        <span class="s2">&#34;num_order_items&#34;</span><span class="p">:</span> <span class="n">df</span><span class="p">[</span><span class="s2">&#34;item_id&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">nunique</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="s2">&#34;total_sales&#34;</span><span class="p">:</span> <span class="nb">round</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s2">&#34;sale_price&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">sum</span><span class="p">()),</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">    <span class="k">return</span> <span class="n">metric_values</span><span class="p">,</span> <span class="n">df</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">
</span></span><span class="line"><span class="ln"> 34</span><span class="cl"><span class="k">def</span> <span class="nf">create_metric_items</span><span class="p">(</span><span class="n">metric_values</span><span class="p">,</span> <span class="n">prev_values</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">    <span class="k">return</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">            <span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Number of Orders&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">            <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="n">metric_values</span><span class="p">[</span><span class="s2">&#34;num_orders&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">            <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="p">(</span><span class="n">metric_values</span><span class="p">[</span><span class="s2">&#34;num_orders&#34;</span><span class="p">]</span> <span class="o">-</span> <span class="n">prev_values</span><span class="p">[</span><span class="s2">&#34;num_orders&#34;</span><span class="p">]),</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">            <span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Number of Order Items&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">            <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="n">metric_values</span><span class="p">[</span><span class="s2">&#34;num_order_items&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">            <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">                <span class="n">metric_values</span><span class="p">[</span><span class="s2">&#34;num_order_items&#34;</span><span class="p">]</span> <span class="o">-</span> <span class="n">prev_values</span><span class="p">[</span><span class="s2">&#34;num_order_items&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">            <span class="p">),</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">            <span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Total Sales&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">            <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;$ </span><span class="si">{</span><span class="n">metric_values</span><span class="p">[</span><span class="s1">&#39;total_sales&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">            <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="p">(</span><span class="n">metric_values</span><span class="p">[</span><span class="s2">&#34;total_sales&#34;</span><span class="p">]</span> <span class="o">-</span> <span class="n">prev_values</span><span class="p">[</span><span class="s2">&#34;total_sales&#34;</span><span class="p">]),</span>
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">
</span></span><span class="line"><span class="ln"> 56</span><span class="cl"><span class="k">def</span> <span class="nf">generate_metrics</span><span class="p">(</span><span class="n">placeholder</span><span class="p">:</span> <span class="n">DeltaGenerator</span><span class="p">,</span> <span class="n">metric_items</span><span class="p">:</span> <span class="nb">list</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">    <span class="k">if</span> <span class="n">metric_items</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">        <span class="n">metric_items</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">            <span class="p">{</span><span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Number of Orders&#34;</span><span class="p">,</span> <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">            <span class="p">{</span><span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Number of Order Items&#34;</span><span class="p">,</span> <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">            <span class="p">{</span><span class="s2">&#34;label&#34;</span><span class="p">:</span> <span class="s2">&#34;Total Sales&#34;</span><span class="p">,</span> <span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&#34;delta&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">        <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">    <span class="k">with</span> <span class="n">placeholder</span><span class="o">.</span><span class="n">container</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">st</span><span class="o">.</span><span class="n">columns</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">metric_items</span><span class="p">))):</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">            <span class="n">metric</span> <span class="o">=</span> <span class="n">metric_items</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">            <span class="n">col</span><span class="o">.</span><span class="n">metric</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">                <span class="n">label</span><span class="o">=</span><span class="n">metric</span><span class="p">[</span><span class="s2">&#34;label&#34;</span><span class="p">],</span> <span class="n">value</span><span class="o">=</span><span class="n">metric</span><span class="p">[</span><span class="s2">&#34;value&#34;</span><span class="p">],</span> <span class="n">delta</span><span class="o">=</span><span class="n">metric</span><span class="p">[</span><span class="s2">&#34;delta&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">
</span></span><span class="line"><span class="ln"> 71</span><span class="cl"><span class="k">def</span> <span class="nf">create_options_items</span><span class="p">(</span><span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">    <span class="n">colors</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">        <span class="s2">&#34;#00008b&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">        <span class="s2">&#34;#b22234&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">        <span class="s2">&#34;#00247d&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">        <span class="s2">&#34;#f00&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">        <span class="s2">&#34;#ffde00&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">        <span class="s2">&#34;#002a8f&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">        <span class="s2">&#34;#000&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">        <span class="s2">&#34;#003580&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">        <span class="s2">&#34;#ed2939&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">        <span class="s2">&#34;#003897&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">        <span class="s2">&#34;#f93&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">        <span class="s2">&#34;#bc002d&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">        <span class="s2">&#34;#024fa2&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">        <span class="s2">&#34;#000&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">        <span class="s2">&#34;#00247d&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">        <span class="s2">&#34;#ef2b2d&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">        <span class="s2">&#34;#dc143c&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">        <span class="s2">&#34;#d52b1e&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">        <span class="s2">&#34;#e30a17&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">    <span class="n">chart_cols</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">        <span class="p">{</span><span class="s2">&#34;x&#34;</span><span class="p">:</span> <span class="s2">&#34;country&#34;</span><span class="p">,</span> <span class="s2">&#34;y&#34;</span><span class="p">:</span> <span class="s2">&#34;sale_price&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">        <span class="p">{</span><span class="s2">&#34;x&#34;</span><span class="p">:</span> <span class="s2">&#34;traffic_source&#34;</span><span class="p">,</span> <span class="s2">&#34;y&#34;</span><span class="p">:</span> <span class="s2">&#34;sale_price&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">    <span class="n">option_items</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">    <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">chart_cols</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">        <span class="n">data</span> <span class="o">=</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">            <span class="n">df</span><span class="p">[[</span><span class="n">col</span><span class="p">[</span><span class="s2">&#34;x&#34;</span><span class="p">],</span> <span class="n">col</span><span class="p">[</span><span class="s2">&#34;y&#34;</span><span class="p">]]]</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">            <span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="n">col</span><span class="p">[</span><span class="s2">&#34;x&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">            <span class="o">.</span><span class="n">sum</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">            <span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">            <span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="s2">&#34;y&#34;</span><span class="p">],</span> <span class="n">ascending</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">        <span class="n">options</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">            <span class="s2">&#34;title&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;Revenue by </span><span class="si">{</span><span class="n">col</span><span class="p">[</span><span class="s1">&#39;x&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;_&#39;</span><span class="p">,</span> <span class="s1">&#39; &#39;</span><span class="p">)</span><span class="o">.</span><span class="n">title</span><span class="p">()</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">            <span class="s2">&#34;xAxis&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">                <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;category&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">                <span class="s2">&#34;data&#34;</span><span class="p">:</span> <span class="n">data</span><span class="p">[</span><span class="n">col</span><span class="p">[</span><span class="s2">&#34;x&#34;</span><span class="p">]]</span><span class="o">.</span><span class="n">to_list</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln">111</span><span class="cl">                <span class="s2">&#34;axisLabel&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;show&#34;</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span> <span class="s2">&#34;rotate&#34;</span><span class="p">:</span> <span class="mi">75</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">            <span class="s2">&#34;yAxis&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;value&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">            <span class="s2">&#34;series&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">115</span><span class="cl">                <span class="p">{</span>
</span></span><span class="line"><span class="ln">116</span><span class="cl">                    <span class="s2">&#34;data&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl">                        <span class="p">{</span><span class="s2">&#34;value&#34;</span><span class="p">:</span> <span class="n">d</span><span class="p">,</span> <span class="s2">&#34;itemStyle&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;color&#34;</span><span class="p">:</span> <span class="n">colors</span><span class="p">[</span><span class="n">i</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">colors</span><span class="p">)]}}</span>
</span></span><span class="line"><span class="ln">118</span><span class="cl">                        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">d</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">col</span><span class="p">[</span><span class="s2">&#34;y&#34;</span><span class="p">]]</span><span class="o">.</span><span class="n">to_list</span><span class="p">())</span>
</span></span><span class="line"><span class="ln">119</span><span class="cl">                    <span class="p">],</span>
</span></span><span class="line"><span class="ln">120</span><span class="cl">                    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;bar&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">121</span><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="ln">122</span><span class="cl">            <span class="p">],</span>
</span></span><span class="line"><span class="ln">123</span><span class="cl">            <span class="s2">&#34;tooltip&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;trigger&#34;</span><span class="p">:</span> <span class="s2">&#34;axis&#34;</span><span class="p">,</span> <span class="s2">&#34;axisPointer&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;shadow&#34;</span><span class="p">}},</span>
</span></span><span class="line"><span class="ln">124</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">125</span><span class="cl">        <span class="n">option_items</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">options</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">126</span><span class="cl">    <span class="k">return</span> <span class="n">option_items</span>
</span></span><span class="line"><span class="ln">127</span><span class="cl">
</span></span><span class="line"><span class="ln">128</span><span class="cl">
</span></span><span class="line"><span class="ln">129</span><span class="cl"><span class="k">def</span> <span class="nf">generate_charts</span><span class="p">(</span><span class="n">placeholder</span><span class="p">:</span> <span class="n">DeltaGenerator</span><span class="p">,</span> <span class="n">option_items</span><span class="p">:</span> <span class="nb">list</span><span class="p">):</span>
</span></span><span class="line"><span class="ln">130</span><span class="cl">    <span class="k">with</span> <span class="n">placeholder</span><span class="o">.</span><span class="n">container</span><span class="p">():</span>
</span></span><span class="line"><span class="ln">131</span><span class="cl">        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">st</span><span class="o">.</span><span class="n">columns</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">option_items</span><span class="p">))):</span>
</span></span><span class="line"><span class="ln">132</span><span class="cl">            <span class="n">options</span> <span class="o">=</span> <span class="n">option_items</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">133</span><span class="cl">            <span class="k">with</span> <span class="n">col</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">134</span><span class="cl">                <span class="n">st_echarts</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">options</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="s2">&#34;500px&#34;</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">uuid4</span><span class="p">()))</span>
</span></span></code></pre></div>
<h3 id="application" data-numberify>Application<a class="anchor ms-1" href="#application"></a></h3>
<p>The dashboard connects to the <strong>WebSocket server</strong> to fetch and display real-time <em>theLook eCommerce</em> data. Here&rsquo;s a detailed breakdown of its functionality:</p>
<ol>
<li>
<p>WebSocket Connection:</p>
<ul>
<li>The <code>generate()</code> function is an <strong>asynchronous</strong> coroutine that connects to the WebSocket server (<code>ws://localhost:8000/ws</code>) using the asynchronous HTTP client from the <a href="https://pypi.org/project/aiohttp/" target="_blank" rel="noopener noreferrer">aiohttp<i class="fas fa-external-link-square-alt ms-1"></i></a> package. It then listens for incoming messages from the server, which carry the eCommerce data.</li>
<li>As each message arrives, the function passes it to <code>load_records()</code>, which converts the records into a <strong>pandas DataFrame</strong> and computes key metrics.</li>
</ul>
</li>
<li>
<p>Generating and Displaying Metrics:</p>
<ul>
<li>The <code>create_metric_items()</code> function calculates and prepares key metrics (such as <strong>number of orders</strong>, <strong>order items</strong>, and <strong>total sales</strong>) along with the delta (changes) from the previous values.</li>
<li>The <code>generate_metrics()</code> function updates the Streamlit dashboard by displaying these metrics in the <code>metric_placeholder</code>.</li>
</ul>
</li>
<li>
<p>Creating and Displaying Charts:</p>
<ul>
<li>The <code>create_options_items()</code> function aggregates the data and generates configuration options for two bar charts: <strong>revenue by country</strong> and <strong>revenue by traffic source</strong>.</li>
<li>The <code>generate_charts()</code> function renders the charts in the <code>chart_placeholder</code> container, using <strong>ECharts</strong> for interactive data visualizations.</li>
</ul>
</li>
<li>
<p>Real-time Updates:</p>
<ul>
<li>The <code>async for</code> loop in <code>generate()</code> keeps listening for new data from the WebSocket server. Each time a message arrives, it updates both the metrics and the charts in real time.</li>
</ul>
</li>
<li>
<p>User Interface:</p>
<ul>
<li>The app sets a wide layout with the title <strong>&ldquo;theLook eCommerce Dashboard&rdquo;</strong>.</li>
<li>A checkbox (<code>Connect to WS Server</code>) lets the user choose whether to connect to the WebSocket server. When checked, the dashboard fetches data live and updates metrics and charts accordingly.</li>
<li>If the checkbox is unchecked, only the static metrics are displayed.</li>
</ul>
</li>
</ol>
<p>This setup provides a <strong>dynamic dashboard</strong> that pulls and visualizes real-time eCommerce data for monitoring sales and performance metrics.</p>
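<p>The delta calculation described in step 2 can be sketched without Streamlit. The helper below, <code>metric_deltas()</code>, is a hypothetical stand-in rather than the actual <code>create_metric_items()</code> from <code>utils.py</code>; the sample values are made up, and only the three metric keys are taken from <code>prev_values</code> in the app.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"># Hypothetical helper: the name and output shape are illustrative only,
# not the actual create_metric_items() from utils.py.
def metric_deltas(current: dict, previous: dict) -> list:
    """Pair each current metric with its change from the previous message."""
    items = []
    for name, value in current.items():
        # A metric missing from the previous snapshot is treated as 0.
        delta = value - previous.get(name, 0)
        items.append({"label": name, "value": value, "delta": delta})
    return items


prev_values = {"num_orders": 0, "num_order_items": 0, "total_sales": 0}
first = {"num_orders": 3, "num_order_items": 7, "total_sales": 120.5}
second = {"num_orders": 5, "num_order_items": 9, "total_sales": 180.0}

for snapshot in (first, second):
    for item in metric_deltas(snapshot, prev_values):
        print(item)
    # Carry the snapshot forward, as generate() does with prev_values.
    prev_values = snapshot
</code></pre></div>
<p>Each item carries what a metric widget such as <code>st.metric</code> needs: a label, a value, and a delta.</p>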
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">## producer/streamlit/app.py</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">json</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">import</span> <span class="nn">asyncio</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kn">import</span> <span class="nn">aiohttp</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="kn">import</span> <span class="nn">streamlit</span> <span class="k">as</span> <span class="nn">st</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="kn">from</span> <span class="nn">streamlit.delta_generator</span> <span class="kn">import</span> <span class="n">DeltaGenerator</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="kn">from</span> <span class="nn">utils</span> <span class="kn">import</span> <span class="p">(</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">load_records</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">create_metric_items</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="n">generate_metrics</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">create_options_items</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">generate_charts</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">generate</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="n">metric_placeholder</span><span class="p">:</span> <span class="n">DeltaGenerator</span><span class="p">,</span> <span class="n">chart_placeholder</span><span class="p">:</span> <span class="n">DeltaGenerator</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">prev_values</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&#34;num_orders&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&#34;num_order_items&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&#34;total_sales&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="k">async</span> <span class="k">with</span> <span class="n">aiohttp</span><span class="o">.</span><span class="n">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">        <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">ws_connect</span><span class="p">(</span><span class="s2">&#34;ws://localhost:8000/ws&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">ws</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">            <span class="k">async</span> <span class="k">for</span> <span class="n">msg</span> <span class="ow">in</span> <span class="n">ws</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">                <span class="n">metric_values</span><span class="p">,</span> <span class="n">df</span> <span class="o">=</span> <span class="n">load_records</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">msg</span><span class="o">.</span><span class="n">json</span><span class="p">()))</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">                <span class="n">generate_metrics</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">                    <span class="n">metric_placeholder</span><span class="p">,</span> <span class="n">create_metric_items</span><span class="p">(</span><span class="n">metric_values</span><span class="p">,</span> <span class="n">prev_values</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">                <span class="p">)</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">                <span class="n">generate_charts</span><span class="p">(</span><span class="n">chart_placeholder</span><span class="p">,</span> <span class="n">create_options_items</span><span class="p">(</span><span class="n">df</span><span class="p">))</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">                <span class="n">prev_values</span> <span class="o">=</span> <span class="n">metric_values</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">
</span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="n">st</span><span class="o">.</span><span class="n">set_page_config</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="n">page_title</span><span class="o">=</span><span class="s2">&#34;theLook eCommerce&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="n">page_icon</span><span class="o">=</span><span class="s2">&#34;✅&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="n">layout</span><span class="o">=</span><span class="s2">&#34;wide&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">
</span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="n">st</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">&#34;theLook eCommerce Dashboard&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">
</span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="n">connect</span> <span class="o">=</span> <span class="n">st</span><span class="o">.</span><span class="n">checkbox</span><span class="p">(</span><span class="s2">&#34;Connect to WS Server&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="n">metric_placeholder</span> <span class="o">=</span> <span class="n">st</span><span class="o">.</span><span class="n">empty</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="n">chart_placeholder</span> <span class="o">=</span> <span class="n">st</span><span class="o">.</span><span class="n">empty</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">
</span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="k">if</span> <span class="n">connect</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">        <span class="n">generate</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl">            <span class="n">metric_placeholder</span><span class="o">=</span><span class="n">metric_placeholder</span><span class="p">,</span> <span class="n">chart_placeholder</span><span class="o">=</span><span class="n">chart_placeholder</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">52</span><span class="cl">    <span class="n">generate_metrics</span><span class="p">(</span><span class="n">metric_placeholder</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
</span></span></code></pre></div>
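<p>The receive loop in <code>generate()</code> can be tried without a running server. <code>msg.json()</code> in aiohttp already parses the message text as JSON, so the extra <code>json.loads()</code> only works if the parsed result is itself JSON text, which suggests each payload is encoded twice. That is an inference from the client code, not something confirmed here. The sketch below assumes it, replaces the WebSocket with a hypothetical async generator <code>fake_ws()</code>, and uses an invented <code>order_id</code> field alongside <code>sale_price</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import asyncio
import json


async def fake_ws(batches):
    """Hypothetical stand-in for the WebSocket: one double-encoded payload per batch."""
    for records in batches:
        # Assumption: the server serializes the records to JSON text and then
        # sends that text as a JSON string, so the payload is encoded twice.
        yield json.dumps(json.dumps(records))
        await asyncio.sleep(0)  # hand control back to the event loop


async def consume(batches):
    sizes = []
    async for raw in fake_ws(batches):
        text = json.loads(raw)      # first decode: what msg.json() would return
        records = json.loads(text)  # second decode: the list of records
        sizes.append(len(records))
    return sizes


batches = [
    [{"order_id": 1, "sale_price": 10.0}],
    [{"order_id": 1, "sale_price": 10.0}, {"order_id": 2, "sale_price": 5.5}],
]
print(asyncio.run(consume(batches)))  # prints [1, 2]
</code></pre></div>
<p>If the server sent plain JSON instead, the second <code>json.loads()</code> would raise a <code>TypeError</code> because it would receive a list rather than a string.</p>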
<h2 id="deployment" data-numberify>Deployment<a class="anchor ms-1" href="#deployment"></a></h2>

<h3 id="data-producer-and-websocket-server" data-numberify>Data Producer and WebSocket Server<a class="anchor ms-1" href="#data-producer-and-websocket-server"></a></h3>
<p>As discussed in <a href="/blog/2025-02-18-realtime-dashboard-1">Part 1</a>, the data generator and WebSocket server can be deployed using Docker Compose with the command <code>docker-compose -f producer/docker-compose.yml up -d</code>. Once started, the server can be checked with a <a href="https://github.com/lewoudar/ws/" target="_blank" rel="noopener noreferrer">WebSocket client<i class="fas fa-external-link-square-alt ms-1"></i></a> by executing <code>ws listen ws://localhost:8000/ws</code>, and its logs can be monitored by running <code>docker logs -f producer</code>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-02-25-realtime-dashboard-2/backend.gif" loading="lazy" width="1835" height="776" />
</picture>

</p>
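<p>Before starting the dashboard, it can help to confirm that something is accepting connections on port 8000. The following is a minimal sketch using only the Python standard library; <code>is_listening()</code> is a hypothetical helper, and it checks TCP reachability only, not the WebSocket handshake that <code>ws listen</code> performs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Port 8000 matches the ws://localhost:8000/ws address used by the dashboard.
print(is_listening("localhost", 8000))
</code></pre></div>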

<h3 id="frontend-dashboard" data-numberify>Frontend Dashboard<a class="anchor ms-1" href="#frontend-dashboard"></a></h3>
<p>The dashboard can be started by running the Streamlit app as shown below. Once started, it can be accessed in a browser at <em>http://localhost:8501</em>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">## create and activate a virtual environment</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># https://docs.python.org/3/library/venv.html</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1">## install pip packages</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">$ pip install -r requirements.txt
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1">## start the app</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">$ streamlit run streamlit/app.py
</span></span></code></pre></div><p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-02-25-realtime-dashboard-2/featured.gif" loading="lazy" width="1573" height="753" />
</picture>

</p>
      ]]></content:encoded></item><item><title>Realtime Dashboard with FastAPI, Streamlit and Next.js - Part 1 Data Producer</title><link>https://jaehyeon.me/blog/2025-02-18-realtime-dashboard-1/</link><guid>https://jaehyeon.me/blog/2025-02-18-realtime-dashboard-1/</guid><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><description><![CDATA[
        <p>In this series, we develop real-time monitoring dashboard applications. A data-generating app is created with Python, and it ingests the <a href="https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce" target="_blank" rel="noopener noreferrer">theLook eCommerce<i class="fas fa-external-link-square-alt ms-1"></i></a> data continuously into a PostgreSQL database. A WebSocket server, built with <a href="https://fastapi.tiangolo.com/" target="_blank" rel="noopener noreferrer">FastAPI<i class="fas fa-external-link-square-alt ms-1"></i></a>, periodically queries the data to serve its clients. The monitoring dashboards will be developed using <a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer">Streamlit<i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://nextjs.org/" target="_blank" rel="noopener noreferrer">Next.js<i class="fas fa-external-link-square-alt ms-1"></i></a>, with <a href="https://echarts.apache.org/en/index.html" target="_blank" rel="noopener noreferrer">Apache ECharts<i class="fas fa-external-link-square-alt ms-1"></i></a> for visualization. In this post, we walk through the data generation app and backend API, while the monitoring dashboards will be discussed in later posts.</p>
      ]]></description><content:encoded><![CDATA[
        <p>In this series, we develop real-time monitoring dashboard applications. A data-generating app is created with Python, and it ingests the <a href="https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce" target="_blank" rel="noopener noreferrer">theLook eCommerce<i class="fas fa-external-link-square-alt ms-1"></i></a> data continuously into a PostgreSQL database. A WebSocket server, built with <a href="https://fastapi.tiangolo.com/" target="_blank" rel="noopener noreferrer">FastAPI<i class="fas fa-external-link-square-alt ms-1"></i></a>, periodically queries the data to serve its clients. The monitoring dashboards will be developed using <a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer">Streamlit<i class="fas fa-external-link-square-alt ms-1"></i></a> and <a href="https://nextjs.org/" target="_blank" rel="noopener noreferrer">Next.js<i class="fas fa-external-link-square-alt ms-1"></i></a>, with <a href="https://echarts.apache.org/en/index.html" target="_blank" rel="noopener noreferrer">Apache ECharts<i class="fas fa-external-link-square-alt ms-1"></i></a> for visualization. In this post, we walk through the data generation app and backend API, while the monitoring dashboards will be discussed in later posts.</p>
<ul>
<li><a href="/blog/2025-02-18-realtime-dashboard-1/#">Part 1 Data Producer</a> (this post)</li>
<li><a href="/blog/2025-02-25-realtime-dashboard-2">Part 2 Streamlit Dashboard</a></li>
<li><a href="/blog/2025-03-04-realtime-dashboard-3">Part 3 Next.js Dashboard</a></li>
</ul>

<h2 id="docker-compose-services" data-numberify>Docker Compose Services<a class="anchor ms-1" href="#docker-compose-services"></a></h2>
<p>We have three docker-compose services, and they are illustrated separately below. The source of this post can be found in this <a href="https://github.com/jaehyeon-kim/streaming-demos/tree/main/product-demos" target="_blank" rel="noopener noreferrer"><strong>GitHub repository</strong><i class="fas fa-external-link-square-alt ms-1"></i></a>.</p>

<h3 id="postgresql" data-numberify>PostgreSQL<a class="anchor ms-1" href="#postgresql"></a></h3>
<p>A PostgreSQL database server is configured with persistent storage, automatic initialization, and a health check. The health check is set up so that the remaining services wait until the database is ready.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># producer/docker-compose.yml</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;3&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">postgres</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">postgres:16</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">      </span>- <span class="m">5432</span><span class="p">:</span><span class="m">5432</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span>- <span class="l">./config/:/docker-entrypoint-initdb.d</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">      </span>- <span class="l">postgres_data:/var/lib/postgresql/data</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">password</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">      </span><span class="nt">PGUSER</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">      </span><span class="nt">TZ</span><span class="p">:</span><span class="w"> </span><span class="l">Australia/Sydney</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">    </span><span class="nt">healthcheck</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">      </span><span class="nt">test</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;CMD-SHELL&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;pg_isready -U develop&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">      </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">      </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l">5s</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">      </span><span class="nt">retries</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w"></span><span class="nn">...</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w"></span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">  </span><span class="nt">postgres_data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">    </span><span class="nt">driver</span><span class="p">:</span><span class="w"> </span><span class="l">local</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">postgres_data</span><span class="w">
</span></span></span></code></pre></div><p>The bootstrap script creates a dedicated schema named <em>ecommerce</em> and sets the schema as the default search path.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- producer/config/postgres/bootstrap.sql
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">ecommerce</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">GRANT</span><span class="w"> </span><span class="k">ALL</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">SCHEMA</span><span class="w"> </span><span class="n">ecommerce</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">develop</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- change search_path on a connection-level
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">SET</span><span class="w"> </span><span class="n">search_path</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">ecommerce</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="c1">-- change search_path on a database-level
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">database</span><span class="w"> </span><span class="s2">&#34;develop&#34;</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">search_path</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">ecommerce</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div>
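<p>The health check configured for the <code>postgres</code> service behaves like a bounded polling loop: the probe (<code>pg_isready</code>) is run every 5 seconds, and the container is reported as unhealthy after 5 consecutive failures. The sketch below illustrates that idea in plain Python. It is an analogy only, not how Docker implements health checks, and the function name <code>wait_until_ready</code> and its parameters are hypothetical.</p>

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    interval: float = 5.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds or `retries` attempts have failed."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        # Pause before the next attempt, but not after the last one.
        if attempt < retries:
            sleep(interval)
    return False


# Stand-in for `pg_isready`: the probe fails twice, then succeeds.
probe_results = iter([False, False, True])
ready = wait_until_ready(lambda: next(probe_results), interval=0.0, retries=5)
print(ready)
```

<p>In the Compose file, this waiting is handled declaratively by <code>condition: service_healthy</code>, so the dependent services need no such loop of their own.</p>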
<h3 id="data-generator" data-numberify>Data Generator<a class="anchor ms-1" href="#data-generator"></a></h3>
<p>The following Dockerfile is shared by the data generation app and the WebSocket server. It starts from the slim Python 3.10 image, copies <code>requirements.txt</code> and installs the dependencies, then creates a dedicated <strong>user</strong> (<code>app</code>) with a home directory (<code>/home/app</code>). For security, the container runs as the <code>app</code> user instead of root, with <code>/home/app</code> set as the working directory.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-dockerfile" data-lang="dockerfile"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># producer/Dockerfile</span><span class="err">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="err"></span><span class="k">FROM</span><span class="s"> python:3.10-slim</span><span class="err">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="err">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="err"></span><span class="c">## install dependent packages</span><span class="err">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="err"></span><span class="k">COPY</span> requirements.txt requirements.txt<span class="err">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="err">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="err"></span><span class="k">RUN</span> pip install -r requirements.txt<span class="err">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="err">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="err"></span><span class="c">## create a user</span><span class="err">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="err"></span><span class="k">RUN</span> useradd app <span class="o">&amp;&amp;</span> mkdir /home/app <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>    <span class="o">&amp;&amp;</span> chown app:app /home/app<span class="err">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="err">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="err"></span><span class="k">USER</span><span class="s"> app</span><span class="err">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="err"></span><span class="k">WORKDIR</span><span class="s"> /home/app</span><span class="err">
</span></span></span></code></pre></div><p>The data generation service is built from the local Dockerfile, runs in a container named <code>datagen</code>, and connects to the PostgreSQL database using credentials supplied as environment variables. The container executes <code>generator.py</code> with a 0.5-second delay between iterations (<code>--wait_for 0.5</code>) and runs indefinitely (<code>--max_iter -1</code>). The current directory is mounted to <code>/home/app</code> so that the container can access the scripts. The service starts only after the database passes its health check.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># producer/docker-compose.yml</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nn">...</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">datagen</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l">Dockerfile</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">datagen</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span><span class="nt">DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">      </span><span class="nt">DB_PASS</span><span class="p">:</span><span class="w"> </span><span class="l">password</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span><span class="nt">DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span><span class="nt">DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">      </span>- <span class="l">python</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">      </span>- <span class="l">generator.py</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">      </span>- --<span class="l">wait_for</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;0.5&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">      </span>- --<span class="l">max_iter</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;-1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">      </span>- <span class="l">.:/home/app</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">      </span><span class="nt">postgres</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l">service_healthy</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w"></span><span class="nn">...</span><span class="w">
</span></span></span></code></pre></div>
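<p>To make the meaning of the two command-line flags concrete, the sketch below shows one way a script could interpret <code>--wait_for</code> and <code>--max_iter</code>. It is a minimal illustration under assumptions, not the actual <code>generator.py</code> entry point: the helper names <code>parse_args</code> and <code>run_loop</code>, the default values, and the rule that any negative <code>--max_iter</code> means "no limit" are all hypothetical.</p>

```python
import argparse
import time
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Flag names match the Compose command; defaults are assumptions.
    parser = argparse.ArgumentParser(description="Generate records in a loop")
    parser.add_argument("--wait_for", type=float, default=1.0,
                        help="seconds to sleep between iterations")
    parser.add_argument("--max_iter", type=int, default=-1,
                        help="number of iterations; a negative value means no limit")
    return parser.parse_args(argv)


def run_loop(
    step: Callable[[], None],
    wait_for: float,
    max_iter: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `step` repeatedly, sleeping `wait_for` seconds after each call."""
    done = 0
    while max_iter < 0 or done < max_iter:
        step()
        done += 1
        sleep(wait_for)
    return done


# Use a small iteration limit and a no-op sleep so the example ends quickly.
args = parse_args(["--wait_for", "0.5", "--max_iter", "3"])
calls = []
completed = run_loop(lambda: calls.append(1), args.wait_for, args.max_iter,
                     sleep=lambda _: None)
print(completed, len(calls))
```

<p>With <code>--max_iter -1</code>, as in the Compose file, the loop condition never becomes false, so the container keeps generating records until it is stopped.</p>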
<h4 id="data-generator-source" data-numberify>Data Generator Source<a class="anchor ms-1" href="#data-generator-source"></a></h4>
<p>The <em>theLook eCommerce</em> dataset consists of seven entities, five of which are dynamically generated. In each iteration, a <em>user</em> record is created, associated with zero or more orders. Each <em>order</em>, in turn, generates zero or more order items. Finally, each <em>order item</em> produces zero or more <em>event</em> and <em>inventory item</em> records. Once all records are generated, they are ingested into the corresponding database tables using pandas&rsquo; <code>to_sql</code> method.</p>
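<p>The generator source calls <code>asdict</code> with a list of field names, for example <code>["orders"]</code> or <code>["order_items"]</code>. Presumably these are the nested child collections, which are dropped so that the remaining fields form a flat row for a single table. The model classes in <code>src.models</code> are not shown here, so the sketch below is only a guess at how such a method might work; the <code>Order</code> fields and the <code>excludes</code> parameter name are hypothetical.</p>

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Order:
    # Hypothetical fields, for illustration only.
    order_id: int
    status: str
    order_items: List[dict] = field(default_factory=list)

    def asdict(self, excludes: Optional[List[str]] = None) -> Dict[str, Any]:
        """Return the fields as a dict, skipping any names listed in `excludes`."""
        skip = set(excludes or [])
        return {f.name: getattr(self, f.name) for f in fields(self) if f.name not in skip}


order = Order(order_id=1, status="Processing", order_items=[{"id": 10}])
row = order.asdict(["order_items"])
print(row)  # {'order_id': 1, 'status': 'Processing'}
```

<p>Each flat row can then be collected into a per-table list and turned into a DataFrame, while the excluded children are flattened separately into their own tables.</p>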
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">  1</span><span class="cl"><span class="c1"># producer/generator.py</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl"><span class="kn">import</span> <span class="nn">argparse</span>
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="kn">import</span> <span class="nn">time</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl">
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl">
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="kn">from</span> <span class="nn">src.models</span> <span class="kn">import</span> <span class="n">User</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl"><span class="kn">from</span> <span class="nn">src.utils</span> <span class="kn">import</span> <span class="n">create_connection</span><span class="p">,</span> <span class="n">insert_to_db</span><span class="p">,</span> <span class="n">Connection</span><span class="p">,</span> <span class="n">generate_from_csv</span>
</span></span><span class="line"><span class="ln"> 10</span><span class="cl">
</span></span><span class="line"><span class="ln"> 11</span><span class="cl"><span class="n">extraneous_headers</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 12</span><span class="cl">    <span class="s2">&#34;event_type&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl">    <span class="s2">&#34;ip_address&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl">    <span class="s2">&#34;browser&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl">    <span class="s2">&#34;traffic_source&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">    <span class="s2">&#34;session_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">    <span class="s2">&#34;sequence_number&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">    <span class="s2">&#34;uri&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">    <span class="s2">&#34;is_sold&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 20</span><span class="cl"><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">
</span></span><span class="line"><span class="ln"> 23</span><span class="cl"><span class="k">def</span> <span class="nf">write_dynamic_data</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">    <span class="n">conn</span><span class="p">:</span> <span class="n">Connection</span><span class="p">,</span> <span class="n">schema_name</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;ecommerce&#34;</span><span class="p">,</span> <span class="n">if_exists</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;replace&#34;</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">    <span class="n">tbl_map</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">        <span class="s2">&#34;users&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">        <span class="s2">&#34;orders&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">        <span class="s2">&#34;order_items&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">        <span class="s2">&#34;inventory_items&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">        <span class="s2">&#34;events&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">    <span class="n">user</span> <span class="o">=</span> <span class="n">User</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">    <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;start to create user events - user id: </span><span class="si">{</span><span class="n">user</span><span class="o">.</span><span class="n">id</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">    <span class="n">tbl_map</span><span class="p">[</span><span class="s2">&#34;users&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">user</span><span class="o">.</span><span class="n">asdict</span><span class="p">([</span><span class="s2">&#34;orders&#34;</span><span class="p">])])</span>
</span></span><span class="line"><span class="ln"> 36</span><span class="cl">    <span class="n">orders</span> <span class="o">=</span> <span class="n">user</span><span class="o">.</span><span class="n">orders</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">    <span class="n">tbl_map</span><span class="p">[</span><span class="s2">&#34;orders&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">o</span><span class="o">.</span><span class="n">asdict</span><span class="p">([</span><span class="s2">&#34;order_items&#34;</span><span class="p">])</span> <span class="k">for</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">orders</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">    <span class="k">for</span> <span class="n">order</span> <span class="ow">in</span> <span class="n">orders</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 39</span><span class="cl">        <span class="n">order_items</span> <span class="o">=</span> <span class="n">order</span><span class="o">.</span><span class="n">order_items</span>
</span></span><span class="line"><span class="ln"> 40</span><span class="cl">        <span class="n">tbl_map</span><span class="p">[</span><span class="s2">&#34;order_items&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 41</span><span class="cl">            <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 42</span><span class="cl">                <span class="n">o</span><span class="o">.</span><span class="n">asdict</span><span class="p">([</span><span class="s2">&#34;events&#34;</span><span class="p">,</span> <span class="s2">&#34;inventory_items&#34;</span><span class="p">]</span> <span class="o">+</span> <span class="n">extraneous_headers</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 43</span><span class="cl">                <span class="k">for</span> <span class="n">o</span> <span class="ow">in</span> <span class="n">order_items</span>
</span></span><span class="line"><span class="ln"> 44</span><span class="cl">            <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 45</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 46</span><span class="cl">        <span class="k">for</span> <span class="n">order_item</span> <span class="ow">in</span> <span class="n">order_items</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 47</span><span class="cl">            <span class="n">tbl_map</span><span class="p">[</span><span class="s2">&#34;inventory_items&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 48</span><span class="cl">                <span class="p">[</span><span class="n">i</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">order_item</span><span class="o">.</span><span class="n">inventory_items</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 49</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 50</span><span class="cl">            <span class="n">tbl_map</span><span class="p">[</span><span class="s2">&#34;events&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">e</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">order_item</span><span class="o">.</span><span class="n">events</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 51</span><span class="cl">
</span></span><span class="line"><span class="ln"> 52</span><span class="cl">    <span class="k">for</span> <span class="n">tbl</span> <span class="ow">in</span> <span class="n">tbl_map</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 53</span><span class="cl">        <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">tbl_map</span><span class="p">[</span><span class="n">tbl</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 54</span><span class="cl">        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 55</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">if_exists</span><span class="si">}</span><span class="s2"> records, table - </span><span class="si">{</span><span class="n">tbl</span><span class="si">}</span><span class="s2">, # records - </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">            <span class="n">insert_to_db</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">                <span class="n">df</span><span class="o">=</span><span class="n">df</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">                <span class="n">tbl_name</span><span class="o">=</span><span class="n">tbl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">                <span class="n">schema_name</span><span class="o">=</span><span class="n">schema_name</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">                <span class="n">conn</span><span class="o">=</span><span class="n">conn</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">                <span class="n">if_exists</span><span class="o">=</span><span class="n">if_exists</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">                <span class="sa">f</span><span class="s2">&#34;skip records as no user event, table - </span><span class="si">{</span><span class="n">tbl</span><span class="si">}</span><span class="s2">, # records - </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 67</span><span class="cl">
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">
</span></span><span class="line"><span class="ln"> 69</span><span class="cl"><span class="k">def</span> <span class="nf">write_static_data</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 70</span><span class="cl">    <span class="n">conn</span><span class="p">:</span> <span class="n">Connection</span><span class="p">,</span> <span class="n">schema_name</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;ecommerce&#34;</span><span class="p">,</span> <span class="n">if_exists</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;replace&#34;</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">    <span class="n">tbl_map</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">        <span class="s2">&#34;products&#34;</span><span class="p">:</span> <span class="n">generate_from_csv</span><span class="p">(</span><span class="s2">&#34;products.csv&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">        <span class="s2">&#34;dist_centers&#34;</span><span class="p">:</span> <span class="n">generate_from_csv</span><span class="p">(</span><span class="s2">&#34;distribution_centers.csv&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">    <span class="k">for</span> <span class="n">tbl</span> <span class="ow">in</span> <span class="n">tbl_map</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">        <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">tbl_map</span><span class="p">[</span><span class="n">tbl</span><span class="p">])</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">if_exists</span><span class="si">}</span><span class="s2"> records, table - </span><span class="si">{</span><span class="n">tbl</span><span class="si">}</span><span class="s2">, # records - </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">            <span class="n">insert_to_db</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">                <span class="n">df</span><span class="o">=</span><span class="n">df</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">                <span class="n">tbl_name</span><span class="o">=</span><span class="n">tbl</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">                <span class="n">schema_name</span><span class="o">=</span><span class="n">schema_name</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">                <span class="n">conn</span><span class="o">=</span><span class="n">conn</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">                <span class="n">if_exists</span><span class="o">=</span><span class="n">if_exists</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;skip writing, table - </span><span class="si">{</span><span class="n">tbl</span><span class="si">}</span><span class="s2">, # records - </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">
</span></span><span class="line"><span class="ln"> 91</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">wait_for</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">max_iter</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">if_exists</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 92</span><span class="cl">    <span class="n">conn</span> <span class="o">=</span> <span class="n">create_connection</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">    <span class="n">write_static_data</span><span class="p">(</span><span class="n">conn</span><span class="o">=</span><span class="n">conn</span><span class="p">,</span> <span class="n">if_exists</span><span class="o">=</span><span class="s2">&#34;replace&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">    <span class="n">curr_iter</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="ln"> 95</span><span class="cl">    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl">        <span class="n">write_dynamic_data</span><span class="p">(</span><span class="n">conn</span><span class="o">=</span><span class="n">conn</span><span class="p">,</span> <span class="n">if_exists</span><span class="o">=</span><span class="n">if_exists</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">wait_for</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">        <span class="n">curr_iter</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">        <span class="k">if</span> <span class="n">max_iter</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">curr_iter</span> <span class="o">&gt;=</span> <span class="n">max_iter</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">100</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;stop generating records after </span><span class="si">{</span><span class="n">curr_iter</span><span class="si">}</span><span class="s2"> iterations&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="ln">102</span><span class="cl">
</span></span><span class="line"><span class="ln">103</span><span class="cl">
</span></span><span class="line"><span class="ln">104</span><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">    <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">()</span><span class="o">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">    <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&#34;Generate theLook eCommerce data...&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">
</span></span><span class="line"><span class="ln">108</span><span class="cl">    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span><span class="n">description</span><span class="o">=</span><span class="s2">&#34;Generate theLook eCommerce data&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">        <span class="s2">&#34;--if_exists&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">111</span><span class="cl">        <span class="s2">&#34;-i&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">        <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">        <span class="n">default</span><span class="o">=</span><span class="s2">&#34;append&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">        <span class="n">choices</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;fail&#34;</span><span class="p">,</span> <span class="s2">&#34;replace&#34;</span><span class="p">,</span> <span class="s2">&#34;append&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">115</span><span class="cl">        <span class="n">help</span><span class="o">=</span><span class="s2">&#34;How to behave if the table already exists&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">116</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">117</span><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">118</span><span class="cl">        <span class="s2">&#34;--wait_for&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">119</span><span class="cl">        <span class="s2">&#34;-w&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">120</span><span class="cl">        <span class="nb">type</span><span class="o">=</span><span class="nb">float</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">121</span><span class="cl">        <span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">122</span><span class="cl">        <span class="n">help</span><span class="o">=</span><span class="s2">&#34;The time to wait before generating new user records&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">123</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">124</span><span class="cl">    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">125</span><span class="cl">        <span class="s2">&#34;--max_iter&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">126</span><span class="cl">        <span class="s2">&#34;-m&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">127</span><span class="cl">        <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">128</span><span class="cl">        <span class="n">default</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">129</span><span class="cl">        <span class="n">help</span><span class="o">=</span><span class="s2">&#34;The maximum number of iterations to generate user records&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">130</span><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="ln">131</span><span class="cl">    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">132</span><span class="cl">    <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">133</span><span class="cl">    <span class="n">main</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">wait_for</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">max_iter</span><span class="p">,</span> <span class="n">if_exists</span><span class="o">=</span><span class="n">args</span><span class="o">.</span><span class="n">if_exists</span><span class="p">)</span>
</span></span></code></pre></div><p>In the following example, data is generated every two seconds (<code>-w 2</code>).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl">$ python data_gen.py -w <span class="m">2</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">INFO:root:Generate theLook eCommerce data...
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">INFO:root:Namespace<span class="o">(</span><span class="nv">if_exists</span><span class="o">=</span><span class="s1">&#39;append&#39;</span>, <span class="nv">wait_for</span><span class="o">=</span>2.0, <span class="nv">max_iter</span><span class="o">=</span>-1<span class="o">)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">INFO:root:replace records, table - products, <span class="c1"># records - 29120</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">INFO:root:replace records, table - dist_centers, <span class="c1"># records - 10</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">INFO:root:start to create user events - user id: 2a444cd4-aa70-4247-b1c1-9cf9c8cc1924
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">INFO:root:append records, table - users, <span class="c1"># records - 1</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">INFO:root:append records, table - orders, <span class="c1"># records - 1</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">INFO:root:append records, table - order_items, <span class="c1"># records - 2</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">INFO:root:append records, table - inventory_items, <span class="c1"># records - 5</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">INFO:root:append records, table - events, <span class="c1"># records - 14</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">INFO:root:start to create user events - user id: 7d40f7f8-c022-4104-a1a0-9228da07fbe4
</span></span><span class="line"><span class="ln">13</span><span class="cl">INFO:root:append records, table - users, <span class="c1"># records - 1</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">INFO:root:skip records as no user event, table - orders, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">INFO:root:skip records as no user event, table - order_items, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">INFO:root:skip records as no user event, table - inventory_items, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">INFO:root:skip records as no user event, table - events, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">INFO:root:start to create user events - user id: 45f8469c-3e79-40ee-9639-1cb17cd98132
</span></span><span class="line"><span class="ln">19</span><span class="cl">INFO:root:append records, table - users, <span class="c1"># records - 1</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">INFO:root:skip records as no user event, table - orders, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">INFO:root:skip records as no user event, table - order_items, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">INFO:root:skip records as no user event, table - inventory_items, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">INFO:root:skip records as no user event, table - events, <span class="c1"># records - 0</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">INFO:root:start to create user events - user id: 839e353f-07ee-4d77-b1de-2f1af9b12501
</span></span><span class="line"><span class="ln">25</span><span class="cl">INFO:root:append records, table - users, <span class="c1"># records - 1</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">INFO:root:append records, table - orders, <span class="c1"># records - 2</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">INFO:root:append records, table - order_items, <span class="c1"># records - 3</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">INFO:root:append records, table - inventory_items, <span class="c1"># records - 9</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">INFO:root:append records, table - events, <span class="c1"># records - 19</span>
</span></span></code></pre></div><p>Once the data is ingested into the database, the following tables are created in the <em>ecommerce</em> schema.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-02-18-realtime-dashboard-1/diagram.png" loading="lazy" width="887" height="783" />
</picture>

</p>
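<p>The command-line options and the stop rule of the generator can be tried in isolation. The following is a minimal sketch, not part of <code>data_gen.py</code>: <code>build_parser</code> and <code>count_iterations</code> are hypothetical helper names, the help texts are omitted, and <code>hard_cap</code> is an assumption added only so that the sketch terminates when <code>max_iter</code> is not positive (the real loop keeps running in that case).</p>

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the three options of the generator script (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Generate theLook eCommerce data")
    parser.add_argument(
        "--if_exists", "-i", type=str, default="append",
        choices=["fail", "replace", "append"],
    )
    parser.add_argument("--wait_for", "-w", type=float, default=1)
    parser.add_argument("--max_iter", "-m", type=int, default=-1)
    return parser


def count_iterations(max_iter: int, hard_cap: int = 1000) -> int:
    """Replay the stop rule of the main loop without sleeping or writing data.

    hard_cap is an assumption of this sketch only; it bounds the loop when
    max_iter is zero or negative, where the original loop never stops.
    """
    curr_iter = 0
    while curr_iter < hard_cap:
        curr_iter += 1
        # Same condition as the generator: only a positive max_iter stops it.
        if max_iter > 0 and curr_iter >= max_iter:
            break
    return curr_iter


if __name__ == "__main__":
    args = build_parser().parse_args(["-w", "2"])
    print(args)
    print(count_iterations(max_iter=3))
```

<p>Parsing <code>-w 2</code> yields the same values as the <code>Namespace</code> logged in the run above: <code>wait_for</code> becomes the float <code>2.0</code>, while <code>if_exists</code> and <code>max_iter</code> keep their defaults.</p>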

<h3 id="websocket-server" data-numberify>WebSocket Server<a class="anchor ms-1" href="#websocket-server"></a></h3>
<p>The WebSocket server is a FastAPI application served by <code>uvicorn</code>. The <em>producer</em> service is built from the local Dockerfile, exposes port 8000, and connects to the PostgreSQL database using credentials passed as environment variables. It queries data with a 5-minute lookback window and refreshes every 5 seconds. The working directory is mounted at <code>/home/app</code> so the container can access the code, and the service starts only after the PostgreSQL service reports healthy.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># producer/docker-compose.yml</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nn">...</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">producer</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">build</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l">.</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l">Dockerfile</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">producer</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;8000:8000&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span><span class="nt">DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span><span class="nt">DB_PASS</span><span class="p">:</span><span class="w"> </span><span class="l">password</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span><span class="nt">DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">      </span><span class="nt">DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l">develop</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">      </span><span class="nt">LOOKBACK_MINUTES</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">      </span><span class="nt">REFRESH_SECONDS</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">      </span>- <span class="l">uvicorn</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">      </span>- <span class="l">api:app</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">      </span>- --<span class="l">host</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;0.0.0.0&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">      </span>- --<span class="l">port</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">      </span>- <span class="s2">&#34;8000&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">      </span>- <span class="l">.:/home/app</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="w">      </span><span class="nt">postgres</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l">service_healthy</span><span class="w">
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="w"></span><span class="nn">...</span><span class="w">
</span></span></span></code></pre></div>
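<p>To make the two settings concrete, the sketch below shows one way a lookback window could be derived from <code>LOOKBACK_MINUTES</code>. It is an illustration under stated assumptions rather than the server's actual logic: <code>read_int_env</code>, <code>lookback_cutoff</code> and <code>in_window</code> are hypothetical helpers, and the timestamp comparison is assumed, since how the real query applies the window is not shown at this point.</p>

```python
import os
from datetime import datetime, timedelta, timezone


def read_int_env(name: str, default: int) -> int:
    """Read an integer setting, falling back to the default on bad input."""
    try:
        return int(os.getenv(name, str(default)))
    except ValueError:
        return default


def lookback_cutoff(now: datetime, lookback_minutes: int) -> datetime:
    """Return the oldest timestamp that still falls inside the window."""
    return now - timedelta(minutes=lookback_minutes)


def in_window(created_at: datetime, now: datetime, lookback_minutes: int) -> bool:
    """Assumed filter: keep records created at or after the cutoff."""
    return created_at >= lookback_cutoff(now, lookback_minutes)


if __name__ == "__main__":
    lookback = read_int_env("LOOKBACK_MINUTES", 5)
    refresh = read_int_env("REFRESH_SECONDS", 5)
    now = datetime.now(timezone.utc)
    print(f"cutoff={lookback_cutoff(now, lookback).isoformat()}, refresh={refresh}s")
```

<p>With the values in the compose file, each refresh would consider only records from the last five minutes, and a new snapshot would be produced every five seconds.</p>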
<h4 id="websocket-server-source" data-numberify>WebSocket Server Source<a class="anchor ms-1" href="#websocket-server-source"></a></h4>
<p>This FastAPI WebSocket server streams real-time data from a PostgreSQL database. It connects using <em>SQLAlchemy</em>, fetches order-related data with a configurable <em>lookback window</em>, and sends updates every few seconds as defined by <em>refresh seconds</em>. A WebSocket manager handles multiple connections, converting database results into JSON before streaming them. The app continuously queries the database, sending fresh data to connected clients until they disconnect. Logging ensures visibility into connections, queries, and errors.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">  1</span><span class="cl"><span class="c1"># producer/api.py</span>
</span></span><span class="line"><span class="ln">  2</span><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="ln">  3</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln">  4</span><span class="cl"><span class="kn">import</span> <span class="nn">asyncio</span>
</span></span><span class="line"><span class="ln">  5</span><span class="cl">
</span></span><span class="line"><span class="ln">  6</span><span class="cl"><span class="kn">from</span> <span class="nn">sqlalchemy</span> <span class="kn">import</span> <span class="n">create_engine</span><span class="p">,</span> <span class="n">Engine</span><span class="p">,</span> <span class="n">Connection</span>
</span></span><span class="line"><span class="ln">  7</span><span class="cl"><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
</span></span><span class="line"><span class="ln">  8</span><span class="cl"><span class="kn">from</span> <span class="nn">fastapi</span> <span class="kn">import</span> <span class="n">FastAPI</span><span class="p">,</span> <span class="n">WebSocket</span><span class="p">,</span> <span class="n">WebSocketDisconnect</span>
</span></span><span class="line"><span class="ln">  9</span><span class="cl">
</span></span><span class="line"><span class="ln"> 10</span><span class="cl"><span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 11</span><span class="cl">
</span></span><span class="line"><span class="ln"> 12</span><span class="cl"><span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 13</span><span class="cl">    <span class="n">LOOKBACK_MINUTES</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;LOOKBACK_MINUTES&#34;</span><span class="p">,</span> <span class="s2">&#34;5&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 14</span><span class="cl">    <span class="n">REFRESH_SECONDS</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;REFRESH_SECONDS&#34;</span><span class="p">,</span> <span class="s2">&#34;5&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 15</span><span class="cl"><span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 16</span><span class="cl">    <span class="n">LOOKBACK_MINUTES</span> <span class="o">=</span> <span class="mi">5</span>
</span></span><span class="line"><span class="ln"> 17</span><span class="cl">    <span class="n">REFRESH_SECONDS</span> <span class="o">=</span> <span class="mi">5</span>
</span></span><span class="line"><span class="ln"> 18</span><span class="cl">
</span></span><span class="line"><span class="ln"> 19</span><span class="cl">
</span></span><span class="line"><span class="ln"> 20</span><span class="cl"><span class="k">def</span> <span class="nf">get_db_engine</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">Engine</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 21</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Creates and returns a SQLAlchemy engine.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 22</span><span class="cl">    <span class="n">user</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DB_USER&#34;</span><span class="p">,</span> <span class="s2">&#34;develop&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 23</span><span class="cl">    <span class="n">password</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DB_PASS&#34;</span><span class="p">,</span> <span class="s2">&#34;password&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 24</span><span class="cl">    <span class="n">host</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DB_HOST&#34;</span><span class="p">,</span> <span class="s2">&#34;localhost&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 25</span><span class="cl">    <span class="n">db_name</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;DB_NAME&#34;</span><span class="p">,</span> <span class="s2">&#34;develop&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 26</span><span class="cl">
</span></span><span class="line"><span class="ln"> 27</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 28</span><span class="cl">        <span class="k">return</span> <span class="n">create_engine</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 29</span><span class="cl">            <span class="sa">f</span><span class="s2">&#34;postgresql+psycopg2://</span><span class="si">{</span><span class="n">user</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">password</span><span class="si">}</span><span class="s2">@</span><span class="si">{</span><span class="n">host</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="n">db_name</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="n">echo</span><span class="o">=</span><span class="kc">True</span>
</span></span><span class="line"><span class="ln"> 30</span><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="ln"> 31</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 32</span><span class="cl">        <span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Database connection error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 33</span><span class="cl">        <span class="k">raise</span>
</span></span><span class="line"><span class="ln"> 34</span><span class="cl">
</span></span><span class="line"><span class="ln"> 35</span><span class="cl">
</span></span><span class="line"><span class="ln"> 36</span><span class="cl"><span class="k">def</span> <span class="nf">fetch_data</span><span class="p">(</span><span class="n">conn</span><span class="p">:</span> <span class="n">Connection</span><span class="p">,</span> <span class="n">minutes</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 37</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Fetches data from the database with an optional lookback filter.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 38</span><span class="cl">    <span class="n">sql</span> <span class="o">=</span> <span class="s2">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln"> 39</span><span class="cl"><span class="s2">    SELECT
</span></span></span><span class="line"><span class="ln"> 40</span><span class="cl"><span class="s2">        u.id AS user_id
</span></span></span><span class="line"><span class="ln"> 41</span><span class="cl"><span class="s2">        , u.age
</span></span></span><span class="line"><span class="ln"> 42</span><span class="cl"><span class="s2">        , u.gender
</span></span></span><span class="line"><span class="ln"> 43</span><span class="cl"><span class="s2">        , u.country
</span></span></span><span class="line"><span class="ln"> 44</span><span class="cl"><span class="s2">        , u.traffic_source
</span></span></span><span class="line"><span class="ln"> 45</span><span class="cl"><span class="s2">        , o.order_id
</span></span></span><span class="line"><span class="ln"> 46</span><span class="cl"><span class="s2">        , o.id AS item_id
</span></span></span><span class="line"><span class="ln"> 47</span><span class="cl"><span class="s2">        , p.category
</span></span></span><span class="line"><span class="ln"> 48</span><span class="cl"><span class="s2">        , p.cost
</span></span></span><span class="line"><span class="ln"> 49</span><span class="cl"><span class="s2">        , o.status AS item_status
</span></span></span><span class="line"><span class="ln"> 50</span><span class="cl"><span class="s2">        , o.sale_price
</span></span></span><span class="line"><span class="ln"> 51</span><span class="cl"><span class="s2">        , o.created_at
</span></span></span><span class="line"><span class="ln"> 52</span><span class="cl"><span class="s2">    FROM users AS u
</span></span></span><span class="line"><span class="ln"> 53</span><span class="cl"><span class="s2">    JOIN order_items AS o ON u.id = o.user_id
</span></span></span><span class="line"><span class="ln"> 54</span><span class="cl"><span class="s2">    JOIN products AS p ON p.id = o.product_id
</span></span></span><span class="line"><span class="ln"> 55</span><span class="cl"><span class="s2">    &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 56</span><span class="cl">    <span class="k">if</span> <span class="n">minutes</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 57</span><span class="cl">        <span class="n">sql</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">sql</span><span class="si">}</span><span class="s2"> WHERE o.created_at &gt;= current_timestamp - interval &#39;</span><span class="si">{</span><span class="n">minutes</span><span class="si">}</span><span class="s2"> minute&#39;&#34;</span>
</span></span><span class="line"><span class="ln"> 58</span><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 59</span><span class="cl">        <span class="n">sql</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">sql</span><span class="si">}</span><span class="s2"> LIMIT 1&#34;</span>
</span></span><span class="line"><span class="ln"> 60</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 61</span><span class="cl">        <span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_sql</span><span class="p">(</span><span class="n">sql</span><span class="o">=</span><span class="n">sql</span><span class="p">,</span> <span class="n">con</span><span class="o">=</span><span class="n">conn</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 62</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 63</span><span class="cl">        <span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Error reading from database: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 64</span><span class="cl">        <span class="k">return</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 65</span><span class="cl">
</span></span><span class="line"><span class="ln"> 66</span><span class="cl">
</span></span><span class="line"><span class="ln"> 67</span><span class="cl"><span class="n">app</span> <span class="o">=</span> <span class="n">FastAPI</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 68</span><span class="cl">
</span></span><span class="line"><span class="ln"> 69</span><span class="cl">
</span></span><span class="line"><span class="ln"> 70</span><span class="cl"><span class="k">class</span> <span class="nc">ConnectionManager</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 71</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Manages WebSocket connections.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 72</span><span class="cl">
</span></span><span class="line"><span class="ln"> 73</span><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 74</span><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">active_connections</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">WebSocket</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="ln"> 75</span><span class="cl">
</span></span><span class="line"><span class="ln"> 76</span><span class="cl">    <span class="k">async</span> <span class="k">def</span> <span class="nf">connect</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">websocket</span><span class="p">:</span> <span class="n">WebSocket</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 77</span><span class="cl">        <span class="k">await</span> <span class="n">websocket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 78</span><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">active_connections</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">websocket</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 79</span><span class="cl">        <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;New WebSocket connection: </span><span class="si">{</span><span class="n">websocket</span><span class="o">.</span><span class="n">client</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 80</span><span class="cl">
</span></span><span class="line"><span class="ln"> 81</span><span class="cl">    <span class="k">def</span> <span class="nf">disconnect</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">websocket</span><span class="p">:</span> <span class="n">WebSocket</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 82</span><span class="cl">        <span class="k">if</span> <span class="n">websocket</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">active_connections</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 83</span><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">active_connections</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">websocket</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 84</span><span class="cl">            <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;WebSocket disconnected: </span><span class="si">{</span><span class="n">websocket</span><span class="o">.</span><span class="n">client</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 85</span><span class="cl">
</span></span><span class="line"><span class="ln"> 86</span><span class="cl">    <span class="k">async</span> <span class="k">def</span> <span class="nf">send_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">df</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">websocket</span><span class="p">:</span> <span class="n">WebSocket</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 87</span><span class="cl">        <span class="s2">&#34;&#34;&#34;Converts DataFrame to JSON and sends it via WebSocket.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 88</span><span class="cl">        <span class="k">if</span> <span class="ow">not</span> <span class="n">df</span><span class="o">.</span><span class="n">empty</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 89</span><span class="cl">            <span class="k">await</span> <span class="n">websocket</span><span class="o">.</span><span class="n">send_json</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">to_json</span><span class="p">(</span><span class="n">orient</span><span class="o">=</span><span class="s2">&#34;records&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 90</span><span class="cl">
</span></span><span class="line"><span class="ln"> 91</span><span class="cl">
</span></span><span class="line"><span class="ln"> 92</span><span class="cl"><span class="n">manager</span> <span class="o">=</span> <span class="n">ConnectionManager</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 93</span><span class="cl">
</span></span><span class="line"><span class="ln"> 94</span><span class="cl">
</span></span><span class="line"><span class="ln"> 95</span><span class="cl"><span class="nd">@app.websocket</span><span class="p">(</span><span class="s2">&#34;/ws&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 96</span><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">websocket_endpoint</span><span class="p">(</span><span class="n">websocket</span><span class="p">:</span> <span class="n">WebSocket</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 97</span><span class="cl">    <span class="s2">&#34;&#34;&#34;Handles WebSocket connections and continuously streams data.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="ln"> 98</span><span class="cl">    <span class="k">await</span> <span class="n">manager</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">websocket</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 99</span><span class="cl">
</span></span><span class="line"><span class="ln">100</span><span class="cl">    <span class="n">engine</span> <span class="o">=</span> <span class="n">get_db_engine</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">101</span><span class="cl">
</span></span><span class="line"><span class="ln">102</span><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">103</span><span class="cl">        <span class="k">with</span> <span class="n">engine</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span> <span class="k">as</span> <span class="n">conn</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">104</span><span class="cl">            <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">105</span><span class="cl">                <span class="n">df</span> <span class="o">=</span> <span class="n">fetch_data</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">LOOKBACK_MINUTES</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">106</span><span class="cl">                <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Fetched </span><span class="si">{</span><span class="n">df</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s2"> records from database&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">107</span><span class="cl">                <span class="k">await</span> <span class="n">manager</span><span class="o">.</span><span class="n">send_data</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">websocket</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">108</span><span class="cl">                <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">REFRESH_SECONDS</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">109</span><span class="cl">    <span class="k">except</span> <span class="n">WebSocketDisconnect</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">110</span><span class="cl">        <span class="n">manager</span><span class="o">.</span><span class="n">disconnect</span><span class="p">(</span><span class="n">websocket</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">111</span><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">112</span><span class="cl">        <span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;WebSocket error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">113</span><span class="cl">    <span class="k">finally</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">114</span><span class="cl">        <span class="n">engine</span><span class="o">.</span><span class="n">dispose</span><span class="p">()</span>
</span></span></code></pre></div>
<h2 id="deploy-services" data-numberify>Deploy Services<a class="anchor ms-1" href="#deploy-services"></a></h2>
<p>Deploy the Docker Compose services with <code>docker-compose -f producer/docker-compose.yml up -d</code>. Once they are running, check the server with a <a href="https://github.com/lewoudar/ws/" target="_blank" rel="noopener noreferrer">WebSocket client<i class="fas fa-external-link-square-alt ms-1"></i></a> by executing <code>ws listen ws://localhost:8000/ws</code>, and follow its logs with <code>docker logs -f producer</code>.</p>
<p><picture><img class="img-fluid mx-auto d-block" alt="" src="/blog/2025-02-18-realtime-dashboard-1/featured.gif" loading="lazy" width="1835" height="776" />
</picture>

</p>
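<p>One detail worth knowing when reading the client output: <code>send_data</code> passes the string returned by <code>df.to_json(orient="records")</code> to <code>send_json</code>, which serializes it again, so each frame appears to be a JSON string that itself contains a JSON array. The sketch below shows one way a Python consumer might unwrap such a frame. The helper name <code>decode_message</code> and the sample record are assumptions made for illustration and are not part of the producer code.</p>

```python
import json


def decode_message(message: str) -> list[dict]:
    """Decode one text frame sent via send_json(df.to_json(orient="records")).

    The frame is assumed to be JSON-encoded twice, so it is decoded twice:
    first to recover the inner string, then to obtain the list of records.
    """
    inner = json.loads(message)
    # Tolerate a producer that sends the records directly (single encoding).
    return json.loads(inner) if isinstance(inner, str) else inner


# Simulated frame; the field values are made up for illustration only.
frame = json.dumps(json.dumps([{"user_id": 1, "sale_price": 9.5}]))
records = decode_message(frame)
print(records[0]["user_id"])  # prints 1
```

<p>The <code>isinstance</code> check is a small hedge: if the producer were later changed to send the records without the extra encoding, the same helper would still return a list of dictionaries.</p>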
      ]]></content:encoded></item></channel></rss>