<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Parquet on Jaehyeon Kim</title><link>https://jaehyeon.me/tags/parquet/</link><description>Recent content in Parquet on Jaehyeon Kim</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Copyright © 2023-2026 Jaehyeon Kim. All Rights Reserved.</copyright><lastBuildDate>Thu, 21 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://jaehyeon.me/tags/parquet/index.xml" rel="self" type="application/rss+xml"/><item><title>One Simulation, Two Pipelines: Batch Training and Live Inference with Dynamic DES v0.8.1</title><link>https://jaehyeon.me/blog/2026-05-25-dynamic-des-parquet-support/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://jaehyeon.me/blog/2026-05-25-dynamic-des-parquet-support/</guid><description>Training a machine learning model on simulated data is straightforward until you try to deploy it. The disconnect usually happens at the pipeline level: training requires massive, historical batch data (like Parquet files in an S3 bucket), but production inference requires real-time, event-driven streams (like Kafka or Redis).
Maintaining two separate simulation codebases, one for generating training data and another for streaming live events, introduces friction, schema mismatches, and duplicated engineering effort.</description><enclosure url="https://jaehyeon.me/blog/2026-05-25-dynamic-des-parquet-support/featured.png" length="168665" type="image/png"/></item></channel></rss>