Apache Iceberg

Building an Agentic Analytics System Over an Iceberg Lakehouse

July 18, 20266 min read Artificial Intelligence Data Engineering Open Source Agentic AI Apache Iceberg Dynamic-Des Mem0 Odctl Semantic Layer Strands Text-to-SQL Trino WrenAI

Direct text-to-SQL is hard to trust in production because raw schemas do not capture governed metrics or business meaning. This post walks through a local, open-source proof of concept that puts a semantic layer between the language model and the lakehouse, combining Strands, WrenAI, Trino, Iceberg, and long-term agent memory.

May 6, 20256 min read Big Data Data Architecture Data Engineering Data Streaming Apache Flink Apache Iceberg Apache Paimon Fluss

The world of data is converging. The traditional divide between batch processing for historical analytics and stream processing for real-time insights is becoming increasingly blurry. Businesses demand architectures that handle both seamlessly. Enter the “Streamhouse” - an evolution of the Lakehouse concept, designed with streaming as a first-class citizen.

Today, we’ll introduce three key open-source technologies shaping this space: Apache Paimon™, Fluss, and Apache Iceberg. While each has unique strengths, their true power lies in how they can be integrated to build robust, flexible, and performant data platforms.

June 26, 202212 min read Data Engineering Amazon EMR Apache Iceberg Apache Spark AWS PySpark Python

We'll discuss how to implement data warehousing ETL using Iceberg for data storage/management and Spark for data processing. A Pyspark ETL app will be used for demonstration in an EMR local environment. Finally the ETL results will be queried by Athena for verification.

Building an Agentic Analytics System Over an Iceberg Lakehouse

Meet the Streamhouse Trio - Paimon, Fluss, and Iceberg for Unified Data Architectures

Data Warehousing ETL Demo With Apache Iceberg on EMR Local Environment