PostgreSQL

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 1 Data Producer

February 18, 2025 · 10 min read · Development · Realtime Dashboard With FastAPI, Streamlit and Next.js · Docker, FastAPI, PostgreSQL, Python, WebSocket

In this series, we develop real-time monitoring dashboard applications. A data-generating app written in Python continuously ingests theLook eCommerce data into a PostgreSQL database. A WebSocket server, built with FastAPI, periodically queries the data to serve its clients. The monitoring dashboards will be developed using Streamlit and Next.js, with Apache ECharts for visualization. In this post, we walk through the data generation app and the backend API; the monitoring dashboards will be discussed in later posts.
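The poll-and-push pattern behind the WebSocket server can be sketched with plain asyncio. This is a minimal, hypothetical illustration rather than the post's code. `poll_and_broadcast` and `fake_query` are made-up names, each `asyncio.Queue` stands in for a connected WebSocket client, and the query is a stub instead of a real PostgreSQL aggregate. In a FastAPI app, the `put()` call would become a send on the WebSocket, for example `await websocket.send_text(message)`.

```python
import asyncio
import json
from typing import Awaitable, Callable


async def poll_and_broadcast(
    query: Callable[[], Awaitable[dict]],
    clients: list,
    interval: float,
    rounds: int,
) -> None:
    """Run `query` every `interval` seconds and push the JSON result to every client.

    Each client is an asyncio.Queue standing in for a WebSocket connection (hypothetical).
    """
    for _ in range(rounds):
        result = await query()
        message = json.dumps(result)
        for client in clients:
            await client.put(message)  # in FastAPI: await websocket.send_text(message)
        await asyncio.sleep(interval)


async def main() -> list:
    counter = {"n": 0}

    async def fake_query() -> dict:
        # Stub for a SQL aggregate over the orders table (hypothetical).
        counter["n"] += 1
        return {"num_orders": counter["n"]}

    clients = [asyncio.Queue(), asyncio.Queue()]
    await poll_and_broadcast(fake_query, clients, interval=0.01, rounds=3)
    # Drain what each simulated client received.
    return [[q.get_nowait() for _ in range(q.qsize())] for q in clients]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running the query once per tick and fanning the same result out to every client keeps database load independent of the number of open dashboards. Whether the actual server is structured this way is an assumption.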

November 7, 2024 · 11 min read · Data Integration · Data Streaming · Change Data Capture (CDC), Debezium, GCP Pub/Sub, PostgreSQL, Pub/Sub Emulator

Change data capture (CDC) is a data integration pattern that tracks changes in a database so that actions can be taken on the changed data. Debezium is probably the most popular open-source platform for CDC. Originally providing Kafka source connectors, it also offers a ready-to-use application called Debezium Server. This standalone application can stream change events to other messaging infrastructure such as Google Cloud Pub/Sub, Amazon Kinesis and Apache Pulsar. In this post, we develop a CDC solution locally using Docker. The source of theLook eCommerce dataset is modified to generate data continuously, and the data is inserted into multiple tables of a PostgreSQL database. Two of those tables are tracked by Debezium Server, which pushes their row-level changes into Pub/Sub topics on the Pub/Sub emulator. Finally, messages on the topics are read by a Python application.
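On the consumer side, each Pub/Sub message carries one Debezium change event. The sketch below assumes Debezium's standard JSON event structure (`before`, `after`, `op`, `source`, `ts_ms`), with or without the outer `schema`/`payload` wrapper. `parse_change_event` and the `orders` table in the example are hypothetical. The input would be the `data` bytes of a received Pub/Sub message.

```python
import json
from typing import Any, Optional

# Debezium operation codes: c = create, u = update, d = delete, r = read (snapshot).
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def parse_change_event(data: bytes) -> Optional[dict]:
    """Flatten a Debezium JSON change event into {op, table, row, ts_ms}.

    Accepts both the envelope with schema ({"schema": ..., "payload": ...})
    and the schemaless form where the payload fields sit at the top level.
    Empty or null bodies (e.g. tombstones) return None.
    """
    if not data:
        return None
    event: Any = json.loads(data)
    if event is None:
        return None
    payload = event["payload"] if "payload" in event else event
    if payload is None:
        return None
    op = payload["op"]
    # For deletes the last row image is in "before"; otherwise use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {
        "op": OP_NAMES.get(op, op),
        "table": (payload.get("source") or {}).get("table"),
        "row": row,
        "ts_ms": payload.get("ts_ms"),
    }


if __name__ == "__main__":
    # Hypothetical insert event on an "orders" table.
    sample = json.dumps({
        "payload": {
            "before": None,
            "after": {"id": 1, "status": "Processing"},
            "source": {"table": "orders"},
            "op": "c",
            "ts_ms": 1700000000000,
        }
    }).encode()
    print(parse_change_event(sample))
```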

January 25, 2024 · 9 min read · Data Engineering · DBT Pizza Shop Demo · Apache Airflow, Dbt, Docker, PostgreSQL, Python

In this series of posts, we discuss data warehouse/lakehouse examples using data build tool (dbt), including ETL orchestration with Apache Airflow. In Part 1, we developed a dbt project on PostgreSQL with fictional pizza shop data: two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. In this post, we discuss how to set up an ETL process for the project using Apache Airflow.
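The SCD Type 2 rule used for the dimension tables can be illustrated in plain Python: close the current version and open a new one whenever a tracked attribute changes. This is only a sketch of the versioning logic, not the dbt models from the series. `DimRow`, `apply_scd2` and the product attributes are hypothetical, and keys missing from the incoming snapshot are left unchanged (deletes are not handled).

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: int                             # natural key, e.g. a product id (hypothetical)
    attrs: tuple                         # tracked attributes, e.g. (name, price)
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: list, incoming: dict, as_of: datetime) -> list:
    """Apply an incoming snapshot {key: attrs} to a dimension as SCD Type 2."""
    current = {row.key: row for row in dim if row.is_current}
    result = []
    for row in dim:
        new_attrs = incoming.get(row.key)
        if row.is_current and new_attrs is not None and new_attrs != row.attrs:
            # Attribute change: close the current version instead of overwriting it.
            result.append(replace(row, valid_to=as_of))
        else:
            result.append(row)
    for key, attrs in incoming.items():
        old = current.get(key)
        if old is None or old.attrs != attrs:
            # New key or changed attributes: open a new current version.
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result


if __name__ == "__main__":
    dim = apply_scd2([], {1: ("Margherita", 10.0)}, datetime(2024, 1, 1))
    dim = apply_scd2(dim, {1: ("Margherita", 12.0)}, datetime(2024, 2, 1))
    for row in dim:
        print(row.key, row.attrs, row.valid_from.date(), row.valid_to, row.is_current)
```

Re-applying an unchanged snapshot leaves the table as it is, so the step is safe to rerun on every scheduled load.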

January 18, 2024 · 15 min read · Data Engineering · DBT Pizza Shop Demo · Dbt, Docker, PostgreSQL, Python

The data build tool (dbt) is a popular data transformation tool for data warehouse development. It can also be used for data lakehouse development thanks to open table formats such as Apache Iceberg, Apache Hudi and Delta Lake. dbt supports key AWS analytics services, and I wrote a series of posts that discuss how to utilise dbt with Redshift, Glue, EMR on EC2, EMR on EKS, and Athena. Those posts focus on platform integration; however, they do not show realistic ETL scenarios. In this series of posts, we discuss practical data warehouse/lakehouse examples, including ETL orchestration with Apache Airflow. As a starting point, in this post we develop a dbt project on PostgreSQL using fictional pizza shop data.

February 6, 2022 · 13 min read · Development · Amazon Aurora, AWS, PostgreSQL, SoftEther VPN, Terraform

We'll discuss how to set up a development infrastructure on AWS with Terraform, which provides an effective way of managing AWS resources. An Aurora PostgreSQL cluster is created in a private subnet, and SoftEther VPN is configured so that the database can be accessed from the developer machine.

Realtime Dashboard With FastAPI, Streamlit and Next.js - Part 1 Data Producer

Change Data Capture (CDC) Local Development With PostgreSQL, Debezium Server and Pub/Sub Emulator

Data Build Tool (Dbt) Pizza Shop Demo - Part 2 ETL on PostgreSQL via Airflow

Data Build Tool (Dbt) Pizza Shop Demo - Part 1 Modelling on PostgreSQL

Simplify Your Development on AWS With Terraform