Amazon Athena

Data Build Tool (Dbt) Pizza Shop Demo - Part 6 ETL on Amazon Athena via Airflow

March 14, 20248 min read Data Engineering DBT Pizza Shop Demo Amazon Athena Apache Airflow AWS Dbt Docker Python

In Part 5, we developed a dbt project that that targets Apache Iceberg where transformations are performed on Amazon Athena. Two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. To improve query performance, the fact table is denormalized to pre-join records from the dimension tables using the array and struct data types. In this post, we discuss how to set up an ETL process on the project using Apache Airflow.

March 7, 202416 min read Data Engineering DBT Pizza Shop Demo Amazon Athena AWS Dbt Docker Python

In Part 1 and Part 3, we developed data build tool (dbt) projects that target PostgreSQL and BigQuery using fictional pizza shop data. The data is modelled by SCD type 2 dimension tables and one transactional fact table. While the order records should be joined with dimension tables to get complete details for PostgreSQL, the fact table is denormalized using nested and repeated fields to improve query performance for BigQuery. Open Table Formats such as Apache Iceberg bring a new opportunity that implements data warehousing features in a data lake (i.e. data lakehouse) and Amazon Athena is probably the easiest way to perform such tasks on AWS. In this post, we create a new dbt project that targets Apache Iceberg where transformations are performed on Amazon Athena. Data modelling is similar to the BigQuery project where the dimension tables are modelled by the SCD type 2 approach and the fact table is denormalized using the array and struct data types.

March 14, 202312 min read Data Streaming Simplify Streaming Ingestion on AWS Amazon Athena Amazon MSK Apache Kafka AWS AWS Lambda Python

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In part 2, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Athena. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

December 6, 202215 min read Data Engineering DBT for Effective Data Transformation on AWS Amazon Athena Amazon QuickSight AWS AWS Glue Dbt

The data build tool (dbt) is an effective data transformation tool and it supports key AWS analytics services - Redshift, Glue, EMR and Athena. In the last part of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon Athena. Subsets of IMDb data are used as source and data models are developed in multiple layers according to the dbt best practices.

Data Build Tool (Dbt) Pizza Shop Demo - Part 6 ETL on Amazon Athena via Airflow

Data Build Tool (Dbt) Pizza Shop Demo - Part 5 Modelling on Amazon Athena

Simplify Streaming Ingestion on AWS – Part 2 MSK and Athena

Data Build Tool (Dbt) for Effective Data Transformation on AWS – Part 5 Athena