Amazon Athena

Data Build Tool (Dbt) Pizza Shop Demo - Part 6 ETL on Amazon Athena via Airflow

March 14, 2024 | 8 min read | Data Engineering | DBT Pizza Shop Demo | Amazon Athena, Apache Airflow, Apache Iceberg, AWS, Data Build Tool (DBT), Docker, Docker Compose, Python

In Part 5, we developed a dbt project that targets Apache Iceberg where transformations are performed on Amazon Athena. Two dimension tables that keep product and user records are created as Type 2 slowly changing dimension (SCD Type 2) tables, and one transactional fact table is built to keep pizza orders. To improve query performance, the fact table is denormalized to pre-join records from the dimension tables using the array and struct data types. In this post, we discuss how to set up an ETL process on the project using Apache Airflow.

March 7, 2024 | 16 min read | Data Engineering | DBT Pizza Shop Demo | Amazon Athena, Apache Iceberg, AWS, Data Build Tool (DBT), Python

In Part 1 and Part 3, we developed data build tool (dbt) projects that target PostgreSQL and BigQuery using fictional pizza shop data. The data is modelled by SCD Type 2 dimension tables and one transactional fact table. For PostgreSQL, the order records must be joined with the dimension tables to get complete details, whereas for BigQuery the fact table is denormalized using nested and repeated fields to improve query performance. Open table formats such as Apache Iceberg make it possible to implement data warehousing features in a data lake (i.e. a data lakehouse), and Amazon Athena is probably the easiest way to perform such tasks on AWS. In this post, we create a new dbt project that targets Apache Iceberg where transformations are performed on Amazon Athena. Data modelling is similar to the BigQuery project: the dimension tables are modelled by the SCD Type 2 approach and the fact table is denormalized using the array and struct data types.
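For readers new to the modelling approach summarized above, here is a minimal Python sketch of the two ideas: keeping dimension records as SCD Type 2 versions, and denormalizing an order by embedding the dimension attributes that were valid when the order was placed. It is only an illustration; the posts implement this with dbt on Athena, and the column names (`valid_from`, `valid_to`), the `OPEN_END` marker and both helper functions are hypothetical.

```python
from datetime import datetime

# Hypothetical open-ended timestamp marking the current version of a record.
OPEN_END = datetime(2199, 12, 31)


def apply_scd2(history, incoming, key, tracked, now):
    """Return a new history list with SCD Type 2 versioning applied."""
    result = [dict(row) for row in history]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["valid_to"] == OPEN_END}
    for rec in incoming:
        active = current.get(rec[key])
        if active is not None and all(active[c] == rec[c] for c in tracked):
            continue  # nothing changed, keep the current version
        if active is not None:
            active["valid_to"] = now  # close the superseded version
        new_row = {key: rec[key], **{c: rec[c] for c in tracked}}
        new_row.update(valid_from=now, valid_to=OPEN_END)
        result.append(new_row)
        current[rec[key]] = new_row
    return result


def as_of(history, key, value, ts):
    """Pick the version of a dimension record that was valid at ts."""
    for row in history:
        if row[key] == value and row["valid_from"] <= ts < row["valid_to"]:
            return row
    return None


t0, t1 = datetime(2024, 1, 1), datetime(2024, 2, 1)
cols = ["name", "price"]
products = apply_scd2([], [{"product_id": 1, "name": "Margherita", "price": 10.0}], "product_id", cols, t0)
# A price change creates a second version instead of overwriting the first.
products = apply_scd2(products, [{"product_id": 1, "name": "Margherita", "price": 12.0}], "product_id", cols, t1)

# Denormalize an order: embed the product attributes valid at order time,
# similar in spirit to an array of structs in the fact table.
order = {"order_id": 100, "created_at": datetime(2024, 1, 15), "items": [{"product_id": 1, "quantity": 2}]}
fact_row = {
    "order_id": order["order_id"],
    "created_at": order["created_at"],
    "products": [
        {**item, **{c: as_of(products, "product_id", item["product_id"], order["created_at"])[c] for c in cols}}
        for item in order["items"]
    ],
}
```

Because the order was placed before the price change, the embedded product carries the earlier price, which is the behaviour a point-in-time join against an SCD Type 2 dimension is meant to give.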

November 16, 2023 | 16 min read | Data Streaming | Real Time Streaming With Kafka and Flink | Amazon Athena, Amazon MSK, Amazon S3, Apache Flink, Apache Kafka, AWS, Docker, Docker Compose, Pyflink, Python

In this lab, we will create a PyFlink application that exports Kafka topic messages into an S3 bucket. The app enriches the records by adding a new column using a user-defined function and writes them via the FileSystem SQL connector. This allows us to achieve a simpler architecture compared to the original lab, where the records are sent to Amazon Kinesis Data Firehose, enriched by a separate Lambda function and written to an S3 bucket afterwards. While the records are being written to the S3 bucket, a Glue table will be created to query them on Amazon Athena.

October 5, 2023 | 6 min read | Data Streaming | Real Time Streaming With Kafka and Flink | Amazon Athena, Amazon DynamoDB, Amazon MSK, Amazon MSK Connect, Amazon OpenSearch Service, Amazon S3, Apache Camel, Apache Flink, Apache Kafka, AWS, AWS Glue, AWS Lambda, Docker, Docker Compose, Kafka Connect, OpenSearch, Pyflink, Python

This series updates a real-time analytics app based on Amazon Kinesis from an AWS workshop. Data is ingested from multiple sources into a Kafka cluster instead, and Flink (PyFlink) apps are used extensively for data ingestion and processing. As an introduction, this post compares the original architecture with the new architecture, and the app will be implemented in subsequent posts.

March 14, 2023 | 12 min read | Data Streaming | Simplify Streaming Ingestion on AWS | Amazon Athena, Amazon EventBridge, Amazon MSK, Apache Kafka, AWS, AWS Lambda, AWS SAM, Python, Terraform

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In Part 2, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Athena. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

December 6, 2022 | 15 min read | Data Engineering | DBT for Effective Data Transformation on AWS | Amazon Athena, Amazon QuickSight, AWS, Data Build Tool (DBT), Terraform

The data build tool (dbt) is an effective data transformation tool and it supports key AWS analytics services: Redshift, Glue, EMR and Athena. In the last part of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon Athena. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

Data Build Tool (Dbt) Pizza Shop Demo - Part 6 ETL on Amazon Athena via Airflow

Data Build Tool (Dbt) Pizza Shop Demo - Part 5 Modelling on Amazon Athena

Real Time Streaming With Kafka and Flink - Lab 3 Transform and Write Data to S3 From Kafka Using Flink

Real Time Streaming With Kafka and Flink - Introduction

Simplify Streaming Ingestion on AWS – Part 2 MSK and Athena

Data Build Tool (Dbt) for Effective Data Transformation on AWS – Part 5 Athena