featured.png

I recently obtained the Confluent Certified Developer for Apache Kafka (CCDAK) certification. It focuses on knowledge of developing applications that work with Kafka, and is targeted to developers and solutions architects. As it assumes Java APIs for development and testing, I am contacted to share how I prepared for it as a non-Java developer from time to time. I thought it would be better to write a post to summarise how I did it rather than answering to them individually.

featured.png

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It can be used to build real-time data pipeline on AWS effectively. In this post, I will introduce available Kafka connectors mainly for AWS services integration. Also, developing and deploying some of them will be covered in later posts.

featured.png

Glue Schema Registry provides a centralized repository for managing and validating schemas for topic message data. Its features can be utilized by many AWS services when building data streaming applications. In this post, we will discuss how to integrate Python Kafka producer and consumer apps in AWS Lambda with the Glue Schema Registry.

featured.png

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In part 2, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Athena. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

featured.png

Streaming ingestion from Kafka (MSK) into Redshift and Athena can be much simpler as they now support direct integration. In part 1, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Redshift. We also use AWS SAM integrated with Terraform for developing the producer Lambda function locally.

featured.png

The data build tool (dbt) is an effective data transformation tool and it supports key AWS analytics services - Redshift, Glue, EMR and Athena. In the last part of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon Athena. Subsets of IMDb data are used as source and data models are developed in multiple layers according to the dbt best practices.

featured.png

The data build tool (dbt) is an effective data transformation tool and it supports key AWS analytics services - Redshift, Glue, EMR and Athena. In part 4 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon EMR on EKS. Subsets of IMDb data are used as source and data models are developed in multiple layers according to the dbt best practices.