Blogs

Kafka Connect for AWS Services Integration - Part 4 Develop Aiven OpenSearch Sink Connector

October 23, 2023 12 min read Data Integration Data Streaming Kafka Connect for AWS Services Integration Apache Kafka AWS Docker Kafka Connect Kpow OpenSearch

Kafka Connect can be an effective tool to ingest data from Apache Kafka into OpenSearch. In this post, we will discuss how to develop a data pipeline from Apache Kafka into OpenSearch locally using Docker; the pipeline will be deployed on AWS in the next post. Fake impressions and clicks data will be pushed into Kafka topics using a Kafka source connector, and those records will be ingested into OpenSearch indexes using a sink connector for near-real-time analytics.

October 19, 2023 6 min read Data Streaming Apache Flink Apache Kafka Docker Pyflink Python

Building Apache Flink Applications in Java by Confluent is a course that introduces Apache Flink through a series of hands-on exercises. Utilising the Flink DataStream API, the course develops three Flink applications, from ingesting source data to calculating usage statistics. As part of learning the Flink DataStream API in Pyflink, I converted the Java apps into their Python equivalents while performing the course exercises in Pyflink. This post summarises the progress of the conversion and shows the final output.

October 12, 2023 5 min read CKAD Kubernetes

I recently obtained the Certified Kubernetes Application Developer (CKAD) certification. It is for Kubernetes engineers, cloud engineers and other IT professionals responsible for building, deploying, and configuring cloud native applications with Kubernetes. In this post, I will summarise how I prepared for the exam by reviewing three online courses and two practice tests that I went through.

October 5, 2023 6 min read Data Streaming Real Time Streaming With Kafka and Flink Amazon MSK Apache Flink Apache Kafka AWS Pyflink

This series updates a real-time analytics app based on Amazon Kinesis from an AWS workshop. Data is ingested from multiple sources into a Kafka cluster instead, and Flink (Pyflink) apps are used extensively for data ingestion and processing. As an introduction, this post compares the original architecture with the new architecture, and the app will be implemented in subsequent posts.

September 14, 2023 17 min read Data Streaming Kafka, Flink and DynamoDB for Real Time Fraud Detection Amazon DynamoDB Apache Flink Apache Kafka AWS Kpow Python

This series aims to help those who are new to Apache Flink and Amazon Managed Service for Apache Flink by re-implementing a simple fraud detection application that is discussed in an AWS workshop titled AWS Kafka and DynamoDB for real time fraud detection. In part 1, I demonstrated how to develop the application locally, and the app will be deployed via Amazon Managed Service for Apache Flink in this post.

September 4, 2023 13 min read Data Streaming Getting Started With Pyflink on AWS Amazon MSK Apache Kafka AWS Docker Kpow Pyflink Python

In this series of posts, we discuss a Flink (Pyflink) application that reads/writes from/to Kafka topics. In the previous posts, I demonstrated a Pyflink app that targets a local Kafka cluster as well as a Kafka cluster on Amazon MSK. The app was executed in a virtual environment as well as in a local Flink cluster for improved monitoring. In this post, the app will be deployed via Amazon Managed Service for Apache Flink.

August 28, 2023 20 min read Data Streaming Getting Started With Pyflink on AWS Amazon MSK Apache Flink Apache Kafka AWS Kpow Pyflink Python

In this series of posts, we discuss a Flink (Pyflink) application that reads/writes from/to Kafka topics. In part 1, an app that targets a local Kafka cluster was created. In this post, we will update the app by connecting it to a Kafka cluster on Amazon MSK. The Kafka cluster is authenticated by IAM and the app has an additional jar dependency. As Amazon Managed Service for Apache Flink does not allow you to specify multiple pipeline jar files, we have to build a custom Uber Jar that combines multiple jar files. As in part 1, the app will be executed in a virtual environment as well as in a local Flink cluster for improved monitoring, this time with the updated pipeline jar file.

August 17, 2023 16 min read Data Streaming Getting Started With Pyflink on AWS Apache Flink Apache Kafka Docker Kpow Pyflink Python

Apache Flink is widely used for building real-time stream processing applications. On AWS, Amazon Managed Service for Apache Flink is the easiest option to develop a Flink app as it provides the underlying infrastructure. Updating a guide from AWS, this series of posts discusses how to develop and deploy a Flink (Pyflink) application via the managed service (formerly Kinesis Data Analytics, KDA) where the data source and sink are Kafka topics. In part 1, the app will be developed locally targeting a Kafka cluster created by Docker. Furthermore, it will be executed in a virtual environment as well as in a local Flink cluster for improved monitoring.

August 10, 2023 16 min read Data Streaming Kafka, Flink and DynamoDB for Real Time Fraud Detection Amazon DynamoDB Apache Flink Apache Kafka AWS Docker Kpow Python

Apache Flink is widely used for building real-time stream processing applications. On AWS, Amazon Managed Service for Apache Flink is the easiest option to develop a Flink app as it provides the underlying infrastructure. Re-implementing a solution from an AWS workshop, this series of posts discusses how to develop and deploy a fraud detection app using Kafka, Flink and DynamoDB. Part 1 covers local development using Docker, while deployment via the managed service (formerly Kinesis Data Analytics, KDA) will be discussed in part 2.

July 20, 2023 14 min read Data Streaming Security Kafka Development With Docker Apache Kafka Docker Python

In the previous posts, we discussed how to implement client authentication by TLS (SSL or TLS/SSL) and by SASL. One of the key benefits of client authentication is that it enables user access control. In this post, we will discuss how to configure Kafka authorization with Java and Python client examples, while SASL is kept for client authentication.

Kafka Connect for AWS Services Integration - Part 4 Develop Aiven OpenSearch Sink Connector

Building Apache Flink Applications in Python

How I Prepared for Certified Kubernetes Application Developer (CKAD)

Real Time Streaming With Kafka and Flink - Introduction

Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 2 Deployment via AWS Managed Flink

Getting Started With Pyflink on AWS - Part 3 AWS Managed Flink and MSK

Getting Started With Pyflink on AWS - Part 2 Local Flink and MSK

Getting Started With Pyflink on AWS - Part 1 Local Flink and Local Kafka

Kafka, Flink and DynamoDB for Real Time Fraud Detection - Part 1 Local Development

Kafka Development With Docker - Part 11 Kafka Authorization