Kubernetes

Deploy Python Stream Processing App on Kubernetes - Part 2 Beam Pipeline on Flink Runner

June 6, 2024 16 min read Data Streaming Kubernetes Deploy Python Stream Processing App on Kubernetes Apache Beam Apache Flink Apache Kafka Docker Kubernetes Python

In this post, we develop an Apache Beam pipeline using the Python SDK and deploy it on an Apache Flink cluster via the Apache Flink Runner. As in Part 1, we deploy a Kafka cluster using the Strimzi Operator on a minikube cluster because the pipeline uses Apache Kafka topics for its data source and sink. Then, we develop the pipeline as a Python package and add the package to a custom Docker image so that Python user code can be executed externally. For deployment, we create a Flink session cluster via the Flink Kubernetes Operator, and deploy the pipeline using a Kubernetes job. Finally, we check the output of the application by sending messages to the input Kafka topic using a Python producer application.

May 30, 2024 13 min read Data Streaming Kubernetes Deploy Python Stream Processing App on Kubernetes Apache Flink Apache Kafka Docker Kubernetes Python

The Flink Kubernetes Operator acts as a control plane to manage the complete deployment lifecycle of Apache Flink applications. With the operator, we can simplify deployment and management of Python stream processing applications. In this series, we discuss how to deploy a PyFlink application and a Python Apache Beam pipeline running on the Flink Runner on Kubernetes. In Part 1, we first deploy a Kafka cluster on a minikube cluster because the source and sink of the PyFlink application are Kafka topics. Then, the application source is packaged in a custom Docker image and deployed on the minikube cluster using the Flink Kubernetes Operator. Finally, the output of the application is checked by sending messages to the input Kafka topic using a Python producer application.

January 11, 2024 7 min read Data Integration Data Streaming Kubernetes Kafka Development on Kubernetes Apache Kafka Docker Kafka Connect Kubernetes Minikube Python

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. In this post, we discuss how to set up a data ingestion pipeline using Kafka connectors. Fake customer and order data is ingested into Kafka topics using the MSK Data Generator. We also use the Confluent S3 sink connector to save the messages of the topics into an S3 bucket. The Kafka Connect servers and individual connectors are deployed using the custom resources of Strimzi on Kubernetes.

January 4, 2024 8 min read Data Streaming Kubernetes Kafka Development on Kubernetes Apache Kafka Docker Kubernetes Minikube Python Strimzi

Apache Kafka has five core APIs, and we can develop applications to send/read streams of data to/from topics in a Kafka cluster using the producer and consumer APIs. While the main Kafka project maintains only the Java APIs, there are several open source projects that provide the Kafka client APIs in Python. In this post, we discuss how to develop Kafka client applications using the kafka-python package on Kubernetes.

December 21, 2023 7 min read Data Streaming Kubernetes Kafka Development on Kubernetes Apache Kafka Docker Kubernetes Minikube Python Strimzi

Apache Kafka is one of the key technologies for implementing data streaming architectures. Strimzi provides a way to run an Apache Kafka cluster and related resources on Kubernetes in various deployment configurations. In this series of posts, we will discuss how to create a Kafka cluster, develop Kafka client applications in Python, and build a data pipeline using Kafka connectors on Kubernetes.

October 12, 2023 5 min read CKAD Kubernetes

I recently obtained the Certified Kubernetes Application Developer (CKAD) certification. It is for Kubernetes engineers, cloud engineers and other IT professionals responsible for building, deploying, and configuring cloud native applications with Kubernetes. In this post, I will summarise how I prepared for the exam by reviewing three online courses and two practice tests that I went through.

January 17, 2022 14 min read Data Engineering Amazon EKS Amazon EMR Apache Spark AWS EMR on EKS Kubernetes

EMR on EKS is a deployment option in EMR that allows you to automate the provisioning and management of open-source big data frameworks on EKS. It can be an effective way of running Spark jobs to manage big data (as well as non-big data) workloads. In this post, we’ll discuss EMR on EKS with simple and more elaborate examples.

November 1, 2019 12 min read Development Docker Kubernetes Minikube Python R WSL

In this post, I'll demonstrate how to create a Linux development environment on Windows using WSL. I'll also demonstrate an example app (an Rserve web service with a sidecar container) running on Minikube.

Deploy Python Stream Processing App on Kubernetes - Part 2 Beam Pipeline on Flink Runner

Deploy Python Stream Processing App on Kubernetes - Part 1 PyFlink Application

Kafka Development on Kubernetes - Part 3 Kafka Connect

Kafka Development on Kubernetes - Part 2 Producer and Consumer

Kafka Development on Kubernetes - Part 1 Cluster Setup

How I Prepared for Certified Kubernetes Application Developer (CKAD)

EMR on EKS by Example

Linux Dev Environment on Windows