GRPC - Tags - Jaehyeon Kim

Apache Beam Python Examples - Part 6 Call RPC Service in Batch With Defined Batch Size Using Stateful DoFn

October 2, 2024 - 14 min read - Data Streaming - Apache Beam Python Examples - Apache Beam, Apache Flink, Apache Kafka, GRPC, Python

In the previous post, we continued discussing an Apache Beam pipeline that augments input data by calling a Remote Procedure Call (RPC) service. A pipeline was developed that makes a single RPC call for a bundle of elements. The bundle size is determined by the runner, however, and this can be a problem, e.g. if the RPC service becomes considerably slower when many elements are included in a single request. We can improve the pipeline with a stateful DoFn, where the number of elements to process and the maximum wait time in seconds are controlled by state and timers. Note that, although the stateful DoFn used in this post solves the data augmentation task well, in practice we should use built-in transforms such as BatchElements and GroupIntoBatches whenever possible.
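The batching rule described above (flush when a batch reaches a given size, or when its oldest element has waited too long) can be sketched in plain Python. This is only an illustration of the rule and does not use Beam's state and timer API. The class `SizeOrTimeBatcher`, its parameters, and the explicit `now` argument are all hypothetical names chosen for this sketch; in the pipeline the buffer would live in state and the deadline in a timer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SizeOrTimeBatcher:
    """Hypothetical helper: buffers elements and flushes on size or on a deadline."""

    max_batch_size: int
    max_wait_seconds: float
    flush_fn: Callable[[List[str]], None]
    _buffer: List[str] = field(default_factory=list)
    _deadline: Optional[float] = None

    def add(self, element: str, now: float) -> None:
        # Check the deadline first, as if a timer had fired before this element arrived.
        self.on_time(now)
        if not self._buffer:
            # The first element of a new batch sets the deadline.
            self._deadline = now + self.max_wait_seconds
        self._buffer.append(element)
        if len(self._buffer) >= self.max_batch_size:
            self.flush()

    def on_time(self, now: float) -> None:
        # Flush a non-empty buffer whose deadline has passed.
        if self._buffer and self._deadline is not None and now >= self._deadline:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.flush_fn(list(self._buffer))
        self._buffer.clear()
        self._deadline = None


# Each flushed batch stands in for one RPC request.
calls: List[List[str]] = []
batcher = SizeOrTimeBatcher(max_batch_size=3, max_wait_seconds=2.0, flush_fn=calls.append)
for now, word in [(0.0, "a"), (0.5, "b"), (1.0, "c"), (1.5, "d")]:
    batcher.add(word, now=now)
batcher.on_time(now=10.0)  # the deadline for the leftover element has passed
print(calls)
```

The first three elements fill a batch and are flushed by size, while the fourth is flushed only once its deadline passes. Time is passed in explicitly so that the sketch stays deterministic.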

September 18, 2024 - 11 min read - Data Streaming - Apache Beam Python Examples - Apache Beam, Apache Flink, Apache Kafka, GRPC, Python

In the previous post, we developed an Apache Beam pipeline where the input data is augmented by a Remote Procedure Call (RPC) service. An RPC call is made for each input element, and the output is enriched by the response. This is not an efficient way of accessing an external service if the service can accept more than one element per request. In this post, we discuss how to enhance the pipeline so that a single RPC call is made for a bundle of elements, which can save a significant amount of time compared to making a call for each element.
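To make the difference concrete, the following plain-Python sketch counts requests for the two approaches. `FakeRpcService` and its `resolve` method are hypothetical stand-ins, not the service from the posts: the fake only counts calls and returns the length of each word, and no Beam or gRPC code is involved.

```python
from typing import Dict, List, Tuple


class FakeRpcService:
    """Hypothetical stand-in for an RPC service; counts requests, no network involved."""

    def __init__(self) -> None:
        self.num_calls = 0

    def resolve(self, words: List[str]) -> Dict[str, int]:
        # One invocation represents one RPC request, however many words it carries.
        self.num_calls += 1
        return {word: len(word) for word in words}


def enrich_per_element(service: FakeRpcService, words: List[str]) -> List[Tuple[str, int]]:
    # One request per element.
    return [(word, service.resolve([word])[word]) for word in words]


def enrich_per_bundle(service: FakeRpcService, words: List[str]) -> List[Tuple[str, int]]:
    # One request for the whole group of elements.
    lengths = service.resolve(words)
    return [(word, lengths[word]) for word in words]


bundle = ["beam", "flink", "kafka", "grpc"]

per_element_service = FakeRpcService()
per_element_result = enrich_per_element(per_element_service, bundle)

per_bundle_service = FakeRpcService()
per_bundle_result = enrich_per_bundle(per_bundle_service, bundle)

print(per_element_service.num_calls, per_bundle_service.num_calls)
```

Both functions return the same enriched output. Only the number of requests differs, and that is where the time saving would come from when each request carries a fixed overhead.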

August 15, 2024 - 13 min read - Data Streaming - Apache Beam Python Examples - Apache Beam, Apache Flink, Apache Kafka, GRPC, Python

In this post, we develop an Apache Beam pipeline where the input data is augmented by a Remote Procedure Call (RPC) service. An RPC call is made for each input element, and the output is enriched by the response. This is not an efficient way of accessing an external service if the service can accept more than one element per request. In the subsequent two posts, we will discuss updated pipelines that make RPC calls more efficiently. We begin by illustrating how to manage development resources, followed by a demonstration of the RPC service that we use in this series. Finally, we develop a Beam pipeline that accesses the external service to augment the input elements.

Apache Beam Python Examples - Part 6 Call RPC Service in Batch With Defined Batch Size Using Stateful DoFn

Apache Beam Python Examples - Part 5 Call RPC Service in Batch Using Stateless DoFn

Apache Beam Python Examples - Part 4 Call RPC Service for Data Augmentation