PySpark

Develop and Test Apache Spark Apps for EMR Remotely Using Visual Studio Code

September 7, 2022 · 15 min read · Data Engineering, Amazon EMR, Apache Spark, AWS, PySpark

We will discuss how to set up a remote development environment on an EMR cluster deployed in a private subnet, using a VPN and the VS Code Remote - SSH extension. Typical Spark development examples are illustrated while the cluster is shared among multiple users. Overall, this offers an effective way to develop Spark apps on EMR and significantly improves the developer experience.

June 26, 2022 · 12 min read · Data Engineering, Amazon EMR, Apache Iceberg, Apache Spark, AWS, PySpark, Python

We'll discuss how to implement data warehousing ETL using Iceberg for data storage and management and Spark for data processing. A PySpark ETL app is used for demonstration in an EMR local environment. Finally, the ETL results are queried with Athena for verification.

May 8, 2022 · 17 min read · Data Engineering, Amazon EMR, Apache Spark, AWS, Docker, PySpark

We'll discuss how to create a local Spark development environment for EMR using Docker and/or VS Code. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.

November 14, 2021 · 8 min read · Data Engineering, AWS, AWS Glue, Docker, PySpark, Python

AWS Glue 3.0 was recently released, but a Docker image for this version has not been published. In this post, I'll illustrate how to create a development environment for AWS Glue 3.0 (and later versions) by building a custom Docker image.

August 20, 2021 · 9 min read · Data Engineering, Apache Spark, AWS, AWS Glue, Docker, PySpark, Python

In this post, I'll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using Docker images and the Visual Studio Code Remote - Containers extension.

Develop and Test Apache Spark Apps for EMR Remotely Using Visual Studio Code

Data Warehousing ETL Demo With Apache Iceberg on EMR Local Environment

Develop and Test Apache Spark Apps for EMR Locally Using Docker

Local Development of AWS Glue 3.0 and Later

AWS Glue Local Development With Docker and Visual Studio Code