Apache Spark

AWS Glue Local Development With Docker and Visual Studio Code

August 20, 2021 | 9 min read | Data Engineering, Apache Spark, AWS, AWS Glue, Docker, PySpark, Python

In this post, I'll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Docker image and the Visual Studio Code Remote - Containers extension.

Boost SparkR With Hive

April 30, 2016 | 8 min read | Data Engineering, Apache Hive, Apache Spark, HiveQL, R, SparkR

One way to boost SparkR's performance as a data processing engine is to manipulate data in a Hive Context rather than in the more limited SQL Context. In this post, we discuss how to run SparkR in a Hive Context.

Quick Start SparkR in Local and Cluster Mode

March 2, 2016 | 8 min read | Data Engineering, Apache Spark, R, SparkR

In this post, we discuss how to run SparkR in local mode and in cluster mode.

Spark Cluster Setup on VirtualBox

February 22, 2016 | 3 min read | Data Engineering, Apache Spark, R, SparkR

We discuss how to set up a Spark cluster across two Ubuntu guests, beginning with machine preparation.