Jaehyeon Kim

Tag: Apache Spark

AWS Glue Local Development With Docker and Visual Studio Code

August 20, 2021 | 9 min read | Data Engineering | Apache Spark, AWS, AWS Glue, Docker, PySpark, Python

In this post, I'll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Glue Docker image and the Visual Studio Code Remote - Containers extension.

Boost SparkR With Hive

April 30, 2016 | 8 min read | Data Engineering | Apache Hive, Apache Spark, HiveQL, R, SparkR

One way to boost SparkR's performance as a data processing engine is to manipulate data in a Hive context rather than in the more limited SQL context. In this post, we discuss how to run SparkR in a Hive context.

Quick Start SparkR in Local and Cluster Mode

March 2, 2016 | 8 min read | Data Engineering | Apache Spark, R, SparkR

In this post, we discuss how to run SparkR in local and cluster modes.

Spark Cluster Setup on VirtualBox

February 22, 2016 | 3 min read | Data Engineering | Apache Spark, R, SparkR

We discuss how to set up a Spark cluster across two Ubuntu guests on VirtualBox, beginning with machine preparation.


Jaehyeon Kim

Developer Experience at Factor House

Copyright © 2023-2026 Jaehyeon Kim. All Rights Reserved.