Host a Docker container for the Spark history server / Spark UI of AWS Glue jobs

ev2900/Glue_Spark_History_Server

AWS Glue Spark UI Logs via a Spark History Server

AWS Glue provides a serverless Spark execution environment. Viewing the Spark UI through a Spark history server can help with performance tuning of Glue jobs.

Glue does not provide a Spark history server. However, Glue jobs can write Spark UI logs to an S3 bucket. You can then host your own Spark history server that visualizes those logs directly from the S3 bucket Glue writes them to.

The instructions below are for Windows. They use a Docker container to run the Spark history server, which reads the Spark UI logs directly from the S3 bucket in real time.

Spark History Server Install

  1. Install Docker Desktop on Windows

  2. Enable Spark UI logs for the Glue job if they are not already enabled. Instructions for enabling the Spark web UI for Glue jobs are provided in the AWS documentation

  3. Download the Docker files that correspond to your Glue version
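For step 2, a Glue job turns on Spark UI logging through job parameters. The Python sketch below assumes the documented Glue special parameters `--enable-spark-ui` and `--spark-event-logs-path`; the function name and the bucket path are hypothetical placeholders. It only builds the argument dictionary, which could then be supplied as the job's default arguments, or as `Arguments` to the boto3 Glue client's `start_job_run` for a single run.

```python
from __future__ import annotations


def spark_ui_job_arguments(event_log_path: str) -> dict[str, str]:
    """Return Glue job arguments that enable Spark UI event logs under event_log_path."""
    if not event_log_path.startswith("s3://"):
        # This sketch only accepts s3:// URIs for the Glue side of the setup.
        raise ValueError(f"expected an s3:// URI, got {event_log_path!r}")
    # Normalize to a trailing slash so the value reads as a folder prefix
    # (a choice made by this sketch, not a documented Glue requirement).
    if not event_log_path.endswith("/"):
        event_log_path += "/"
    return {
        "--enable-spark-ui": "true",                # turn on Spark UI event logging
        "--spark-event-logs-path": event_log_path,  # where Glue writes the logs
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix; use the folder the history server will read.
    print(spark_ui_job_arguments("s3://my-bucket/sparkui-logs"))
```

The same S3 folder is the one the history server is later pointed at, with the scheme changed to `s3a://`.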

  4. Build the Docker image by running the following from the command line in the folder where you downloaded the Docker files

docker build -t glue/sparkui:latest .

  5. Set the following environment variables from the command line (the set "NAME=value" form keeps the quotation marks out of the stored value)
  • Replace <S3_BUCKET_PATH_TO_SPARK_UI_LOGS> with the S3 bucket name and the path to the folder that contains the Spark UI logs from the Glue job(s)

set "LOG_DIR=s3a://<S3_BUCKET_PATH_TO_SPARK_UI_LOGS>"

  • Replace <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> with the access key ID and secret access key of a user that has permission to read the Spark UI files in the S3 bucket

set "AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>"

set "AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>"
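Glue job settings usually refer to the log folder with an `s3://` URI, while the history server reads it through Hadoop's S3A connector, hence the `s3a://` scheme in `LOG_DIR`. The Python sketch below (hypothetical helper names, not part of this repository) rewrites an `s3://` location as `s3a://` and checks that the three variables above are present before the container is started.

```python
from __future__ import annotations

import os

# The three variables the history server container needs.
REQUIRED_VARS = ("LOG_DIR", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def to_s3a(location: str) -> str:
    """Rewrite s3://bucket/prefix (or a bare bucket/prefix) as s3a://bucket/prefix."""
    for scheme in ("s3a://", "s3://"):
        if location.startswith(scheme):
            location = location[len(scheme):]
            break
    return "s3a://" + location.lstrip("/")


def missing_settings(env: dict[str, str] | None = None) -> list[str]:
    """List problems with the history server settings; an empty list means OK."""
    env = dict(os.environ) if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # Tolerate quotes that cmd stores when a value is written as NAME="value".
    log_dir = env.get("LOG_DIR", "").strip('"')
    if log_dir and not log_dir.startswith("s3a://"):
        problems.append(f"LOG_DIR should use s3a://, e.g. {to_s3a(log_dir)}")
    return problems


if __name__ == "__main__":
    for problem in missing_settings():
        print(problem)
```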

  6. Create and start the Docker container by running the following from the command line (the image name must match the tag used in the build step)

docker run -itd -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=%LOG_DIR% -Dspark.hadoop.fs.s3a.access.key=%AWS_ACCESS_KEY_ID% -Dspark.hadoop.fs.s3a.secret.key=%AWS_SECRET_ACCESS_KEY%" -p 18080:18080 --name sparkui glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
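The same `docker run` invocation can be assembled as an argument list, which sidesteps cmd's quoting rules because each list element reaches Docker as one argument. This Python sketch is an illustrative equivalent rather than part of the repository: it assumes the `glue/sparkui:latest` tag from the build step, keeps only `-d` (detached) on the assumption that nothing reads the container's stdin when it is launched from a script, and its function name is hypothetical.

```python
from __future__ import annotations

IMAGE = "glue/sparkui:latest"  # tag produced by the docker build step
HISTORY_SERVER_CMD = (
    "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
)


def docker_run_args(
    log_dir: str,
    access_key_id: str,
    secret_access_key: str,
    host_port: int = 18080,
    name: str = "sparkui",
) -> list[str]:
    """Build the argv for `docker run`, one list element per argument."""
    spark_history_opts = " ".join(
        [
            f"-Dspark.history.fs.logDirectory={log_dir}",
            f"-Dspark.hadoop.fs.s3a.access.key={access_key_id}",
            f"-Dspark.hadoop.fs.s3a.secret.key={secret_access_key}",
        ]
    )
    return [
        "docker", "run", "-d",                         # detached, no TTY requested
        "-e", f"SPARK_HISTORY_OPTS={spark_history_opts}",
        "-p", f"{host_port}:18080",                    # host port -> container port 18080
        "--name", name,
        IMAGE,
        HISTORY_SERVER_CMD,  # one string, as in the command-line version
    ]
```

A caller could then start the container with `subprocess.run(docker_run_args(log_dir, key_id, secret), check=True)`. The secret key still ends up in the container's environment, exactly as with the command-line version.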

  7. The Spark UI will be available at http://localhost:18080/
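Besides the web page, the Spark history server exposes a REST API under `/api/v1`, and `GET /api/v1/applications` lists the applications it has loaded from the log directory. The Python sketch below (standard library only; the function names are hypothetical) fetches that list and reduces each entry to its `id`, `name`, and number of attempts, a quick way to confirm that the Glue runs were picked up.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:18080/api/v1"  # host and port from the steps above


def fetch_applications(base_url: str = BASE_URL, timeout: float = 5.0) -> list[dict]:
    """GET {base_url}/applications and return the decoded JSON list."""
    with urllib.request.urlopen(f"{base_url}/applications", timeout=timeout) as resp:
        return json.load(resp)


def summarize(applications: list[dict]) -> list[tuple[str, str, int]]:
    """Reduce each application entry to (id, name, number of attempts)."""
    return [
        (app.get("id", ""), app.get("name", ""), len(app.get("attempts", [])))
        for app in applications
    ]
```

With the container running, `summarize(fetch_applications())` returns one tuple per application the server lists; an empty list suggests `LOG_DIR` or the credentials need checking.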
