- Get JDK. Tested with 1.7.0_55 and 1.8.0_91 on OS X.
- Execute ./gradlew clean shadowJar (or ./gradlew.bat clean shadowJar on Windows).
- Execute ./gradlew test (or ./gradlew.bat test on Windows).
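Since the build was only tested with JDK 1.7 and 1.8, it may help to check the installed JDK before running Gradle. The following is a minimal sketch, not part of the project: `java_major` and `installed_java_version` are hypothetical helper names, and the parsing assumes the usual `java -version` output format.

```shell
#!/bin/sh
# Sketch: find out which JDK major version is installed before building.
# java_major and installed_java_version are hypothetical helpers.

# Map a version string such as "1.7.0_55" or "1.8.0_91" to its major number.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                # legacy scheme: 1.7.0_55 -> 7
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;  # newer scheme: 11.0.2 -> 11
  esac
}

# Read the version string the local JDK reports.
# java -version prints to stderr, hence the 2>&1 redirect.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example use (left commented out so that sourcing this file runs nothing):
#   major=$(java_major "$(installed_java_version)")
#   case "$major" in
#     7|8) ./gradlew clean shadowJar ;;
#     *)   echo "untested JDK major version: $major" >&2 ;;
#   esac
```

Other JDK versions may well work; the check only flags that they fall outside what was tested.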
- Download the csv files from http://stat-computing.org/dataexpo/2009/the-data.html and uncompress them.
- Download a Spark distribution from http://spark.apache.org/downloads.html. Tested with spark-2.0.0-bin-hadoop2.7.tgz only.
- Build the project.
- Execute:
spark-2.0.0-bin-hadoop2.7/bin/spark-submit --master "local[*]" --class com.github.saulius.flightstats.JobRunner build/libs/flightstats-all.jar com.github.saulius.flightstats.jobs.ArrivalDelayPredictionJob data
This assumes that Spark was downloaded to the project directory and that the data resides in the data directory at the project root.
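The layout assumptions above can be checked before submitting the job. This is a minimal sketch rather than a script shipped with the project: `preflight` and `run_job` are hypothetical helper names, the three paths are copied from the command above, and the check assumes the uncompressed data files end in `.csv`.

```shell
#!/bin/sh
# Sketch: verify the layout the spark-submit command assumes, then run it.
# preflight and run_job are hypothetical helpers; paths mirror the command above.

SPARK_HOME_DIR="spark-2.0.0-bin-hadoop2.7"
JAR="build/libs/flightstats-all.jar"
DATA_DIR="data"

# Check the project root given as $1.
# Prints each missing piece to stderr; returns 0 only if nothing is missing.
preflight() {
  root="$1"
  status=0
  if [ ! -x "$root/$SPARK_HOME_DIR/bin/spark-submit" ]; then
    echo "missing: $SPARK_HOME_DIR/bin/spark-submit" >&2
    status=1
  fi
  if [ ! -f "$root/$JAR" ]; then
    echo "missing: $JAR (run ./gradlew clean shadowJar)" >&2
    status=1
  fi
  # Assumption: the uncompressed data files use the .csv extension.
  if ! ls "$root/$DATA_DIR"/*.csv >/dev/null 2>&1; then
    echo "missing: csv files in $DATA_DIR/" >&2
    status=1
  fi
  return "$status"
}

# Run the job from the project root given as $1, but only if preflight passes.
run_job() {
  root="$1"
  preflight "$root" || return 1
  (cd "$root" && "$SPARK_HOME_DIR/bin/spark-submit" --master "local[*]" \
    --class com.github.saulius.flightstats.JobRunner "$JAR" \
    com.github.saulius.flightstats.jobs.ArrivalDelayPredictionJob "$DATA_DIR")
}

# Example use from the project root (commented out so sourcing runs nothing):
#   run_job .
```

Keeping the checks in a separate function means they can be run on their own, for example `preflight .`, without starting Spark.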