This branch contains code in active development towards Genie 3.0. It is not yet ready for use. If you're looking for a version that is ready for production, please see the master branch. If you want to see what we're working on, see the 3.0.0 Milestone.
Genie is a federated job execution engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing many distributed processing cluster configurations and the commands and applications which run on them.
Genie builds are run on Travis CI here.
Successful builds which generate SNAPSHOT, release candidate (rc) or final artifacts also generate a Docker image, which is published to Docker Hub. You can use `docker pull netflixoss/genie-app:{version}` to test the one you want.
You can run it via `docker run -t --rm -p 8080:8080 netflixoss/genie-app:{version}`.
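The pull and run commands above can be wrapped so the version tag is supplied once. This is only a sketch: the function names `genie_image` and `run_genie` are made up for illustration, and only the image name and the `docker` flags come from the text above.

```shell
#!/bin/sh
# Hypothetical helpers; only the image name and docker flags come from this README.

# Print the image reference for a given Genie version tag.
genie_image() {
    if [ -z "$1" ]; then
        echo "usage: genie_image <version>" >&2
        return 1
    fi
    printf 'netflixoss/genie-app:%s\n' "$1"
}

# Pull the image, then run it in the foreground with port 8080 mapped.
run_genie() {
    image=$(genie_image "$1") || return 1
    docker pull "$image" && docker run -t --rm -p 8080:8080 "$image"
}
```

After sourcing the file, `run_genie "$GENIE_VERSION"` would pull and start the container, where `GENIE_VERSION` is assumed to hold a tag that actually exists on Docker Hub.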
A demo of Genie 3 exists as part of the source code in this repository. This demo is a work in progress.
- Java 8
- Docker (tested against v1.12.1)
- Docker Compose
- Disk Space
  - Four total images currently sizing ~3.3 GB
- Available Ports on local machine
  - 8080, 8088, 19888, 50070, 50075, 8089, 19889, 50071, 50076
- Since all of this runs locally on one machine, it can be slow, much slower than you'd expect production-level systems to run
- Networking is kind of funky within the Hadoop UI due to how DNS works between the containers. If you click a link in the UI and it doesn't work, try swapping in localhost for the hostname.
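Before starting, it may help to confirm that the ports listed above are free. The sketch below is not part of the Genie build: the function names are hypothetical, and it assumes a netcat (`nc`) build that supports the `-z` flag. If `nc` is missing, every port will be reported as free.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the Genie repository.

# Ports the demo needs on the local machine (taken from the list above).
demo_ports() {
    echo "8080 8088 19888 50070 50075 8089 19889 50071 50076"
}

# Succeeds if something is already listening on the given local port.
# Assumes a netcat build that supports -z (connect without sending data).
port_in_use() {
    nc -z localhost "$1" >/dev/null 2>&1
}

# Report each demo port that is taken on stderr; return non-zero if any is.
check_demo_ports() {
    busy=0
    for port in $(demo_ports); do
        if port_in_use "$port"; then
            echo "port $port is already in use" >&2
            busy=1
        fi
    done
    return "$busy"
}
```

Calling `check_demo_ports` after sourcing the file returns 0 when none of the nine ports appears to be taken.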
- Clone the repository
  - `git clone git@github.com:Netflix/genie.git` or `git clone https://github.com/Netflix/genie.git`
- Go to the root of the repo
  - `cd genie`
- Start the demo
  - `./gradlew demoStart`
  - The first time you run this it could take quite a while, as it has to download two large images (Genie itself and Hadoop) and build two others (a genie-apache image for serving files and a genie-client image)
  - This will use Docker Compose to bring up five containers with tags (name):
    - `netflixoss/genie-app:{version}` (`docker_genie_1`)
      - Image from the official Genie build which runs the Genie app server
      - Maps port 8080 for the Genie UI
    - `netflixoss/genie-demo-apache:{version}` (`docker_genie-apache_1`)
      - Extension of the apache image which includes files used during the demo that Genie will download
    - `netflixoss/genie-demo-client:{version}` (`docker_genie-client_1`)
      - Simulates a client node for Genie which includes several Python scripts to configure and run jobs on Genie
    - `sequenceiq/hadoop-docker:2.7.1` (`docker_genie-hadoop-prod_1` and `docker_genie-hadoop-test_1`)
      - Two Hadoop "clusters", one designated prod and one designated test
      - Ports exposed (prod/test)
        - 8088/8089 Resource Manager UI
        - 19888/19889 Job History UI
        - 50070/50071 NameNode (HDFS) UI
        - 50075/50076 DataNode UI
- Wait a while after the build says SUCCEEDED. The demo is ready once http://localhost:8080 shows the Genie UI
- Look at the Genie UI (http://localhost:8080) and notice there are currently no jobs, clusters, commands or applications
- Initialize the configurations for the two clusters (prod and test), three commands (hadoop, hdfs, yarn) and single application (hadoop)
  - `./gradlew demoInit`
- Review the Genie UI again and notice that the clusters, commands and applications now have data in them
- Run some jobs. We recommend running the Hadoop job first so the others have something interesting to show. Available jobs include:
  - `./gradlew demoRunProdHadoopJob` or `./gradlew demoRunTestHadoopJob`
    - See the MR job at http://localhost:8088 or http://localhost:8089 respectively
  - `./gradlew demoRunProdHDFSJob` or `./gradlew demoRunTestHDFSJob`
    - Runs a `dfs -ls` on the input directory on HDFS and stores results in stdout
  - `./gradlew demoRunProdYarnJob` or `./gradlew demoRunTestYarnJob`
    - Lists all yarn applications from the resource manager into stdout
  - `./gradlew demoRunProdSparkSubmitJob` or `./gradlew demoRunTestSparkSubmitJob`
    - Runs the SparkPi example with input of 10. Results stored in stdout
- For each of these jobs you can see their status, output and other information via the Genie UI
- For how everything is configured and run, you can view the scripts in `genie-demo/src/main/docker/client/example`
- Once you're done trying everything out, you can shut down the demo
  - `./gradlew demoStop`
  - This will stop and remove all the containers from the demo. The images will remain on disk, and if you run the demo again it will start up much faster since nothing needs to be downloaded or built.
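The steps above could be scripted roughly as follows. This is a sketch under assumptions, not something shipped with Genie: the helper names (`demo_task`, `wait_for_url`, `run_demo_job`) are hypothetical, `curl` is assumed to be installed, and the 60 attempts at 5-second intervals are arbitrary values. Only the Gradle task names and the UI URL come from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers around the Gradle tasks named in the steps above.

# Map an environment (prod|test) and a job type (hadoop|hdfs|yarn|spark)
# to the matching Gradle task name, for example demoRunProdHadoopJob.
demo_task() {
    case "$1" in
        prod) env_part=Prod ;;
        test) env_part=Test ;;
        *) echo "unknown environment: $1" >&2; return 1 ;;
    esac
    case "$2" in
        hadoop) job_part=Hadoop ;;
        hdfs) job_part=HDFS ;;
        yarn) job_part=Yarn ;;
        spark) job_part=SparkSubmit ;;
        *) echo "unknown job type: $2" >&2; return 1 ;;
    esac
    printf 'demoRun%s%sJob\n' "$env_part" "$job_part"
}

# Poll a URL until it answers without an HTTP error, or give up.
# Arguments: url, maximum attempts, seconds to sleep between attempts.
wait_for_url() {
    url=$1
    attempts=$2
    delay=$3
    while [ "$attempts" -gt 0 ]; do
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Start the demo, wait for the UI, load the configuration, run one job.
# Must be called from the root of the repository, where ./gradlew lives.
run_demo_job() {
    task=$(demo_task "$1" "$2") || return 1
    ./gradlew demoStart &&
        wait_for_url http://localhost:8080 60 5 &&
        ./gradlew demoInit &&
        ./gradlew "$task"
}
```

For example, `run_demo_job prod hadoop` would run the prod Hadoop job once the UI responds. Run `./gradlew demoStop` afterwards to tear the containers down.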
- Netflix Tech Blog Posts
- Presentations
- Netflix OSS Meetups
- 2013 Hadoop Summit
- Genie Github
- Client API Documentation
Please use the Google Group for general questions and discussion.
You can report bugs and request new features here. Pull requests are always welcome.