
liuq4360/genie

 
 


Genie

Download: https://bintray.com/netflixoss/maven/genie/_latestVersion

In Active Development

This branch contains code in active development towards Genie 3.0. It is not yet ready for use. If you're looking for a version that is ready for production, please see the master branch. If you want to see what we're working on, see the 3.0.0 Milestone.

Introduction

Genie is a federated job execution engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the configurations of many distributed processing clusters, as well as the commands and applications that run on them.
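
As a rough illustration of driving those REST APIs, here is a minimal Python sketch (standard library only) that builds and submits a job request. The endpoint path (/api/v3/jobs), the field names (commandArgs, clusterCriterias, commandCriteria) and the Location header in the response are assumptions about the Genie 3 job model, not something this README states; check the Genie REST API documentation before relying on them.

```python
import json
import urllib.request

# Assumed Genie 3 jobs endpoint; confirm against the REST API docs.
JOBS_PATH = "/api/v3/jobs"


def build_job_request(name, user, version, command_args, cluster_tags, command_tags):
    """Build a job request body for Genie 3.

    Field names are assumptions about the Genie 3 job request model:
    - cluster_tags: list of tag sets; each becomes one cluster criterion
    - command_tags: tag set used to select the command
    """
    return {
        "name": name,
        "user": user,
        "version": version,
        "commandArgs": command_args,
        # Tags are sorted so the generated JSON is deterministic.
        "clusterCriterias": [{"tags": sorted(tags)} for tags in cluster_tags],
        "commandCriteria": sorted(command_tags),
    }


def submit_job(base_url, request_body, timeout_s=30):
    """POST a job request and return the new job's Location header (if any)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + JOBS_PATH,
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.headers.get("Location")
```

For example, build_job_request("hdfs-ls", "demo", "1.0", "dfs -ls input", [{"sched:sla"}], {"type:hdfs"}) produces a body that submit_job("http://localhost:8080", body) would send. The job name and tag values here are hypothetical, not the tags the demo actually configures.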

Builds

Genie builds are run on Travis CI.


Docker

Docker Example

Successful builds that generate SNAPSHOT, release candidate (rc) or final artifacts also produce a Docker image, which is published to Docker Hub. Use docker pull netflixoss/genie-app:{version} to pull the version you want to test.

You can then run it with docker run -t --rm -p 8080:8080 netflixoss/genie-app:{version}

Demo

A demo of Genie 3 exists as part of the source code in this repository. This demo is a work in progress.

Prerequisites:

  • Java 8
  • Docker (tested against v1.12.1)
  • Docker Compose
  • Disk space
    • Four images, currently totaling ~3.3 GB
  • Available ports on the local machine
    • 8080, 8088, 19888, 50070, 50075, 8089, 19889, 50071, 50076
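
Before starting the demo, one quick way to confirm the required ports are free is to try a TCP connection to each of them. Below is a minimal Python sketch using only the standard library. The port-to-role mapping comes from the container list in the Steps section of this README; treating "nothing answers a connect on 127.0.0.1" as "free" is a simplifying assumption, since a port can still fail to bind in unusual network setups.

```python
import socket

# Ports the demo needs, with the role each plays (from this README).
DEMO_PORTS = {
    8080: "Genie UI",
    8088: "prod Resource Manager UI",
    8089: "test Resource Manager UI",
    19888: "prod Job History UI",
    19889: "test Job History UI",
    50070: "prod NameNode (HDFS) UI",
    50071: "test NameNode (HDFS) UI",
    50075: "prod DataNode UI",
    50076: "test DataNode UI",
}


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def busy_demo_ports(ports=DEMO_PORTS):
    """Return {port: role} for every demo port that is already taken."""
    return {port: role for port, role in ports.items() if port_in_use(port)}


if __name__ == "__main__":
    busy = busy_demo_ports()
    for port, role in sorted(busy.items()):
        print(f"port {port} ({role}) is already in use")
    if not busy:
        print("all demo ports are free")
```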

Caveats

  • Since everything runs locally on one machine, the demo can be slow, much slower than you'd expect a production-level system to run
  • Networking within the Hadoop UI can be unreliable because of how DNS resolution works between the containers. If you click a link in the UI and it doesn't work, try replacing the hostname with localhost.
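
The "swap in localhost" workaround above can be done mechanically. This is a small Python sketch using urllib.parse; it replaces only the hostname and keeps the scheme, port, path, query and fragment. That the port in a broken link already matches the port mapped on your machine is an assumption, and the container hostname in the example is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host_for_localhost(url):
    """Replace the hostname in url with localhost, keeping everything else.

    Any user:password part of the original netloc is dropped.
    """
    parts = urlsplit(url)
    netloc = "localhost" if parts.port is None else f"localhost:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, swap_host_for_localhost("http://abc123:8088/cluster/app/x?y=1") returns "http://localhost:8088/cluster/app/x?y=1".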

Steps:

  • Clone the repository
    • git clone git@github.com:Netflix/genie.git or git clone https://github.com/Netflix/genie.git
  • Go to the root of the repo
    • cd genie
  • Start the demo
    • ./gradlew demoStart
    • The first run can take quite a while because it has to download two large images (Genie itself and Hadoop) and build two others (a genie-apache image for serving files and a genie-client image)
    • This uses Docker Compose to bring up five containers, listed as image tag (container name):
      • netflixoss/genie-app:{version} (docker_genie_1)
        • Image from the official Genie build that runs the Genie app server
        • Maps port 8080 for the Genie UI
      • netflixoss/genie-demo-apache:{version} (docker_genie-apache_1)
        • Extension of the Apache image that includes the files Genie downloads during the demo
      • netflixoss/genie-demo-client:{version} (docker_genie-client_1)
        • Simulates a Genie client node and includes several Python scripts to configure and run jobs on Genie
      • sequenceiq/hadoop-docker:2.7.1 (docker_genie-hadoop-prod_1 and docker_genie-hadoop-test_1)
        • Two Hadoop "clusters", one designated prod and one designated test
        • Exposed ports (prod/test)
          • 8088/8089 Resource Manager UI
          • 19888/19889 Job History UI
          • 50070/50071 NameNode (HDFS) UI
          • 50075/50076 DataNode UI
    • Wait a while after the build says SUCCEEDED; the demo is ready once http://localhost:8080 shows the Genie UI
  • Open the Genie UI (http://localhost:8080) and notice that there are currently no jobs, clusters, commands or applications
  • Initialize the configurations for the two clusters (prod and test), three commands (hadoop, hdfs, yarn) and the single application (hadoop)
    • ./gradlew demoInit
  • Check the Genie UI again and notice that clusters, commands and applications now contain data
  • Run some jobs. We recommend running the Hadoop job first so that the others have something interesting to show. Available jobs include:
    • ./gradlew demoRunProdHadoopJob or ./gradlew demoRunTestHadoopJob
      • See the MR job at http://localhost:8088 or http://localhost:8089 respectively
    • ./gradlew demoRunProdHDFSJob or ./gradlew demoRunTestHDFSJob
      • Runs dfs -ls on the input directory in HDFS and writes the results to stdout
    • ./gradlew demoRunProdYarnJob or ./gradlew demoRunTestYarnJob
      • Lists all YARN applications from the Resource Manager to stdout
    • ./gradlew demoRunProdSparkSubmitJob or ./gradlew demoRunTestSparkSubmitJob
      • Runs the SparkPi example with an input of 10 and writes the results to stdout
  • For each of these jobs, you can view its status, output and other information in the Genie UI
  • To see how everything is configured and run, view the scripts in genie-demo/src/main/docker/client/example
  • Once you're done trying everything out, shut down the demo
    • ./gradlew demoStop
    • This stops and removes all the demo containers. The images remain on disk, so if you run the demo again it will start up much faster since nothing needs to be downloaded or built.
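
The wait after ./gradlew demoStart (until http://localhost:8080 shows the Genie UI) can be scripted instead of checked by hand. Below is a minimal Python sketch. It assumes the UI answers a plain GET with HTTP 200 once it is up; the 10-minute timeout and 5-second poll interval are arbitrary choices, not values from the demo.

```python
import http.client
import time
import urllib.request


def wait_for_url(url, timeout_s=600.0, interval_s=5.0):
    """Poll url until it returns HTTP 200; give up after timeout_s seconds.

    Returns True as soon as a GET succeeds with status 200, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or HTTP error status
            # (URLError/HTTPError are OSError subclasses): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, calling wait_for_url("http://localhost:8080") between ./gradlew demoStart and ./gradlew demoInit blocks until the Genie UI responds or the timeout expires.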

Documentation

Support

Please use the Google Group for general questions and discussion.

Issues

You can report bugs and request new features here. Pull requests are always welcome.
