This repo contains the source code for the Spindle V2V distributed map/reduce system. Spindle is written primarily in Scala and contains a number of sbt projects.
- `Shared` contains source code that is shared among multiple Spindle sbt projects.
- `Spark` contains the library source code for client Spark programs.
- `Test-Spark-Program` contains an Apache Spark program that uses the Spindle Spark library to perform distributed map/reduce using Spindle.
- `Vehicle` contains the source code for the Spindle software running on vehicle nodes, as well as the source code for the vehicle network simulator.
- The `docs` directory contains the source code for the http://spindl.network website.
Currently, this system has only been tested on macOS Sierra, CentOS 7, and Raspbian 8 (Jessie). As of this writing, all components of the system can run on macOS Sierra. The Spindle vehicle node software runs on Raspbian; it is compatible with the Raspberry Pi 2 Model B and should be compatible with the Raspberry Pi 3 as well. The "cloud" Kafka cluster is known to run on CentOS and should also work fine on Ubuntu and most other mainstream Linux distributions.
To develop and build software in this repository, you will need to install sbt. I (Rory) also strongly suggest using IntelliJ instead of Eclipse for development. Information about importing sbt projects into IntelliJ is available in the IntelliJ documentation.
To run the Spark programs, download Spark 2.0.1 and add its `bin` and `sbin` folders to your `PATH` environment variable.
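
For example, assuming Spark was unpacked to `~/spark-2.0.1` (adjust the path to wherever you extracted it), something like this in your shell profile would do:

```sh
# Add Spark's bin and sbin directories to PATH (e.g., in ~/.bashrc)
export PATH="$HOME/spark-2.0.1/bin:$HOME/spark-2.0.1/sbin:$PATH"
```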
The "cloud" server should be running Apache Kafka 0.10.2.0. You will want to take a look at the Kafka documentation; for information about starting up the cloud Kafka cluster, see the Kafka quick start guide.
Spindle uses the Typesafe Config library. The configuration files are located in `src/main/resources/application.conf`, and the configurations are loaded by objects declared in `Configuration.scala`. To understand how a particular program can be configured, take a look at its `application.conf`. Of particular note is when a configuration parameter is declared twice, where the second declaration looks something like `foo.bar.baz=${?BIZZ_BUZZ}`. This syntax means you can configure the `foo.bar.baz` property by setting the environment variable `BIZZ_BUZZ` before starting the program. If no environment variable is set, the default value (specified in the first of the two declarations) is used. In this case, it is a good idea to use the environment variable rather than changing the default value.
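
For illustration, such a pair of declarations in `application.conf` looks like this (`foo.bar.baz` and `BIZZ_BUZZ` are the placeholder names from above, not real Spindle settings):

```hocon
# Default value, used when BIZZ_BUZZ is not set
foo.bar.baz = "some-default"
# Optional override: only applied if the BIZZ_BUZZ environment variable is set
foo.bar.baz = ${?BIZZ_BUZZ}
```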
In short, say you have three Raspberry Pis, `pi1`, `pi2`, and `pi3`, you want to make `pi1` the cluster head, and you have `foo.bar.net` as the middleware host running ZooKeeper and Kafka. Essentially, just follow these steps:
- Step 0: Git clone NSL-Spindle in some directory of the dev environment. Go to the `~/NSL-Spindle/Vehicle/Vehicle-Node/src/main/resources/application.conf` file and set the `root-domain` to point to the middleware hostname, as sketched below.
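
  A sketch of that edit (hypothetical: the exact nesting of `root-domain` within `application.conf` may differ, so find the existing key and point it at your middleware host):

  ```hocon
  # Point the vehicle node at the middleware host (example hostname from above)
  root-domain = "foo.bar.net"
  ```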
- Configure the middleware (this should always be the first step):
  - Download and set up Kafka.
  - In `config/server.properties`, set `advertised.listeners` to `PLAINTEXT://middleware_public_ip:Kafka_server_port` (see the sketch after this step).
  - Start ZooKeeper and Kafka.
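
  The `server.properties` change might look like this (a sketch; `foo.bar.net` is the example middleware host and 9092 is Kafka's default port, so substitute your real host and port):

  ```properties
  # Address the broker advertises to clients; must be reachable from the Pis
  advertised.listeners=PLAINTEXT://foo.bar.net:9092
  ```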
- Prepare the jar file:
  - Run `sbt assembly` in the `~/Vehicle-Node/` directory to get the fat jar in the `~/Vehicle-Node/target/scala-2.11/` folder of the master/dev environment.
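
  For example (the exact jar name depends on the project name and version, so treat it as a placeholder):

  ```sh
  cd ~/Vehicle-Node
  sbt assembly
  # The fat jar appears under target/scala-2.11/, e.g.:
  ls target/scala-2.11/*-assembly-*.jar
  ```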
- Run:
  - `ssh` into each Pi.
  - Git clone the repo.
  - Create the folder structure `~/NSL-Spindle/Vehicle/Vehicle-Node/target/scala-2.11` if it does not exist.
  - Deploy/`scp` the jar from the dev environment into the folder specified above (a sketch of this loop is shown below).
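
  A sketch of that deployment loop, run from the dev environment (hypothetical: assumes passwordless `ssh` to each Pi; substitute `<repo-url>` and the jar name for your own):

  ```sh
  # Hypothetical deployment loop; adjust hosts, repo URL, and jar name to your setup
  for host in pi1 pi2 pi3; do
    ssh "$host" "git clone <repo-url> ~/NSL-Spindle; mkdir -p ~/NSL-Spindle/Vehicle/Vehicle-Node/target/scala-2.11"
    scp ~/Vehicle-Node/target/scala-2.11/*-assembly-*.jar "$host":~/NSL-Spindle/Vehicle/Vehicle-Node/target/scala-2.11/
  done
  ```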
  - Set the environment variables `CLUSTERHEAD_BROKER` (for Kafka) and `CLUSTERHEAD_ZK_STRING` (for ZooKeeper) to point to the respective cluster head's Kafka and ZooKeeper configuration in the `Vehicle-Node/src/main/resources/application.conf` file. Also set the `root-domain` variable to point to the middleware host; alternatively, set the `MIDDLEWARE_HOSTNAME` environment variable to point to the middleware host. (Make sure this is done on all the Pis, i.e., all the nodes.)
    - So you would do something like `export CLUSTERHEAD_BROKER=$ClusterHeadIP:9093`, `export CLUSTERHEAD_ZK_STRING=$Clusterhead_IP:2182`, and `export MIDDLEWARE_HOSTNAME=$Middleware_IP`.
  - Set the `advertised.listeners` in `Vehicle-Node/src/main/resources/kafka.props` to `PLAINTEXT://your_public_ip:Kafka_server_port`.
    - If the `listeners` property points to localhost, set it to `PLAINTEXT://0.0.0.0:Kafka_server_port` to listen on all configured network interfaces (see the sketch after this step).
  - Run the jar file from inside the `~/Vehicle-Node` directory.
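
  The `kafka.props` lines might end up looking like this (a sketch; substitute the Pi's actual public IP and Kafka port for the placeholders):

  ```properties
  # Address advertised to peers; must be reachable from other nodes
  advertised.listeners=PLAINTEXT://your_public_ip:Kafka_server_port
  # Listen on all interfaces instead of just localhost
  listeners=PLAINTEXT://0.0.0.0:Kafka_server_port
  ```

  And running the node might look like this (the jar name is a placeholder for whatever `sbt assembly` produced):

  ```sh
  cd ~/Vehicle-Node
  java -jar target/scala-2.11/<vehicle-node-assembly>.jar
  ```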
- Configure Test Spark Program:
  - In the `Test-Spark-Program` `Main.scala` file, set the `StreamConfig` to point to the middleware, configuring it along these lines:

    ```scala
    val stream = NSLUtils.createVStream(ssc, NSLUtils.StreamConfig("middleware_public_ip:zk_port", "middleware_public_ip:kafka_port", TOPIC), new MockQueryUidGenerator)
      .map(foo)
      .reduceByKey{bar}
      .print()
    ```
  - Do `sbt run` from inside the `Test-Spark-Program` directory.
The middleware must be running Kafka and ZooKeeper before the Pis are fired up, otherwise the system WILL crash and Spark WILL crash.