
Commit b810a85

spark-shell -> bin/spark-shell
ScrapCodes committed Jan 2, 2014
1 parent 980afd2 commit b810a85
Showing 9 changed files with 15 additions and 15 deletions.
README.md: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ which is packaged with it. To build Spark and its example programs, run:

Once you've built Spark, the easiest way to start using it is the shell:

- ./spark-shell
+ ./bin/spark-shell

Or, for the Python API, the Python shell (`./pyspark`).

docs/index.md: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ to connect to. This can be a [URL for a distributed cluster](scala-programming-g
or `local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by using
`local` for testing.

- Finally, you can run Spark interactively through modified versions of the Scala shell (`./spark-shell`) or
+ Finally, you can run Spark interactively through modified versions of the Scala shell (`./bin/spark-shell`) or
Python interpreter (`./pyspark`). These are a great way to learn the framework.

# Launching on a Cluster

docs/mllib-guide.md: 1 addition & 1 deletion

@@ -87,7 +87,7 @@ svmAlg.optimizer.setNumIterations(200)
val modelL1 = svmAlg.run(parsedData)
{% endhighlight %}

- Both of the code snippets above can be executed in `spark-shell` to generate a
+ Both of the code snippets above can be executed in `bin/spark-shell` to generate a
classifier for the provided dataset.
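
A rough sketch of checking the resulting classifier follows, assuming `parsedData` is the `RDD[LabeledPoint]` built earlier in this guide and the `Array[Double]`-based feature representation of this era's API:

{% highlight scala %}
// Score each training point with the L1-regularized model and compare to its label.
val labelAndPreds = parsedData.map { point =>
  val prediction = modelL1.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
println("Training Error = " + trainErr)
{% endhighlight %}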

Available algorithms for binary classification:

docs/quick-start.md: 2 additions & 2 deletions

@@ -20,7 +20,7 @@ $ sbt/sbt assembly
## Basics

Spark's interactive shell provides a simple way to learn the API, as well as a powerful tool to analyze datasets interactively.
- Start the shell by running `./spark-shell` in the Spark directory.
+ Start the shell by running `./bin/spark-shell` in the Spark directory.

Spark's primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let's make a new RDD from the text of the README file in the Spark source directory:
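
A minimal sketch of that first session might look like this (assuming the shell was launched from the Spark source directory, so `README.md` resolves against it):

{% highlight scala %}
// Inside bin/spark-shell, `sc` is the ready-made SparkContext.
val textFile = sc.textFile("README.md")                    // RDD of the file's lines
textFile.count()                                           // total number of lines
val linesWithSpark = textFile.filter(_.contains("Spark"))  // transformed RDD
linesWithSpark.count()                                     // lines mentioning "Spark"
{% endhighlight %}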

@@ -99,7 +99,7 @@ scala> linesWithSpark.count()
res9: Long = 15
{% endhighlight %}

- It may seem silly to use Spark to explore and cache a 30-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes. You can also do this interactively by connecting `spark-shell` to a cluster, as described in the [programming guide](scala-programming-guide.html#initializing-spark).
+ It may seem silly to use Spark to explore and cache a 30-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes. You can also do this interactively by connecting `bin/spark-shell` to a cluster, as described in the [programming guide](scala-programming-guide.html#initializing-spark).
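
As a small illustration of that workflow (reusing the `linesWithSpark` RDD from the snippets above), caching keeps the data in memory across repeated actions:

{% highlight scala %}
linesWithSpark.cache()   // mark the RDD to be kept in memory once computed
linesWithSpark.count()   // first action computes and caches the partitions
linesWithSpark.count()   // later actions reuse the cached data
{% endhighlight %}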

# A Standalone App in Scala
Now say we wanted to write a standalone application using the Spark API. We will walk through a simple application in Scala (with SBT), Java (with Maven), and Python. If you are using other build systems, consider using the Spark assembly JAR described in the developer guide.

docs/running-on-yarn.md: 1 addition & 1 deletion

@@ -112,7 +112,7 @@ For example:

SPARK_JAR=./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
- MASTER=yarn-client ./spark-shell
+ MASTER=yarn-client ./bin/spark-shell

# Building Spark for Hadoop/YARN 2.2.x

docs/scala-programming-guide.md: 5 additions & 5 deletions

@@ -13,7 +13,7 @@ At a high level, every Spark application consists of a *driver program* that run

A second abstraction in Spark is *shared variables* that can be used in parallel operations. By default, when Spark runs a function in parallel as a set of tasks on different nodes, it ships a copy of each variable used in the function to each task. Sometimes, a variable needs to be shared across tasks, or between tasks and the driver program. Spark supports two types of shared variables: *broadcast variables*, which can be used to cache a value in memory on all nodes, and *accumulators*, which are variables that are only "added" to, such as counters and sums.
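
As a brief illustration (a generic sketch, not taken from a specific section of this guide), both kinds of shared variable are created through the SparkContext:

{% highlight scala %}
// Broadcast variable: a read-only value shipped once and cached on each node.
val factor = sc.broadcast(10)
sc.parallelize(1 to 5).map(_ * factor.value).collect()   // Array(10, 20, 30, 40, 50)

// Accumulator: tasks may only add to it; only the driver reads the final value.
val sum = sc.accumulator(0)
sc.parallelize(1 to 5).foreach(x => sum += x)
println(sum.value)   // 15
{% endhighlight %}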

- This guide shows each of these features and walks through some samples. It assumes some familiarity with Scala, especially with the syntax for [closures](http://www.scala-lang.org/node/133). Note that you can also run Spark interactively using the `spark-shell` script. We highly recommend doing that to follow along!
+ This guide shows each of these features and walks through some samples. It assumes some familiarity with Scala, especially with the syntax for [closures](http://www.scala-lang.org/node/133). Note that you can also run Spark interactively using the `bin/spark-shell` script. We highly recommend doing that to follow along!

# Linking with Spark

@@ -54,16 +54,16 @@ object for more advanced configuration.

The `master` parameter is a string specifying a [Spark or Mesos cluster URL](#master-urls) to connect to, or a special "local" string to run in local mode, as described below. `appName` is a name for your application, which will be shown in the cluster web UI. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
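
For example, a standalone driver running in local mode might create its context like this (a minimal sketch; in the shell `sc` is created for you, as described next):

{% highlight scala %}
import org.apache.spark.SparkContext

// Run locally with four worker threads; "My App" is the name shown in the web UI.
val sc = new SparkContext("local[4]", "My App")
{% endhighlight %}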

- In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `spark-shell` on four cores, use
+ In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `bin/spark-shell` on four cores, use

{% highlight bash %}
- $ MASTER=local[4] ./spark-shell
+ $ MASTER=local[4] ./bin/spark-shell
{% endhighlight %}

Or, to also add `code.jar` to its classpath, use:

{% highlight bash %}
- $ MASTER=local[4] ADD_JARS=code.jar ./spark-shell
+ $ MASTER=local[4] ADD_JARS=code.jar ./bin/spark-shell
{% endhighlight %}

### Master URLs
@@ -95,7 +95,7 @@ If you want to run your application on a cluster, you will need to specify the t
* `sparkHome`: The path at which Spark is installed on your worker machines (it should be the same on all of them).
* `jars`: A list of JAR files on the local machine containing your application's code and any dependencies, which Spark will deploy to all the worker nodes. You'll need to package your application into a set of JARs using your build system. For example, if you're using SBT, the [sbt-assembly](https://github.com/sbt/sbt-assembly) plugin is a good way to make a single JAR with your code and dependencies.

- If you run `spark-shell` on a cluster, you can add JARs to it by specifying the `ADD_JARS` environment variable before you launch it. This variable should contain a comma-separated list of JARs. For example, `ADD_JARS=a.jar,b.jar ./spark-shell` will launch a shell with `a.jar` and `b.jar` on its classpath. In addition, any new classes you define in the shell will automatically be distributed.
+ If you run `bin/spark-shell` on a cluster, you can add JARs to it by specifying the `ADD_JARS` environment variable before you launch it. This variable should contain a comma-separated list of JARs. For example, `ADD_JARS=a.jar,b.jar ./bin/spark-shell` will launch a shell with `a.jar` and `b.jar` on its classpath. In addition, any new classes you define in the shell will automatically be distributed.
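
Putting those pieces together, a driver deploying to a cluster might construct its context roughly as follows (a sketch only; the master URL, Spark path, and JAR name are placeholders):

{% highlight scala %}
import org.apache.spark.SparkContext

val sc = new SparkContext(
  "spark://master-host:7077",                    // cluster master URL (placeholder)
  "My App",                                      // application name shown in the web UI
  "/path/to/spark",                              // sparkHome on the worker machines (placeholder)
  Seq("target/scala-2.10/my-app-assembly.jar"))  // JARs shipped to the workers (placeholder)
{% endhighlight %}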

# Resilient Distributed Datasets (RDDs)

docs/spark-debugger.md: 1 addition & 1 deletion

@@ -39,7 +39,7 @@ where `path/to/event-log` is where you want the event log to go relative to `$SP

### Loading the event log into the debugger

- 1. Run a Spark shell with `MASTER=<i>host</i> ./spark-shell`.
+ 1. Run a Spark shell with `MASTER=<i>host</i> ./bin/spark-shell`.
2. Use `EventLogReader` to load the event log as follows:
{% highlight scala %}
spark> val r = new spark.EventLogReader(sc, Some("path/to/event-log"))

docs/spark-standalone.md: 2 additions & 2 deletions

@@ -143,9 +143,9 @@ constructor](scala-programming-guide.html#initializing-spark).

To run an interactive Spark shell against the cluster, run the following command:

- MASTER=spark://IP:PORT ./spark-shell
+ MASTER=spark://IP:PORT ./bin/spark-shell

- Note that if you are running spark-shell from one of the spark cluster machines, the `spark-shell` script will
+ Note that if you are running spark-shell from one of the spark cluster machines, the `bin/spark-shell` script will
automatically set MASTER from the `SPARK_MASTER_IP` and `SPARK_MASTER_PORT` variables in `conf/spark-env.sh`.

You can also pass an option `-c <numCores>` to control the number of cores that spark-shell uses on the cluster.
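
Once the shell has attached to the standalone master, a quick sanity check (a generic sketch, not part of the linked guide) is to run a small job and confirm its tasks show up in the cluster web UI:

{% highlight scala %}
// Distribute a small range across the cluster and count it back.
sc.parallelize(1 to 100000).count()   // should return 100000
{% endhighlight %}
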
make-distribution.sh: 1 addition & 1 deletion

@@ -34,7 +34,7 @@
# 2) cd to deploy dir; ./bin/start-master.sh
# 3) Verify master is up by visiting web page, ie http://master-ip:8080. Note the spark:// URL.
# 4) ./bin/start-slave.sh 1 <<spark:// URL>>
- # 5) MASTER="spark://my-master-ip:7077" ./spark-shell
+ # 5) MASTER="spark://my-master-ip:7077" ./bin/spark-shell
#

# Figure out where the Spark framework is installed
