Merge remote-tracking branch 'apache-github/master' into remove-binaries

Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
pwendell committed Jan 4, 2014
2 parents 9e6f3bd + c4d6145 commit 604fad9
Showing 98 changed files with 436 additions and 1,453 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
*~
*.swp
*.ipr
*.iml
*.iws
.idea/
.settings
.cache
8 changes: 4 additions & 4 deletions README.md
@@ -19,14 +19,14 @@ which can be obtained [here](http://www.scala-sbt.org). To build Spark and its e

Once you've built Spark, the easiest way to start using it is the shell:

./spark-shell
./bin/spark-shell

Or, for the Python API, the Python shell (`./pyspark`).
Or, for the Python API, the Python shell (`./bin/pyspark`).

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./run-example <class> <params>`. For example:
To run one of them, use `./bin/run-example <class> <params>`. For example:

./run-example org.apache.spark.examples.SparkLR local[2]
./bin/run-example org.apache.spark.examples.SparkLR local[2]

will run the Logistic Regression example locally on 2 CPUs.
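
For orientation, a before/after sketch of the user-facing launch scripts relocated by this commit (old paths on the left, new paths on the right, as shown in the diff above; the argument forms are illustrative):

    ./spark-shell                    ->  ./bin/spark-shell
    ./pyspark                        ->  ./bin/pyspark
    ./run-example <class> <params>   ->  ./bin/run-example <class> <params>
    ./spark-class <class> <args>     ->  ./bin/spark-class <class> <args>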

12 changes: 11 additions & 1 deletion assembly/pom.xml
@@ -124,7 +124,17 @@

<profiles>
<profile>
<id>hadoop2-yarn</id>
<id>yarn-alpha</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn-alpha_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>yarn</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
11 changes: 4 additions & 7 deletions assembly/src/main/assembly/assembly.xml
@@ -39,23 +39,20 @@
</fileSet>
<fileSet>
<directory>
${project.parent.basedir}/bin/
${project.parent.basedir}/sbin/
</directory>
<outputDirectory>/bin</outputDirectory>
<outputDirectory>/sbin</outputDirectory>
<includes>
<include>**/*</include>
</includes>
</fileSet>
<fileSet>
<directory>
${project.parent.basedir}
${project.parent.basedir}/bin/
</directory>
<outputDirectory>/bin</outputDirectory>
<includes>
<include>run-example*</include>
<include>spark-class*</include>
<include>spark-shell*</include>
<include>spark-executor*</include>
<include>**/*</include>
</includes>
</fileSet>
</fileSets>
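
With the fileSets above, the generated distribution mirrors the repository's new script layout; a rough sketch of the result (directory roles inferred from the rest of this commit):

    # contents of the assembled distribution (sketch)
    bin/    # user-facing scripts copied from ${project.parent.basedir}/bin/  (spark-shell, pyspark, run-example, spark-class, ...)
    sbin/   # operational scripts copied from ${project.parent.basedir}/sbin/ (e.g. spark-executor, per the scheduler changes below)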
2 changes: 1 addition & 1 deletion bin/compute-classpath.cmd
@@ -29,7 +29,7 @@ rem Load environment variables from conf\spark-env.cmd, if it exists
if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"

rem Build up classpath
set CLASSPATH=%SPARK_CLASSPATH%;%FWDIR%conf
set CLASSPATH=%FWDIR%conf
if exist "%FWDIR%RELEASE" (
for %%d in ("%FWDIR%jars\spark-assembly*.jar") do (
set ASSEMBLY_JAR=%%d
2 changes: 1 addition & 1 deletion bin/compute-classpath.sh
@@ -26,7 +26,7 @@ SCALA_VERSION=2.10
FWDIR="$(cd `dirname $0`/..; pwd)"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e $FWDIR/conf/spark-env.sh ] ; then
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
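
The quotes added around the path matter when Spark is installed under a directory whose name contains spaces; a minimal illustration with a hypothetical location:

    FWDIR="/opt/spark 0.9"               # hypothetical install path containing a space
    [ -e $FWDIR/conf/spark-env.sh ]      # unquoted: word-splits into two arguments, so [ fails with "too many arguments"
    [ -e "$FWDIR/conf/spark-env.sh" ]    # quoted: the path stays one word and the test behaves as intended

The same quoting fix is applied to the other launch scripts below.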

4 changes: 2 additions & 2 deletions pyspark → bin/pyspark
@@ -18,7 +18,7 @@
#

# Figure out where the Scala framework is installed
FWDIR="$(cd `dirname $0`; pwd)"
FWDIR="$(cd `dirname $0`/..; pwd)"

# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"
@@ -37,7 +37,7 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
fi

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e $FWDIR/conf/spark-env.sh ] ; then
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
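
Since the launch scripts now live one level below the Spark home, each of them resolves FWDIR against the parent of its own directory (the Windows wrappers get the analogous `%~dp0..\` change). A sketch of what the new expression yields, assuming an illustrative install path:

    # script located at /opt/spark/bin/pyspark
    FWDIR="$(cd `dirname $0`/..; pwd)"   # `dirname $0` -> /opt/spark/bin; cd .. -> /opt/spark
    export SPARK_HOME="$FWDIR"           # SPARK_HOME still points at the top-level Spark directory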

File renamed without changes.
2 changes: 1 addition & 1 deletion pyspark2.cmd → bin/pyspark2.cmd
@@ -20,7 +20,7 @@ rem
set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
set FWDIR=%~dp0
set FWDIR=%~dp0..\

rem Export this as SPARK_HOME
set SPARK_HOME=%FWDIR%
4 changes: 2 additions & 2 deletions run-example → bin/run-example
@@ -25,13 +25,13 @@ esac
SCALA_VERSION=2.10

# Figure out where the Scala framework is installed
FWDIR="$(cd `dirname $0`; pwd)"
FWDIR="$(cd `dirname $0`/..; pwd)"

# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e $FWDIR/conf/spark-env.sh ] ; then
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi

File renamed without changes.
4 changes: 2 additions & 2 deletions run-example2.cmd → bin/run-example2.cmd
@@ -20,7 +20,7 @@ rem
set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
set FWDIR=%~dp0
set FWDIR=%~dp0..\

rem Export this as SPARK_HOME
set SPARK_HOME=%FWDIR%
@@ -49,7 +49,7 @@ if "x%SPARK_EXAMPLES_JAR%"=="x" (

rem Compute Spark classpath using external script
set DONT_PRINT_CLASSPATH=1
call "%FWDIR%bin\compute-classpath.cmd"
call "%FWDIR%sbin\compute-classpath.cmd"
set DONT_PRINT_CLASSPATH=0
set CLASSPATH=%SPARK_EXAMPLES_JAR%;%CLASSPATH%

6 changes: 3 additions & 3 deletions spark-class → bin/spark-class
@@ -25,13 +25,13 @@ esac
SCALA_VERSION=2.10

# Figure out where the Scala framework is installed
FWDIR="$(cd `dirname $0`; pwd)"
FWDIR="$(cd `dirname $0`/..; pwd)"

# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e $FWDIR/conf/spark-env.sh ] ; then
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi

@@ -92,7 +92,7 @@ JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
# Load extra JAVA_OPTS from conf/java-opts, if it exists
if [ -e $FWDIR/conf/java-opts ] ; then
if [ -e "$FWDIR/conf/java-opts" ] ; then
JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
fi
export JAVA_OPTS
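
Anything placed in conf/java-opts is appended verbatim to JAVA_OPTS by the `cat` above; a hedged example of supplying an extra JVM flag that way (the flag chosen here is arbitrary):

    echo "-XX:+UseCompressedOops" > "$FWDIR/conf/java-opts"   # picked up on the next bin/spark-class launch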
File renamed without changes.
4 changes: 2 additions & 2 deletions spark-class2.cmd → bin/spark-class2.cmd
@@ -20,7 +20,7 @@ rem
set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
set FWDIR=%~dp0
set FWDIR=%~dp0..\

rem Export this as SPARK_HOME
set SPARK_HOME=%FWDIR%
@@ -73,7 +73,7 @@ for %%d in ("%TOOLS_DIR%\target\scala-%SCALA_VERSION%\spark-tools*assembly*.jar"

rem Compute classpath using external script
set DONT_PRINT_CLASSPATH=1
call "%FWDIR%bin\compute-classpath.cmd"
call "%FWDIR%sbin\compute-classpath.cmd"
set DONT_PRINT_CLASSPATH=0
set CLASSPATH=%CLASSPATH%;%SPARK_TOOLS_JAR%

6 changes: 3 additions & 3 deletions spark-shell → bin/spark-shell
@@ -32,7 +32,7 @@ esac
# Enter posix mode for bash
set -o posix

FWDIR="`dirname $0`"
FWDIR="$(cd `dirname $0`/..; pwd)"

for o in "$@"; do
if [ "$1" = "-c" -o "$1" = "--cores" ]; then
@@ -90,10 +90,10 @@ if $cygwin; then
# "Backspace sends ^H" setting in "Keys" section of the Mintty options
# (see https://github.com/sbt/sbt/issues/562).
stty -icanon min 1 -echo > /dev/null 2>&1
$FWDIR/spark-class -Djline.terminal=unix $OPTIONS org.apache.spark.repl.Main "$@"
$FWDIR/bin/spark-class -Djline.terminal=unix $OPTIONS org.apache.spark.repl.Main "$@"
stty icanon echo > /dev/null 2>&1
else
$FWDIR/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
fi

# record the exit status lest it be overwritten:
5 changes: 3 additions & 2 deletions spark-shell.cmd → bin/spark-shell.cmd
@@ -17,6 +17,7 @@ rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

set FWDIR=%~dp0
rem Find the path of sbin
set SBIN=%~dp0..\sbin\

cmd /V /E /C %FWDIR%spark-class2.cmd org.apache.spark.repl.Main %*
cmd /V /E /C %SBIN%spark-class2.cmd org.apache.spark.repl.Main %*
@@ -127,7 +127,7 @@ private[spark] class CoarseMesosSchedulerBackend(
CoarseGrainedSchedulerBackend.ACTOR_NAME)
val uri = conf.get("spark.executor.uri", null)
if (uri == null) {
val runScript = new File(sparkHome, "spark-class").getCanonicalPath
val runScript = new File(sparkHome, "./bin/spark-class").getCanonicalPath
command.setValue(
"\"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %d".format(
runScript, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))
@@ -136,7 +136,7 @@ private[spark] class CoarseMesosSchedulerBackend(
// glob the directory "correctly".
val basename = uri.split('/').last.split('.').head
command.setValue(
"cd %s*; ./spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %d"
"cd %s*; ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %d"
.format(basename, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))
command.addUris(CommandInfo.URI.newBuilder().setValue(uri))
}
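
For reference, the command string the URI branch now hands to Mesos looks roughly like the following (all values are placeholders; the leading directory glob comes from the URI's basename up to its first '.', e.g. a hypothetical spark-0.9.0.tgz yields "spark-0"):

    cd spark-0*; ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend <driverUrl> <slaveId> <hostname> <numCores>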
@@ -102,12 +102,12 @@ private[spark] class MesosSchedulerBackend(
.setEnvironment(environment)
val uri = sc.conf.get("spark.executor.uri", null)
if (uri == null) {
command.setValue(new File(sparkHome, "spark-executor").getCanonicalPath)
command.setValue(new File(sparkHome, "/sbin/spark-executor").getCanonicalPath)
} else {
// Grab everything to the first '.'. We'll use that and '*' to
// glob the directory "correctly".
val basename = uri.split('/').last.split('.').head
command.setValue("cd %s*; ./spark-executor".format(basename))
command.setValue("cd %s*; ./sbin/spark-executor".format(basename))
command.addUris(CommandInfo.URI.newBuilder().setValue(uri))
}
val memory = Resource.newBuilder()
@@ -27,7 +27,7 @@ import org.apache.spark.scheduler.SchedulingMode
/**
* Continuously generates jobs that expose various features of the WebUI (internal testing tool).
*
* Usage: ./run spark.ui.UIWorkloadGenerator [master]
* Usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
*/
private[spark] object UIWorkloadGenerator {

@@ -36,7 +36,7 @@ private[spark] object UIWorkloadGenerator {

def main(args: Array[String]) {
if (args.length < 2) {
println("usage: ./spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]")
println("usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]")
System.exit(1)
}
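
As the corrected usage string indicates, the workload generator is launched through the relocated spark-class script; an illustrative invocation (the master URL and scheduling mode are example values):

    ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator local[4] FIFO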

2 changes: 1 addition & 1 deletion core/src/test/scala/org/apache/spark/DriverSuite.scala
@@ -36,7 +36,7 @@ class DriverSuite extends FunSuite with Timeouts {
forAll(masters) { (master: String) =>
failAfter(60 seconds) {
Utils.executeAndGetOutput(
Seq("./spark-class", "org.apache.spark.DriverWithoutCleanup", master),
Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
new File(sparkHome),
Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
}
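
The updated test is roughly equivalent to running the following from the Spark home directory (a sketch; <master> is one of the masters iterated over above):

    cd "$SPARK_HOME" && SPARK_TESTING=1 SPARK_HOME="$SPARK_HOME" ./bin/spark-class org.apache.spark.DriverWithoutCleanup <master>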
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions docs/bagel-programming-guide.md
@@ -157,8 +157,8 @@ trait Message[K] {

# Where to Go from Here

Two example jobs, PageRank and shortest path, are included in `examples/src/main/scala/org/apache/spark/examples/bagel`. You can run them by passing the class name to the `run-example` script included in Spark; e.g.:
Two example jobs, PageRank and shortest path, are included in `examples/src/main/scala/org/apache/spark/examples/bagel`. You can run them by passing the class name to the `bin/run-example` script included in Spark; e.g.:

./run-example org.apache.spark.examples.bagel.WikipediaPageRank
./bin/run-example org.apache.spark.examples.bagel.WikipediaPageRank

Each example program prints usage help when run without any arguments.
14 changes: 5 additions & 9 deletions docs/building-with-maven.md
@@ -37,20 +37,16 @@ For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop versions wit
# Cloudera CDH 4.2.0 with MapReduce v1
$ mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package

For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you should enable the "hadoop2-yarn" profile and set the "yarn.version" property:
For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with YARN, you should enable the "yarn-alpha" or "yarn" profile and set the "hadoop.version", "yarn.version" property:

# Apache Hadoop 2.0.5-alpha
$ mvn -Phadoop2-yarn -Dhadoop.version=2.0.5-alpha -Dyarn.version=2.0.5-alpha -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -Dyarn.version=2.0.5-alpha -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v2
$ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 -Dyarn.version=2.0.0-chd4.2.0 -DskipTests clean package
$ mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -Dyarn.version=2.0.0-chd4.2.0 -DskipTests clean package

Hadoop versions 2.2.x and newer can be built by setting the ```new-yarn``` and the ```yarn.version``` as follows:

# Apache Hadoop 2.2.X and newer
$ mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn

The build process handles Hadoop 2.2.x as a special case that uses the directory ```new-yarn```, which supports the new YARN API. Furthermore, for this version, the build depends on artifacts published by the spark-project to enable Akka 2.0.5 to work with protobuf 2.5.
# Apache Hadoop 2.2.X ( e.g. 2.2.0 as below ) and newer
$ mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

## Spark Tests in Maven ##

10 changes: 5 additions & 5 deletions docs/index.md
@@ -24,18 +24,18 @@ For its Scala API, Spark {{site.SPARK_VERSION}} depends on Scala {{site.SCALA_VE
# Running the Examples and Shell

Spark comes with several sample programs in the `examples` directory.
To run one of the samples, use `./run-example <class> <params>` in the top-level Spark directory
(the `run-example` script sets up the appropriate paths and launches that program).
For example, try `./run-example org.apache.spark.examples.SparkPi local`.
To run one of the samples, use `./bin/run-example <class> <params>` in the top-level Spark directory
(the `bin/run-example` script sets up the appropriate paths and launches that program).
For example, try `./bin/run-example org.apache.spark.examples.SparkPi local`.
Each example prints usage help when run with no parameters.

Note that all of the sample programs take a `<master>` parameter specifying the cluster URL
to connect to. This can be a [URL for a distributed cluster](scala-programming-guide.html#master-urls),
or `local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by using
`local` for testing.
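
For instance, the same example can be run with a threaded local master (the thread count here is arbitrary):

    ./bin/run-example org.apache.spark.examples.SparkPi local[4]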

Finally, you can run Spark interactively through modified versions of the Scala shell (`./spark-shell`) or
Python interpreter (`./pyspark`). These are a great way to learn the framework.
Finally, you can run Spark interactively through modified versions of the Scala shell (`./bin/spark-shell`) or
Python interpreter (`./bin/pyspark`). These are a great way to learn the framework.

# Launching on a Cluster

4 changes: 2 additions & 2 deletions docs/java-programming-guide.md
@@ -190,9 +190,9 @@ We hope to generate documentation with Java-style syntax in the future.

Spark includes several sample programs using the Java API in
[`examples/src/main/java`](https://github.com/apache/incubator-spark/tree/master/examples/src/main/java/org/apache/spark/examples). You can run them by passing the class name to the
`run-example` script included in Spark; for example:
`bin/run-example` script included in Spark; for example:

./run-example org.apache.spark.examples.JavaWordCount
./bin/run-example org.apache.spark.examples.JavaWordCount

Each example program prints usage help when run
without any arguments.
2 changes: 1 addition & 1 deletion docs/mllib-guide.md
@@ -87,7 +87,7 @@ svmAlg.optimizer.setNumIterations(200)
val modelL1 = svmAlg.run(parsedData)
{% endhighlight %}

Both of the code snippets above can be executed in `spark-shell` to generate a
Both of the code snippets above can be executed in `bin/spark-shell` to generate a
classifier for the provided dataset.

Available algorithms for binary classification:
(Diff output truncated; the remaining changed files are not shown here.)