SAMZA-303. Edit README to bring it up-to-date.
Martin Kleppmann committed Jun 25, 2014
1 parent 5e34ec9 commit 1a6992d
Showing 1 changed file with 12 additions and 14 deletions.
26 changes: 12 additions & 14 deletions README.md
@@ -1,20 +1,22 @@
 ## What is Samza?

-Apache Incubator Samza is a distributed stream processing framework. It uses <a target="_blank" href="http://kafka.apache.org">Apache Kafka</a> for messaging, and <a target="_blank" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN</a> to provide fault tolerance, processor isolation, security, and resource management.
+[Apache Incubator Samza](http://samza.incubator.apache.org/) is a distributed stream processing framework. It uses [Apache Kafka](http://kafka.apache.org) for messaging, and [Apache Hadoop YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) to provide fault tolerance, processor isolation, security, and resource management.

-* **Simpe API:** Unlike most low-level messaging system APIs, Samza provides a very simple call-back based "process message" API that should be familiar to anyone that's used Map/Reduce.
-* **Managed state:** Samza manages snapshotting and restoration of a stream processor's state. Samza will restore a stream processor's state to a snapshot consistent with the processor's last read messages when the processor is restarted.
-* **Fault tolerance:** Samza will work with YARN to restart your stream processor if there is a machine or processor failure.
-* **Durability:** Samza uses Kafka to guarantee that messages will be processed in the order they were written to a partition, and that no messages will ever be lost.
-* **Scalability:** Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, re-playable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.
+Samza's key features include:
+
+* **Simple API:** Unlike most low-level messaging system APIs, Samza provides a very simple callback-based "process message" API comparable to MapReduce.
+* **Managed state:** Samza manages snapshotting and restoration of a stream processor's state. When the processor is restarted, Samza restores its state to a consistent snapshot. Samza is built to handle large amounts of state (many gigabytes per partition).
+* **Fault tolerance:** Whenever a machine in the cluster fails, Samza works with YARN to transparently migrate your tasks to another machine.
+* **Durability:** Samza uses Kafka to guarantee that messages are processed in the order they were written to a partition, and that no messages are ever lost.
+* **Scalability:** Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, replayable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.
 * **Pluggable:** Though Samza works out of the box with Kafka and YARN, Samza provides a pluggable API that lets you run Samza with other messaging systems and execution environments.
-* **Processor isolation:** Samza works with Apache YARN, which supports processor security through Hadoop's security model, and resource isolation through Linux CGroups.
+* **Processor isolation:** Samza works with Apache YARN, which supports Hadoop's security model, and resource isolation through Linux CGroups.

-Check out [Hello Samza](/startup/hello-samza/0.7.0) to try Samza. Read the [Background](/learn/documentation/0.7.0/introduction/background.html) page to learn more about Samza.
+Check out [Hello Samza](https://samza.incubator.apache.org/startup/hello-samza/0.7.0/) to try Samza. Read the [Background](https://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/background.html) page to learn more about Samza.

 ### Building Samza

-To build Samza from a git checkout or binary release, run:
+To build Samza from a git checkout, run:

 ./gradlew clean build

@@ -59,13 +61,9 @@ To modify a job's checkpoint (assumes that the job is not currently running), gi
 ./gradlew samza-shell:checkpointTool -PconfigPath=file:///path/to/job/config.properties \
 -PnewOffsets=file:///path/to/new/offsets.properties

-#### Maven
-
-Samza uses Kafka, which is not managed by Maven. To use Kafka as though it were a Maven artifact, Samza installs Kafka into a local repository using the `mvn install` command. You must have Maven installed to build Samza.
-
 ### Developers

-To get eclipse projects, run:
+To get Eclipse projects, run:

 ./gradlew eclipse
