Skip to content

Simple script for processing streaming data from Redis using Apache Spark

License

Notifications You must be signed in to change notification settings

dustin-decker/LogNom

Repository files navigation

LogNom

Provides wicked fast and scalable streaming log nomming via Apache Spark. Spark is useful for simple processing and filtering distributed tasks to complex data analysis.

data -> message broker -> Apache Spark nomming -> message broker -> elasticsearch

Architecture

Minimal requirements

  • Apache Spark v1.6.1 (provided by maven) Using a local cluster for now. If you want to use an external Spark cluster (I didn't test this yet):
docker run -it --rm --volume "$(pwd)":/lognom -p 8088:8088 -p 8042:8042 -p 4040:4040 --name spark --hostname sandbox sequenceiq/spark:1.4.1 bash
  • Scala v2.11.8 (provided by maven)
  • Jedis v2.7 (provided by maven)
  • spark-redis (packaged internally)
  • Redis v3.2.1 (external)
  • Elasticsearch-spark 2.3.2 (provided by maven)
  • Elasticsearch 2.3.2 (external)
docker run --rm --name redis-logs redis

Starting it

Clean and build packages:

 mvn clean package -DskipTests 

Start it up:

 mvn exec:java -Dexec.mainClass="org.squishyspace.lognom.LogNom"

Processing data

Make your changes in ./src/main/scala/org.squishyspace/LogNom.scala

Redis/Kafka streams

Redis/Kafka data comes into Apache Spark as a data stream (DStream). A sliding window is used to batch up the stream in n seconds batches. Batch processing happens on the Spark cluster. The DStream object is an RDD (Resilient Distributed Dataset) over time. Data stream

Elasticsearch

Elasticsearch data is represented as a native RDD.

Operations

Best to read Spark's guide, but i'll sum the main points up here.

RDDs support transformations and actions. Transformations create a new data set from an existing one, changed in some way, and an action does something with the changed data. Transformations are instantaneously queued up and computation does not actually begin across the cluster until an action is performed.

About

Simple script for processing streaming data from Redis using Apache Spark

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published