A na(t)ive proof-of-concept implementation of Apache Spark in C++.
Compiled & tested under gcc-9.2.1, boost-1.71, cmake-3.15.
Inspired by the Rust Spark implementation native_spark and based on Spark-0.5.
Check examples
Check bin/prepare.sh for the dependencies:
- Boost
- Cap'n Proto
- fmt
- gperftools
  - tcmalloc
  - google-gprof
- concurrentqueue
- phmap
# install
./bin/prepare.sh # run as root
./bin/check.sh # check installation version
# env
export SPARK_LOCAL_IP=<local ip>
export CPUPROFILE=<profile file> # if google-gprof enabled
export CPUPROFILESIGNAL=<sig> # if google-gprof enabled
# master
./bin/start_master.sh
# slave
./bin/start_slave.sh
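As a minimal sketch of how a process might pick up the SPARK_LOCAL_IP setting exported above, the snippet below reads the variable and falls back to a default when it is unset or empty. It is not taken from this project's sources: the helper names `env_or` and `local_ip` and the `127.0.0.1` fallback are assumptions made for illustration only.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of this project): returns the value of the
// environment variable `name`, or `fallback` when it is unset or empty.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return (value != nullptr && *value != '\0') ? std::string(value) : fallback;
}

// Hypothetical: the address a master or slave would bind to.
// The loopback fallback is an assumption, not the project's behaviour.
std::string local_ip() {
    return env_or("SPARK_LOCAL_IP", "127.0.0.1");
}
```

Centralizing the lookup in one helper keeps the fallback policy in a single place, so CPUPROFILE and CPUPROFILESIGNAL could be read the same way if needed.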
- More precise concept control
- Async network support (-fcoroutines, boost::asio::ip::tcp::acceptor::async_accept) to replace the raw socket + thread_pool
- Compare using boost::serialization alone, without Cap'n Proto (and with Boost archive flags such as no_header)
- Add config (master/slave addr/port) file support
- Port optimizations from newer Spark versions, e.g. ShuffleWriter
- See other TODOs in files
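The config-file item in the list above could be approached roughly as follows. This is only a sketch under stated assumptions: the `key = addr:port` line format, the `#` comment syntax, and the names `Endpoint`, `parse_config` and `parse_endpoint` are all hypothetical and do not exist in the project.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical type: one "addr:port" pair for a master or slave node.
struct Endpoint {
    std::string addr;
    std::uint16_t port;
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Parse "key = value" lines; '#' starts a comment (assumed format).
// Lines without '=' or with an empty key are skipped.
std::unordered_map<std::string, std::string> parse_config(std::istream& in) {
    std::unordered_map<std::string, std::string> kv;
    std::string line;
    while (std::getline(in, line)) {
        const auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = trim(line.substr(0, eq));
        if (key.empty()) continue;
        kv[key] = trim(line.substr(eq + 1));
    }
    return kv;
}

// Split "addr:port" at the last ':' and validate the port range.
Endpoint parse_endpoint(const std::string& s) {
    const auto colon = s.rfind(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("expected addr:port, got: " + s);
    }
    const int port = std::stoi(s.substr(colon + 1));
    if (port < 1 || port > 65535) {
        throw std::out_of_range("port out of range: " + s);
    }
    return Endpoint{s.substr(0, colon), static_cast<std::uint16_t>(port)};
}
```

With this sketch, a file containing `master = 10.0.0.1:7077` could be loaded through a `std::ifstream` and resolved with `parse_endpoint(kv.at("master"))`. A plain key=value format is chosen here only because it needs no extra dependency; the project may well prefer another format.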
Check miscs/discussion.md and miscs/report.pdf