Welcome! vSwarm is a collection of ready-to-run serverless benchmarks, each typically composed of several interconnected serverless functions, with a general focus on realistic, data-intensive workloads.
This suite is a turnkey, fully tested solution meant to be used in conjunction with vHive, and is compatible with all technologies that vHive supports, namely containers, and Firecracker and gVisor microVMs. The majority of benchmarks support distributed tracing with Zipkin, which traces both the infrastructure components and the user functions.
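To give a feel for what the tracing looks like inside a function, here is a minimal Go sketch that exports spans to Zipkin via OpenTelemetry. Note that vSwarm ships its own tracing helpers in `utils`, so the exporter setup, collector endpoint, and span names below are illustrative assumptions, not the suite's exact API:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/zipkin"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Assumption: a Zipkin collector is reachable at this endpoint.
	exporter, err := zipkin.New("http://localhost:9411/api/v2/spans")
	if err != nil {
		log.Fatalf("failed to create Zipkin exporter: %v", err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() { _ = tp.Shutdown(context.Background()) }()
	otel.SetTracerProvider(tp)

	// Wrap the function body in a span so it shows up in the trace.
	ctx, span := otel.Tracer("example-function").Start(context.Background(), "handler")
	defer span.End()
	_ = ctx // pass ctx to downstream calls to propagate the trace
}
```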
- `benchmarks` contains all of the available benchmark source code and manifests.
- `utils` contains utilities for use within serverless functions, e.g., the tracing module.
- `tools` is for command-line tools and services useful outside of serverless functions, such as deployment or invocation.
- `runner` is for setting up self-hosted GitHub Actions runners.
- `docs` contains additional documentation on a number of relevant topics.
- 2 microbenchmarks for benchmarking chained-function performance and data transfer performance in various patterns (pipeline, scatter, gather) and over different communication media (AWS S3 and inline transfers); see the transfer sketch after this list
- 8 real-world benchmarks:
  - MapReduce: Corral (Golang), and an AWS-reference Python implementation of the Aggregation Query from the representative AMPLab Big Data Benchmark 1node dataset
  - Real-time video analytics (Python and Golang): recognizes objects in a video fragment
  - ML model training: stacking ensemble training and iterative hyperparameter tuning
  - ExCamera video decoding (gg): decoding of a video in parallel
  - Distributed compilation (gg): compiles LLVM in parallel
  - Fibonacci (gg): classic recursive implementation that finds the nth number in the sequence by calculating the (n-1)th and (n-2)th terms in parallel
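As a rough illustration of the two communication media the microbenchmarks compare, the Go sketch below passes a payload between chained functions either inline in the RPC message or by reference through S3. The bucket name, object key, and helper shape are hypothetical, for illustration only, and not the benchmarks' actual code:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// transfer returns either the payload itself (inline transfer) or an
// S3 key that the next function in the chain fetches (reference passing).
func transfer(payload []byte, inline bool) (string, error) {
	if inline {
		// Small payloads travel inside the RPC message itself.
		return string(payload), nil
	}
	// Larger payloads go through S3; only the key is sent downstream.
	sess := session.Must(session.NewSession())
	uploader := s3manager.NewUploader(sess)
	key := "chain/payload-0" // hypothetical object key
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("example-transfer-bucket"), // hypothetical bucket
		Key:    aws.String(key),
		Body:   bytes.NewReader(payload),
	})
	if err != nil {
		return "", fmt.Errorf("s3 upload failed: %w", err)
	}
	return key, nil
}
```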
Refer to this document for more detail on the differences and supported features of each benchmark.
Details on each specific benchmark can be found in their respective subfolders. Every benchmark can be run on a Knative cluster, and most can also be run locally with docker-compose. Please see the running benchmarks document for detailed instructions on how to run a benchmark locally or on a cluster.
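Once deployed, a benchmark's entry function is typically driven over gRPC (vSwarm provides an invoker tool for this). For orientation, below is a minimal hand-rolled client sketch using the standard gRPC helloworld interface that many of the functions expose; the endpoint address is an assumption, with the actual URL and port depending on your deployment:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pb "google.golang.org/grpc/examples/helloworld/helloworld"
)

func main() {
	// Assumption: the function's gRPC endpoint; on Knative this would be
	// the service URL, and with docker-compose typically a localhost port.
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("failed to connect: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Invoke the function via the standard helloworld Greeter interface.
	client := pb.NewGreeterClient(conn)
	reply, err := client.SayHello(ctx, &pb.HelloRequest{Name: "invoke"})
	if err != nil {
		log.Fatalf("invocation failed: %v", err)
	}
	log.Printf("response: %s", reply.GetMessage())
}
```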
We have a detailed outline on the benchmarking methodology used, which you can find here.
We openly welcome any contributions, so please get in touch if you're interested!
Bringing up a benchmark typically consists of dockerizing the benchmark functions so they can be deployed and tested with docker-compose, then integrating the functions with Knative and including the benchmark in the CI/CD pipeline; a minimal function skeleton is sketched below. Please refer to our documentation on bringing up new benchmarks for more guidance.
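A new benchmark function is usually a small gRPC server along the lines of the following Go sketch, which can then be dockerized and deployed. The helloworld interface and port here are assumptions based on the common pattern, not a required template:

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	pb "google.golang.org/grpc/examples/helloworld/helloworld"
)

// server implements the helloworld Greeter interface that an
// invoker can call; a real benchmark does its work inside SayHello.
type server struct {
	pb.UnimplementedGreeterServer
}

func (s *server) SayHello(ctx context.Context, req *pb.HelloRequest) (*pb.HelloReply, error) {
	// The benchmark's actual work would happen here.
	return &pb.HelloReply{Message: "done: " + req.GetName()}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051") // assumed port
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterGreeterServer(s, &server{})
	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
```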
We also have some basic requirements for contributions to the repository, which are described in detail in our Contributing to vHive document.
vSwarm is free. We publish the code under the terms of the MIT License that allows distribution, modification, and commercial use. This software, however, comes without any warranty or liability.
The software is maintained by the EASE lab at the University of Edinburgh, Stanford Systems and Networking Research, and the vSwarm open-source community.
- Invoker, timeseriesdb, runners - Dmitrii Ustiugov: GitHub, Twitter, web page
- ML benchmarks and utils (tracing and storage modules) - Michal Baczun GitHub
- ML benchmarks - Rustem Feyzkhanov GitHub
- Video Analytics and Map-Reduce benchmarks - Shyam Jesalpura GitHub
- GG benchmarks - Francisco Romero GitHub and Clemente Farias GitHub