Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.
If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.
Bolt also has theoretical guarantees bounding the errors in its approximations.
EDIT: this repo now also features the source code for MADDNESS, our shiny new algorithm for approximate matrix multiplication. MADDNESS has no Python wrapper yet, and is referred to as "mithral" in the source code. Name changed because apparently I'm the only who gets Lord of the Rings references. MADDNESS runs ridiculously fast and, under reasonable assumptions, requires zero multiply-adds. Realistically, it'll be most useful for speeding up neural net inference on CPUs, but it'll take another couple papers to get it there; we need to generalize it to convolution and write the CUDA kernels to allow GPU training. See also the poster and slides.
EDIT2: Looking for a research project? See our list of ideas.
EDIT3: See Build.md for a working dockerfile that builds and runs Bolt, contributed by @mneilly.
sudo docker build . -t $USER/extension-script:latest
sudo docker run -v /home/$USER/:/home/$USER --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -dit $USER/extension-script:latest
# -v to share local files
# --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined to allow debuging
# On mac use /Users/$USER instead of /home/$USER
#With VScode, running as notebook
Locally inside VScode Docker tab right click and "Attach Visual Studio Code"
Inside container open from the local folder, bolt (NOT devcontainer/code workspace etc)
If you built remotely you'll want to add that docker context to local machine
Now you can run the "build_bazel_mithral_dbg" task; the config is in .vscode/launch (can make a shortcut)
And connect C++ code to python code debugger with the debug config "Mithral Debug Attach gdb to py proc"; config in .vscode/launch.json
#As script
sudo docker exec -it $CONTAINER_ID bash
cd cpp
bazel build :mithral_wrapped --strip=never #--strip=never for debuging, current build options are for prod, line 21
#Install cifar datasets
cd ../experiments
python graph_mithral.py
An example notebook showing speed up muliplications on the Cifar dataset is experiments/graph_mithral.py
. See experiments/README.md
for installing datasets.
The C++ code does not learned Hyperparameters, (split index, split value, centroids, etc). The Python code does. Using Pybind11 we copy over parameters, then encode both X and Q inside C++, do the multiplication, and write to a memory view Pybind11 can access. Mithral keeps the reconstructed ordering better than magnitudes; the Cifar numbers are based on using Mithral for only the last layer, it is 10-15x faster with answers changed 4% of the time for Cifar-10 with 2 codebooks, or 17-20x faster and 0.2% wrong on training data.
The main C++ files: cpp/src/quantize: mithral.hhp, mithral.cpp, profile_amm.hpp #Mithral implementation cpp/mithral_wrapped.cpp #binding code
Current binding uses floats for scales and offsets, lower precision may work better. mithral.hhp: mithral_input_type_traits.output_type; //uint8 or unint16 both work
mithral_scan_test_zipped can run by itself and is a simpler/unoptimized version of mithral_scan:1243.
NOTE: All below code refers to the Python wrapper for Bolt and has nothing to do with MADDNESS. It also seems to be no longer building for many people. If you want to use MADDNESS, see the Python Implementation driven by amm_main.py or C++ implementation. All code is ugly, but Python code should be pretty easy to add new AMM methods/variations to.
$ brew install swig # for wrapping C++; use apt-get, yum, etc, if not OS X
$ pip install numpy # bolt installation needs numpy already present
$ git clone https://github.com/dblalock/bolt.git
$ cd bolt && python setup.py install
$ pytest tests/ # optionally, run the tests
If you run into any problems, please don't hesitate to mention it in the Python build problems issue.
Install Bazel, Google's open-source build system. Then
$ git clone https://github.com/dblalock/bolt.git
$ cd bolt/cpp && bazel run :main
The bazel run
command will build the project and run the tests and benchmarks.
If you want to integrate Bolt with another C++ project, include cpp/src/include/public.hpp
and add the remaining files under cpp/src
to your builds. You should let me know if you're interested in doing such an integration because I'm hoping to see Bolt become part of many libraries and thus would be happy to help you.
Bolt currently only supports machines with AVX2 instructions, which basically means x86 machines from fall 2013 or later. Contributions for ARM support are welcome. Also note that the Bolt Python wrapper is currently configured to require Clang, since GCC apparently runs into issues.
Bolt is based on vector quantization. For details, see the Bolt paper or slides.
Bolt includes a thorough set of speed and accuracy benchmarks. See the experiments/
directory. This is also what you want if you want to reproduce the results in the paper.
Note that all of the timing results use the raw C++ implementation. At present, the Python wrapper is slightly slower due to Python overhead. If you're interested in having a full-speed wrapper, let me know and I'll allocate time to making this happen.
X, queries = some N x D array, some iterable of length D arrays
# these are approximately equal (though the latter are shifted and scaled)
enc = bolt.Encoder(reduction='dot').fit(X)
[np.dot(X, q) for q in queries]
[enc.transform(q) for q in queries]
# same for these
enc = bolt.Encoder(reduction='l2').fit(X)
[np.sum((X - q) * (X - q), axis=1) for q in queries]
[enc.transform(q) for q in queries]
# but enc.transform() is 10x faster or more
import bolt
import numpy as np
from scipy.stats import pearsonr as corr
from sklearn.datasets import load_digits
import timeit
# for simplicity, use the sklearn digits dataset; we'll split
# it into a matrix X and a set of queries Q
X, _ = load_digits(return_X_y=True)
nqueries = 20
X, Q = X[:-nqueries], X[-nqueries:]
enc = bolt.Encoder(reduction='dot', accuracy='lowest') # can tweak acc vs speed
enc.fit(X)
dot_corrs = np.empty(nqueries)
for i, q in enumerate(Q):
dots_true = np.dot(X, q)
dots_bolt = enc.transform(q)
dot_corrs[i] = corr(dots_true, dots_bolt)[0]
# dot products closely preserved despite compression
print("dot product correlation: {} +/- {}".format(
np.mean(dot_corrs), np.std(dot_corrs))) # > .97
# massive space savings
print(X.nbytes) # 1777 rows * 64 cols * 8B = 909KB
print(enc.nbytes) # 1777 * 2B = 3.55KB
# massive time savings (~10x here, but often >100x on larger
# datasets with less Python overhead; see the paper)
t_np = timeit.Timer(
lambda: [np.dot(X, q) for q in Q]).timeit(5) # ~9ms
t_bolt = timeit.Timer(
lambda: [enc.transform(q) for q in Q]).timeit(5) # ~800us
print "Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format(
t_np * 1000, t_bolt * 1000)
# can get output without offset/scaling if needed
dots_bolt = [enc.transform(q, unquantize=True) for q in Q]
# search using squared Euclidean distances
# (still using the Digits dataset from above)
enc = bolt.Encoder('l2', accuracy='high').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q] # knn for each query
# search using dot product (maximum inner product search)
enc = bolt.Encoder('dot', accuracy='medium').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q] # knn for each query
Bolt stands for "Based On Lookup Tables". Feel free to use this exciting fact at parties.