A hybrid vector query computes the similarity score of an object represented by two vectors as a weighted sum of its two distances to the query, where the weighting is controlled by a query-specific parameter.
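As a minimal sketch of this scoring rule, the C++ below assumes Euclidean distance on both vectors and the weighting `alpha * d1 + (1 - alpha) * d2`. Both the distance and the exact form of the weighting are assumptions for illustration, and `l2_sq` and `hybrid_distance` are hypothetical names, not functions from this repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between two vectors of equal length.
float l2_sq(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical hybrid distance between a query (q1, q2) and an object (x1, x2):
// alpha weights the distance on the first vector, (1 - alpha) the second.
// This weighting is an assumption; the exact formula in the paper may differ.
float hybrid_distance(const std::vector<float>& q1, const std::vector<float>& q2,
                      const std::vector<float>& x1, const std::vector<float>& x2,
                      float alpha) {
    const float d1 = std::sqrt(l2_sq(q1, x1));
    const float d2 = std::sqrt(l2_sq(q2, x2));
    return alpha * d1 + (1.0f - alpha) * d2;
}
```

A smaller hybrid distance means a more similar object; under this sketch, `alpha = 1` reduces the score to the distance on the first vector alone.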
This project contains the code, optimal parameters, and other details used in the experiments of our paper. Note that we reimplemented all algorithms using exactly the same design pattern, programming language, implementation tricks, and experimental setup, which makes the comparison fairer.
Our experiments use four real-world datasets, three of which can be downloaded from the links given in the paper. Note that all base data and query data are converted to fvecs format, and all ground-truth data is converted to ivecs format. Please refer here for a description of the fvecs and ivecs formats.
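Both formats store one record per vector: a 4-byte integer dimension `d` followed by `d` 4-byte components (float32 for fvecs, int32 for ivecs). A minimal C++ reader for both might look like the following; `read_vecs` is a hypothetical helper, not part of this repository, and it assumes both the files and the host machine are little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Reads a .fvecs (T = float) or .ivecs (T = std::int32_t) file into memory.
// Each record: a 4-byte int32 dimension d, then d 4-byte components.
// Raw bytes are copied as-is, so this assumes a little-endian host.
template <typename T>
std::vector<std::vector<T>> read_vecs(const std::string& path) {
    static_assert(sizeof(T) == 4, "fvecs/ivecs components are 4 bytes wide");
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<std::vector<T>> data;
    std::int32_t dim = 0;
    // Read the dimension prefix of each record until end of file.
    while (in.read(reinterpret_cast<char*>(&dim), sizeof(dim))) {
        if (dim <= 0) throw std::runtime_error("invalid dimension in " + path);
        std::vector<T> vec(static_cast<std::size_t>(dim));
        const auto bytes = static_cast<std::streamsize>(vec.size() * sizeof(T));
        if (!in.read(reinterpret_cast<char*>(vec.data()), bytes))
            throw std::runtime_error("truncated record in " + path);
        data.push_back(std::move(vec));
    }
    return data;
}
```

For example, `read_vecs<float>("base.fvecs")` would load base vectors and `read_vecs<std::int32_t>("groundtruth.ivecs")` would load ground-truth ids (both file names are hypothetical).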
For the parameters of each algorithm on all experimental datasets, see the code.
Requirements:
- GCC 4.9+ with OpenMP
- CMake 2.8+
- Boost 1.55+
To compile the code, run the following from the project root:
$ mkdir build && cd build/
$ cmake ..
$ make -j
Then, run the following commands to build the graph index:
cd ./build/test/
./main algorithm_name dataset_name \alpha max_distance_1 max_distance_2 build
Once the index is built, run the following commands to perform the search. Search statistics such as search time, number of distance evaluations, candidate set size, average query path length, and memory load can be read from, or computed from, the output log.
cd ./build/test/
./main algorithm_name dataset_name \alpha max_distance_1 max_distance_2 search
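Because the ground truth is stored in ivecs format (one row of neighbor ids per query), search quality can also be measured as recall@k. The sketch below is a hypothetical helper, not the repository's own evaluation code, and it assumes the search results have been collected as one row of returned ids per query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Average recall@k over all queries: for each query, the fraction of the
// first k ground-truth ids that appear among the first k returned ids.
double recall_at_k(const std::vector<std::vector<std::int32_t>>& results,
                   const std::vector<std::vector<std::int32_t>>& ground_truth,
                   std::size_t k) {
    assert(results.size() == ground_truth.size());
    assert(k > 0);
    if (results.empty()) return 0.0;

    double total = 0.0;
    for (std::size_t q = 0; q < results.size(); ++q) {
        const std::size_t gt_k = std::min(k, ground_truth[q].size());
        std::unordered_set<std::int32_t> truth(
            ground_truth[q].begin(),
            ground_truth[q].begin() + static_cast<std::ptrdiff_t>(gt_k));
        const std::size_t res_k = std::min(k, results[q].size());
        std::size_t hits = 0;
        for (std::size_t i = 0; i < res_k; ++i) {
            // erase() returns 1 on a hit, so duplicate result ids count once.
            hits += truth.erase(results[q][i]);
        }
        total += static_cast<double>(hits) / static_cast<double>(k);
    }
    return total / static_cast<double>(results.size());
}
```

With k = 10, for instance, a recall of 0.95 means that on average 9.5 of the 10 true nearest neighbors were returned.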