Simple Vector DB is a lightweight, efficient, and easy-to-use vector database designed to store, retrieve, and manage high-dimensional vectors. It supports operations such as insertion, update, deletion, and comparison of vectors using cosine similarity, Euclidean distance, and dot product. Additionally, it allows for finding the nearest vector based on KD-tree median points.
- Efficient Vector Storage: Stores high-dimensional vectors with dynamic allocation.
- Vector Operations: Supports insertion, retrieval, update, and deletion of vectors.
- Comparison Metrics: Compare vectors using cosine similarity, Euclidean distance, and dot product.
- Nearest Vector Search: Find the nearest vector based on KD-tree median points for efficient indexing and improved performance.
- RESTful API: Simple and intuitive API endpoints for easy integration.
- Persistent Storage: Save and load vector databases from disk.
Simple Vector DB depends on libmicrohttpd and cJSON. Here are the instructions to install these dependencies on different operating systems.
For macOS (using Homebrew):
- libmicrohttpd:
brew install libmicrohttpd
- cJSON:
brew install cjson
For Debian-based distributions (e.g., Ubuntu):
- libmicrohttpd:
sudo apt-get update
sudo apt-get install libmicrohttpd-dev
- cJSON:
sudo apt-get install libcjson-dev
For Red Hat-based distributions (e.g., CentOS, Fedora):
- libmicrohttpd:
sudo dnf install libmicrohttpd-devel
- cJSON:
sudo dnf install cjson-devel
For Windows, you can use a package manager like vcpkg to install the dependencies.
- Install vcpkg:
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
- libmicrohttpd:
./vcpkg install libmicrohttpd
- cJSON:
./vcpkg install cjson
After installing the libraries, you need to find the paths to the headers microhttpd.h and cjson/cJSON.h.
- macOS and Linux:
  - Typically, headers are located in /usr/include, /usr/local/include, or the installation prefix of the package manager (e.g., /opt/homebrew/include for Homebrew on macOS).
  - Libraries are usually in /usr/lib, /usr/local/lib, or the package manager prefix (e.g., /opt/homebrew/lib).
  - Use the find command to locate the header files:
    find /usr -name "microhttpd.h"
    find /usr -name "cJSON.h"
- Windows:
  - vcpkg installs libraries in the vcpkg/installed directory. You can find headers and libraries in vcpkg/installed/x64-windows/include and vcpkg/installed/x64-windows/lib.
Update your Makefile to include the correct paths for the headers and libraries. Below are the two lines that need to be updated:
CFLAGS = -Wall -I/opt/homebrew/include -I./include
LDFLAGS = -L/opt/homebrew/lib -lmicrohttpd -lcjson
Replace /opt/homebrew/include and /opt/homebrew/lib with the appropriate paths for your system.
You can start the server on the default port (8888) or specify a custom port using the -p flag. Additionally, you can specify other parameters such as the database filename, kd-tree dimension, and vector size using the corresponding flags. Alternatively, you can use a configuration file with the -c flag.
# Start the server with default settings
./executable/vector_db_server
# Start the server on a custom port (e.g., 8080)
./executable/vector_db_server -p 8080
# Start the server with a custom database filename
./executable/vector_db_server -f custom_database.db
# Start the server with a custom kd-tree dimension
./executable/vector_db_server -d 5
# Start the server with a custom vector size
./executable/vector_db_server -s 256
# Start the server with a configuration file
./executable/vector_db_server -c config.json
# Combine multiple custom settings
./executable/vector_db_server -p 8080 -f custom_database.db -d 5 -s 256 -c config.json
Save this as config.json:
{
"DB_FILENAME": "vector_database.db",
"DEFAULT_PORT": 8888,
"DEFAULT_KD_TREE_DIMENSION": 3,
"DB_VECTOR_SIZE": 128
}
- DB_FILENAME: The name of the database file (e.g., vector_database.db).
- DEFAULT_PORT: The port number on which the server will run (e.g., 8888).
- DEFAULT_KD_TREE_DIMENSION: The default dimension for the kd-tree (e.g., 3).
- DB_VECTOR_SIZE: The size of the database vectors (e.g., 128).
You can fill the database with randomly generated vectors of different dimensions:
# Make the script executable, then run it
chmod +x ./test/add_vectors.sh
./test/add_vectors.sh
- Endpoint: /vector
- Method: POST
- Request Body: JSON object with a uuid string and a vector array of float64 values.
curl -X POST -H "Content-Type: application/json" -d '{"uuid": "123e4567-e89b-12d3-a456-426614174000", "vector": [1.23, 4.56, 7.89, 0.12, 3.45]}' http://localhost:8888/vector
The UUID acts as the bridge (the shared key for a chunk) between your application database and Simple Vector DB.
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
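For applications that call the insert endpoint from C rather than curl, the request body can be assembled with snprintf. The following is a minimal sketch under stated assumptions: build_insert_body is a hypothetical helper, not part of Simple Vector DB, and it assumes the body shape used in the curl example above (a uuid string plus a vector array). The UUID is not escaped, so it must not contain quotes or backslashes.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes {"uuid": "<uuid>", "vector": [v0, v1, ...]} into out.
 * Returns the number of characters written, or -1 if out is too small.
 * %g keeps the output short (6 significant digits); use %.17g instead
 * if the values must round-trip exactly. */
int build_insert_body(const char *uuid, const double *vec, size_t n,
                      char *out, size_t cap) {
    int len = snprintf(out, cap, "{\"uuid\": \"%s\", \"vector\": [", uuid);
    if (len < 0 || (size_t)len >= cap) return -1;
    size_t used = (size_t)len;

    for (size_t i = 0; i < n; i++) {
        /* Separate elements with ", " after the first one. */
        len = snprintf(out + used, cap - used, "%s%g", i ? ", " : "", vec[i]);
        if (len < 0 || (size_t)len >= cap - used) return -1;
        used += (size_t)len;
    }

    len = snprintf(out + used, cap - used, "]}");
    if (len < 0 || (size_t)len >= cap - used) return -1;
    used += (size_t)len;
    return (int)used;
}
```

The resulting string can be passed as the POST body by whichever HTTP client the application already uses.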
- Endpoint: /vector
- Method: GET
- Query Parameter: index (the index of the vector to retrieve).
- Query Parameter: uuid (the uuid of the vector to retrieve).
curl "http://localhost:8888/vector?index=0"
curl "http://localhost:8888/vector?uuid=123e4567-e89b-12d3-a456-426614174000"
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
- Endpoint: /vector
- Method: PUT
- Query Parameter: index (the index of the vector to update).
- Request Body: JSON array of float64 values.
curl -X PUT -H "Content-Type: application/json" -d '[1.5, 2.5, 3.5, 4.5]' "http://localhost:8888/vector?index=0"
- Endpoint: /vector
- Method: DELETE
- Query Parameter: index (the index of the vector to delete).
curl -X DELETE "http://localhost:8888/vector?index=0"
- Endpoint: /compare/cosine_similarity
- Method: GET
- Query Parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/cosine_similarity?index1=0&index2=1"
- Endpoint: /compare/euclidean_distance
- Method: GET
- Query Parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/euclidean_distance?index1=0&index2=1"
- Endpoint: /compare/dot_product
- Method: GET
- Query Parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/dot_product?index1=0&index2=1"
- Endpoint: /nearest
- Method: POST
- Content-Type: application/json
- Request Body: JSON array representing the input vector.
- Optional query parameter: number=(int), the number of nearest vectors to return (default is 1).
The /nearest endpoint uses a KD-tree for indexing, which allows for more efficient nearest neighbor searches. All vectors in the database must have the same dimension. During vector insertion, a point is added to the KD-tree, and during vector updates, the KD-tree is modified to reflect the changes.
curl -X POST -H "Content-Type: application/json" -d '[7,3.00003,6.32,4.5,8,5,1.842,4.929066,7.94764,6.16051,6.946,4.71,4.3,1.704,2.321,5.9,6.74227,7.365,5.31,4.1705]' "http://localhost:8888/nearest"
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
This response indicates that the nearest vector is at index 2; it includes the stored vector and its UUID.
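To make the KD-tree idea concrete, here is a minimal textbook sketch of insertion and single nearest-neighbor search over k-dimensional points. It is not the project's implementation: the node layout and the names kd_insert, kd_nearest, and kd_free are hypothetical, and how Simple Vector DB derives the median points it stores in the tree is not covered here. The sketch assumes every point has exactly k coordinates.

```c
#include <stdlib.h>
#include <stddef.h>
#include <float.h>

/* Hypothetical node layout; not the project's actual struct. */
typedef struct kd_node {
    const double *point;            /* k coordinates, owned by the caller */
    int index;                      /* position of the vector in the database */
    struct kd_node *left, *right;
} kd_node;

/* Insert a point, cycling the splitting axis with depth. Returns the subtree root. */
kd_node *kd_insert(kd_node *root, const double *point, int index,
                   size_t k, size_t depth) {
    if (root == NULL) {
        kd_node *node = malloc(sizeof *node);
        if (node == NULL) return NULL;  /* allocation failed: point is skipped */
        node->point = point;
        node->index = index;
        node->left = node->right = NULL;
        return node;
    }
    size_t axis = depth % k;
    if (point[axis] < root->point[axis])
        root->left = kd_insert(root->left, point, index, k, depth + 1);
    else
        root->right = kd_insert(root->right, point, index, k, depth + 1);
    return root;
}

/* Squared Euclidean distance; enough for comparing candidates. */
static double sq_dist(const double *a, const double *b, size_t k) {
    double sum = 0.0;
    for (size_t i = 0; i < k; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Descend toward the query first, then visit the other side only if the
 * splitting plane is closer than the best match found so far. */
static void kd_search(const kd_node *node, const double *query, size_t k,
                      size_t depth, const kd_node **best, double *best_d) {
    if (node == NULL) return;
    double d = sq_dist(node->point, query, k);
    if (d < *best_d) {
        *best_d = d;
        *best = node;
    }
    size_t axis = depth % k;
    double diff = query[axis] - node->point[axis];
    const kd_node *closer = diff < 0 ? node->left : node->right;
    const kd_node *farther = diff < 0 ? node->right : node->left;
    kd_search(closer, query, k, depth + 1, best, best_d);
    if (diff * diff < *best_d)
        kd_search(farther, query, k, depth + 1, best, best_d);
}

/* Returns the database index of the nearest point, or -1 for an empty tree. */
int kd_nearest(const kd_node *root, const double *query, size_t k) {
    const kd_node *best = NULL;
    double best_d = DBL_MAX;
    kd_search(root, query, k, 0, &best, &best_d);
    return best ? best->index : -1;
}

/* Free all nodes; the points themselves belong to the caller. */
void kd_free(kd_node *root) {
    if (root == NULL) return;
    kd_free(root->left);
    kd_free(root->right);
    free(root);
}
```

The pruning step is what gives a KD-tree its advantage over a linear scan: whole subtrees are skipped when they cannot contain a closer point. Returning several neighbors, as the number parameter allows, would need a bounded list of candidates instead of a single best node.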
To build and run Simple Vector DB, execute the following commands:
# Build the project
make
# Start the server
./executable/vector_db_server
To specify a custom port:
./executable/vector_db_server -p 8080
We welcome contributions to Simple Vector DB! Please fork the repository, create a new branch for your feature or bugfix, and submit a pull request.
- Fork the repository.
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some new feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.