searchengine is a similarity search engine built around Faiss library. It allows you to find the most similar vectors (a common mathematical and simplified representation of an image or a sound) of a query vector among hundreds of millions, or billions if you have enough RAM.
Many computer scientists are now able to use machine learning frameworks to classify images without even having to understand how it works. It is very easy to obtain a probability of belonging to a class. In a production environment, regularly adding images or classes causes a bottleneck since you must constantly re-learn your model. One solution is to use a non-evolving model that simply produces an N-dimensional vector for each input image. These vectors can then be compressed, indexed and/or searched by similarity with an external and incremental system.
searchengine is similarity search engine (Think Elasticsearch, but for vectors, not text documents). searchengine is in fact just a wrapper around the Faiss library. It aims at providing what is missing in such a scientific library:
-
Data persitency: reliable storage of raw and compressed vectors (Faiss stores the data in RAM, and can persist to disk on demand, but the writing is not incremental).
-
Web services: searchengine provides a gRPC server, and some rest API endpoints for management/monitoring.
-
Packaging in a ready-to-run Docker image.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
-
MongoDB 4.0+ (for storing databases and indices informations, and raw features)
-
Redis 5.0+ (for storing Faiss indices and encoded vectors, inverted lists, etc.)
-
cmake 3.7+
-
openblas
-
tacopie 3.2.0: https://github.com/Cylix/tacopie
-
cpp_redis 4.3.1: https://github.com/cpp-redis/cpp_redis
-
faiss 1.5.3: https://github.com/facebookresearch/faiss
-
pybind11 2.2.4: https://github.com/pybind/pybind11
-
fmtlib 5.3.0: https://github.com/fmtlib/fmt
docker-compose up -d
A default configuration file is provided inside the the Docker image. There are 4 complementary ways to configure searchengine:
-
One or more configuration files (yaml|tomljson|ini|xml) provided by the
--config-file /my/config/file/path
command line arg. -
One or more configuration directories provided with the
--config-path /my/config/directory
command line arg. Example: providing the path/run/secrets
, searchengine will read every files in all subdirectories of/run/secrets
and associate a key (the subpath) to a value (file content). If there is a file/run/secrets/db/redis/password
containing the textnotagoodpassword
, the configuration will be:db.redis.password=notagoodpassword
-
Every environment variables starting with
searchengine__
will be parsed. For instance,searchengine__DB__REDIS__HOST=my-redis-host
is equivalent todb.redis.host=my-redis-host
. -
Every additional command lines provided with the arg
--additional-config
or-C
, such as-C db.redis.host=my-redis-host
.