We're going to create a blogging site, then add vector search to it through django-vectordb.
Create a new Django project named tutorial, then start a new app called blog.
# Create a virtual environment to isolate our package dependencies locally
python3 -m venv env
source env/bin/activate # On Windows use `env\Scripts\activate`
# Install Django and Django VectorDB into the virtual environment
pip install django
pip install "django-vectordb[standard]" # include optional dependencies (quoted so shells such as zsh do not expand the brackets)
# Set up a new project with a single application
django-admin startproject tutorial
cd tutorial
django-admin startapp blog
The project layout should look like:
tutorial
├── blog
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
├── manage.py
└── tutorial
    ├── __init__.py
    ├── asgi.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py
Now sync your database for the first time:
python manage.py makemigrations # create migrations for blog
python manage.py migrate
We'll also create an initial user named admin with a password. We'll authenticate as that user later in our example.
python manage.py createsuperuser --username admin --email admin@example.com
Now let's add the blog app and vectordb to INSTALLED_APPS in tutorial/settings.py:
# settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    "blog",  # add the blog app here
    "vectordb",  # add vectordb here; the order is not important
]
Next, let's add a Post model to our blog app by modifying blog/models.py, which will enable us to save our posts. We will include the get_vectordb_text method to specify the text we want to search by in vectordb. We will also implement the get_vectordb_metadata method to expose the specific fields we wish to filter by. If we don't define our own get_vectordb_metadata, vectordb will serialize all the fields into metadata, which can be suboptimal because we may not be interested in some of them. It is therefore recommended to implement get_vectordb_metadata so that it contains only the fields we want to filter by.
# blog/models.py
from django.db import models
from django.contrib.auth import get_user_model

User = get_user_model()


class Post(models.Model):
    title = models.CharField(max_length=100)
    description = models.TextField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True)

    def get_vectordb_text(self):
        return self.title + " " + self.description

    def get_vectordb_metadata(self):
        return {
            # so we can filter by user id
            "user_id": self.user.id,
            # so we can filter by username
            "username": self.user.username,
            # so we can filter by created_at_year
            "created_at_year": self.created_at.year,
            # so we can filter by created_at_month
            "created_at_month": self.created_at.month,
        }

    def __str__(self):
        return self.title
Our model has three fields, title, description, and created_at, plus a user foreign key. In the metadata we include user_id, username, created_at_year, and created_at_month. We will use these fields to filter our results later.
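To see what the two methods produce without setting up a database, here is a minimal sketch that uses plain stand-in classes. FakeUser and FakePost are hypothetical names introduced only for illustration; they are not part of Django or vectordb, and they simply mirror the method bodies of the Post model above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeUser:
    """Hypothetical stand-in for a Django user."""
    id: int
    username: str


@dataclass
class FakePost:
    """Hypothetical stand-in for blog.models.Post, without the ORM."""
    title: str
    description: str
    user: FakeUser
    created_at: datetime

    def get_vectordb_text(self):
        # Same logic as Post.get_vectordb_text
        return self.title + " " + self.description

    def get_vectordb_metadata(self):
        # Same keys as Post.get_vectordb_metadata
        return {
            "user_id": self.user.id,
            "username": self.user.username,
            "created_at_year": self.created_at.year,
            "created_at_month": self.created_at.month,
        }


post = FakePost(
    title="A Culinary Journey",
    description="A journey through France",
    user=FakeUser(id=1, username="admin"),
    created_at=datetime(2023, 5, 17),
)
text = post.get_vectordb_text()
metadata = post.get_vectordb_metadata()
print(text)
print(metadata)
```

The text is what gets searched, and the metadata dictionary holds only the keys we intend to filter on.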
Now let's create the migrations and the tables for the blog app and the vectordb app:
python manage.py makemigrations
python manage.py migrate
Run the development server now:
./manage.py runserver
Let's add the Post model to the admin panel. Edit blog/admin.py:
# blog/admin.py
from django.contrib import admin
from .models import Post
admin.site.register(Post)
Now we can visit http://127.0.0.1:8000/admin/blog/post/ and add a few posts. Go ahead and add something.
First, let's synchronize all posts with the vector database. To do this, simply run the following command:
python manage.py vectordb_sync <app_name> <model_name>
For this example we run:
python manage.py vectordb_sync blog Post
Lastly, let's make sure that the Post model remains synchronized with the vector database, so that any changes, such as creating, deleting, or updating posts, are automatically registered by vectordb. To do this, we'll create a signals.py file in the blog directory with the following code:
from vectordb.shortcuts import autosync_model_to_vectordb
from .models import Post
autosync_model_to_vectordb(Post)
The above code is equivalent to doing the following manually:
# blog/signals.py
from django.db.models.signals import post_save, post_delete
from vectordb.sync_signals import (
    sync_vectordb_on_create_update,
    sync_vectordb_on_delete,
)
from .models import Post

post_save.connect(
    sync_vectordb_on_create_update,
    sender=Post,
    dispatch_uid="update_vector_index_super_unique_id",
)
post_delete.connect(
    sync_vectordb_on_delete,
    sender=Post,
    dispatch_uid="delete_vector_index_super_unique_id",
)
Then import the signals in your apps.py:
# blog/apps.py
from django.apps import AppConfig


class BlogConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "blog"

    # add ready method if not defined
    def ready(self):
        # import signals
        import blog.signals
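Conceptually, the signal handlers keep a second store in step with the model table: a save updates the index and a delete removes the entry. The sketch below illustrates that idea with a tiny in-memory dispatcher. TinySignal, sync_on_save, sync_on_delete, and the index dictionary are all hypothetical stand-ins, not Django or vectordb code; the only behaviour borrowed from Django is that connecting twice with the same dispatch_uid does not register a duplicate receiver.

```python
class TinySignal:
    """Hypothetical, minimal stand-in for a Django signal."""

    def __init__(self):
        self._receivers = {}

    def connect(self, receiver, dispatch_uid):
        # Connecting again with the same dispatch_uid leaves a single receiver
        self._receivers[dispatch_uid] = receiver

    def send(self, instance):
        for receiver in self._receivers.values():
            receiver(instance)


index = {}  # stand-in for the vector table: object id -> indexed text


def sync_on_save(instance):
    # Create or update the index entry for this object
    index[instance["id"]] = instance["title"]


def sync_on_delete(instance):
    # Remove the index entry if it exists
    index.pop(instance["id"], None)


post_save = TinySignal()
post_delete = TinySignal()
post_save.connect(sync_on_save, dispatch_uid="update_index")
post_save.connect(sync_on_save, dispatch_uid="update_index")  # no duplicate
post_delete.connect(sync_on_delete, dispatch_uid="delete_index")

post = {"id": 1, "title": "A Culinary Journey"}
post_save.send(post)
after_save = dict(index)
post_delete.send(post)
after_delete = dict(index)
print(after_save, after_delete)
```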
Now we can open the Django shell and create a post to check that it is synced automatically:
./manage.py shell
from blog.models import Post
from django.contrib.auth import get_user_model
User = get_user_model()
user = User.objects.first()
post = Post(title="A Culinary Journey", description="A journey through France", user=user)
post.save()
Now run the app
./manage.py runserver
Now, when you visit the Django Vector Database Admin Panel at http://127.0.0.1:8000/admin/vectordb/vector/, you'll notice that the new post has been automatically added to the vector database. Similarly, if you delete a post, it will be automatically removed from the vector database as well. All of this comes built in with vectordb.
To perform a search, simply invoke the vectordb.search() method:
from vectordb import vectordb
results = vectordb.search("A Culinary Journey", k=10) # k represents the maximum number of results desired.
# get the search time in seconds
print(results.search_time) # only available if search is the last method called
Note that the search method returns a QuerySet with results ordered by the best match. The QuerySet also has a search_time attribute, in seconds, which is only available when search is the last method called on the QuerySet. Each result item contains the following fields: id, content_object, object_id, content_type, text, embedding, an annotated distance, and a vector property that returns the np.ndarray representation of the embedding field, which is stored as bytes. Because search returns a QuerySet, you can select only the fields you want, like this:
from vectordb import vectordb
results = vectordb.search("A Culinary Journey", k=10).only('text', 'content_object')
If k is not specified, the default value is 10.
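To build intuition for what "ordered by the best match" and k mean, the following sketch ranks a few toy vectors by L2 distance and keeps the k nearest. This is not vectordb's implementation: l2_distance and brute_force_search are hypothetical helpers, and the 2-D vectors are made up, whereas real embeddings come from the configured embedding model.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, items, k=10):
    """Return up to k (distance, text) pairs, nearest first."""
    scored = [(l2_distance(query, vector), text) for text, vector in items]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


# Toy 2-D "embeddings", invented for this example
items = [
    ("French cooking", [1.0, 0.0]),
    ("Italian cooking", [0.8, 0.6]),
    ("Car repair", [0.0, 1.0]),
]

results = brute_force_search([1.0, 0.0], items, k=2)
for distance, text in results:
    print(f"{distance:.3f} {text}")
```

A smaller distance means a closer match, so the first item is the best one and at most k items are returned.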
Search doesn't only work for text; you can also search by model instance:
post1 = Post.objects.get(id=1)
# Find the posts most similar to post1
results = vectordb.search(post1, k=10)
This is also a way to get posts related to post1. Thus, you can use vectordb for recommendations as well.
Note: Searching by model instance automatically scopes the results to instances of that type. For example, if you search by post1, you will only get results that are instances of Post.
You can apply filters on text or metadata using the full capabilities of Django QuerySet filtering:
# Limit the search scope to a user with an id of 1
results = vectordb.filter(metadata__user_id=1).search("A Culinary Journey", k=10)
# Scope the results to text which contains France, belonging to user with id 1 and created in 2023
vectordb.filter(text__icontains="France", metadata__user_id=1, metadata__created_at_year=2023).search("A Culinary Journey", k=10)
We can also use model instances instead of text:
post1 = Post.objects.get(id=1)
# Limit the search scope to a user with an id of 1
results = vectordb.filter(metadata__user_id=1).search(post1, k=10)
# Scope the results to text which contains France, belonging to user with id 1 and created in 2023
vectordb.filter(text__icontains="France", metadata__user_id=1, metadata__created_at_year=2023).search(post1, k=10)
For more information on filtering, refer to the Django documentation on querying the JSONField.
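A misspelled metadata key in a filter raises no error; the query simply matches nothing. One way to guard against that is to build the filter keyword arguments through a small helper that knows the metadata keys. The sketch below is an assumption-laden illustration: metadata_filters and KNOWN_METADATA_KEYS are hypothetical names, not part of vectordb, and the key set is copied from the get_vectordb_metadata method in this quickstart.

```python
# Keys returned by Post.get_vectordb_metadata in this quickstart
KNOWN_METADATA_KEYS = {"user_id", "username", "created_at_year", "created_at_month"}


def metadata_filters(**fields):
    """Build QuerySet filter kwargs for metadata keys, rejecting unknown keys."""
    filters = {}
    for name, value in fields.items():
        key = name.split("__", 1)[0]  # drop lookups such as __gte
        if key not in KNOWN_METADATA_KEYS:
            raise ValueError(f"unknown metadata key: {key!r}")
        filters[f"metadata__{name}"] = value
    return filters


filters = metadata_filters(user_id=1, created_at_year__gte=2023)
print(filters)

# Intended use, following the filter examples above:
# vectordb.filter(**filters).search("A Culinary Journey", k=10)
```

With this helper, a typo such as create_at_year fails immediately with a ValueError instead of silently returning an empty result.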
VectorDB also provides a way to add items manually. This is useful for items that are not in the vector database yet.
VectorDB provides two utility methods for this: vectordb.add_instance and vectordb.add_text. Note that to add an instance, its model must define get_vectordb_text and, optionally, get_vectordb_metadata.
post1 = Post.objects.create(title="post1", description="post1 description", user=user1) # user1 must be an existing user
# add to vector database
vectordb.add_instance(post1)
To add text to the database, you can use vectordb.add_text():
vectordb.add_text(text="Hello text", id=3, metadata={"user_id": 1})
The text and id are required. Additionally, the id must be unique, or an error will occur. metadata can be None or any valid JSON.
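Since metadata must be None or valid JSON, it can help to check a value before passing it to add_text. The following sketch uses only the standard library json module; is_valid_metadata is a hypothetical helper, and its check may be stricter or looser than whatever validation vectordb itself performs.

```python
import json


def is_valid_metadata(metadata):
    """Return True if metadata is None or can be serialized as strict JSON."""
    if metadata is None:
        return True
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON
        json.dumps(metadata, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


print(is_valid_metadata({"user_id": 1}))
print(is_valid_metadata({"tags": {"food", "travel"}}))
```

Sets, datetime objects, and other non-JSON types fail this check, so convert them first (for example, a set to a list or a date to a string).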
You can customize vectordb by providing your settings in the settings.py file of your project. The following settings are available:
# settings.py
DJANGO_VECTOR_DB = {
    "DEFAULT_EMBEDDING_CLASS": ...,  # Default: "vectordb.embedding_functions.SentenceTransformerEncoder"
    "DEFAULT_EMBEDDING_MODEL": ...,  # Default: "all-MiniLM-L6-v2"
    # Can be "cosine" or "l2"
    "DEFAULT_EMBEDDING_SPACE": ...,  # Default: "l2"
    "DEFAULT_EMBEDDING_DIMENSION": ...,  # Default: 384 for "all-MiniLM-L6-v2"
    "DEFAULT_MAX_N_RESULTS": 10,  # Maximum number of results to return from search. Default: 10
    "DEFAULT_MIN_SCORE": 0.0,  # Minimum score (distance) to return from search. Default: 0.0
    # Maximum number of items to search using brute force. Default: 10_000.
    # If the number of items is greater than this, the search uses the HNSW index.
    "DEFAULT_MAX_BRUTEFORCE_N": 10_000,
}
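The choice between "cosine" and "l2" for DEFAULT_EMBEDDING_SPACE changes what "close" means. The sketch below compares the two on a pair of toy vectors, taking cosine distance to be one minus cosine similarity, which is a common convention; whether vectordb computes it in exactly this way is an assumption, and both helper functions are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity: sensitive to direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length

l2 = l2_distance(a, b)
cosine = cosine_distance(a, b)
print(l2, cosine)
```

The two vectors point the same way, so their cosine distance is zero while their L2 distance is not. For embeddings that are normalized to unit length, the two spaces rank results in the same order.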
Great! That was a quickstart to django-vectordb. We've created a blogging site and added vector search to it. If you want a more in-depth understanding of vectordb, head on over to the tutorial.