Multimodal search lets you use one type of data (in this case, text) to search another type of data (in this case, images). This example leverages core Jina technologies that make it simpler to build and run your search, including:
- DocumentArray, which lets us concurrently process Documents and push/pull them between machines. Useful for creating embeddings on a remote machine with a GPU, then indexing and querying locally
- Jina Hub Executors, so we don't have to manually integrate deep learning models
- Jina Client, so we don't have to worry about how best to format the REST request
- PQLite, which lets us pre-filter results by season, price, rating, etc. (see the sketch below)
The front-end is built in Streamlit.
We've got a live demo for you to play with.
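For example, the PQLite pre-filtering mentioned above might look roughly like this at query time. This is a minimal sketch, not code from this example: the host/port are placeholders, and the `filter` parameter name with its MongoDB-style operators is an assumption based on PQLite's query language.

```python
from docarray import Document
from jina import Client

# Hypothetical host/port; substitute whatever your Flow actually exposes
client = Client(host='localhost', port=12345, protocol='http')

# Text query, pre-filtered to cheap summer items. The 'filter' parameter
# and its MongoDB-style operators are assumptions based on PQLite's
# query language, not taken verbatim from this example's code.
results = client.post(
    on='/search',
    inputs=Document(text='floral summer dress'),
    parameters={'filter': {'season': {'$eq': 'Summer'}, 'price': {'$lte': 100}}},
)

for match in results[0].matches[:3]:
    print(match.uri)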
There are multiple ways you can run this:
- Deploy on JCloud
- Run with Docker-Compose
- Run on bare metal
- Clone this repo:

```shell
git clone https://github.com/jina-ai/example-multimodal-fashion-search.git
```

- Download data:

```shell
python ./get_data.py
```
JCloud lets you run the fashion backend Jina Flow on the cloud, without having to use your own compute.
```shell
pip install jcloud
cd backend
jc login
jc deploy jcloud
```
After that you can use Jina Client to connect and search/index your data.
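As a rough sketch of what that looks like (the host below is a placeholder for the endpoint `jc deploy` prints, and the image path is hypothetical):

```python
from docarray import Document, DocumentArray
from jina import Client

# Placeholder endpoint: jc deploy prints the real address for your Flow
client = Client(host='grpcs://<your-flow-id>.wolf.jina.ai')

# Index an image Document (hypothetical path)
client.post(on='/index', inputs=DocumentArray([Document(uri='data/images/0001.jpg')]))

# Search the index with text
results = client.post(on='/search', inputs=Document(text='red sneakers'))
for match in results[0].matches[:5]:
    print(match.uri)
```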
This will spin up:

- Indexer: saves embeddings and metadata to `/backend/workspace`. You can tweak how many Documents to index in `docker-compose.yml`. You can also comment out the `backend-index` section in `docker-compose.yml` if you've already indexed and don't want to re-index (see the sketch below).
- Searcher: searches the embeddings/metadata stored on disk.
- Frontend: Streamlit frontend to make the user experience easier.

```shell
docker-compose up
```
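For orientation, the indexing service in `docker-compose.yml` looks roughly like the sketch below. The service layout, paths, and arguments here are illustrative assumptions, so check the actual file:

```yaml
services:
  backend-index:                              # comment out this whole service to skip re-indexing
    build: ./backend
    command: python app.py -t index -n 1000   # tweak -n to change how many Documents get indexed
    volumes:
      - ./backend/workspace:/workspace        # embeddings and metadata persist here
```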
```shell
pip install -r requirements.txt
```

Then, in the `backend` directory:

- Build your index:

```shell
python app.py -t index -n 1000 # index 1000 images
```

- Open up the RESTful interface for searching/indexing (see the curl sketch below):

```shell
python app.py -t serve
```
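With the server running, you can also hit the REST endpoint directly. A minimal sketch, assuming the gateway uses Jina's HTTP protocol and listens on port 12345 (check `app.py` for the actual port):

```shell
curl -X POST http://localhost:12345/search \
  -H 'Content-Type: application/json' \
  -d '{"data": [{"text": "blue dress"}]}'
```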
To open the frontend, go to the `frontend` directory and run:

```shell
streamlit run frontend.py
```
- Index using the small dataset, then swap out the `data` directory for that of the hi-res dataset for nicer-looking results (see the sketch below).
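A minimal sketch of that swap, assuming the hi-res images live in a sibling `data-hires` directory (the actual directory names may differ):

```shell
mv data data-small    # keep the small dataset around
mv data-hires data    # assumed name of the hi-res dataset directory
```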
This is because you're trying to index data that's already been indexed. The database we use has a `UNIQUE` constraint, which means it won't index duplicate data. You can fix this by:

- Deleting `backend/workspace` (this will delete your entire index; see the one-liner below)
- Commenting out the `backend-index` section from `docker-compose.yml`
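For the first option:

```shell
rm -rf backend/workspace   # warning: permanently deletes your entire index
```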