Use Python and AI to index your memes by their content and text, making them easily retrievable for your meme warfare pleasures.
A table of contents for the remainder of this README:
- Introduction
- Pipeline overview
- Installation instructions (standard version)
- Changelog
- Feature requests and contributing
- Running tests
This repository contains code, a walkthrough notebook (meme_search_walkthrough.ipynb), and a streamlit demo app for indexing, searching, and easily retrieving your memes based on semantic search of their content and text.
All processing - from image-to-text extraction, to vector embedding, to search - is performed locally.
This meme search pipeline is built using the following open source components:
- moondream: a tiny, kickass vision language model used for image captioning / extracting image text
- all-MiniLM-L6-v2: a very popular text embedding model
- faiss: a fast and efficient vector db
- sqlite: the greatest database of all time, used for data indexing
- streamlit: for serving up the app
To create a handy tool for your own memes, pull the repo and install the requirements file
pip install -r requirements.txt
Note that the particular pinned requirements here are necessary to avoid a nasty segmentation fault involving sentence-transformers, current as of 6/5/2024.
Alternatively you can install all the requirements you need using docker via the compose file found in the repo. The command to install the above requirements and start the server using docker-compose is
docker compose up
After indexing your memes you can then start the streamlit app, allowing you to semantically search for and retrieve your memes
python -m streamlit run meme_search/app.py
To start the app via docker-compose use
docker compose up
Note: you can drag and drop any retrieved meme directly from the streamlit app into any messenger app of your choice.
Place any images / memes you would like indexed for the search app in this repo's subdirectory
data/input/
You can clear out the default test images in this location first, or leave them.
Next, click the "refresh index" button to update your index whenever images are added to or removed from the image directory. Only the newly added or removed images are processed; the rest of the index is left untouched.
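The incremental refresh described above boils down to comparing the files on disk with the paths already in the index. The following is a minimal sketch of that idea, not the repo's actual implementation: the function name `diff_index` and the `IMAGE_EXTENSIONS` set are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Assumed set of supported image formats (hypothetical, for illustration only).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def diff_index(image_dir, indexed_paths):
    """Compare image files on disk with already-indexed paths.

    Returns (to_add, to_remove): images that still need indexing, and
    indexed entries whose files no longer exist in the directory.
    """
    on_disk = {
        str(p)
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }
    indexed = set(indexed_paths)
    to_add = sorted(on_disk - indexed)      # new files: extract, embed, index
    to_remove = sorted(indexed - on_disk)   # deleted files: drop from index
    return to_add, to_remove
```

Only the paths in `to_add` would need to go through the vision model, which is the expensive step, so unchanged images cost nothing on refresh.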
Alternatively - at your terminal - paste the following command
python meme_search/utilities/create.py
or, if running the server via docker, use
docker exec meme_search python meme_search/utilities/create.py
You will see printouts at the terminal indicating success of the 3 main stages for making your memes searchable. These steps are
- extract: get a text description of each image, including OCR of any text in the image, using the kickass tiny vision-LLM moondream
- embed: window and embed each image's text description using a popular embedding model, sentence-transformers/all-MiniLM-L6-v2
- index: index the embeddings in a local, open source faiss vector database, and store references connecting the embeddings to their images in the greatest little db of all time, sqlite
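To make the embed and index stages concrete, here is a rough, dependency-free sketch of the data flow. It is not the code in this repo: `toy_embed` is a hashed bag-of-words stand-in for all-MiniLM-L6-v2, the brute-force dot-product scan is a stand-in for faiss, and every function name, the table schema, and the window sizes are assumptions made for illustration. The text descriptions are taken as given, since the extract stage needs the moondream model.

```python
import math
import sqlite3
import zlib


def window_text(text, size=4, stride=2):
    """Split a description into overlapping word windows (sizes are assumed)."""
    words = text.split()
    windows = []
    for start in range(0, len(words), stride):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return windows


def toy_embed(text, dim=256):
    """Stand-in embedding: hashed bag-of-words, L2-normalized.

    The real pipeline uses all-MiniLM-L6-v2; this only mimics its role.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(descriptions):
    """Embed each window and record which image each vector belongs to."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks (vector_id INTEGER PRIMARY KEY, image_path TEXT, chunk TEXT)"
    )
    vectors = []  # list position doubles as the vector id
    for image_path, description in descriptions.items():
        for chunk in window_text(description):
            conn.execute(
                "INSERT INTO chunks VALUES (?, ?, ?)", (len(vectors), image_path, chunk)
            )
            vectors.append(toy_embed(chunk))
    conn.commit()
    return conn, vectors


def search(query, conn, vectors, top_k=3):
    """Rank vectors by cosine similarity, then map ids back to image paths."""
    q = toy_embed(query)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, v)), vid) for vid, v in enumerate(vectors)),
        reverse=True,
    )
    results = []
    for _score, vid in scored:
        (path,) = conn.execute(
            "SELECT image_path FROM chunks WHERE vector_id = ?", (vid,)
        ).fetchone()
        if path not in results:  # several windows can point at the same image
            results.append(path)
        if len(results) == top_k:
            break
    return results
```

The point of the split is that the vector store only knows integer ids, while sqlite holds the mapping from each id back to an image path and its text chunk, so a search is a nearest-neighbor lookup followed by a simple table query.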
Meme Search is under active development! See the CHANGELOG.md in this repo for a record of the most recent changes.
Feature requests and contributions are welcome!
See the discussion section of this repository for suggested enhancements to contribute to or weigh in on!
Please see CONTRIBUTING.md for some boilerplate ground rules for contributing.
Tests can be run by first installing the test requirements as
pip install -r requirements.test
Then the test suite can be run as
python -m pytest tests/