This project demonstrates how to build a semantic search system using Weaviate and Sentence Transformers to retrieve contextually relevant content from web pages. It processes raw HTML from any given URL, cleans and chunks the content, generates embeddings, and stores them in Weaviate for fast and efficient semantic querying.
The FastAPI backend handles:
- 🔗 Fetching HTML content from a given URL
- 🧹 Cleaning and chunking the text content
- 🧬 Generating sentence embeddings using Sentence Transformers
- 📦 Storing the embeddings and metadata in Weaviate
- 🔍 Performing semantic search on the stored embeddings
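As a rough illustration of the cleaning and chunking steps listed above, here is a minimal, standard-library-only sketch. It is not the project's actual implementation: the names `clean_html` and `chunk_words` are hypothetical, and the fixed-size word window with overlap is an assumed strategy (the backend may well chunk by sentence using nltk instead).

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html: str) -> str:
    """Hypothetical helper: strip tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def chunk_words(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Hypothetical helper: split text into overlapping windows of words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlapping windows are one common way to avoid cutting a relevant passage in half at a chunk boundary; the window size and overlap here are arbitrary defaults, not values taken from this project.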
The React-based frontend allows users to:
- Enter a webpage URL
- Input a semantic search query
- View the top-k relevant results retrieved from the backend
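To make "top-k relevant results" concrete, the sketch below ranks a few toy vectors against a query vector by cosine similarity and keeps the best k. In this project the vector search itself is performed by Weaviate, so this is only an illustration of the idea; the function names and the toy vectors are assumptions, and the similarity metric actually configured in Weaviate may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.0], chunks, k=2)
```

Here `results` holds the indices of the two chunks closest to the query along with their similarity scores, best match first.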
Make sure you have the following installed:
- Python 3.8+
- FastAPI
- Weaviate (local or Docker)
- sentence-transformers
- nltk
- requests
- React 18+
Install Python dependencies via:
pip install -r requirements.txt
- Start Weaviate Locally
A docker-compose.yml file in the project root directory runs Weaviate locally. Start it in detached mode:
docker-compose up -d
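Before starting the backend, it can help to confirm that the container is accepting requests. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The base URL `http://localhost:8080` is an assumption based on Weaviate's default port; adjust it if the docker-compose.yml in this repository maps a different one. The function name `is_weaviate_ready` is hypothetical.

```python
import urllib.error
import urllib.request


def is_weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the Weaviate readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: treat as not ready.
        return False
```

Calling `is_weaviate_ready()` right after `docker-compose up -d` may return False for a few seconds while the container starts, so retrying in a short loop is a reasonable approach.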
- Run the Backend (FastAPI)
Navigate into the backend directory and start the FastAPI server:
cd backend
uvicorn main:app --reload
- Run the Frontend (React)
Navigate into the frontend directory, install dependencies, and start the development server:
cd frontend
npm install
npm run dev