Frontend update
diicellman committed Mar 11, 2024
1 parent 6f203b3 commit 872e017
Showing 15 changed files with 2,103 additions and 313 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -162,4 +162,5 @@ cython_debug/
# Local files
backend/data/chroma_db
backend/data/*.json
*.DS_store
*.DS_store
*/.streamlit
70 changes: 44 additions & 26 deletions README.md
@@ -1,29 +1,29 @@
# FastAPI Wrapper for DSPy
# Full-Stack DSPy Application with FastAPI and Streamlit

## Introduction

This project is a [FastAPI](https://github.com/tiangolo/fastapi) wrapper designed to integrate with the [DSPy](https://github.com/stanfordnlp/dspy) framework developed by StanfordNLP, offering a straightforward example of building a FastAPI backend with DSPy capabilities. Uniquely, this implementation is fully local, utilizing [Ollama](https://github.com/ollama/ollama) for both the language and embedding models, [Chroma DB](https://github.com/chroma-core/chroma) for vector storage, and [Arize Phoenix](https://github.com/Arize-ai/phoenix) for an observability layer. This setup ensures that all operations, from querying to data storage, are performed on the local machine without the need for external cloud services, enhancing privacy and data security.
This project is a full-stack application that runs natural language processing entirely locally and integrates with the [DSPy](https://github.com/stanfordnlp/dspy) framework developed by StanfordNLP. It features a [FastAPI](https://github.com/tiangolo/fastapi) backend for processing and a [Streamlit](https://streamlit.io) frontend for interactive user interfaces. The implementation combines [Ollama](https://github.com/ollama/ollama) for the language and embedding models, [Chroma DB](https://github.com/chroma-core/chroma) for vector storage, and [Arize Phoenix](https://github.com/Arize-ai/phoenix) for observability. All operations, from processing to data storage, are executed on the local machine, enhancing privacy, data security, and ease of use.

## Features

- **Local Execution**: Everything runs on your local machine, ensuring data privacy and security. No external cloud services are involved.
- **Ollama Integration**: Leverages Ollama with the phi-2 language model and nomic embedding model by default, with configurable LLM support that lets users specify the desired language model in the .env file or Docker Compose file.
- **Chroma DB for Vector Storage**: Uses Chroma DB for efficient and scalable vector storage, facilitating fast and accurate retrieval of information.
- **Arize Phoenix**: Incorporates Arize Phoenix for observability, offering real-time monitoring and analytics to track and improve model performance and system health.
- **Zero-shot-query**: Allows users to perform zero-shot queries using DSPy through a simple GET request.
- **Compiled-query**: Enables the compilation of queries for optimized execution, accessible via GET.
- **Compile-program**: Offers an interface for compiling DSPy programs through a POST request, facilitating more complex interactions with the language model.
- **Fully Local Execution**: Ensures privacy and security by running all processes on your local machine without external dependencies.
- **Ollama Integration**: Leverages Ollama for both the language and embedding models.
- **Chroma DB Vector Storage**: Utilizes Chroma DB for efficient, scalable vector storage, enabling quick and precise information retrieval.
- **Arize Phoenix Observability**: Integrates Arize Phoenix for real-time monitoring and analytics, aiding in performance improvement and system health tracking.
- **FastAPI Backend**: Offers robust and scalable API endpoints for interacting with the NLP models and performing various queries and compilations.
- **Streamlit Frontend**: Provides an intuitive and interactive UI for users to easily interact with the backend services, improving the overall user experience.

## Architecture

The FastAPI wrapper integrates DSPy with Ollama, Arize Phoenix and Chroma DB in a seamless manner, providing a robust backend for applications requiring advanced natural language processing and data retrieval capabilities. Here's how the components interact within our local setup:
This full-stack application combines the DSPy Framework with Ollama, Arize Phoenix, and Chroma DB in a cohesive ecosystem. Here's a brief overview of the system components:

- **DSPy Framework**: Handles the optimization of language model prompts and weights, offering a sophisticated interface for programming with language models.
- **Ollama**: Serves as the backend for both the language model and the embedding model, enabling powerful and efficient natural language understanding and generation.
- **Chroma DB**: Acts as the vector store, allowing for efficient storage and retrieval of high-dimensional data vectors, which is crucial for tasks such as semantic search and similarity matching.
- **Arize Phoenix**: Phoenix makes your DSPy applications observable by visualizing the underlying structure of each call to your compiled DSPy module.
- **DSPy Framework**: Serves as the core for language model interactions, offering advanced NLP capabilities.
- **Ollama**: Acts as the backend engine for language understanding and generation.
- **Chroma DB**: Provides efficient vector storage solutions, essential for NLP tasks like semantic search.
- **Arize Phoenix**: Enhances visibility into the application's performance and health.
- **FastAPI**: Facilitates the backend logic, handling API requests and responses.
- **Streamlit**: Creates the frontend interface, enabling users to engage with the backend services visually.

This local setup not only enhances data security and privacy but also provides developers with a flexible and powerful environment for building advanced NLP applications.
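
To make the wiring concrete, here is a condensed, illustrative sketch of how the backend connects these components (it mirrors `backend/app/utils/rag_functions.py` later in this diff; the model name and paths are examples, relative to the `backend/` directory):

```python
import os

import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

from app.utils.load import OllamaEmbeddingFunction

ollama_base_url = os.getenv("OLLAMA_BASE_URL", "localhost")  # your Ollama instance

# Chroma DB is the vector store; embeddings are produced by the local Ollama server.
retriever = ChromadbRM(
    "quickstart",        # collection created by the ingestion step
    "data/chroma_db",    # local persistence directory
    embedding_function=OllamaEmbeddingFunction(host=ollama_base_url),
    k=5,
)

# Ollama also serves the language model that the DSPy RAG module calls.
ollama_lm = dspy.OllamaLocal(model="phi", base_url=ollama_base_url)

# DSPy ties retriever and LM together; Phoenix traces these calls when instrumented.
dspy.settings.configure(rm=retriever)
with dspy.context(lm=ollama_lm):
    pass  # e.g. RAG()(question="...") runs retrieval + generation here
```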

## Installation

@@ -43,6 +43,7 @@ cd dspy-rag-fastapi
```
### Getting Started with Local Development

#### Backend setup
First, navigate to the backend directory:
```bash
cd backend/
@@ -62,7 +63,6 @@ ENVIRONMENT=<your_environment_value>
INSTRUMENT_DSPY=<true or false>
COLLECTOR_ENDPOINT=<your_arize_phoenix_endpoint>
OLLAMA_BASE_URL=<your_ollama_instance_endpoint>
OLLAMA_MODEL_NAME=<your_llm_model_name>
```
Third, run this command to create embeddings of the data located in the data/example folder:
```bash
@@ -73,27 +73,45 @@ Then run this command to start the FastAPI server:
```bash
python main.py
```

#### Frontend setup
First, navigate to the frontend directory:
```bash
cd frontend/
```

Second, set up the environment:

```bash
poetry config virtualenvs.in-project true
poetry install
poetry shell
```
Specify your environment variables in a .env file in the frontend directory.
Example .env file:
```yml
FASTAPI_BACKEND_URL=<your_fastapi_address>
```

Then run this command to start the Streamlit application:
```bash
streamlit run about.py
```

### Getting Started with Docker-Compose
This project now supports Docker Compose for easier setup and deployment, including backend services and Arize Phoenix for query tracing.

1. Configure your environment variables in the .env file or modify the compose file directly.
2. Ensure that Docker is installed and running.
3. Run the command `docker-compose -f compose.yml up` to spin up the services for the backend and Phoenix.
4. Backend docs can be viewed using the [OpenAPI](http://0.0.0.0:8000/docs).
5. Traces can be viewed using the [Phoenix UI](http://0.0.0.0:6006).
5. Frontend can be viewed using [Streamlit](http://0.0.0.0:8501).
6. Traces can be viewed using the [Phoenix UI](http://0.0.0.0:6006).
7. When you're finished, run `docker compose down` to spin down the services.
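
Once the services are up, a quick optional check that they respond on the ports listed above might look like this (URLs as published by the compose setup):

```python
import requests

# Probe each service published by docker compose; a 200 means it is reachable.
services = {
    "backend docs": "http://0.0.0.0:8000/docs",
    "frontend": "http://0.0.0.0:8501",
    "phoenix": "http://0.0.0.0:6006",
}
for name, url in services.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")
```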

## Usage

After starting the FastAPI server, you can interact with the API endpoints as follows:

| Method | Endpoint | Description | Example |
|--------|----------------------|------------------------------------|----------------------------------------------------------------------------------------------|
| GET | `/zero-shot-query` | Perform a zero-shot query. | `curl http://<your_address>:8000/api/rag/zero-shot-query?query=<your-query>` |
| GET | `/compiled-query` | Get a compiled query. | `curl http://<your_address>:8000/api/rag/compiled-query?query=<your-query>` |
| POST | `/compile-program` | Compile a DSPy program. | `curl -X POST http://<your_address>:8000/api/rag/compile-program -H "Content-Type: application/json" -d ''` |

Be sure to replace `<your-query>` and `<your-program>` with the actual query and DSPy program you wish to execute.
The FastAPI and Streamlit integration allows for seamless interaction between the user and the NLP backend. Use the FastAPI endpoints for NLP tasks, and use the Streamlit frontend to visualize results and interact with the system.
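
As an illustration, a client might call the query endpoints like this (a minimal sketch assuming the backend listens on `localhost:8000`, the router is mounted under `/api/rag` as in the earlier examples, and the chosen model has been pulled in Ollama):

```python
import requests

BASE_URL = "http://localhost:8000/api/rag"  # assumed host, port, and route prefix

# Discover which Ollama models are available locally.
models = requests.get(f"{BASE_URL}/list-models").json()["models"]

# Body matching the MessageData model expected by the query endpoints.
payload = {
    "query": "What is DSPy?",
    "ollama_model_name": models[0],  # or any model name returned above
    "temperature": 0.1,
    "top_p": 0.9,
    "max_tokens": 512,
}

# Zero-shot query through the uncompiled RAG module.
zero_shot = requests.post(f"{BASE_URL}/zero-shot-query", json=payload).json()
print(zero_shot["answer"])

# Compiled query (requires a program compiled beforehand via /compile-program).
compiled = requests.post(f"{BASE_URL}/compiled-query", json=payload).json()
print(compiled["retrieved_contexts"])
```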


## Contributing
42 changes: 20 additions & 22 deletions backend/app/api/routers/rag.py
@@ -1,8 +1,13 @@
"""Endpoints."""

from fastapi import APIRouter

from app.utils.rag_modules import RAG, compile_rag, get_compiled_rag
from app.utils.models import MessageData, RAGResponse, QAList
from app.utils.rag_functions import (
    get_zero_shot_query,
    get_compiled_rag,
    compile_rag,
    get_list_ollama_models,
)

rag_router = APIRouter()

Expand All @@ -13,30 +18,23 @@ async def healthcheck():
    return {"message": "Thanks for playing."}


@rag_router.get("/zero-shot-query")
async def zero_shot_query(query: str):
    rag = RAG()
    pred = rag(query)
@rag_router.get("/list-models")
async def list_models():
    return get_list_ollama_models()

    return {
        "question": query,
        "predicted answer": pred.answer,
        "retrieved contexts (truncated)": [c[:200] + "..." for c in pred.context],
    }

@rag_router.post("/zero-shot-query", response_model=RAGResponse)
async def zero_shot_query(payload: MessageData):
    return get_zero_shot_query(payload=payload)

@rag_router.get("/compiled-query")
async def compiled_query(query: str):
    compiled_rag = get_compiled_rag()
    pred = compiled_rag(query)

    return {
        "question": query,
        "predicted answer": pred.answer,
        "retrieved contexts (truncated)": [c[:200] + "..." for c in pred.context],
    }
@rag_router.post("/compiled-query", response_model=RAGResponse)
async def compiled_query(payload: MessageData):
    return get_compiled_rag(payload=payload)


@rag_router.post("/compile-program")
async def compile_program():
    return compile_rag()
async def compile_program(qa_list: QAList):

    print(qa_list)
    return compile_rag(qa_items=qa_list)
38 changes: 38 additions & 0 deletions backend/app/utils/models.py
@@ -0,0 +1,38 @@
"""Pydantic models."""

from pydantic import BaseModel
from typing import List


class MessageData(BaseModel):
    """Datamodel for messages."""

    query: str
    # chat_history: List[dict] | None
    ollama_model_name: str
    temperature: float
    top_p: float
    max_tokens: int


class RAGResponse(BaseModel):
    """Datamodel for RAG response."""

    question: str
    answer: str
    retrieved_contexts: List[str]


class QAItem(BaseModel):
    question: str
    answer: str


class QAList(BaseModel):
    """Datamodel for trainset."""

    items: List[QAItem]
    ollama_model_name: str
    temperature: float
    top_p: float
    max_tokens: int
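
For illustration, here is a minimal sketch of how these models might be populated before being sent to the `/compile-program` endpoint (the question/answer pairs and model name are placeholders):

```python
from app.utils.models import QAItem, QAList

# A tiny hand-written trainset for BootstrapFewShot compilation.
trainset = QAList(
    items=[
        QAItem(question="What does DSPy optimize?", answer="Prompts and weights."),
        QAItem(question="Where are vectors stored?", answer="In Chroma DB."),
    ],
    ollama_model_name="phi",  # placeholder; use a model returned by /list-models
    temperature=0.1,
    top_p=0.9,
    max_tokens=512,
)

# JSON body for POST /api/rag/compile-program (.model_dump() on Pydantic v2).
body = trainset.dict()
```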
133 changes: 133 additions & 0 deletions backend/app/utils/rag_functions.py
@@ -0,0 +1,133 @@
"""DSPy functions."""

import os

import dspy
import ollama
from dotenv import load_dotenv
from dspy.retrieve.chromadb_rm import ChromadbRM
from dspy.teleprompt import BootstrapFewShot

from app.utils.load import OllamaEmbeddingFunction
from app.utils.rag_modules import RAG
from app.utils.models import MessageData, RAGResponse, QAList

load_dotenv()


from typing import Dict

# Global settings
DATA_DIR = "data"
ollama_base_url = os.getenv("OLLAMA_BASE_URL", "localhost")
ollama_embedding_function = OllamaEmbeddingFunction(host=ollama_base_url)

retriever_model = ChromadbRM(
    "quickstart",
    f"{DATA_DIR}/chroma_db",
    embedding_function=ollama_embedding_function,
    k=5,
)

dspy.settings.configure(rm=retriever_model)


def get_zero_shot_query(payload: MessageData):
    rag = RAG()
    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=payload.ollama_model_name,
        base_url=ollama_base_url,
        temperature=payload.temperature,
        top_p=payload.top_p,
        max_tokens=payload.max_tokens,
    )
    # parsed_chat_history = ", ".join(
    #     [f"{chat['role']}: {chat['content']}" for chat in payload.chat_history]
    # )
    with dspy.context(lm=ollama_lm):
        pred = rag(
            question=payload.query, # chat_history=parsed_chat_history
        )

    return RAGResponse(
        question=payload.query,
        answer=pred.answer,
        retrieved_contexts=[c[:200] + "..." for c in pred.context],
    )


def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM


def compile_rag(qa_items: QAList) -> Dict:
    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=qa_items.ollama_model_name,
        base_url=ollama_base_url,
        temperature=qa_items.temperature,
        top_p=qa_items.top_p,
        max_tokens=qa_items.max_tokens,
    )

    trainset = [
        dspy.Example(
            question=item.question,
            answer=item.answer,
        ).with_inputs("question")
        for item in qa_items.items
    ]

    # Set up a basic teleprompter, which will compile our RAG program.
    teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

    # Compile!
    with dspy.context(lm=ollama_lm):
        compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

    # Saving
    compiled_rag.save(f"{DATA_DIR}/compiled_rag.json")

    return {"message": "Successfully compiled RAG program!"}


def get_compiled_rag(payload: MessageData):
    # Loading:
    rag = RAG()
    rag.load(f"{DATA_DIR}/compiled_rag.json")

    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=payload.ollama_model_name,
        base_url=ollama_base_url,
        temperature=payload.temperature,
        top_p=payload.top_p,
        max_tokens=payload.max_tokens,
    )
    # parsed_chat_history = ", ".join(
    #     [f"{chat['role']}: {chat['content']}" for chat in payload.chat_history]
    # )
    with dspy.context(lm=ollama_lm):
        pred = rag(
            question=payload.query, # chat_history=parsed_chat_history
        )

    return RAGResponse(
        question=payload.query,
        answer=pred.answer,
        retrieved_contexts=[c[:200] + "..." for c in pred.context],
    )


def get_list_ollama_models():
    client = ollama.Client(host=ollama_base_url)

    models = []
    models_list = client.list()
    for model in models_list["models"]:
        models.append(model["name"])

    return {"models": models}
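
As a rough, illustrative smoke test of the helpers above (assuming Ollama is reachable at `OLLAMA_BASE_URL`, a model such as `phi` has been pulled, and the Chroma index has already been created):

```python
from app.utils.models import MessageData
from app.utils.rag_functions import get_list_ollama_models, get_zero_shot_query

# List the models currently served by the local Ollama instance.
print(get_list_ollama_models())  # e.g. {"models": ["phi:latest", ...]}

# Run a zero-shot RAG query through the default (uncompiled) DSPy module.
payload = MessageData(
    query="What is this project about?",
    ollama_model_name="phi",  # placeholder; pick any name returned above
    temperature=0.1,
    top_p=0.9,
    max_tokens=512,
)
result = get_zero_shot_query(payload=payload)
print(result.answer)
print(result.retrieved_contexts)
```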