Frontend update
diicellman committed Mar 11, 2024
1 parent 6f203b3 commit 872e017
Showing 15 changed files with 2,103 additions and 313 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -162,4 +162,5 @@ cython_debug/
# Local files
backend/data/chroma_db
backend/data/*.json
*.DS_store
*.DS_store
*/.streamlit
70 changes: 44 additions & 26 deletions README.md
@@ -1,29 +1,29 @@
# FastAPI Wrapper for DSPy
# Full-Stack DSPy Application with FastAPI and Streamlit

## Introduction

This project is a [FastAPI](https://github.com/tiangolo/fastapi) wrapper designed to integrate with the [DSPy](https://github.com/stanfordnlp/dspy) framework developed by StanfordNLP, offering a straightforward example of building a FastAPI backend with DSPy capabilities. Uniquely, this implementation is fully local, utilizing [Ollama](https://github.com/ollama/ollama) for both the language and embedding models, [Chroma DB](https://github.com/chroma-core/chroma) for vector storage, and [Arize Phoenix](https://github.com/Arize-ai/phoenix) for an observability layer. This setup ensures that all operations, from querying to data storage, are performed on the local machine without the need for external cloud services, enhancing privacy and data security.
This project is a full-stack application that runs natural language processing entirely locally and integrates with the [DSPy](https://github.com/stanfordnlp/dspy) framework developed by StanfordNLP. It features a [FastAPI](https://github.com/tiangolo/fastapi) backend for processing and a [Streamlit](https://streamlit.io) frontend for interactive user interfaces. The implementation combines [Ollama](https://github.com/ollama/ollama) for the language and embedding models, [Chroma DB](https://github.com/chroma-core/chroma) for vector storage, and [Arize Phoenix](https://github.com/Arize-ai/phoenix) for observability. All operations, from processing to data storage, are executed on the local machine, enhancing privacy, data security, and ease of use.

## Features

- **Local Execution**: Everything runs on your local machine, ensuring data privacy and security. No external cloud services are involved.
- **Ollama Integration**: Leverages Ollama with the phi-2 language model and nomic embedding model by default, with configurable LLM support that lets users specify the desired language model in the .env file or Docker Compose file.
- **Chroma DB for Vector Storage**: Uses Chroma DB for efficient and scalable vector storage, facilitating fast and accurate retrieval of information.
- **Arize Phoenix**: Incorporates Arize Phoenix for observability, offering real-time monitoring and analytics to track and improve model performance and system health.
- **Zero-shot-query**: Allows users to perform zero-shot queries using DSPy through a simple GET request.
- **Compiled-query**: Enables the compilation of queries for optimized execution, accessible via GET.
- **Compile-program**: Offers an interface for compiling DSPy programs through a POST request, facilitating more complex interactions with the language model.
- **Fully Local Execution**: Ensures privacy and security by running all processes on your local machine without external dependencies.
- **Ollama Integration**: Leverages Ollama for both the language and embedding models.
- **Chroma DB Vector Storage**: Utilizes Chroma DB for efficient, scalable vector storage, enabling quick and precise information retrieval.
- **Arize Phoenix Observability**: Integrates Arize Phoenix for real-time monitoring and analytics, aiding in performance improvement and system health tracking.
- **FastAPI Backend**: Offers robust and scalable API endpoints for interacting with the NLP models and performing various queries and compilations.
- **Streamlit Frontend**: Provides an intuitive and interactive UI for users to easily interact with the backend services, improving the overall user experience.

## Architecture

The FastAPI wrapper integrates DSPy with Ollama, Arize Phoenix and Chroma DB in a seamless manner, providing a robust backend for applications requiring advanced natural language processing and data retrieval capabilities. Here's how the components interact within our local setup:
This full-stack application combines the DSPy Framework with Ollama, Arize Phoenix, and Chroma DB in a cohesive ecosystem. Here's a brief overview of the system components:

- **DSPy Framework**: Handles the optimization of language model prompts and weights, offering a sophisticated interface for programming with language models.
- **Ollama**: Serves as the backend for both the language model and the embedding model, enabling powerful and efficient natural language understanding and generation.
- **Chroma DB**: Acts as the vector store, allowing for efficient storage and retrieval of high-dimensional data vectors, which is crucial for tasks such as semantic search and similarity matching.
- **Arize Phoenix**: Phoenix makes your DSPy applications observable by visualizing the underlying structure of each call to your compiled DSPy module.
- **DSPy Framework**: Serves as the core for language model interactions, offering advanced NLP capabilities.
- **Ollama**: Acts as the backend engine for language understanding and generation.
- **Chroma DB**: Provides efficient vector storage solutions, essential for NLP tasks like semantic search.
- **Arize Phoenix**: Enhances visibility into the application's performance and health.
- **FastAPI**: Facilitates the backend logic, handling API requests and responses.
- **Streamlit**: Creates the frontend interface, enabling users to engage with the backend services visually.

This local setup not only enhances data security and privacy but also provides developers with a flexible and powerful environment for building advanced NLP applications.
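
To make the wiring concrete, here is a condensed, illustrative sketch of how the backend connects these components (it mirrors `backend/app/utils/rag_functions.py` later in this diff; the model name and paths are examples, relative to the `backend/` directory):

```python
import os

import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

from app.utils.load import OllamaEmbeddingFunction

ollama_base_url = os.getenv("OLLAMA_BASE_URL", "localhost")  # your Ollama instance

# Chroma DB is the vector store; embeddings are produced by the local Ollama server.
retriever = ChromadbRM(
    "quickstart",        # collection created by the ingestion step
    "data/chroma_db",    # local persistence directory
    embedding_function=OllamaEmbeddingFunction(host=ollama_base_url),
    k=5,
)

# Ollama also serves the language model that the DSPy RAG module calls.
ollama_lm = dspy.OllamaLocal(model="phi", base_url=ollama_base_url)

# DSPy ties retriever and LM together; Phoenix traces these calls when instrumented.
dspy.settings.configure(rm=retriever)
with dspy.context(lm=ollama_lm):
    pass  # e.g. RAG()(question="...") runs retrieval + generation here
```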

## Installation

@@ -43,6 +43,7 @@ cd dspy-rag-fastapi
```
### Getting Started with Local Development

#### Backend setup
First, navigate to the backend directory:
```bash
cd backend/
@@ -62,7 +63,6 @@ ENVIRONMENT=<your_environment_value>
INSTRUMENT_DSPY=<true or false>
COLLECTOR_ENDPOINT=<your_arize_phoenix_endpoint>
OLLAMA_BASE_URL=<your_ollama_instance_endpoint>
OLLAMA_MODEL_NAME=<your_llm_model_name>
```
Third, run this command to create embeddings of the data located in the data/example folder:
```bash
@@ -73,27 +73,45 @@ Then run this command to start the FastAPI server:
```bash
python main.py
```

#### Frontend setup
First, navigate to the frontend directory:
```bash
cd frontend/
```

Second, set up the environment:

```bash
poetry config virtualenvs.in-project true
poetry install
poetry shell
```
Specify your environment variables in a .env file in the frontend directory.
Example .env file:
```yml
FASTAPI_BACKEND_URL=<your_fastapi_address>
```

Then run this command to start the Streamlit application:
```bash
streamlit run about.py
```

### Getting Started with Docker-Compose
This project now supports Docker Compose for easier setup and deployment, including backend services and Arize Phoenix for query tracing.

1. Configure your environment variables in the .env file or modify the compose file directly.
2. Ensure that Docker is installed and running.
3. Run the command `docker-compose -f compose.yml up` to spin up the services for the backend and Phoenix.
4. Backend docs can be viewed using the [OpenAPI](http://0.0.0.0:8000/docs).
5. Traces can be viewed using the [Phoenix UI](http://0.0.0.0:6006).
5. Frontend can be viewed using [Streamlit](http://0.0.0.0:8501).
6. Traces can be viewed using the [Phoenix UI](http://0.0.0.0:6006).
7. When you're finished, run `docker compose down` to spin down the services.
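
Once the services are up, a quick optional check that they respond on the ports listed above might look like this (URLs as published by the compose setup):

```python
import requests

# Probe each service published by docker compose; a 200 means it is reachable.
services = {
    "backend docs": "http://0.0.0.0:8000/docs",
    "frontend": "http://0.0.0.0:8501",
    "phoenix": "http://0.0.0.0:6006",
}
for name, url in services.items():
    status = requests.get(url, timeout=5).status_code
    print(f"{name}: HTTP {status}")
```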

## Usage

After starting the FastAPI server, you can interact with the API endpoints as follows:

| Method | Endpoint | Description | Example |
|--------|----------------------|------------------------------------|----------------------------------------------------------------------------------------------|
| GET | `/zero-shot-query` | Perform a zero-shot query. | `curl http://<your_address>:8000/api/rag/zero-shot-query?query=<your-query>` |
| GET | `/compiled-query` | Get a compiled query. | `curl http://<your_address>:8000/api/rag/compiled-query?query=<your-query>` |
| POST | `/compile-program` | Compile a DSPy program. | `curl -X POST http://<your_address>:8000/api/rag/compile-program -H "Content-Type: application/json" -d ''` |

Be sure to replace `<your-query>` and `<your-program>` with the actual query and DSPy program you wish to execute.
The FastAPI and Streamlit integration allows for seamless interaction between the user and the NLP backend. Use the FastAPI endpoints for NLP tasks, and use the Streamlit frontend to visualize results and interact with the system.
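
As an illustration, a client might call the query endpoints like this (a minimal sketch assuming the backend listens on `localhost:8000`, the router is mounted under `/api/rag` as in the earlier examples, and the chosen model has been pulled in Ollama):

```python
import requests

BASE_URL = "http://localhost:8000/api/rag"  # assumed host, port, and route prefix

# Discover which Ollama models are available locally.
models = requests.get(f"{BASE_URL}/list-models").json()["models"]

# Body matching the MessageData model expected by the query endpoints.
payload = {
    "query": "What is DSPy?",
    "ollama_model_name": models[0],  # or any model name returned above
    "temperature": 0.1,
    "top_p": 0.9,
    "max_tokens": 512,
}

# Zero-shot query through the uncompiled RAG module.
zero_shot = requests.post(f"{BASE_URL}/zero-shot-query", json=payload).json()
print(zero_shot["answer"])

# Compiled query (requires a program compiled beforehand via /compile-program).
compiled = requests.post(f"{BASE_URL}/compiled-query", json=payload).json()
print(compiled["retrieved_contexts"])
```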


## Contributing
42 changes: 20 additions & 22 deletions backend/app/api/routers/rag.py
@@ -1,8 +1,13 @@
"""Endpoints."""

from fastapi import APIRouter

from app.utils.rag_modules import RAG, compile_rag, get_compiled_rag
from app.utils.models import MessageData, RAGResponse, QAList
from app.utils.rag_functions import (
    get_zero_shot_query,
    get_compiled_rag,
    compile_rag,
    get_list_ollama_models,
)

rag_router = APIRouter()

Expand All @@ -13,30 +18,23 @@ async def healthcheck():
    return {"message": "Thanks for playing."}


@rag_router.get("/zero-shot-query")
async def zero_shot_query(query: str):
    rag = RAG()
    pred = rag(query)
@rag_router.get("/list-models")
async def list_models():
    return get_list_ollama_models()

    return {
        "question": query,
        "predicted answer": pred.answer,
        "retrieved contexts (truncated)": [c[:200] + "..." for c in pred.context],
    }

@rag_router.post("/zero-shot-query", response_model=RAGResponse)
async def zero_shot_query(payload: MessageData):
    return get_zero_shot_query(payload=payload)

@rag_router.get("/compiled-query")
async def compiled_query(query: str):
    compiled_rag = get_compiled_rag()
    pred = compiled_rag(query)

    return {
        "question": query,
        "predicted answer": pred.answer,
        "retrieved contexts (truncated)": [c[:200] + "..." for c in pred.context],
    }
@rag_router.post("/compiled-query", response_model=RAGResponse)
async def compiled_query(payload: MessageData):
    return get_compiled_rag(payload=payload)


@rag_router.post("/compile-program")
async def compile_program():
    return compile_rag()
async def compile_program(qa_list: QAList):

    print(qa_list)
    return compile_rag(qa_items=qa_list)
38 changes: 38 additions & 0 deletions backend/app/utils/models.py
@@ -0,0 +1,38 @@
"""Pydantic models."""

from pydantic import BaseModel
from typing import List


class MessageData(BaseModel):
    """Datamodel for messages."""

    query: str
    # chat_history: List[dict] | None
    ollama_model_name: str
    temperature: float
    top_p: float
    max_tokens: int


class RAGResponse(BaseModel):
    """Datamodel for RAG response."""

    question: str
    answer: str
    retrieved_contexts: List[str]


class QAItem(BaseModel):
    question: str
    answer: str


class QAList(BaseModel):
    """Datamodel for trainset."""

    items: List[QAItem]
    ollama_model_name: str
    temperature: float
    top_p: float
    max_tokens: int
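
For illustration, here is a minimal sketch of how these models might be populated before being sent to the `/compile-program` endpoint (the question/answer pairs and model name are placeholders):

```python
from app.utils.models import QAItem, QAList

# A tiny hand-written trainset for BootstrapFewShot compilation.
trainset = QAList(
    items=[
        QAItem(question="What does DSPy optimize?", answer="Prompts and weights."),
        QAItem(question="Where are vectors stored?", answer="In Chroma DB."),
    ],
    ollama_model_name="phi",  # placeholder; use a model returned by /list-models
    temperature=0.1,
    top_p=0.9,
    max_tokens=512,
)

# JSON body for POST /api/rag/compile-program (.model_dump() on Pydantic v2).
body = trainset.dict()
```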
133 changes: 133 additions & 0 deletions backend/app/utils/rag_functions.py
@@ -0,0 +1,133 @@
"""DSPy functions."""

import os

import dspy
import ollama
from dotenv import load_dotenv
from dspy.retrieve.chromadb_rm import ChromadbRM
from dspy.teleprompt import BootstrapFewShot

from app.utils.load import OllamaEmbeddingFunction
from app.utils.rag_modules import RAG
from app.utils.models import MessageData, RAGResponse, QAList

load_dotenv()


from typing import Dict

# Global settings
DATA_DIR = "data"
ollama_base_url = os.getenv("OLLAMA_BASE_URL", "localhost")
ollama_embedding_function = OllamaEmbeddingFunction(host=ollama_base_url)

retriever_model = ChromadbRM(
    "quickstart",
    f"{DATA_DIR}/chroma_db",
    embedding_function=ollama_embedding_function,
    k=5,
)

dspy.settings.configure(rm=retriever_model)


def get_zero_shot_query(payload: MessageData):
    rag = RAG()
    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=payload.ollama_model_name,
        base_url=ollama_base_url,
        temperature=payload.temperature,
        top_p=payload.top_p,
        max_tokens=payload.max_tokens,
    )
    # parsed_chat_history = ", ".join(
    #     [f"{chat['role']}: {chat['content']}" for chat in payload.chat_history]
    # )
    with dspy.context(lm=ollama_lm):
        pred = rag(
            question=payload.query, # chat_history=parsed_chat_history
        )

    return RAGResponse(
        question=payload.query,
        answer=pred.answer,
        retrieved_contexts=[c[:200] + "..." for c in pred.context],
    )


def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM


def compile_rag(qa_items: QAList) -> Dict:
    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=qa_items.ollama_model_name,
        base_url=ollama_base_url,
        temperature=qa_items.temperature,
        top_p=qa_items.top_p,
        max_tokens=qa_items.max_tokens,
    )

    trainset = [
        dspy.Example(
            question=item.question,
            answer=item.answer,
        ).with_inputs("question")
        for item in qa_items.items
    ]

    # Set up a basic teleprompter, which will compile our RAG program.
    teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

    # Compile!
    with dspy.context(lm=ollama_lm):
        compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

    # Saving
    compiled_rag.save(f"{DATA_DIR}/compiled_rag.json")

    return {"message": "Successfully compiled RAG program!"}


def get_compiled_rag(payload: MessageData):
    # Loading:
    rag = RAG()
    rag.load(f"{DATA_DIR}/compiled_rag.json")

    # Global settings
    ollama_lm = dspy.OllamaLocal(
        model=payload.ollama_model_name,
        base_url=ollama_base_url,
        temperature=payload.temperature,
        top_p=payload.top_p,
        max_tokens=payload.max_tokens,
    )
    # parsed_chat_history = ", ".join(
    #     [f"{chat['role']}: {chat['content']}" for chat in payload.chat_history]
    # )
    with dspy.context(lm=ollama_lm):
        pred = rag(
            question=payload.query, # chat_history=parsed_chat_history
        )

    return RAGResponse(
        question=payload.query,
        answer=pred.answer,
        retrieved_contexts=[c[:200] + "..." for c in pred.context],
    )


def get_list_ollama_models():
    client = ollama.Client(host=ollama_base_url)

    models = []
    models_list = client.list()
    for model in models_list["models"]:
        models.append(model["name"])

    return {"models": models}
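
As a rough, illustrative smoke test of the helpers above (assuming Ollama is reachable at `OLLAMA_BASE_URL`, a model such as `phi` has been pulled, and the Chroma index has already been created):

```python
from app.utils.models import MessageData
from app.utils.rag_functions import get_list_ollama_models, get_zero_shot_query

# List the models currently served by the local Ollama instance.
print(get_list_ollama_models())  # e.g. {"models": ["phi:latest", ...]}

# Run a zero-shot RAG query through the default (uncompiled) DSPy module.
payload = MessageData(
    query="What is this project about?",
    ollama_model_name="phi",  # placeholder; pick any name returned above
    temperature=0.1,
    top_p=0.9,
    max_tokens=512,
)
result = get_zero_shot_query(payload=payload)
print(result.answer)
print(result.retrieved_contexts)
```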