Canopy is an open-source Retrieval Augmented Generation (RAG) framework built on top of the Pinecone vector database. It enables developers to quickly and easily experiment with and build RAG applications. Canopy provides a configurable built-in server that lets users effortlessly deploy a RAG-infused chatbot web app using their own documents as a knowledge base. For advanced use cases, the Canopy core library enables building your own custom retrieval-powered AI applications.
Canopy is designed to be:
- Easy to implement: Bring your text data in Parquet or JSONL format, and Canopy will handle the rest. Canopy makes it easy to incorporate RAG into your OpenAI chat applications.
- Reliable at scale: Build fast, highly accurate GenAI applications that are production-ready and backed by Pinecone's vector database. Seamlessly scale to billions of items with transparent, resource-based pricing.
- Open and flexible: Fully open-source, Canopy is both modular and extensible. You can configure it to use only the components you need, or extend any component with your own custom implementation. Easily incorporate it into existing OpenAI applications and connect Canopy to your preferred UI.
- Interactive and iterative: Evaluate your RAG workflow with a CLI-based chat tool. With a simple command in the Canopy CLI, you can interactively chat with your text data and compare RAG vs. non-RAG workflows side-by-side to evaluate the augmented results before scaling to production.
Learn how Canopy implements the full RAG workflow to prevent hallucinations and augment your LLM (via an OpenAI endpoint) with your own text data.
Chat Flow:
- The user sends a question to the Canopy /chat/completions endpoint.
- Canopy uses a language model to break the question down into queries; a single user question may result in multiple knowledge queries.
- Canopy encodes and embeds each query separately.
- Canopy queries Pinecone with the embedded queries and fetches back the top K results for each. Canopy determines how many results to fetch based on the token budget set by the user.
- Now that Canopy has all the external knowledge needed to answer the original question, it performs a context-building step to create an optimal context that fits the token budget.
- Canopy generates a prompt combining general task information and the system message, and sends the prompt plus context to the language model.
- Canopy decodes the response from the language model and returns it in the API response (optionally streamed).
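Once the service is running (see the quickstart below), this flow can be exercised directly over HTTP. The sketch below is illustrative only: it assumes the default localhost:8000 address and an OpenAI-style chat request body; confirm the exact schema in the Swagger UI at /docs.

```python
# Illustrative sketch: ask a question through a locally running Canopy server.
# Assumes the default host/port and an OpenAI-style request body -- confirm
# the exact schema in the Swagger UI at /docs.
import requests

resp = requests.post(
    "http://localhost:8000/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is Canopy used for?"}
        ],
        "stream": False,
    },
)
resp.raise_for_status()
# The response is assumed to follow the OpenAI chat completion format.
print(resp.json()["choices"][0]["message"]["content"])
```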
Context Flow:
- The user calls /context/upsert with documents, each containing an id, text, and optionally a source and metadata.
- The Canopy KnowledgeBase processes the documents and chunks each document in a structural and semantic way.
- The Canopy KnowledgeBase encodes each chunk using one or more embedding models.
- The Canopy KnowledgeBase upserts the encoded chunks into the Pinecone index.
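For illustration, the same flow can be driven over HTTP against a running server. The request body shape below (a plain list of documents with the fields described above) is an assumption; verify it against the Swagger UI at /docs.

```python
# Illustrative sketch: upsert documents through a locally running Canopy server.
# The request body shape is an assumption -- verify it in the Swagger UI at /docs.
import requests

documents = [
    {"id": "doc1", "text": "Canopy is a RAG framework built on top of Pinecone."},
    {
        "id": "doc2",
        "text": "Pinecone is a managed vector database.",
        "source": "https://www.pinecone.io",       # optional
        "metadata": {"topic": "vector-databases"},  # optional
    },
]

resp = requests.post("http://localhost:8000/context/upsert", json=documents)
resp.raise_for_status()
```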
- Canopy Core Library - Canopy has 3 API-level components that are responsible for different parts of the RAG workflow (a minimal usage sketch follows this list):
  - ChatEngine (/chat/completions) - implements the full RAG workflow and exposes a chat interface to interact with your data. It acts as a wrapper around the Knowledge Base and Context Engine.
  - ContextEngine - performs the "retrieval" part of RAG. The ContextEngine utilizes the underlying KnowledgeBase to retrieve the most relevant document chunks, then formulates a coherent textual context to be used as a prompt for the LLM.
  - KnowledgeBase (/context/{upsert, delete}) - prepares your data for the RAG workflow. It automatically chunks and transforms your text data into text embeddings before upserting them into the Pinecone vector database. It also handles delete operations.

  More information about Core Library usage can be found in the Library Documentation.
- Canopy Service - a web service that wraps the Canopy Core and exposes it as a REST API. The service is built on top of FastAPI, Uvicorn and Gunicorn and can be easily deployed in production. The service also comes with a built-in Swagger UI for easy testing and documentation. After you start the server, you can access the Swagger UI at http://host:port/docs (default: http://localhost:8000/docs).
- Canopy CLI - a built-in development tool that allows users to swiftly set up their own Canopy server and test its configuration. With just three CLI commands, you can create a new Canopy service, upload your documents to it, and then interact with the chatbot using a built-in chat application directly from the terminal. The built-in chatbot also enables comparison of RAG-infused responses against a native LLM chatbot.
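As referenced above, here is a minimal sketch of how the three core components fit together when used as a library. The import paths, constructor arguments, and method names are assumptions made for illustration; consult the Library Documentation for the actual interface.

```python
# Illustrative sketch only -- import paths and signatures are assumptions;
# see the Library Documentation for the actual interface.
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.context_engine import ContextEngine
from canopy.chat_engine import ChatEngine
from canopy.models.data_models import Document, UserMessage

Tokenizer.initialize()                         # assumed one-time tokenizer setup

kb = KnowledgeBase(index_name="my-index")      # wraps the Pinecone index
kb.connect()
kb.upsert([Document(id="doc1", text="Canopy is a RAG framework.")])

context_engine = ContextEngine(kb)             # the "retrieval" + context-building part
chat_engine = ChatEngine(context_engine)       # the full RAG chat workflow

response = chat_engine.chat(messages=[UserMessage(content="What is Canopy?")])
print(response.choices[0].message.content)
```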
- Canopy is currently only compatible with OpenAI API endpoints for both the embedding model and the LLM. Rate limits and pricing set by OpenAI will apply.
- Set up a virtual environment (optional):
python3 -m venv canopy-env
source canopy-env/bin/activate
More about virtual environments can be found here.
- Install the package:
pip install pinecone-canopy
- Set up the environment variables
export PINECONE_API_KEY="<PINECONE_API_KEY>"
export PINECONE_ENVIRONMENT="<PINECONE_ENVIRONMENT>"
export OPENAI_API_KEY="<OPENAI_API_KEY>"
export INDEX_NAME=<INDEX_NAME>
More information about the environment variables:
| Name | Description | How to get it? |
|---|---|---|
| PINECONE_API_KEY | The API key for Pinecone. Used to authenticate to Pinecone services to create indexes and to insert, delete and search data | Register or log into your Pinecone account in the console. You can access your API key from the "API Keys" section in the sidebar of your dashboard |
| PINECONE_ENVIRONMENT | Determines the Pinecone service cloud environment of your index, e.g. west1-gcp, us-east-1-aws, etc. | You can find the Pinecone environment next to the API key in the console |
| OPENAI_API_KEY | API key for OpenAI. Used to authenticate to OpenAI's services for the embedding and chat APIs | You can find your OpenAI API key here. You might need to log in or register for OpenAI services |
| INDEX_NAME | Name of the Pinecone index Canopy will work with | You can choose any name as long as it follows Pinecone's restrictions |
| CANOPY_CONFIG_FILE | The path of a configuration yaml file to be used by the Canopy service | Optional - if not provided, the default configuration will be used |
- To check that the installation was successful and the environment is set up, run:
canopy
The output should be similar to this:
Canopy: Ready
Usage: canopy [OPTIONS] COMMAND [ARGS]...
# rest of the help message
In this quickstart, we will show you how to use Canopy to build a simple question-answering system using RAG (Retrieval Augmented Generation).
Canopy will create and configure a new Pinecone index on your behalf. Just run:
canopy new
Then follow the CLI instructions. The index that will be created will have the prefix canopy--<INDEX_NAME>. This only needs to be done once per index.
To learn more about Pinecone Indexes and how to manage them, please refer to the following guide: Understanding indexes
You can load data into your Canopy Index by simply using the CLI:
canopy upsert /path/to/data_directory
# or
canopy upsert /path/to/data_directory/file.parquet
# or
canopy upsert /path/to/data_directory/file.jsonl
Canopy supports single or multiple files in JSONL or Parquet format. The documents should have the following schema:
+----------+--------------+--------------+---------------+
| id(str) | text(str) | source | metadata |
| | | Optional[str]| Optional[dict]|
|----------+--------------+--------------+---------------|
| "id1" | "some text" | "some source"| {"key": "val"}|
+----------+--------------+--------------+---------------+
Follow the instructions in the CLI to upload your data.
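If your data is not already in one of these formats, a small script can produce a matching file. The sketch below uses pandas (not a Canopy dependency) to write a Parquet file that follows the schema above; the file paths are placeholders.

```python
# Illustrative sketch: build an upsert file matching the schema above using pandas.
# pandas/pyarrow are not Canopy dependencies; the output path is a placeholder.
import pandas as pd

docs = pd.DataFrame(
    [
        {"id": "id1", "text": "some text", "source": "some source",
         "metadata": {"key": "val"}},
        {"id": "id2", "text": "some other text", "source": "another source",
         "metadata": {"key": "val2"}},
    ]
)
docs.to_parquet("data/docs.parquet")
# JSONL works too:
# docs.to_json("data/docs.jsonl", orient="records", lines=True)
```

The resulting file can then be loaded with canopy upsert data/docs.parquet.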
The Canopy service serves as a proxy between your application and Pinecone. It also handles the RAG part of the application. To start the service, run:
canopy start
Now you should see the following standard Uvicorn message:
...
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
📝 NOTE:
The canopy start command will keep the terminal occupied. To proceed with the next steps, please open a new terminal window. If you want to run the service in the background, you can use the following command -
nohup canopy start &
However, this is not recommended.
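Once the service is up, you can sanity-check it from Python in a separate terminal. This minimal sketch assumes the default host and port and relies only on FastAPI's standard behavior of serving its OpenAPI schema at /openapi.json (the same information shown in the Swagger UI at /docs):

```python
# Minimal sanity check that the Canopy service is up (assumes default host/port).
# FastAPI serves its OpenAPI schema at /openapi.json, mirroring the Swagger UI at /docs.
import requests

schema = requests.get("http://localhost:8000/openapi.json").json()
print("Available routes:")
for path in schema["paths"]:
    print(" ", path)
```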
Now that you have data in your index, you can chat with it using the CLI:
canopy chat
This will open a chat interface in your terminal. You can ask questions, and Canopy will try to answer them using the data you uploaded.
To compare the chat response with and without RAG, use the --baseline flag:
canopy chat --baseline
This will open a similar chat interface window, but will send your question directly to the LLM without the RAG pipeline.
To stop the service, simply press CTRL+C in the terminal where you started it.
If you have started the service in the background, you can stop it by running:
canopy stop
If you already have an application that uses the OpenAI API, you can migrate it to Canopy by simply changing the API endpoint to http://host:port/context, as follows:
import openai
openai.api_base = "http://host:port/context"
# now you can use the OpenAI API as usual
Or, without changing global state:
import openai
openai_response = openai.Completion.create(..., api_base="http://host:port/context")
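For a fuller picture, here is a hedged sketch of a chat call routed through Canopy. It assumes the pre-1.0 openai Python client used in the snippets above and a server running locally on the default port; the base URL mirrors the snippets above and should be adjusted to your deployment.

```python
# Illustrative sketch, assuming the pre-1.0 `openai` Python client and a local
# Canopy server on the default port. Adjust the base URL to your deployment.
import os
import openai

openai.api_base = "http://localhost:8000/context"
openai.api_key = os.environ["OPENAI_API_KEY"]  # the client library requires a key to be set

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What does our knowledge base say about Canopy?"}],
)
print(response["choices"][0]["message"]["content"])
```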
Canopy uses FastAPI as the web framework and Uvicorn as the ASGI server. It is recommended to use Gunicorn as the production server, mainly because it supports multiple worker processes and can handle multiple requests in parallel; more details can be found here.
To run the Canopy service in production, run:
gunicorn canopy_cli.app:app --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --workers <number of desired worker processes>