LLM Applications

A comprehensive guide to building RAG-based LLM applications for production.

In this guide, we will learn how to:

  • 💻 Develop a retrieval augmented generation (RAG) based LLM application from scratch.
  • 🚀 Scale the major components (load, chunk, embed, index, serve, etc.) in our application.
  • ✅ Evaluate different configurations of our application to optimize for both per-component (e.g. retrieval_score) and overall performance (quality_score).
  • 🔀 Implement a hybrid LLM routing approach to bridge the gap between OSS and closed LLMs.
  • 📦 Serve the application in a highly scalable and available manner.
  • 💥 Share the first-order and second-order impacts LLM applications have had on our products.

Setup

API keys

We'll be using OpenAI to access ChatGPT models like gpt-3.5-turbo, gpt-4, etc. and Anyscale Endpoints to access OSS LLMs like Llama-2-70b. Be sure to create your accounts for both and have your credentials ready.

Compute

Local
You could run this on your local laptop, but we highly recommend using a setup with access to GPUs. You can set this up on your own or on [Anyscale](http://anyscale.com/).
Anyscale
  • Start a new Anyscale workspace on staging using a g3.8xlarge head node, which has 2 GPUs and 32 CPUs. We can also add GPU worker nodes to run the workloads faster. If you're not on Anyscale, you can configure a similar instance on your cloud.
  • Use the default_cluster_env_2.6.2_py39 cluster environment.
  • Use the us-west-2 region if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).
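
If you configure your own instance, a quick check of the visible hardware can confirm it resembles the head node described above. This is a minimal sketch, assuming an NVIDIA GPU setup where `nvidia-smi` is on the PATH; the helper name `count_gpus` is hypothetical and not part of this repository.

```shell
# count_gpus is an illustrative helper, not part of the repository.
# It prints the number of GPUs that nvidia-smi lists, or 0 if the tool is absent.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
    nvidia-smi -L | grep -c '^GPU '
  else
    echo 0
  fi
}

echo "GPUs: $(count_gpus)"
echo "CPUs: $(getconf _NPROCESSORS_ONLN)"
```

On an instance comparable to the one above, the two lines should report 2 GPUs and 32 CPUs.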

Repository

git clone https://github.com/ray-project/llm-applications.git .
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>

Data

Our data is already available at /efs/shared_storage/goku/docs.ray.io/en/master/ (on Staging, us-east-1), but if you want to load it yourself, run this bash command (change /desired/output/directory, but make sure it's on the shared storage so that it's accessible to the workers):

git clone https://github.com/ray-project/llm-applications.git .
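
One way to mirror the documentation pages into /desired/output/directory is a recursive `wget` download. The following is a sketch under assumptions, not the repository's own script: the function name `mirror_docs`, the `EFS_DIR` variable, and the choice of flags are illustrative, and it assumes `wget` is installed.

```shell
# Hypothetical output location; keep it on shared storage so the workers can read it.
EFS_DIR="${EFS_DIR:-/desired/output/directory}"

# mirror_docs is an illustrative helper, not part of the repository.
# It recursively downloads the HTML pages under docs.ray.io/en/master/ into "$1".
mirror_docs() {
  wget -e robots=off --recursive --no-parent \
    --adjust-extension --accept=html \
    --domains docs.ray.io \
    -P "$1" https://docs.ray.io/en/master/
}
```

Calling `mirror_docs "$EFS_DIR"` starts the download. By default a recursive `wget` recreates the host and path beneath the prefix, so the pages should end up under `$EFS_DIR/docs.ray.io/en/master/`, the same layout as the shared path above.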

Environment

Then set up the environment correctly by specifying the values in your .env file, and installing the dependencies:

pip install --user -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$PWD
pre-commit install
pre-commit autoupdate

Credentials

touch .env
# Add environment variables to .env
OPENAI_API_BASE="https://api.openai.com/v1"
OPENAI_API_KEY=""  # https://platform.openai.com/account/api-keys
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_KEY=""  # https://app.endpoints.anyscale.com/credentials
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
source .env
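
Note that `source .env` on its own sets plain shell variables, which child processes do not inherit. If a program launched from this shell needs to read them from its environment, one option is to enable `set -a` (allexport) while sourcing. The sketch below uses a throwaway file with placeholder values to show the effect; whether the notebook needs this depends on how it loads `.env`, which is not stated here.

```shell
# Write a throwaway .env-style file (placeholder values, not real credentials).
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
OPENAI_API_BASE="https://api.openai.com/v1"
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
EOF

# While `set -a` is active, every variable that gets assigned is also exported.
set -a
. "$env_file"
set +a
rm -f "$env_file"

# A child shell can now see the value.
sh -c 'printf "%s\n" "$OPENAI_API_BASE"'   # prints https://api.openai.com/v1
```

For the real file, the same pattern is `set -a; source .env; set +a`.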

Now we're ready to go through the rag.ipynb interactive notebook to develop and serve our LLM application!
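
Before opening the notebook, it can help to confirm that the variables listed in .env are actually populated in the current shell. This is a minimal sketch; `check_env` is a hypothetical helper, not something the repository provides.

```shell
# check_env is an illustrative helper, not part of the repository.
# It prints each named variable that is unset or empty and returns 1 if any is missing.
check_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (portable indirect expansion).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

check_env OPENAI_API_BASE OPENAI_API_KEY ANYSCALE_API_BASE ANYSCALE_API_KEY DB_CONNECTION_STRING \
  || echo "fill in the variables listed above in .env"
```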

Learn more

  • If your team is investing heavily in developing LLM applications, reach out to us to learn more about how Ray and Anyscale can help you scale and productionize everything.
  • Start serving (+fine-tuning) OSS LLMs with Anyscale Endpoints ($1/M tokens for Llama-3-70b); private endpoints are available upon request (1M free tokens trial).
  • Learn more about how companies like OpenAI, Netflix, Pinterest, Verizon, Instacart and others leverage Ray and Anyscale for their AI workloads at the Ray Summit 2024 this Sept 18-20 in San Francisco.