LLM Applications

A comprehensive guide to building RAG-based LLM applications for production.

Blog post: https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1
GitHub repository: https://github.com/ray-project/llm-applications
Interactive notebook: https://github.com/ray-project/llm-applications/blob/main/notebooks/rag.ipynb
Anyscale Endpoints: https://endpoints.anyscale.com/
Ray documentation: https://docs.ray.io/

In this guide, we will learn how to:

💻 Develop a retrieval augmented generation (RAG) based LLM application from scratch.
🚀 Scale the major components (load, chunk, embed, index, serve, etc.) in our application.
✅ Evaluate different configurations of our application to optimize for both per-component (ex. retrieval_score) and overall performance (quality_score).
🔀 Implement LLM hybrid routing approach to bridge the gap b/w OSS and closed LLMs.
📦 Serve the application in a highly scalable and available manner.
💥 Share the 1st order and 2nd order impacts LLM applications have had on our products.

Setup

API keys

We'll be using OpenAI to access ChatGPT models like gpt-3.5-turbo, gpt-4, etc. and Anyscale Endpoints to access OSS LLMs like Llama-2-70b. Be sure to create your accounts for both and have your credentials ready.

Compute

Local

You could run this on your local laptop but a we highly recommend using a setup with access to GPUs. You can set this up on your own or on [Anyscale](http://anyscale.com/).

Anyscale

Start a new Anyscale workspace on staging using an g3.8xlarge head node, which has 2 GPUs and 32 CPUs. We can also add GPU worker nodes to run the workloads faster. If you're not on Anyscale, you can configure a similar instance on your cloud.
Use the default_cluster_env_2.6.2_py39 cluster environment.
Use the us-west-2 if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).

Repository

git clone https://github.com/ray-project/llm-applications.git .
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>

Data

Our data is already ready at /efs/shared_storage/goku/docs.ray.io/en/master/ (on Staging, us-east-1) but if you wanted to load it yourself, run this bash command (change /desired/output/directory, but make sure it's on the shared storage, so that it's accessible to the workers)

git clone https://github.com/ray-project/llm-applications.git .

Environment

Then set up the environment correctly by specifying the values in your .env file, and installing the dependencies:

pip install --user -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$PWD
pre-commit install
pre-commit autoupdate

Credentials

touch .env
# Add environment variables to .env
OPENAI_API_BASE="https://api.openai.com/v1"
OPENAI_API_KEY=""  # https://platform.openai.com/account/api-keys
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_KEY=""  # https://app.endpoints.anyscale.com/credentials
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
source .env

Now we're ready to go through the rag.ipynb interactive notebook to develop and serve our LLM application!

Learn more

If your team is investing heavily in developing LLM applications, reach out to us to learn more about how Ray and Anyscale can help you scale and productionize everything.
Start serving (+fine-tuning) OSS LLMs with Anyscale Endpoints ($1/M tokens for Llama-3-70b) and private endpoints available upon request (1M free tokens trial).
Learn more about how companies like OpenAI, Netflix, Pinterest, Verizon, Instacart and others leverage Ray and Anyscale for their AI workloads at the Ray Summit 2024 this Sept 18-20 in San Francisco.

Name	Name	Last commit message	Last commit date
Latest commit GokuMohandas reverted to default project Aug 2, 2024 2044d3f · Aug 2, 2024 History 88 Commits
.github/workflows	.github/workflows	updated update_index github actions workflow (#96 )	Jan 19, 2024
datasets	datasets	improved fine-tuned embeddings and reranking (#76 )	Nov 10, 2023
deploy	deploy	reverted to default project	Aug 2, 2024
experiments	experiments	added more mixtral experiments	Jan 4, 2024
migrations	migrations	cleaned up notebook (#28 )	Sep 1, 2023
notebooks	notebooks	added llama3 and mixtral8x22 to config	Apr 23, 2024
rag	rag	temporary playground replacement	Aug 1, 2024
.gitignore	.gitignore	simple text data sources (#19 )	Aug 22, 2023
.pre-commit-config.yaml	.pre-commit-config.yaml	[Bug] Fix some issues to make rag.ipynb work (#101 )	Apr 2, 2024
LICENSE	LICENSE	added architecture diagrams (#48 )	Sep 13, 2023
Makefile	Makefile	added reranking eval and serve (#69 )	Oct 21, 2023
README.md	README.md	added llama3 and mixtral8x22 to config	Apr 23, 2024
pyproject.toml	pyproject.toml	Cleaned scripts (#42 )	Sep 10, 2023
requirements.txt	requirements.txt	added reranking eval and serve	Oct 21, 2023
setup-pgvector.sh	setup-pgvector.sh	[Bug] Fix some issues to make rag.ipynb work (#101 )	Apr 2, 2024
test.py	test.py	temporary playground replacement	Aug 1, 2024
update-index.sh	update-index.sh	Update update-index.sh to switch from master to latest	Jan 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Applications

Setup

API keys

Compute

Repository

Data

Environment

Credentials

Learn more

About

Releases

Packages

Contributors 5

Languages

License

ray-project/llm-applications

Folders and files

Latest commit

History

Repository files navigation

LLM Applications

Setup

API keys

Compute

Repository

Data

Environment

Credentials

Learn more

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages