Module 4: Evaluation and Monitoring

In this module, we will learn how to evaluate and monitor our LLM and RAG system.

In the evaluation part, we assess the quality of our entire RAG system before it goes live.

In the monitoring part, we collect, store and visualize metrics to assess the answer quality of a deployed LLM. We also collect chat history and user feedback.

4.1 Introduction to monitoring answer quality

  • Why monitor LLM systems?
  • Monitoring answer quality of LLMs
  • Monitoring answer quality with user feedback
  • What else to monitor that is not covered by this module?

4.2 Offline vs Online (RAG) evaluation

  • Modules recap
  • Online vs offline evaluation
  • Offline evaluation metrics

4.3 Generating data for offline RAG evaluation

Links:

4.4 Offline RAG evaluation: cosine similarity

Content

  • A->Q->A' cosine similarity
  • Evaluating gpt-4o
  • Evaluating gpt-3.5-turbo
  • Evaluating gpt-4o-mini
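
The A->Q->A' idea compares an original answer (A) with the answer the LLM produces (A') for a question generated from A, by taking the cosine similarity of their embeddings. The sketch below shows only that comparison step, in plain Python. The function names and `toy_embed` are hypothetical and exist only for illustration; a real run would swap in an actual embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def evaluate_answers(pairs, embed):
    """Score each (original answer, generated answer) pair.

    `embed` is any callable that maps a string to a list of floats.
    """
    return [cosine_similarity(embed(a), embed(b)) for a, b in pairs]


def toy_embed(text):
    # Assumption: a stand-in "embedding" made of vowel counts, so the
    # example runs without any model. It carries no semantic meaning.
    return [float(text.lower().count(ch)) for ch in "aeiou"]


pairs = [
    ("Yes, you can still join the course.", "You can join the course late."),
]
scores = evaluate_answers(pairs, toy_embed)
print(scores)
```

Averaging the scores over the whole evaluation set gives one number per model, which is one way to compare candidates such as those listed above.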

Links:

4.5 Offline RAG evaluation: LLM as a judge

  • LLM as a judge
  • A->Q->A' evaluation
  • Q->A evaluation
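
For the Q->A variant, a judge model is shown the question and the generated answer and asked to classify relevance. Below is a minimal sketch of the two parts that do not depend on any particular LLM client: building the prompt and validating the reply. The prompt wording, the three labels and the function names are assumptions for illustration, not the exact ones used in the videos.

```python
import json

# Assumption: a three-level label set for the judge's verdict.
ALLOWED_LABELS = {"NON_RELEVANT", "PARTLY_RELEVANT", "RELEVANT"}

PROMPT_TEMPLATE = """You are evaluating a RAG system.
Classify how relevant the generated answer is to the question.

Question: {question}
Generated answer: {answer}

Reply with JSON only, in the form:
{{"relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT", "explanation": "..."}}"""


def build_judge_prompt(question, answer):
    """Fill the template with one question/answer pair."""
    return PROMPT_TEMPLATE.format(question=question, answer=answer)


def parse_judgement(raw):
    """Return the parsed verdict, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = data.get("relevance")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return None
    return data


prompt = build_judge_prompt("Can I join late?", "Yes, you can still join.")
# The prompt would be sent to the judge model; here the reply is simulated.
reply = '{"relevance": "RELEVANT", "explanation": "Directly answers it."}'
print(parse_judgement(reply))
```

Returning `None` for malformed replies makes it easy to count and inspect the cases where the judge did not follow the output format.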

Links:

4.6 Capturing user feedback

You can see the prompts and the output from Claude here

Content

  • Adding +1 and -1 buttons
  • Setting up a Postgres database
  • Putting everything in Docker Compose

pip install pgcli
pgcli -h localhost -U your_username -d course_assistant -W
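
To make the +1/-1 feedback flow concrete, here is a minimal sketch of storing conversations and votes. It uses Python's built-in `sqlite3` as a stand-in so it runs without a database server; the module itself uses Postgres. The table and column names are assumptions, not the schema from the videos.

```python
import sqlite3


def init_db(conn):
    """Create the two tables if they do not exist yet."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS conversations (
            id TEXT PRIMARY KEY,
            question TEXT NOT NULL,
            answer TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            conversation_id TEXT NOT NULL REFERENCES conversations(id),
            feedback INTEGER NOT NULL CHECK (feedback IN (1, -1)),
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        """
    )


def save_conversation(conn, conversation_id, question, answer):
    conn.execute(
        "INSERT INTO conversations (id, question, answer) VALUES (?, ?, ?)",
        (conversation_id, question, answer),
    )
    conn.commit()


def save_feedback(conn, conversation_id, value):
    """Record a +1 or -1 vote for an existing conversation."""
    if value not in (1, -1):
        raise ValueError("feedback must be +1 or -1")
    conn.execute(
        "INSERT INTO feedback (conversation_id, feedback) VALUES (?, ?)",
        (conversation_id, value),
    )
    conn.commit()


def feedback_totals(conn):
    """Count thumbs up and thumbs down across all conversations."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(CASE WHEN feedback = 1 THEN 1 ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN feedback = -1 THEN 1 ELSE 0 END), 0)
        FROM feedback
        """
    ).fetchone()
    return {"up": row[0], "down": row[1]}


conn = sqlite3.connect(":memory:")
init_db(conn)
save_conversation(conn, "conv-1", "Can I join late?", "Yes, you can.")
save_feedback(conn, "conv-1", 1)
print(feedback_totals(conn))
```

Keeping feedback in its own table, keyed by conversation id, lets the same conversation receive a vote later than the answer was generated.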

Links:

4.6.2 Capturing user feedback: part 2

  • Adding vector search
  • Adding OpenAI

Links:

4.7 Monitoring the system

  • Setting up Grafana
  • Tokens and costs
  • QA relevance
  • User feedback
  • Other metrics
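
As an illustration of the "tokens and costs" metric, the sketch below turns per-request token counts into an estimated cost that could be stored alongside each conversation and plotted later. The model names and prices are placeholders, not real pricing; actual per-token prices would need to come from the provider's current price list.

```python
# Assumption: hypothetical models with made-up prices, expressed as
# (USD per 1,000 prompt tokens, USD per 1,000 completion tokens).
PLACEHOLDER_PRICES = {
    "model-a": (0.01, 0.03),
    "model-b": (0.001, 0.002),
}


def estimate_cost(model, prompt_tokens, completion_tokens, prices):
    """Estimate the cost in USD of a single LLM call."""
    if model not in prices:
        raise KeyError(f"no price configured for model {model!r}")
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_price, completion_price = prices[model]
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / 1000


cost = estimate_cost("model-a", 1200, 300, PLACEHOLDER_PRICES)
print(f"estimated cost: ${cost:.4f}")
```

Storing the token counts as well as the computed cost keeps the raw data available if prices change.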

Links:

4.7.2 Extra Grafana video

  • Grafana variables
  • Exporting and importing dashboards

Links:

Homework

See here

Extra resources

Overview of the module

https://www.loom.com/share/1dd375ec4b0d458fabdfc2b841089031

Notes

  • Notes by slavaheroes
  • Did you take notes? Add them above this line (Send a PR with links to your notes)