
Commit 0eff9b1

Eval markdown
1 parent 1929ba1 commit 0eff9b1

File tree

1 file changed: +9 -2 lines changed

docs/evaluation.md

Lines changed: 9 additions & 2 deletions
@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
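A side note on the `evals/eval_config.json` review mentioned just above: the config schema and the available metrics are owned by the ai-rag-chat-evaluator tool, so the snippet below is only a sketch of how one might inspect the config and trim the metric list before a bulk run. The `requested_metrics` key and the metric names shown are assumptions to verify against that tool's README and the actual file contents.

```python
# Sketch only: inspect and adjust the evaluation config before a bulk run.
# The "requested_metrics" key and the metric names are assumptions; verify
# them against the ai-rag-chat-evaluator README and evals/eval_config.json.
import json
from pathlib import Path

config_path = Path("evals/eval_config.json")
config = json.loads(config_path.read_text(encoding="utf-8"))

print("Current config keys:", sorted(config))

# Hypothetical: request a smaller set of metrics for a quicker run.
config["requested_metrics"] = ["gpt_groundedness", "gpt_relevance", "answer_length"]

config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
```
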
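Earlier in the same hunk, the docs ask you to hand-review the generated ground truth data and drop unrealistic question/answer pairs. As a loose illustration only, the sketch below steps through a JSONL file of pairs and keeps the ones you confirm; the `evals/ground_truth.jsonl` path and the `question`/`truth` field names are assumptions rather than something documented here, so match them to whatever `generate_ground_truth_data.py` actually writes.

```python
# Hypothetical helper for pruning generated ground truth data.
# Assumes a JSONL file with "question" and "truth" fields per line; check
# what evals/generate_ground_truth_data.py actually writes before using it.
import json
from pathlib import Path

source = Path("evals/ground_truth.jsonl")            # assumed location
reviewed = Path("evals/ground_truth_reviewed.jsonl")

kept = []
for line in source.read_text(encoding="utf-8").splitlines():
    pair = json.loads(line)
    print("\nQ:", pair.get("question"))
    print("A:", pair.get("truth"))
    if input("Keep this pair? [y/N] ").strip().lower() == "y":
        kept.append(line)

reviewed.write_text("\n".join(kept) + "\n", encoding="utf-8")
print(f"Kept {len(kept)} of the generated pairs -> {reviewed}")
```
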
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
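For the `python -m evaltools diff evals/results/baseline/` command shown in the hunk above, the tool itself handles the cross-run comparison; the snippet below is just a rough picture of the idea, under the hypothetical assumption (not documented here) that each run directory contains a `summary.json` of per-metric aggregates. Prefer the `evaltools diff` command in practice, since it knows the real results layout.

```python
# Rough illustration only: compare per-metric aggregates across result runs.
# The results/<run>/summary.json layout is an assumption, not documented here;
# use `python -m evaltools diff evals/results/baseline/` for the real comparison.
import json
from pathlib import Path

results_dir = Path("evals/results")
baseline = json.loads((results_dir / "baseline" / "summary.json").read_text(encoding="utf-8"))

for run_dir in sorted(p for p in results_dir.iterdir() if p.is_dir()):
    if run_dir.name == "baseline":
        continue
    summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))
    print(f"\n== {run_dir.name} vs baseline ==")
    for metric, value in summary.items():
        base = baseline.get(metric)
        if isinstance(value, (int, float)) and isinstance(base, (int, float)):
            print(f"{metric}: {value:.3f} (baseline {base:.3f}, delta {value - base:+.3f})")
```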
