chore: readme update
bclavie committed Jan 9, 2024
1 parent 0b84548 commit 28b1dfe
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -50,8 +50,6 @@ In this section, we'll quickly walk you through the three core aspects of RAGato

_If you're just prototyping, you don't need to train your own model! While finetuning can be useful, one of the strengths of ColBERT is that the pretrained models are particularly good at generalisation, and [ColBERTv2](https://huggingface.co/colbert-ir/colbertv2.0) has [repeatedly been shown to be extremely strong](https://arxiv.org/abs/2303.00807) at zero-shot retrieval in new domains!_

⚠️ Please note: Training can currently only be run on GPU, and will error out if using CPU/MPS! ⚠️

#### Data Processing

RAGatouille's RAGTrainer has a built-in `TrainingDataProcessor`, which can take most forms of retrieval training data, and automatically convert it to training triplets, with data enhancements. The pipeline works as follows:
@@ -119,9 +117,13 @@ To create an index, you'll need to load a trained model, this can be one of your
```python
from ragatouille import RAGPretrainedModel
from ragatouille.utils import get_wikipedia_page
from ragatouille.data import CorpusProcessor


RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
my_documents = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]
processor = CorpusProcessor()
my_documents = processor.process_corpus(my_documents)
index_path = RAG.index(index_name="my_index", collection=my_documents)
```
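
The lines added in this hunk pass the documents through a corpus-processing step before indexing. The diff does not show what that step does internally. As a rough illustration only, assuming such a step splits long documents into shorter passages, it could be sketched as below. `split_into_passages`, `max_words`, and `overlap` are hypothetical names chosen for this sketch and are not part of RAGatouille.

```python
from typing import List


def split_into_passages(
    documents: List[str], max_words: int = 100, overlap: int = 0
) -> List[str]:
    """Hypothetical helper: split each document into word-window passages.

    This is an assumption about what a corpus-processing step might do,
    not RAGatouille's actual implementation.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    step = max_words - overlap  # how far each window advances
    passages: List[str] = []
    for doc in documents:
        words = doc.split()
        for start in range(0, len(words), step):
            passages.append(" ".join(words[start:start + max_words]))
            if start + max_words >= len(words):
                break  # this window already reached the end of the document
    return passages
```

Shorter passages generally keep each indexed unit within a retrieval model's input length limit, which is one plausible reason to add a step like this before the `RAG.index(...)` call.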
