chore: readme update

flexorRegev · Jan 9, 2024 · 28b1dfe
1 parent 0b84548
commit 28b1dfe
Showing 1 changed file with 4 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -50,8 +50,6 @@ In this section, we'll quickly walk you through the three core aspects of RAGato
 
 _If you're just prototyping, you don't need to train your own model! While finetuning can be useful, one of the strength of ColBERT is that the pretrained models are particularly good at generalisation, and [ColBERTv2](https://huggingface.co/colbert-ir/colbertv2.0) has [repeatedly been shown to be extremely strong](https://arxiv.org/abs/2303.00807) at zero-shot retrieval in new domains!_
 
-⚠️ Please note: Training can currently only be ran on GPU, and will error out if using CPU/MPS! ⚠️
-
 #### Data Processing
 
 RAGatouille's RAGTrainer has a built-in `TrainingDataProcessor`, which can take most forms of retrieval training data, and automatically convert it to training triplets, with data enhancements. The pipeline works as follows:
@@ -119,9 +117,13 @@ To create an index, you'll need to load a trained model, this can be one of your
 ```python
 from ragatouille import RAGPretrainedModel
 from ragatouille.utils import get_wikipedia_page
+from ragatouille.data import CorpusProcessor
+
 
 RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
 my_documents = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]
+processor = CorpusProcessor()
+my_documents = processor.process_corpus(my_documents)
 index_path = RAG.index(index_name="my_index", collection=my_documents)
 ```
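
The added lines pass the raw documents through a `CorpusProcessor` before indexing, presumably so that long pages are split into smaller passages. The sketch below illustrates that kind of preprocessing in plain Python. It is not RAGatouille's implementation: `split_into_passages` and its `max_words` parameter are hypothetical names introduced only for illustration, and the real processor may split on different boundaries.

```python
def split_into_passages(documents, max_words=100):
    """Split each document into consecutive passages of at most `max_words` words.

    Hypothetical helper for illustration only; not part of RAGatouille.
    """
    passages = []
    for doc in documents:
        words = doc.split()
        # Walk through the document in fixed-size word windows.
        for start in range(0, len(words), max_words):
            passages.append(" ".join(words[start:start + max_words]))
    return passages


if __name__ == "__main__":
    docs = ["one two three four five", "six seven"]
    for passage in split_into_passages(docs, max_words=2):
        print(passage)
```

Fixed-size word windows are the simplest option; a sentence-aware splitter would avoid cutting passages mid-sentence, at the cost of uneven passage lengths.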