Commit

update README
shenlei1020 committed Jan 19, 2024
1 parent 672e60d commit 940cace
Showing 2 changed files with 10 additions and 10 deletions.
README.md: 10 changes (5 additions & 5 deletions)
@@ -3,7 +3,7 @@
* @Author: shenlei
* @Modified: linhui
* @Date: 2023-12-19 10:31:41
- * @LastEditTime: 2024-01-19 12:20:52
+ * @LastEditTime: 2024-01-19 16:33:34
* @LastEditors: shenlei
-->

@@ -127,7 +127,7 @@ Use `EmbeddingModel`, and `cls` [pooler](./BCEmbedding/models/embedding.py#L24)
from BCEmbedding import EmbeddingModel

# list of sentences
-sentences = ['sentence_0', 'sentence_1', ...]
+sentences = ['sentence_0', 'sentence_1']

# init embedding model
model = EmbeddingModel(model_name_or_path="maidalun1020/bce-embedding-base_v1")
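Note: the hunk above ends at the model init, so the actual embedding call falls outside this diff. A minimal end-to-end sketch of the quick-start, assuming the `encode` method from the full README (not shown in this hunk):

```python
from BCEmbedding import EmbeddingModel

# list of sentences (the commit drops the `...` placeholder)
sentences = ['sentence_0', 'sentence_1']

# init embedding model
model = EmbeddingModel(model_name_or_path="maidalun1020/bce-embedding-base_v1")

# extract embeddings; encode() is assumed from the full README, not visible in this hunk
embeddings = model.encode(sentences)
```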
@@ -143,7 +143,7 @@ from BCEmbedding import RerankerModel

# your query and corresponding passages
query = 'input_query'
-passages = ['passage_0', 'passage_1', ...]
+passages = ['passage_0', 'passage_1']

# construct sentence pairs
sentence_pairs = [[query, passage] for passage in passages]
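For reference, a sketch of how the reranker quick-start continues past this hunk. The `rerank` call is confirmed by the README_zh.md hunk header further down (`rerank_results = model.rerank(query, passages)`); `compute_score` on sentence pairs is an assumption based on the library's documented API:

```python
from BCEmbedding import RerankerModel

# your query and corresponding passages
query = 'input_query'
passages = ['passage_0', 'passage_1']

# construct sentence pairs
sentence_pairs = [[query, passage] for passage in passages]

# init reranker model
model = RerankerModel(model_name_or_path="maidalun1020/bce-reranker-base_v1")

# method 1: score each (query, passage) pair; compute_score is assumed, not shown in this diff
scores = model.compute_score(sentence_pairs)

# method 2: rerank passages for the query (this call appears in the README_zh.md hunk below)
rerank_results = model.rerank(query, passages)
```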
@@ -491,7 +491,7 @@ The summary of multiple domains evaluations can be seen in <a href="#1-multiple-
- Our ***bce-embedding-base_v1*** outperforms other opensource embedding models with comparable model sizes.
- ***114 datasets including 119 eval results*** (some datasets contain multiple languages) of "Retrieval", "STS", "PairClassification", "Classification", "Reranking" and "Clustering" in ***`["en", "zh", "en-zh", "zh-en"]` setting***, including **MTEB and CMTEB**.
- The [crosslingual evaluation datasets](./BCEmbedding/evaluation/c_mteb/Retrieval.py) we released belong to the `Retrieval` task.
-- More evaluation details should be checked [Embedding Models Evaluations](./Docs/EvaluationSummary/embedding_eval_summary.md).
+- More evaluation details should be checked in [Embedding Models Evaluations](./Docs/EvaluationSummary/embedding_eval_summary.md).

#### 2. Reranker Models

@@ -505,7 +505,7 @@ The summary of multiple domains evaluations can be seen in <a href="#1-multiple-

- Our ***bce-reranker-base_v1*** outperforms other opensource reranker models.
- ***12 datasets*** of "Reranking" in ***`["en", "zh", "en-zh", "zh-en"]` setting***.
-- More evaluation details should be checked [Reranker Models Evaluations](./Docs/EvaluationSummary/reranker_eval_summary.md).
+- More evaluation details should be checked in [Reranker Models Evaluations](./Docs/EvaluationSummary/reranker_eval_summary.md).

### RAG Evaluations in LlamaIndex

README_zh.md: 10 changes (5 additions & 5 deletions)
@@ -3,7 +3,7 @@
* @Author: shenlei
* @Modified: linhui
* @Date: 2023-12-19 10:31:41
- * @LastEditTime: 2024-01-19 12:15:15
+ * @LastEditTime: 2024-01-19 16:33:48
* @LastEditors: shenlei
-->

@@ -127,7 +127,7 @@ pip install -v -e .
from BCEmbedding import EmbeddingModel

# list of sentences
-sentences = ['sentence_0', 'sentence_1', ...]
+sentences = ['sentence_0', 'sentence_1']

# init embedding model
model = EmbeddingModel(model_name_or_path="maidalun1020/bce-embedding-base_v1")
@@ -143,7 +143,7 @@ from BCEmbedding import RerankerModel

# your query and corresponding passages
query = 'input_query'
-passages = ['passage_0', 'passage_1', ...]
+passages = ['passage_0', 'passage_1']

# construct sentence pairs
sentence_pairs = [[query, passage] for passage in passages]
@@ -170,7 +170,7 @@ rerank_results = model.rerank(query, passages)
from transformers import AutoModel, AutoTokenizer

# list of sentences
-sentences = ['sentence_0', 'sentence_1', ...]
+sentences = ['sentence_0', 'sentence_1']

# init model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('maidalun1020/bce-embedding-base_v1')
@@ -181,7 +181,7 @@ model.to(device)

# get inputs
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
-inputs_on_device = {k: v.to(self.device) for k, v in inputs.items()}
+inputs_on_device = {k: v.to(device) for k, v in inputs.items()}

# get embeddings
outputs = model(**inputs_on_device, return_dict=True)
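The hunk stops at the forward pass. Given the `cls` pooler called out in the README.md hunk header above, the usual completion of this transformers snippet looks like the following sketch; the exact lines fall outside the diff:

```python
# continue from `outputs` above: cls pooling takes the hidden state of the first token
embeddings = outputs.last_hidden_state[:, 0]

# L2-normalize so that dot products between embeddings equal cosine similarity
embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)
```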
