# Axiomatic Reranking

## How to use

To use it with BM25:

```
Anserini/target/appassembler/bin/SearchCollection -topicreader Trec -index /path/to/index/ -hits 1000 -topics Anserini/src/main/resources/topics-and-qrels/topics.51-100.txt -bm25 -axiom -axiom.beta 0.4 -output run_axiom_beta_0.4.txt
```

To use it with the Dirichlet language model:

```
Anserini/target/appassembler/bin/SearchCollection -topicreader Trec -index /path/to/index/ -hits 1000 -topics Anserini/src/main/resources/topics-and-qrels/topics.51-100.txt -ql -axiom -axiom.beta 0.4 -output run_axiom_beta_0.4.txt
```

## Algorithm

  1. Rank the documents and take the top M documents as the initial reranking pool RP.
  2. Randomly select (R-1)\*M documents from the index and add them to RP, so that the reranking pool contains R\*M documents.
  3. Build the inverted term-document list RTL for RP.
  4. For each term t in RTL, compute its score with respect to each query term q as the mutual information s(q,t) = I(X_q, X_t | RP) = SUM( p(X_q, X_t | RP) \* log( p(X_q, X_t | RP) / (p(X_q | RP) \* p(X_t | RP)) ) ), where X_q and X_t are binary random variables denoting the presence/absence of query term q and term t in a document of RP.
  5. The final reranking score of each term t in RTL is the sum of its scores over all query terms: s(t) = SUM(s(q,t)).
  6. Pick the top K terms from RTL by reranking score, keeping their weights s(t).
  7. Rerank the documents using the K reranking terms with their weights. In Lucene, this is expressed as a boosted query such as (term1^0.2 term2^0.01 ...).
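The term-selection core of the algorithm (steps 3-6) can be sketched as follows. This is a toy illustration over in-memory token lists, not Anserini's actual Java implementation, which operates on a Lucene index; the function names and data layout here are hypothetical.

```python
# Toy sketch of axiomatic term selection (steps 3-6 above).
# Documents are represented as lists of tokens; in Anserini the
# statistics would come from the Lucene index instead.
import math
from collections import defaultdict

def mutual_information(pool, q, t):
    """I(X_q, X_t | pool): mutual information between the binary
    presence/absence variables of terms q and t over the pool."""
    n = len(pool)
    # Joint counts over the four presence/absence outcomes.
    joint = defaultdict(int)
    for doc in pool:
        joint[(q in doc, t in doc)] += 1
    p_q = sum(1 for doc in pool if q in doc) / n
    p_t = sum(1 for doc in pool if t in doc) / n
    mi = 0.0
    for (xq, xt), count in joint.items():
        p_joint = count / n
        pq = p_q if xq else 1 - p_q
        pt = p_t if xt else 1 - p_t
        if p_joint > 0 and pq > 0 and pt > 0:
            mi += p_joint * math.log(p_joint / (pq * pt))
    return mi

def select_expansion_terms(top_docs, random_docs, query_terms, k):
    """Return the top-k (term, s(t)) pairs from the reranking pool."""
    pool = top_docs + random_docs                  # reranking pool RP
    vocab = {t for doc in pool for t in doc}       # terms of RTL
    # s(t) = SUM over query terms q of s(q, t)
    scores = {t: sum(mutual_information(pool, q, t) for q in query_terms)
              for t in vocab}
    return sorted(scores.items(), key=lambda item: -item[1])[:k]
```

The returned (term, weight) pairs would then feed step 7, e.g. by building a boosted Lucene query from them.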

## Notes

The axiomatic reranking algorithm is non-deterministic, since it randomly picks (R-1)\*M documents for the reranking pool (see the algorithm above for details). Below we list reference performance numbers for the major TREC collections. The ranking model is BM25, with the parameter beta fixed at 0.4 for all collections, although this is certainly not the optimal value for each individual collection. We report MAP for all collections except the ClueWeb collections, where nDCG@20 is reported instead.

| Collection | BM25+Axiom |
|:-----------|-----------:|
| Disk12 51-100 | 0.2653 |
| Disk12 101-150 | 0.2690 |
| Disk12 151-200 | 0.3414 |
| Robust04 701-750 | 0.2909 |
| Robust05 | 0.2583 |
| Wt10g 451-550 | 0.1946 |
| Gov2 701-750 | 0.2460 |
| Gov2 751-800 | 0.3146 |
| Gov2 801-850 | 0.2935 |
| CW09b 51-100 | 0.1555 |
| CW09b 101-150 | 0.1593 |
| CW09b 151-200 | 0.1106 |
| CW12 201-250 | 0.1679 |
| CW12 251-300 | 0.2154 |
| Microblog 2011 | 0.3866 |
| Microblog 2012 | 0.2215 |
| Microblog 2013 | 0.2606 |
| Microblog 2014 | 0.4519 |

For more details, please refer to the paper: [Yang & Fang, 2013] Yang, P., and Fang, H. (2013). Evaluating the Effectiveness of Axiomatic Approaches in Web Track. In Proceedings of TREC 2013.