Anserini is a Lucene toolkit for reproducible information retrieval research

erfansadraiye/anserini

Anserini

Getting Started

Build using Maven:

mvn clean package

The eval/ directory contains evaluation tools and scripts, including trec_eval. Before using trec_eval, unpack and compile it, as follows:

tar xvfz trec_eval.9.0.tar.gz && cd trec_eval.9.0 && make
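Once compiled, trec_eval can also be driven from a script. The following is a minimal sketch, not part of Anserini: the helper names `build_trec_eval_command` and `run_trec_eval` are hypothetical, and the default binary path assumes the tarball was unpacked inside eval/. It relies only on trec_eval's `-m` option for selecting measures.

```python
import subprocess


def build_trec_eval_command(qrels_path, run_path, measures=("map", "P.30"),
                            binary="eval/trec_eval.9.0/trec_eval"):
    """Assemble a trec_eval command line as a list of arguments.

    The binary location is an assumption; adjust it to wherever
    trec_eval was actually compiled.
    """
    cmd = [binary]
    for measure in measures:
        # Each requested measure gets its own -m flag.
        cmd += ["-m", measure]
    # trec_eval expects the qrels file first, then the run file.
    cmd += [qrels_path, run_path]
    return cmd


def run_trec_eval(qrels_path, run_path, **kwargs):
    """Run trec_eval and return its raw text output (hypothetical helper)."""
    cmd = build_trec_eval_command(qrels_path, run_path, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command before running it.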

Running Standard IR Experiments

Anserini is designed to support experiments on various standard TREC collections out of the box.

Tools

Python Interface

Anserini was designed with Python integration in mind, for connecting with popular deep learning toolkits such as PyTorch. This is accomplished via pyjnius. To make this work, tell Maven to explicitly build the fat jar, as follows:

mvn clean package shade:shade
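Because the fat jar's file name includes the project version, a script may prefer to locate it rather than hard-code the name. The sketch below is one possible approach under stated assumptions: the helper `find_fatjar` is hypothetical, and the glob pattern assumes the jar follows the `anserini-<version>-fatjar.jar` naming used in this README and lands in target/.

```python
import glob
import os


def find_fatjar(target_dir="target"):
    """Return the path of a fat jar in target_dir (hypothetical helper).

    Assumes the naming pattern anserini-<version>-fatjar.jar. If several
    jars match, the lexicographically last path is returned.
    """
    pattern = os.path.join(target_dir, "anserini-*-fatjar.jar")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            "No fat jar matching %s; run 'mvn clean package shade:shade' first"
            % pattern
        )
    return matches[-1]
```

The returned path can then be passed to `jnius_config.set_classpath` before importing `jnius`.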

The SimpleSearcher class provides a simple Python/Java bridge, shown below:

import jnius_config
jnius_config.set_classpath("target/anserini-0.0.1-SNAPSHOT-fatjar.jar")

from jnius import autoclass
JString = autoclass('java.lang.String')
JSearcher = autoclass('io.anserini.search.SimpleSearcher')

searcher = JSearcher(JString('lucene-index.robust04.pos+docvectors+rawdocs'))
hits = searcher.search(JString('hubble space telescope'))

# the docid of the 1st hit
hits[0].docid

# the internal Lucene docid of the 1st hit
hits[0].ldocid

# the score of the 1st hit
hits[0].score

# the full document of the 1st hit
hits[0].content
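For downstream Python code, it can be convenient to copy the fields of each hit into plain Python objects. The following is a minimal sketch rather than part of Anserini: the helper `hits_to_dicts` is hypothetical, and it assumes only that the result supports `len()` and indexing and that each hit exposes the `docid`, `ldocid`, `score`, and `content` fields shown above.

```python
def hits_to_dicts(hits, k=10):
    """Copy up to k hits into plain dictionaries (hypothetical helper).

    Each hit is assumed to expose docid, ldocid, score, and content,
    as in the SimpleSearcher example above.
    """
    results = []
    for i in range(min(k, len(hits))):
        hit = hits[i]
        results.append({
            "rank": i + 1,              # 1-based rank in the result list
            "docid": hit.docid,         # collection docid
            "ldocid": hit.ldocid,       # internal Lucene docid
            "score": float(hit.score),  # retrieval score
            "content": hit.content,     # full document text
        })
    return results
```

With the searcher above, `hits_to_dicts(hits, k=5)` would return the top five hits as dictionaries that no longer depend on the Java objects.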

Release History
