Add README.md
louislefevre committed Mar 27, 2021
1 parent 254d014 commit f70fa83
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -0,0 +1,29 @@
# Information Retrieval Models
Ranks passages against queries using various models and techniques.

## How to Run
Start the program by running *start.py*, which accepts parameters in the following format:
`start.py <dataset> <model> [-s <smoothing>] [-p]`

### Parameters
#### Dataset
- The `<dataset>` parameter is required and is the path of the dataset to be parsed.
- Expects a TSV file in the format `<qid pid query passage>`, where qid is the query ID, pid is the
ID of the passage retrieved, query is the query text, and passage is the passage text.
- Each column must be tab-separated.
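
As a rough illustration of the expected input, the sketch below reads a file in this four-column layout using Python's standard `csv` module. The names `Row` and `read_dataset` are hypothetical and not taken from the repository, and raising `ValueError` on a malformed line is an assumption about how such input might be handled.

```python
import csv
from typing import Iterator, NamedTuple


class Row(NamedTuple):
    """One dataset line; field names mirror the columns described above."""
    qid: str
    pid: str
    query: str
    passage: str


def read_dataset(path: str) -> Iterator[Row]:
    """Yield one Row per non-empty line of a tab-separated dataset file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside passages as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, fields in enumerate(reader, start=1):
            if not fields:
                continue  # skip blank lines
            if len(fields) != 4:
                # Assumption: malformed lines are rejected rather than skipped.
                raise ValueError(
                    f"line {line_number}: expected 4 tab-separated columns, "
                    f"got {len(fields)}"
                )
            yield Row(*fields)
```

Yielding rows lazily avoids holding the whole dataset in memory, which may matter for large passage collections.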
#### Model
- The `<model>` parameter is required and is the name of the model to be used for ranking passages.
- Expects either 'bm25' for the BM25 model, 'vs' for the Vector Space model, or 'lm' for the Query
Likelihood model.
- Any other value is invalid and raises an exception.
#### Smoothing
- The `-s <smoothing>` parameter is required only when using the Query Likelihood model, and is
the name of the smoothing technique which will be applied.
- Expects either 'laplace' for Laplace smoothing, 'lidstone' for Lidstone smoothing, or 'dirichlet'
for Dirichlet smoothing.
- This parameter is valid only when the Query Likelihood model is selected for the `<model>`
parameter; an exception is raised if it is supplied with any other model.
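
For orientation, the three smoothing techniques can be sketched with their usual textbook definitions for estimating the probability of a term in a passage. This is not the repository's code: the function names and the default values for `epsilon` and `mu` are assumptions chosen for illustration.

```python
def laplace(tf: int, doc_len: int, vocab_size: int) -> float:
    """Laplace smoothing: add one to every term count."""
    return (tf + 1) / (doc_len + vocab_size)


def lidstone(tf: int, doc_len: int, vocab_size: int, epsilon: float = 0.5) -> float:
    """Lidstone smoothing: add a small constant epsilon instead of one.

    The default epsilon is an illustrative assumption.
    """
    return (tf + epsilon) / (doc_len + epsilon * vocab_size)


def dirichlet(tf: int, doc_len: int, collection_prob: float, mu: float = 2000.0) -> float:
    """Dirichlet smoothing: interpolate with the collection-wide term probability.

    The default mu is an illustrative assumption.
    """
    return (tf + mu * collection_prob) / (doc_len + mu)
```

Here `tf` is the number of times the term occurs in the passage, `doc_len` is the passage length in terms, `vocab_size` is the number of distinct terms in the collection, and `collection_prob` is the term's relative frequency across the whole collection. A query's likelihood is then typically the product (or sum of logs) of these per-term estimates.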
#### Frequency Plot
- The `-p` parameter is optional and generates a PNG file that plots term frequencies as a
graph.
- By default, this file will be saved to the local directory as *term-frequencies.png*.
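
The parameter rules above could be expressed with Python's standard `argparse` module roughly as follows. This is a minimal sketch rather than the actual contents of *start.py*: the function name `parse_args`, the attribute names `smoothing` and `plot`, and the choice of `ValueError` as the exception type are all assumptions.

```python
import argparse

MODELS = ("bm25", "vs", "lm")
SMOOTHING = ("laplace", "lidstone", "dirichlet")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse `start.py <dataset> <model> [-s <smoothing>] [-p]`."""
    parser = argparse.ArgumentParser(prog="start.py")
    parser.add_argument("dataset", help="path to the TSV dataset")
    parser.add_argument("model", help="one of: bm25, vs, lm")
    parser.add_argument("-s", dest="smoothing", metavar="<smoothing>",
                        help="one of: laplace, lidstone, dirichlet (lm only)")
    parser.add_argument("-p", dest="plot", action="store_true",
                        help="save a term-frequency plot as a PNG file")
    args = parser.parse_args(argv)

    # Assumption: invalid combinations raise ValueError; the README only
    # states that "an exception" is raised.
    if args.model not in MODELS:
        raise ValueError(f"unknown model: {args.model!r}")
    if args.model == "lm":
        if args.smoothing not in SMOOTHING:
            raise ValueError("the 'lm' model requires -s with a valid smoothing name")
    elif args.smoothing is not None:
        raise ValueError("-s can only be used with the 'lm' model")
    return args
```

Under these assumptions, `parse_args(["data.tsv", "lm", "-s", "dirichlet", "-p"])` succeeds, while `parse_args(["data.tsv", "bm25", "-s", "laplace"])` raises an exception because smoothing was supplied for a model other than Query Likelihood. The file name `data.tsv` is a placeholder.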
