Commit f70fa83 (1 parent: 254d014). Showing 1 changed file with 29 additions and 0 deletions.
# Information Retrieval Models
Ranks passages against queries using various models and techniques.

## How to Run
The program can be initialised by running *start.py*, which accepts parameters in the format of:
`start.py <dataset> <model> [-s <smoothing>] [-p]`

### Parameters
#### Dataset
- The `<dataset>` parameter is required and is the path of the dataset to be parsed.
- Expects a TSV file in the format `<qid pid query passage>`, where qid is the query ID, pid is the
ID of the passage retrieved, query is the query text, and passage is the passage text.
- Each column must be tab separated.
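
The four-column layout above could be read with Python's standard `csv` module. The following is only a minimal sketch, not the code in *start.py*: the names `Row` and `read_rows` are hypothetical, and it assumes the file has no header row and that quote characters in passages should be treated as ordinary text.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    """One dataset line: query ID, passage ID, query text, passage text."""
    qid: str
    pid: str
    query: str
    passage: str


def read_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one Row per tab-separated line (hypothetical helper)."""
    # QUOTE_NONE keeps quote characters inside passages as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 columns, got {len(fields)}"
            )
        yield Row(*fields)
```

An open file object can be passed directly, for example `read_rows(open(path, encoding="utf-8", newline=""))`, since `csv.reader` accepts any iterable of strings.
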
#### Model
- The `<model>` parameter is required and is the name of the model to be used for ranking passages.
- Expects either 'bm25' for the BM25 model, 'vs' for the Vector Space model, or 'lm' for the Query
Likelihood model.
- Any other input will be deemed invalid, and an exception will be raised.
#### Smoothing
- The `-s <smoothing>` parameter is required only when using the Query Likelihood model, and is
the name of the smoothing technique to be applied.
- Expects either 'laplace' for Laplace smoothing, 'lidstone' for Lidstone smoothing, or 'dirichlet'
for Dirichlet smoothing.
- This parameter can only be used when the Query Likelihood model is selected for the `<model>`
parameter; an exception will be raised if it is supplied with any other model.
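
For orientation, the three smoothing options correspond to well-known textbook estimates of the probability of a term given a passage. The sketch below shows those standard formulas only; it is not taken from this repository, the function and argument names are hypothetical, and the default values for `epsilon` and `mu` are assumptions rather than the values the program uses.

```python
def laplace(tf: int, doc_len: int, vocab_size: int) -> float:
    """Add-one smoothing: (tf + 1) / (|D| + |V|)."""
    return (tf + 1) / (doc_len + vocab_size)


def lidstone(tf: int, doc_len: int, vocab_size: int,
             epsilon: float = 0.5) -> float:
    """Add-epsilon smoothing: (tf + eps) / (|D| + eps * |V|).

    The default epsilon is an assumed value for illustration.
    """
    return (tf + epsilon) / (doc_len + epsilon * vocab_size)


def dirichlet(tf: int, doc_len: int, cf: int, collection_len: int,
              mu: float = 2000.0) -> float:
    """Dirichlet smoothing: (tf + mu * cf / |C|) / (|D| + mu).

    cf is the term's count over the whole collection and |C| is the
    collection length in tokens. The default mu is an assumed value.
    """
    return (tf + mu * cf / collection_len) / (doc_len + mu)
```

Here `tf` is the term's count in the passage and `doc_len` is the passage length in tokens. A query likelihood score is then typically the product (or sum of logarithms) of these probabilities over the query terms.
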
#### Frequency Plot
- The `-p` parameter is optional and generates a PNG file which displays term frequencies in graph
format.
- By default, this file will be saved to the local directory as *term-frequencies.png*.
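
The parameter rules described above could be enforced with the standard `argparse` module. This is a hedged sketch rather than the actual contents of *start.py*: `parse_args`, `MODELS`, `SMOOTHING`, and the choice of `ValueError` as the exception type are all assumptions made for illustration.

```python
import argparse

MODELS = ("bm25", "vs", "lm")
SMOOTHING = ("laplace", "lidstone", "dirichlet")


def parse_args(argv):
    """Parse and validate the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="start.py")
    parser.add_argument("dataset", help="path to the TSV dataset")
    parser.add_argument("model", help="one of: bm25, vs, lm")
    parser.add_argument("-s", dest="smoothing", default=None,
                        help="smoothing technique (lm only)")
    parser.add_argument("-p", dest="plot", action="store_true",
                        help="generate the term frequency plot")
    args = parser.parse_args(argv)

    # Validation is done by hand so that an exception is raised,
    # instead of argparse exiting the process as `choices=` would.
    if args.model not in MODELS:
        raise ValueError(f"invalid model: {args.model!r}")
    if args.model == "lm":
        if args.smoothing not in SMOOTHING:
            raise ValueError("the lm model requires -s with one of: "
                             + ", ".join(SMOOTHING))
    elif args.smoothing is not None:
        raise ValueError("-s can only be used with the lm model")
    return args
```

For example, `parse_args(["data.tsv", "lm", "-s", "dirichlet", "-p"])` returns a namespace with `smoothing` set to `"dirichlet"` and `plot` set to `True`, while `parse_args(["data.tsv", "bm25", "-s", "laplace"])` raises `ValueError`. The file name `data.tsv` is a placeholder.
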