diff --git a/README.md b/README.md
new file mode 100644
index 0000000..58bd5bb
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+# Information Retrieval Models
+Ranks passages against queries using the BM25, Vector Space, and Query Likelihood models.
+
+## How to Run
+Run the program with *start.py*, which accepts arguments in the following format:
+`start.py [-s ] [-p]`
+
+### Parameters
+#### Dataset
+- The `` parameter is required and is the path to the dataset to be parsed.
+- Expects a TSV file in the format **, where qid is the query ID, pid is the
+  ID of the retrieved passage, query is the query text, and passage is the passage text.
+- Columns must be tab-separated.
+#### Model
+- The `` parameter is required and is the name of the model used to rank passages.
+- Expects 'bm25' for the BM25 model, 'vs' for the Vector Space model, or 'lm' for the Query
+  Likelihood model.
+- Any other value is invalid and raises an exception.
+#### Smoothing
+- The `-s ` parameter is required only when using the Query Likelihood model, and is
+  the name of the smoothing technique to apply.
+- Expects 'laplace' for Laplace smoothing, 'lidstone' for Lidstone smoothing, or 'dirichlet'
+  for Dirichlet smoothing.
+- This parameter can only be used if the Query Likelihood model is selected for the ``
+  parameter; an exception is raised if any other model is used.
+#### Frequency Plot
+- The `-p` parameter is optional and generates a PNG file that plots term frequencies as a
+  graph.
+- By default, this file is saved to the local directory as *term-frequencies.png*.
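The argument rules in the README above can be sketched with Python's standard `argparse` and `csv` modules. This is a hypothetical illustration, not the contents of *start.py*, which this diff does not include. The positional names `dataset` and `model`, the helper names `parse_args` and `load_dataset`, the use of `ValueError` as the raised exception, and the `qid, pid, query, passage` column order are all assumptions, since the README's usage placeholders and format string are not shown.

```python
import argparse
import csv
from collections import defaultdict

# Accepted values, as listed in the README.
MODELS = ("bm25", "vs", "lm")
SMOOTHING = ("laplace", "lidstone", "dirichlet")


def parse_args(argv=None):
    """Parse and validate start.py-style arguments (hypothetical sketch).

    The positional names 'dataset' and 'model' are assumed; the README's
    usage line does not show them.
    """
    parser = argparse.ArgumentParser(prog="start.py")
    parser.add_argument("dataset", help="path to the TSV dataset")
    parser.add_argument("model", help="'bm25', 'vs' or 'lm'")
    parser.add_argument("-s", dest="smoothing", help="'laplace', 'lidstone' or 'dirichlet'")
    parser.add_argument("-p", dest="plot", action="store_true",
                        help="save a term-frequency plot as term-frequencies.png")
    args = parser.parse_args(argv)

    # Any model name other than the three supported ones is rejected.
    if args.model not in MODELS:
        raise ValueError(f"invalid model: {args.model!r}")
    if args.model == "lm":
        # Smoothing is mandatory for the Query Likelihood model...
        if args.smoothing not in SMOOTHING:
            raise ValueError(f"'lm' requires -s with one of {SMOOTHING}")
    elif args.smoothing is not None:
        # ...and not allowed with any other model.
        raise ValueError("-s can only be used with the 'lm' model")
    return args


def load_dataset(lines):
    """Group passages by query from rows assumed to be qid, pid, query, passage.

    `lines` is any iterable of strings, such as an open file.
    """
    queries = {}                  # qid -> query text
    passages = defaultdict(dict)  # qid -> {pid: passage text}
    # QUOTE_NONE keeps quote characters inside passages as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:  # skip blank lines
            continue
        qid, pid, query, passage = row
        queries[qid] = query
        passages[qid][pid] = passage
    return queries, passages
```

For a real file, `load_dataset(open(path, newline="", encoding="utf-8"))` follows the `csv` module's advice to open files with `newline=""`. The model and smoothing checks raise `ValueError` explicitly rather than using argparse's `choices=`, because `choices=` makes the parser print a usage message and exit via `SystemExit` instead.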
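The README names the three ranking models and three smoothing techniques but not the scoring formulas behind them. The sketch below shows common textbook versions of each, ranking an in-memory dict of passages for one query; it is an illustration, not the project's implementation. Assumptions include the regex tokeniser, BM25 with `k1=1.2`, `b=0.75` and a non-negative `log(1 + ...)` IDF, log10 TF-IDF weights with cosine similarity for the Vector Space model, Lidstone `eps=0.1`, Dirichlet `mu=2000`, skipping query terms that never occur in the collection, and the names `Collection`, `bm25`, `tfidf_cosine`, `query_likelihood` and `rank`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split into word tokens (assumed preprocessing)."""
    return re.findall(r"\w+", text.lower())


class Collection:
    """Term statistics over a dict of {pid: passage text}, shared by all models."""

    def __init__(self, passages):
        self.docs = {pid: Counter(tokenize(text)) for pid, text in passages.items()}
        self.lengths = {pid: sum(tf.values()) for pid, tf in self.docs.items()}
        self.n_docs = len(self.docs)
        self.avg_len = sum(self.lengths.values()) / self.n_docs
        self.df = Counter()  # number of passages containing each term
        self.cf = Counter()  # occurrences of each term across all passages
        for tf in self.docs.values():
            self.df.update(tf.keys())
            self.cf.update(tf)
        self.total_terms = sum(self.cf.values())
        self.vocab_size = len(self.cf)


def bm25(coll, query, pid, k1=1.2, b=0.75):
    """BM25 with a non-negative log(1 + ...) IDF; k1 and b are assumed defaults."""
    tf, dl = coll.docs[pid], coll.lengths[pid]
    score = 0.0
    for term in tokenize(query):
        n = coll.df[term]
        if n == 0:
            continue  # a term absent from every passage contributes nothing
        idf = math.log(1 + (coll.n_docs - n + 0.5) / (n + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / coll.avg_len))
    return score


def tfidf_cosine(coll, query, pid):
    """Vector Space model: cosine similarity of TF-IDF vectors (log10 IDF)."""
    def weights(tf):
        return {t: f * math.log10(coll.n_docs / coll.df[t])
                for t, f in tf.items() if coll.df[t]}

    q = weights(Counter(tokenize(query)))
    d = weights(coll.docs[pid])
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(sum(w * w for w in d.values()))
    return dot / norm if norm else 0.0


def query_likelihood(coll, query, pid, smoothing, eps=0.1, mu=2000):
    """Log P(query | passage) under Laplace, Lidstone or Dirichlet smoothing."""
    if smoothing not in ("laplace", "lidstone", "dirichlet"):
        raise ValueError(f"unknown smoothing: {smoothing!r}")
    tf, dl, v = coll.docs[pid], coll.lengths[pid], coll.vocab_size
    log_p = 0.0
    for term in tokenize(query):
        if term not in coll.cf:
            continue  # skip terms unseen in the collection (avoids log(0))
        f = tf[term]
        if smoothing == "laplace":
            p = (f + 1) / (dl + v)
        elif smoothing == "lidstone":
            p = (f + eps) / (dl + eps * v)
        else:  # dirichlet: interpolate with the collection language model
            p = (f + mu * coll.cf[term] / coll.total_terms) / (dl + mu)
        log_p += math.log(p)
    return log_p


def rank(passages, query, model, smoothing=None):
    """Return passage IDs sorted from most to least relevant to `query`."""
    coll = Collection(passages)
    scorers = {
        "bm25": lambda pid: bm25(coll, query, pid),
        "vs": lambda pid: tfidf_cosine(coll, query, pid),
        "lm": lambda pid: query_likelihood(coll, query, pid, smoothing),
    }
    if model not in scorers:
        raise ValueError(f"invalid model: {model!r}")
    return sorted(passages, key=scorers[model], reverse=True)
```

Query Likelihood scores are log-probabilities, so they are negative and only meaningful relative to one another; a higher score still means a more relevant passage. Skipping terms that never occur in the collection matters most for Dirichlet smoothing, where such a term would have zero collection probability and make the log undefined.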