Add link to run_regression.py in regression docs (castorini#1764)
+ We can now just copy/paste from docs to run regression on Waterloo servers
+ Cleanup of uniCOIL on MS MARCO V1 docs
lintool authored Feb 10, 2022
1 parent a275437 commit 6d8f494
Showing 177 changed files with 1,199 additions and 41 deletions.
6 changes: 6 additions & 0 deletions docs/regressions-backgroundlinking18.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 201
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking18.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking18
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-backgroundlinking19.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 201
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking19.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking19.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking19
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-backgroundlinking20.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 202
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking20
```
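
The same command line recurs across these pages with only the regression name changing, so a batch of regressions can be scripted. Below is a minimal sketch, assuming it is launched from the Anserini root directory; the helper names `regression_command` and `run_regressions` and the `dry_run` parameter are hypothetical and not part of `run_regression.py`. Only the flags shown in the commands above are used.

```python
import shlex
import subprocess

# Path and flags are copied from the commands in these pages.
SCRIPT = "src/main/python/run_regression.py"


def regression_command(name):
    """Build the end-to-end regression command for one regression name."""
    return ["python", SCRIPT, "--index", "--verify", "--search", "--regression", name]


def run_regressions(names, dry_run=True):
    """Run regressions one after another; with dry_run=True, only print the commands."""
    for name in names:
        cmd = regression_command(name)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)


# Prints the three commands without executing them.
run_regressions(["backgroundlinking18", "backgroundlinking19", "backgroundlinking20"])
```

Passing `dry_run=False` would actually launch each regression, which is only expected to work on a machine where the corpora are available at the configured paths.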

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-car17v1.5.md
@@ -4,6 +4,12 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v1.5.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v1.5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression car17v1.5
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-car17v2.0-doc2query.md
@@ -10,6 +10,12 @@ For more complete instructions on how to run end-to-end experiments, refer to [t
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0-doc2query.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression car17v2.0-doc2query
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-car17v2.0.md
@@ -4,6 +4,12 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression car17v2.0
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-clef06-fr.md
@@ -6,6 +6,12 @@ Associated data can be found on the [CLEF test suites pages](http://www.clef-ini
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/clef06-fr.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/clef06-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression clef06-fr
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-core17.md
@@ -4,6 +4,12 @@ This page describes regressions for the TREC 2017 Common Core Track, which uses
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core17.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core17.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression core17
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-core18.md
@@ -4,6 +4,12 @@ This page describes regressions for the TREC 2018 Common Core Track, which uses
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core18.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression core18
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-cw09b.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2009 to 2012 using
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw09b.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw09b.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression cw09b
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-cw12.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2013 and 2014 using
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression cw12
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-cw12b13.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2013 and 2014 using
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12b13.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12b13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression cw12b13
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-disk12.md
@@ -4,6 +4,12 @@ This page describes regressions for ad hoc topics from TREC 1-3, which use [TIPS
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/disk12.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/disk12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression disk12
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-disk45.md
@@ -4,6 +4,12 @@ This page describes regressions for ad hoc topics from TREC 7-8, which use [TREC
The exact configurations for these regressions are stored in [this YAML file](${yaml).
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression disk45
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-doc-docTTTTTquery.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
As a result, we have had to rebuild all our regressions from the raw corpus.
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-docTTTTTquery
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-doc-segmented-docTTTTTquery.md
@@ -20,6 +20,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
As a result, we have had to rebuild all our regressions from the raw corpus.
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-docTTTTTquery
```

## Indexing

Typical indexing command:
21 changes: 18 additions & 3 deletions docs/regressions-dl19-doc-segmented-unicoil.md
@@ -1,7 +1,7 @@
-# Anserini: Regressions for [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) Segmented w/ uniCOIL
+# Anserini: Regressions on DL19 (Doc) with uniCOIL

-This page describes experiments, integrated into Anserini's regression testing framework, for the TREC 2019 Deep Learning Track (Document Ranking Task) on the MS MARCO document collection using relevance judgments from NIST.
-These runs use the uniCOIL model described in the following paper:
+This page describes regression experiments, integrated into Anserini's regression testing framework, with uniCOIL on the [TREC 2019 Deep Learning Track Document Ranking Task](https://trec.nist.gov/data/deep2019.html).
+The uniCOIL model is described in the following paper:

> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.
@@ -12,6 +12,12 @@ Retrieval uses MaxP technique, where we select the score of the highest-scoring
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-unicoil.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-unicoil
```

## Corpus

We make available a version of the MS MARCO passage corpus that has already been processed with uniCOIL, i.e., gone through document expansion and term reweighting.
@@ -28,6 +34,15 @@ tar xvf collections/msmarco-doc-segmented-unicoil.tar -C collections/

To confirm, `msmarco-doc-segmented-unicoil.tar` is 18 GB and has MD5 checksum `6a00e2c0c375cb1e52c83ae5ac377ebb`.
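
The checksum stated above can be checked before unpacking. The following is a minimal sketch, assuming the tarball sits at `collections/msmarco-doc-segmented-unicoil.tar` as in the `tar` command; the helper name `md5_of` is hypothetical. The file is read in chunks so that the 18 GB archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Expected value copied from the documentation line above.
EXPECTED_MD5 = "6a00e2c0c375cb1e52c83ae5ac377ebb"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


tar_path = Path("collections/msmarco-doc-segmented-unicoil.tar")
if tar_path.is_file():
    actual = md5_of(tar_path)
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum MISMATCH: {actual}")
else:
    print(f"{tar_path} not found; download the corpus first")
```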

With the corpus downloaded, the following command will perform the complete regression, end to end, on any machine:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-unicoil \
--corpus-path collections/msmarco-doc-segmented-unicoil
```

Alternatively, you can simply copy/paste from the commands below and obtain the same results.

## Indexing

Sample indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-doc-segmented.md
@@ -20,6 +20,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
As a result, we have had to rebuild all our regressions from the raw corpus.
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-doc.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
As a result, we have had to rebuild all our regressions from the raw corpus.
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-passage-docTTTTTquery.md
@@ -9,6 +9,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-docTTTTTquery.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-docTTTTTquery
```

## Indexing

Typical indexing command:
21 changes: 18 additions & 3 deletions docs/regressions-dl19-passage-unicoil.md
@@ -1,7 +1,7 @@
-# Anserini: Regressions for [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) w/ uniCOIL
+# Anserini: Regressions on DL19 (Passage) with uniCOIL

-This page describes document expansion experiments, integrated into Anserini's regression testing framework, for the TREC 2019 Deep Learning Track (Passage Ranking Task) on the MS MARCO passage collection using relevance judgments from NIST.
-These runs use the uniCOIL model described in the following paper:
+This page describes regression experiments, integrated into Anserini's regression testing framework, with uniCOIL on the [TREC 2019 Deep Learning Track Passage Ranking Task](https://trec.nist.gov/data/deep2019.html).
+The uniCOIL model is described in the following paper:

> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.
@@ -14,6 +14,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-unicoil.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-unicoil
```

## Corpus

We make available a version of the MS MARCO passage corpus that has already been processed with uniCOIL, i.e., gone through document expansion and term reweighting.
@@ -30,6 +36,15 @@ tar xvf collections/msmarco-passage-unicoil.tar -C collections/

To confirm, `msmarco-passage-unicoil.tar` is 3.3 GB and has MD5 checksum `78eef752c78c8691f7d61600ceed306f`.

With the corpus downloaded, the following command will perform the complete regression, end to end, on any machine:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-unicoil \
--corpus-path collections/msmarco-passage-unicoil
```
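
Because the `--corpus-path` form depends on the corpus having been unpacked first, a quick pre-flight check can save a failed run. This is a rough sketch, assuming the two uniCOIL corpora are extracted under `collections/` as in the commands in this commit; the mapping `UNICOIL_CORPORA` and the helper `command_for` are hypothetical names introduced for illustration.

```python
from pathlib import Path

SCRIPT = "src/main/python/run_regression.py"

# Regression name -> extracted corpus directory, both taken from the commands in this commit.
UNICOIL_CORPORA = {
    "dl19-doc-segmented-unicoil": "collections/msmarco-doc-segmented-unicoil",
    "dl19-passage-unicoil": "collections/msmarco-passage-unicoil",
}


def command_for(regression):
    """Build the regression command that points at a locally downloaded corpus."""
    corpus = UNICOIL_CORPORA[regression]
    return ["python", SCRIPT, "--index", "--verify", "--search",
            "--regression", regression, "--corpus-path", corpus]


# Report, for each regression, whether its corpus directory is present.
for regression, corpus in UNICOIL_CORPORA.items():
    status = "ready" if Path(corpus).is_dir() else "missing corpus"
    print(f"[{status}] {' '.join(command_for(regression))}")
```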

Alternatively, you can simply copy/paste from the commands below and obtain the same results.

## Indexing

Sample indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl19-passage.md
@@ -8,6 +8,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage
```

## Indexing

Typical indexing command:
6 changes: 6 additions & 0 deletions docs/regressions-dl20-doc-docTTTTTquery.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
As a result, we have had to rebuild all our regressions from the raw corpus.
These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression dl20-doc-docTTTTTquery
```

## Indexing

Typical indexing command: