Add link to run_regression.py in regression docs (castorini#1764)

+ We can now just copy/paste from docs to run regression on Waterloo servers
+ Cleanup of uniCOIL on MS MARCO V1 docs

billcui57 · Feb 10, 2022 · 6d8f494
1 parent a275437

Showing 177 changed files with 1,199 additions and 41 deletions.
diff --git a/docs/regressions-backgroundlinking18.md b/docs/regressions-backgroundlinking18.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 201
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking18.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking18
+```
+
 ## Indexing
 
 Typical indexing command:
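
The command added in this hunk recurs in the other regression pages with only the regression name changed. As a rough sketch (not part of the commit), the helper below assembles that command line for several regression names. The function name `build_regression_command` is hypothetical, and the script path is assumed to be relative to the root of an Anserini checkout; only flags that appear in the diff are used.

```python
import shlex

# Path as it appears in the docs; assumed relative to the Anserini repo root.
SCRIPT = "src/main/python/run_regression.py"


def build_regression_command(name):
    """Return the argv list for an end-to-end regression (hypothetical helper)."""
    return ["python", SCRIPT, "--index", "--verify", "--search", "--regression", name]


if __name__ == "__main__":
    # Print one copy/paste-ready command per regression name taken from the diff.
    for name in ["backgroundlinking18", "backgroundlinking19", "backgroundlinking20"]:
        print(shlex.join(build_regression_command(name)))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a machine that has the corpora, the same list could be passed to `subprocess.run`.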

diff --git a/docs/regressions-backgroundlinking19.md b/docs/regressions-backgroundlinking19.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 201
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking19.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking19.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking19
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-backgroundlinking20.md b/docs/regressions-backgroundlinking20.md
@@ -4,6 +4,12 @@ This page describes regressions for the background linking task in the [TREC 202
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression backgroundlinking20
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-car17v1.5.md b/docs/regressions-car17v1.5.md
@@ -4,6 +4,12 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v1.5.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v1.5.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression car17v1.5
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-car17v2.0-doc2query.md b/docs/regressions-car17v2.0-doc2query.md
@@ -10,6 +10,12 @@ For more complete instructions on how to run end-to-end experiments, refer to [t
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0-doc2query.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0-doc2query.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression car17v2.0-doc2query
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-car17v2.0.md b/docs/regressions-car17v2.0.md
@@ -4,6 +4,12 @@ This page documents regression experiments for the [TREC 2017 Complex Answer Ret
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/car17v2.0.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/car17v2.0.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression car17v2.0
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-clef06-fr.md b/docs/regressions-clef06-fr.md
@@ -6,6 +6,12 @@ Associated data can be found on the [CLEF test suites pages](http://www.clef-ini
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/clef06-fr.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/clef06-fr.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression clef06-fr
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-core17.md b/docs/regressions-core17.md
@@ -4,6 +4,12 @@ This page describes regressions for the TREC 2017 Common Core Track, which uses
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core17.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core17.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression core17
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-core18.md b/docs/regressions-core18.md
@@ -4,6 +4,12 @@ This page describes regressions for the TREC 2018 Common Core Track, which uses
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/core18.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/core18.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression core18
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-cw09b.md b/docs/regressions-cw09b.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2009 to 2012 using
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw09b.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw09b.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression cw09b
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-cw12.md b/docs/regressions-cw12.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2013 and 2014 using
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression cw12
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-cw12b13.md b/docs/regressions-cw12b13.md
@@ -4,6 +4,12 @@ This page describes regressions for the Web Tracks from TREC 2013 and 2014 using
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/cw12b13.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/cw12b13.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression cw12b13
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-disk12.md b/docs/regressions-disk12.md
@@ -4,6 +4,12 @@ This page describes regressions for ad hoc topics from TREC 1-3, which use [TIPS
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/disk12.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/disk12.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression disk12
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-disk45.md b/docs/regressions-disk45.md
@@ -4,6 +4,12 @@ This page describes regressions for ad hoc topics from TREC 7-8, which use [TREC
 The exact configurations for these regressions are stored in [this YAML file](${yaml}).
 Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression disk45
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-doc-docTTTTTquery.md b/docs/regressions-dl19-doc-docTTTTTquery.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-docTTTTTquery
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-doc-segmented-docTTTTTquery.md b/docs/regressions-dl19-doc-segmented-docTTTTTquery.md
@@ -20,6 +20,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-docTTTTTquery
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-doc-segmented-unicoil.md b/docs/regressions-dl19-doc-segmented-unicoil.md
@@ -1,7 +1,7 @@
-# Anserini: Regressions for [DL19 (Doc)](https://trec.nist.gov/data/deep2019.html) Segmented w/ uniCOIL
+# Anserini: Regressions on DL19 (Doc) with uniCOIL
 
-This page describes experiments, integrated into Anserini's regression testing framework, for the TREC 2019 Deep Learning Track (Document Ranking Task) on the MS MARCO document collection using relevance judgments from NIST.
-These runs use the uniCOIL model described in the following paper:
+This page describes regression experiments, integrated into Anserini's regression testing framework, with uniCOIL on the [TREC 2019 Deep Learning Track Document Ranking Task](https://trec.nist.gov/data/deep2019.html).
+The uniCOIL model is described in the following paper:
 
 > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.
 
@@ -12,6 +12,12 @@ Retrieval uses MaxP technique, where we select the score of the highest-scoring
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-segmented-unicoil.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-doc-segmented-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-unicoil
+```
+
 ## Corpus
 
 We make available a version of the MS MARCO passage corpus that has already been processed with uniCOIL, i.e., gone through document expansion and term reweighting.
@@ -28,6 +34,15 @@ tar xvf collections/msmarco-doc-segmented-unicoil.tar -C collections/
 
 To confirm, `msmarco-doc-segmented-unicoil.tar` is 18 GB and has MD5 checksum `6a00e2c0c375cb1e52c83ae5ac377ebb`.
 
+With the corpus downloaded, the following command will perform the complete regression, end to end, on any machine:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented-unicoil \
+  --corpus-path collections/msmarco-doc-segmented-unicoil
+```
+
+Alternatively, you can simply copy/paste from the commands below and obtain the same results.
+
 ## Indexing
 
 Sample indexing command:

diff --git a/docs/regressions-dl19-doc-segmented.md b/docs/regressions-dl19-doc-segmented.md
@@ -20,6 +20,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-doc.md b/docs/regressions-dl19-doc.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-doc
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-passage-docTTTTTquery.md b/docs/regressions-dl19-passage-docTTTTTquery.md
@@ -9,6 +9,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-docTTTTTquery.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-docTTTTTquery.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-docTTTTTquery
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl19-passage-unicoil.md b/docs/regressions-dl19-passage-unicoil.md
@@ -1,7 +1,7 @@
-# Anserini: Regressions for [DL19 (Passage)](https://trec.nist.gov/data/deep2019.html) w/ uniCOIL
+# Anserini: Regressions on DL19 (Passage) with uniCOIL
 
-This page describes document expansion experiments, integrated into Anserini's regression testing framework, for the TREC 2019 Deep Learning Track (Passage Ranking Task) on the MS MARCO passage collection using relevance judgments from NIST.
-These runs use the uniCOIL model described in the following paper:
+This page describes regression experiments, integrated into Anserini's regression testing framework, with uniCOIL on the [TREC 2019 Deep Learning Track Passage Ranking Task](https://trec.nist.gov/data/deep2019.html).
+The uniCOIL model is described in the following paper:
 
 > Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.
 
@@ -14,6 +14,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage-unicoil.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-unicoil
+```
+
 ## Corpus
 
 We make available a version of the MS MARCO passage corpus that has already been processed with uniCOIL, i.e., gone through document expansion and term reweighting.
@@ -30,6 +36,15 @@ tar xvf collections/msmarco-passage-unicoil.tar -C collections/
 
 To confirm, `msmarco-passage-unicoil.tar` is 3.3 GB and has MD5 checksum `78eef752c78c8691f7d61600ceed306f`.
 
+With the corpus downloaded, the following command will perform the complete regression, end to end, on any machine:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-passage-unicoil \
+  --corpus-path collections/msmarco-passage-unicoil
+```
+
+Alternatively, you can simply copy/paste from the commands below and obtain the same results.
+
 ## Indexing
 
 Sample indexing command:
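
The `--corpus-path` variant added above only works once the corpus has been unpacked. The sketch below (not part of the commit) checks for the directory before emitting the command; `corpus_regression_command` is a hypothetical helper, and the paths are the ones shown in the diff, assumed relative to the Anserini repo root.

```python
import shlex
from pathlib import Path


def corpus_regression_command(name, corpus_path):
    """Return the argv list for a regression run against a local corpus (hypothetical helper)."""
    return [
        "python", "src/main/python/run_regression.py",
        "--index", "--verify", "--search",
        "--regression", name,
        "--corpus-path", str(corpus_path),
    ]


if __name__ == "__main__":
    corpus = Path("collections/msmarco-passage-unicoil")
    cmd = corpus_regression_command("dl19-passage-unicoil", corpus)
    if corpus.is_dir():
        # Corpus is in place: print the command for copy/paste.
        print(shlex.join(cmd))
    else:
        print(f"{corpus} is missing; download and unpack the corpus first")
```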

diff --git a/docs/regressions-dl19-passage.md b/docs/regressions-dl19-passage.md
@@ -8,6 +8,12 @@ For additional instructions on working with MS MARCO passage collection, refer t
 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-passage.yaml).
 Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/dl19-passage.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl19-passage
+```
+
 ## Indexing
 
 Typical indexing command:

diff --git a/docs/regressions-dl20-doc-docTTTTTquery.md b/docs/regressions-dl20-doc-docTTTTTquery.md
@@ -19,6 +19,12 @@ Note that in November 2021 we discovered issues in our regression tests, documen
 As a result, we have had to rebuild all our regressions from the raw corpus.
 These new versions yield end-to-end scores that are slightly different, so if numbers reported in a paper do not exactly match the numbers here, this may be the reason.
 
+From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
+
+```
+python src/main/python/run_regression.py --index --verify --search --regression dl20-doc-docTTTTTquery
+```
+
 ## Indexing
 
 Typical indexing command: