Commit: adding explanations

adefossez committed Aug 29, 2022
1 parent 8ee685a, commit 73bee45

Showing 3 changed files with 36 additions and 4 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -40,6 +40,7 @@ width="800px"></p>

See the [release notes](./docs/release.md) for more details.

- TBD: added reproducibility and ablation grids, along with an updated version of the paper.
- 24/02/2022: Releasing v3.0.4: split into two stems (i.e. karaoke mode).
Export as float32 or int24.
- 17/12/2021: Releasing v3.0.3: bug fixes (thanks @keunwoochoi), memory drastically
3 changes: 3 additions & 0 deletions demucs/grids/repro.py
@@ -31,8 +31,11 @@ def explorer(launcher):
    for model in [base, newt, hdem]:
        sub = launcher.bind(model)
        if model is base:
            # Training the v2 Demucs on MusDB HQ
            sub(epochs=360)
            continue

        # those two will be used in the repro_mdx_a bag of models.
        sub(svd)
        sub(svd, seed=43)
        if model == newt:
36 changes: 32 additions & 4 deletions docs/training.md
@@ -141,21 +141,33 @@ If you want to combine multiple models, potentially with different weights for e
demucs --repo ./release_models -n my_bag my_track.mp3
```

## Model evaluation

You can evaluate any pre-trained model or bag of models using the following command:
```bash
python3 -m tools.test_pretrained -n NAME_OF_MODEL [EXTRA ARGS]
```
where `NAME_OF_MODEL` is either the name of a bag (e.g. `mdx`, `repro_mdx_a`),
or the Dora signature of a single model from one of the bags. You can pass `EXTRA ARGS` to customize
the test options, such as the number of random shifts (e.g. `test.shifts=2`). This will compute the old-style
SDR and can take quite a bit of time.
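For intuition on what an SDR number measures, here is a simplified global SDR: the log ratio of reference energy to residual energy, in dB. This is only an illustrative sketch (the `sdr` helper below is hypothetical); the old-style SDR computed by the tooling is the BSS-eval metric, which works per window and allows a distortion filter, so its numbers differ:

```python
import math

def sdr(ref, est, eps=1e-12):
    """Simplified global SDR in dB: 10 * log10(||ref||^2 / ||ref - est||^2)."""
    signal = sum(r * r for r in ref)
    noise = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10 * math.log10((signal + eps) / (noise + eps))

# A closer estimate scores higher:
print(round(sdr([1.0, 0.0, -1.0, 0.0], [0.9, 0.1, -0.9, 0.1]), 2))  # prints 16.99
```

Higher is better; a perfect estimate sends the residual energy to zero and the SDR toward infinity.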

For custom models that were trained locally, you will need to indicate that you wish
to use the local model repository, with the `--repo ./release_models` flag, e.g.,
```bash
python3 -m tools.test_pretrained --repo ./release_models -n my_bag
```

## Model Zoo


**About Wiener filtering**: it came to my attention that in fact none of the models were trained with Wiener filtering. In particular, using Wiener filtering at train time was too slow, while using it only at test time led to worse performance: it changes the output of the spectrogram prediction without giving the waveform branch a chance to adapt. I will update the paper and the code documentation to make that clear.

Here is a short description of the models used for the MDX submission, either Track A (MusDB HQ only)
or Track B (extra training data allowed). Training happens in two stages, the second stage
being the fine-tuning on the automix-generated dataset.
All the fine-tuned models are available on our AWS repository
(you can retrieve each one with `demucs.pretrained.get_model(SIG)`). The bags of models are available
via `demucs.pretrained.get_model(NAME)`, with `NAME` being either `mdx` (for Track A) or `mdx_extra`
(for Track B).

### Track A

@@ -207,5 +219,21 @@ Similarly you can do (those will contain a few extra lines, for training without
dora grid mdx_extra --dry_run --init
```
### Reproducibility and Ablation

I updated the paper to report numbers with a more homogeneous setup than the one used for the competition.
On MusDB HQ, I still need to use a combination of time-only and hybrid models to achieve the best performance.
The experiments are provided in the grids [repro.py](../demucs/grids/repro.py) and
[repro_ft.py](../demucs/grids/repro_ft.py) for the fine-tuning on the realistic mix datasets.
The new bag of models reaches an SDR of 7.64 (vs. 7.68 for the original Track A model). It uses
two time-only models trained with residual branches, local attention and the SVD penalty,
along with two hybrid models with the same features, using the CaC representation.
We average the performance of all the models with the same weight over all sources, unlike
what was done for the original Track A model. We trained for 600 epochs, against 360 before.
The new bag of models is available as part of the pretrained models as `repro_mdx_a`.
The time-only bag is named `repro_mdx_a_time_only`, and the hybrid-only one `repro_mdx_a_hybrid_only`.
Check out the paper for more information on the training.
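The uniform averaging described above can be sketched as follows. This is an illustrative sketch, not the actual `demucs` implementation; the `combine` function and its argument layout are hypothetical:

```python
def combine(estimates, weights=None):
    """Average per-source waveform estimates from several models.

    estimates: list (one entry per model) of {source_name: [samples]} dicts.
    weights: optional per-model, per-source weights; uniform by default,
    matching the equal weighting over all sources described above.
    """
    n_models = len(estimates)
    sources = list(estimates[0])
    if weights is None:
        weights = [[1.0] * len(sources) for _ in range(n_models)]
    out = {}
    for j, src in enumerate(sources):
        total = sum(weights[i][j] for i in range(n_models))
        n_samples = len(estimates[0][src])
        # Weighted average across models, sample by sample.
        out[src] = [
            sum(weights[i][j] * estimates[i][src][k] for i in range(n_models)) / total
            for k in range(n_samples)
        ]
    return out
```

With non-uniform weights, the same function recovers the per-source weighting used for the original Track A bag.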
[dora]: https://github.com/facebookresearch/dora
