Various improvements by @CarlGao4. Support for segment param inside of HTDemucs model.
Adding hybrid transformer Demucs model.
Added support for Torchaudio implementation of HDemucs, thanks @skim0514.
Added experimental 6 sources model htdemucs_6s (drums, bass, other, vocals, piano, guitar).
Option to customize output path of stems (@CarlGao4)
Fixed bug in pad1d leading to failure sometimes.
Added --segment flag to customize the segment length and use less memory (thanks @CarlGao4).
Fix reflect padding bug on small inputs.
Compatible with PyTorch 1.12.
Added option to split into two stems (i.e. vocals vs. non-vocals), thanks to @CarlGao4.
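
As a rough illustration of what a two-stem split amounts to, the sketch below keeps one separated source and sums the rest into a single complementary stem. The helper name `two_stems`, the array layout `(sources, channels, samples)` and the use of NumPy are assumptions made for this example, not the code Demucs runs.

```python
import numpy as np

def two_stems(sources, names, target="vocals"):
    """Collapse separated sources into (target, everything else).

    Hypothetical helper: `sources` has shape (n_sources, channels, samples)
    and `names` lists the source names in the same order.
    """
    idx = names.index(target)
    target_stem = sources[idx]
    # Sum every remaining source to build the complementary stem.
    rest = np.delete(sources, idx, axis=0).sum(axis=0)
    return target_stem, rest

names = ["drums", "bass", "other", "vocals"]
sources = np.random.default_rng(0).standard_normal((4, 2, 1000))
vocals, no_vocals = two_stems(sources, names)
```

By construction, the two stems add back up to the sum of all separated sources.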
Added --float32, --int24 and --clip-mode options to customize how output stems are saved.
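
To show why a clip mode matters when saving stems, here is a minimal sketch of two common ways to keep a float waveform inside [-1, 1] before converting it to integer PCM: rescaling the whole signal, or clamping individual samples. The function `prevent_clip`, the mode names used here and the 16-bit conversion are illustrative assumptions, not the Demucs implementation.

```python
import numpy as np

def prevent_clip(wav, mode="rescale"):
    """Keep a float waveform inside [-1, 1] (hypothetical helper)."""
    if mode == "rescale":
        # Divide the whole signal by its peak, but only if the peak exceeds 1.
        peak = np.abs(wav).max()
        return wav / max(peak, 1.0)
    if mode == "clamp":
        # Hard-limit each sample independently; this distorts loud peaks.
        return np.clip(wav, -1.0, 1.0)
    raise ValueError(f"unknown clip mode: {mode}")

def to_int16(wav):
    """Convert a float waveform in [-1, 1] to 16-bit integer samples."""
    return np.round(wav * 32767).astype(np.int16)

wav = np.array([0.5, -2.0, 1.5])
rescaled = prevent_clip(wav, "rescale")
clamped = prevent_clip(wav, "clamp")
pcm = to_int16(rescaled)
```

Rescaling preserves the waveform shape at the cost of overall level, while clamping preserves level but flattens peaks. Saving as float32 avoids the integer conversion altogether.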
Fix bug in weights used for different sources. Thanks @keunwoochoi for the report and fix.
Drastically improved memory usage on GPU for long files. Thanks a lot @famzah for providing this.
Adding multithread evaluation on CPU (-j option).
(v3.0.2 had a bug with the CPU pool and is skipped.)
Release of Demucs v3, featuring hybrid domain separation and much more. This drops support for Conv-Tasnet and training on the non HQ MusDB dataset. There is no version 3.0.0 because I messed up.
- Fix in Tasnet (PR #178)
- Use ffmpeg in priority when available instead of torchaudio to avoid small shift in MP3 data.
- other minor fixes
MusDB HQ support added. Custom wav dataset support added. Minor changes: to limit issues with MP3 padding when reading with torchaudio, Demucs now uses ffmpeg in priority and falls back to torchaudio. Replaced the pre-trained Demucs model with one trained on a more recent codebase.
This is a big release, with a lot of breaking changes. You will likely need to install Demucs from scratch.
- Demucs now supports on the fly resampling by a factor of 2. This improves SDR by almost 0.3 points.
- Random scaling of each source added (from Uhlich et al. 2017).
- Random pitch and tempo augmentation added, from [Cohen-Hadria et al. 2019].
- With extra augmentation, the best performing Demucs model now has only 64 channels instead of 100, so model size goes from 2.4GB to 1GB. SDR is also up from 5.6 to 6.3 when trained only on MusDB.
- Quantized model using DiffQ has been added. Model size is 150MB, with no loss in quality as far as I, or the metrics, can tell.
- Pretrained models are now using the TorchHub interface.
- Overlap mode for separation, to limit inconsistencies at frame boundaries, with linear transition over the overlap. Overlap is currently at 25%. Note that this is only done for separation, not training, because I added that quite late to the code. For Conv-TasNet this can improve SDR quite a bit (+0.3 points, to 6.0).
- PyPI hosting, for separation, not training!
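
The overlap mode mentioned above can be sketched as follows: the input is cut into segments that overlap by 25%, each segment is processed independently, and neighbouring outputs are cross-faded linearly over the shared region. This is a minimal NumPy sketch under stated assumptions: the function `apply_with_overlap`, its parameters and the stand-in model `fn` are hypothetical, and the actual Demucs code may weight segments differently.

```python
import numpy as np

def apply_with_overlap(fn, wav, segment, overlap=0.25):
    """Apply `fn` to overlapping segments and cross-fade the results.

    Hypothetical sketch: `wav` has shape (channels, samples), `fn` maps a
    chunk to an output of the same shape, and `overlap` must be <= 0.5.
    """
    length = wav.shape[-1]
    ov = int(overlap * segment)   # samples shared by neighbouring segments
    stride = segment - ov         # hop between segment starts
    # Weight is 1 in the middle, with linear ramps over the overlap regions.
    ramp = np.arange(1, ov + 1) / (ov + 1)
    weight = np.ones(segment)
    weight[:ov] = ramp
    weight[segment - ov:] = ramp[::-1]

    out = np.zeros(wav.shape, dtype=np.float64)
    sum_weight = np.zeros(length)
    for start in range(0, length, stride):
        chunk = wav[..., start:start + segment]
        n = chunk.shape[-1]       # the last chunk may be shorter
        out[..., start:start + n] += weight[:n] * fn(chunk)
        sum_weight[start:start + n] += weight[:n]
    # Normalizing makes the weights sum to 1 everywhere, including the edges.
    return out / sum_weight

wav = np.random.default_rng(0).standard_normal((2, 1000))
out = apply_with_overlap(lambda chunk: chunk, wav, segment=100)
```

With an identity `fn`, the output matches the input, which is a quick sanity check that the cross-fade weights are consistent. With a real model, the ramps blend the two predictions available in each overlap so that segment boundaries are less audible.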