Demuxr (http://demuxr.com) is a machine learning app that splits an audio track into its constituent stems. Stems are the individual instrument tracks in a song. Demuxr uses Demucs, an open-source music source separation model from Facebook AI.
Demuxr is for anyone who wishes they had a karaoke version of their favorite tracks, not limited to voice karaoke! I use Demuxr to train my bass ear, isolate tricky riffs, and karaoke-jam on my bass and guitar. My friend replaces vocals with his own. His friend listens to isolated drum tracks on loop. Demuxr is for anyone who wants to play around with the music they listen to.
- Find your song on YouTube
- Head to http://demuxr.com
- Paste the URL in the box
- Wait for the model to split the track
- Adjust the volume for each stem, and seek on the original track
- Do your thing
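The volume-adjust step above boils down to applying a per-stem gain and summing the stems back into a single signal. Here is a minimal pure-Python sketch; the stem names match Demuxr's four stems, but the sample values and the `mix` helper are made up for illustration (real audio would be arrays of many thousands of samples):

```python
# Mix stems back together with per-stem gains (0.0 mutes a stem, 1.0 is full volume).

def mix(stems, gains):
    """Weighted sum of equal-length stem signals."""
    length = len(next(iter(stems.values())))
    return [
        sum(gains[name] * stem[i] for name, stem in stems.items())
        for i in range(length)
    ]

# Hypothetical 4-sample stems for a track split into bass/drums/vocals/other.
stems = {
    "bass":   [0.1, 0.2, 0.1, 0.0],
    "drums":  [0.5, 0.0, 0.5, 0.0],
    "vocals": [0.2, 0.3, 0.2, 0.1],
    "other":  [0.1, 0.1, 0.1, 0.1],
}

# Karaoke: mute the vocal stem, keep everything else at full volume.
karaoke = mix(stems, {"bass": 1.0, "drums": 1.0, "vocals": 0.0, "other": 1.0})
```

Isolating a single stem is the same operation with every other gain set to zero.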
Music separation is trivial... only if you have the original multitrack studio recordings. Even sophisticated (and expensive) software struggles to cleanly isolate a track into its stems.
Enter AI. Demucs is a model from Facebook AI researchers with state-of-the-art performance in music source separation. The model learns patterns in the audio that correspond to different instruments; the same kind of technology is what Zoom uses to mute out your colleagues' applause. Read more about Demucs, or play around with the model on their Colab notebook.
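If you'd rather run Demucs yourself (locally or in the Colab), the package ships a command-line entry point. A typical invocation looks roughly like this; the file name is illustrative, and flags vary between Demucs releases, so check the Demucs README for the options your version supports:

```shell
# Install the package, then split a local file into bass/drums/vocals/other.
pip install demucs

# Writes one audio file per stem under ./separated/<model_name>/my_track/
python3 -m demucs.separate my_track.mp3
```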
Before deploying it to Demuxr, I made a few changes so that the model runs faster; this optimization is a work-in-progress. Hit me up if you'd like to know more.
There's definitely room to improve, and any contribution - issues, bugs, or code - is welcome! Please reach out or open an issue.
The model takes - under ideal conditions - a minute or so to split a 6-minute song. If a lot of people are running Demuxr at the same time, you'll be put in a queue, and that can take a while (depending on how long the queue is). That is not the experience I'd like you to have though, so open an issue while you're waiting (thx).
The model is trained on the MUSDB18 dataset, which consists of 150 tracks along with their isolated bass, drums, vocals, and accompaniment stems. Most of the songs in this dataset are from the pop/rock genre. Underrepresented genres will be more challenging for the model to demux, and it will fall back on what it interprets as the various instruments. For example, you might clearly hear the sax and other wind instruments in the demuxed vocal stem.