The training dataset is synthesized using multiple sources:
- Chords and melodies are obtained from Hooktheory
- Lyrics are obtained by scraping Google or Musixmatch
- Audio features are obtained using the Spotify API
To build the training set and add lyrics and audio features:
- Download the Hooktheory dataset from this repo and copy the `event` folder into this directory, renaming it `hooktheory`.
- Register an application at the Spotify Developer Dashboard.
- Write the client ID into the file `spotify_client_id` and the secret into `spotify_client_secret`.
- Set `add_lyrics` (true/false), `add_spotify` (true/false) and `lyrics_provider` (google/musixmatch) inside `prepocessor.py`.
- Run `python prepocessor.py`.
- The dataset will be built into the folder `processed`.
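
Before running the preprocessor, it can help to confirm that the files from the steps above are in place. The following is a minimal sketch, not part of the repository: `check_setup` is a hypothetical helper, and it assumes the Spotify credential files are only needed when `add_spotify` is true.

```python
from pathlib import Path


def check_setup(base_dir=".", add_spotify=True):
    """Return a list of problems found; an empty list means ready to run.

    Hypothetical helper: the folder and file names come from the steps
    above, but the check itself is an illustration only.
    """
    base = Path(base_dir)
    problems = []

    # The Hooktheory `event` folder must have been copied and renamed.
    if not (base / "hooktheory").is_dir():
        problems.append("missing folder: hooktheory")

    # Assumption: credentials are only required when add_spotify is true.
    if add_spotify:
        for name in ("spotify_client_id", "spotify_client_secret"):
            path = base / name
            if not path.is_file() or not path.read_text().strip():
                problems.append(f"missing or empty file: {name}")

    return problems
```

Calling `check_setup()` from this directory returns a list of anything still missing, which can be printed before invoking `python prepocessor.py`.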
You don't need lyrics if you are running Lofi2Lofi only. Setting `add_lyrics` to false also creates a larger dataset, because tracks with no lyrics are discarded when `add_lyrics` is true.
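
The effect of `add_lyrics` on dataset size can be illustrated with a toy sketch. This is not the code in `prepocessor.py`: the `filter_tracks` function, the dictionary layout, and the sample tracks are all invented for illustration.

```python
def filter_tracks(tracks, add_lyrics):
    """Keep every track, or only those with lyrics when add_lyrics is true."""
    if not add_lyrics:
        return list(tracks)
    # Tracks whose lyrics are missing or empty are discarded.
    return [t for t in tracks if t.get("lyrics")]


# Toy data (hypothetical): only track "A" has lyrics.
tracks = [
    {"title": "A", "lyrics": "la la"},
    {"title": "B", "lyrics": None},
    {"title": "C", "lyrics": ""},
]

print(len(filter_tracks(tracks, add_lyrics=False)))  # 3
print(len(filter_tracks(tracks, add_lyrics=True)))   # 1
```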