New tokenization workflow, fixes in time signature (Natooz#66)
* option to delete equal successive tempo / time sig changes, black formatting

* fixes in tokenizations when encoding / decoding Tempo messages, passing pytest with xdist

* HUGE: CPWord and Octuple now adopting common workflow, OctupleMono removed, CPWord handling Time Signature, fixes in tempo and time sig decoding for MIDILike & REMI & TSD when one_token_stream is False, common TIME_SIGNATURE_RANGE set in constants, fixes in tests that now also test all time signature and tempo changes

* fix in tests, doc update (additional tokens table)

* black code formatting

* fix in MuMIDI init + removed from data augment test (not compatible)

* fix in MMM init when no config object is given
Natooz authored Aug 17, 2023
1 parent 114d253 commit 3f33a12
Showing 24 changed files with 816 additions and 1,037 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/pytest.yml
@@ -25,7 +25,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools flake8 pytest coverage torch tensorflow
pip install setuptools flake8 pytest-xdist[psutil] coverage torch tensorflow
pip install -r requirements.txt
- name: Lint with flake8
run: |
@@ -35,6 +35,6 @@ jobs:
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
coverage run -m pytest
coverage run -m pytest -n auto
- name: Codecov
uses: codecov/[email protected]
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ Python package to tokenize MIDI music files, presented at the ISMIR 2021 LBD.
[![GitHub CI](https://github.com/Natooz/MidiTok/actions/workflows/pytest.yml/badge.svg)](https://github.com/Natooz/MidiTok/actions/workflows/pytest.yml)
[![Codecov](https://img.shields.io/codecov/c/github/Natooz/MidiTok)](https://codecov.io/gh/Natooz/MidiTok)
[![GitHub license](https://img.shields.io/github/license/Natooz/MidiTok.svg)](https://github.com/Natooz/MidiTok/blob/main/LICENSE)
[![Downloads](https://pepy.tech/badge/MidiTok)](https://pepy.tech/project/MidiTok)
[![Downloads](https://static.pepy.tech/badge/miditok)](https://pepy.tech/project/MidiTok)
[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Using Deep Learning with symbolic music ? MidiTok can take care of converting (tokenizing) your MIDI files into tokens, ready to be fed to models such as Transformer, for any generation, transcription or MIR task.
9 changes: 9 additions & 0 deletions docs/additional_tokens_table.csv
@@ -0,0 +1,9 @@
Tokenization,Tempo,Time signature,Chord,Rest
MIDILike,✅,✅,✅,✅
REMI,✅,✅,✅,✅
TSD,✅,✅,✅,✅
Structured,❌,❌,❌,❌
CPWord,✅,✅,✅,✅
Octuple,✅,✅,❌,❌
MuMIDI,✅,❌,✅,❌
MMM,✅,✅,✅,❌
70 changes: 4 additions & 66 deletions docs/midi_tokenizer.rst
@@ -62,72 +62,10 @@ Additional tokens
MidiTok can include additional tokens carrying music information. You can specify them in the ``tokenizer_config`` argument (:class:`miditok.TokenizerConfig`) when creating a tokenizer. The :class:`miditok.TokenizerConfig` documentation details the role of each of them and their associated parameters.
Cells with ❕ markers mean the additional token is implemented by default and is not optional.

.. list-table:: Compatibility table of tokenizations and additional tokens.
.. csv-table:: Compatibility table of tokenizations and additional tokens.
:file: additional_tokens_table.csv
:header-rows: 1

* - Token type
- :ref:`REMI`
- :ref:`REMIPlus`
- :ref:`MIDI-Like`
- :ref:`TSD`
- :ref:`Structured`
- :ref:`CPWord`
- :ref:`Octuple`
- :ref:`MuMIDI`
- :ref:`MMM`
* - Chord
- ✅
- ✅
- ✅
- ✅
- ✅
- ❌
- ❌
- ✅
- ✅
* - Rest
- ✅
- ✅
- ✅
- ✅
- ✅
- ❌
- ❌
- ❌
- ❌
* - Tempo
- ✅
- ✅
- ✅
- ✅
- ✅
- ❌
- ✅
- ✅
- ✅
* - Program
- ✅¹
- ✅¹
- ✅¹
- ✅¹
- ✅¹
- ✅²
- ✅❕
- ✅❕
- ✅❕
* - Time signature
- ✅
- ✅
- ✅
- ✅
- ❌
- ❌
- ✅
- ❌
- ✅

**¹** the tokenizer will add `Program` tokens before each `Pitch` / `NoteOn` token, and will treat all the tracks of a MIDI as a single sequence of tokens.
**²** unimplemented, the tokenizer's vocabulary will contain the `Program` tokens, but it will not use it.

Special tokens
------------------------
@@ -148,7 +86,7 @@ Tokens & TokSequence input / output format

Depending on the tokenizer in use, the **format** of the tokens returned by the ``midi_to_tokens`` method may vary, as well as the expected format for the ``tokens_to_midi`` method. The format is given by the ``tokenizer.io_format`` property. For any tokenizer, the format is the same for both methods.
The format is deduced from the ``is_multi_voc`` and ``one_token_stream`` tokenizer properties. In short: **one_token_stream** being True means that the tokenizer will convert a MIDI file into a single stream of tokens for all instrument tracks, otherwise it will convert each track to a distinct token stream; **is_multi_voc** being True means that each token stream is a list of lists of tokens, of shape ``(T,C)`` for T time steps and C subtokens per time step.
The format is deduced from the ``is_multi_voc`` and ``one_token_stream`` tokenizer properties. **one_token_stream** being True means that the tokenizer will convert a MIDI file into a single stream of tokens for all instrument tracks, otherwise it will convert each track to a distinct token sequence. **is_multi_voc** being True means that each token stream is a list of lists of tokens, of shape ``(T,C)`` for T time steps and C subtokens per time step.

This results in four situations, where I is the number of tracks, T is the number of tokens (or time steps) and C the number of subtokens per time step:

@@ -163,7 +101,7 @@ Some tokenizer examples to illustrate:

* **TSD** without ``config.use_programs`` will not have multiple vocabularies and will treat each MIDI track as a unique stream of tokens, hence it will convert MIDI files to a list of ``TokSequence`` objects, ``(I,T)`` format.
* **TSD** with ``config.use_programs`` being True will convert all MIDI tracks to a single stream of tokens, hence one ``TokSequence`` object, ``(T)`` format.
* **CPWord** is a multi-voc tokenizer and treats each MIDI track as a distinct stream of tokens, hence it will convert MIDI files to a list of ``TokSequence`` objects with ``(I,T,C)`` format.
* **CPWord** is a multi-voc tokenizer; without ``config.use_programs`` it will treat each MIDI track as a distinct stream of tokens, hence it will convert MIDI files to a list of ``TokSequence`` objects with the ``(I,T,C)`` format.
* **Octuple** is a multi-voc tokenizer and converts all MIDI tracks to a single stream of tokens, hence it will convert MIDI files to a ``TokSequence`` object, ``(T,C)`` format.

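The four situations described in this documentation hunk can be illustrated with a few lines of Python. This is only a sketch: `deduce_io_format` is a hypothetical helper written for this example, and it is not claimed to be how MidiTok computes ``tokenizer.io_format``. It only encodes the rule stated in the text: an `I` axis appears when each track gets its own sequence, and a `C` axis appears for multi-vocabulary tokenizers.

```python
from typing import Tuple


def deduce_io_format(one_token_stream: bool, is_multi_voc: bool) -> Tuple[str, ...]:
    """Return the token format as a tuple of axis names.

    Hypothetical helper for illustration only.
    "I" = tracks, "T" = tokens / time steps, "C" = subtokens per time step.
    """
    axes = []
    if not one_token_stream:
        axes.append("I")  # one token sequence per instrument track
    axes.append("T")  # the token / time-step axis is always present
    if is_multi_voc:
        axes.append("C")  # each time step stacks several subtokens
    return tuple(axes)


# The tokenizer examples from the documentation, expressed with the helper:
tsd_without_programs = deduce_io_format(one_token_stream=False, is_multi_voc=False)
tsd_with_programs = deduce_io_format(one_token_stream=True, is_multi_voc=False)
cpword_without_programs = deduce_io_format(one_token_stream=False, is_multi_voc=True)
octuple = deduce_io_format(one_token_stream=True, is_multi_voc=True)
```

Under these assumptions the four calls give `("I", "T")`, `("T",)`, `("I", "T", "C")` and `("T", "C")`, matching the formats listed for TSD, CPWord and Octuple.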

7 changes: 0 additions & 7 deletions docs/tokenizations.rst
@@ -85,13 +85,6 @@ Octuple
:noindex:
:show-inheritance:

Octuple Mono
------------------------

.. autoclass:: miditok.OctupleMono
:noindex:
:show-inheritance:

MuMIDI
------------------------

2 changes: 0 additions & 2 deletions miditok/__init__.py
@@ -6,7 +6,6 @@
TSD,
Structured,
Octuple,
OctupleMono,
CPWord,
MuMIDI,
MMM,
@@ -44,7 +43,6 @@ def _tweak_config_before_creating_voc(self):
"TSD",
"Structured",
"Octuple",
"OctupleMono",
"CPWord",
"MuMIDI",
"MMM",
39 changes: 35 additions & 4 deletions miditok/classes.py
@@ -26,7 +26,9 @@
NB_TEMPOS,
TEMPO_RANGE,
LOG_TEMPOS,
DELETE_EQUAL_SUCCESSIVE_TEMPO_CHANGES,
TIME_SIGNATURE_RANGE,
DELETE_EQUAL_SUCCESSIVE_TIME_SIG_CHANGES,
PROGRAMS,
CURRENT_VERSION_PACKAGE,
)
@@ -164,8 +166,8 @@ class TokenizerConfig:
add more TimeSignatureChange objects. (default: False)
:param use_programs: will use ``Program`` tokens, if the tokenizer is compatible.
Used to specify an instrument / MIDI program. The :ref:`Octuple`, :ref:`MMM` and :ref:`MuMIDI` tokenizers
use natively `Program` tokens, this option is always enabled. :ref:`TSD`, :ref:`REMI`, :ref:`REMIPlus`,
:ref:`MIDILike` and :ref:`Structured` will add `Program` tokens before each `Pitch` / `NoteOn` token to
use natively `Program` tokens, this option is always enabled. :ref:`TSD`, :ref:`REMI`, :ref:`MIDILike`,
:ref:`Structured` and :ref:`CPWord` will add `Program` tokens before each `Pitch` / `NoteOn` token to
indicate its associated instrument and will treat all the tracks of a MIDI as a single sequence of tokens.
:ref:`CPWord`, :ref:`Octuple` and :ref:`MuMIDI` add a `Program` tokens with the stacks of `Pitch`,
`Velocity` and `Duration` tokens. (default: False)
@@ -183,8 +185,25 @@ class TokenizerConfig:
:param nb_tempos: number of tempos "bins" to use. (default: 32)
:param tempo_range: range of minimum and maximum tempos within which the bins fall. (default: (40, 250))
:param log_tempos: will use log scaled tempo values instead of linearly scaled. (default: False)
:param delete_equal_successive_tempo_changes: setting this option True will delete identical successive tempo
changes when preprocessing a MIDI file after loading it. For example, if a MIDI has two tempo changes
for tempo 120 at tick 1000 and the next one is for tempo 121 at tick 1200, during preprocessing the tempo
values are likely to be downsampled and become identical (120 or 121). If that's the case, the second
tempo change will be deleted and not tokenized. This parameter doesn't apply for tokenizations that natively
inject the tempo information at recurrent timings (e.g. Octuple). For others, note that setting it True
might reduce the number of `Tempo` tokens and in turn the recurrence of this information. Leave it False if
you want to have recurrent `Tempo` tokens, that you might inject yourself by adding `TempoChange` objects to
your MIDIs. (default: False)
:param time_signature_range: range as a dictionary {denom_i: [num_i1, ..., num_in] / (min_num_i, max_num_i)}.
(default: {4: [4]})
:param delete_equal_successive_time_sig_changes: setting this option True will delete identical successive time
signature changes when preprocessing a MIDI file after loading it. For example, if a MIDI has two time
signature changes for 4/4 at tick 1000 and the next one is also 4/4 at tick 1200, the second time signature
change will be deleted and not tokenized. This parameter doesn't apply for tokenizations that natively
inject the time signature information at recurrent timings (e.g. Octuple). For others, note that setting it
True might reduce the number of `TimeSig` tokens and in turn the recurrence of this information. Leave it
False if you want to have recurrent `TimeSig` tokens, that you might inject yourself by adding
`TimeSignatureChange` objects to your MIDIs. (default: False)
:param programs: sequence of MIDI programs to use. Note that `-1` is used and reserved for drums tracks.
(default: from -1 to 127 included)
:param **kwargs: additional parameters that will be saved in `config.additional_params`.
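
The behaviour documented for `delete_equal_successive_tempo_changes` and `delete_equal_successive_time_sig_changes` can be sketched as a simple deduplication pass. The snippet below is an illustration under stated assumptions, not the code from this commit: the `TempoChange` dataclass is a stand-in defined here, `delete_equal_successive_changes` is a hypothetical name, and the tempo values are assumed to have already been downsampled to their bins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TempoChange:
    """Stand-in for a MIDI tempo change (hypothetical, for illustration)."""

    tempo: int
    time: int  # position in ticks


def delete_equal_successive_changes(changes: List[TempoChange]) -> List[TempoChange]:
    """Keep a change only if its value differs from the previously kept one."""
    kept: List[TempoChange] = []
    for change in sorted(changes, key=lambda c: c.time):
        if kept and kept[-1].tempo == change.tempo:
            continue  # same value as the previous change: redundant, skip it
        kept.append(change)
    return kept


# After downsampling, the changes at ticks 1000 and 1200 share the same tempo,
# so the second one is dropped; the change at tick 2000 is kept.
changes = [TempoChange(120, 1000), TempoChange(120, 1200), TempoChange(90, 2000)]
deduplicated = delete_equal_successive_changes(changes)
```

The same pass would apply to time signature changes by comparing the numerator and denominator instead of the tempo. As the docstring notes, this reduces how often the information recurs in the token sequence, which is why both options default to False.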
@@ -208,7 +227,11 @@ def __init__(
nb_tempos: int = NB_TEMPOS,
tempo_range: Tuple[int, int] = TEMPO_RANGE,
log_tempos: bool = LOG_TEMPOS,
time_signature_range: Dict[int, Union[List[int], Tuple[int, int]]] = TIME_SIGNATURE_RANGE,
delete_equal_successive_tempo_changes: bool = DELETE_EQUAL_SUCCESSIVE_TEMPO_CHANGES,
time_signature_range: Dict[
int, Union[List[int], Tuple[int, int]]
] = TIME_SIGNATURE_RANGE,
delete_equal_successive_time_sig_changes: bool = DELETE_EQUAL_SUCCESSIVE_TIME_SIG_CHANGES,
programs: Sequence[int] = PROGRAMS,
**kwargs,
):
@@ -239,12 +262,20 @@ def __init__(
self.nb_tempos: int = nb_tempos # nb of tempo bins for additional tempo tokens, quantized like velocities
self.tempo_range: Tuple[int, int] = tempo_range # (min_tempo, max_tempo)
self.log_tempos: bool = log_tempos
self.delete_equal_successive_tempo_changes = (
delete_equal_successive_tempo_changes
)

# Time signature params
self.time_signature_range: Dict[int, List[int]] = {
beat_res: list(range(beats[0], beats[1] + 1)) if isinstance(beats, tuple) else beats
beat_res: list(range(beats[0], beats[1] + 1))
if isinstance(beats, tuple)
else beats
for beat_res, beats in time_signature_range.items()
}
self.delete_equal_successive_time_sig_changes = (
delete_equal_successive_time_sig_changes
)

# Programs
self.programs: Sequence[int] = programs
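
To make the `time_signature_range` normalization in this hunk easier to follow, here is the same dictionary comprehension as a standalone function. It is a sketch for illustration: the name `expand_time_signature_range` and the `TimeSigRange` alias are assumptions made for this example and do not appear in the commit.

```python
from typing import Dict, List, Tuple, Union

# {denominator: [numerators] or (min_numerator, max_numerator)}
TimeSigRange = Dict[int, Union[List[int], Tuple[int, int]]]


def expand_time_signature_range(time_signature_range: TimeSigRange) -> Dict[int, List[int]]:
    """Expand (min, max) tuples into explicit lists; leave lists untouched."""
    return {
        denom: list(range(nums[0], nums[1] + 1)) if isinstance(nums, tuple) else nums
        for denom, nums in time_signature_range.items()
    }


# A tuple is read as an inclusive range, a list as explicit numerators.
expanded = expand_time_signature_range({8: (3, 6), 4: [5, 6, 3, 2, 1, 4]})
print(expanded)  # {8: [3, 4, 5, 6], 4: [5, 6, 3, 2, 1, 4]}
```

Accepting both forms keeps short configurations compact (a tuple for a contiguous range) while still allowing non-contiguous numerators such as the new default for denominator 8, `[3, 12, 6]`.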
5 changes: 4 additions & 1 deletion miditok/constants.py
@@ -62,9 +62,11 @@
NB_TEMPOS = 32
TEMPO_RANGE = (40, 250) # (min_tempo, max_tempo)
LOG_TEMPOS = False # log or linear scale tempos
DELETE_EQUAL_SUCCESSIVE_TEMPO_CHANGES = False

# Time signature params
TIME_SIGNATURE_RANGE = {4: [4]} # {denom_i: [num_i1, ..., num_in] / (min_num_i, max_num_i)}
# {denom_i: [num_i1, ..., num_in] / (min_num_i, max_num_i)}
TIME_SIGNATURE_RANGE = {8: [3, 12, 6], 4: [5, 6, 3, 2, 1, 4]}

# Programs
PROGRAMS = list(range(-1, 128))
@@ -80,6 +82,7 @@
TIME_DIVISION = 384 # 384 and 480 are convenient as divisible by 4, 8, 12, 16, 24, 32
TEMPO = 120
TIME_SIGNATURE = (4, 4)
DELETE_EQUAL_SUCCESSIVE_TIME_SIG_CHANGES = False

# Used with chords
PITCH_CLASSES = [