Neural speaker diarization with pyannote.audio

This is the development branch of the upcoming pyannote.audio 2.0, for which it was decided to rewrite almost everything from scratch. Highlights of the upcoming release are showcased in the pyannote.audio 101 section below.

Installation

Until a proper release is available on PyPI, install from the develop branch:

pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip

Windows users need to install PyTorch themselves using the recommended commands (only torch and torchaudio are required) after installing pyannote.audio.
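
For example, on a CPU-only machine this boils down to something like the following (check pytorch.org for the command matching your OS and CUDA setup):

pip install torch torchaudio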

pyannote.audio 101

For now, this is the closest you can get to actual documentation.

The experimental protocol is made reproducible thanks to pyannote.database. Here, we use the AMI "only_words" speaker diarization protocol.

from pyannote.database import get_protocol
ami = get_protocol('AMI.SpeakerDiarization.only_words')
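
If you want to peek at what a protocol provides, each subset is a generator of file dictionaries. A minimal sketch, assuming the AMI corpus is configured for pyannote.database:

# each file comes with (at least) its 'uri' identifier
# and its reference 'annotation' (who speaks when)
for file in ami.train():
    print(file["uri"], file["annotation"])
    break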

Data augmentation is supported via torch-audiomentations.

from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise
augmentation = Compose(transforms=[ApplyImpulseResponse(...),
                                   AddBackgroundNoise(...)])
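
The ellipses above stand for transform-specific parameters. Here is a hedged sketch of a filled-in version; the ir_paths / background_paths argument names, paths, and probabilities are assumptions, so check the torch-audiomentations documentation:

# assumed argument names and illustrative values, not verbatim API
augmentation = Compose(transforms=[
    ApplyImpulseResponse(ir_paths="/path/to/rirs", p=0.5),         # random reverberation
    AddBackgroundNoise(background_paths="/path/to/noises", p=0.5)  # random background noise
])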

A growing collection of tasks can be addressed. Here, we address speaker segmentation.

from pyannote.audio.tasks import Segmentation
seg = Segmentation(ami, augmentation=augmentation)
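
Tasks also control how training chunks are sampled. It could look something like this, where the duration and batch_size keywords are assumptions about the task interface and the values are purely illustrative:

# assumed keywords: duration (chunk length in seconds) and batch_size
seg = Segmentation(ami, duration=5.0, batch_size=32, augmentation=augmentation)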

A growing collection of model architectures can be used. Here, we use the PyanNet (SincNet + LSTM) architecture.

from pyannote.audio.models.segmentation import PyanNet
model = PyanNet(task=seg)

We benefit from all the nice things that pytorch-lightning has to offer: distributed (GPU & TPU) training, model checkpointing, logging, etc. In this example, we don't really use any of them (see the sketch after the snippet below).

from pytorch_lightning import Trainer
trainer = Trainer()
trainer.fit(model)
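
A sketch of what enabling some of them could look like, assuming the pytorch-lightning API of that era (gpus flag, ModelCheckpoint callback) with illustrative values:

from pytorch_lightning.callbacks import ModelCheckpoint

# train on one GPU for 10 epochs, saving checkpoints along the way
trainer = Trainer(gpus=1, max_epochs=10, callbacks=[ModelCheckpoint()])
trainer.fit(model)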

Predictions are obtained by wrapping the model into the Inference engine.

from pyannote.audio import Inference
inference = Inference(model)
predictions = inference('audio.wav')
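
By default, the model is applied over the whole file. A hedged sketch of tweaking the sliding window, assuming Inference accepts duration and step parameters (values illustrative):

# assumed parameters: 2s windows with 500ms hop
inference = Inference(model, duration=2.0, step=0.5)
predictions = inference('audio.wav')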

Pretrained models can be shared on the Huggingface.co model hub. Here, we download and use a pretrained voice activity detection model.

inference = Inference('hbredin/VoiceActivityDetection-PyanNet-DIHARD')
predictions = inference('audio.wav')
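
Assuming predictions come back as a pyannote.core.SlidingWindowFeature (frame-wise scores), they can be inspected like this:

# .data is the raw score matrix, .sliding_window maps frames to time
print(predictions.data.shape)
print(predictions.sliding_window)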

Fine-tuning is as easy as setting the task attribute, freezing early layers, and training. Here, we take a voice activity detection model pretrained on the DIHARD dataset and fine-tune it on the AMI dataset.

from pyannote.audio import Model
from pyannote.audio.tasks import VoiceActivityDetection
model = Model.from_pretrained('hbredin/VoiceActivityDetection-PyanNet-DIHARD')
model.task = VoiceActivityDetection(ami)
model.freeze_up_to('sincnet')
trainer.fit(model)

Transfer learning is also supported out of the box. Here, we do transfer learning from voice activity detection to overlapped speech detection.

from pyannote.audio.tasks import OverlappedSpeechDetection
osd = OverlappedSpeechDetection(ami)
model.task = osd
trainer.fit(model)

A default optimizer (Adam with default parameters) is automatically set up for you. Customizing the optimizer (and scheduler) requires overriding the model.configure_optimizers method:

from types import MethodType
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

def configure_optimizers(self):
    # note: the learning rate value is illustrative
    optimizer = SGD(self.parameters(), lr=1e-3)
    return {"optimizer": optimizer,
            "lr_scheduler": ExponentialLR(optimizer, gamma=0.9)}

model.configure_optimizers = MethodType(configure_optimizers, model)
trainer.fit(model)

Contributing

The commands below will set up pre-commit hooks and install the packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Testing

Tests rely on a set of debugging files available in the tests/data directory. Set the PYANNOTE_DATABASE_CONFIG environment variable to tests/data/database.yml before running the tests:

PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest
