Frechet Audio Distance

This repository provides supporting code used to compute the Fréchet Audio Distance (FAD), a reference-free evaluation metric for audio generation algorithms, in particular music enhancement.

For more details about Fréchet Audio Distance and how we verified it, please check out our paper:

Usage

FAD depends on:

and also requires downloading a VGG model checkpoint file:

Example installation and use

Get the FAD code

$ git clone https://github.com/google-research/google-research.git
$ cd google-research

Install dependencies

First, create and activate a virtualenv to isolate the installation from everything else.

# Python 2
$ virtualenv fad
# or Python 3
$ python3 -m venv fad # (apache-beam does not yet support Python 3)
# activate the virtualenv
$ source fad/bin/activate
# Upgrade pip
$ python -m pip install --upgrade pip
# Install dependencies
$ pip install apache-beam numpy scipy tensorflow

Export the AudioSet code from the TensorFlow models repo into a 'tensorflow_models' directory.

$ mkdir tensorflow_models
$ touch tensorflow_models/__init__.py
$ svn export https://github.com/tensorflow/models/trunk/research/audioset tensorflow_models/audioset/
$ touch tensorflow_models/audioset/__init__.py

Download data files into a data directory

$ mkdir -p data
$ curl -o data/vggish_model.ckpt https://storage.googleapis.com/audioset/vggish_model.ckpt

Create test files and file lists

This generates a set of background test files (sine waves at different frequencies) and two test sets of sine waves with distortions.

$ python -m frechet_audio_distance.gen_test_files --test_files "test_audio"

# Add them to file lists:
$ ls --color=never test_audio/background/*  > test_audio/test_files_background.cvs
$ ls --color=never test_audio/test1/*  > test_audio/test_files_test1.cvs
$ ls --color=never test_audio/test2/*  > test_audio/test_files_test2.cvs
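
To give a feel for what such test files might contain, here is a minimal Python 3 sketch that builds a sine wave, applies a distortion, and writes a WAV file. It is not the code in `gen_test_files`: the function names, the 16 kHz sample rate, and the choice of additive Gaussian noise as the distortion are all assumptions made for illustration.

```python
import math
import random
import struct
import wave


def sine_wave(freq_hz, duration_s=1.0, sample_rate=16000, amplitude=0.5):
    """Return float samples in [-1, 1] for a pure sine tone."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def add_noise(samples, stddev=0.05, seed=0):
    """Hypothetical distortion: add Gaussian noise, then clip to [-1, 1]."""
    rng = random.Random(seed)
    return [max(-1.0, min(1.0, s + rng.gauss(0.0, stddev))) for s in samples]


def write_wav(path, samples, sample_rate=16000):
    """Write the samples as a mono, 16-bit PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                               for s in samples))
```

For example, `write_wav("tone_440.wav", add_noise(sine_wave(440.0)))` would write one second of a noisy 440 Hz tone.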

Compute embeddings and estimate multivariate Gaussians

$ mkdir -p stats
$ python -m frechet_audio_distance.create_embeddings_main --input_files test_audio/test_files_background.cvs --stats stats/background_stats
$ python -m frechet_audio_distance.create_embeddings_main --input_files test_audio/test_files_test1.cvs --stats stats/test1_stats
$ python -m frechet_audio_distance.create_embeddings_main --input_files test_audio/test_files_test2.cvs --stats stats/test2_stats
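
Estimating a multivariate Gaussian from a set of embeddings amounts to computing their mean vector and covariance matrix. The sketch below shows that step with NumPy on random stand-in data. It is not the code in `create_embeddings_main`: the function name `gaussian_stats` is hypothetical, and the 128 dimensions are an assumption chosen to match the VGGish embedding size.

```python
import numpy as np


def gaussian_stats(embeddings):
    """Estimate the mean vector and covariance matrix of (n, d) embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    mu = embeddings.mean(axis=0)
    # rowvar=False: each row is one observation, each column one dimension.
    sigma = np.cov(embeddings, rowvar=False)
    return mu, sigma


# Stand-in data: 1000 random vectors instead of real audio embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(1000, 128))
mu, sigma = gaussian_stats(fake_embeddings)
print(mu.shape, sigma.shape)  # (128,) (128, 128)
```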

Compute the FAD from the stats

$ python -m frechet_audio_distance.compute_fad --background_stats stats/background_stats --test_stats stats/test1_stats
$ python -m frechet_audio_distance.compute_fad --background_stats stats/background_stats --test_stats stats/test2_stats
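
The Fréchet distance between two multivariate Gaussians with means μ_b, μ_t and covariances Σ_b, Σ_t is ||μ_b − μ_t||² + tr(Σ_b + Σ_t − 2·sqrt(Σ_b Σ_t)). A minimal NumPy sketch of that formula follows. It is not the code in `compute_fad`: the function name is hypothetical, and the trace of the matrix square root is obtained from the eigenvalues of Σ_b Σ_t, which is one possible approach among several.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1 = np.asarray(mu1, dtype=np.float64)
    mu2 = np.asarray(mu2, dtype=np.float64)
    sigma1 = np.asarray(sigma1, dtype=np.float64)
    sigma2 = np.asarray(sigma2, dtype=np.float64)

    diff = mu1 - mu2
    # tr(sqrt(S1 S2)) is the sum of the square roots of the eigenvalues of
    # S1 S2; these are real and non-negative for valid covariance matrices.
    # Clip small negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * tr_sqrt)


identity = np.eye(2)
# With equal covariances the trace terms cancel, leaving 3**2 + 4**2 = 25.
print(frechet_distance([0.0, 0.0], identity, [3.0, 4.0], identity))
```

A distance of zero means the two Gaussians are identical; larger values indicate that the test set's embedding statistics lie further from the background set's.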