Dog bark detection with CNN-based spectrogram classification
Datasets:
UrbanSound (training and testing sets)
ESC-50 (testing set)
Freiburg 106 (training set, negative samples)
Third-party libraries (use Conan to fetch them automatically):
libsndfile
fftw3
opencv
darknet
portaudio
# Slice the audio files into 2-second clips and save them as new .wav files
# Note: this takes up to 25 GB of disk space
python preprocessing.py --urbansound_dir [folder UrbanSound] --esc50_dir [folder ESC-50] --kitchen106_dir [folder building_106_kitchen/building_106_kitchen]
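The slicing step can be pictured with the following minimal sketch. It uses only Python's standard `wave` module and assumes PCM input. The helper name `slice_wav`, the output naming scheme, and the choice to drop a trailing clip shorter than 2 seconds are all hypothetical; preprocessing.py may handle short files, resampling, or channel mixing differently.

```python
import wave


def slice_wav(src_path, dst_prefix, clip_secs=2.0):
    """Split src_path into consecutive clip_secs-long WAV files.

    Hypothetical helper, not the code in preprocessing.py.
    Returns the list of written file paths.
    """
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_clip = int(round(clip_secs * params.framerate))
        bytes_per_frame = params.sampwidth * params.nchannels
        index = 0
        while True:
            data = src.readframes(frames_per_clip)
            # readframes returns raw bytes; fewer frames than requested
            # means the end of the file has been reached
            if len(data) // bytes_per_frame < frames_per_clip:
                break  # drop the trailing partial clip (an assumption)
            out_path = f"{dst_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(data)
            written.append(out_path)
            index += 1
    return written
```

A 5-second file yields two 2-second clips under this sketch; the remaining second is discarded.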
# Generate spectrograms
# Edit the makefile to specify your darknet header and library locations
make
./create_spectrogram [folder UrbanSound] [folder ESC-50] [folder building_106_kitchen/building_106_kitchen]
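The frame size, hop, window, and scaling that create_spectrogram uses are not documented here. As an illustration of what a log-magnitude spectrogram is, this stdlib-only sketch cuts the signal into overlapping frames, applies a Hann window, and takes the log magnitude of each DFT bin. The function name and every parameter value are hypothetical, and the naive DFT costs O(N^2) per frame; the real tool links against FFTW (per the library list above) for this step.

```python
import cmath
import math


def spectrogram(samples, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 values, one per frequency bin.
    frame_len, hop and the log10 scaling are illustrative choices, not
    the parameters used by create_spectrogram.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of bin k (an FFT library computes the same values faster)
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small offset avoids log10(0) for silent frames
            row.append(math.log10(abs(acc) + 1e-10))
        frames.append(row)
    return frames
```

Each returned frame becomes one column of the spectrogram image, with low frequencies at index 0.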
Edit [Your darknet folder]/examples/classifier.c: in the function void train_classifier(...), change args.type from CLASSIFICATION_DATA to OLD_CLASSIFICATION_DATA, then rebuild darknet. This prevents darknet from augmenting the input data during training.
Edit cfg/dogbark.data to specify your training and validation set lists.
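The list files are plain text with one image path per line. Darknet's classifier assigns each image its class by searching the image path for each label string, so every path must contain exactly one label. The sketch below writes the lists, a labels file, and a .data file using the keys that darknet's examples/classifier.c reads (classes, train, valid, labels, backup, top); verify them against your darknet version. The helper name, file names, and label names (e.g. "dogbark", "negative") are hypothetical and must match your cfg/dogbark.cfg and image naming.

```python
import os


def write_darknet_lists(train_images, valid_images, labels, out_dir):
    """Write train/valid lists, a labels file and a .data file.

    Hypothetical helper: file and label names are assumptions.
    Returns the path of the written .data file.
    """
    def write_lines(path, lines):
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    # darknet matches labels as substrings of the image path, so reject
    # paths that match zero labels or more than one
    for path in list(train_images) + list(valid_images):
        hits = [lab for lab in labels if lab in path]
        if len(hits) != 1:
            raise ValueError(f"{path!r} matches labels {hits}")

    train_list = os.path.join(out_dir, "train.list")
    valid_list = os.path.join(out_dir, "valid.list")
    labels_file = os.path.join(out_dir, "labels.list")
    write_lines(train_list, train_images)
    write_lines(valid_list, valid_images)
    write_lines(labels_file, labels)

    data_path = os.path.join(out_dir, "dogbark.data")
    write_lines(data_path, [
        f"classes={len(labels)}",
        f"train={train_list}",
        f"valid={valid_list}",
        f"labels={labels_file}",
        "backup=backup/",  # matches the backup/ path used in the Testing step
        "top=1",
    ])
    return data_path
```

The substring check is worth keeping even if you write the lists by hand: labels like "dog" and "dogbark" would both match the same path.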
# Training
./darknet classifier train cfg/dogbark.data cfg/dogbark.cfg
# Testing
./darknet classifier valid cfg/dogbark.data cfg/dogbark.test.cfg backup/dogbark.backup
Test from file:
./classification_from_file [sound file] [win secs] [step secs] [export image height] [cfg file] [weights file]
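A plausible reading of [win secs] and [step secs] (an assumption, since the tool's source is not shown here) is that the file is scanned with overlapping windows: each window is win secs long, consecutive windows start step secs apart, and each window is turned into a spectrogram image and classified. The hypothetical helper below computes those window boundaries in samples, skipping any window that would run past the end of the file.

```python
def window_offsets(n_samples, sample_rate, win_secs, step_secs):
    """Return (start, end) sample indices of each analysis window.

    Hypothetical helper; windows extending past n_samples are skipped.
    """
    win = int(round(win_secs * sample_rate))
    step = int(round(step_secs * sample_rate))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must be at least one sample")
    # Working in integer samples avoids floating-point drift in the starts
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]
```

With the example arguments (2.8 s window, 0.1 s step), a 3-second file at an assumed 16 kHz sample rate gives three windows, starting at 0, 0.1 and 0.2 s.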
# For example:
./classification_from_file sound/dogtest.wav 2.8 0.1 200 cfg/dogbark.test.cfg weights/dogbark_32.weights
Test from microphone:
./classification_from_mic [win secs] [step secs] [export image height] [cfg file] [weights file]
# For example:
./classification_from_mic 2.5 0.02 200 cfg/dogbark.test.cfg weights/dogbark_32.weights
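For live input, one way to apply the same window/step scheme is a sliding buffer that always holds the most recent win secs of audio and hands out a copy every step secs. This is a minimal sketch under assumptions: the class name is hypothetical, and `push()` stands in for the PortAudio callback that delivers microphone samples in the real tool.

```python
from collections import deque


class SlidingAudioBuffer:
    """Keep the latest win_secs of audio; emit a window every step_secs.

    Hypothetical sketch; push() stands in for an audio-input callback.
    """

    def __init__(self, sample_rate, win_secs, step_secs):
        self.win = int(round(win_secs * sample_rate))
        self.step = int(round(step_secs * sample_rate))
        if self.win <= 0 or self.step <= 0:
            raise ValueError("window and step must be at least one sample")
        # deque with maxlen discards the oldest samples automatically
        self.buf = deque(maxlen=self.win)
        self.since_last = 0

    def push(self, samples):
        """Append new samples; return the complete windows now due."""
        ready = []
        for x in samples:
            self.buf.append(x)
            self.since_last += 1
            # Emit once the buffer is full, then after every `step` new samples
            if len(self.buf) == self.win and self.since_last >= self.step:
                ready.append(list(self.buf))
                self.since_last = 0
        return ready
```

With the example values (2.5 s window, 0.02 s step), a new window is due every 20 ms, so each spectrogram plus classification must finish in under 20 ms or the backlog grows; a larger step trades detection latency for less work.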