This repository contains the code and reports for the first assignment of the course CSL7770 - Speech Understanding.
## Directory Structure
```
.
├── question1_report.pdf
├── question1_ppt.pdf
├── question2_report.pdf
├── question2
│   ├── TaskA
│   └── TaskB
└── README.md
```
## How to use this repository
First, clone the repository to your local machine and navigate into it using the following commands:

```bash
git clone https://github.com/Shubh-Goyal-07/CSL7770_Assgn1.git
cd CSL7770_Assgn1
```
- The report for Question 1 is in the `question1_report.pdf` file.
- The presentation for Question 1 is in the `question1_ppt.pdf` file.
- The report for Question 2 is in the `question2_report.pdf` file.
Please follow the instructions below to run the code for Task A:

- **Navigate to the Task A directory**

  ```bash
  cd question2/TaskA
  ```
  This directory contains all the code files required to run Task A. The structure of the directory is as follows:

  ```
  .
  ├── logs
  ├── models
  ├── train_data
  ├── utils
  ├── processing.py
  ├── siamese.py
  ├── train.py
  ├── requirements.txt
  └── plots.ipynb
  ```
  - The `logs` directory currently contains the log files generated during execution of the code. If you re-run the code, the log files will be overwritten.
  - The `models` directory currently contains the trained models. If you re-run the training code, the models will be overwritten.
  - The `train_data` directory contains the embeddings (features extracted from the spectrograms).
  - The `utils` directory contains the utility functions used in the code.
  - The `plots.ipynb` file contains the code to generate the plots for the report.
- **Download and extract the dataset**

  ```bash
  wget https://goo.gl/8hY5ER
  tar -zxvf 8hY5ER
  ```

  You should see a folder named `UrbanSound8K` in the current directory after running the above commands.
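  If you want to sanity-check the extraction before moving on, a small script like the one below can help. It is not part of the repository's code; it only assumes the standard UrbanSound8K layout (`audio/fold1` ... `audio/fold10` plus `metadata/UrbanSound8K.csv`).

  ```python
  # Quick sanity check of the extracted dataset (illustrative, not part of the repo).
  from pathlib import Path

  root = Path("UrbanSound8K")
  assert (root / "metadata" / "UrbanSound8K.csv").exists(), "metadata file missing"

  # Count the wav files in each of the ten folds.
  for fold in sorted((root / "audio").glob("fold*")):
      print(f"{fold.name}: {len(list(fold.glob('*.wav')))} wav files")
  ```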
- **Prepare the environment**

  ```bash
  conda create -n csl7770 -y
  conda activate csl7770
  conda install --file requirements.txt -y -c conda-forge -c nvidia -c pytorch
  ```

  The above commands create a new conda environment named `csl7770` and install all the required packages in it. This will take a few minutes to complete.
- **Generate spectrograms**

  Run `processing.py` to generate the spectrograms for the audio files:

  ```bash
  python processing.py --window window_name --n window_size --overlap overlap_size
  ```

  The generated spectrograms will be saved in the `train_data/spectrograms/{window_name}` directory. This will also generate a log file named `processing_{window_name}.log` in the `logs` directory.

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. To generate spectrograms for all windows at once, set `window_name` to `all`. The code will also assess window correctness using RMSE; the results will be visible in the log file.
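  As a rough illustration of what this step does, the snippet below computes a log-magnitude spectrogram for one clip with each of the three windows. It is a minimal sketch, not the repository's `processing.py`; the file path, window size, and hop length are placeholder assumptions.

  ```python
  # Illustrative windowed-STFT spectrograms (placeholder path and parameters).
  import numpy as np
  import librosa
  import librosa.display
  import matplotlib.pyplot as plt

  y, sr = librosa.load("UrbanSound8K/audio/fold1/example.wav", sr=None)  # hypothetical clip

  n_fft = 1024          # corresponds to --n (window size)
  hop = n_fft // 2      # corresponds to 50% overlap (--overlap)

  windows = {
      "hann": np.hanning(n_fft),
      "hamming": np.hamming(n_fft),
      "rectangular": np.ones(n_fft),
  }

  for name, win in windows.items():
      S = librosa.stft(y, n_fft=n_fft, hop_length=hop, window=win)
      S_db = librosa.amplitude_to_db(np.abs(S), ref=np.max)
      img = librosa.display.specshow(S_db, sr=sr, hop_length=hop, x_axis="time", y_axis="log")
      plt.title(f"{name} window")
      plt.colorbar(img, format="%+2.0f dB")
      plt.savefig(f"spectrogram_{name}.png")
      plt.close()
  ```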
- **Extract features from the spectrograms**

  To extract features from the generated spectrograms for training the neural network and classical machine learning models, run the command given below:

  ```bash
  python siamese.py --window window_name --epochs num_epochs
  ```

  This will train a Siamese network with a `resnet18` backbone to extract features from the spectrograms. The extracted features will be saved in the `train_data/embeddings/{window_name}_embeddings.csv` file.

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. To generate features for all windows at once, set `window_name` to `all`. The log file generated will be named `siamese_{window_name}.log`; please use it to check the training progress. The trained Siamese model will be saved as `models/siamese/model_{window_name}.pth`.
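  The sketch below shows one common way a Siamese embedding model with a `resnet18` backbone can be set up in PyTorch. It is only an illustration of the idea; the embedding size, pairing strategy, and contrastive loss are assumptions, not the exact implementation in `siamese.py`.

  ```python
  # Illustrative Siamese network with a resnet18 backbone (assumed design choices).
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  from torchvision import models

  class SiameseNet(nn.Module):
      def __init__(self, embedding_dim: int = 128):
          super().__init__()
          backbone = models.resnet18(weights=None)
          # Replace the classification head with an embedding layer.
          backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
          self.backbone = backbone

      def embed(self, x: torch.Tensor) -> torch.Tensor:
          return self.backbone(x)

      def forward(self, x1: torch.Tensor, x2: torch.Tensor):
          return self.embed(x1), self.embed(x2)

  def contrastive_loss(e1, e2, same_class, margin: float = 1.0):
      # same_class is 1.0 for pairs from the same class and 0.0 otherwise.
      dist = F.pairwise_distance(e1, e2)
      return torch.mean(same_class * dist.pow(2)
                        + (1.0 - same_class) * F.relu(margin - dist).pow(2))

  # Toy usage: a batch of 3-channel spectrogram images paired with same/different labels.
  model = SiameseNet()
  x1, x2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
  same = torch.randint(0, 2, (8,)).float()
  e1, e2 = model(x1, x2)
  loss = contrastive_loss(e1, e2, same)
  loss.backward()
  ```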
- **Train classifiers**

  To train classifiers on the generated data, run the command given below:

  ```bash
  python train.py --window window_name --model model_name --epochs num_epochs
  ```

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. The `model_name` can be one of the following:

  - `svm` - Support Vector Machine
  - `rf` - Random Forest
  - `knn` - K-Nearest Neighbors
  - `dt` - Decision Tree
  - `nn` - Neural Network
  - `cnn` - Convolutional Neural Network

  The log file generated will be named `train_{model_name}_{window_name}.log`; please use it to check the training progress. The trained model will be saved as `models/{model_name}/{window_name}.pth` or `models/{model_name}/{window_name}.pkl`, depending on the model.

  Note:
  - The `--epochs` argument is useful only for the `nn` and `cnn` models; it will be ignored for the other models.
  - To train the `cnn` model, the spectrograms must be generated first using the `processing.py` script. The `cnn` model uses the spectrograms as training data.
  - The `nn`, `svm`, `rf`, `knn`, and `dt` models use the features extracted by the Siamese network as training data, so the Siamese network must be trained first using the `siamese.py` script.
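For the classical models, this stage amounts to fitting a scikit-learn estimator on the embeddings CSV. The snippet below is a minimal sketch of that idea, not `train.py` itself; it assumes the CSV has one row per clip with feature columns plus a `label` column, which is a guess about the file layout.

```python
# Illustrative classical-classifier training on the Siamese embeddings
# (assumed CSV layout and "label" column name).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("train_data/embeddings/hann_embeddings.csv")
X = df.drop(columns=["label"]).values
y = df["label"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)  # stand-in for "rf"
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Swapping `RandomForestClassifier` for `SVC`, `KNeighborsClassifier`, or `DecisionTreeClassifier` would cover the other classical options in the same way.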
The code for Task B is present in the `TaskB` directory:

```bash
cd question2/TaskB
```
The directory structure is as follows:

```
.
├── songs_data
│   ├── ghoomar_indian_rajasthani_folk.mp3
│   ├── love_the_way_u_lie_eminem_rap.mp3
│   ├── perfect_ed_pop.mp3
│   └── sapna_jahan_bollywood.mp3
└── task_b.ipynb
```
The `songs_data` directory contains the songs used for the task.

The `task_b.ipynb` notebook contains the code for the task and can be executed using the same environment created in Task A. It contains the spectrograms of the four chosen songs; an illustrative sketch of how such a spectrogram can be produced follows the song list below.
The songs used are:
- Ghoomar - Indian Folk Song
- Love The Way You Lie - Rap Song
- Perfect - Pop Song
- Sapna Jahan - Bollywood Song
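The cell below is a minimal sketch of how a spectrogram for one of the songs might be plotted. It is illustrative only and does not reproduce the exact parameters used in `task_b.ipynb`.

```python
# Illustrative log-magnitude spectrogram for one song (assumed STFT parameters).
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("songs_data/perfect_ed_pop.mp3", sr=None)
S_db = librosa.amplitude_to_db(
    np.abs(librosa.stft(y, n_fft=2048, hop_length=512)), ref=np.max
)

img = librosa.display.specshow(S_db, sr=sr, hop_length=512, x_axis="time", y_axis="log")
plt.title("perfect_ed_pop.mp3")
plt.colorbar(img, format="%+2.0f dB")
plt.show()
```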