This repository contains the code and reports for the first assignment of the course CSL7770 - Speech Understanding.
## Directory Structure
```
.
├── question1_report.pdf
├── question1_ppt.pdf
├── question2_report.pdf
├── question2
│   ├── TaskA
│   └── TaskB
└── README.md
```
## How to use this repository
First, clone the repository to your local machine and navigate into it using the following commands:

```bash
git clone https://github.com/Shubh-Goyal-07/CSL7770_Assgn1.git
cd CSL7770_Assgn1
```
- The report for Question 1 is in the `question1_report.pdf` file.
- The presentation for Question 1 is in the `question1_ppt.pdf` file.
- The report for Question 2 is in the `question2_report.pdf` file.
Please follow the instructions below to run the code for Task A:

- **Navigate to the Task A directory**

  ```bash
  cd question2/TaskA
  ```
  This directory contains all the code files required to run Task A. The structure of the directory is as follows:

  ```
  .
  ├── logs
  ├── models
  ├── train_data
  ├── utils
  ├── processing.py
  ├── siamese.py
  ├── train.py
  ├── requirements.txt
  └── plots.ipynb
  ```
  - The `logs` directory currently contains the log files generated during execution of the code. If you re-run the code, the log files will be overwritten.
  - The `models` directory currently contains the trained models. If you re-run the training code, the models will be overwritten.
  - The `train_data` directory contains the embeddings (features extracted from the spectrograms).
  - The `utils` directory contains the utility functions used in the code.
  - The `plots.ipynb` file contains the code to generate the plots for the report.
- **Download and extract the dataset**

  ```bash
  wget https://goo.gl/8hY5ER
  tar -zxvf 8hY5ER
  ```

  You should see a folder named `UrbanSound8K` in the current directory after running the above commands.
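  If you want to sanity-check the extraction before moving on, a small script like the one below can help. It is not part of the repository's code; it only assumes the standard UrbanSound8K layout (`audio/fold1` ... `audio/fold10` plus `metadata/UrbanSound8K.csv`).

  ```python
  # Quick sanity check of the extracted dataset (illustrative, not part of the repo).
  from pathlib import Path

  root = Path("UrbanSound8K")
  assert (root / "metadata" / "UrbanSound8K.csv").exists(), "metadata file missing"

  # Count the wav files in each of the ten folds.
  for fold in sorted((root / "audio").glob("fold*")):
      print(f"{fold.name}: {len(list(fold.glob('*.wav')))} wav files")
  ```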
- **Prepare the environment**

  ```bash
  conda create -n csl7770 -y
  conda activate csl7770
  conda install --file requirements.txt -y -c conda-forge -c nvidia -c pytorch
  ```

  The above commands create a new conda environment named `csl7770` and install all the required packages in it. This will take a few minutes to complete.
- **Generate spectrograms**

  Run `processing.py` to generate the spectrograms for the audio files:

  ```bash
  python processing.py --window window_name --n window_size --overlap overlap_size
  ```

  The generated spectrograms will be saved in the `train_data/spectrograms/{window_name}` directory. This will also generate a log file named `processing_{window_name}.log` in the `logs` directory.

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. To generate spectrograms for all windows at once, set `window_name` to `all`. The code will also assess window correctness using RMSE; the results will be visible in the log file.
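  As a rough illustration of what this step does, the snippet below computes a log-magnitude spectrogram for one clip with each of the three windows. It is a minimal sketch, not the repository's `processing.py`; the file path, window size, and hop length are placeholder assumptions.

  ```python
  # Illustrative windowed-STFT spectrograms (placeholder path and parameters).
  import numpy as np
  import librosa
  import librosa.display
  import matplotlib.pyplot as plt

  y, sr = librosa.load("UrbanSound8K/audio/fold1/example.wav", sr=None)  # hypothetical clip

  n_fft = 1024          # corresponds to --n (window size)
  hop = n_fft // 2      # corresponds to 50% overlap (--overlap)

  windows = {
      "hann": np.hanning(n_fft),
      "hamming": np.hamming(n_fft),
      "rectangular": np.ones(n_fft),
  }

  for name, win in windows.items():
      S = librosa.stft(y, n_fft=n_fft, hop_length=hop, window=win)
      S_db = librosa.amplitude_to_db(np.abs(S), ref=np.max)
      img = librosa.display.specshow(S_db, sr=sr, hop_length=hop, x_axis="time", y_axis="log")
      plt.title(f"{name} window")
      plt.colorbar(img, format="%+2.0f dB")
      plt.savefig(f"spectrogram_{name}.png")
      plt.close()
  ```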
- **Extract features from the spectrograms**

  To extract features from the generated spectrograms for training the neural network and classical machine learning models, run the command given below:

  ```bash
  python siamese.py --window window_name --epochs num_epochs
  ```

  This will train a Siamese network with a `resnet18` backbone to extract features from the spectrograms. The extracted features will be saved in the `train_data/embeddings/{window_name}_embeddings.csv` file.

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. To generate features for all windows at once, set `window_name` to `all`. The log file generated will be named `siamese_{window_name}.log`; please use it to check the training progress. The trained Siamese model will be saved as `models/siamese/model_{window_name}.pth`.
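  The sketch below shows one common way a Siamese embedding model with a `resnet18` backbone can be set up in PyTorch. It is only an illustration of the idea; the embedding size, pairing strategy, and contrastive loss are assumptions, not the exact implementation in `siamese.py`.

  ```python
  # Illustrative Siamese network with a resnet18 backbone (assumed design choices).
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  from torchvision import models

  class SiameseNet(nn.Module):
      def __init__(self, embedding_dim: int = 128):
          super().__init__()
          backbone = models.resnet18(weights=None)
          # Replace the classification head with an embedding layer.
          backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
          self.backbone = backbone

      def embed(self, x: torch.Tensor) -> torch.Tensor:
          return self.backbone(x)

      def forward(self, x1: torch.Tensor, x2: torch.Tensor):
          return self.embed(x1), self.embed(x2)

  def contrastive_loss(e1, e2, same_class, margin: float = 1.0):
      # same_class is 1.0 for pairs from the same class and 0.0 otherwise.
      dist = F.pairwise_distance(e1, e2)
      return torch.mean(same_class * dist.pow(2)
                        + (1.0 - same_class) * F.relu(margin - dist).pow(2))

  # Toy usage: a batch of 3-channel spectrogram images paired with same/different labels.
  model = SiameseNet()
  x1, x2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
  same = torch.randint(0, 2, (8,)).float()
  e1, e2 = model(x1, x2)
  loss = contrastive_loss(e1, e2, same)
  loss.backward()
  ```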
- **Train classifiers**

  To train classifiers on the generated data, run the command given below:

  ```bash
  python train.py --window window_name --model model_name --epochs num_epochs
  ```

  The `window_name` can be one of `hann`, `hamming`, or `rectangular`. The `model_name` can be one of the following:

  - `svm` - Support Vector Machine
  - `rf` - Random Forest
  - `knn` - K-Nearest Neighbors
  - `dt` - Decision Tree
  - `nn` - Neural Network
  - `cnn` - Convolutional Neural Network

  The log file generated will be named `train_{model_name}_{window_name}.log`; please use it to check the training progress. The trained model will be saved as `models/{model_name}/{window_name}.pth` or `models/{model_name}/{window_name}.pkl`, depending on the model.

  Note:
  - The `--epochs` argument is useful only for the `nn` and `cnn` models; it will be ignored for the other models.
  - To train the `cnn` model, the spectrograms must be generated first using the `processing.py` script. The `cnn` model uses the spectrograms as training data.
  - The `nn`, `svm`, `rf`, `knn`, and `dt` models use the features extracted by the Siamese network as training data, so the Siamese network must be trained first using the `siamese.py` script.
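For the classical models, this stage amounts to fitting a scikit-learn estimator on the embeddings CSV. The snippet below is a minimal sketch of that idea, not `train.py` itself; it assumes the CSV has one row per clip with feature columns plus a `label` column, which is a guess about the file layout.

```python
# Illustrative classical-classifier training on the Siamese embeddings
# (assumed CSV layout and "label" column name).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("train_data/embeddings/hann_embeddings.csv")
X = df.drop(columns=["label"]).values
y = df["label"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)  # stand-in for "rf"
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Swapping `RandomForestClassifier` for `SVC`, `KNeighborsClassifier`, or `DecisionTreeClassifier` would cover the other classical options in the same way.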
The code for Task B is present in the `TaskB` directory:

```bash
cd question2/TaskB
```
The directory structure is as follows:

```
.
├── songs_data
│   ├── ghoomar_indian_rajasthani_folk.mp3
│   ├── love_the_way_u_lie_eminem_rap.mp3
│   ├── perfect_ed_pop.mp3
│   └── sapna_jahan_bollywood.mp3
└── task_b.ipynb
```
The `songs_data` directory contains the songs used for the task.

The `task_b.ipynb` notebook contains the code for the task and can be executed using the same environment created in Task A. It contains the spectrograms of the four chosen songs; an illustrative sketch of how such a spectrogram can be produced follows the song list below.
The songs used are:
- Ghoomar - Indian Folk Song
- Love The Way You Lie - Rap Song
- Perfect - Pop Song
- Sapna Jahan - Bollywood Song
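The cell below is a minimal sketch of how a spectrogram for one of the songs might be plotted. It is illustrative only and does not reproduce the exact parameters used in `task_b.ipynb`.

```python
# Illustrative log-magnitude spectrogram for one song (assumed STFT parameters).
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("songs_data/perfect_ed_pop.mp3", sr=None)
S_db = librosa.amplitude_to_db(
    np.abs(librosa.stft(y, n_fft=2048, hop_length=512)), ref=np.max
)

img = librosa.display.specshow(S_db, sr=sr, hop_length=512, x_axis="time", y_axis="log")
plt.title("perfect_ed_pop.mp3")
plt.colorbar(img, format="%+2.0f dB")
plt.show()
```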