This repository contains solutions to Assignment 2 of the Neural Network and Transfer Learning course. The assignment is divided into two main exercises: implementing custom Dataset and DataLoader classes in PyTorch, and performing Principal Component Analysis (PCA) for image reconstruction using Eigen Decomposition (EVD) and Singular Value Decomposition (SVD).
The first exercise (exercise_2_2.py) focuses on creating custom PyTorch Dataset and DataLoader classes for two datasets. The implementation demonstrates:
- Dataset1 - Tokenized SMS Messages:
  - Tokenized text data (SMS messages labeled as spam or not spam).
  - Padded so that all inputs have a uniform length.
  - Binary labels (0 for not spam, 1 for spam).
  - Implemented a custom Dataset class to preprocess the data and a DataLoader to batch and shuffle it.
- Dataset2 - Vector-borne Disease Data:
  - Tabular data containing symptoms and prognoses.
  - Labels encoded using one-hot encoding.
  - Created a custom Dataset class to preprocess the tabular data, handle categorical encoding, and manage data splits.
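
The two dataset designs described above can be sketched roughly as follows. This is a minimal illustration, not the code in exercise_2_2.py: the class names `SMSDataset` and `DiseaseDataset`, the whitespace tokenizer, the `PAD_ID` constant, and the toy data are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split

PAD_ID = 0  # index reserved for padding (assumption for this sketch)


class SMSDataset(Dataset):
    """Hypothetical dataset: tokenized, padded SMS messages with 0/1 labels."""

    def __init__(self, messages, labels):
        # Whitespace tokenization is an assumption; a real tokenizer may differ.
        tokenized = [m.lower().split() for m in messages]
        vocab = sorted({tok for toks in tokenized for tok in toks})
        self.vocab = {tok: i + 1 for i, tok in enumerate(vocab)}  # 0 is PAD_ID
        self.max_len = max(len(toks) for toks in tokenized)
        self.inputs = torch.tensor([self._encode(toks) for toks in tokenized])
        self.labels = torch.tensor(labels, dtype=torch.long)

    def _encode(self, tokens):
        # Map tokens to ids and right-pad to the longest message.
        ids = [self.vocab[tok] for tok in tokens]
        return ids + [PAD_ID] * (self.max_len - len(ids))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]


class DiseaseDataset(Dataset):
    """Hypothetical dataset: symptom vectors with one-hot encoded prognoses."""

    def __init__(self, symptoms, prognoses):
        self.features = torch.tensor(symptoms, dtype=torch.float32)
        self.classes = sorted(set(prognoses))
        class_to_idx = {name: i for i, name in enumerate(self.classes)}
        indices = torch.tensor([class_to_idx[p] for p in prognoses])
        self.labels = F.one_hot(indices, num_classes=len(self.classes)).float()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


# Toy data, invented for illustration only.
sms = SMSDataset(
    ["win a free prize now", "see you at lunch", "free entry call now"],
    [1, 0, 1],
)
sms_loader = DataLoader(sms, batch_size=2, shuffle=True)

disease = DiseaseDataset(
    [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]],
    ["Dengue", "Malaria", "Dengue", "Zika"],
)
# Fixed seed so the train/test split is reproducible.
train_set, test_set = random_split(
    disease, [3, 1], generator=torch.Generator().manual_seed(0)
)

for token_ids, labels in sms_loader:
    print(token_ids.shape, labels.shape)
```

Padding inside the Dataset keeps every sample the same length, so the default DataLoader collation can stack samples into a batch without a custom `collate_fn`.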
The second exercise (exercise_2_3.ipynb) demonstrates Principal Component Analysis (PCA) applied to a dataset of cat images. Key aspects include:
- Computing the Mean Cat:
  - Calculated the mean image by averaging pixel values across all samples.
  - Visualized the result to understand the average characteristics of the dataset.
- Eigencats - Principal Components of Images:
  - Used Eigen Decomposition (EVD) to compute the principal components (eigencats).
  - Visualized the top principal components as "eigenfaces" for cats.
- Reconstructing Images:
  - Reconstructed images using varying numbers of principal components.
  - Demonstrated that image fidelity improves as more components are added.
  - Used Singular Value Decomposition (SVD) as an alternative method for PCA and compared its results to EVD.
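
The steps above (mean image, EVD, SVD, reconstruction) can be sketched in NumPy as shown below. This is a minimal sketch under stated assumptions rather than the notebook's code: the random array stands in for the cat images, and the image size, the helper name `reconstruct`, and the values of `k` are chosen only for illustration.

```python
import numpy as np

# Random stand-in for the cat images: 20 grayscale images of 8x8 pixels.
rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))
n, h, w = images.shape

# Mean cat: average each pixel over all samples.
X = images.reshape(n, -1)          # one flattened image per row
mean_cat = X.mean(axis=0)          # reshape to (h, w) to display it
Xc = X - mean_cat                  # centered data

# EVD route: eigenvectors of the covariance matrix are the eigencats.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: rows of Vt span the same directions, up to sign.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)


def reconstruct(image, components):
    """Project a flattened image onto the given components (rows) and map back."""
    coeffs = (image - mean_cat) @ components.T
    return mean_cat + coeffs @ components


for k in (2, 5, 10, 19):
    err_evd = np.linalg.norm(X[0] - reconstruct(X[0], eigvecs[:, :k].T))
    err_svd = np.linalg.norm(X[0] - reconstruct(X[0], Vt[:k]))
    print(f"k={k:2d}  EVD error={err_evd:.4f}  SVD error={err_svd:.4f}")
```

Both routes describe the same subspace: the covariance eigenvalues equal the squared singular values divided by `n - 1`, so the reconstruction errors should agree up to floating-point precision. The SVD route avoids forming the covariance matrix explicitly, which matters when the number of pixels is large.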
- Python 3.6 or higher
- PyTorch
- NumPy
- Matplotlib
- Pandas
- Clone the repository:
git clone https://github.com/username/NNTI_Assignment2.git
- Navigate to the repository:
cd NNTI_Assignment2
- Install dependencies
- Dataset and DataLoader Implementation: Execute exercise_2_2.py to see the custom Dataset and DataLoader classes in action.
- Eigencats Analysis: Open exercise_2_3.ipynb and run the notebook to see PCA-based image reconstruction.
- Dataset1: Successfully tokenized, padded, and batched SMS messages with their corresponding labels.
- Dataset2: Successfully preprocessed tabular data, applied one-hot encoding, and loaded data for analysis.
- Mean Cat: Visualized the average image of the dataset.
- Principal Components: Computed and visualized top eigencats using both EVD and SVD.
- Reconstructed Images:
  - Visualized reconstructed images with varying numbers of principal components (e.g., 10, 40, and 80 components).
  - Observed trade-offs between reconstruction quality and the number of components used.
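
One common way to quantify this trade-off is the cumulative fraction of variance explained by the first k components. The sketch below is illustrative only: the random data stands in for flattened images (so the printed values say nothing about the cat dataset), and the array shape is an assumption chosen so that 10, 40, and 80 components are all valid.

```python
import numpy as np

# Random stand-in for 120 flattened 10x10 images.
rng = np.random.default_rng(0)
X = rng.random((120, 100))
Xc = X - X.mean(axis=0)

# Squared singular values are proportional to the variance along each component.
S = np.linalg.svd(Xc, compute_uv=False)
explained = np.cumsum(S**2) / np.sum(S**2)

for k in (10, 40, 80):
    print(f"{k} components explain {explained[k - 1]:.1%} of the variance")
```

On real images the curve typically rises much faster than on random data, because neighboring pixels are strongly correlated.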