This project demonstrates the use of Principal Component Analysis (PCA) to analyze and reduce the dimensionality of the MNIST dataset using PyTorch. The notebook includes data preprocessing, eigendecomposition, variance analysis, and reconstruction-error calculation.
- **Data Loading and Preprocessing:**
  - The MNIST dataset is loaded using PyTorch's `torchvision.datasets`.
  - The images are normalized and reshaped to be compatible with PCA: each image is flattened from a 28x28 grid into a 784-dimensional vector.
- **Eigendecomposition of the Covariance Matrix:**
  - The PCA method begins by computing the covariance matrix of the data.
  - Eigendecomposition is applied to extract the eigenvalues and eigenvectors of the covariance matrix (a minimal sketch of this step appears after this list):
    - Eigenvalues represent the variance explained by each principal component.
    - Eigenvectors define the directions of these components.
  - The cumulative variance explained by the principal components is calculated to show how much variance is retained by a subset of components.
- **Variance Analysis:**
  - Plots the cumulative variance explained as a function of the number of principal components (see the plotting sketch after this list).
  - Helps in choosing the number of components needed to retain a specified amount of variance, which is critical for dimensionality reduction.
- **Reconstruction and Error Calculation:**
  - Using a subset of the top principal components, the original images are reconstructed.
  - The PCA reconstruction error is calculated to quantify how much information is lost when using a limited number of components.
  - The reconstruction error is computed as the normalized Frobenius norm of the difference between the original and reconstructed data (see the sketch after this list).
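
For illustration, here is a minimal sketch of what the eigendecomposition step might look like, assuming a mean-centered covariance matrix and `torch.linalg.eigh`. The project's actual implementation lives in `src/eigen_decomposition.py` and may differ in its details.

```python
import torch

def eigen_decomp_sketch(X: torch.Tensor):
    """Illustrative sketch only; see src/eigen_decomposition.py for the project's version."""
    # Center the data so the covariance is computed about the mean image
    X_centered = X - X.mean(dim=0, keepdim=True)

    # Covariance matrix: (784 x 784) for flattened 28x28 images
    cov = X_centered.T @ X_centered / (X_centered.size(0) - 1)

    # Eigendecomposition of the symmetric covariance matrix
    eigvals, eigvecs = torch.linalg.eigh(cov)

    # eigh returns eigenvalues in ascending order; sort descending so the
    # leading components explain the most variance
    idx = torch.argsort(eigvals, descending=True)
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

    # Cumulative fraction of total variance explained by the top-k components
    cumulative_variance = torch.cumsum(eigvals, dim=0) / eigvals.sum()
    return eigvals, eigvecs, cumulative_variance
```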
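
The cumulative-variance curve can be plotted with Matplotlib to pick a cut-off. The snippet below is a sketch that assumes a `cumulative_variance` tensor as returned by the eigendecomposition step (e.g. the sketch above or the project's `eigen_decomp`); the 95% threshold is purely an example.

```python
import torch
import matplotlib.pyplot as plt

# Plot cumulative explained variance versus number of components
plt.plot(cumulative_variance.numpy())
plt.axhline(0.95, color='red', linestyle='--', label='95% variance')  # example threshold
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance')
plt.legend()
plt.show()

# Smallest number of components retaining at least 95% of the variance
k = int((cumulative_variance < 0.95).sum().item()) + 1
print(f"Components needed for 95% variance: {k}")
```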
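
The reconstruction error described above corresponds to ‖X − X̂‖_F / ‖X‖_F, where X̂ is the reconstruction from the top `p` components. Below is a minimal sketch of that computation, assuming eigenvectors sorted by decreasing eigenvalue; the project's actual function lives in `src/reconstruction.py` and may handle details such as mean-centering differently.

```python
import torch

def pca_reconstruction_error_sketch(X: torch.Tensor, p: int, eigvecs: torch.Tensor):
    """Illustrative sketch only; see src/reconstruction.py for the project's version."""
    mean = X.mean(dim=0, keepdim=True)
    X_centered = X - mean

    # Project onto the top-p principal directions, then map back to pixel space
    W = eigvecs[:, :p]                              # (784, p) basis of leading components
    X_reconstructed = X_centered @ W @ W.T + mean

    # Normalized Frobenius norm of the residual
    error = torch.norm(X - X_reconstructed) / torch.norm(X)
    return error.item(), X_reconstructed
```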
The following steps outline how to load the dataset, perform PCA, and analyze results:
```python
import torch
from torchvision import datasets, transforms

from src.eigen_decomposition import eigen_decomp
from src.reconstruction import pca_reconstruction_error

# Load and preprocess the MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
mnist_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
data = mnist_data.data[:2000].float() / 255.0  # first 2,000 test images, scaled to [0, 1]
X = data.view(data.size(0), -1)  # Flatten images to 784 dimensions

# Perform PCA to get eigenvalues, eigenvectors, and cumulative variance
eigvals, eigvecs, cumulative_variance = eigen_decomp(X)

# Calculate reconstruction error using the top p components
error, reconstructed_data = pca_reconstruction_error(X, p=50, eigvecs=eigvecs)
print(f"Reconstruction error with 50 components: {error}")
```
Requirements:
- Python 3.x
- PyTorch
- Matplotlib (for plotting)
- torchvision (for dataset loading)