GitHub - vineetg3/multimodal-mnist-classifier

This project predicts the handwritten digit shown in an image, together with the corresponding audio recording of that digit. The objective is to develop a model that accurately recognizes digits from both image and audio inputs. The model is trained on an MNIST dataset containing roughly 60,000 images and audio recordings. To achieve this, a fusion model combining a convolutional neural network (CNN) with long short-term memory (LSTM) layers was implemented. This hybrid architecture uses the CNN to process the images and the LSTM layers to analyze the sequential audio data. The model achieved a test accuracy of 0.99. The project combines multiple techniques to generate embeddings for a common task, which shows the potential and importance of a multimodal approach.
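The CNN + LSTM fusion idea can be illustrated with a small NumPy forward pass. This is a minimal sketch under stated assumptions, not the notebook's actual code: the weights are random and untrained, the image branch is a single 3x3 convolution with ReLU and global average pooling, the audio branch is a single LSTM layer over hypothetical 13-coefficient MFCC frames, and the two embeddings are fused by concatenation before a softmax classifier over the 10 digits. All sizes and names (`fused_predict`, `params`, `N_MFCC`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
N_FILTERS, N_MFCC, HIDDEN, N_CLASSES = 8, 13, 16, 10


def conv2d_valid(image, kernels):
    """Valid 2-D cross-correlation: image (H, W), kernels (K, kh, kw) -> (K, H-kh+1, W-kw+1)."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Dot every kernel with the same patch at once -> shape (K,)
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def image_embedding(image, kernels):
    """CNN branch: convolution, ReLU, then global average pooling -> (K,)."""
    feature_maps = np.maximum(conv2d_valid(image, kernels), 0.0)
    return feature_maps.mean(axis=(1, 2))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_embedding(frames, W, U, b):
    """LSTM branch over frames (T, D); returns the final hidden state (H,).

    W: (4H, D), U: (4H, H), b: (4H,); gates ordered input, forget, candidate, output.
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in frames:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def fused_predict(image, frames, params):
    """Concatenate both embeddings (late fusion) and classify into 10 digits."""
    img_emb = image_embedding(image, params["kernels"])
    aud_emb = lstm_embedding(frames, params["W"], params["U"], params["b"])
    fused = np.concatenate([img_emb, aud_emb])
    logits = params["W_out"] @ fused + params["b_out"]
    return softmax(logits)


params = {
    "kernels": rng.normal(0.0, 0.1, (N_FILTERS, 3, 3)),
    "W": rng.normal(0.0, 0.1, (4 * HIDDEN, N_MFCC)),
    "U": rng.normal(0.0, 0.1, (4 * HIDDEN, HIDDEN)),
    "b": np.zeros(4 * HIDDEN),
    "W_out": rng.normal(0.0, 0.1, (N_CLASSES, N_FILTERS + HIDDEN)),
    "b_out": np.zeros(N_CLASSES),
}

image = rng.random((28, 28))             # stand-in for a 28x28 MNIST digit image
frames = rng.normal(size=(20, N_MFCC))   # stand-in for 20 frames of 13 MFCCs
probs = fused_predict(image, frames, params)
print(probs.shape)  # (10,)
```

Concatenating the two embeddings is one simple fusion choice; a trained version of this idea would typically be built in a deep learning framework and fit end to end with a cross-entropy loss, so that both branches learn embeddings useful for the shared digit label.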

Repository files:

README.md
Vineet_Gandham_HW5.ipynb
Vineet_Gandham_HW5_report.pdf