Skip to content

vineetg3/multimodal-mnist-classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

This project focuses on predicting the handwritten digits shown on an image and the corresponding audio of the digit. The objective is to develop a model capable of accurately recognizing digits from both image and audio inputs. The model is trained on MNIST dataset containing roughly 60000 images and audio recordings. To achieve this, a convolutional neural networks (CNN) fusion model with long short-term memory(LSTM) layers was implemented. This hybrid architecture integrates CNNs for image processing with LSTM networks for sequential data analysis. The model achieved a remarkable test accuracy of 0.99. This project leveraged multiple techniques to generate embedding of a common task, which shows the potential and importance of multi-modal approach.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published