This project focuses on predicting the handwritten digits shown on an image and the corresponding audio of the digit. The objective is to develop a model capable of accurately recognizing digits from both image and audio inputs. The model is trained on MNIST dataset containing roughly 60000 images and audio recordings. To achieve this, a convolutional neural networks (CNN) fusion model with long short-term memory(LSTM) layers was implemented. This hybrid architecture integrates CNNs for image processing with LSTM networks for sequential data analysis. The model achieved a remarkable test accuracy of 0.99. This project leveraged multiple techniques to generate embedding of a common task, which shows the potential and importance of multi-modal approach.
-
Notifications
You must be signed in to change notification settings - Fork 0
vineetg3/multimodal-mnist-classifier
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published