This is our final project for Introduction to Information Technology.
Our team:
Download Miniconda at https://docs.conda.io/en/latest/miniconda.html
Download Visual Studio Code at https://code.visualstudio.com/Download/
pip install numpy
pip install matplotlib
pip install opencv-python
Download the MNIST dataset at http://yann.lecun.com/exdb/mnist/ and DO NOT UNZIP THE FILES.
The MNIST dataset contains 60,000 images used to recognise input numbers, called train, and 10,000 images used to check how well the algorithm performs, called test. Every image has a label corresponding to the number written in the image.
The 4 compressed MNIST files are in the data subfolder.
Run the test_MNIST.py file to make sure the MNIST dataset is successfully installed and set up.
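For reference, the compressed files can be read without unzipping them. The sketch below is not taken from test_MNIST.py; it is a minimal illustration that uses only the standard library and assumes the files are in the standard IDX format with unsigned-byte data. The function names `read_idx`, `load_images`, and `load_labels` are hypothetical.

```python
import gzip
import struct


def read_idx(path):
    """Read a gzip-compressed IDX file and return (dims, raw data bytes)."""
    with gzip.open(path, "rb") as f:
        # Header: two zero bytes, one data-type byte, one dimension-count byte.
        zeros, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zeros != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file: %s" % path)
        # Each dimension size is a 4-byte big-endian unsigned integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("unexpected payload size in %s" % path)
    return dims, data


def load_images(path):
    """Return a list of images, each a flat bytes object of rows*cols pixels."""
    (count, rows, cols), data = read_idx(path)
    size = rows * cols
    return [data[i * size:(i + 1) * size] for i in range(count)]


def load_labels(path):
    """Return the labels as a list of ints."""
    (count,), data = read_idx(path)
    return list(data)
```

Assuming the files keep their original names, `load_images("data/train-images-idx3-ubyte.gz")` and `load_labels("data/train-labels-idx1-ubyte.gz")` would load the train set. Each image comes back already flattened into a single vector of pixel values.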
- Step 1: Vectorize all the images of the train dataset and the input img.
- Step 2: Find the distance between the input img and each img in train.
- Step 3: Sort all the distances in increasing order.
- Step 4: Choose the k smallest values, called the k nearest neighbours (KNN). k can be 50, 100, 500, etc. You can choose any value for it.
- Step 5: Among the k labels, find the label with the highest frequency. That is the number this algorithm guesses.
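The five steps can be sketched in a few lines of NumPy. This is an illustration only, not the code in main.py: the function name `knn_guess` is hypothetical, and Euclidean distance is an assumption, since the steps do not say which distance is used.

```python
from collections import Counter

import numpy as np


def knn_guess(train_images, train_labels, input_img, k=50):
    """Guess the label of input_img by majority vote among its k nearest train images."""
    # Step 1: vectorize, so every image becomes one flat row of pixel values.
    train_vecs = np.asarray(train_images, dtype=np.float64).reshape(len(train_images), -1)
    input_vec = np.asarray(input_img, dtype=np.float64).reshape(-1)

    # Step 2: distance between the input and each train image
    # (ASSUMPTION: Euclidean distance).
    dists = np.linalg.norm(train_vecs - input_vec, axis=1)

    # Step 3: indices of the train images, sorted by increasing distance.
    order = np.argsort(dists, kind="stable")

    # Step 4: keep the k nearest neighbours.
    nearest = order[:k]

    # Step 5: the most frequent label among those neighbours is the guess.
    votes = Counter(int(train_labels[i]) for i in nearest)
    return votes.most_common(1)[0][0]
```

With the full train set, a call such as `knn_guess(train_images, train_labels, input_img, k=100)` returns a single digit. A larger `k` smooths out noisy neighbours but lets distant images vote as well, so it is worth trying a few values.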
Run the main.py file.
Run it with this command: python main.py
Use C++ code to increase speed.
Get the lib.hpp and lib.cpp files.
Run these commands (I use GNU GCC):
Or compile them with Visual Studio.
You now have a lib.so file. Keep this file and the main_optimize.py file in the same directory.
If you don't want to edit the library or you don't have a compiler, use my lib.so instead of building it yourself.
Run the main_optimize.py file instead of the main.py file.
The only difference between these files is that main.py runs the guess() function in Python, while in main_optimize.py the guess() function calls the guess_optimize() function written in C++ in lib.so.
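One common way for Python to call a function in a shared library is the standard ctypes module. The sketch below assumes that approach, which may differ from what main_optimize.py actually does. The argument list of guess_optimize() shown here is hypothetical, and the sketch assumes the function is exported with C linkage (extern "C"); check lib.hpp for the real signature.

```python
import ctypes
import os


def load_guess_optimize(lib_path="lib.so"):
    """Load the shared library and return its guess_optimize function."""
    # An absolute path makes the loader look at this exact file.
    lib = ctypes.CDLL(os.path.abspath(lib_path))
    fn = lib.guess_optimize
    # ASSUMPTION (hypothetical signature, not taken from lib.hpp):
    #   int guess_optimize(const unsigned char *input_img, int k);
    fn.argtypes = [ctypes.POINTER(ctypes.c_ubyte), ctypes.c_int]
    fn.restype = ctypes.c_int
    return fn


def guess(input_pixels, k, fn):
    """Pass a flat sequence of pixel values (0-255) to the C++ function."""
    # Copy the pixels into a C array of unsigned bytes.
    buf = (ctypes.c_ubyte * len(input_pixels))(*input_pixels)
    return fn(buf, k)
```

Under these assumptions, `guess(pixels, 100, load_guess_optimize())` would run the same algorithm as the Python version, with the distance loop executed in compiled code.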