sirajsandhu/ACA2017

ACA2017 PROJECT REPORT

This repository contains the code I studied and implemented for the semester project under the Association of Computing Activities, IIT Kanpur, in 2016-17.

Phase 1. COURSERA : Machine Learning by Andrew Ng ( https://www.coursera.org/learn/machine-learning )

Studied the following :

  • Linear and Logistic Regression using Gradient Descent
  • Feed Forward Neural Networks
  • Support Vector Machines (SVM)
  • K-Means Clustering and Principal Component Analysis (PCA)
  • Collaborative Filtering
  • Stochastic Gradient Descent, Map Reduce and Online Learning

Contains my notes and solutions to the assignment problems along with some references I found helpful.
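
As an illustration of the first topic in the list above, here is a minimal sketch of batch gradient descent for one-variable linear regression. It is not taken from the course assignments (those use Octave/MATLAB); the function name `fit_linear_gd`, the learning rate, the epoch count and the toy data are all assumptions chosen for the example.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current parameters
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the cost (1 / 2n) * sum(error^2)
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fitted slope and intercept should approach 2 and 1 respectively.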

Phase 2. LEARNING PYTHON :

The Python Tutorial : https://docs.python.org/2/tutorial/
Learnt basic Python syntax, data structures, functions and NumPy.

UDACITY Programming Foundations with Python
( https://in.udacity.com/course/programming-foundations-with-python--ud036 )
Learnt about modular programming, objects and classes, and libraries. Implemented a movie database of watched movies that shows each movie's synopsis, YouTube trailer and my rating on an HTML page.

Phase 3. FURTHER LEARNING :

Decision Tree : http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
Studied homogeneity criteria such as the Gini index, chi-squared score and log loss score; the node split threshold score; maximum depth and node termination.
Implemented a classifier (in Python) to check banknote authenticity based on numerical feature values.
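
The Gini index mentioned above can be sketched as follows. This is a simplified version that works on lists of class labels rather than full dataset rows; the function name `gini_index` and its argument layout are assumptions for this example, not necessarily the code in this repository.

```python
def gini_index(groups, classes):
    """Weighted Gini impurity of a candidate split.

    groups  -- list of child nodes, each a list of class labels
    classes -- all class labels that can occur
    Returns 0.0 for a perfectly pure split; 0.5 is the worst case
    for two classes.
    """
    total = sum(len(group) for group in groups)
    gini = 0.0
    for group in groups:
        if not group:
            continue  # an empty child contributes nothing
        # Sum of squared class proportions inside this child
        score = sum((group.count(c) / len(group)) ** 2 for c in classes)
        # Weight the child's impurity by its share of the samples
        gini += (1.0 - score) * (len(group) / total)
    return gini


pure_split = gini_index([[0, 0], [1, 1]], [0, 1])
mixed_split = gini_index([[0, 1], [0, 1]], [0, 1])
print(pure_split, mixed_split)
```

A tree builder would evaluate this score for every candidate threshold and keep the split with the lowest value.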

Naive Bayes Classifier : http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
Studied Bayes' Theorem and the Gaussian distribution.
Implemented a Gaussian Naive Bayes classifier (in Python) to diagnose diabetes from medical vitals. Extended it to use log probabilities and a multinomial distribution.
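
A minimal sketch of the log-probability variant, assuming the per-class feature means, standard deviations and priors have already been estimated. The names `gaussian_log_pdf` and `predict`, the dictionary layout and the toy numbers are hypothetical and chosen only for illustration.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the Gaussian density N(x; mean, std^2)."""
    return (-0.5 * math.log(2 * math.pi * std ** 2)
            - (x - mean) ** 2 / (2 * std ** 2))


def predict(summaries, priors, row):
    """Return the label with the highest log posterior (up to a constant).

    summaries -- {label: [(mean, std), ...]} with one pair per feature
    priors    -- {label: P(label)}
    row       -- feature values of the sample to classify
    """
    best_label, best_score = None, -math.inf
    for label, stats in summaries.items():
        # Summing logs avoids the underflow that multiplying many
        # small probabilities can cause
        score = math.log(priors[label])
        for x, (mean, std) in zip(row, stats):
            score += gaussian_log_pdf(x, mean, std)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy one-feature model (hypothetical numbers)
summaries = {0: [(1.0, 0.5)], 1: [(5.0, 0.5)]}
priors = {0: 0.5, 1: 0.5}
print(predict(summaries, priors, [1.2]), predict(summaries, priors, [4.7]))
```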

Scikit-Learn : http://scikit-learn.org/stable/tutorial/index.html
Completed tutorials on logistic regression, naive Bayes classifier and random forest classifier implementations.

Text Summarization :
Studied TF-IDF and Word2Vec for basic Natural Language Processing (NLP).
Implemented a complete RSS newsfeed summarizer using NLTK for the stopword corpus and sentence tokenization, Gensim for the Word2Vec implementation and NetworkX for the PageRank algorithm.
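
The ranking step of such a summarizer can be sketched without any libraries. The example below is a plain power-iteration PageRank over a sentence-similarity matrix; it stands in for the NetworkX call and is not the repository's code. The `pagerank` function, the damping factor and the similarity scores are assumptions made for illustration (in the real pipeline the scores would come from Word2Vec sentence vectors).

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted adjacency matrix.

    weights[j][i] is the edge weight from node j to node i.
    Assumes every node has at least one outgoing edge.
    """
    n = len(weights)
    ranks = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_ranks = []
        for i in range(n):
            # Rank flowing into node i, split in proportion to edge weight
            incoming = sum(
                weights[j][i] / out_sums[j] * ranks[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_ranks.append((1 - damping) / n + damping * incoming)
        ranks = new_ranks
    return ranks


# Hypothetical pairwise similarity between three sentences
similarity = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.7],
    [0.1, 0.7, 0.0],
]
ranks = pagerank(similarity)
best = max(range(len(ranks)), key=ranks.__getitem__)
print(best, ranks)
```

The highest-ranked sentences would then be emitted, in their original order, as the summary.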

Phase 4. NEURAL NETWORKS AND DEEP LEARNING, by MICHAEL NIELSEN
( http://neuralnetworksanddeeplearning.com/ )

In-depth study of Feed Forward Neural Networks from the online book, and implementation of a handwritten digit classifier in Python on the MNIST dataset. Introductory reading on Convolutional Neural Networks (CNN).
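
The forward pass of a feed-forward network with sigmoid neurons can be sketched in pure Python as below. The book's own code uses NumPy matrices; the list-based layout and the names `sigmoid` and `feedforward` here are assumptions made to keep the example dependency-free.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(layers, activation):
    """Propagate an input vector through the network.

    layers     -- list of (weights, biases) pairs, one per layer;
                  weights[j] holds the input weights of neuron j
    activation -- list of input values
    """
    for weights, biases in layers:
        # Each neuron outputs sigmoid(w . a + b)
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


# Tiny 2-2-1 network with all-zero parameters (illustrative only)
layers = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    ([[0.0, 0.0]], [0.0]),
]
output = feedforward(layers, [1.0, 2.0])
print(output)
```

For MNIST the input would be the 784 pixel values and the last layer would have 10 neurons, one per digit.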

Phase 5. KAGGLE :
Titanic Dataset : Completed an introductory tutorial on data visualisation, analysis and feature engineering. Implemented a binary classifier based on the same ( https://www.kaggle.com/c/titanic )

Bag of Words : Completed a tutorial on using Google's Word2Vec implementation for sentiment analysis of movie reviews and implemented a binary classifier based on the same ( https://www.kaggle.com/c/word2vec-nlp-tutorial )
