Part 1. My Portfolio
Click here to see my current portfolio (continuously updated).
Below, I document how I am teaching myself data science.
1. Learning Python (Link)
I've been using online resources to learn Python.
a. Udemy Online Course: Complete Python Bootcamp: Go from zero to hero in Python
- Review: This is my first Python course. When I took it, the course was taught in Python 2, but the instructor also provided Python 3 syntax wherever applicable. Since I wanted to learn Python 3, I converted all the assignments to Python 3 syntax. This is a very comprehensive course, covering everything from Python basics to methods, functions, modules, and packages. I learned a lot in this course. Still, the best way to master Python and become fluent in it is to practice day in and day out.
- Certificate
- Projects:
b. DataQuest: Python For Beginner
- Review: This is one of the courses in the DataQuest Data Science Path. I took it mainly to refresh my Python knowledge. It's very intuitive because you learn and write code at the same time.
- Certificate
- Project: Explore U.S. Births
2. Refreshing SQL Knowledge (Link)
I use SQL daily in my health analytics work, but all of it is done within SAS EG. I wanted to refresh my SQL knowledge, so I mostly practiced SQL online and took some SQL bootcamps. Most of the online resources are free, such as SQLZOO, Udacity's SQL course, etc.
a. The Complete SQL Bootcamp
- I also took a quick SQL bootcamp on Udemy. Since I use SQL daily, the course was not challenging for me, and I finished it in less than 2 days.
b. Udacity Nanodegree in Data Science Prerequisite: SQL
c. Practice SQL on SQLZOO
d. Stanford Online Lagunita course: Introduction to Databases
3. A/B Testing
When I searched data scientist job postings online, I found that many of them require A/B testing experience. Since I have a background in statistics and did some research on statistical experimental design back in college, I decided to refresh my knowledge of A/B testing and build a portfolio of A/B testing projects.
a. Udacity A/B Testing
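A typical A/B test compares a metric, such as a conversion rate, between a control group (A) and a treatment group (B). As a hedged illustration only (not material from the Udacity course), here is a minimal sketch of one common analysis, a two-sided two-proportion z-test. The function name `two_proportion_z_test` and the example counts are hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (illustrative sketch).

    conv_a, conv_b: number of conversions in the control (A) and treatment (B) groups
    n_a, n_b: number of users exposed to each variant
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that p_a == p_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200/2000 conversions in A vs. 250/2000 in B
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up counts, z comes out at roughly 2.5, so the difference would be significant at the usual 5% level. In practice, the sample size and significance level should be chosen before the experiment runs rather than after looking at the data.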
4. Machine Learning
Machine learning techniques are important and powerful tools in data science. To understand them and apply them in data analysis, we need not only a background in statistics and probability but also a fundamental understanding of how the algorithms work. I found the following material useful and started learning it at my own pace.
a. Statistical Learning by Stanford
This is an introductory-level course in supervised learning, with a focus on regression and classification methods.
The course covers linear and polynomial regression, logistic regression, and linear discriminant analysis; cross-validation and the bootstrap; model selection and regularization methods (ridge and lasso); nonlinear models, splines, and generalized additive models; tree-based methods, random forests, and boosting; and support-vector machines. Some unsupervised learning methods are also discussed: principal components and clustering (k-means and hierarchical).
The course also covers how to do all the computations in R, but I'm planning to redo them in Python. To be continued...
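As a small example of carrying the course topics over to Python, here is a hedged sketch of the bootstrap, which is one of the methods listed above. It uses only the standard library. The function name `bootstrap_se` and its defaults (the mean as the statistic, 1000 resamples, a fixed seed) are illustrative assumptions rather than anything taken from the course. The function estimates the standard error of a statistic by resampling the data with replacement and measuring how much the statistic varies across resamples.

```python
import random
import statistics


def bootstrap_se(data, statistic=statistics.mean, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw a bootstrap sample: n points chosen with replacement from the data
        sample = rng.choices(data, k=n)
        estimates.append(statistic(sample))
    # The spread of the bootstrap estimates approximates the standard error
    return statistics.stdev(estimates)


print(bootstrap_se(list(range(1, 101))))
```

For the integers 1 to 100, the result should land close to the textbook standard error of the mean, which is about 2.9 for that data. Passing a different `statistic`, such as `statistics.median`, gives a standard error for a statistic that has no simple closed-form formula, and this is where the bootstrap is most useful.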
5. Udacity Data Science Nanodegree
I enrolled in the first cohort of the Udacity Data Science Nanodegree program and successfully completed all the projects, which covered supervised learning, deep learning, and unsupervised learning. Some of the projects include:
1. Supervised Learning: Finding Donors
2. Deep Learning
- Using PyTorch for Deep Learning
- Image Classification
- Used Google Cloud to run the deep learning project at scale, since my computer doesn't have GPU support.
3. Unsupervised Learning: Identify Customer Segments
- Used k-means clustering to identify customer segments for a German company.
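To show how k-means works, here is a minimal pure-Python sketch of Lloyd's algorithm on 2-D points. It is a hedged illustration and not the customer-segmentation project's code. The function name `kmeans`, the random initialization from the data points, and the tiny dataset in the usage note are all assumptions made for this example. The algorithm alternates between two steps until the centroids stop moving. First, it assigns each point to its nearest centroid. Then it moves each centroid to the mean of the points assigned to it.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means on a list of (x, y) tuples (illustrative sketch).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments will no longer change
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` should put the first two points in one cluster and the last two in the other. On real customer data with many features, you would usually standardize the features first, because k-means relies on Euclidean distance.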
I also regularly experiment on Kaggle with data exploration and visualization exercises. Some examples are below: