Genetic Mutation Cancer Classification Project - Natural Language Processing and Machine Learning

Contributors: Katie Critelli, Patrick Masi-Phelps, Sam Trost, and Jing Wang

Blog post: https://blog.nycdatascience.com/student-works/redefining-cancer-treatment-predicting-gene-mutations-advance-personalized-medicine/

Welcome to the repository for the Cancer Genetic Analysis Team (CGAT) of NYC Data Science Academy!

We built a set of tools to help cancer researchers classify genetic mutation variations as drivers or passengers of tumor growth more quickly. Memorial Sloan Kettering Cancer Center ran a public competition soliciting machine learning models that take relevant medical research text and classify the associated genetic mutations into one of nine mutually exclusive classes, each of which contributes differently (or not at all) to tumor growth.

Our solutions include:

RShiny app with interactive visualizations, including distributions of genes, variations, and mutation characteristics broken down by class, as well as text features obtained from medical research papers used to manually classify mutations.
Machine learning classification models that use vectorized text features to classify genetic mutations into one of nine classes.
Selenium web scraping app that searches PubMed for new research papers related to genetic mutations and associated cancer risks.
Django app and database of research text that lets a user enter new medical text and (almost) immediately see the class (1 through 9) of the associated genetic mutation as it relates to cancer risk.
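
To make the "vectorized text features" idea concrete, here is a minimal sketch of TF-IDF vectorization followed by classification. It is not the code from the model_tfidf folder: it uses only the Python standard library, swaps in a simple nearest-centroid classifier for illustration, and the function names, training sentences, and class labels (1 and 2) are all hypothetical placeholders rather than competition data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(text, idf):
    """Return an L2-normalized sparse TF-IDF vector as {term: weight}."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each class into one centroid per class."""
    sums = defaultdict(lambda: defaultdict(float))
    class_sizes = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            sums[label][term] += weight
    return {
        label: {t: w / class_sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, idf, centroids):
    """Return the class whose centroid has the largest dot product with the text."""
    vec = tfidf(text, idf)

    def score(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: score(centroids[label]))


# Hypothetical toy data: the sentences and the labels 1 and 2 are made up.
TRAIN_TEXTS = [
    "the mutation results in loss of protein function",
    "truncating variant abolishes function of the tumor suppressor",
    "the substitution leads to gain of kinase activity",
    "activating variant increases kinase signaling",
]
TRAIN_LABELS = [1, 1, 2, 2]

IDF = fit_idf(TRAIN_TEXTS)
CENTROIDS = fit_centroids(TRAIN_TEXTS, TRAIN_LABELS, IDF)

if __name__ == "__main__":
    print(predict("variant causes loss of function", IDF, CENTROIDS))
```

With real data, the same two steps (fit the vocabulary weights on training text, then score new text against the fitted model) would apply to all nine classes, and a library vectorizer and classifier would normally replace these hand-written helpers.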

You can find a deck summarizing our project pipeline, including data analysis and visualizations (and RShiny app), machine learning models, findings, Django app, and areas for further improvement in the file "MSK Kaggle Project (1).pdf".

IPython notebooks and R files used in data preprocessing, visualization, and modeling can be found in the corresponding folders.

Link to associated Kaggle competition: https://www.kaggle.com/c/msk-redefining-cancer-treatment

Folders and files:

.ipynb_checkpoints
RShiny App
data_EDA_R
data_django
data_kaggle
data_selenium
model_bow
model_tfidf
models_doc2vec
submissions
.DS_Store
MSK Kaggle Project (1).pdf
README.md
ngram1.png
ngram2.png
