Project Description

This project serves as the hand-in for the final exam in 02476 Machine Learning Operations (January 2024). Our group (number 10) consists of the following members:

Laura Paz
Shah Bekhsh
Tomasz Truszkowski
Àiax Faura Vilalta
Diana Podoroghin

Inspiration & Data

As online news articles are quoted and shared by billions of people every day, the validity of these articles and the information they contain is continually called into question. The goal of this project is to create a text classifier that analyzes a given news article and identifies whether it is authentic or fake. The model is trained on a dataset obtained from Kaggle that contains the following information for each article:

The title of the news article
The text of the news article

There are 72,133 unique articles and one binary classification label column. Note: In the data available on Kaggle, 0=Real and 1=Fake.
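Because each article has both a title and a body, the two fields have to be turned into one model input, and the integer label has to be read with the Kaggle convention above. The snippet below is a minimal sketch of that step, not the project's actual preprocessing; the function names and the choice to join title and text with a blank line are hypothetical.

```python
from typing import Optional

# Label convention of the Kaggle data, as stated above: 0 = Real, 1 = Fake.
LABEL_NAMES = {0: "real", 1: "fake"}


def build_input_text(title: Optional[str], text: Optional[str]) -> str:
    """Join title and body into a single classifier input (hypothetical choice).

    Empty or missing fields are skipped so no stray separators are added.
    """
    parts = [part.strip() for part in (title, text) if part and part.strip()]
    return "\n\n".join(parts)


def decode_label(label: int) -> str:
    """Map the integer label from the dataset to a readable class name."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label {label!r}; expected 0 or 1")
    return LABEL_NAMES[label]
```

For instance, `build_input_text("Headline", "Body")` returns `"Headline\n\nBody"` and `decode_label(1)` returns `"fake"`.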

Framework & Pipeline

For this project, the base framework is PyTorch, which we will use to create, train and validate our model (the exact model architecture is still to be decided). We will also use an NLP transformer from Hugging Face (the exact transformer is still to be decided) for additional functionality and robustness on the language side. To make the project more streamlined and accessible, we will version-control our data with DVC and provide Docker images for training and prediction.
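Whichever model is eventually chosen, training and validation need a fixed, reproducible split of the articles. The sketch below is a framework-agnostic, standard-library illustration (not the project's code) of a seeded, stratified train/validation split that keeps the Real/Fake ratio roughly equal in both parts; the function name, the 10% default validation fraction and the default seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices), stratified by label.

    Hypothetical helper: each class is shuffled with a fixed seed and the first
    `val_fraction` of it goes to validation, so reruns give the same split.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")

    # Group article indices by their label (0 = Real, 1 = Fake).
    by_label: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    train: List[int] = []
    val: List[int] = []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)
```

The resulting index lists can then be wrapped around whatever dataset class the PyTorch model ends up using, for example with `torch.utils.data.Subset(dataset, indices)`.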

Project Structure

The directory structure of the project looks like this:

├── Makefile             <- Makefile with convenience commands like `make data` or `make train`
├── README.md            <- The top-level README for developers using this project.
├── data
│   ├── processed        <- The final, canonical data sets for modeling.
│   └── raw              <- The original, immutable data dump.
│
├── docs                 <- Documentation folder
│   │
│   ├── index.md         <- Homepage for your documentation
│   │
│   ├── mkdocs.yml       <- Configuration file for mkdocs
│   │
│   └── source/          <- Source directory for documentation files
│
├── models               <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks            <- Jupyter notebooks.
│
├── pyproject.toml       <- Project configuration file
│
├── reports              <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures          <- Generated graphics and figures to be used in reporting
│
├── requirements.txt     <- The requirements file for reproducing the analysis environment
│
├── requirements_dev.txt <- The requirements file for reproducing the development environment
│
├── tests                <- Test files
│
├── src                  <- Source code for use in this project.
│   │
│   ├── __init__.py      <- Makes folder a Python module
│   │
│   ├── data             <- Scripts to download or generate data
│   │   ├── __init__.py
│   │   └── make_dataset.py
│   │
│   ├── models           <- Model implementations
│   │   ├── __init__.py
│   │   └── model.py
│   │
│   ├── visualization    <- Scripts to create exploratory and results oriented visualizations
│   │   ├── __init__.py
│   │   └── visualize.py
│   ├── train_model.py   <- Script for training the model
│   └── predict_model.py <- Script for making predictions with a trained model
│
└── LICENSE              <- Open-source license if one is chosen

Created using mlops_template, a cookiecutter template for getting started with Machine Learning Operations (MLOps).
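The tree places the data-preparation code in `src/data/make_dataset.py`, with `data/raw` holding the original dump and `data/processed` the final data used for modeling. A minimal standard-library sketch of such a step is shown below; it is not the project's actual script, and the column names (`title`, `text`, `label`) and the cleaning rules are assumptions.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Iterator, Set, Tuple

FIELDS = ["title", "text", "label"]  # assumed column names of the raw CSV


def clean_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield cleaned rows, dropping unusable ones and exact duplicates.

    Hypothetical rules: rows with empty text or a label other than "0"/"1"
    are skipped, and a (title, text) pair is kept only the first time it appears.
    """
    seen: Set[Tuple[str, str]] = set()
    for row in rows:
        title = (row.get("title") or "").strip()
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        if not text or label not in {"0", "1"}:
            continue  # unusable row
        if (title, text) in seen:
            continue  # duplicate article
        seen.add((title, text))
        yield {"title": title, "text": text, "label": label}


def make_dataset(raw_csv: Path, processed_csv: Path) -> int:
    """Read the raw CSV, write the cleaned CSV, and return the number of rows written."""
    processed_csv.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with raw_csv.open(newline="", encoding="utf-8") as src, processed_csv.open(
        "w", newline="", encoding="utf-8"
    ) as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for row in clean_rows(csv.DictReader(src)):
            writer.writerow(row)
            written += 1
    return written
```

A call such as `make_dataset(Path("data/raw/news.csv"), Path("data/processed/news_clean.csv"))` would then populate `data/processed`; both file names are placeholders. Keeping the cleaning rules in `clean_rows` separate from the file handling makes them easy to unit test in `tests/`.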

