Pluvius is a simple rain prediction app built for studying machine learning engineering.
The project is still in development; upcoming updates will cover the following tasks:
- train the model and save it in ONNX format
- create a Dockerfile for the featurizer
- create utilities for downloading the INMET dataset
- create a Haskell worker pipeline for preprocessing (enrichment stage)
- create a Haskell worker pipeline for preprocessing (normalizing and vectorizing stage)
- create an evaluation API for prediction
Before you start, make sure you have the following installed on your machine:
- the most recent version of Haskell and cabal (you can install both with GHCup)
- the most recent version of Python and pip
- Docker and Docker Compose
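A quick way to confirm the prerequisites above is to check that each tool is on your PATH. The script below is a minimal sketch, not part of the Pluvius repository; the file name and the executable names (ghc, cabal, python3, pip, docker) are assumptions you may need to adjust. It only checks that the tools exist, not that they are the most recent versions.

```python
# check_prereqs.py: hypothetical helper (not part of the Pluvius repo) that
# reports which required command-line tools are missing from PATH.
import shutil
import subprocess

# Assumed executable names; adjust if your system uses "python" or "pip3".
REQUIRED_TOOLS = ["ghc", "cabal", "python3", "pip", "docker"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which() cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_compose_plugin():
    """Return True if `docker compose version` exits successfully."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "compose", "version"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_tools()
    if not has_compose_plugin():
        missing.append("docker compose")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run it with `python3 check_prereqs.py`; an empty "Missing" list means every tool was found.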
To install Pluvius, follow these steps:
- clone the repository
$ git clone https://github.com/roqueando/pluvius.git
- build the feature-extractor
$ make build/feature-extractor
- create a virtualenv for the Python components
$ make setup/python
To use Pluvius you need the INMET dataset (for now I'm only using Brazilian data). Fetch it with the download script:
$ make download/dataset
This downloads the dataset, extracts the zip file, and merges its contents into a single CSV for later use.
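The extract-and-merge step can be sketched with the Python standard library. This is an illustration under stated assumptions, not the repository's download script: the function name, the `;` delimiter, the latin-1 input encoding, and the `skip_lines` parameter (for files that begin with metadata lines before the header) are all assumptions to adjust to the actual files.

```python
# merge_csvs.py: hypothetical sketch of the "extract and merge" step.
# Assumptions: every CSV in the archive has the same header row after
# `skip_lines` preamble lines, uses `delimiter`, and is `encoding`-encoded.
import csv
import io
import zipfile


def merge_zip_csvs(zip_path, out_path, delimiter=";", encoding="latin-1", skip_lines=0):
    """Concatenate every .csv in zip_path into out_path, writing the header once.

    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with zipfile.ZipFile(zip_path) as archive, \
            open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter=delimiter)
        # Sort member names so the merge order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # ignore non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                for _ in range(skip_lines):
                    text.readline()  # drop preamble lines before the header
                reader = csv.reader(text, delimiter=delimiter)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{name}: header differs from the first file")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Checking that every file shares the first file's header keeps a schema change between years from silently producing misaligned columns in the merged CSV.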
Make sure to import this CSV into a MongoDB database; I named the collection raw.
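One way to load the merged CSV into the raw collection is to insert its rows in batches. The sketch below assumes pymongo is installed and MongoDB is running locally; the database name `pluvius`, the file path `data/raw/merged.csv`, and the `;` delimiter are hypothetical. Only the collection name raw comes from this README. Values are stored as strings, leaving type conversion to preprocessing.

```python
# import_raw.py: hypothetical sketch for loading the merged CSV into the
# "raw" collection. Assumes pymongo is installed and MongoDB runs locally.
import csv
from itertools import islice


def iter_batches(rows, size):
    """Yield lists of up to `size` items from the iterable `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_csv(collection, csv_path, delimiter=";", batch_size=1000):
    """Insert each CSV row as a document (header names become field names).

    `collection` is anything with an insert_many(list_of_dicts) method,
    e.g. a pymongo Collection. Returns the number of inserted documents.
    """
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for batch in iter_batches(reader, batch_size):
            collection.insert_many(batch)
            total += len(batch)
    return total


if __name__ == "__main__":
    from pymongo import MongoClient  # assumed dependency

    client = MongoClient("mongodb://localhost:27017")
    # Database name and file path are hypothetical; "raw" is from the README.
    count = import_csv(client["pluvius"]["raw"], "data/raw/merged.csv")
    print(f"Inserted {count} documents")
```

Batching keeps memory bounded for large CSVs, and accepting any object with `insert_many` makes the function easy to test without a running database.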
Next, the whole dataset must be preprocessed into a form the model can use, so run the dockerized preprocessing app:
$ make run/feature-extractor file=./data/raw/2019.csv
Make sure you have run the make download/dataset step first so that the merged raw file is in the correct directory.
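To catch a missing input file before Docker starts, you could wrap the make target in a small pre-flight check. This is a hypothetical sketch, not part of the repository; the script name and functions are illustrative, and it only calls the `make run/feature-extractor file=...` command already shown in this README.

```python
# run_featurizer.py: hypothetical wrapper that checks the raw CSV exists
# before invoking the feature-extractor make target.
import subprocess
import sys
from pathlib import Path


def build_command(csv_path):
    """Return the make invocation for the feature extractor."""
    return ["make", "run/feature-extractor", f"file={csv_path}"]


def run_feature_extractor(csv_path):
    """Fail fast if the input is missing, otherwise run make (raises on failure)."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run `make download/dataset` first")
    return subprocess.run(build_command(csv_path), check=True)


if __name__ == "__main__":
    run_feature_extractor(sys.argv[1] if len(sys.argv) > 1 else "./data/raw/2019.csv")
```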
Want to be part of this project? Click HERE to read how to contribute.
This project is under the license. See the LICENSE file for more details.