Environmental Code Challenge

Jul 18th, 2018

Classification and prediction project

Created by

Duong Vu

Notebook • Main Features • Usage • Dependencies • Folder Structure

Overview

Task 1 - Classify wind turbine failure

Classify whether a turbine will break down within the next 40 days.

predictive_maintenance_dataset.csv is a file that contains parameters and settings for many wind turbines:

operational_setting_1
operational_setting_2
sensor_measurement_1
sensor_measurement_2 ...

There is a column called unit_number which specifies which turbine it is, and one called status, in which a value of 1 means the turbine broke down that day, and 0 means it didn't.

The task is to create a model that, when fed with operational settings and sensor measurements (unit_number and time_stamp will not be fed in), outputs 1 if the turbine will break down within the next 40 days, and 0 if not.
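The target described above is not stored in the file directly; it has to be derived from the status column. The following is a minimal sketch of one way to do that, not necessarily the approach used in the notebook. It assumes one row per turbine per day, that "the next 40 days" means the 40 rows after the current one, and that time_stamp sorts chronologically. The function names and the label column are hypothetical.

```python
import numpy as np
import pandas as pd


def label_within_horizon(status, horizon=40):
    """Return 1 for each row whose next `horizon` rows contain a failure."""
    values = status.to_numpy()
    labels = np.zeros(len(values), dtype=int)
    for i in range(len(values)):
        # Look only at the rows strictly after row i (assumption: 1 row = 1 day).
        labels[i] = int(values[i + 1 : i + 1 + horizon].any())
    return pd.Series(labels, index=status.index)


def add_failure_label(df, horizon=40):
    """Add a hypothetical `label` column, computed separately for each turbine."""
    out = df.sort_values(["unit_number", "time_stamp"]).copy()
    out["label"] = out.groupby("unit_number")["status"].transform(
        lambda s: label_within_horizon(s, horizon)
    )
    return out
```

Computing the label per unit_number keeps a failure of one turbine from leaking into the rows of another.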

For a closer look at the process, please review the Jupyter Notebook.

Task 2 - Predict city pollution

Predict the pollution value after 6 hours.

forecasting_dataset.csv is a file that contains pollution data for a city. The task is to create a model that, when fed with columns co_gt, nhmc, c6h6, s2, nox, s3, no2, s4, s5, t, rh, ah, and level, predicts the value of y six hours later.
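Predicting y six hours ahead means pairing the features at hour t with the value of y at hour t + 6. A minimal sketch of that pairing follows; it assumes the rows are consecutive hourly records in chronological order, which the source does not state explicitly. The function name make_supervised and its parameters are hypothetical.

```python
import pandas as pd

# Feature columns as listed in the task description.
FEATURES = ["co_gt", "nhmc", "c6h6", "s2", "nox", "s3", "no2",
            "s4", "s5", "t", "rh", "ah", "level"]


def make_supervised(df, feature_cols=FEATURES, target_col="y", horizon=6):
    """Pair the features at row t with the target at row t + horizon."""
    # Assumption: one row per hour, so shifting by 6 rows equals 6 hours.
    target = df[target_col].shift(-horizon)
    keep = target.notna()  # the last `horizon` rows have no future target
    return df.loc[keep, feature_cols], target[keep]
```

The last six rows are dropped because their target lies beyond the end of the data.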

For a closer look at the process, please review the Jupyter Notebook.

Notebook

A writeup explaining the design decisions, potential future work, and the reasons behind the current choices: Notebook

To view the log files, open TensorBoard by running the command below in your terminal from the directory that contains the logs folder:

tensorboard --logdir=logs

Main Features

For both tasks, the model is a pipeline rather than a single estimator.

Task 1 pipeline contains:

Create dummy variables from the categorical variables and drop one level
Select only the features that appear in the training set
Impute missing values with the mean
Feed-forward neural network built with Keras
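The Task 1 steps above can be sketched as a scikit-learn Pipeline. This is an illustration under stated assumptions, not the code in src: the two transformer classes and build_task1_pipeline are hypothetical, and scikit-learn's MLPClassifier stands in for the Keras network so that the example runs without Keras. Only the step name 'model' is taken from the Usage section.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


class DummyEncoder(BaseEstimator, TransformerMixin):
    """One-hot encode categorical columns, dropping one level of each."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.get_dummies(X, drop_first=True, dtype=float)


class TrainingColumnSelector(BaseEstimator, TransformerMixin):
    """Keep only the columns seen during fit, in the same order."""

    def fit(self, X, y=None):
        self.columns_ = list(X.columns)
        return self

    def transform(self, X):
        # Columns absent at predict time become NaN and are imputed next.
        return X.reindex(columns=self.columns_)


def build_task1_pipeline():
    return Pipeline([
        ("dummies", DummyEncoder()),
        ("select", TrainingColumnSelector()),
        ("impute", SimpleImputer(strategy="mean")),
        # Stand-in for the Keras feed-forward network (assumption).
        ("model", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                random_state=0)),
    ])
```

Placing the column selector after the dummy encoding means that new data with unseen columns or levels is still reduced to the training layout before it reaches the network.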

Task 2 pipeline contains:

Select only the features that appear in the training set
Create dummy variables from the categorical variables
Impute missing values with the mean
Feed-forward neural network built with Keras

Usage

Download the saved model (a pickle file) from the results folder.

For task 1:

Load the pipeline first.
Then load the Keras model into the pipeline (use Keras 1.2 to load the model).

# Imports needed by this snippet
# (older scikit-learn releases shipped joblib as sklearn.externals.joblib)
import joblib
import numpy as np
import pandas as pd
from keras.models import load_model
from sklearn.metrics import mean_squared_error
from src.task1.reform_results import hard_label

# Load the pipeline first:
pl_load_in = joblib.load('../../results/task1_pipeline.pkl')

# Then, load the Keras model into the pipeline's 'model' step:
pl_load_in.named_steps['model'].model = load_model('../../results/task1_keras_model.h5')

# Test the model: compute and print the MSE on the validation set
# (Xval and yval are your validation features and labels)
ypred = pl_load_in.predict(Xval)
mse = mean_squared_error(yval, ypred)
print("Mean squared error: %f" % mse)

# Reset the index for comparison (can be omitted if yval already has a clean index)
yval2 = yval.reset_index(drop=True)

# Assign hard labels (hard_label() is defined in src.task1.reform_results)
new_ypred = pd.DataFrame(ypred)[0].apply(hard_label)

# Compute and print the validation accuracy
accuracy = float(np.sum(new_ypred == yval2)) / yval2.shape[0]
print("accuracy: {}%".format(round(accuracy * 100, 3)))
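The hard_label() helper itself is not shown in this README. As a rough illustration of what the last two steps do, the sketch below turns continuous scores into 0/1 labels and compares them with the true labels. The 0.5 threshold and both function names are assumptions made for this example; the actual rule lives in src.task1.reform_results and may differ.

```python
import numpy as np
import pandas as pd


def hard_label_sketch(score, threshold=0.5):
    """Hypothetical stand-in for hard_label(): 1 if score >= threshold, else 0."""
    return int(score >= threshold)


def accuracy_from_scores(scores, y_true, threshold=0.5):
    """Share of rows whose thresholded score equals the true label."""
    # Flatten (n, 1) network outputs into a plain Series with a clean index.
    labels = pd.Series(np.ravel(scores)).apply(
        lambda s: hard_label_sketch(s, threshold)
    )
    # Align by position, not by the original index of y_true.
    y_true = pd.Series(y_true).reset_index(drop=True)
    return float((labels == y_true).mean())
```

Resetting the index before the comparison matters because pandas compares Series by index label, and a validation split usually keeps the row labels of the original data.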

For task 2:

Load the pipeline. The model is already included in it.

import pickle
from sklearn.metrics import r2_score

# Load the model from disk
filename = 'results/task2_model.pkl'  # path to the pickled model
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

# Test the model (Xtest and ytest are your test features and targets)
ypred = loaded_model.predict(Xtest)
print("R squared score is:", round(r2_score(ytest, ypred), 3))

Dependencies

numpy
pandas
missingno
imbalanced-learn
scikit-learn
statsmodels
keras (2.0 for modelling; 1.2 is enough if you only need to load and use the model)
matplotlib
seaborn
scikitplot

Folder Structure

The repository is organized as follows:

     .
     |-- README
     |-- LICENSE
     |-- .gitignore
     |-- data
     |   -- predictive_maintenance_dataset.csv
     |   -- forecasting_dataset.csv
     |-- doc
     |   -- notebook.md         # electronic lab notebook
     |   -- manuscript.md
     |-- results                # stores all the resulting models
     |-- src                    # source code used for both tasks
     |   -- task1               # code specific to task 1
     |   -- task2               # code specific to task 2
     |-- tests                  # tests for functions
     |-- assets                 # stores images
     |-- bin                    # files kept in case they are needed later
