Skip to content

DuongVu39/Code_Challenge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Environmental Code Challenge

Jul 18th, 2018

Classification and prediction project

Created by

License Python version

Notebook  •  Main Features  •  Usage  • Dependencies • Folder Structure   


Overview

Task 1 - Classify wind turbine failure


Classify if the turbine will break down within the next 40 days


predictive_maintenance_dataset.csv is a file that contains parameters and settings for many wind turbines:

  • operational_setting_1
  • operational_setting_2
  • sensor_measurement_1
  • sensor_measurement_2 ...

There is a column called unit_number which specifies which turbine it is, and one called status, in which a value of 1 means the turbine broke down that day, and 0 means it didn't.

The task is to create a model that, when fed with operational settings and sensor measurements (unit_number and time_stamp will not be fed in), outputs 1 if the turbine will break down within the next 40 days, and 0 if not.

For a closer look at the process, please review the Jupyter Notebook

Task 2 - Predict city pollution


Predict the pollution value after 6 hours.

forecasting_dataset.csv is a file that contains pollution data for a city. The task is to create a model that, when fed with columns co_gt, nhmc, c6h6, s2, nox, s3, no2, s4, s5, t, rh, ah, and level, predicts the value of y six hours later.

For a closer look at the process, please review the Jupyter Notebook

Notebook

A writeup explaining design decisions, potential works and the reasons for making current choices: Notebook

For log files, open using TensorBoard by typing below command in your terminal in where the log folder is:

tensorboard --logdir=logs

Main Features

The model is actually a pipeline for both tasks.

Task 1 pipeline contains:

  • Get dummies from categorical variable and drop 1 level
  • Select only features appears in training set
  • Impute with the mean
  • Feed Forward Neural Network with Keras

Task 2 pipeline contains:

  • Select only features appears in training set
  • Get dummies from categorical variable
  • Impute with the mean
  • Feed Forward Neural Network with Keras

Usage

Download the model saved in pickle file in Result folder.

For task 1:

  • Load in the pipeline first
  • Then load the keras model in the pipeline. (use Keras 1.2 to load the model)
# Load the pipeline first:
pl_load_in = joblib.load('../../results/task1_pipeline.pkl')

# Then, load the Keras model:
pl_load_in.named_steps['model'].model = load_model('../../results/task1_keras_model.h5')

# Test the model:
# Compute and print MSE for validation
ypred = pl_load_in.predict(Xval)
mse = mean_squared_error(yval, ypred)
print("Mean squared error: %f" % (mse))

# reset index for comparison (if yval already have clean index, this step can be omitted)
yval2 = yval.reset_index(drop=True)

# assign hard label (function hard_label() is in src.task1.reform_results)
new_ypred=pd.DataFrame(ypred)[0].apply(hard_label)

# Compute the accuracy: accuracy for validation
accuracy = float(np.sum(new_ypred==yval2))/yval2.shape[0]
print("accuracy: {}%".format(round(accuracy*100, 3)))

For task 2:

Load in the pipeline. The model is included in the pipeline.

# load the model from disk
filename = 'results/task2_model.pkl' # path leads to pickle model
loaded_model = pickle.load(open(filename, 'rb'))

# Test the model
ypred = loaded_model.predict(Xtest)
print("R squared score is:", r2_score(ytest,ypred).round(3))

Dependencies

  • numpy
  • pandas
  • missingno
  • imbalanced-learn
  • sklearn
  • statsmodels
  • keras 2.0 for modelling, 1.2 if just need to load model and use it.
  • matplotlib
  • seaborn
  • scikitplot

Folder Structure

The hierarchy of this repository is described like below:

     .
     |-- README 
     |-- LICENSE
     |-- .gitignore.py        
     |-- data
     |   -- predictive_maintenance_dataset.csv
     |   -- forecasting_dataset.csv
     |-- doc 
     |   -- notebook.md         # electronic lab notebook
     |   -- manuscript.md       
     |-- results		# storing all the result models 
     |-- src                    # source code used for both tasks
     |   -- task1               # code specific for task 1
     |   -- task2               # code specific for task 2
     |-- test			# tests for functions
     |-- assets                 # store images
     |-- bin
     |   -- # keep all the files I want to delete but not sure whether I will need it later

About

Deep Learning Code Challenge

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published