limcheekin/r-flight-delay-prediction

Predict Flight Delay using R

Use the Machine Learning Workflow to process and transform DOT data to create a prediction model. The model must predict, with at least 70% accuracy, whether a flight will arrive 15 or more minutes after its scheduled arrival time.
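To make the 70% target concrete, accuracy can be read as the share of flights whose predicted delay flag matches the actual one. The following is a minimal sketch in base R with made-up values; `actual`, `predicted`, and `meetsTarget` are hypothetical names used only for illustration, not part of the project.

```r
# Hypothetical delay flags: 1 = arrived 15+ minutes late, 0 = otherwise
actual    <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0)

# Accuracy is the fraction of flights where the prediction equals the outcome
accuracy <- mean(actual == predicted)

# Compare against the 70% target stated above
meetsTarget <- accuracy >= 0.7

print(accuracy)
print(meetsTarget)
```

Accuracy alone can look good when most flights are on time, so a per-class breakdown such as `table(predicted, actual)` may be worth checking alongside it.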

Download data

Download the January 2015 raw data as a CSV file from the following URL: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time

The downloaded data is stored in raw_201501.csv.
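Loading the file into R might look like the sketch below. It writes a tiny stand-in CSV to a temporary path so that it runs on its own; the rows are invented, only a handful of columns are included, and `csvPath` is a hypothetical name. The `onTimeData` name matches the data frame used in the training steps of this README.

```r
# Hypothetical stand-in for raw_201501.csv with a few of the columns used later.
# The real download contains many more columns and rows.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "DAY_OF_WEEK,CARRIER,ORIGIN,DEST,DEP_TIME_BLK,ARR_DEL15",
  "4,AA,JFK,LAX,0900-0959,0.00",
  "4,DL,ATL,ORD,1700-1759,1.00",
  "5,AA,LAX,JFK,1200-1259,"
), csvPath)

# Keep text columns as character so they can be converted deliberately later
onTimeData <- read.csv(csvPath, header = TRUE, stringsAsFactors = FALSE)

# Quick look at the size and column types
print(nrow(onTimeData))
str(onTimeData)

# Rows with no recorded arrival outcome show up as NA in a numeric column
print(sum(is.na(onTimeData$ARR_DEL15)))
```

For the real data, `csvPath` would be replaced by the path to raw_201501.csv. Rows with a missing `ARR_DEL15` have no outcome to learn from and are usually handled during data preparation.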

Preparing data

Find out more from Steps of Data Preparation and its corresponding source file.

Selecting algorithm

Logistic Regression was selected as the initial algorithm because it is simple (easy to understand), fast (up to 100x faster), and stable to data changes.

Next, the model was switched to Random Forest to improve the prediction results.

Train model

  • Train the model using the caret package

    install.packages('caret')
    library(caret)
  • Set the seed so that random numbers are generated in the same sequence and training yields the same results on every run

    set.seed(12345)
  • Retrieve feature columns only

    featureColumns <- c('ARR_DEL15', 'DAY_OF_WEEK', 'CARRIER', 'DEST', 'ORIGIN', 'DEP_TIME_BLK')
    onTimeDataFiltered <- onTimeData[,featureColumns]
  • Retrieve 70% of data for training

    trainRows <- createDataPartition(onTimeDataFiltered$ARR_DEL15, p=0.7, list=FALSE)
    head(trainRows, 10)
    trainData <- onTimeDataFiltered[trainRows,]
  • Retrieve the remaining 30% of data for testing

    testData <- onTimeDataFiltered[-trainRows,]
  • Verify the split proportions (the two values should be close to 0.7 and 0.3)

    nrow(trainData)/(nrow(trainData) + nrow(testData))
    nrow(testData)/(nrow(trainData) + nrow(testData))
  • Train the model with the training data (the call currently stops with the error shown after it)

    logisticRegModel <- train(ARR_DEL15 ~ ., data=trainData, method="glm", family="binomial")
    Error in train.default(x, y, weights = w, ...) : 
    One or more factor levels in the outcome has no data: ''
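
The error message suggests that ARR_DEL15 is a factor that still carries an empty-string level with no rows behind it, which can happen when rows with a blank outcome are filtered out but the level itself is kept. The sketch below reproduces that situation on synthetic data and shows one possible fix with `droplevels`. It is an assumption about the cause, not a confirmed diagnosis; the data values and the `cleaned` and `model` names are hypothetical, and base R `glm` is used in place of caret's `train` so that the example runs without extra packages.

```r
set.seed(12345)

# Synthetic stand-in for onTimeDataFiltered (invented values, not DOT data).
# ARR_DEL15 deliberately keeps an unused "" level to mimic the error condition.
n <- 200
onTimeDataFiltered <- data.frame(
  ARR_DEL15 = factor(
    sample(c("0.00", "1.00"), n, replace = TRUE, prob = c(0.75, 0.25)),
    levels = c("", "0.00", "1.00")
  ),
  DAY_OF_WEEK = factor(sample(1:7, n, replace = TRUE)),
  CARRIER = factor(sample(c("AA", "DL", "UA"), n, replace = TRUE))
)

# The empty level is present even though no row uses it
print(table(onTimeDataFiltered$ARR_DEL15))

# Remove any rows with a blank outcome, then drop the unused level
cleaned <- onTimeDataFiltered[onTimeDataFiltered$ARR_DEL15 != "", ]
cleaned$ARR_DEL15 <- droplevels(cleaned$ARR_DEL15)
print(levels(cleaned$ARR_DEL15))

# With a clean two-level outcome, a logistic regression can be fitted
model <- glm(ARR_DEL15 ~ ., data = cleaned, family = binomial)
print(class(model))
```

If this is indeed the cause, applying the same cleanup to `onTimeDataFiltered` before calling `createDataPartition` would give `train` a two-level outcome to work with.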
    
