An R package to make imputation simple. Currently supported methods include:
- Model based (optionally add [non-]parametric random residual)
    - linear regression
    - robust linear regression (M-estimation)
    - ridge/elastic net/lasso regression (version >= 0.2.1)
    - CART models
    - Random forest
- Model based, multivariate
    - Imputation based on EM-estimated parameters (version >= 0.2.1)
    - missForest (version >= 0.2.1)
- Donor imputation (including various donor pool specifications)
    - k-nearest neighbour (based on Gower's distance)
    - sequential hotdeck (LOCF, NOCB)
    - random hotdeck
    - Predictive mean matching
- Other
    - (groupwise) median imputation (optional random residual)
    - Proxy imputation (copy from other variable)
To install simputation together with all packages needed to support the various imputation models, run the following.
install.packages("simputation", dependencies=TRUE)
Create some data with missing values.
library(simputation) # current package
library(magrittr) # for the %>% not-a-pipe operator
dat <- iris
# empty a few fields
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA
head(dat,10)
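To confirm which fields were emptied, the missing values can be counted per column. This is a minimal base R sketch that needs no extra packages; the name `n_missing` is only illustrative.

```r
# Recreate the example data so this snippet runs on its own
dat <- iris
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
n_missing <- colSums(is.na(dat))
print(n_missing)
# Sepal.Length: 3, Sepal.Width: 5, Species: 3, other columns: 0
```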
Now impute Sepal.Length and Sepal.Width by regression on Petal.Length and Species, and impute Species using a CART model that uses all other variables (including the imputed variables in this case).
dat %>%
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) %>%
  impute_cart(Species ~ .) %>% # use all variables except 'Species' as predictors
  head(10)
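For intuition, the idea behind regression imputation can be sketched in base R: fit a model on the rows where the target is observed, then fill the missing rows with the model's predictions. This is an illustration under simplifying assumptions (one target variable, fully observed predictors), not simputation's actual implementation.

```r
# Example data with a few missing values in one variable only
dat <- iris
dat[1:3, 1] <- NA

# lm() drops rows with a missing response by default (na.action = na.omit)
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = dat)

# Replace only the missing entries with predicted values
miss <- is.na(dat$Sepal.Length)
dat$Sepal.Length[miss] <- predict(fit, newdata = dat[miss, ])

head(dat, 5)
```

Observed values are left untouched; only the three emptied fields are overwritten.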
Beta versions can be installed from my drat repo. If you use the OS whose name shall not be spoken, first install Rtools.
if(!require(drat)) install.packages("drat")
drat::addRepo("markvanderloo")
install.packages("simputation",type="source")