% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mlim.error.R
\name{mlim.error}
\alias{mlim.error}
\title{Imputation error}
\usage{
mlim.error(
imputed,
incomplete,
complete,
transform = NULL,
varwise = FALSE,
ignore.missclass = TRUE,
ignore.rank = FALSE
)
}
\arguments{
\item{imputed}{the imputed dataframe}
\item{incomplete}{the dataframe with missing values}
\item{complete}{the original dataframe with no missing values}
\item{transform}{character. it can be either "standardize", which standardizes the
numeric variables before evaluating the imputation error, or
"normalize", which rescales continuous variables to
range from 0 to 1. the default is NULL.}
\item{varwise}{logical, default is FALSE. if TRUE, the algorithm's
performance for each variable (column) of the dataset is
returned in addition to the mean accuracy for each variable
type, and the result is a list instead of a numeric vector.}
\item{ignore.missclass}{logical. the default is TRUE. if FALSE, the overall
misclassification rate for imputed unordered factors will be
returned. in general, the misclassification rate is not recommended,
particularly for multinomial factors, because it is not robust
to imbalanced data. in other words, an imputation might show
a very high accuracy because it is biased towards the majority
class and ignores the minority levels. to avoid this problem,
Mean Per Class Error (MPCE) is returned, which is the average
misclassification rate of each class and is thus a fairer
criterion for evaluating multinomial classes.}
\item{ignore.rank}{logical (default is FALSE, which is recommended). if TRUE,
the accuracy of imputation of ordered factors (ordinal variables)
will be evaluated based on the misclassification rate instead of
normalized euclidean distance. this practice is not recommended
because, despite the popularity of evaluating ordinal variables by
misclassification rate, a higher classification accuracy does not
guarantee smaller distances between the actual and imputed levels.
for example, assume an ordinal variable has 5 levels (1. strongly
disagree, 2. disagree, 3. uncertain, 4. agree, 5. strongly agree).
if "ignore.rank = TRUE", then an imputation that imputes level
"5" as "4" is considered as inaccurate as one that imputes level "5"
as "1". therefore, if you have ordinal variables in your dataset, make sure
you declare them as "ordered" factors to get the best imputation accuracy.}
}
\value{
a numeric vector. if "varwise = TRUE", a list is returned instead.
}
\description{
calculates NRMSE, misclassification rate, and the mean absolute
mis-ranking distance. the mis-ranking distance is scaled between
0 and 1, where 1 means maximum distance between the actual rank
of a level and the imputed level.
}
\examples{
\dontrun{
data(iris)
# add 10\% missing values, ensure missingness is stratified for factors
irisNA <- mlim.na(iris, p = 0.1, stratify = TRUE, seed = 2022)
# run the default imputation
MLIM <- mlim(irisNA)
mlim.error(MLIM, irisNA, iris)
# get error estimations for each variable
mlim.error(MLIM, irisNA, iris, varwise = TRUE)
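# the calls below only vary arguments documented above (a sketch, results not shown)
# standardize the numeric variables before evaluating the error
mlim.error(MLIM, irisNA, iris, transform = "standardize")
# request the overall misclassification rate for unordered factors
mlim.error(MLIM, irisNA, iris, ignore.missclass = FALSE)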
}
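# the base R sketch below is illustrative only and is not the internal code
# of the package. 'actual' and 'imputed_class' are hypothetical toy vectors
# showing why overall misclassification can look good on imbalanced data.
lv <- c("a", "b")
actual <- factor(c(rep("a", 9), "b"), levels = lv)
imputed_class <- factor(rep("a", 10), levels = lv)
# overall misclassification rate: share of wrong imputations (0.1 here)
mean(imputed_class != actual)
# mean per class error: error rate within each actual class, then averaged
# (class "a" is 0, class "b" is 1, so the average is 0.5)
mean(tapply(imputed_class != actual, actual, mean))

# for ordered factors, a rank-based distance separates near and far misses.
# 'rank_dist' is a hypothetical helper and assumes the distance is the
# absolute rank difference divided by (number of levels - 1)
ranks <- c("strongly disagree", "disagree", "uncertain", "agree", "strongly agree")
truth <- factor("strongly agree", levels = ranks, ordered = TRUE)
near <- factor("agree", levels = ranks, ordered = TRUE)
far <- factor("strongly disagree", levels = ranks, ordered = TRUE)
rank_dist <- function(imp, act) {
  mean(abs(as.integer(imp) - as.integer(act))) / (nlevels(act) - 1)
}
# both imputations are misclassified, so both misclassification rates are 1
c(near = mean(near != truth), far = mean(far != truth))
# the rank distance is 0.25 for the near miss and 1 for the far miss
c(near = rank_dist(near, truth), far = rank_dist(far, truth))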
}
\author{
E. F. Haghish
}