# Avoid magical defaults {#def-magical}
```{r, include = FALSE, cache = FALSE}
source("common.R")
source("fun_def.R")
```
```{r, eval = FALSE, include = FALSE}
funs <- c(pkg_funs("base"), pkg_funs("stats"), pkg_funs("utils"))
funs %>% funs_formals_keep(~ is_symbol(.x) && !is_missing(.x))
pkg_funs("base") %>% funs_body_keep(has_call, "missing")
```
## What's the problem?
If a function behaves differently when the default value is supplied explicitly, we say it has a __magical default__. Magical defaults are best avoided because they make it harder to interpret the function specification.
## What are some examples?
* In `data.frame()`, the default argument for `row.names` is `NULL`, but
if you supply it directly you get a different result:
```{r}
fun_call(data.frame)
x <- setNames(nm = letters[1:3])
data.frame(x)
data.frame(x, row.names = NULL)
```
* In `hist()`, the default value of `xlim` is `range(breaks)`, and the
default value for `breaks` is `"Sturges"`. `range("Sturges")` returns
`c("Sturges", "Sturges")` which doesn't work when supplied explicitly:
```{r, error = TRUE, fig.show = "hide"}
fun_call(hist.default)
hist(1:10, xlim = c("Sturges", "Sturges"))
```
* In `Vectorize()`, the default argument for `vectorize.args` is `arg.names`,
but this variable is defined inside of `Vectorize()`, so if you supply it
explicitly you get an error.
```{r, error = TRUE}
fun_call(Vectorize)
Vectorize(rep.int, vectorize.args = arg.names)
```
* In `rbeta()`, the default value of `ncp` is 0, but if you explicitly supply
it the function uses a different algorithm:
```{r}
rbeta
```
* In `table()`, the default value of `dnn` is `list.names(...)`, but
`list.names()` is only defined inside of `table()`, so you can't supply the
default explicitly.
* `readr::read_csv()` has `progress = show_progress()`, but until version
1.3.1, `show_progress()` was not exported from the package. That means if you
attempted to run it yourself, you'd see an error message:
```{r, error = TRUE}
show_progress()
```
* In `usethis::use_rmarkdown_template()`, `template_dir` has the default value
of `tolower(asciify(template_name))`, but `asciify` is not exported. That
means there's no way to interactively explore this default value.
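Two of the base R cases above can be checked without the helper functions used in this chapter. The following is a minimal sketch that assumes only base R and stats; the variable names are illustrative.

```{r}
# table(): the default names come from list.names(...), which only exists
# inside table(). The default works, but writing it out fails because the
# caller's expression is evaluated in the caller's environment.
x <- c("a", "b", "a")
names(dimnames(table(x)))
dnn_error <- tryCatch(
  table(x, dnn = list.names(x)),
  error = function(e) conditionMessage(e)
)
dnn_error

# rbeta(): supplying the documented default, ncp = 0, takes the other
# branch of the function, so the same seed no longer gives the same draws.
set.seed(1)
implicit <- rbeta(3, 1, 1)
set.seed(1)
explicit <- rbeta(3, 1, 1, ncp = 0)
identical(implicit, explicit)
```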
## What are the exceptions?
It's ok to use this behaviour when you want the default value of one argument to be the same as another. For example, take `rlang::set_names()`, which allows you to create a named vector from two inputs:
```{r}
fun_call(set_names)
set_names(1:3, letters[1:3])
```
The default value for the names is the vector itself. This provides a convenient shortcut for naming a vector with itself:
```{r}
set_names(letters[1:3])
```
You can see this same technique in `merge()`, where `all.x` and `all.y` default to the same value as `all`, and in `factor()` where `labels` defaults to the same value as `levels`.
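If you want to confirm these defaults yourself, `formals()` returns them as unevaluated expressions. This is a small sketch using base R only; note that `merge()` is generic, so the defaults live on the data frame method.

```{r}
# The default for `labels` is the symbol `levels`, not a value.
labels_default <- formals(factor)$labels
labels_default

# `all.x` and `all.y` both default to the symbol `all`.
merge_defaults <- formals(merge.data.frame)[c("all.x", "all.y")]
merge_defaults
```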
If you use this technique, make sure that you never use the value of an argument that comes later in the argument list. For example, in `file.copy()`, `overwrite` defaults to the same value as `recursive`, but the `recursive` argument is defined after `overwrite`:
```{r}
fun_call(file.copy)
```
This makes the default arguments harder to understand because you can't just read them from left to right.
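One way to keep such defaults readable is to declare the argument being borrowed from before the argument that borrows it. The sketch below uses a hypothetical `copy_plan()` function, which is not part of any package and copies nothing; it only illustrates the ordering.

```{r}
# Hypothetical function: `recursive` is declared first, so the default
# `overwrite = recursive` only refers to an argument already seen.
copy_plan <- function(from, to, recursive = FALSE, overwrite = recursive) {
  list(from = from, to = to, recursive = recursive, overwrite = overwrite)
}

copy_plan("a", "b")$overwrite
copy_plan("a", "b", recursive = TRUE)$overwrite
copy_plan("a", "b", recursive = TRUE, overwrite = FALSE)$overwrite
```

Reordering the arguments of an existing function changes how positional arguments are matched, so this ordering is easiest to get right when the function is first written.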
## What causes the problem?
There are three primary causes:
* Overuse of lazy evaluation of default values, which are evaluated in the
environment of the function, as described in
[Advanced R](https://adv-r.hadley.nz/functions.html#default-arguments).
Here's a simple example:
```{r}
f1 <- function(x = y) {
  y <- trunc(Sys.time(), units = "months")
  x
}
y <- 1
f1()
f1(y)
```
When `x` takes the value `y` from its default, it's evaluated inside the
function, yielding the start of the current month. When `y` is supplied
explicitly, it is evaluated in the caller environment, yielding `1`.
* Use of `missing()` so that the default value is never consulted:
```{r}
f2 <- function(x = 1) {
  if (missing(x)) {
    2
  } else {
    x
  }
}
f2()
f2(1)
```
* In packages, it's easy to use a non-exported function without thinking
about it. This function is available to you, the package author, but not
the user of the package, which makes it harder for them to understand
how a package works.
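The third cause can be imitated without building a package. In this sketch a closure stands in for a package namespace: `make_fun()`, `helper()`, and `f3()` are hypothetical names, and `helper()` plays the role of an unexported function.

```{r}
make_fun <- function() {
  # Visible to the returned function, but not to its callers.
  helper <- function(x) toupper(x)
  function(name, label = helper(name)) label
}
f3 <- make_fun()

# The default is evaluated inside the function, where helper() exists.
f3("a")

# Written out by the caller, the same expression fails.
f3_error <- tryCatch(
  f3("a", label = helper("a")),
  error = function(e) conditionMessage(e)
)
f3_error
```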
## How do I remediate the problem?
This problem is generally easy to avoid for new functions:
* Don't use default values that depend on variables defined inside the function.
* Don't use `missing()`[^missing-exceptions].
* Don't use unexported functions.
[^missing-exceptions]: The only exceptions are described in Sections \@ref(args-mutually-exclusive) and \@ref(args-compound).
If you have made a mistake in an older function, you can remediate it by using a `NULL` default, as described in Chapter \@ref(def-short). If the problem is caused by an unexported function, you can also choose to document and export it.
```{r}
`%||%` <- function(x, y) if (is.null(x)) y else x
f1_better <- function(x = NULL) {
  # Same computation as f1(); trunc() on a date-time has no "weeks" unit
  y <- trunc(Sys.time(), units = "months")
  x <- x %||% y
  x
}
f2_better <- function(x = NULL) {
  x <- x %||% 2
  x
}
```
This modification should not break existing code, because it expands the function interface: all previous code will continue to work, and the function will also work if the argument is passed `NULL` (which it probably didn't previously).
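Here is a quick check of that claim for `f2()`, as a sketch. Both definitions are repeated in compact form so the chunk stands alone, and `%||%` is defined locally.

```{r}
`%||%` <- function(x, y) if (is.null(x)) y else x

f2 <- function(x = 1) {
  if (missing(x)) 2 else x
}
f2_better <- function(x = NULL) {
  x %||% 2
}

# Existing calls give the same results.
identical(f2(), f2_better())
identical(f2(1), f2_better(1))

# The default can now be supplied explicitly without changing the result.
identical(f2_better(), f2_better(NULL))
```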
For functions like `data.frame()` where `NULL` is already a permissible value, you'll need to use a sentinel object, as described in Section \@ref(args-default-sentinel).
```{r}
sentinel <- function() structure(list(), class = "sentinel")
is_sentinel <- function(x) inherits(x, "sentinel")
data.frame_better <- function(..., row.names = sentinel()) {
  if (is_sentinel(row.names)) {
    # old default behaviour
  }
}
```
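To make the dispatch concrete, here is a minimal sketch with a hypothetical `row_names_mode()` function. It does not reproduce what `data.frame()` does; it only shows how a sentinel lets a function tell "argument not supplied" apart from an explicit `NULL`.

```{r}
sentinel <- function() structure(list(), class = "sentinel")
is_sentinel <- function(x) inherits(x, "sentinel")

# Hypothetical function: reports which branch a real implementation
# would take for each kind of input.
row_names_mode <- function(row.names = sentinel()) {
  if (is_sentinel(row.names)) {
    "default"
  } else if (is.null(row.names)) {
    "explicit NULL"
  } else {
    "user supplied"
  }
}

row_names_mode()
row_names_mode(NULL)
row_names_mode(c("a", "b"))

# Supplying the default explicitly gives the same answer as omitting it.
identical(row_names_mode(), row_names_mode(sentinel()))
```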