-
Notifications
You must be signed in to change notification settings - Fork 48
/
Copy pathch15.Rmd
302 lines (220 loc) · 8.09 KB
/
ch15.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
---
title: "Ch15"
output:
pdf_document: default
html_document:
df_print: paged
---
```{r}
library(lubridate)
library(tidyverse)
```
19.2.1 Practice
Why is TRUE not a parameter to rescale01()? What would happen if x contained a single missing value, and na.rm was FALSE?
```{r}
rescale01 <- function(x) {
rng <- range(x, na.rm = FALSE)
(x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(-1, 0, 5, 20, NA))
```
Everything is NA! We should set an argument to control the `TRUE` or `FALSE` of `range`
In the second variant of rescale01(), infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0, and Inf is mapped to 1.
```{r}
rescale01 <- function(x) {
rng <- range(x, na.rm = TRUE, finite = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
x[x == Inf] <- 1
x[x == -Inf] <- 0
x
}
rescale01(c(Inf, -Inf, 0:5, NA))
```
Practice turning the following code snippets into functions. Think about what each function does. What would you call it? How many arguments does it need? Can you rewrite it to be more expressive or less duplicative?
```{r}
x <- 1:10
mean(is.na(x))
x / sum(x, na.rm = TRUE)
sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
```
Implementation
```{r}
prop_miss <- function(x) {
mean(is.na(x))
}
my_mean <- function(x) {
x / sum(x, na.rm = TRUE)
}
my_var <- function(x) {
sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}
# Follow http://nicercode.github.io/intro/writing-functions.html to write your own functions to compute the variance and skew of a numeric vector.
```
Write both_na(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.
```{r}
both_na <- function(x, y) {
stopifnot(length(x) == length(y))
which(is.na(x) & is.na(y))
}
both_na(c(1, 2, NA, 2, NA), c(1, 2, 3, 4, NA))
```
What do the following functions do? Why are they useful even though they are so short?
```{r}
# Checks whether the path is a directory.
is_directory <- function(x) file.info(x)$isdir
# Checks whether a file is readable.
is_readable <- function(x) file.access(x, 4) == 0
```
Read the complete lyrics to “Little Bunny Foo Foo”. There’s a lot of duplication in this song. Extend the initial piping example to recreate the complete song, and use functions to reduce the duplication.
## Exercises 19.3.1
Read the source code for each of the following three functions, puzzle out what they do, and then brainstorm better names.
```{r}
f1 <- function(string, prefix) {
substr(string, 1, nchar(prefix)) == prefix
}
f2 <- function(x) {
if (length(x) <= 1) return(NULL)
x[-length(x)]
}
f3 <- function(x, y) {
rep(y, length.out = length(x))
}
```
```{r}
# It tests whether the prefix argument is the same prefix
# as in the string
has_prefix <- function(string, prefix) {
substr(string, 1, nchar(prefix)) == prefix
}
has_prefix(c("hey_ho", "another_pre"), "hey")
# It removes the last element of x, only of the length greater than 1
# otherwise returns NULL
remove_last <- function(x) {
if (length(x) <= 1) return(NULL)
x[-length(x)]
}
remove_last(c(1, 2, 3))
convert_length <- function(x, y) {
rep(y, length.out = length(x))
}
convert_length(1:10, 1:3)
```
Take a function that you’ve written recently and spend 5 minutes brainstorming a better name for it and its arguments.
-
Compare and contrast rnorm() and MASS::mvrnorm(). How could you make them more consistent?
You could create one function that accepts the same first three arguments and then a logical stating whether you want it to be uni or multivariate.
Make a case for why norm_r(), norm_d() etc would be better than rnorm(), dnorm(). Make a case for the opposite.
Taken from: https://jrnold.github.io/r4ds-exercise-solutions/functions.html#when-should-you-write-a-function
If named norm_r and norm_d, it groups the family of functions related to the normal distribution. If named rnorm, and dnorm, functions related to are grouped into families by the action they perform. r* functions always sample from distributions: rnorm, rbinom, runif, rexp. d* functions calculate the probability density or mass of a distribution: dnorm, dbinom, dunif, dexp.
## Exercises 19.4.4
What’s the difference between if and ifelse()? Carefully read the help and construct three examples that illustrate the key differences.
`ifelse` tests the conditions in a vectorizes way, meaning it returns a vector that can be of length 1 or > 1. `if` tests only one logical statement for an object of length 1.
Write a greeting function that says “good morning”, “good afternoon”, or “good evening”, depending on the time of day. (Hint: use a time argument that defaults to lubridate::now(). That will make it easier to test your function.)
```{r}
greeter <- function(now = now()) {
if (between(hour(now), 8, 13)) {
print("Good morning")
} else if (between(hour(now), 13, 18)) {
print("Good afternoon")
} else {
print("Good evening")
}
}
greeter(now())
```
Implement a fizzbuzz function. It takes a single number as input. If the number is divisible by three, it returns “fizz”. If it’s divisible by five it returns “buzz”. If it’s divisible by three and five, it returns “fizzbuzz”. Otherwise, it returns the number. Make sure you first write working code before you create the function.
```{r}
x <- 9
fizzbuzz <- function(x) {
by_three <- x %% 3 == 0
by_five <- x %% 5 == 0
if (by_three && by_five) {
return("fizzbuzz")
} else if (by_three) {
return("fizz")
} else if (by_five) {
return("buzz")
} else {
return(x)
}
}
sapply(1:20, fizzbuzz)
```
How could you use cut() to simplify this set of nested if-else statements?
```{r}
temp <- c(27, 30)
if (temp <= 0) {
"freezing"
} else if (temp <= 10) {
"cold"
} else if (temp <= 20) {
"cool"
} else if (temp <= 30) {
"warm"
} else {
"hot"
}
cut(temp, breaks = seq(-10, 40, 10),
labels = c("freezing", "cold", "cool", "warm", "hot"))
```
How would you change the call to cut() if I’d used < instead of <=? What is the other chief advantage of cut() for this problem? (Hint: what happens if you have many values in temp?)
First, that it's vectorized, so I can recode a vector of values with the single function and second that it controls the closing points of what to recode such as:
```{r}
cut(temp, breaks = seq(-10, 40, 10),
right = FALSE,
labels = c("freezing", "cold", "cool", "warm", "hot"))
```
What happens if you use switch() with numeric values?
```{r, eval = FALSE}
x = 2
switch(x, 1 = "No", 2 = "Yes")
switch(x, `1` = "No", `2` = "Yes")
```
What does this switch() call do? What happens if x is “e”?
```{r}
x <- "d"
switch(x,
a = ,
b = "ab",
c = ,
d = "cd"
)
```
When a slot is empty it finds the next slot that has values.
```{r}
x <- "a"
switch(x,
a = ,
z = ,
b = "ab",
c = ,
d = "cd"
)
```
## Exercises 19.5.5
What does commas(letters, collapse = "-") do? Why?
```{r}
commas <- function(..., collapse = ",") {
str_c(..., collapse = collapse)
}
commas(letters, collapse = "-")
```
You get them `-` separated!
It’d be nice if you could supply multiple characters to the pad argument, e.g. rule("Title", pad = "-+"). Why doesn’t this currently work? How could you fix it?
```{r}
rule <- function(..., pad = "-") {
title <- paste0(...)
width <- getOption("width") - nchar(title) - 5
pad_char <- nchar(pad)
cat(title, " ", stringr::str_dup(pad, width / pad_char), "\n", sep = "")
}
rule("my title", pad = "-+")
```
What does the trim argument to mean() do? When might you use it?
```{r}
mean(c(99, 1:10))
mean(c(99, 1:10), trim = 0.1)
```
In this case, it trimes `10%` of the the vector on both sides.
The default value for the method argument to cor() is c("pearson", "kendall", "spearman"). What does that mean? What value is used by default?
The first value is used by default and you have the option of specifying any of the three values.