Commit

Fixes from Roberto
hadley committed Jun 27, 2016
1 parent 98c14b8 commit 539e6d9
Showing 7 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions data-structures.Rmd
@@ -5,7 +5,7 @@ library(purrr)
library(dplyr)
```

So far this book has focussed on data frames and packages that work with them. But as you start to write your own functions, and dig deeper into R, you need to learn about vectors, the objects that underpin data frames. If you've learned R in a more traditional way, you're probably familiar with vectors already, as most R resource start with vectors and work their way up to data frames. I think it's better to start with data frames because they're immediately useful, and then work your way down to the underlying components.
So far this book has focussed on data frames and packages that work with them. But as you start to write your own functions, and dig deeper into R, you need to learn about vectors, the objects that underpin data frames. If you've learned R in a more traditional way, you're probably already familiar with vectors, as most R resources start with vectors and work their way up to data frames. I think it's better to start with data frames because they're immediately useful, and then work your way down to the underlying components.

Vectors are particularly important because it's easier to learn to write functions that work with vectors, rather than data frames. The technology that lets ggplot2, tidyr, dplyr etc work with data frames is considerably more complex and not currently standardised. While I'm currently working on a new standard that will make life much easier, it's unlikely to be ready in time for this book.

@@ -211,7 +211,7 @@ if (length(x)) {
}
```

In this case, 0 is converted to `FALSE` and everything else is converted to `TRUE`. I think this makes it harder to understand your code, and I recommend it.
In this case, 0 is converted to `FALSE` and everything else is converted to `TRUE`. I think this makes it harder to understand your code, and I don't recommend it.

It's also important to understand what happens when you try and create a vector containing multiple types with `c()`: the most complex type always wins.
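
As an illustration of the coercion rule described above (not part of this commit), here is a minimal sketch. It assumes the usual ordering logical < integer < double < character; the variable names are made up for the example.

```r
# typeof() reports the type of the vector that c() produces
# after coercing its arguments to a common type.
lgl_int <- typeof(c(TRUE, 1L))   # logical + integer   -> "integer"
int_dbl <- typeof(c(1L, 1.5))    # integer + double    -> "double"
dbl_chr <- typeof(c(1.5, "a"))   # double  + character -> "character"

print(c(lgl_int, int_dbl, dbl_chr))
```

In each pair the more complex type wins, which is why mixing a single string into a numeric vector turns every element into a string.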

4 changes: 2 additions & 2 deletions functions.Rmd
@@ -305,7 +305,7 @@ if (NA) {}

You can use `||` (or) and `&&` (and) to combine multiple logical expressions. These operators are "short-circuiting": as soon as `||` sees the first `TRUE` it returns `TRUE` without computing anything else. As soon as `&&` sees the first `FALSE` it returns `FALSE`. You should never use `|` or `&` in an `if` statement: these are vectorised operations that apply to multiple values (that's why you use them in `filter()`). If you do have a logical vector, you can use `any()` or `all()` to collapse it to a single value.
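
To make the short-circuiting behaviour concrete (an illustration only, not part of this commit), the sketch below puts a call to `stop()` on the right-hand side; it would raise an error if it were ever evaluated. The variable names are hypothetical.

```r
# `&&` returns FALSE as soon as it sees FALSE, so stop() never runs.
and_result <- FALSE && stop("never evaluated")

# `||` returns TRUE as soon as it sees TRUE, so stop() never runs.
or_result <- TRUE || stop("never evaluated")

# For a logical vector, collapse to a single value before using it in `if`.
v <- c(-1, 2, 3)
any_positive <- any(v > 0)
all_positive <- all(v > 0)

if (any_positive && !all_positive) {
  message("some, but not all, values are positive")
}
```

Using `any()` or `all()` first means the condition passed to `if` is always a single `TRUE` or `FALSE`.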

Be careful when testing for equality. `==` is vectorised, which means that it's easy to get more than one output. Either check the the length is already 1, collapsed with `all()` or `any()`, or use the non-vectorised `identical()`. `identical()` is very strict: it always returns either a single `TRUE` or a single `FALSE`, and doesn't coerce types. This means that you need to be careful when comparing integers and doubles:
Be careful when testing for equality. `==` is vectorised, which means that it's easy to get more than one output. Either check the length is already 1, collapse it with `all()` or `any()`, or use the non-vectorised `identical()`. `identical()` is very strict: it always returns either a single `TRUE` or a single `FALSE`, and doesn't coerce types. This means that you need to be careful when comparing integers and doubles:

```{r}
identical(0L, 0)
@@ -641,7 +641,7 @@ complicated_function <- function(x, y, z) {
```

Another reason is becuase you have a `if` statement with one complex block and one simple block. For example, you might write an if statement like this:
Another reason is because you have an `if` statement with one complex block and one simple block. For example, you might write an if statement like this:

```{r, eval = FALSE}
f <- function() {
7 changes: 3 additions & 4 deletions iteration.Rmd
@@ -2,6 +2,7 @@

```{r setup, include=FALSE}
library(purrr)
library(stringr)
```

In [functions], we talked about how important it is to reduce duplication in your code. Reducing code duplication has three main benefits:
@@ -104,8 +105,6 @@ Every for loop has three components:
the work. It's run repeatedly, each time with a different value for `i`.
The first iteration will run `output[[1]] <- median(df[[1]])`,
the second will run `output[[2]] <- median(df[[2]])`, and so on.
If you haven't seen `x[[i]]` before, it extracts the `i`th element from
`x`. You'll learn more about it in [subsetting].
That's all there is to the for loop! Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below. Then we'll move on to some variations of the for loop that help you solve other problems that will crop up in practice.
@@ -127,7 +126,7 @@ That's all there is to the for loop! Now is a good time to practice creating som
```{r}
out <- ""
for (x in letters) {
out <- paste0(out, x)
out <- str_c(out, x)
}
x <- sample(100)
@@ -843,7 +842,7 @@ library(ggplot2)
plots <- mtcars %>%
split(.$cyl) %>%
map(~ggplot(., aes(mpg, wt)) + geom_point())
paths <- paste0(names(plots), ".pdf")
paths <- str_c(names(plots), ".pdf")
pwalk(list(paths, plots), ggsave, path = tempdir())
```
2 changes: 1 addition & 1 deletion model-many.Rmd
@@ -433,7 +433,7 @@ The advantage of this structure is that it generalises in a straightforward way
Now if you want to iterate over names and values in parallel, you can use `map2()`:

```{r}
df %>% mutate(smry = map2_chr(name, value, ~ paste0(.x, ": ", .y[1])))
df %>% mutate(smry = map2_chr(name, value, ~ stringr::str_c(.x, ": ", .y[1])))
```

9 changes: 5 additions & 4 deletions relational-data.Rmd
@@ -4,6 +4,7 @@
library(dplyr)
library(nycflights13)
library(ggplot2)
library(stringr)
```

It's rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you're interested in. Collectively, multiple tables of data are called __relational data__ because it is the relations, not just the individual datasets, that are particularly important.
@@ -261,8 +262,8 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
and a foreign key in `x`.
```{r}
x <- data_frame(key = c(1, 2, 2, 1), val_x = paste0("x", 1:4))
y <- data_frame(key = 1:2, val_y = paste0("y", 1:2))
x <- data_frame(key = c(1, 2, 2, 1), val_x = str_c("x", 1:4))
y <- data_frame(key = 1:2, val_y = str_c("y", 1:2))
left_join(x, y, by = "key")
```
@@ -275,8 +276,8 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
```
```{r}
x <- data_frame(key = c(1, 2, 2, 3), val_x = paste0("x", 1:4))
y <- data_frame(key = c(1, 2, 2, 3), val_y = paste0("y", 1:4))
x <- data_frame(key = c(1, 2, 2, 3), val_x = str_c("x", 1:4))
y <- data_frame(key = c(1, 2, 2, 3), val_y = str_c("y", 1:4))
left_join(x, y, by = "key")
```
2 changes: 1 addition & 1 deletion strings.Rmd
@@ -288,7 +288,7 @@ There are number of other special patterns that match more than one character:
* `\d`: any digit.
* `\s`: any whitespace (space, tab, newline).
* `[abc]`: match a, b, or c.
* `[!abc]`: match anything except a, b, or c.
* `[^abc]`: match anything except a, b, or c.
Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
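
The escaping rule and the corrected `[^abc]` pattern can be checked with a short sketch (an illustration only, not part of this commit). Base R's `grepl()` is used here to keep the example dependency-free; stringr's `str_detect()` should accept the same patterns. The variable names are hypothetical.

```r
# The string "\\d" holds two characters: a backslash and the letter d.
pattern_length <- nchar("\\d")
writeLines("\\d")   # shows the regular expression the engine actually sees

# \d matches any digit.
has_digit <- grepl("\\d", c("a1", "ab"), perl = TRUE)

# [^abc] matches any character other than a, b, or c.
has_other <- grepl("[^abc]", c("abc", "abd"), perl = TRUE)

print(has_digit)
print(has_other)
```

The first element of `has_other` is `FALSE` because `"abc"` contains nothing outside the set, while `"abd"` contains a `d`.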
2 changes: 1 addition & 1 deletion tidy.Rmd
@@ -13,7 +13,7 @@ Note that this chapter explains how to change the format, or layout, of tabular

In *Section 4.1*, you will learn how the features of R determine the best way to layout your data. This section introduces "tidy data," a way to organize your data that works particularly well with R.

*Section 4.2* teaches the basic method for making untidy data tidy. In this section, you will learn how to reorganize the values in your data set with the the `spread()` and `gather()` functions of the `tidyr` package.
*Section 4.2* teaches the basic method for making untidy data tidy. In this section, you will learn how to reorganize the values in your data set with the `spread()` and `gather()` functions of the `tidyr` package.

*Section 4.3* explains how to split apart and combine values in your data set to make them easier to access with R.
