Commit

More @jennybc comments
hadley committed Oct 7, 2016
1 parent daaa861 commit 6afdb03
Showing 4 changed files with 14 additions and 10 deletions.
4 changes: 2 additions & 2 deletions import.Rmd
@@ -285,14 +285,14 @@ Encodings are a rich and complex topic, and I've only scratched the surface here

### Factors {#readr-factors}

-R uses factors to represent categorical variables that have a known set of possible values. Given `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
+R uses factors to represent categorical variables that have a known set of possible values. Give `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:

```{r}
fruit <- c("apple", "banana")
parse_factor(c("apple", "banana", "bananana"), levels = fruit)
```

-If you have problematic entries, it's often easier to read in as strings and then use the tools you'll learn about in [strings] and [factors] to clean them up.
+But if you have many problematic entries, it's often easier to leave them as character vectors and then use the tools you'll learn about in [strings] and [factors] to clean them up.

### Dates, date-times, and times {#readr-datetimes}

8 changes: 4 additions & 4 deletions relational-data.Rmd
@@ -90,10 +90,6 @@ For nycflights13:
it contained weather records for all airports in the USA, what additional
relation would it define with `flights`?

-1. You might expect that there's an implicit relationship between plane
-and airline, because each plane is flown by a single airline. Confirm
-or reject this hypothesis using data.
-
1. We know that some days of the year are "special", and fewer people than
usual fly on them. How might you represent that data as a data frame?
What would be the primary keys of that table? How would it connect to the
@@ -531,6 +527,10 @@ flights %>%
1. What does `anti_join(flights, airports, by = c("dest" = "faa"))` tell you?
What does `anti_join(airports, flights, by = c("faa" = "dest"))` tell you?

+1. You might expect that there's an implicit relationship between plane
+and airline, because each plane is flown by a single airline. Confirm
+or reject this hypothesis using the tools you've learned above.
+
## Join problems

The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.
3 changes: 3 additions & 0 deletions tibble.Rmd
@@ -158,6 +158,9 @@ The main reason that some older functions don't work with tibble is the `[` func
df[, c("abc", "xyz")]
```
+1. If you have the name of a variable stored in an object, e.g. `var <- "mpg"`,
+how can you extract the reference variable from a tibble?
1. Practice referring to non-syntactic names in the following data frame by:
1. Extracting the variable called `1`.
9 changes: 5 additions & 4 deletions tidy.Rmd
@@ -340,7 +340,8 @@ table5 %>%
do? Why would you set it to `FALSE`?
1. Compare and contrast `separate()` and `extract()`. Why are there
-three variations of separation, but only one unite?
+three variations of separation (by position, by separator, and with
+groups), but only one unite?
## Missing values
@@ -441,7 +442,7 @@ The best place to start is almost always to gather together the columns that are
in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
these are likely to be values, not variables.

-So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
+So we need to gather together all the columns from `new_sp_m014` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.

```{r}
who1 <- who %>%
@@ -539,10 +540,10 @@ who %>%
missing values? What's the difference between an `NA` and zero?

1. What happens if you neglect the `mutate()` step?
+(`mutate(key = stringr::str_replace(key, "newrel", "new_rel"))`)

1. I claimed that `iso2` and `iso3` were redundant with `country`.
-Confirm my claim by creating a table that uniquely maps from `country`
-to `iso2` and `iso3`.
+Confirm this claim.

1. For each country, year, and sex compute the total number of cases of
TB. Make an informative visualisation of the data.