Merge branch 'master' of github.com:hadley/r4ds
hadley committed Jan 21, 2016
2 parents b359976 + de38ea8 commit 43fcab6
Showing 1 changed file with 4 additions and 4 deletions.
relational-data.Rmd: 8 changes (4 additions, 4 deletions)
@@ -316,7 +316,7 @@ So far, the pairs of tables have always been joined by a single variable, and th
a suffix.
* A named character vector: `by = c("a" = "b")`. This will
- match variable `a` in table `x` to variable `y` in table `b`. The
+ match variable `a` in table `x` to variable `b` in table `y`. The
variables from `x` will be used in the output.
For example, if we want to draw a map we need to combine the flights data
@@ -429,7 +429,7 @@ Graphically, a semi-join looks like this:
knitr::include_graphics("diagrams/join-semi.png")
```

- Only the existence of a match is important; it doesn't match what observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
+ Only the existence of a match is important; it doesn't matter which observation is matched. This means that filtering joins never duplicate rows like mutating joins do:

```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("diagrams/join-semi-many.png")
@@ -467,7 +467,7 @@ flights %>%
The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.

1. Start by identifying the variables that form the primary key in each table.
- You should usually do this based on your understand of the data, not
+ You should usually do this based on your understanding of the data, not
empirically by looking for a combination of variables that give a
unique identifier. If you just look for variables without thinking about
what they mean, you might get (un)lucky and find a combination that's
@@ -490,7 +490,7 @@ The data you've been working with in this chapter has been cleaned up so that yo
use of inner vs. outer joins, carefully considering whether or not you
want to drop rows that don't have a match.
- Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlikely at the number of dropped rows might exactly equal the number of duplicated rows!
+ Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlucky as the number of dropped rows might exactly equal the number of duplicated rows!
## Set operations {#set-operations}
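The join passages touched by this commit describe three behaviours: a named `by` vector matches variable `a` in `x` to variable `b` in `y`, filtering joins such as `semi_join()` never duplicate rows, and an unchanged row count does not prove that an inner join went smoothly. The sketch below illustrates all three with dplyr, assuming a recent dplyr is installed. The tables `x`, `y`, `x2` and `y2` and their columns are hypothetical toy data invented for this example, not the book's datasets.

```r
library(dplyr)

# Hypothetical toy tables invented for illustration (not from the book).
x <- tibble(a = c(1, 2, 2, 3), val_x = c("x1", "x2", "x3", "x4"))
y <- tibble(b = c(1, 2, 2, 4), val_y = c("y1", "y2", "y3", "y4"))

# 1. Named character vector: match variable `a` in table `x` to variable `b`
#    in table `y`. The key column in the output keeps the name from `x` (`a`).
#    Key 2 is duplicated in both tables, so recent dplyr versions may warn
#    about a many-to-many relationship; that duplication is deliberate here.
mutating <- inner_join(x, y, by = c("a" = "b"))
print(names(mutating))  # columns a, val_x, val_y
print(nrow(mutating))   # 5: key 1 matches once, key 2 matches 2 x 2 = 4 times

# 2. A filtering join only checks whether a match exists, so it never returns
#    more rows than `x` has, even when `y` contains duplicate keys.
filtering <- semi_join(x, y, by = c("a" = "b"))
print(nrow(filtering))  # 3: both key-2 rows of `x` kept once each, key 3 dropped

# 3. Check a candidate primary key before joining: any row returned here means
#    the key does not uniquely identify each observation.
x2 <- tibble(key = c(1, 1, 2, 3), val_x = 1:4)
y2 <- tibble(key = c(1, 1, 5, 6), val_y = 1:4)
dup_keys <- x2 %>% count(key) %>% filter(n > 1)
print(dup_keys)  # key 1 appears twice

# 4. The row count alone can mislead: keys 2 and 3 are dropped (2 rows lost)
#    while key 1 matches 2 x 2 = 4 times (2 extra rows), so the inner join
#    ends up with exactly as many rows as `x2`. (This may also trigger the
#    many-to-many warning mentioned above.)
joined <- inner_join(x2, y2, by = "key")
print(c(before = nrow(x2), after = nrow(joined)))  # both 4

# anti_join() lists the rows of `x2` that found no match, exposing the drop.
unmatched <- anti_join(x2, y2, by = "key")
print(unmatched$key)  # keys 2 and 3
```

The design choice in steps 3 and 4 is to inspect the keys and the unmatched rows directly instead of relying on `nrow()`, since the two row counts can agree purely by coincidence.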