Commit

Minor corrections (hadley#1244)
stephenbalogun authored Jan 23, 2023
1 parent 01b8566 commit 5d912aa
Showing 2 changed files with 5 additions and 5 deletions.
8 changes: 4 additions & 4 deletions joins.qmd
@@ -35,7 +35,7 @@ library(nycflights13)

## Keys

-To understand joins, you need to first understand how two tables can be connected through a pair of keys, with on each table.
+To understand joins, you need to first understand how two tables can be connected through a pair of keys, within each table.
In this section, you'll learn about the two types of key and see examples of both in the datasets of the nycflights13 package.
You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.

@@ -138,7 +138,7 @@ weather |>
### Surrogate keys

So far we haven't talked about the primary key for `flights`.
-It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if have some way to describe them to others.
+It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if we have some way to describe them to others.

After a little thinking and experimentation, we determined that there are three variables that together uniquely identify each flight:

@@ -194,7 +194,7 @@ Surrogate keys can be particular useful when communicating to other humans: it's
## Basic joins {#sec-mutating-joins}

Now that you understand how data frames are connected via keys, we can start using joins to better understand the `flights` dataset.
-dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, and `anti_join()`.
+dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, `anti_join(), and full_join()`.
They all have the same interface: they take a pair of data frames (`x` and `y`) and return a data frame.
The order of the rows and columns in the output is primarily determined by `x`.

@@ -321,7 +321,7 @@ airports |>
**Anti-joins** are the opposite: they return all rows in `x` that don't have a match in `y`.
They're useful for finding missing values that are **implicit** in the data, the topic of @sec-missing-implicit.
Implicitly missing values don't show up as `NA`s but instead only exist as an absence.
-For example, we can find rows that as missing from `airports` by looking for flights that don't have a matching destination airport:
+For example, we can find rows that are missing from `airports` by looking for flights that don't have a matching destination airport:

```{r}
flights2 |>
2 changes: 1 addition & 1 deletion spreadsheets.qmd
@@ -404,7 +404,7 @@ read_excel("data/bake-sale.xlsx")

### Formatted output

-The readxl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
+The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
However, this is ok.

0 comments on commit 5d912aa