Commit

Clarify plaintext and fix punctuation (hadley#1251)
zekiakyol authored Jan 27, 2023
1 parent 96f4ad4 commit feaf954
Showing 4 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion data-import.qmd
@@ -275,7 +275,7 @@ It then works through the following questions:
[^data-import-2]: You can override the default of 1000 with the `guess_max` argument.
- Does it contain only `F`, `T`, `FALSE`, or `TRUE` (ignoring case)? If so, it's a logical.
-- Does it contain only numbers (e.g., `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
+- Does it contain only numbers (e.g. `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
- Does it match the ISO8601 standard? If so, it's a date or date-time. (We'll return to date-times in more detail in @sec-creating-datetimes).
- Otherwise, it must be a string.
2 changes: 1 addition & 1 deletion joins.qmd
@@ -17,7 +17,7 @@ This chapter will introduce you to two important types of joins:
- Filtering joins, which filter observations from one data frame based on whether or not they match an observation in another.

We'll begin by discussing keys, the variables used to connect a pair of data frames in a join.
-We cement the theory with an examination of the keys in the nycflights13 datasets, then use that knowledge to start joining data frames together.
+We cement the theory with an examination of the keys in the datasets from the nycflights13 package, then use that knowledge to start joining data frames together.
Next we'll discuss how joins work, focusing on their action on the rows.
We'll finish up with a discussion of non-equi-joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.

4 changes: 2 additions & 2 deletions logicals.qmd
@@ -20,7 +20,7 @@ We'll finish off with `if_else()` and `case_when()`, two useful functions for ma
### Prerequisites

Most of the functions you'll learn about in this chapter are provided by base R, so we don't need the tidyverse, but we'll still load it so we can use `mutate()`, `filter()`, and friends to work with data frames.
-We'll also continue to draw examples from the nycflights13 dataset.
+We'll also continue to draw examples from the `nycflights13::flights` dataset.

```{r}
#| label: setup
@@ -404,7 +404,7 @@ This works, but what if we wanted to also compute the average delay for flights
We'd need to perform a separate filter step, and then figure out how to combine the two data frames together[^logicals-3].
Instead you could use `[` to perform an inline filtering: `arr_delay[arr_delay > 0]` will yield only the positive arrival delays.

-[^logicals-3]: We'll cover this in @sec-joins\]
+[^logicals-3]: We'll cover this in @sec-joins.

This leads to:

2 changes: 1 addition & 1 deletion webscraping.qmd
@@ -67,7 +67,7 @@ Note, however, the situation is rather different in Europe where courts have fou

Even if the data is public, you should be extremely careful about scraping personally identifiable information like names, email addresses, phone numbers, dates of birth, etc.
Europe has particularly strict laws about the collection of storage of such data (GDPR), and regardless of where you live you're likely to be entering an ethical quagmire.
-For example, in 2016, a group of researchers scraped public profile information (e.g., usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempts for anonymization.
+For example, in 2016, a group of researchers scraped public profile information (e.g. usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempts for anonymization.
While the researchers felt that there was nothing wrong with this since the data were already public, this work was widely condemned due to ethics concerns around identifiability of users whose information was released in the dataset.
If your work involves scraping personally identifiable information, we strongly recommend reading about the OkCupid study as well as similar studies with questionable research ethics involving the acquisition and release of personally identifiable information.[^webscraping-3]
