Commit

Better factor motivation
Thanks to @csgillespie
hadley committed Oct 4, 2016
1 parent 55caa63 commit 5fee66e
Showing 1 changed file with 46 additions and 8 deletions.
54 changes: 46 additions & 8 deletions factors.Rmd
@@ -19,32 +19,70 @@ library(forcats)

## Creating factors

Typically you'll convert a factor from a character vector, using `factor()`. Apart from the character input, the most important argument is the list of valid __levels__:
Imagine that you have a variable that records month:

```{r}
x <- c("pear", "apple", "banana", "apple", "pear", "apple")
factor(x, levels = c("apple", "banana", "pear"))
x1 <- c("Dec", "Apr", "Jan", "Mar")
```

Any values not in the list of levels will be silently converted to `NA`:
Using a string to record this variable has two problems:

1.  There are only twelve possible months, and there's nothing saving you
    from typos:

    ```{r}
    x2 <- c("Dec", "Apr", "Jam", "Mar")
    ```

1.  It doesn't sort in a useful way:

    ```{r}
    sort(x1)
    ```

You can fix both of these problems with a factor. To create a factor you must start by creating a list of the valid __levels__:

```{r}
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
```

Now you can create a factor:

```{r}
y1 <- factor(x1, levels = month_levels)
y1
sort(y1)
```
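
Sorting works here because a factor stores each value as an integer position in its levels. The following is a minimal base R sketch of that idea. It repeats the definitions of `x1` and `month_levels` so that it runs on its own, and `codes` and `sorted_months` are names introduced only for illustration:

```r
# Repeats the definitions from above so the block is self-contained.
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels)

# Each element is stored as its position within month_levels.
codes <- as.integer(y1)
codes

# sort() orders by those positions, so the result is in calendar order.
sorted_months <- as.character(sort(y1))
sorted_months
```

The character vector sorts alphabetically, while the factor sorts by the order of its levels.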

Any values not in the set of levels will be silently converted to `NA`:

```{r}
y2 <- factor(x2, levels = month_levels)
y2
```
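
Because the conversion is silent, it can help to check which inputs were lost. One possible check in base R is sketched below. It repeats the definitions of `x2` and `month_levels`, and `dropped` is a name introduced here for illustration:

```r
# Repeats the definitions from above so the block is self-contained.
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)

# Values that were present in the input but became NA in the factor.
dropped <- x2[is.na(y2) & !is.na(x2)]
dropped
```

The `!is.na(x2)` condition keeps inputs that were already missing from being reported as typos.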

If you want a warning, you can use `readr::parse_factor()`:

```{r}
factor(x, levels = c("apple", "banana"))
y2 <- parse_factor(x2, levels = month_levels)
```

If you omit the levels, they'll be taken from the data in alphabetical order:

```{r}
factor(x)
factor(x1)
```
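
As a small sketch of what "alphabetical order" means in practice: for a character vector, the default levels should match the sorted unique values. `default_levels` is a name introduced for illustration:

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")

# Levels chosen by factor() when none are supplied.
default_levels <- levels(factor(x1))
default_levels

# For a character vector, this matches sorting the unique values.
identical(default_levels, sort(unique(x1)))
```

Note that these levels put `"Apr"` before `"Jan"`, so the default is rarely what you want for months.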

Sometimes you'd prefer that the order of the levels match the order of the first appearance in the data. You can do that when creating the factor by setting levels to `unique(x)`, or after the fact, with `fct_inorder()`:

```{r}
f1 <- factor(x, levels = unique(x))
f1 <- factor(x1, levels = unique(x1))
f1
f2 <- x %>% factor() %>% fct_inorder()
f2 <- x1 %>% factor() %>% fct_inorder()
f2
```
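
The effect of first-appearance ordering is easier to see when values repeat. The sketch below uses only base R (the `unique()` approach, not `fct_inorder()`) so that it runs without extra packages. `x3` and `f3` are hypothetical names introduced for illustration:

```r
# x3 is a hypothetical vector with repeated values, used only for illustration.
x3 <- c("Dec", "Apr", "Dec", "Jan", "Apr")

# unique() keeps the first occurrence of each value, in order of appearance.
f3 <- factor(x3, levels = unique(x3))
levels(f3)

# Integer codes refer to positions in that first-appearance ordering.
as.integer(f3)
```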
