diff --git a/data-transform.qmd b/data-transform.qmd
index 20febd4e1..63957dc18 100644
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -699,6 +699,39 @@ daily |>
 
 You get a single row back because dplyr treats all the rows in an ungrouped data frame as belonging to one group.
 
+### `.by`
+
+dplyr 1.1.0 includes a new, experimental syntax for per-operation grouping, the `.by` argument.
+`group_by()` and `ungroup()` aren't going away, but you can now also use the `.by` argument to group within a single operation:
+
+```{r}
+#| results: false
+flights |>
+  summarize(
+    delay = mean(dep_delay, na.rm = TRUE),
+    n = n(),
+    .by = month
+  )
+```
+
+Or if you want to group by multiple variables:
+
+```{r}
+#| results: false
+flights |>
+  summarize(
+    delay = mean(dep_delay, na.rm = TRUE),
+    n = n(),
+    .by = c(origin, dest)
+  )
+```
+
+`.by` works with all verbs and has the advantage that you don't need to use the `.groups` argument to suppress the grouping message or `ungroup()` when you're done.
+
+We didn't focus on this syntax in this chapter because it was very new when we wrote the book.
+We did want to mention it because we think it has a lot of promise and it's likely to be quite popular.
+You can learn more about it in the [dplyr 1.1.0 blog post](https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-operation-grouping/).
+
 ### Exercises
 
 1. Which carrier has the worst average delays?