---
title: "useR to programmeR"
subtitle: "Iteration 1"
author: "Emma Rand and Ian Lyttle"
format: 
  revealjs:
    theme: [simple, styles.scss]
    footer: <https://pos.it/programming_r_24>
    slide-number: true
    chalkboard: true
    code-link: true
    code-line-numbers: false
bibliography: references.bib
---

# Overview

## Overview

In this session we will cover another way to reduce code duplication: iteration.

## Learning Objectives

At the end of this section you will be able to:

::: {style="font-size: 80%;"}
-   recognise that much iteration comes free with R

-   iterate across columns using `across()`

    -   use selection functions to select columns for iteration
    -   use anonymous functions to pass arguments
    -   give more than one function for iteration
    -   use `.names` to control the output

-   use `across()` in functions
:::

## What is iteration?

-   Iteration means repeating steps multiple times until a condition is met

-   In other languages, iteration is performed with loops: `for`, `while`

. . .

-   Iteration is different in R

-   You *can* use loops... but you often don't *need* to

## Iteration in R

Iteration is an inherent part of the language. For example, if

```{r}
nums <- c(3, 1, 6, 4)

```

Then

```{r}
#| eval: false
2 * nums
```

is

## Iteration in R

``` r
[1]  6  2 12  8
```

and NOT

. . .

``` r
[1]  6  2 12  8  6  2 12  8
```
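
## Iteration in R

A rough sketch of what the vectorised `2 * nums` saves us from writing: an explicit `for` loop that fills a result vector one element at a time (`doubled` is a name made up for this illustration).

```{r}
nums <- c(3, 1, 6, 4)

# pre-allocate a result vector, then fill it element by element
doubled <- numeric(length(nums))
for (i in seq_along(nums)) {
  doubled[i] <- 2 * nums[i]
}

# the loop and the vectorised expression give the same values
identical(doubled, 2 * nums)
```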

## Iteration in R

We have:

-   `group_by()` with `summarize()`

-   `facet_wrap()`

-   `across()` and {purrr}

. . .

-   the `apply()` family

. . .

In other languages, a `for` loop would be taught right after "hello world"

## Functional programming

This style is called "functional programming" because functions take other functions as input

-   modifying multiple columns {dplyr}

-   reading multiple files {purrr}

-   saving multiple outputs {purrr}
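
## Functional programming

A minimal base R sketch of a function taking another function as input: `sapply()` receives `mean` and calls it once for each element of a list. The list `samples` is made up for this illustration.

```{r}
# two small made-up vectors, stored in a named list
samples <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# `mean` is passed as an argument; sapply() calls it on each element
means <- sapply(samples, mean)
means
```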

# Set up

## Create a `.R` file

```{r}
#| eval: false

usethis::use_r("iteration-01")
```

## Packages

🎬 Load packages:

```{r}
library(tidyverse)
library(palmerpenguins)
```

## Load `penguins`

🎬 Load `penguins` data set

```{r}
data(penguins)
glimpse(penguins)

```

# Modifying multiple columns

## Scenario

Recall our standard error function from this morning:

```{r}
sd_error <- function(x){
  sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}
```

## Scenario

Which we might use as:

```{r}
penguins |> 
  summarise(se_bill_len = sd_error(bill_length_mm),
            se_bill_dep = sd_error(bill_depth_mm),
            se_flip_len = sd_error(flipper_length_mm),
            se_body_mas = sd_error(body_mass_g))
```

. . .

⚠️ Code repetition!

How can we iterate over columns?

## Solution: `across()`

```{r}
penguins |> 
  summarise(across(bill_length_mm:body_mass_g, sd_error))
```

## `across()` Arguments

`across(.cols, .fns, .names)`

3 important arguments

## `across()` Arguments

-   which columns you want to iterate over: `.cols = bill_length_mm:body_mass_g`

. . .

-   what you want to do to each column: `.fns = sd_error`

    -   single function, include arguments, more than one function

. . .

-   `.names` to control output

## selecting columns with `.cols`

-   we could use colon notation, `bill_length_mm:body_mass_g`, because columns are adjacent

. . .

but

-   `.cols` uses same specification as `select()`: `starts_with()`, `ends_with()`, `contains()`, `matches()`

## selecting columns with `.cols`

```{r}
penguins |> 
  summarise(across(ends_with("mm"), sd_error))
```
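
## selecting columns with `.cols`

A sketch of two other selection styles, assuming the `penguins` data and the `sd_error()` function from earlier (both repeated here so the chunk runs on its own). `bill_se` and `picked_se` are names chosen for this illustration.

```{r}
library(dplyr)
library(palmerpenguins)

# standard error helper from earlier in the session
sd_error <- function(x) {
  sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

# columns whose names start with "bill"
bill_se <- penguins |>
  summarise(across(starts_with("bill"), sd_error))

# specific, non-adjacent columns listed with c()
picked_se <- penguins |>
  summarise(across(c(bill_length_mm, body_mass_g), sd_error))

names(bill_se)
names(picked_se)
```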

## selecting columns with `.cols`

-   `everything()`: all non-grouping columns

```{r}
penguins |> 
  group_by(species, island, sex) |> 
  summarise(across(everything(), sd_error))
```

## selecting columns with `.cols`

```{r}
#| eval: false
penguins |> 
  group_by(species, island, sex) |> 
  summarise(across(everything(), sd_error))
```

-   variables in `group_by()` are excluded

-   all of `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `year`

## selecting columns with `.cols`

-   `everything()`: all non-grouping columns without year

```{r}
penguins |> 
  select(-year) |>
  group_by(species, island, sex) |> 
  summarise(across(everything(), sd_error))
```

## selecting columns with `.cols`

-   My columns have very different names and I don't want to group!

. . .

-   all the *numeric* columns: `where()`

```{r}
penguins |> 
  select(-year) |>
  summarise(across(where(is.numeric), sd_error))
```
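
## selecting columns with `.cols`

`where()` accepts any function that returns `TRUE` or `FALSE` for a column, not only `is.numeric`. As an illustrative sketch, this counts the distinct values (with `NA` counted as a value) in each factor column; `n_values` is a name made up here.

```{r}
library(dplyr)
library(palmerpenguins)

# select every factor column and count its distinct values
n_values <- penguins |>
  summarise(across(where(is.factor), n_distinct))

n_values
```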

## `.fns`: calling one function

-   we can pass a function, `sd_error`, to `across()` since R is a functional programming language

-   note, we are not calling `sd_error()`

-   instead we pass `sd_error` so `across()` can call it

-   thus function name is **not** followed by `()`

## function name is **not** followed by `()`

📢

```{r}
#| error: true
penguins |> 
  select(-year) |>
  summarise(across(where(is.numeric), sd_error()))
```

. . .

This error is easy to make!

## Include arguments

```{r}
penguins |> 
  summarise(across(ends_with("mm"), mean))
```

We get `NA` because we have missing values[^1].

[^1]: There is no problem when we use `sd_error()` because we accounted for NA in our function definition

## Include arguments

`mean()` has an `na.rm` argument.

How can we pass on `na.rm = TRUE`?

. . .

We might try:

```{r}
#| error: true
penguins |> 
  summarise(across(ends_with("mm"), mean(na.rm = TRUE)))
```

## Include arguments

The solution is to create a new function that calls `mean()` with `na.rm = TRUE`

. . .

```{r}
penguins |> 
  summarise(across(ends_with("mm"), 
                   function(x) mean(x, na.rm = TRUE)))
```

. . .

`mean` is replaced by a function definition

## Anonymous functions

``` r
penguins |> 
  summarise(across(ends_with("mm"), 
                   function(x) mean(x, na.rm = TRUE)))
```

-   This is called an **anonymous** or **lambda** function.

-   It is anonymous because we do not give it a name with `<-`

## Anonymous functions

Shorthand

. . .

Instead of writing `function` we can use `\`

```{r}
penguins |> 
  summarise(across(ends_with("mm"), 
                   \(x) mean(x, na.rm = TRUE)))
```

## Anonymous functions

Note: you might also see:

```{r}
penguins |> 
  summarise(across(ends_with("mm"), 
                   ~ mean(.x, na.rm = TRUE)))
```

. . .

-   `\(x)` is base R syntax, new in R 4.1.0. **Recommended**

-   `~ .x` is fine but only works in tidyverse functions
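
## Anonymous functions

The three spellings should give the same result. A quick check, as a sketch; the object names `long_form`, `short_form` and `formula_form` are made up for this illustration.

```{r}
library(dplyr)
library(palmerpenguins)

# full `function` keyword
long_form <- penguins |>
  summarise(across(ends_with("mm"), function(x) mean(x, na.rm = TRUE)))

# base R shorthand (needs R 4.1.0 or later)
short_form <- penguins |>
  summarise(across(ends_with("mm"), \(x) mean(x, na.rm = TRUE)))

# formula shorthand understood by tidyverse functions
formula_form <- penguins |>
  summarise(across(ends_with("mm"), ~ mean(.x, na.rm = TRUE)))

# compare the results
all.equal(long_form, short_form)
all.equal(long_form, formula_form)
```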

## `.fns`: calling more than one function

How can we use more than one function across the columns?

``` r
penguins |> 
  summarise(across(ends_with("mm"), _MORE THAN ONE FUNCTION_))
```

. . .

by using a list

## `.fns`: calling more than one function

Using a list:

``` r
penguins |> 
  summarise(across(where(is.numeric), list(
    sd_error, 
    length)))
```

. . .

Or, with anonymous functions:

``` r
penguins |> 
  summarise(across(ends_with("mm"), list(
    \(x) mean(x, na.rm = TRUE),
    \(x) sd(x, na.rm = TRUE))))
```

## `.fns`: calling more than one function

```{r}
penguins |> 
  summarise(across(ends_with("mm"), list(
    \(x) mean(x, na.rm = TRUE),
    \(x) sd(x, na.rm = TRUE))))
```

. . .

Problem: the suffixes `_1` and `_2` for functions are not very useful.

## `.fns`: calling more than one function

We can improve by naming the elements in the list

```{r}
penguins |> 
  summarise(across(ends_with("mm"), list(
    mean = \(x) mean(x, na.rm = TRUE),
    sdev = \(x) sd(x, na.rm = TRUE))))
```

. . .

The column name is `{.col}_{.fn}`: `bill_length_mm_mean`

fn: **f**unction **n**ame

. . .

We can change using the `.names` argument

## `.names` to control output

```{r}
penguins |> 
  summarise(across(ends_with("mm"),
                   list(mean = \(x) mean(x, na.rm = TRUE),
                        sdev = \(x) sd(x, na.rm = TRUE)),
                   .names = "{.fn}_of_{.col}"))
```

## `.names` to control output

Especially important for `mutate()`.

Recall our `to_z()` function

```{r}
to_z <- function(x, middle = 1) {
  trim = (1 - middle)/2
  (x - mean(x, na.rm = TRUE, trim = trim)) / sd(x, na.rm = TRUE)
}
```

## `to_z()` function in `mutate()`

which we used like this

```{r}
penguins |>
  mutate(
    z_bill_length_mm = to_z(bill_length_mm),
    z_bill_depth_mm = to_z(bill_depth_mm),
    z_flipper_length_mm = to_z(flipper_length_mm)
  ) |> 
  glimpse()
```

## `.names` to control output

It makes sense to use `across()` to apply the transformation to all three variables

```{r}
penguins |>
  mutate(across(ends_with("mm"),
                to_z)
  ) |> 
  glimpse()
```

😮 Results go into existing columns!

## 

```{r}
penguins |>
  mutate(across(ends_with("mm"),
                to_z,
                .names = "z_{.col}")
  ) |> 
  glimpse()
```

<!-- ## A note on dots in argument names -->

<!-- -    -->

<!-- -    -->

<!-- ## Iteration over columns in `filter()` -->

<!-- ?? -->

## Your turn

Time to bring together functions and iteration!

🎬 Write a function that summarises multiple specified columns of a data frame

``` r
my_summary <- function(df, cols) {

   . . . .

}
```

``` r
my_summary(penguins, ends_with("mm"))
```

## A solution

```{r}
my_summary <- function(df, cols) {
  df |> 
    summarise(across({{ cols }},
                     list(mean = \(x) mean(x, na.rm = TRUE),
                          sdev = \(x) sd(x, na.rm = TRUE))),
              .groups = "drop")
}

```

## Try it out

```{r}
penguins |> 
  group_by(species) |> 
  my_summary(ends_with("mm"))
```

## An improved solution

Include a default.

```{r}
my_summary <- function(df, cols = where(is.numeric)) {
  df |> 
    summarise(across({{cols}},
                     list(mean = \(x) mean(x, na.rm = TRUE),
                          sdev = \(x) sd(x, na.rm = TRUE))),
              .groups = "drop")
}

```

## Try it out

```{r}
penguins |> 
  select(-year) |>
  my_summary()
```

## Summary

-   you already knew some iteration: `group_by()`, `facet_wrap()`

-   `across()` iterates over columns

    -   choose columns with familiar `select()` spec
    -   pass functions without their `()`
    -   use anonymous functions to add arguments
    -   use a list to use multiple functions
    -   specify the names

-   You can use `across()` in functions!