index.Rmd

---
title: "Available datasets"
output: rmarkdown::html_document
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(DT.warn.size = FALSE)
library(DT)
library(dplyr)

tablify <- function(x, ...) {
  datatable(
    x, 
    style = "bootstrap4",
    rownames = FALSE,
    options = list(
      pageLength = 10,
      lengthMenu = c(10, 100, 500, 1000),
      # fixedHeader = TRUE,
      # paging = FALSE,
      autoWidth = TRUE,
      columnDefs = list(
        # Unfortunately widths are all over the place
        # due to some very long dataset names
        list(targets = 0, width = '5%'),
        list(targets = 1, width = '5%'),
        list(targets = 2, width = '30%'),
        list(targets = 3, searchable = FALSE),
        list(targets = 4, width = '10%'),
        list(targets = 5, width = '10%')
      )
    ), 
    filter = "top",
    ...
  )
}

xdat <- read.csv(here::here("datasets.csv")) %>%
  mutate(
    Package = factor(Package),
    CSV = paste0('<a href="', CSV, '">CSV</a>'),
    #CSV = purrr::map(CSV, htmltools::HTML),
    Doc = paste0('<a href="', Doc, '">Doc</a>'),
    #Doc = purrr::map(Doc, htmltools::HTML),
    Link = paste(CSV, "<br/>", Doc),
    Link = purrr::map(Link, htmltools::HTML)
  ) %>%
  select(Package, Item, Title, Link, Rows, Cols, order(names(.)), -CSV, -Doc)

```

This is a slightly adapted version of [Vincent's very handy project](https://github.com/vincentarelbundock/Rdatasets) with some added packages and a couple of tweaks to the table.

Please note that not all datasets could be parsed correctly, with some type-specific issues for non-`data.frame` datasets such as `plyr::ozone` which is an `array` with dimensions `24 24 72`.

# Datasets {.tabset}

## All

This shows all  `r nrow(xdat)` datasets.

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
tablify(xdat)
```

## Small

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset <- xdat %>%
  filter(Rows <= 100, Cols <= 10)
```

This subset contains `r nrow(xdat_subset)` datasets with `Rows <= 100 & Cols <= 10`.

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset %>%
  tablify()
```

## High dimensional

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset <- xdat %>%
  filter(Cols > Rows)
```

This subset contains `r nrow(xdat_subset)` datasets with `Cols > Rows`.   
Defining high-dimensionality is hard but this seems like a place to start.

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset %>%
  tablify()
```

## Large

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset <- xdat %>%
  filter(Rows >= 10000)
```

This subset contains `r nrow(xdat_subset)` datasets with `Rows >= 10000`.

```{r, messages=FALSE, warnings=FALSE, echo=FALSE}
xdat_subset %>%
  tablify()
```