Use curl::multi_download()
hadley committed Jan 23, 2023
1 parent 707b332 commit f090584
Showing 2 changed files with 8 additions and 5 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -12,6 +12,7 @@ Depends:
Imports:
    arrow,
    babynames,
+   curl (>= 5.0.0),
    dplyr,
    duckdb,
    gapminder,
12 changes: 7 additions & 5 deletions arrow.qmd
@@ -53,16 +53,18 @@ We begin by getting a dataset worthy of these tools: a data set of item checkout
This dataset contains 41,389,465 rows that tell you how many times each book was checked out each month from April 2005 to October 2022.

The following code will get you a cached copy of the data.
- The data is a 9GB CSV file, so it will take some time to download: simply getting the data is often the first challenge!
+ The data is a 9GB CSV file, so it will take some time to download.
+ I highly recommend using `curl::multi_download()` to get very large files as it's built for exactly this purpose: it gives you a progress bar and it can resume the download if it's interrupted.

```{r}
#| eval: false
dir.create("data", showWarnings = FALSE)
- url <- "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv"
- # Default timeout is 60s; bump it up to an hour
- options(timeout = 60 * 60)
- download.file(url, "data/seattle-library-checkouts.csv")
+ curl::multi_download(
+   "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
+   "data/seattle-library-checkouts.csv",
+   resume = TRUE
+ )
```
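
A nice property of `curl::multi_download()` is that it returns a data frame with one row per file, so you can confirm a large download actually completed before trying to read it. A minimal sketch, assuming the result includes `success` and `status_code` columns as in curl 5.0.0:

```{r}
#| eval: false
# Capture the per-file status that multi_download() returns
res <- curl::multi_download(
  "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
  "data/seattle-library-checkouts.csv",
  resume = TRUE
)
# `success` and `status_code` are assumed column names (curl >= 5.0.0);
# isTRUE() guards against NA when a download was interrupted
if (!isTRUE(all(res$success))) {
  stop("Download failed with status ", res$status_code)
}
```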

## Opening a dataset
