Mild import/wrangling reorg

zekiakyol · Jun 20, 2022
1 parent 23bfba6
commit 8f7748d

Showing 12 changed files with 25 additions and 72 deletions.
diff --git a/.gitignore b/.gitignore
@@ -14,5 +14,5 @@ libs
 _main.*
 tmp-pdfcrop-*
 figures
-
 /.quarto/
+site_libs
diff --git a/EDA.qmd b/EDA.qmd
@@ -81,7 +81,7 @@ To make the discussion easier, let's define some terms:
     Tabular data is *tidy* if each value is placed in its own "cell", each variable in its own column, and each observation in its own row.
 
 So far, all of the data that you've seen has been tidy.
-In real life, most data isn't tidy, so we'll come back to these ideas again in [Chapter -@sec-list-columns] and [Chapter -@sec-rectangle-data].
+In real life, most data isn't tidy, so we'll come back to these ideas again in @sec-rectangling.
 
 ## Variation
 

diff --git a/_quarto.yml b/_quarto.yml
@@ -65,14 +65,13 @@ book:
         - missing-values.qmd
         - column-wise.qmd
 
-    - part: import.qmd
+    - part: wrangle.qmd
       chapters:
-        - import-rectangular.qmd
-        - import-spreadsheets.qmd
-        - import-databases.qmd
-        - rectangle.qmd
-        - import-webscrape.qmd
-        - import-other.qmd
+        - parsing.qmd
+        - spreadsheets.qmd
+        - databases.qmd
+        - rectangling.qmd
+        - webscraping.qmd
 
     - part: program.qmd
       chapters:

diff --git a/data-import.qmd b/data-import.qmd
@@ -11,8 +11,7 @@ status("polishing")
 
 Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data.
 In this chapter, you'll learn how to read plain-text rectangular files into R.
-Here, we'll only scratch the surface of data import, but many of the principles will translate to other forms of data.
-We'll finish with a few pointers to packages that are useful for other types of data.
+Here, we'll only scratch the surface of data import, but many of the principles will translate to other forms of data, which we'll come back to in @sec-wrangle.
 
 ### Prerequisites
 
@@ -320,33 +319,10 @@ There are two alternatives:
     ```
 
 Feather tends to be faster than RDS and is usable outside of R.
-RDS supports list-columns (which you'll learn about in [Chapter -@sec-list-columns]); feather currently does not.
+RDS supports list-columns (which you'll learn about in @sec-rectangling); feather currently does not.
 
 ```{r}
 #| include: false
-
 file.remove("students-2.csv")
 file.remove("students.rds")
 ```
-
-## Other types of data
-
-To get other types of data into R, we recommend starting with the tidyverse packages listed below.
-They're certainly not perfect, but they are a good place to start.
-For rectangular data:
-
--   **readxl** reads Excel files (both `.xls` and `.xlsx`).
-    See [Chapter -@sec-import-spreadsheets] for more on working with data stored in Excel spreadsheets.
-
--   **googlesheets4** reads Google Sheets.
-    Also see [Chapter -@sec-import-spreadsheets] for more on working with data stored in Google Sheets.
-
--   **DBI**, along with a database specific backend (e.g. **RMySQL**, **RSQLite**, **RPostgreSQL** etc) allows you to run SQL queries against a database and return a data frame.
-    See [Chapter -@sec-import-databases] for more on working with databases .
-
--   **haven** reads SPSS, Stata, and SAS files.
-
-For hierarchical data: use **jsonlite** (by Jeroen Ooms) for json, and **xml2** for XML.
-Jenny Bryan has some excellent worked examples at <https://jennybc.github.io/purrr-tutorial/>.
-
-For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [**rio**](https://github.com/leeper/rio) package.
diff --git a/data-tidy.qmd b/data-tidy.qmd
@@ -557,7 +557,7 @@ df <- tribble(
 )
 ```
 
-If we attempt to pivot this we get an output that contains list-columns, which you'll learn more about in [Chapter -@sec-list-columns]:
+If we attempt to pivot this we get an output that contains list-columns, which you'll learn more about in @sec-rectangling:
 
 ```{r}
 df |> pivot_wider(

diff --git a/import-databases.qmd b/databases.qmd
diff --git a/import-rectangular.qmd b/parsing.qmd
@@ -1,4 +1,4 @@
-# Rectangular data {#sec-import-rectangular}
+# Parsing {#sec-import-rectangular}
 
 ```{r}
 #| results: "asis"

diff --git a/rectangle.qmd b/rectangling.qmd
@@ -1,4 +1,4 @@
-# Data rectangling {#sec-rectangle-data}
+# Data rectangling {#sec-rectangling}
 
 ```{r}
 #| results: "asis"
@@ -86,10 +86,10 @@ x5 <- list(1, list(2, list(3, list(4, list(5)))))
 str(x5)
 ```
 
-As lists get even larger and more complex, even `str()` starts to fail, and you'll need to switch to `View()`[^rectangle-1].
+As lists get even larger and more complex, even `str()` starts to fail, and you'll need to switch to `View()`[^rectangling-1].
 @fig-view-collapsed shows the result of calling `View(x4)`. The viewer starts by showing just the top level of the list, but you can interactively expand any of the components to see more, as in @fig-view-expand-1. RStudio will also show you the code you need to access that element, as in @fig-view-expand-2. We'll come back to how this code works in @sec-vector-subsetting.
 
-[^rectangle-1]: This is an RStudio feature.
+[^rectangling-1]: This is an RStudio feature.
 
 ```{r}
 #| label: fig-view-collapsed

diff --git a/import-spreadsheets.qmd b/spreadsheets.qmd
diff --git a/tidy.qmd b/tidy.qmd
diff --git a/import-webscrape.qmd b/webscraping.qmd
diff --git a/import.qmd b/wrangle.qmd
@@ -1,4 +1,4 @@
-# Wrangle {#sec-import-intro .unnumbered}
+# Wrangle {#sec-wrangle .unnumbered}
 
 ```{r}
 #| results: "asis"
@@ -14,14 +14,20 @@ But in more complex cases it encompasses both tidying and transformation as the
 
 This part of the book proceeds as follows:
 
--   In @sec-import-rectangular, you'll learn how to get plain-text data in rectangular formats from disk and into R.
+-   In @sec-rectangling, you'll learn how to get plain-text data in rectangular formats from disk and into R.
 
 -   In @sec-import-spreadsheets, you'll learn how to get data from Excel spreadsheets and Google Sheets into R.
 
 -   In @sec-import-databases, you'll learn about getting data into R from databases.
 
--   In @sec-rectangle-data, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often created when your raw data is in JSON.
+-   In @sec-rectangling, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often created when your raw data is in JSON.
 
 -   In @sec-import-webscrape, you'll learn about harvesting data off the web and getting it into R.
 
--   We'll close up the part with a brief discussion on other types of data and pointers for how to get them into R in @sec-import-other.
+Some other types of data are not covered in this book:
+
+-   **haven** reads SPSS, Stata, and SAS files.
+
+-   **xml2** reads XML.
+
+For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [**rio**](https://github.com/leeper/rio) package.