forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbase-types.Rmd
182 lines (121 loc) · 7.26 KB
/
base-types.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
# Base types {#base-types}
To talk about objects and OOP in R we need to first deal with a fundamental confusion: we use the word object to mean two different things. So far in this book, we've used object in a general sense, as captured by John Chambers' summary: "Everything that exists in R is an object". However, while everything _is_ an object, not everything is "object-oriented". This confusion arises because the base objects come from S, and were developed before anyone was thinking that S might need an OOP system. This means that the tools and nomenclature evolved organically without a single guiding principle.
Most of the time, this distinction is not important. But here we need to get into the nitty gritty details so we'll use the terms __base objects__ and __OO objects__ to distinguish the categories. OO objects have a class. The equivalent for a base object is the __base type__.
```{r, out.width = NULL, echo = FALSE}
knitr::include_graphics("diagrams/oo-venn.png", dpi = 300)
```
We'll also discuss the `is.*` functions in this chapter. These functions are used for many purposes, but the reason why we discuss them here is that they are commonly used to determine if an object has a specific base type.
## Base objects vs OO objects
To tell the difference between a base and OO object, use `is.object()`:
```{r}
# A base object:
is.object(1:10)
# An OO object
is.object(mtcars)
```
(This function would be better called `is.oo()` beacause it tells you if an object is a base object or a OO object.)
The primary attribute that distinguishes between base and OO object is the "class". Base objects do not have a class attribute:
```{r}
attr(1:10, "class")
attr(mtcars, "class")
```
Note, however, that `class()` never returns `NULL`: it's the class attribute that determines whether an object is OO or not, not the value of the `class()` _function_.
While only OO objects have a class attribute, every object has a __base type__:
```{r}
typeof(1:10)
typeof(mtcars)
```
You may have heard of `mode()` and `storage.mode()`. I recommend ignoring these functions because they just provide S compatible aliases of `typeof()`. Read the source code if you want to understand exactly what they do. \indexc{mode()}
Base objects do not form an OOP system because functions that behave differently for different base types are primarily written in C, where dispatch occurs using switch statements. This means only R-core can create new types, and creating a new type is a lot of work. As a consequence, new base types are rarely added. The most recent change, in 2011, added two exotic types that you never see in R, but are needed for diagnosing memory problems (`NEWSXP` and `FREESXP`). Prior to that, the last type added was a special base type for S4 objects (`S4SXP`) added in 2005.
## Base types
<!-- https://github.com/wch/r-source/blob/bf0a0a9d12f2ce5d66673dc32cd253524f3270bf/src/include/Rinternals.h#L149-L180 -->
In total, there are 25 different base types. They are listed below, loosely grouped according to where they're discussed in this book.
* The vectors: NULL, logical, integer, double, complex, character,
list, raw.
```{r}
typeof(1:10)
typeof(NULL)
typeof(1i)
```
* Functions: closure (regular R functions), special (internal functions),
builtin (primitive functions) and environments
```{r}
typeof(mean)
typeof(sum)
typeof(`[`)
typeof(globalenv())
```
* Language components: symbol (aka names), language (usually called calls),
pairlist (used for function arguments).
```{r}
typeof(quote(a))
typeof(quote(a + 1))
typeof(formals(mean))
```
* Expressions are a special purpose data type that's only returned by
`parse()` and `expression()`. They are not needed in user code.
* There are a few esoteric types that are important for C code but not
generally available at the R level: externalptr, weakref, bytecode, S4,
promise, "...", and any.
## The is functions
<!-- https://github.com/wch/r-source/blob/880337b753960bf77c6ccd8badca634e0f2a4914/src/main/coerce.c#L1764 -->
This is also a good place to discuss the `is` functions because they're often used to check if an object has a specific type:
```{r}
is.function(mean)
is.primitive(sum)
```
Generally, "is" functions can be surprising because there are several different classes, they often have a few special cases, and their names are historical so don't always reflect modern usage (or the usage found in this book.)
The "is" functions fall into six rough classes as described below.
* A specific value of `typeof()`:
`is.call()`, `is.character()`, `is.complex()`,
`is.double()`, `is.environment()`, `is.expression()`,
`is.list()`, `is.logical()`, `is.name()`, `is.null()`, `is.pairlist()`,
`is.raw()`, `is.symbol()`.
`is.integer()` is almost in this class, but it specifically checks for the
absense of a class attribute containing "factor". Note that `is.vector()`
is not in this group.
* A set of possible of base types:
* `is.atomic()` = logical, integer, double, characer, raw, and
(surprisingly) NULL.
* `is.function()` = special, builtin, closure.
* `is.primitive()` = special, builtin.
* `is.language()` = symbol, language, expression.
* `is.recursive()` = list, language, expression.
* Attributes:
* `is.matrix(x)` tests if `length(dim(x))` is 2.
* `is.array(x)` tests if `length(dim(x))` is 1 or 3+.
* `is.vector(x)` tests that `x` has no attributes apart from names.
It does __not__ check if an object is an atomic vector or list.
* Has an S3 class: `is.data.frame()`, `is.factor()`, `is.numeric_version()`,
`is.ordered()`, `is.package_version()`, `is.qr()`, `is.table()`.
* Vectorised math operation: `is.finite()`, `is.infinite()`, `is.na()`,
`is.nan()`
* Finally there are a bunch of special purpose functions that don't
fall into any other category:
* `is.loaded()`: tests if a C/Forton subroutine is loaded.
* `is.object()`: discussed above.
* `is.R()` and `is.single()`: are included for S+ compatibility
* `is.unsorted()` tests if a vector is unsorted.
* `is.element(x, y)` checks if `x` is an element of `y`: it's even more
different as it takes two arguments, unlike every other `is`. function.
One, `is.numeric()`, is sufficiently special that it gets its own section.
### The numeric "type"
We need a little extra discussion of the numeric "type" because it's used in three different ways in different places in R.
1. In some places it's used as an alias for "double". For example
`as.numeric()` is identical to `as.double()`, and `numeric()` is
identical to `double()`.
R also occasionally uses "real" instead of double; `NA_real_` is the one
place that you're likely to encounter this in practice.
1. In S3 and S4 it is used to mean either integer or double. We'll
talk about `s3_class()` in the next chapter:
```{r}
sloop::s3_class(1)
sloop::s3_class(1L)
```
1. In `is.numeric()` it means an object built on a base type of integer or
double that is not a factor (i.e. it is a number and behaves like a number).
```{r}
is.numeric(1)
is.numeric(1L)
is.numeric(factor("x"))
```