# Exponential and Poisson Distributions
In this chapter, we'll see how the exponential distribution can be
used to model the waiting time between events. For example, waiting
times might be used to model the time between salmon swimming
upstream, between text messages sent by a user, or
between neutrinos arriving at a particle detector.
The Poisson distribution arises as the number of events in a fixed
period of time, such as the number of salmon per hour or number of
messages per day.^[We also consider generalizations to spatial and
volumetric processes, such as the number of graffiti tags in a given
area of a city or the number of whales in a volume of the ocean.
These may even be combined to model, say, the number of whales in a
volume of the ocean in a given month.]
## The exponential distribution
The exponential distribution can be used to model the waiting time
between events under two idealized assumptions. The first assumption
is that the arrival process is homogeneous in time. No matter the
time of day or night, arrivals happen at the same rate.^[Obviously,
this is too limiting for many applications, such as anything to do
with human or animal activity; in such cases, these simple processes
are used as building blocks in more complex spatio-temporal models.]
The second assumption is that the times between successive events are
independent. It doesn't matter if it's been an hour since the last
text message was sent or if one was just sent ten seconds ago, the
probability another message will arrive in the next minute is the
same. In other words, the expected waiting time for the next arrival
doesn't change based on how long you've already been waiting.
A distribution of waiting times satisfying these properties is the
exponential distribution. If $Y \sim \mbox{exponential}(1)$ has a standard
exponential distribution, then its density has a particularly simple
form^[We can work backward from this density to the definite integral
$$\int_0^{\infty} \exp(-y) \, \mathrm{d}y = 1.$$]
$$
p_Y(y) = \exp(-y).
$$
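As a quick numerical check (a sketch, not part of the derivation),
R's `integrate` confirms that this density integrates to one:

```{r}
# Numerically integrate the standard exponential density over [0, infinity);
# the result should be 1 up to numerical error.
integrate(function(y) exp(-y), lower = 0, upper = Inf)
```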
For example, let's plot a few simulations of arrivals where the
waiting time between arrivals is distributed according to a standard
exponential distribution.^[We will circle back and explain how to
generate exponential variates shortly.]
We can simulate a sequence of arrivals during $\lambda$ time units,
assuming the waiting time between arrivals is distributed as standard
exponential. We start at time $t = 0$ and continue adding the waiting
times $w_n \sim \mbox{exponential}(1)$ until we pass a time $\lambda$,
at which point we return the arrival times we have accumulated.^[The
loop notation `1:infinity` is meant to indicate that `n` is
unbounded. Such "forever" loops must be broken with internal logic
such as the return in this case.]
```
t = 0
for n in 1:infinity
  w[n] = exponential_rng(1)
  t += w[n]
  if (t > lambda)
    return y
  y[n] = t
```
Let's look at four realizations of this process up to time $\lambda = 10$.
```{r fig.asp = 0.35, fig.cap = "Four simulations of the first $\\lambda = 10$ time units of an arrival process where waiting times are distributed as $\\mbox{exponential}(1)$."}
gen_arrivals <- function(lambda) {
  y <- c()
  t <- 0
  while (TRUE) {
    t <- t + rexp(1)
    if (t > lambda) return(y)
    y[length(y) + 1] <- t
  }
}
set.seed(1234)
names <- c("a", "b", "c", "d")
M <- length(names)
lambda <- 10
arrival_df <- data.frame(x = c(), y = c(), run = c())
for (m in 1:M) {
  x <- gen_arrivals(lambda)
  n <- length(x)
  arrival_df <- rbind(arrival_df,
    data.frame(x = x, y = rep(0, n), run = rep(names[m], n)))
}
arrival_plot <-
  ggplot(arrival_df, aes(x = x, y = y)) +
  scale_x_continuous(lim = c(0, 10), breaks = c(0, 5, 10)) +
  scale_y_continuous(lim = c(-1, 9)) +
  geom_hline(yintercept = 0, linetype = "dashed", size = 0.25) +
  geom_point(size = 1) +
  facet_grid(run ~ .) +
  xlab("time") +
  ylab("arrivals") +
  ggtheme_tufte() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
arrival_plot
```
The number of arrivals varies from simulation to simulation.
## Scaled exponential
As with any continuous distribution, the exponential may be scaled.
As with scaling the normal distribution, this simply stretches or
shrinks the distribution without changing its basic shape. In the
normal distribution, the scale parameter $\sigma$ multiplies a
standard normal variate to get a scale $\sigma$ variate. Instead of
multiplying a standard exponential variate by a scale, it is
traditional to divide it by a rate (an inverse scale).
If $U \sim \mbox{exponential}(1)$ is a standard exponential variable,
then $Y = U / \lambda$ is an exponential variable with rate $\lambda$
(i.e., with scale $1 / \lambda$), for which we write $Y \sim
\mbox{exponential}(\lambda)$, and have the following^[This is a
straightforward derivation using the usual Jacobian formula, for which
the adjustment for the inverse transform is $$\Big| \,
\frac{\mathrm{d}}{\mathrm{d} y} \lambda \times y \, \Big| =
\lambda.$$]
$$
\begin{array}{rcl}
p_Y(y \mid \lambda)
& = &
\lambda \times p_U\left( \lambda \times y \right)
\\[6pt]
& = &
\lambda \times \exp\left(- \lambda \times y \right).
\end{array}
$$
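As a quick simulation check (a minimal sketch using only base R),
dividing standard exponential draws by the rate $\lambda$ should match
draws taken directly at rate $\lambda$:

```{r}
# Scaling by division: U / lambda with U ~ exponential(1) should match
# direct draws from exponential(lambda).
set.seed(42)
lambda <- 2.5
y_scaled <- rexp(1e5) / lambda
y_direct <- rexp(1e5, rate = lambda)
c(mean(y_scaled), mean(y_direct))  # both approximately 1 / lambda
c(var(y_scaled), var(y_direct))    # both approximately 1 / lambda^2
```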
## Simulating from the exponential distribution
Simulating a standard exponential variate is straightforward because
of the simple form of the cumulative distribution function. If $Y
\sim \mbox{exponential}(1)$, then the probability density function is
$p_Y(y) = \exp(-y)$ and the cumulative distribution function $F_Y :
[0, \infty) \rightarrow [0, 1)$ is given by
$$
\begin{array}{rcl}
F_Y(y) & = & \int_0^y \exp(-v) \, \mbox{d}v
\\[4pt]
& = & 1 - \exp(-y).
\end{array}
$$
This function is easily inverted to produce the inverse cumulative
distribution function, $F_Y^{-1}:[0,1) \rightarrow [0, \infty)$,
$$
F_Y^{-1}(u) = - \log (1 - u).
$$
If we take $U \sim \mbox{uniform}(0,1)$, then
$$
F_Y^{-1}(U) \sim \mbox{exponential}(1),
$$
because $\Pr\left[F_Y^{-1}(U) \leq y\right] = \Pr\left[U \leq F_Y(y)\right]
= F_Y(y)$.
This is a very general trick for distributions for which we can
compute the inverse cumulative distribution function. We first saw
this used to generate logistic variates by log-odds transforming
uniform variates.^[The log-odds function, written $\mbox{logit}(u)$,
is the inverse cumulative distribution function for the standard
logistic distribution.]
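For instance, here is a minimal sketch of that logistic construction,
checking the simulated mean and variance against the known values $0$
and $\pi^2 / 3$ for the standard logistic distribution:

```{r}
# logit(u) = log(u / (1 - u)) is the inverse CDF of the standard logistic,
# so applying it to uniform(0, 1) draws yields standard logistic draws.
set.seed(42)
u <- runif(1e5)
y <- log(u / (1 - u))
c(mean(y), var(y), pi^2 / 3)  # mean approx 0, variance approx pi^2 / 3
```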
The pseudocode follows the math, so we can generate standard
exponential variates as follows.
```
u <- uniform(0, 1)
y <- -log(1 - u)
```
It is traditional here to replace the term $\log (1 - u)$ with the term
$\log u$ because if $u \sim \mbox{uniform}(0,1)$, then we also know $1
- u \sim \mbox{uniform}(0,1)$. We can then generalize to the
nonstandard exponential with rate (i.e., inverse scale) $\lambda$ by
dividing. This gives us the following exponential distribution
pseudorandom number generator.
```
exponential_rng(lambda)
  u <- uniform(0, 1)
  y <- -log(u) / lambda
  return y
```
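Translating the pseudocode directly into R (a sketch;
`exponential_rng` is our own helper, not a built-in), we can check it
against R's `rexp`:

```{r}
# Inverse-CDF pseudorandom number generator for exponential(lambda).
exponential_rng <- function(lambda) {
  u <- runif(1)      # u ~ uniform(0, 1)
  -log(u) / lambda   # works because 1 - u is also uniform(0, 1)
}
set.seed(1234)
y <- replicate(1e5, exponential_rng(2))
c(mean(y), mean(rexp(1e5, rate = 2)))  # both approximately 1 / 2
```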
## Memorylessness of exponential waiting times
Suppose we simulate a sequence of standard exponential waiting times,
$W_1, \ldots, W_N \sim \mbox{exponential}(1).$ Now what if we look at
the distribution of all of the $W_n$ and compare it to the distribution
of just those $W_n > 1$ and of just those $W_n > 2.5$? To make sure we
have the same number of draws so the histograms are on the same scale,
we'll take $10\,000$ draws in each situation using simple rejection
sampling.
```
for (m in 1:M)
  w[m] = -1
  while (w[m] < min)
    w[m] = exponential_rng(1)
```
Let's plot histograms of $10\,000$ draws for $\mbox{min} = 0, 1, 2.5$.
```{r fig.asp = 0.4, fig.cap = "Plot of $10\\,000$ draws from the standard exponential distribution (left) and discarding all draws below 1 (center) or 2.5 (right). Each histogram has the same shape, just shifted. This illustrates the memorylessness of the exponential distribution as a model of waiting times---no matter how long you have already waited, the remaining wait time distribution is the same."}
exp_min <- function(lb, M) {
  y <- rep(NA, M)
  for (m in 1:M) {
    y[m] <- -1
    while (y[m] < lb) {
      y[m] <- rexp(1)
    }
  }
  y
}
M <- 10000
memoryless_df <-
  data.frame(y = c(exp_min(0, M), exp_min(1, M), exp_min(2.5, M)),
             lb = c(rep("min 0", M), rep("min 1", M), rep("min 2.5", M)))
memoryless_plot <-
  ggplot(memoryless_df, aes(x = y)) +
  geom_histogram(color = "black", fill = "#F8F8F0",
                 binwidth = 0.5, boundary = 1) +
  facet_wrap(vars(lb)) +
  scale_x_continuous(lim = c(0, 10), breaks = c(0, 5, 10)) +
  ylab("count") +
  ggtheme_tufte() +
  theme(panel.spacing.x = unit(1, "lines")) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
memoryless_plot
```
The resulting histograms are almost identical because each condition
has exactly the same distribution shifted over by the amount of wait
time already experienced.
We can characterize this property in terms of the probability density
function. If $Y \sim \mbox{exponential}(\lambda)$, for some fixed
$\lambda$, then for any fixed $c > 0$, we have
$$
p_Y(y \mid \lambda)
\ \propto \
p_Y(y + c \mid \lambda),
$$
because $p_Y(y + c \mid \lambda)
= \lambda \exp\left(-\lambda \times (y + c)\right)
= \exp(-\lambda \times c) \times \lambda \exp(-\lambda \times y)$
and the factor $\exp(-\lambda \times c)$ does not depend on $y$.
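We can also verify memorylessness by simulation (a quick sketch): the
probability of waiting at least $t$ more time units, given that we have
already waited $c$ units, should match the unconditional probability of
waiting at least $t$ units.

```{r}
# Memorylessness: Pr[Y > c + t | Y > c] = Pr[Y > t] for the exponential.
set.seed(5678)
y <- rexp(1e6, rate = 1)
c0 <- 1.5  # time already waited
t0 <- 2    # additional waiting time
mean(y[y > c0] > c0 + t0)  # conditional estimate
mean(y > t0)               # unconditional estimate; both approx exp(-2)
```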
## The Poisson distribution
The Poisson distribution is a discrete distribution over counts
$\mathbb{N} = \{ 0, 1, 2, \ldots \}$, which can be defined as the number
of arrivals that occur in a fixed interval of $\lambda$ time units when
the waiting times $w_n$ between arrivals are independently distributed
as standard exponential, $w_n \sim \mbox{exponential}(1).$
If $Y \sim \mbox{Poisson}(\lambda)$, then we can simulate values of
$Y$ as follows.
```
poisson_rng(lambda)
  t = 0
  y = 0
  while (true)
    w = exponential_rng(1)
    t += w
    if (t > lambda)
      return y
    else
      y += 1
```
We start at time $t = 0$, with $y = 0$ arrivals, then continue
simulating arrivals until we have passed the total time $\lambda$, at
which point we report the number of arrivals seen before the
arrival that put us past time $\lambda.$
We can use this simulator to estimate the expectation and variance of
a variable $Y \sim \mbox{Poisson}(\lambda)$, as follows.
```
for (m in 1:M)
  y[m] = poisson_rng(lambda)
print 'estimated E[Y] = ' mean(y)
print 'estimated var[Y] = ' variance(y)
```
```{r}
poisson_rng <- function(lambda) {
  t <- 0
  y <- 0
  while (TRUE) {
    w <- rexp(1)
    t <- t + w
    if (t > lambda) return(y)
    y <- y + 1
  }
}
lambda <- 10
M <- 10000
y <- rep(NA, M)
for (m in 1:M)
  y[m] <- poisson_rng(lambda)
printf("estimated E[Y] = %4.2f\n", mean(y))
printf("estimated var[Y] = %4.2f\n", var(y))
```
## Poisson as limit of binomials
Another way to arrive at the Poisson distribution is as the limit of a
sequence of binomial distributions. A random variable with a Poisson
distribution is unbounded in that any value may arise.^[All but a few
values will be wildly improbable, but still possible.] A binomial
variable, on the other hand, takes on values between 0 and $N$ for some
fixed $N$. But if we let the total count $N$ grow without bound
while keeping the expected value at $\lambda$, the binomial approaches
the Poisson distribution,
$$
\mbox{Poisson}(y \mid \lambda)
= \lim_{N \rightarrow \infty} \mbox{binomial}(y \mid N, \lambda / N).
$$
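Before simulating, we can check this limit numerically with the exact
probability mass functions (a quick sketch, reusing the `printf`
helper from above):

```{r}
# The maximum pointwise difference between the binomial(N, lambda / N)
# pmf and the Poisson(lambda) pmf shrinks as N grows.
lambda <- 5.5
for (N in c(8, 64, 512))
  printf("N = %3d: max pmf difference = %6.4f\n",
         N, max(abs(dbinom(0:N, N, lambda / N) - dpois(0:N, lambda))))
```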
We can see what the binomial approximation looks like through
simulation. We'll simulate $\mbox{binomial}(N, 5.5 / N)$ for various
$N$ and compare with $\mbox{Poisson}(5.5)$.
```{r fig.cap = "Histograms of $1\\,000\\,000$ draws for a $\\mbox{Poisson}(5.5)$ and successively larger $N$ binomial approximations."}
binom_poisson_df <- data.frame(y = c(), N = c())
lambda <- 5.5
M <- 1000000
Ns <- c(8, 16, 32, 64, 128, 256, 512)
for (N in Ns) {
  binom_poisson_df <-
    rbind(binom_poisson_df,
          data.frame(y = rbinom(M, N, lambda / N),
                     N = rep(sprintf("binomial(%d)", N), M)))
}
binom_poisson_df <-
  rbind(binom_poisson_df,
        data.frame(y = rpois(M, lambda), N = rep("Poisson", M)))
# order the facet panels by increasing N, with the Poisson panel last
binom_poisson_df$N <-
  factor(binom_poisson_df$N,
         levels = c(sprintf("binomial(%d)", Ns), "Poisson"))
binom_poisson_plot <-
  ggplot(binom_poisson_df, aes(x = y)) +
  geom_bar(colour = "black", fill = "#F8F8F0", size = 0.2) +
  facet_wrap(. ~ N, ncol = 4) +
  ggtheme_tufte() +
  theme(panel.spacing.x = unit(2, "lines")) +
  theme(panel.spacing.y = unit(2, "lines")) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
binom_poisson_plot
```