dodoextinctiontime.Rmd

---
title: "Estimating time of extinction using an optimal linear estimator"
author: "C. George Glen"
date: '2022-04-09'
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This analysis is recreating that made by [Roberts and Solow (2003)](https://www.nature.com/articles/426245a).

## Theory

Going off [Roberts and Solow (2003)](https://www.nature.com/articles/426245a), if we let $T_1 > T_2 > ... > T_n$ be the $k$ most recent sightings of a species, ordered from the most to least recent, then we can use these observations to estimate, $\hat\theta$, or the estimate time of extinction.

An important result is that of [Cooke (1980)](https://academic.oup.com/biomet/article-abstract/67/1/257/276536), who showed the joint distribution of the $k$ most recent sightings has an approximate "Weibull form", regardless of the distribution of the (unknown) complete sightings record. The parameter, $\theta$, is then optimal linear estimator for the extinction time.

The parameter, $\theta$, is computed as the weighted sum of the sightings

$$ 
\hat\theta = \sum^k_{i=1} a_i T_i
$$

Where $a_i$ is a vector of weights, computed as

$$
a_i = (e^T \Lambda^{-1} e)^{-1} \Lambda^{-1} e
$$

Where $e$ is a vector of $k$ 1's, $\Lambda$ is the symmetric $k \cdot k$ matrix with typical element equal to

$$
\hat\lambda_{ij} = \frac{\Gamma(2 \hat\upsilon + i) \Gamma(\hat\upsilon + j) }{\Gamma(\hat\upsilon + i)\Gamma(j)}, j \leq i 
$$

And $\Gamma()$ is the standard Gamma function.

Finally, $\hat\upsilon$ is an estimate of the shape parameter for the joint Weibull distribution of the k most recent sighting times, and is computed as

$$
\hat\upsilon = \frac{1}{k-1}\sum_{i-1}^{k-2} log \frac{T_1 - T_k}{T_1 - T_{i+1}}
$$

We an also compute 95% confidence intervals as

$$
\hat\theta_{95\% \;CI}= (T_1 + \frac{T_1 - T_k}{S_L - 1},T_1 \frac{T_1 - T_k}{S_U - 1})
\\
S_L = \Big(\frac{-log ( 1-\frac{\alpha}{2}) }{k}\Big)^{-\hat\upsilon} 
\\
S_U = \Big(\frac{-log (\frac{\alpha}{2}) }{k}\Big)^{-\hat\upsilon}
$$

## Example: Dodo

Let's now apply this to the Dodo.

The last observations for the dodo, given in [Roberts and Solow (2003)](https://www.nature.com/articles/426245a), are

```{r echo=T}
dodosightingtimes <- c(1662, 1638, 1631, 1628, 1628, 1611, 1607, 1602, 1601, 1598)
```

The optimal linear estimator, $\hat\theta$, and the associated 95% CI are computed as

```{r echo=T}
OLE <- function(data, alpha){
    
    ## sort the data and define a parameter for length (here k)
    obs <- rev(sort(data))
    k   <- length(obs)
    
    ## ---------------------------------------------------------------
    ## compute v, e, lambda, and a
    ## ---------------------------------------------------------------
    
    ## estimate the shape parameter of the joint Weibull distribution 
    ## 4th equation in Roberts and Solow 2003
    v <- (1/(k-1)) * sum(log((obs[1] - obs[k])/(obs[1] - obs[2:(k-1)])))
    
    ## define a vector of k 1’s
    e <- matrix(rep(1,k), ncol=1)
    
    ## Λ is the symmetric k x k matrix with typical element
    lambda <- compute.lambda(obs, v)
    ## make the Λ matrix symmetric
    lambda <- ifelse(lower.tri(lambda), lambda, t(lambda)) 
    
    ## vector of weights is given by: a = (e^t * Λ^-1 * e)^-1 * Λ^-1*e
    a      <- as.vector(solve(t(e) %*% solve(lambda) %*% e)) * solve(lambda) %*% e
    
    ## ---------------------------------------------------------------
    ## calculate confidence intervals
    ## ---------------------------------------------------------------
    SL     <- (-log(1-alpha/2)/length(obs))^-v
    SU     <- (-log(alpha/2)/length(obs))^-v
    lowerCI <- max(obs) + ((max(obs)-min(obs))/(SL-1)) ## lower confidence interval
    upperCI <- max(obs) + ((max(obs)-min(obs))/(SU-1)) ## upper confidence interval
    
    ## ---------------------------------------------------------------
    ## compute time of extinction
    ## ---------------------------------------------------------------
    extincttime <- sum(t(a)%*%obs)
    
    ## ---------------------------------------------------------------
    ## return a dataframe
    ## ---------------------------------------------------------------
    res <- data.frame(Estimate=extincttime, lowerCI=lowerCI, upperCI=upperCI)
    
    ##return the results
    return(res)
}
```

And $\Lambda$ is computed using the function

```{r echo=T}
compute.lambda <- function(dates, v){
  n      <- length(dates)
  lambda <- matrix(data=NA, nrow=n, ncol=n)
  for(i in 1:n){
    for(j in 1:n){
      lambda[i,j] <- (gamma(2*v + i) * gamma(v + j))/(gamma(v + i) * gamma(j))
    }
  }
  return(lambda)
}
```

```{r echo=T}
OLE(data=dodosightingtimes, alpha=0.05)
```

This is equal to the estimates of [Roberts and Solow (2003)](https://www.nature.com/articles/426245a): $$
Estimate = 1690\\
lowerCI = 1669\\  
upperCI = 1799\\
$$