Showing 158 changed files with 6,141 additions and 73 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
\chapter{Some Special Distributions} | ||
|
||
\section{The Binomial and Related Distributions} | ||
|
||
|
||
\section{The Poisson Distribution} | ||
|
||
\section{The $\Gamma$,$\chi^2$, and $\beta$ Distributions} | ||
|
||
\section{The Normal Distribution} | ||
|
||
\section{The Multivariate Normal Distribution} | ||
|
||
\section{$t$- and $F$-Distributions} | ||
|
||
\section{Mixture Distributions} | ||
|
||
|
||
\section{Homework} | ||
|
||
\begin{exercise}{3.2.17}{} | ||
Let $X_1$ and $X_2$ be two independent random variables. | ||
Suppose that $X_1$ and $Y=X_1+X_2$ have Poisson distributions | ||
with means $\mu_1$ and $\mu>\mu_1$, respectively. | ||
Find the distribution of $X_2$. | ||
\end{exercise} | ||
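{\it A possible solution sketch (not part of the original exercise), assuming the mgf of $X_2$ exists in a neighborhood of $0$.} | ||
Since $X_1$ and $X_2$ are independent, $M_Y(t)=M_{X_1}(t)M_{X_2}(t)$, so | ||
\begin{align*} | ||
M_{X_2}(t)=\frac{M_Y(t)}{M_{X_1}(t)}=\frac{\exp\{\mu(e^t-1)\}}{\exp\{\mu_1(e^t-1)\}}=\exp\{(\mu-\mu_1)(e^t-1)\}, | ||
\end{align*} | ||
which is the mgf of a Poisson distribution with mean $\mu-\mu_1>0$. By uniqueness of the mgf, $X_2$ is Poisson with mean $\mu-\mu_1$. | ||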
|
||
|
||
\begin{exercise}{3.4.21}{} | ||
Let $f(x)$ and $F(x)$ be the pdf and the cdf, respectively, of a distribution of | ||
the continuous type such that $f'(x)$ exists for all $x$. Let the mean of the truncated distribution | ||
that has pdf $g(y)=f(y)/F(b)$, $-\infty<y<b$, zero elsewhere, be equal to $-f(b)/F(b)$ for all real $b$. | ||
Prove that $f(x)$ is a pdf of a standard normal distribution. | ||
\end{exercise} | ||
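{\it A possible proof sketch (not part of the original exercise), assuming $f(x)>0$ for all $x$ so that $\log f$ is defined.} | ||
The condition on the truncated mean says $\int_{-\infty}^{b} y f(y)\,dy=-f(b)$ for all real $b$. Differentiating both sides with respect to $b$ gives | ||
\begin{align*} | ||
b f(b)=-f'(b)\quad\Longrightarrow\quad \frac{d}{db}\log f(b)=-b\quad\Longrightarrow\quad f(b)=C e^{-b^2/2}. | ||
\end{align*} | ||
Requiring $\int_{-\infty}^{\infty} f(x)\,dx=1$ forces $C=1/\sqrt{2\pi}$, which is the standard normal pdf. | ||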
|
||
|
||
\begin{exercise}{3.5.9}{} | ||
Say the correlation coefficient between the heights of husbands and wives is | ||
$0.70$ and the mean male height is $5$ feet $10$ inches with standard deviation $2$ inches, | ||
and the mean female height is $5$ feet $4$ inches with standard deviation $1\frac{1}{2}$ inches. | ||
Assuming a bivariate normal distribution, what is the best guess of the height of | ||
a woman whose husband's height is $6$ feet? Find a $95\%$ prediction interval for her height. | ||
\end{exercise} | ||
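{\it A possible solution sketch (not part of the original exercise); heights are converted to inches, $X$ denotes the husband's height and $Y$ the wife's, and the numbers are rounded.} | ||
With $\mu_1=70$, $\sigma_1=2$, $\mu_2=64$, $\sigma_2=1.5$ and $\rho=0.70$, the conditional distribution of $Y$ given $X=72$ is normal with | ||
\begin{align*} | ||
E(Y\mid X=72)&=\mu_2+\rho\frac{\sigma_2}{\sigma_1}(72-\mu_1)=64+0.7\cdot\frac{1.5}{2}\cdot 2=65.05,\\ | ||
\mathrm{Var}(Y\mid X=72)&=\sigma_2^2(1-\rho^2)=2.25\cdot 0.51=1.1475. | ||
\end{align*} | ||
The best guess is therefore about $65.05$ inches, and a $95\%$ prediction interval is $65.05\pm 1.96\sqrt{1.1475}$, i.e. approximately $(62.95,\,67.15)$ inches. | ||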
|
||
\section{Reference} | ||
|
||
\begin{itemize} | ||
\item \href{https://tomoki-okuno.com/files/math/Ch3_sol.pdf}{ex3.2.17} | ||
\item \href{https://cs.du.edu/~paulhorn/361/assn11-solns.pdf}{ex3.4.21} | ||
\item \href{https://faculty.etsu.edu/gardnerr/4047/Beamer-Hogg-McKean-Craig/Proofs-HMC-3-5.pdf}{ex3.5.9} | ||
\end{itemize} | ||
|
||
|
||
|
218 changes: 218 additions & 0 deletions
5_mathematical_statistics_note/chapter/4_Some_Elementary_Statistical_Inferences.tex
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,222 @@ | ||
\chapter{Some Elementary | ||
Statistical Inferences} | ||
|
||
\section{Introduction} | ||
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of numerical data. | ||
In other words, it is concerned with collecting and making sense of quantitative data. | ||
Its main purpose is to draw accurate conclusions about a larger population from a limited sample. | ||
\par | ||
|
||
\subsection{Types of statistics} | ||
Statistics can be classified into two different categories. The two different types of Statistics are: | ||
\begin{itemize} | ||
\item Descriptive Statistics | ||
\item Inferential Statistics | ||
\end{itemize} | ||
|
||
Descriptive statistics describe the data, | ||
whereas inferential statistics help you make predictions from the data. | ||
In inferential statistics, data are taken from a sample and used to generalize to the population. | ||
In general, inference means making an informed ``guess'' about something. | ||
Statistical inference therefore means making inferences about the population. | ||
To draw conclusions about the population, it uses various statistical analysis techniques. | ||
This section explains inferential statistics in more detail: | ||
the definition of statistical inference and its main types. | ||
|
||
\subsection{Statistical inference definition} | ||
Statistical inference is the process of analysing results and drawing conclusions from data subject to random variation. | ||
It is also called inferential statistics. | ||
Hypothesis testing and confidence intervals are two applications of statistical inference. | ||
Statistical inference is a method of making decisions about the parameters of a population, based on random sampling. | ||
It helps to assess the relationship between the dependent and independent variables. | ||
The purpose of statistical inference is to estimate the uncertainty, or sample-to-sample variation. | ||
It allows us to provide a probable range of values for the true value of a quantity in the population. | ||
The components used for making statistical inferences are: | ||
\begin{itemize} | ||
\item Sample Size | ||
\item Variability in the sample | ||
\item Size of the observed differences | ||
\end{itemize} | ||
|
||
\subsection{Types of statistical inference} | ||
There are different types of statistical inferences that are extensively used for making conclusions. | ||
They are: | ||
\begin{itemize} | ||
\item One sample hypothesis testing | ||
\item Confidence Interval | ||
\item Pearson Correlation | ||
\item Bi-variate regression | ||
\item Multi-variate regression | ||
\item Chi-square statistics and contingency table | ||
\item ANOVA or T-test | ||
\end{itemize} | ||
|
||
\section{Estimate | ||
Distribution Parameters} | ||
|
||
Until now we have studied Probability, proceeding as follows: we assumed parameters of all distributions to be known and, based on this, computed probabilities | ||
of various outcomes (in a random experiment). In this chapter we make the essential transition to Statistics, which is concerned with the exact opposite: the | ||
random experiment is performed (usually many times) and the individual outcomes | ||
recorded; based on these, we want to estimate values of the distribution parameters | ||
(one or more). | ||
|
||
\begin{definition}{}{} | ||
If the random variables $\vect{X}_1,\vect{X}_2,..., \vect{X}_n$ are iid, then they constitute | ||
a random independent sample (RIS) of size $n$ from the population $\bold{X}$. | ||
\end{definition} | ||
|
||
\begin{definition}{}{} | ||
Let $T = T(\vect{X}_1,\vect{X}_2,..., \vect{X}_n)$ be a function of the sample | ||
$\vect{X}_1,\vect{X}_2,..., \vect{X}_n$. Then $T$ is called a statistic. | ||
\end{definition} | ||
\begin{remark} | ||
Once the sample is drawn, $t=T(\vect{x}_1,\vect{x}_2,..., \vect{x}_n)$ is called the | ||
realization of $T$, where $\vect{x}_1,\vect{x}_2,..., \vect{x}_n$ are the observed values of the sample. | ||
\end{remark} | ||
|
||
\begin{example}{}{} | ||
How should we estimate the mean $\mu$ of a Normal distribution | ||
$N(\mu, \sigma^2)$, based on a RIS of size $n$? We would probably take $\overline{X}$ | ||
(the sample mean) to be a `reasonable' estimator of $\mu$ | ||
[note that this name applies to the random variable $\overline{X}$, | ||
with all its potential (would-be) values; as soon as | ||
the experiment is completed and a particular value of $\overline{X}$ | ||
is recorded, this value (i.e. a specific number) is called an estimate of $\mu$]. | ||
\end{example} | ||
|
||
There are a few related issues we have to sort out: | ||
\begin{itemize} | ||
\item How do we know that $\overline{X}$ is a `good' estimator of $\mu$, i.e. | ||
is there some sensible set of criteria which would enable us to judge the quality of individual | ||
estimators? | ||
\item Using these criteria, can we then find the best estimator of a parameter, at | ||
least in some restricted sense? | ||
\item Would it not be better to use, instead of a single number [the so-called | ||
{\it point estimate}, which can never precisely agree with the exact value of | ||
the unknown parameter, and is thus in this sense always wrong], an interval | ||
of values which may have a good chance of containing the correct answer? | ||
\end{itemize} | ||
The rest of this section tackles the first two issues. We start with | ||
|
||
\section{Confidence intervals} | ||
The last section considered the issue of so-called {\it point estimates} (good, better | ||
and best), but one can easily see that, even for the best of these, a statement which | ||
claims a parameter, say $\mu$, to be close to $8.3$, is not very informative, unless we | ||
can specify what `close' means. This is the purpose of a confidence interval, | ||
which requires quoting the estimate together with specific limits, e.g. $8.3\pm 0.1$ (or | ||
$8.2 \leftrightarrow 8.4$, using an interval form). | ||
\par | ||
The limits are established to meet a certain (usually 95\%) level of confidence | ||
(not a probability, since the statement does not involve any randomness -- we are either 100\% right, or 100\% wrong!). | ||
The level of confidence ($1-\alpha$ in general) corresponds to the original, a priori | ||
probability (i.e. before the sample is even taken) of the procedure to get it right | ||
(the probability is, as always, in the random sampling). To be able to calculate | ||
this probability exactly, we must know what distribution we are sampling from. | ||
So, until further notice, we will assume that the distribution is Normal. | ||
|
||
\subsection{Confidence interval for mean $\mu$} | ||
|
||
\begin{theorem}{Sums of Independent Normal Random Variables}{} | ||
If $\vect{X}_1,\vect{X}_2,...,\vect{X}_n$ are mutually independent normal random variables with means | ||
$\mu_1,\mu_2,...,\mu_n$ and variances $\sigma_1^2,\sigma_2^2,...,\sigma_n^2$, | ||
then, for any constants $c_1,c_2,...,c_n$, the linear combination: | ||
\begin{align*} | ||
\vect{Y} = \sum\limits_{i=1}^{n} c_i\vect{X}_i | ||
\end{align*} | ||
follows the normal distribution: | ||
\begin{align*} | ||
N\left(\sum\limits_{i=1}^{n} c_i\mu_i,\sum\limits_{i=1}^{n} c_i^2\sigma_i^2\right) | ||
\end{align*} | ||
\end{theorem} | ||
|
||
\begin{corollary}{}{} | ||
Let $\vect{X}_1,\vect{X}_2,...,\vect{X}_n$ be a random sample of size $n$ from a | ||
$N(\mu,\sigma^2)$ population, where | ||
\begin{itemize} | ||
\item $\overline{X}=\frac{1}{n} \sum\limits_{i=1}^{n} \vect{X}_i$ is the sample mean of the $n$ observations, and | ||
\item $S^2=\frac{1}{n-1}\sum\limits_{i=1}^{n}(\vect{X}_i-\overline{X})^2$ is the sample variance of the $n$ observations. | ||
\end{itemize} | ||
Then: \\ | ||
(1) $\overline{X}\sim N(\mu,\frac{\sigma^2}{n})$;\\ | ||
(2) $\frac{(n-1)S^2}{\sigma^2}\sim \chi^2(n-1)$;\\ | ||
(3) $\overline{X}$ and $S^2$ are independent. | ||
\end{corollary} | ||
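{\it A short check of part (1), sketched here as an illustration:} taking $c_i=\frac{1}{n}$, $\mu_i=\mu$ and $\sigma_i^2=\sigma^2$ in the theorem on sums of independent normal random variables gives | ||
\begin{align*} | ||
\overline{X}=\sum\limits_{i=1}^{n}\frac{1}{n}\vect{X}_i\sim N\left(\sum\limits_{i=1}^{n}\frac{\mu}{n},\sum\limits_{i=1}^{n}\frac{\sigma^2}{n^2}\right)=N\left(\mu,\frac{\sigma^2}{n}\right). | ||
\end{align*} | ||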
|
||
We first assume that, even though $\mu$ is to be estimated (being unknown), we still | ||
know the exact (population) value of $\sigma$ (based on past experience). | ||
We know that, by part (1) of the corollary above, $\overline{X}\sim N(\mu,\frac{\sigma^2}{n})$. | ||
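{\it A minimal sketch of the resulting interval for known $\sigma$; the symbol $z_{\alpha/2}$ (the upper $\alpha/2$ quantile of $N(0,1)$) is notation assumed here, not taken from the notes.} | ||
Standardizing $\overline{X}\sim N(\mu,\frac{\sigma^2}{n})$ gives $Z=\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$, so before the sample is taken | ||
\begin{align*} | ||
1-\alpha=P\left(-z_{\alpha/2}<\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}<z_{\alpha/2}\right) | ||
=P\left(\overline{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}<\mu<\overline{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right). | ||
\end{align*} | ||
Once $\overline{x}$ is observed, the $(1-\alpha)$ confidence interval is $\overline{x}\pm z_{\alpha/2}\,\sigma/\sqrt{n}$; for the usual $95\%$ level, $z_{0.025}\approx 1.96$. | ||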
|
||
\section{Testing hypotheses} | ||
Suppose now that, instead of trying to estimate | ||
|
||
|
||
\section{Homework} | ||
\begin{exercise}{4.5.8}{} | ||
Let us say the life of a tire in miles, say $X$, | ||
is normally distributed with mean | ||
$\theta$ and standard deviation $5000$. | ||
Past experience indicates that $\theta = 30,000$. | ||
The manufacturer claims that the tires made by a new process have mean $\theta>30,000$. | ||
It is possible that $\theta=35,000$. | ||
Check his claim by testing $H_0:\theta=30,000$ against | ||
$H_1:\theta>30,000$. We observe $n$ independent values of $X$, | ||
say $x_1,...,x_n$, and we reject $H_0$ (thus accept $H_1$) | ||
if and only if $\overline{x}\geqs c$. | ||
Determine $n$ and $c$ so that the power function $\gamma(\theta)$ of the test | ||
has the values $\gamma(30,000)=0.01$ and $\gamma(35,000)=0.98$. | ||
\end{exercise} | ||
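{\it A possible solution sketch (not part of the original exercise); $\Phi$ denotes the standard normal cdf, and the quantiles $2.326$ and $2.054$ are rounded table values.} | ||
Under $\theta$, $\overline{X}\sim N(\theta,5000^2/n)$, so $\gamma(\theta)=1-\Phi\left(\frac{c-\theta}{5000/\sqrt{n}}\right)$. The two requirements give | ||
\begin{align*} | ||
\frac{c-30,000}{5000/\sqrt{n}}\approx 2.326, \qquad \frac{c-35,000}{5000/\sqrt{n}}\approx -2.054. | ||
\end{align*} | ||
Subtracting the second equation from the first gives $\sqrt{n}\approx 4.380$, i.e. $n\approx 19.2$, and then $c\approx 30,000+2.326\cdot 5000/4.380\approx 32,655$. | ||
Since $n$ must be an integer, one option is to round up to $n=20$ and recompute $c$ from the size condition, $c\approx 30,000+2.326\cdot 5000/\sqrt{20}\approx 32,600$, which keeps $\gamma(35,000)$ slightly above $0.98$. | ||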
|
||
\begin{exercise}{4.5.11}{} | ||
Let $Y_1<Y_2<Y_3<Y_4$ be the order statistics of a random sample of size | ||
$n=4$ from a distribution with pdf $f(x;\theta)=1/\theta,0<x<\theta$, zero elsewhere, | ||
where $0<\theta$. The hypothesis $H_0:\theta=1$ is rejected and $H_1:\theta>1$ is accepted if | ||
the observed $Y_4\geqs c$.\\ | ||
(a) Find the constant $c$ so that the significance level is $\alpha=0.05$.\\ | ||
(b) Determine the power function of the test. | ||
\end{exercise} | ||
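{\it A possible solution sketch (not part of the original exercise).} | ||
Since $Y_4$ is the sample maximum, $P_\theta(Y_4<c)=(c/\theta)^4$ for $0<c\leq\theta$.\\ | ||
(a) Under $H_0$, $\alpha=P_{\theta=1}(Y_4\geq c)=1-c^4=0.05$, so $c=(0.95)^{1/4}\approx 0.987$.\\ | ||
(b) For $\theta\geq c$, | ||
\begin{align*} | ||
\gamma(\theta)=P_\theta(Y_4\geq c)=1-\left(\frac{c}{\theta}\right)^4=1-\frac{0.95}{\theta^4}, | ||
\end{align*} | ||
and $\gamma(\theta)=0$ for $0<\theta<c$. | ||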
|
||
\begin{exercise}{4.6.5}{} | ||
On page 373 Rasmussen (1992) discussed a paired design. A baseball coach | ||
paired $20$ members of his team by their speed; i.e., each member of the pair has | ||
about the same speed. Then for each, he randomly chose one member of the | ||
pair and told him that if he could beat his best time in circling the bases he would | ||
give him an award (call this response the time of the ``self'' member). For the other | ||
member of the pair the coach's instruction was an award if he could beat the time | ||
of the other member of the pair (call this response the time of the ``rival'' member). | ||
Each member of the pair knew who his rival was. The data are given below, but are | ||
also in the file {\tt selfrival.rda}. Let $\mu_d$ be the true difference in times (rival minus | ||
self) for a pair. The hypotheses of interest are $H_0:\mu_d=0$ versus $H_1:\mu_d<0$. The | ||
data are in order by pairs, so do not mix the order.\\ | ||
{\tt | ||
\quad self: 16.20 16.78 17.38 17.59 17.37 17.49 18.18 18.16 18.36 18.53 | ||
15.92 16.58 17.57 16.75 17.28 17.32 17.51 17.58 18.26 17.87\\ | ||
\quad rival: 15.95 16.15 17.05 16.99 17.34 17.53 17.34 17.51 18.10 18.19 | ||
16.04 16.80 17.24 16.81 17.11 17.22 17.33 17.82 18.19 17.88} \\ | ||
(a) Obtain comparison boxplots of the data. Comment on the comparison plots. | ||
Are there any outliers?\\ | ||
(b) Compute the paired $t$-test and obtain the $p$-value. Are the data significant at the | ||
$5\%$ level of significance?\\ | ||
(c) Obtain a point estimate of $\mu_d$ and a $95\%$ confidence interval for it.\\ | ||
(d) Conclude in terms of the problem. | ||
|
||
\end{exercise} | ||
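{\it A possible sketch for parts (b) and (c) (not part of the original exercise); the numbers below were computed by hand from the listed data and are rounded, so they should be checked with software.} | ||
With differences $d_i=\text{rival}_i-\text{self}_i$ and $n=20$, we get $\overline{d}=-4.09/20=-0.2045$ and $s_d\approx 0.300$, so | ||
\begin{align*} | ||
t=\frac{\overline{d}}{s_d/\sqrt{n}}\approx\frac{-0.2045}{0.0672}\approx -3.04, | ||
\end{align*} | ||
with $n-1=19$ degrees of freedom. The one-sided $p$-value $P(T_{19}\leq -3.04)$ is roughly $0.003$, so $H_0$ is rejected at the $5\%$ level. | ||
A $95\%$ confidence interval for $\mu_d$ is $\overline{d}\pm t_{0.025,19}\,s_d/\sqrt{n}\approx -0.2045\pm 2.093(0.0672)$, i.e. approximately $(-0.345,\,-0.064)$. | ||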
|
||
|
||
|
||
|
||
\section{Reference} | ||
\begin{itemize} | ||
\item \href{https://byjus.com/maths/statistical-inference/}{Statistical Inference} | ||
\item \href{https://faculty.etsu.edu/gardnerr/4047/notes-Hogg-McKean-Craig/Hogg-McKean-Craig-4-1.pdf}{Sampling and Statistics} | ||
\item \href{https://spartan.ac.brocku.ca/~jvrbik/MATH2P82/Statistics.PDF}{Chapter 3 RANDOM SAMPLING} | ||
\item \href{https://spartan.ac.brocku.ca/~jvrbik/MATH2P82/Statistics.PDF}{Chapter 5 ESTIMATING DISTRIBUTION PARAMETERS} | ||
\item \href{https://spartan.ac.brocku.ca/~jvrbik/MATH2P82/Statistics.PDF}{Chapter 6 CONFIDENCE INTERVALS} | ||
\item \href{https://spartan.ac.brocku.ca/~jvrbik/MATH2P82/Statistics.PDF}{Chapter 7 TESTING HYPOTHESES} | ||
\item \href{https://cloud.moezx.cc/Document/mooc/%E6%B5%99%E5%A4%A7%E6%A6%82%E7%8E%87%E8%AE%BA/%E7%AC%AC42%E8%AE%B2%20.pdf}{Sampling distribution of a single normal population} | ||
\item \href{https://cloud.moezx.cc/Document/mooc/%E6%B5%99%E5%A4%A7%E6%A6%82%E7%8E%87%E8%AE%BA/%E7%AC%AC43%E8%AE%B2%20.pdf}{Sampling distribution of two normal populations} | ||
\item \href{https://online.stat.psu.edu/stat414/lesson/26/26.2}{Sampling Distribution of Sample Mean} | ||
\item \href{https://online.stat.psu.edu/stat415/lesson/25}{Power of a Statistical Test} | ||
\item \href{https://tomoki-okuno.com/files/math/Ch4_sol.pdf}{ex4.5.8} | ||
\item \href{https://tomoki-okuno.com/files/math/Ch4_sol.pdf}{ex4.5.11} | ||
\item \href{https://zhuanlan.zhihu.com/p/570096188}{ex4.6.5} | ||
\end{itemize} |