Hypothesis Testing

Figure: logical roadmap of this article.

A hypothesis test is a procedure that allows us to "confidently" reject a hypothesis if it is clearly statistically inconsistent with data.

1. Basic Concepts

1.1 The Four Elements and Basic Procedure of a Hypothesis Test

The four elements:

  • Null hypothesis H_0
  • Alternative hypothesis H_1 or H_a
  • Test statistic
  • Rejection region

The basic procedure:

  • State the null and alternative hypotheses according to the goal of the experiment.
  • Collect the experimental data and compute the test statistic.
  • If the test statistic falls in the rejection region, reject the null hypothesis; otherwise, fail to reject it.

Note 1: On H_0 and H_1

  • H_0 and H_1 are not complementary or symmetric. H_1 may include any statement about the population distribution under which H_0 does not hold.
  • In practice, the hypothesis we wish to reject is usually taken as H_0, and the hypothesis we wish to support as H_1.

This is because hypothesis tests are designed to avoid rejecting H_0 when it is true. Therefore, when the test rejects H_0, one can be quite sure that H_0 is false. This is related to the "two types of errors in hypothesis testing" discussed below.

Note 2: On the Test Statistic

  • Different test statistics are used depending on what is being tested (mean, variance, ...).
  • Their theoretical justification comes from the Central Limit Theorem, properties of the normal distribution, the likelihood ratio test, Pearson's \chi^2 test, etc.; details are given below.

Note 3: On the Rejection Region

  • The rejection region is a set computed under the assumption that H_0 is true, based on a pre-specified significance level \alpha (e.g. 0.05, 0.01) and the form of H_1.
  • It represents an event that has small probability when H_0 holds.
  • If this low-probability event nevertheless occurs, it is strong evidence that H_0 is false, so we reject the null hypothesis.

1.2 The Two Types of Errors in Hypothesis Testing

  • Type I Error: rejecting H_0 when it is true
    \circ Avoiding this type of error is the primary concern
    \circ \alpha denotes the probability of committing this type of error
    \circ \alpha is also called the significance level

  • Type II Error: not rejecting H_0 when it is false
    \circ \beta denotes the probability of committing this type of error
    \circ 1 - \beta is called the power of the test

The relationship between Type I and Type II errors:
We can always reduce the Type I error by making the rejection region smaller, but this typically comes at the expense of a larger Type II error.
In practice, we want tests that are as powerful as possible for a given Type I error.

1.3 P-values

The P-value is the smallest \alpha at which the observed data (once the random experiment has been carried out) lead to rejection of H_0:
\text{P-value} = Pr(\text{observing an outcome at least as extreme as the data you observed} \mid H_0)
A smaller P-value gives stronger evidence against the null hypothesis.
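As a minimal numerical sketch (all numbers here are made up, not from the text): for a large-sample one-sided test of the mean, the P-value is just the tail probability of the standardized statistic under H_0.

```python
# Minimal sketch: P-value of a one-sided large-sample z-test (hypothetical numbers).
import numpy as np
from scipy.stats import norm

n, xbar, mu0, sigma = 50, 10.4, 10.0, 1.5   # hypothetical sample summary
z = np.sqrt(n) * (xbar - mu0) / sigma        # standardized test statistic
p_value = norm.sf(z)                         # P(Z > z) under H_0, for H_1: mu > mu0
print(z, p_value)                            # reject H_0 at level alpha if p_value < alpha
```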

2. Common Hypothesis Tests and Their Underlying Theory

2.1 Central Limit Theorem

X_1, ..., X_n are independently and identically distributed, with E(X_i) = \mu and Var(X_i) = \sigma^2 known. Then
\bar{X} \overset{approx} \sim N(\mu, \frac{\sigma^2}{n}) \Leftrightarrow \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \overset{approx} \sim N(0,1)
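A quick simulation sketch (an assumed setup with exponential data, chosen because it is clearly non-normal) illustrates the approximation:

```python
# Sketch: the standardized sample mean of non-normal data is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 100, 10_000        # Exp(1) has mean 1 and variance 1
z = np.array([
    np.sqrt(n) * (rng.exponential(scale=1.0, size=n).mean() - mu) / sigma
    for _ in range(reps)
])
print(z.mean(), z.std())                          # close to 0 and 1, as the CLT predicts
```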

2.1.1 Large-Sample Test for the Mean (n \ge 30)
  • Hypotheses: To test the hypothesis
    H_0: \mu = \mu_0
    against one of these alternative hypotheses:
    H_1: \mu > \mu_0; or
    H_1: \mu < \mu_0; or
    H_1: \mu \ne \mu_0

  • Test statistic: \bar{X}

  • Rejection region (RR):
    Define z_\alpha as \alpha = P(Z > z_{\alpha}) where Z \sim N(0,1). Then
    (1) for H_1: \mu > \mu_0, the RR is \bar{X} > \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}}
    (2) for H_1: \mu < \mu_0, the RR is \bar{X} < \mu_0 - z_{\alpha} \frac{\sigma}{\sqrt{n}}
    (3) for H_1: \mu \ne \mu_0, the RR is |\bar{X} - \mu_0|> z_{\alpha / 2} \frac{\sigma}{\sqrt{n}}

Note: If the population variance \sigma^2 is unknown, you can replace it by the sample variance S^2 = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})^2}{n-1}, since n is large.
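A minimal sketch of the two-sided version of this test, with hypothetical data and \sigma^2 replaced by S^2 as in the note above:

```python
# Sketch: two-sided large-sample test of H_0: mu = mu0 at level alpha (hypothetical data).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=10.3, scale=2.0, size=60)    # hypothetical sample with n >= 30
mu0, alpha = 10.0, 0.05

n, xbar, s = len(x), x.mean(), x.std(ddof=1)    # S uses the n - 1 denominator
z_crit = norm.ppf(1 - alpha / 2)                # z_{alpha/2}
reject = abs(xbar - mu0) > z_crit * s / np.sqrt(n)
print(xbar, reject)                             # True => xbar falls in the rejection region
```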

2.1.2 Small-Sample Test for the Mean (n < 30)

For a small sample (with \sigma estimated by S), the normal distribution in the CLT above is replaced by a t distribution, i.e.
\sqrt{n}\frac{\bar{X} - \mu}{S} \overset{approx} \sim t_{n-1}

  • Hypotheses: same as above

  • Test statistic: \bar{X}

  • Rejection region:
    Define t_{\alpha, n-1} as \alpha = P(T > t_{\alpha, n-1}) where T \sim t_{n-1}. Then
    (1) for H_1: \mu > \mu_0, the RR is \bar{X} > \mu_0 + t_{\alpha, n-1} \frac{S}{\sqrt{n}}
    (2) for H_1: \mu < \mu_0, the RR is \bar{X} < \mu_0 - t_{\alpha, n-1} \frac{S}{\sqrt{n}}
    (3) for H_1: \mu \ne \mu_0, the RR is |\bar{X} - \mu_0|> t_{\alpha / 2, n-1} \frac{S}{\sqrt{n}}
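A minimal sketch with hypothetical data; scipy's one-sample t-test is shown alongside the manual rejection-region check and reaches the same two-sided decision:

```python
# Sketch: two-sided small-sample t-test of H_0: mu = mu0 (hypothetical data, n < 30).
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(2)
x = rng.normal(loc=5.5, scale=1.0, size=15)
mu0, alpha = 5.0, 0.05

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)             # t_{alpha/2, n-1}
reject = abs(xbar - mu0) > t_crit * s / np.sqrt(n)

stat, p_value = ttest_1samp(x, popmean=mu0)         # the same test via scipy
print(reject, p_value < alpha)                      # the two decisions agree
```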

2.2 Properties of the Normal Distribution

X_1, ..., X_n \overset{iid} \sim N(\mu, \sigma^2), then
\sqrt{n}\frac{\bar{X} - \mu}{\sigma} \sim N(0,1), \quad \sqrt{n}\frac{\bar{X} - \mu}{S} \sim t_{n-1}, \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}

2.2.1 Test for the Mean of a Normal Distribution

The procedure is the same as the large-sample test for the mean in 2.1.1.

2.2.2 Test for the Variance of a Normal Distribution
  • Hypotheses: To test the hypothesis
    H_0: \sigma = \sigma_0
    against one of these alternatives:
    H_1: \sigma > \sigma_0
    H_1: \sigma < \sigma_0
    H_1: \sigma \ne \sigma_0

  • Test statistic: \frac{(n-1)S^2}{\sigma_0^2}

  • Rejection region:
    Define \chi^2_{\alpha, L, n-1} and \chi^2_{\alpha, R, n-1} as
    P(W \le \chi^2_{\alpha, L, n-1}) = \alpha \quad \text{and}\quad P(W \ge \chi^2_{\alpha, R, n-1}) = \alpha
    where W \sim \chi^2_{n-1}. Then

(1) for H_1: \sigma > \sigma_0, the RR is \frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha, R, n-1}

(2) for H_1: \sigma < \sigma_0, the RR is \frac{(n-1)S^2}{\sigma_0^2} < \chi^2_{\alpha, L, n-1}

(3) for H_1: \sigma \ne \sigma_0, the RR is
\frac{(n-1)S^2}{\sigma_0^2} < \chi^2_{\alpha/2, L, n-1} or \frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha/2, R, n-1}
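A minimal sketch of the two-sided variance test with hypothetical normal data:

```python
# Sketch: two-sided test of H_0: sigma = sigma0 for normal data (hypothetical numbers).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.3, size=25)
sigma0, alpha = 1.0, 0.05

n = len(x)
w = (n - 1) * x.var(ddof=1) / sigma0**2     # test statistic (n-1)S^2 / sigma0^2
lo = chi2.ppf(alpha / 2, df=n - 1)          # chi^2_{alpha/2, L, n-1}
hi = chi2.ppf(1 - alpha / 2, df=n - 1)      # chi^2_{alpha/2, R, n-1}
print(w, w < lo or w > hi)                  # True => reject H_0
```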

2.3 Likelihood Ratio Tests

X = (X_1, ..., X_n) \overset{iid} \sim f(x|\theta), then we have

(1) The likelihood of \theta = \theta_0 is
L(\theta_0) = Likelihood (\theta_0 | X) = f(x| \theta_0) = f(x_1 | \theta_0) \times ... \times f(x_n | \theta_0)

(2) Suppose H_0: \theta \in \Omega_0, H_1: \theta \in \Omega_1
where \Omega_0, \Omega_1 are some sets of possible parameter values and \Omega = \Omega_0 \cup \Omega_1.

Define generalized likelihood ratio as
\Lambda(X) = \frac{max_{\theta \in \Omega_0} f(X|\theta)}{max_{\theta \in \Omega}f(X|\theta)} \Rightarrow -2 log(\Lambda) \overset{approx} \sim \chi^2_{d-d_0}
where d is the dimension of parameter space \Omega and d_0 is the dimension of parameter space \Omega_0

Note: Computing \Lambda(X) involves the maximum likelihood estimator (MLE).

  • Hypotheses: H_0: \theta \in \Omega_0, H_1: \theta \in \Omega_1

  • Test statistic: \Lambda(X) = \frac{max_{\theta \in \Omega_0} f(X|\theta)}{max_{\theta \in \Omega}f(X|\theta)}

  • Rejection region: -2 log(\Lambda) > \chi^2_{\alpha, R, d-d_0}
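A concrete sketch (an assumed example, not from the text): testing H_0: \lambda = \lambda_0 for exponential data, where \Omega_0 is a single point (d_0 = 0) and \Omega is one-dimensional (d = 1), so -2 log(\Lambda) is compared with \chi^2_1.

```python
# Sketch: generalized LRT of H_0: lambda = lambda0 for Exp(lambda) data (assumed example).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.exponential(scale=1 / 1.5, size=100)   # data with true rate 1.5 (scale = 1/rate)
lam0, alpha = 1.0, 0.05

n, sx = len(x), x.sum()
lam_hat = n / sx                               # unrestricted MLE of lambda

def loglik(lam):
    return n * np.log(lam) - lam * sx          # exponential log-likelihood

stat = -2 * (loglik(lam0) - loglik(lam_hat))   # -2 log Lambda
crit = chi2.ppf(1 - alpha, df=1)               # df = d - d0 = 1 - 0 = 1
print(stat, stat > crit)                       # True => reject H_0
```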

2.4 Pearson's \chi^2 Test

2.4.1 \chi^2 test of multinomial data

Suppose each individual's category is a multinomial draw with probability p = \{p_1, ..., p_m\}.

Let O_1, ..., O_m be the number of observed individuals in each category, so O_i \ge 0 and O_1 + ... + O_m = n.

Let \triangle_m be the simplex, i.e. \{p \in R^m: p \ge 0, \sum_i p_i = 1\}.

The maximum likelihood estimator (MLE) over all \triangle_m is: \hat{p}_i = \frac{O_i}{n}

H_0: p \in \Omega_0 vs H_1: p \in \triangle_m \backslash \Omega_0

Under H_0 and using MLE, we can get the expected number for each category as E_1, ..., E_m. Then

X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \overset{approx} \sim \chi^2_{m - 1 - (\# \text{ parameters fitted for } H_0)}

  • Hypotheses: H_0: p \in \Omega_0, H_1: p \in \triangle_m \backslash \Omega_0

  • Test statistic: X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}

  • Rejection region: X^2 > \chi^2_{\alpha, R, df}

Note: While we could apply a likelihood ratio test here, Pearson's \chi^2 test has a bit more power.
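A minimal sketch (assumed example: testing whether a six-sided die is fair, so H_0 fixes p and fits no free parameters, giving df = m - 1 = 5):

```python
# Sketch: Pearson chi-square goodness-of-fit test for a fair six-sided die (made-up counts).
import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([18, 22, 16, 25, 24, 15])   # O_i, with n = 120
expected = np.full(6, observed.sum() / 6)       # E_i under H_0: every p_i = 1/6

x2 = ((observed - expected) ** 2 / expected).sum()
p_value = chi2.sf(x2, df=len(observed) - 1)     # m - 1 - 0 fitted parameters = 5
print(x2, p_value)

print(chisquare(observed))                      # scipy gives the same statistic and P-value
```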

2.4.2 \chi^2 test of independence

This tests whether two categorical variables are independent of each other.

Suppose we have observed an r \times c contingency table.

        Col 1    Col 2    ...   Col c
Row 1   O_{11}   O_{12}   ...   O_{1c}
Row 2   O_{21}   O_{22}   ...   O_{2c}
...     ...      ...      ...   ...
Row r   O_{r1}   O_{r2}   ...   O_{rc}
  • Hypotheses:

H_0: row and column variables are independent.
H_1: row and column variables are dependent.

Under H_0 we have the following table of cell probabilities:

        Col 1                  Col 2                  ...   Col c
Row 1   p_{11} = p_1^R p_1^C   p_{12} = p_1^R p_2^C   ...   p_{1c} = p_1^R p_c^C
Row 2   p_{21} = p_2^R p_1^C   p_{22} = p_2^R p_2^C   ...   p_{2c} = p_2^R p_c^C
...     ...                    ...                    ...   ...
Row r   p_{r1} = p_r^R p_1^C   p_{r2} = p_r^R p_2^C   ...   p_{rc} = p_r^R p_c^C

The MLEs for H_0 are
\hat{p}_i^R = \frac{ \# \text{ samples in row } i}{n}, \quad \hat{p}_j^C = \frac{ \# \text{ samples in col } j}{n}

Then the expected count for each cell is E_{ij} = n \hat{p}_i^R \hat{p}_j^C.

  • Test statistic: X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

  • Rejection region: X^2 > \chi^2_{\alpha, R, df}
    with df = rc - 1 - (r-1) - (c-1) = (r-1)(c-1)
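A minimal sketch on a hypothetical 2 x 3 table; scipy.stats.chi2_contingency returns the statistic, P-value, df = (r-1)(c-1), and expected counts directly:

```python
# Sketch: chi-square test of independence for a hypothetical 2 x 3 contingency table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 15, 25],
                     [20, 25, 35]])              # O_ij: rows x columns of observed counts

stat, p_value, df, expected = chi2_contingency(observed)
print(stat, p_value, df)                         # df = (r-1)(c-1) = 2
print(expected)                                  # E_ij estimated from the row/column MLEs
```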
