Hypothesis Testing

Figure: logical roadmap of this article.

A hypothesis test is a procedure that allows us to "confidently" reject a hypothesis if it is clearly statistically inconsistent with data.

1. Basic Concepts

1.1 The Four Elements and Basic Procedure of a Hypothesis Test

The four elements:

  • Null hypothesis H_0
  • Alternative hypothesis H_1 or H_a
  • Test statistic
  • Rejection region

The basic procedure:

  • State the null and alternative hypotheses according to the goal of the experiment.
  • Collect the experimental data and compute the test statistic.
  • If the test statistic falls in the rejection region, reject the null hypothesis; otherwise, fail to reject it.

Note 1: On H_0 and H_1

  • H_0 and H_1 are not complementary or symmetric. H_1 may include any statement about the population distribution under which H_0 does not hold.
  • In practice, the hypothesis we wish to reject is usually taken as H_0, and the hypothesis we wish to support as H_1.

This is because hypothesis tests are designed to avoid rejecting H_0 when it is true. Therefore, when the test rejects H_0, one can be quite sure that H_0 is false. This is related to the "two types of errors in hypothesis testing" discussed below.

Note 2: On the Test Statistic

  • Different test statistics are used depending on what is being tested (mean, variance, ...).
  • Their theoretical justification comes from the Central Limit Theorem, properties of the normal distribution, the likelihood ratio test, Pearson's \chi^2 test, etc.; details are given below.

Note 3: On the Rejection Region

  • The rejection region is a set computed under the assumption that H_0 is true, based on a pre-specified significance level \alpha (e.g. 0.05, 0.01) and the form of H_1.
  • It represents an event that has small probability when H_0 holds.
  • If this low-probability event nevertheless occurs, it is strong evidence that H_0 is false, so we reject the null hypothesis.

1.2 The Two Types of Errors in Hypothesis Testing

  • Type I Error: rejecting H_0 when it is true
    \circ Avoiding this type of error is the primary concern
    \circ \alpha denotes the probability of committing this type of error
    \circ \alpha is also called the significance level

  • Type II Error: not rejecting H_0 when it is false
    \circ \beta denotes the probability of committing this type of error
    \circ 1 - \beta is called the power of the test

The relationship between Type I and Type II errors:
We can always reduce the Type I error by making the rejection region smaller, but this typically comes at the expense of a larger Type II error.
In practice, we want tests that are as powerful as possible for a given Type I error.

1.3 P-values

The P-value is the smallest \alpha at which the observed data (once the random experiment has been carried out) lead to rejection of H_0:
\text{P-value} = Pr(\text{observing an outcome at least as extreme as the data you observed} \mid H_0)
A smaller P-value gives stronger evidence against the null hypothesis.
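As a minimal numerical sketch (all numbers here are made up, not from the text): for a large-sample one-sided test of the mean, the P-value is just the tail probability of the standardized statistic under H_0.

```python
# Minimal sketch: P-value of a one-sided large-sample z-test (hypothetical numbers).
import numpy as np
from scipy.stats import norm

n, xbar, mu0, sigma = 50, 10.4, 10.0, 1.5   # hypothetical sample summary
z = np.sqrt(n) * (xbar - mu0) / sigma        # standardized test statistic
p_value = norm.sf(z)                         # P(Z > z) under H_0, for H_1: mu > mu0
print(z, p_value)                            # reject H_0 at level alpha if p_value < alpha
```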

2. Common Hypothesis Tests and Their Underlying Theory

2.1 Central Limit Theorem

X_1, ..., X_n are independently and identically distributed, with E(X_i) = \mu and Var(X_i) = \sigma^2 known. Then
\bar{X} \overset{approx} \sim N(\mu, \frac{\sigma^2}{n}) \Leftrightarrow \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \overset{approx} \sim N(0,1)
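A quick simulation sketch (an assumed setup with exponential data, chosen because it is clearly non-normal) illustrates the approximation:

```python
# Sketch: the standardized sample mean of non-normal data is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 100, 10_000        # Exp(1) has mean 1 and variance 1
z = np.array([
    np.sqrt(n) * (rng.exponential(scale=1.0, size=n).mean() - mu) / sigma
    for _ in range(reps)
])
print(z.mean(), z.std())                          # close to 0 and 1, as the CLT predicts
```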

2.1.1 Large-Sample Test for the Mean (n \ge 30)
  • Hypotheses: To test the hypothesis
    H_0: \mu = \mu_0
    against one of these alternative hypotheses:
    H_1: \mu > \mu_0; or
    H_1: \mu < \mu_0; or
    H_1: \mu \ne \mu_0

  • Test statistic: \bar{X}

  • Rejection region (RR):
    Define z_\alpha as \alpha = P(Z > z_{\alpha}) where Z \sim N(0,1). Then
    (1) for H_1: \mu > \mu_0, the RR is \bar{X} > \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}}
    (2) for H_1: \mu < \mu_0, the RR is \bar{X} < \mu_0 - z_{\alpha} \frac{\sigma}{\sqrt{n}}
    (3) for H_1: \mu \ne \mu_0, the RR is |\bar{X} - \mu_0|> z_{\alpha / 2} \frac{\sigma}{\sqrt{n}}

Note: If the population variance \sigma^2 is unknown, you can replace it by the sample variance S^2 = \frac{\sum_{i = 1}^{n}(X_i - \bar{X})^2}{n-1}, since n is large.
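A minimal sketch of the two-sided version of this test, with hypothetical data and \sigma^2 replaced by S^2 as in the note above:

```python
# Sketch: two-sided large-sample test of H_0: mu = mu0 at level alpha (hypothetical data).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=10.3, scale=2.0, size=60)    # hypothetical sample with n >= 30
mu0, alpha = 10.0, 0.05

n, xbar, s = len(x), x.mean(), x.std(ddof=1)    # S uses the n - 1 denominator
z_crit = norm.ppf(1 - alpha / 2)                # z_{alpha/2}
reject = abs(xbar - mu0) > z_crit * s / np.sqrt(n)
print(xbar, reject)                             # True => xbar falls in the rejection region
```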

2.1.2 Small-Sample Test for the Mean (n < 30)

For a small sample (with \sigma estimated by S), the normal distribution in the CLT above is replaced by a t distribution, i.e.
\sqrt{n}\frac{\bar{X} - \mu}{S} \overset{approx} \sim t_{n-1}

  • Hypotheses: same as above

  • Test statistic: \bar{X}

  • Rejection region:
    Define t_{\alpha, n-1} as \alpha = P(T > t_{\alpha, n-1}) where T \sim t_{n-1}. Then
    (1) for H_1: \mu > \mu_0, the RR is \bar{X} > \mu_0 + t_{\alpha, n-1} \frac{S}{\sqrt{n}}
    (2) for H_1: \mu < \mu_0, the RR is \bar{X} < \mu_0 - t_{\alpha, n-1} \frac{S}{\sqrt{n}}
    (3) for H_1: \mu \ne \mu_0, the RR is |\bar{X} - \mu_0|> t_{\alpha / 2, n-1} \frac{S}{\sqrt{n}}
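A minimal sketch with hypothetical data; scipy's one-sample t-test is shown alongside the manual rejection-region check and reaches the same two-sided decision:

```python
# Sketch: two-sided small-sample t-test of H_0: mu = mu0 (hypothetical data, n < 30).
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(2)
x = rng.normal(loc=5.5, scale=1.0, size=15)
mu0, alpha = 5.0, 0.05

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)             # t_{alpha/2, n-1}
reject = abs(xbar - mu0) > t_crit * s / np.sqrt(n)

stat, p_value = ttest_1samp(x, popmean=mu0)         # the same test via scipy
print(reject, p_value < alpha)                      # the two decisions agree
```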

2.2 Properties of the Normal Distribution

X_1, ..., X_n \overset{iid} \sim N(\mu, \sigma^2), then
\sqrt{n}\frac{\bar{X} - \mu}{\sigma} \sim N(0,1), \quad \sqrt{n}\frac{\bar{X} - \mu}{S} \sim t_{n-1}, \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}

2.2.1 Test for the Mean of a Normal Distribution

The procedure is the same as the large-sample test for the mean in 2.1.1.

2.2.2 Test for the Variance of a Normal Distribution
  • Hypotheses: To test the hypothesis
    H_0: \sigma = \sigma_0
    against one of these alternatives:
    H_1: \sigma > \sigma_0
    H_1: \sigma < \sigma_0
    H_1: \sigma \ne \sigma_0

  • Test statistic: \frac{(n-1)S^2}{\sigma_0^2}

  • Rejection region:
    Define \chi^2_{\alpha, L, n-1} and \chi^2_{\alpha, R, n-1} as
    P(W \le \chi^2_{\alpha, L, n-1}) = \alpha \quad \text{and}\quad P(W \ge \chi^2_{\alpha, R, n-1}) = \alpha
    where W \sim \chi^2_{n-1}. Then

(1) for H_1: \sigma > \sigma_0, the RR is \frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha, R, n-1}

(2) for H_1: \sigma < \sigma_0, the RR is \frac{(n-1)S^2}{\sigma_0^2} < \chi^2_{\alpha, L, n-1}

(3) for H_1: \sigma \ne \sigma_0, the RR is
\frac{(n-1)S^2}{\sigma_0^2} < \chi^2_{\alpha/2, L, n-1} or \frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha/2, R, n-1}
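A minimal sketch of the two-sided variance test with hypothetical normal data:

```python
# Sketch: two-sided test of H_0: sigma = sigma0 for normal data (hypothetical numbers).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.3, size=25)
sigma0, alpha = 1.0, 0.05

n = len(x)
w = (n - 1) * x.var(ddof=1) / sigma0**2     # test statistic (n-1)S^2 / sigma0^2
lo = chi2.ppf(alpha / 2, df=n - 1)          # chi^2_{alpha/2, L, n-1}
hi = chi2.ppf(1 - alpha / 2, df=n - 1)      # chi^2_{alpha/2, R, n-1}
print(w, w < lo or w > hi)                  # True => reject H_0
```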

2.3 Likelihood Ratio Tests

X = (X_1, ..., X_n) \overset{iid} \sim f(x|\theta), then we have

(1) The likelihood of \theta = \theta_0 is
L(\theta_0) = Likelihood (\theta_0 | X) = f(x| \theta_0) = f(x_1 | \theta_0) \times ... \times f(x_n | \theta_0)

(2) Suppose H_0: \theta \in \Omega_0, H_1: \theta \in \Omega_1
where \Omega_0, \Omega_1 are some sets of possible parameter values and \Omega = \Omega_0 \cup \Omega_1.

Define generalized likelihood ratio as
\Lambda(X) = \frac{max_{\theta \in \Omega_0} f(X|\theta)}{max_{\theta \in \Omega}f(X|\theta)} \Rightarrow -2 log(\Lambda) \overset{approx} \sim \chi^2_{d-d_0}
where d is the dimension of parameter space \Omega and d_0 is the dimension of parameter space \Omega_0

Note: Computing \Lambda(X) involves the maximum likelihood estimator (MLE).

  • Hypotheses: H_0: \theta \in \Omega_0, H_1: \theta \in \Omega_1

  • Test statistic: \Lambda(X) = \frac{max_{\theta \in \Omega_0} f(X|\theta)}{max_{\theta \in \Omega}f(X|\theta)}

  • Rejection region: -2 log(\Lambda) > \chi^2_{\alpha, R, d-d_0}
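A concrete sketch (an assumed example, not from the text): testing H_0: \lambda = \lambda_0 for exponential data, where \Omega_0 is a single point (d_0 = 0) and \Omega is one-dimensional (d = 1), so -2 log(\Lambda) is compared with \chi^2_1.

```python
# Sketch: generalized LRT of H_0: lambda = lambda0 for Exp(lambda) data (assumed example).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.exponential(scale=1 / 1.5, size=100)   # data with true rate 1.5 (scale = 1/rate)
lam0, alpha = 1.0, 0.05

n, sx = len(x), x.sum()
lam_hat = n / sx                               # unrestricted MLE of lambda

def loglik(lam):
    return n * np.log(lam) - lam * sx          # exponential log-likelihood

stat = -2 * (loglik(lam0) - loglik(lam_hat))   # -2 log Lambda
crit = chi2.ppf(1 - alpha, df=1)               # df = d - d0 = 1 - 0 = 1
print(stat, stat > crit)                       # True => reject H_0
```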

2.4 Pearson's \chi^2 Test

2.4.1 \chi^2 test of multinomial data

Suppose each individual's category is a multinomial draw with probability p = \{p_1, ..., p_m\}.

Let O_1, ..., O_m be the number of observed individuals in each category, so O_i \ge 0 and O_1 + ... + O_m = n.

Let \triangle_m be the simplex, i.e. \{p \in R^m: p \ge 0, \sum_i p_i = 1\}.

The maximum likelihood estimator (MLE) over all \triangle_m is: \hat{p}_i = \frac{O_i}{n}

H_0: p \in \Omega_0 vs H_1: p \in \triangle_m \backslash \Omega_0

Under H_0 and using MLE, we can get the expected number for each category as E_1, ..., E_m. Then

X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \overset{approx} \sim \chi^2_{m - 1 - (\# \text{ parameters fitted for } H_0)}

  • Hypotheses: H_0: p \in \Omega_0, H_1: p \in \triangle_m \backslash \Omega_0

  • Test statistic: X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}

  • Rejection region: X^2 > \chi^2_{\alpha, R, df}

Note: While we could apply a likelihood ratio test here, Pearson's \chi^2 test has a bit more power.
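A minimal sketch (assumed example: testing whether a six-sided die is fair, so H_0 fixes p and fits no free parameters, giving df = m - 1 = 5):

```python
# Sketch: Pearson chi-square goodness-of-fit test for a fair six-sided die (made-up counts).
import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([18, 22, 16, 25, 24, 15])   # O_i, with n = 120
expected = np.full(6, observed.sum() / 6)       # E_i under H_0: every p_i = 1/6

x2 = ((observed - expected) ** 2 / expected).sum()
p_value = chi2.sf(x2, df=len(observed) - 1)     # m - 1 - 0 fitted parameters = 5
print(x2, p_value)

print(chisquare(observed))                      # scipy gives the same statistic and P-value
```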

2.4.2 \chi^2 test of independence

This tests whether two categorical variables are independent of each other.

Suppose we have observed an r \times c contingency table.

        Col 1    Col 2    ...   Col c
Row 1   O_{11}   O_{12}   ...   O_{1c}
Row 2   O_{21}   O_{22}   ...   O_{2c}
...     ...      ...      ...   ...
Row r   O_{r1}   O_{r2}   ...   O_{rc}
  • Hypotheses:

H_0: row and column variables are independent.
H_1: row and column variables are dependent.

Under H_0 we have the following table of cell probabilities:

        Col 1                  Col 2                  ...   Col c
Row 1   p_{11} = p_1^R p_1^C   p_{12} = p_1^R p_2^C   ...   p_{1c} = p_1^R p_c^C
Row 2   p_{21} = p_2^R p_1^C   p_{22} = p_2^R p_2^C   ...   p_{2c} = p_2^R p_c^C
...     ...                    ...                    ...   ...
Row r   p_{r1} = p_r^R p_1^C   p_{r2} = p_r^R p_2^C   ...   p_{rc} = p_r^R p_c^C

The MLEs for H_0 are
\hat{p}_i^R = \frac{ \# \text{ samples in row } i}{n}, \quad \hat{p}_j^C = \frac{ \# \text{ samples in col } j}{n}

Then the expected count for each cell is E_{ij} = n \hat{p}_i^R \hat{p}_j^C.

  • Test statistic: X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

  • Rejection region: X^2 > \chi^2_{\alpha, R, df}
    with df = rc - 1 - (r-1) - (c-1) = (r-1)(c-1)
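A minimal sketch on a hypothetical 2 x 3 table; scipy.stats.chi2_contingency returns the statistic, P-value, df = (r-1)(c-1), and expected counts directly:

```python
# Sketch: chi-square test of independence for a hypothetical 2 x 3 contingency table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 15, 25],
                     [20, 25, 35]])              # O_ij: rows x columns of observed counts

stat, p_value, df, expected = chi2_contingency(observed)
print(stat, p_value, df)                         # df = (r-1)(c-1) = 2
print(expected)                                  # E_ij estimated from the row/column MLEs
```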
