Windsor Diary 23

Importance of Cochran’s Theorem

Cochran’s theorem tells us about the distributions of partitioned sums of squares of normally distributed random variables. Traditional linear regression analysis relies on statistical claims about the distributions of such sums of squares. In the simple normal regression model:

\frac{SSE}{\sigma^2} = \frac{\sum (Y_i-\hat{Y}_i)^2}{\sigma^2} ∼ \chi^2(n-2).

Where does this come from?

- Establish the fact that the sum of squares of n i.i.d. standard normal random variables is \chi ^2(n) distributed.

- Provide intuition for Cochran’s theorem.

- Prove a lemma in support of Cochran’s theorem.

- Prove Cochran’s theorem.

- Connect Cochran’s theorem back to matrix linear regression.

Theorem 1 for \chi ^2(n): Suppose Z_i are i.i.d. N(0, 1); then \sum_{i=1}^n Z_{i}^2 ∼ \chi ^2(n).

Proof: 

Each Z_i^2 ∼ \chi ^2(1). If Y_{1:n} are i.i.d. random variables, each with MGF m_Y(t), then

the MGF of U=Y_1+Y_2+...+Y_n is m_U(t)=m_{Y_1}(t) \times m_{Y_2}(t) \times ...\times m_{Y_n}(t).

An MGF fully characterizes a distribution, and the MGF of \chi ^2(k) is {(1-2t)}^{-k/2} for t < 1/2. Since each Z_i^2 has MGF {(1-2t)}^{-1/2}, the product is {(1-2t)}^{-n/2}, which is the MGF of \chi ^2(n). ∎
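As a sanity check, Theorem 1 can be simulated. A minimal sketch, assuming NumPy and SciPy are available (the seed and sample sizes are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 200_000

# Draw reps batches of n i.i.d. N(0, 1) variables and square-sum each batch.
Z = rng.standard_normal((reps, n))
Q = (Z ** 2).sum(axis=1)

# chi^2(n) has mean n and variance 2n.
print(Q.mean(), Q.var())

# The empirical distribution should match the chi^2(n) CDF.
ks_pvalue = stats.kstest(Q, stats.chi2(df=n).cdf).pvalue
print(ks_pvalue)
```

The printed mean and variance should be close to n = 5 and 2n = 10, and the Kolmogorov–Smirnov test should not reject the χ²(n) hypothesis.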

- Quadratic forms of normal random variables are important in many branches of statistics: Least Squares, ANOVA, Regression Analysis. 

- General idea: Split the sum of squares of the observations into a sum of quadratic forms, where each corresponds to some cause of variation.

- The conclusion of Cochran’s theorem is that, under the assumption of normality, the various quadratic forms are independent and χ^2 distributed. This fact is the foundation upon which many statistical tests rest.

Preliminaries: A Common Quadratic Form

Let X ∼ N(µ, Λ). Consider the quadratic form that appears in the exponent of the normal density, (X − \mu )′Λ^{−1}(X − \mu ). In the special case of µ = 0 and Λ = I, this reduces to X′X, which, by what we just proved, is \chi ^2(n) distributed. Let’s prove that it holds in the general case.

Lemma 1: Let X ∼ N(µ, Λ), with |Λ| > 0; then (X −\mu )′Λ^{−1}(X − \mu ) ∼ χ^2(n). Proof: Since Λ is positive definite, set Z = Λ^{−1/2}(X − \mu ) ∼ N(0, I). Then (X −\mu )′Λ^{−1}(X − \mu ) = Z′Z ∼ χ^2(n) by Theorem 1. ∎
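Lemma 1 can also be checked numerically. A sketch assuming NumPy, with an illustrative mean vector and a covariance matrix made positive definite by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 3, 200_000

# Hypothetical mean vector and a positive-definite covariance matrix.
mu = np.array([1.0, -2.0, 0.5])
B = rng.standard_normal((n, n))
Lam = B @ B.T + n * np.eye(n)        # |Lam| > 0 by construction

X = rng.multivariate_normal(mu, Lam, size=reps)
Lam_inv = np.linalg.inv(Lam)

# Quadratic form (X - mu)' Lam^{-1} (X - mu), one value per draw.
D = X - mu
Q = np.einsum("ij,jk,ik->i", D, Lam_inv, D)

# Lemma 1 says Q ~ chi^2(n): mean n, variance 2n.
print(Q.mean(), Q.var())
```

The printed values should be close to n = 3 and 2n = 6, regardless of the particular µ and Λ chosen.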


Cochran’s Theorem: Let X_{1:n} be i.i.d. N(0,\sigma ^2) distributed random variables, and suppose that \sum_{i=1}^n X_{i}^2 =Q_1+Q_2+...+Q_k, where Q_{1:k} are positive semi-definite quadratic forms in X_{1:n}, that is, Q_i = X′A_iX. Set r_i =rank(A_i). If r_1+r_2+...+r_k=n, then Q_{1:k} are independent, and Q_i ∼ \sigma ^2\chi ^2(r_i).

Let X be a normal random vector. The components of X are independent iff they are uncorrelated. If X ∼ N(µ, Λ), then Y = C′X ∼ N(C′µ, C′ΛC). We can find an orthogonal matrix C such that D = C′ΛC is a diagonal matrix. The components of Y will then be independent with var(Y_k)=\lambda _k, where \lambda _{1:n} are the eigenvalues of Λ.

Lemma 2: Let X_{1:n} be real numbers. Suppose that \Sigma X_{i}^2 can be split into a sum of positive semi-definite quadratic forms, that is, \Sigma X_{i}^2 =Q_1+Q_2+...+Q_k, where Q_i=X′A_iX with rank(A_i)=r_i. If \Sigma r_i=n, then there exists an orthogonal matrix C such that, with X = CY, we have Q_1=Y_1^2 + Y_2^2 +...+Y_{r_1}^2; Q_2=Y_{r_1+1}^2 + Y_{r_1+2}^2 +...+Y_{r_1+r_2}^2; ......; Q_k=Y_{n-r_k+1}^2 + Y_{n-r_k+2}^2 +...+Y_{n}^2.

Different quadratic forms contain different Y-variables, and the number of terms in each Q_i equals the rank, r_i, of Q_i. Because the Y_i^2 end up in different sums, we can use this to prove the independence of the different quadratic forms. We prove only the k = 2 case; the general case follows by induction.

Proof: For k = 2, we have Q = X′A_1X + X′A_2X. There exists an orthogonal matrix C such that C′A_1C = D, where D is a diagonal matrix holding the eigenvalues of A_1.

Since rank(A_1) = r_1, r_1 of the eigenvalues are positive and n − r_1 of them are 0. Without loss of generality, suppose the first r_1 eigenvalues are positive. Set X = CY; then X′X = Y′C′CY = Y′Y.

Therefore, Q=\sum_{i=1}^n Y_{i}^2 = \sum_{i=1}^{r_1} \lambda _i Y_i^2 + Y'C'A_2CY. Rearranging the terms, \sum_{i=1}^{r_1} (1-\lambda _i)Y_{i}^2+ \sum_{i=r_1+1}^n Y_{i}^2 = Y'C'A_2CY. Since rank(A_2) = r_2 = n − r_1, the right-hand side is a positive semi-definite form of rank n − r_1, so we conclude that \lambda _1=\lambda _2=...=\lambda _{r_1}=1; hence Q_1=\sum_{i=1}^{r_1} Y_{i}^2 and Q_2=\sum_{i=r_1+1}^{n} Y_{i}^2. ∎

This lemma is about real numbers, not random variables. It says that if \Sigma X_i^2 can be split into a sum of positive semi-definite quadratic forms, then there is an orthogonal transformation X = CY such that each of the quadratic forms has a nice property: each Y_i appears in only one resulting sum of squares, which leads to the independence of the sums of squares.
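A small numerical illustration of Lemma 2 for k = 2, assuming NumPy, using the forms A_1 = I − J/n and A_2 = J/n (whose ranks n − 1 and 1 sum to n; this pair reappears in the applications below):

```python
import numpy as np

n = 3
J = np.ones((n, n))
A1 = np.eye(n) - J / n      # positive semi-definite, rank n - 1
A2 = J / n                  # positive semi-definite, rank 1

# Orthogonal C diagonalising A1; eigh returns eigenvalues in ascending order.
lam, C = np.linalg.eigh(A1)
D1 = C.T @ A1 @ C
D2 = C.T @ A2 @ C

# The nonzero eigenvalues of A1 are all 1, as forced in the proof above,
# and A2 is diagonalised simultaneously, picking up the remaining coordinate.
print(np.round(lam, 8))
print(np.round(np.diag(D2), 8))
```

With X = CY, Q_1 = Y′D_1Y and Q_2 = Y′D_2Y are sums of squares over disjoint Y-coordinates, exactly as the lemma claims.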

Proof of Cochran’s Theorem:

Using the Lemma, Q_1, · · · , Q_k can be written as sums of squares over disjoint sets of the Y_i. Since C is orthogonal and the X_i are i.i.d. N(0, \sigma ^2), the Y_i are also i.i.d. N(0, \sigma ^2), so the Q_i are independent. Furthermore, Q_1=\sum_{i=1}^{r_1} Y_{i}^2 ∼ \sigma ^2 \chi ^2(r_1), and similarly for the other Q_i. ∎

Applications:

The sample variance is independent of the sample mean. Recall SSTO=(n-1)s^2(Y), where

SSTO=\Sigma (Y_i -\bar{Y} )^2=\Sigma Y_i^2 -\frac{(\Sigma Y_i)^2}{n} .

Rearranging the terms and expressing them in matrix form:

\Sigma Y_i^2=\Sigma (Y_i-\bar{Y} )^2+\frac{(\Sigma Y_i)^2}{n} ; Y'IY=Y'(I-\frac{1}{n} J)Y+Y'(\frac{1}{n} J)Y.

We know Y'IY ∼ \sigma ^2\chi ^2(n), rank(I-\frac{1}{n}J )=n-1, and rank(\frac{1}{n}J )=1. As a result, by Cochran’s theorem, \Sigma (Y_i - \bar{Y} )^2 ∼ \sigma ^2 \chi ^2(n-1) and \frac{(\Sigma Y_i)^2 }{n} ∼ \sigma ^2 \chi ^2(1), and the two are independent.
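A quick simulation of this decomposition, assuming NumPy (seed and dimensions are illustrative): the two quadratic forms should show the claimed σ²χ² means and be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 10, 2.0, 200_000

Y = sigma * rng.standard_normal((reps, n))
ybar = Y.mean(axis=1)

Q1 = ((Y - ybar[:, None]) ** 2).sum(axis=1)   # sum (Y_i - Ybar)^2
Q2 = n * ybar ** 2                            # (sum Y_i)^2 / n

# Cochran: Q1 ~ sigma^2 chi^2(n-1), Q2 ~ sigma^2 chi^2(1), independent.
print(Q1.mean() / sigma ** 2)    # close to n - 1
print(Q2.mean() / sigma ** 2)    # close to 1
print(np.corrcoef(Q1, Q2)[0, 1]) # close to 0
```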

Calculate rank(I-\frac{1}{n} J). First of all, we have 

rank(I-\frac{1}{n} J)\geq rank(I)-rank(\frac{1}{n}J )=n-1

On the other hand, since (I-\frac{1}{n}J )1=0, we have rank(I-\frac{1}{n}J ) \leq  n-1

Therefore, we have rank(I-\frac{1}{n}J ) = n-1

Another proof: noticing that I-\frac{1}{n}J is also idempotent and symmetric, its rank equals its trace, so rank(I-\frac{1}{n}J ) =trace(I)-trace(\frac{1}{n}J ) =n-1.
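Both rank arguments can be checked numerically; a minimal NumPy sketch (n = 6 is arbitrary):

```python
import numpy as np

n = 6
J = np.ones((n, n))
M = np.eye(n) - J / n

# M is symmetric and idempotent, so its rank equals its trace.
print(np.allclose(M @ M, M), np.allclose(M, M.T))
print(np.linalg.matrix_rank(M), round(np.trace(M)))
```

Both the numerical rank and the trace come out to n − 1.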

ANOVA:

SSTO=Y'[I-\frac{1}{n}J ]Y ; SSE=Y'[I-H]Y ; SSR=Y'[H-\frac{1}{n}J ]Y.

Under the null hypothesis, when \beta =0, rank(H-\frac{1}{n}J )=p-1.

With these ranks from linear algebra, Cochran’s theorem gives SSE ∼ \sigma ^2 \chi ^2(n-p) and, under the null, SSR ∼ \sigma ^2 \chi ^2(p-1), with SSE and SSR independent.

As a byproduct, MSE = SSE/(n − p) is an unbiased estimator of the variance, since the mean of \chi ^2(n-p) is n-p.

We have trace(H)=trace[X(X'X)^{-1}X']=trace[(X'X)^{-1}X'X]=trace(I_p)=p, using the cyclic property of the trace.

Then, since I − H is also idempotent and symmetric, rank(I-H)=trace(I-H)=trace(I)-trace(H)=n-p.

First, since we have H1 = 1 (this amounts to a multiple linear regression with the response identically equal to 1; the fitted values are still 1, because the constant term alone fits the model perfectly), it is straightforward to check that H-\frac{1}{n} J is an idempotent and symmetric matrix. Then, we have rank(H-\frac{1}{n} J)=trace(H)-trace(\frac{1}{n} J)=p-1.
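These trace and rank identities can be verified on a hypothetical design matrix (random predictors plus an intercept column, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4

# Design matrix whose first column is the intercept.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
J = np.ones((n, n))

print(round(np.trace(H)))                      # trace(H) = p
print(np.linalg.matrix_rank(np.eye(n) - H))    # n - p
print(np.linalg.matrix_rank(H - J / n))        # p - 1
print(np.allclose(H @ np.ones(n), np.ones(n))) # H1 = 1
```

The ranks (n − p) + (p − 1) + 1 sum to n, which is exactly the condition Cochran’s theorem needs.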
