Windsor Diary 23

Importance of Cochran’s Theorem

Cochran’s theorem tells us about the distributions of partitioned sums of squares of normally distributed random variables. Traditional linear regression analysis relies on statistical claims about the distributions of such sums of squares. In the simple normal regression model:

\frac{SSE}{\sigma^2} = \frac{\sum (Y_i-\hat{Y}_i)^2}{\sigma^2} ∼ \chi^2(n-2).

Where does this come from?

- Establish the fact that the sum of squares of n i.i.d. standard normal random variables is \chi ^2(n) distributed.

- Provide intuition for Cochran’s theorem.

- Prove a lemma in support of Cochran’s theorem.

- Prove Cochran’s theorem.

- Connect Cochran’s theorem back to matrix linear regression.

Theorem 1 for \chi ^2(n): Suppose Z_i are i.i.d. N(0, 1); then \sum_{i=1}^n Z_{i}^2 ∼ \chi ^2(n).

Proof: 

Each Z_i^2 ∼ \chi ^2(1). If Y_{1:n} are i.i.d. random variables, each with MGF m_Y(t), then

the MGF of U=Y_1+Y_2+...+Y_n is m_U(t)=m_{Y_1}(t) \times m_{Y_2}(t) \times ...\times m_{Y_n}(t).

An MGF fully characterizes a distribution, and the MGF of \chi ^2(k) is {(1-2t)}^{-k/2} for t < 1/2. Since each Z_i^2 has MGF {(1-2t)}^{-1/2}, the product is {(1-2t)}^{-n/2}, which is the MGF of \chi ^2(n). ∎
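As a sanity check, Theorem 1 can be simulated. A minimal sketch, assuming NumPy and SciPy are available (the seed and sample sizes are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 200_000

# Draw reps batches of n i.i.d. N(0, 1) variables and square-sum each batch.
Z = rng.standard_normal((reps, n))
Q = (Z ** 2).sum(axis=1)

# chi^2(n) has mean n and variance 2n.
print(Q.mean(), Q.var())

# The empirical distribution should match the chi^2(n) CDF.
ks_pvalue = stats.kstest(Q, stats.chi2(df=n).cdf).pvalue
print(ks_pvalue)
```

The printed mean and variance should be close to n = 5 and 2n = 10, and the Kolmogorov–Smirnov test should not reject the χ²(n) hypothesis.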

- Quadratic forms of normal random variables are important in many branches of statistics: Least Squares, ANOVA, Regression Analysis. 

- General idea: Split the sum of squares of the observations into a sum of quadratic forms, where each corresponds to some cause of variation.

- The conclusion of Cochran’s theorem is that, under the assumption of normality, the various quadratic forms are independent and χ^2 distributed. This fact is the foundation upon which many statistical tests rest.

Preliminaries: A Common Quadratic Form

Let X ∼ N(µ, Λ). Consider the quadratic form that appears in the exponent of the normal density, (X − \mu )′Λ^{−1}(X − \mu ). In the special case of µ = 0 and Λ = I, this reduces to X′X, which, by what we just proved, is \chi ^2(n) distributed. Let’s prove that it holds in the general case.

Lemma 1: Let X ∼ N(µ, Λ), with |Λ| > 0; then (X −\mu )′Λ^{−1}(X − \mu ) ∼ χ^2(n). Proof: Since Λ is positive definite, set Z = Λ^{−1/2}(X − \mu ) ∼ N(0, I). Then (X −\mu )′Λ^{−1}(X − \mu ) = Z′Z ∼ χ^2(n) by Theorem 1. ∎
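Lemma 1 can also be checked numerically. A sketch assuming NumPy, with an illustrative mean vector and a covariance matrix made positive definite by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 3, 200_000

# Hypothetical mean vector and a positive-definite covariance matrix.
mu = np.array([1.0, -2.0, 0.5])
B = rng.standard_normal((n, n))
Lam = B @ B.T + n * np.eye(n)        # |Lam| > 0 by construction

X = rng.multivariate_normal(mu, Lam, size=reps)
Lam_inv = np.linalg.inv(Lam)

# Quadratic form (X - mu)' Lam^{-1} (X - mu), one value per draw.
D = X - mu
Q = np.einsum("ij,jk,ik->i", D, Lam_inv, D)

# Lemma 1 says Q ~ chi^2(n): mean n, variance 2n.
print(Q.mean(), Q.var())
```

The printed values should be close to n = 3 and 2n = 6, regardless of the particular µ and Λ chosen.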


Cochran’s Theorem: Let X_{1:n} be i.i.d. N(0,\sigma ^2) distributed random variables, and suppose that \sum_{i=1}^n X_{i}^2 =Q_1+Q_2+...+Q_k, where Q_{1:k} are positive semi-definite quadratic forms in X_{1:n}, that is, Q_i = X′A_iX. Set r_i =rank(A_i). If r_1+r_2+...+r_k=n, then Q_{1:k} are independent, and Q_i ∼ \sigma ^2\chi ^2(r_i).

Let X be a normal random vector. The components of X are independent iff they are uncorrelated. If X ∼ N(µ, Λ), then Y = C′X ∼ N(C′µ, C′ΛC). We can find an orthogonal matrix C such that D = C′ΛC is a diagonal matrix. The components of Y will then be independent with var(Y_k)=\lambda _k, where \lambda _{1:n} are the eigenvalues of Λ.

Lemma 2: Let X_{1:n} be real numbers. Suppose that \Sigma X_{i}^2 can be split into a sum of positive semi-definite quadratic forms, that is, \Sigma X_{i}^2 =Q_1+Q_2+...+Q_k, where Q_i=X′A_iX with rank(A_i)=r_i. If \Sigma r_i=n, then there exists an orthogonal matrix C such that, with X = CY, we have Q_1=Y_1^2 + Y_2^2 +...+Y_{r_1}^2; Q_2=Y_{r_1+1}^2 + Y_{r_1+2}^2 +...+Y_{r_1+r_2}^2; ......; Q_k=Y_{n-r_k+1}^2 + Y_{n-r_k+2}^2 +...+Y_{n}^2.

Different quadratic forms contain different Y-variables, and the number of terms in each Q_i equals the rank, r_i, of Q_i. Because the Y_i^2 end up in different sums, we can use this to prove the independence of the different quadratic forms. We prove only the k = 2 case; the general case follows by induction.

Proof: For k = 2, we have Q = X′A_1X + X′A_2X. There exists an orthogonal matrix C such that C′A_1C = D, where D is a diagonal matrix holding the eigenvalues of A_1.

Since rank(A_1) = r_1, r_1 of the eigenvalues are positive and n − r_1 of them are 0. Without loss of generality, suppose the first r_1 eigenvalues are positive. Set X = CY; then X′X = Y′C′CY = Y′Y.

Therefore, Q=\sum_{i=1}^n Y_{i}^2 = \sum_{i=1}^{r_1} \lambda _i Y_i^2 + Y'C'A_2CY. Rearranging the terms, \sum_{i=1}^{r_1} (1-\lambda _i)Y_{i}^2+ \sum_{i=r_1+1}^n Y_{i}^2 = Y'C'A_2CY. Since rank(A_2) = r_2 = n − r_1, the right-hand side is a positive semi-definite form of rank n − r_1, so we conclude that \lambda _1=\lambda _2=...=\lambda _{r_1}=1; hence Q_1=\sum_{i=1}^{r_1} Y_{i}^2 and Q_2=\sum_{i=r_1+1}^{n} Y_{i}^2. ∎

This lemma is about real numbers, not random variables. It says that if \Sigma X_i^2 can be split into a sum of positive semi-definite quadratic forms, then there is an orthogonal transformation X = CY such that each of the quadratic forms has a nice property: each Y_i appears in only one resulting sum of squares, which leads to the independence of the sums of squares.
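A small numerical illustration of Lemma 2 for k = 2, assuming NumPy, using the forms A_1 = I − J/n and A_2 = J/n (whose ranks n − 1 and 1 sum to n; this pair reappears in the applications below):

```python
import numpy as np

n = 3
J = np.ones((n, n))
A1 = np.eye(n) - J / n      # positive semi-definite, rank n - 1
A2 = J / n                  # positive semi-definite, rank 1

# Orthogonal C diagonalising A1; eigh returns eigenvalues in ascending order.
lam, C = np.linalg.eigh(A1)
D1 = C.T @ A1 @ C
D2 = C.T @ A2 @ C

# The nonzero eigenvalues of A1 are all 1, as forced in the proof above,
# and A2 is diagonalised simultaneously, picking up the remaining coordinate.
print(np.round(lam, 8))
print(np.round(np.diag(D2), 8))
```

With X = CY, Q_1 = Y′D_1Y and Q_2 = Y′D_2Y are sums of squares over disjoint Y-coordinates, exactly as the lemma claims.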

Proof of Cochran’s Theorem:

Using the Lemma, Q_1, · · · , Q_k can be written as sums of squares over disjoint sets of the Y_i. Since C is orthogonal and the X_i are i.i.d. N(0, \sigma ^2), the Y_i are also i.i.d. N(0, \sigma ^2), so the Q_i are independent. Furthermore, Q_1=\sum_{i=1}^{r_1} Y_{i}^2 ∼ \sigma ^2 \chi ^2(r_1), and similarly for the other Q_i. ∎

Applications:

The sample variance is independent of the sample mean. Recall SSTO=(n-1)s^2(Y), where

SSTO=\Sigma (Y_i -\bar{Y} )^2=\Sigma Y_i^2 -\frac{(\Sigma Y_i)^2}{n} .

Rearranging the terms and expressing them in matrix form:

\Sigma Y_i^2=\Sigma (Y_i-\bar{Y} )^2+\frac{(\Sigma Y_i)^2}{n} ; Y'IY=Y'(I-\frac{1}{n} J)Y+Y'(\frac{1}{n} J)Y.

We know Y'IY ∼ \sigma ^2\chi ^2(n), rank(I-\frac{1}{n}J )=n-1, and rank(\frac{1}{n}J )=1. As a result, by Cochran’s theorem, \Sigma (Y_i - \bar{Y} )^2 ∼ \sigma ^2 \chi ^2(n-1) and \frac{(\Sigma Y_i)^2 }{n} ∼ \sigma ^2 \chi ^2(1), and the two are independent.
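A quick simulation of this decomposition, assuming NumPy (seed and dimensions are illustrative): the two quadratic forms should show the claimed σ²χ² means and be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 10, 2.0, 200_000

Y = sigma * rng.standard_normal((reps, n))
ybar = Y.mean(axis=1)

Q1 = ((Y - ybar[:, None]) ** 2).sum(axis=1)   # sum (Y_i - Ybar)^2
Q2 = n * ybar ** 2                            # (sum Y_i)^2 / n

# Cochran: Q1 ~ sigma^2 chi^2(n-1), Q2 ~ sigma^2 chi^2(1), independent.
print(Q1.mean() / sigma ** 2)    # close to n - 1
print(Q2.mean() / sigma ** 2)    # close to 1
print(np.corrcoef(Q1, Q2)[0, 1]) # close to 0
```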

Calculate rank(I-\frac{1}{n} J). First of all, we have 

rank(I-\frac{1}{n} J)\geq rank(I)-rank(\frac{1}{n}J )=n-1

On the other hand, since (I-\frac{1}{n}J )1=0, we have rank(I-\frac{1}{n}J ) \leq  n-1

Therefore, we have rank(I-\frac{1}{n}J ) = n-1

Another proof: noticing that I-\frac{1}{n}J is also idempotent and symmetric, its rank equals its trace, so rank(I-\frac{1}{n}J ) =trace(I)-trace(\frac{1}{n}J ) =n-1.
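Both rank arguments can be checked numerically; a minimal NumPy sketch (n = 6 is arbitrary):

```python
import numpy as np

n = 6
J = np.ones((n, n))
M = np.eye(n) - J / n

# M is symmetric and idempotent, so its rank equals its trace.
print(np.allclose(M @ M, M), np.allclose(M, M.T))
print(np.linalg.matrix_rank(M), round(np.trace(M)))
```

Both the numerical rank and the trace come out to n − 1.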

ANOVA:

SSTO=Y'[I-\frac{1}{n}J ]Y ; SSE=Y'[I-H]Y ; SSR=Y'[H-\frac{1}{n}J ]Y.

Under the null hypothesis, when \beta =0, rank(H-\frac{1}{n}J )=p-1.

With these ranks from linear algebra, Cochran’s theorem gives SSE ∼ \sigma ^2 \chi ^2(n-p) and, under the null, SSR ∼ \sigma ^2 \chi ^2(p-1), with SSE and SSR independent.

As a byproduct, MSE = SSE/(n − p) is an unbiased estimator of the variance, since the mean of \chi ^2(n-p) is n-p.

We have trace(H)=trace[X(X'X)^{-1}X']=trace[(X'X)^{-1}X'X]=trace(I_p)=p, using the cyclic property of the trace.

Then, since I − H is also idempotent and symmetric, rank(I-H)=trace(I-H)=trace(I)-trace(H)=n-p.

First, since we have H1 = 1 (this amounts to a multiple linear regression with the response identically equal to 1; the fitted values are still 1, because the constant term alone fits the model perfectly), it is straightforward to check that H-\frac{1}{n} J is an idempotent and symmetric matrix. Then, we have rank(H-\frac{1}{n} J)=trace(H)-trace(\frac{1}{n} J)=p-1.
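These trace and rank identities can be verified on a hypothetical design matrix (random predictors plus an intercept column, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4

# Design matrix whose first column is the intercept.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
J = np.ones((n, n))

print(round(np.trace(H)))                      # trace(H) = p
print(np.linalg.matrix_rank(np.eye(n) - H))    # n - p
print(np.linalg.matrix_rank(H - J / n))        # p - 1
print(np.allclose(H @ np.ones(n), np.ones(n))) # H1 = 1
```

The ranks (n − p) + (p − 1) + 1 sum to n, which is exactly the condition Cochran’s theorem needs.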
