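If the two groups are independent, the Wilcoxon rank sum test (also known as the Mann-Whitney U test) can be used to assess whether the observations were drawn from the same probability distribution (that is, whether the probability of obtaining a higher score is greater in one population than in the other).
The call has the form wilcox.test(y ~ x, data), where y is a numeric variable and x is a dichotomous (two-level) variable.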
> library(MASS)  # provides the UScrime data set
> with(UScrime, by(Prob, So, median))
So: 0
[1] 0.038201
----------------------------------------------------------------------
So: 1
[1] 0.055552
> wilcox.test(Prob ~ So, data=UScrime)
Wilcoxon rank sum test
data: Prob by So
W = 81, p-value = 8.488e-05
alternative hypothesis: true location shift is not equal to 0
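To make the W statistic less of a black box, here is a minimal sketch that computes it by hand. The vectors g0 and g1 are made-up values (not taken from UScrime), chosen without ties so that the arithmetic is easy to follow.

```r
# Hypothetical scores for two independent groups (made-up data, no ties)
g0 <- c(1.2, 3.4, 2.2, 5.1)
g1 <- c(4.8, 6.3, 7.0, 5.9, 8.1)

# Rank all observations together, then sum the ranks of the first group
r <- rank(c(g0, g1))
rank_sum_g0 <- sum(r[seq_along(g0)])

# W as reported by R is that rank sum minus its smallest possible value
n0 <- length(g0)
w_manual <- rank_sum_g0 - n0 * (n0 + 1) / 2

# Compare with the statistic returned by wilcox.test()
w_r <- unname(wilcox.test(g0, g1)$statistic)
print(c(manual = w_manual, wilcox.test = w_r))
```

A W close to 0 (or close to its maximum, the product of the two group sizes) means that one group almost always ranks below the other.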
> sapply(UScrime[c("U1","U2")],median)
U1 U2
92 34
> # paired=TRUE requests the Wilcoxon signed rank test for dependent samples
> with(UScrime, wilcox.test(U1, U2, paired=TRUE))
Wilcoxon signed rank test with continuity correction
data: U1 and U2
V = 1128, p-value = 2.464e-09
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(U1, U2, paired = TRUE) :
cannot compute exact p-value with ties
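For dependent (paired) samples, the Wilcoxon signed rank test works on the within-pair differences instead of the pooled ranks. The sketch below computes its V statistic by hand. The vectors before and after are hypothetical integer measurements, not the U1 and U2 columns, chosen so that there are no ties and no zero differences.

```r
# Hypothetical paired measurements (made-up data)
before <- c(105, 121,  98, 140, 113)
after  <- c( 90, 126,  75, 102, 101)

# Within-pair differences; zero differences would be dropped
d <- before - after
d <- d[d != 0]

# Rank the absolute differences, then sum the ranks of the positive ones
r <- rank(abs(d))
v_manual <- sum(r[d > 0])

# Compare with the statistic returned by wilcox.test(paired = TRUE)
v_r <- unname(wilcox.test(before, after, paired = TRUE)$statistic)
print(c(manual = v_manual, wilcox.test = v_r))
```

Note that wilcox.test() silently ignores argument names it does not recognize, so a misspelled paired argument runs the unpaired rank sum test instead. The title line of the output (signed rank versus rank sum) is a quick way to check which test actually ran.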
> states <- data.frame(state.region, state.x77)
> kruskal.test(Illiteracy~state.region, data=states)
Kruskal-Wallis rank sum test
data: Illiteracy by state.region
Kruskal-Wallis chi-squared = 22.672, df = 3, p-value = 4.726e-05
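> # The test rejects the hypothesis that illiteracy rates are the same in all four US regions (p < 0.001); it does not say which regions differ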
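The Kruskal-Wallis statistic can also be reproduced from ranks. The following is a minimal sketch on made-up data (three hypothetical groups a, b, and c, no ties), using the standard formula H = 12 / (n(n + 1)) * sum(R_j^2 / n_j) - 3(n + 1), where R_j is the rank sum of group j and n_j is its size.

```r
# Hypothetical outcome values for three groups of three (made-up data, no ties)
y <- c(2.1, 3.5, 1.8, 4.9, 5.2, 4.1, 7.3, 6.8, 8.0)
g <- factor(rep(c("a", "b", "c"), each = 3))

# Rank all observations together, then collect rank sums and sizes per group
r <- rank(y)
n <- length(y)
rank_sums <- tapply(r, g, sum)
sizes <- tapply(r, g, length)

# Kruskal-Wallis H (no tie correction needed for this data)
h_manual <- 12 / (n * (n + 1)) * sum(rank_sums^2 / sizes) - 3 * (n + 1)

# Compare with kruskal.test(); H is referred to a chi-squared distribution
# with (number of groups - 1) degrees of freedom
kt <- kruskal.test(y, g)
h_r <- unname(kt$statistic)
print(c(manual = h_manual, kruskal.test = h_r))
```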
> source("http://www.statmethods.net/RiA/wmc.txt")
> states<- data.frame(state.region, state.x77)
> wmc(Illiteracy ~ state.region, data=states, method="holm")
Descriptive Statistics

           West North Central Northeast    South
n      13.00000      12.00000   9.00000 16.00000
median  0.60000       0.70000   1.10000  1.75000
mad     0.14826       0.14826   0.29652  0.59304

Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm

        Group.1       Group.2    W            p
1          West North Central 88.0 8.665618e-01
2          West     Northeast 46.5 8.665618e-01
3          West         South 39.0 1.788186e-02   *
4 North Central     Northeast 20.5 5.359707e-02   .
5 North Central         South  2.0 8.051509e-05 ***
6     Northeast         South 18.0 1.187644e-02   *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
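> # The source() function downloads and runs the R script that defines wmc(). The call has the form wmc(y ~ A, data, method), where y is the numeric outcome variable, A is the grouping variable, data is the data frame containing these variables, and method specifies the approach used to limit Type I errors. The call above uses the adjustment proposed by Holm, which provides strong control of the familywise error rate (the probability of making one or more Type I errors across the whole set of comparisons).
wmc() first reports the sample size, median, and median absolute deviation (mad) for each group. The West has the lowest illiteracy rate and the South the highest. The function then runs six pairwise comparisons. The two-sided p-values show that the South differs significantly from each of the other three regions, while the other three regions do not differ significantly from one another at the p < 0.05 level.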
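If the wmc.txt script cannot be downloaded, base R offers a comparable route. The sketch below assumes that pairwise.wilcox.test() from the stats package is an acceptable stand-in; it is not the wmc() implementation, and it prints a matrix of adjusted p-values without the descriptive statistics. The vector raw_p at the end is hypothetical and only illustrates how the Holm adjustment rescales raw p-values.

```r
# Built-in data: state.region (factor) and state.x77 (matrix) ship with R
states <- data.frame(state.region, state.x77)

# All pairwise Wilcoxon rank sum tests with Holm-adjusted p-values.
# exact = FALSE requests the normal approximation, since the data contain ties.
pw <- pairwise.wilcox.test(states$Illiteracy, states$state.region,
                           p.adjust.method = "holm", exact = FALSE)
print(pw$p.value)

# Holm's step-down adjustment on a hypothetical vector of raw p-values:
# the smallest is multiplied by 3, the next by 2, the largest by 1,
# and the results are forced to be non-decreasing in that order
raw_p <- c(0.010, 0.040, 0.030)
holm_p <- p.adjust(raw_p, method = "holm")
print(holm_p)
```

Because every raw p-value is multiplied by a factor of at least 1, a Holm-adjusted p-value is never smaller than the raw one, which is why fewer comparisons reach significance after adjustment.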
That wraps up the basics of nonparametric tests for group differences. See you next time! O(∩_∩)O Haha~