「r<-统计」统计检验函数汇总

资料来源:《R 语言核心技术手册》和 R 文档

数据基本来自胡编乱造 和 R 文档

本文基本囊括了常用的统计检验在 R 中的实现函数和使用方法。

连续型数据

基于正态分布的检验

均值检验

t.test(1:10, 10:20)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  1:10 and 10:20
#> t = -7, df = 19, p-value = 2e-06
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -12.4  -6.6
#> sample estimates:
#> mean of x mean of y 
#>       5.5      15.0

配对 t 检验:

t.test(rnorm(10), rnorm(10, mean = 1), paired = TRUE)
#> 
#>  Paired t-test
#> 
#> data:  rnorm(10) and rnorm(10, mean = 1)
#> t = -5, df = 9, p-value = 7e-04
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -2.541 -0.962
#> sample estimates:
#> mean of the differences 
#>                   -1.75

使用公式:

df <- data.frame(
  value = c(rnorm(10), rnorm(10, mean = 1)),
  group = c(rep("control", 10), rep("test", 10))
)

t.test(value ~ group, data = df)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  value by group
#> t = -0.4, df = 15, p-value = 0.7
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.62  1.08
#> sample estimates:
#> mean in group control    mean in group test 
#>                 0.532                 0.802

假设方差同质:

t.test(value ~ group, data = df, var.equal = TRUE)
#> 
#>  Two Sample t-test
#> 
#> data:  value by group
#> t = -0.4, df = 18, p-value = 0.7
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.60  1.06
#> sample estimates:
#> mean in group control    mean in group test 
#>                 0.532                 0.802

更多查看 ?t.test

两总体方差检验

上面的例子假设方差同质,我们通过检验查看。

服从正态分布的两总体方差比较。

# 进行的是 F 检验
var.test(value ~ group, data = df)
#> 
#>  F test to compare two variances
#> 
#> data:  value by group
#> F = 0.4, num df = 9, denom df = 9, p-value = 0.2
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#>  0.103 1.671
#> sample estimates:
#> ratio of variances 
#>              0.415

使用 Bartlett 检验比较每个组(样本)数据的方差是否一致。

bartlett.test(value ~ group, data = df)
#> 
#>  Bartlett test of homogeneity of variances
#> 
#> data:  value by group
#> Bartlett's K-squared = 2, df = 1, p-value = 0.2

多个组间均值的比较

对于两组以上数据间均值的比较,使用方差分析 ANOVA。

aov(wt ~ factor(cyl), data = mtcars)
#> Call:
#>    aov(formula = wt ~ factor(cyl), data = mtcars)
#> 
#> Terms:
#>                 factor(cyl) Residuals
#> Sum of Squares         18.2      11.5
#> Deg. of Freedom           2        29
#> 
#> Residual standard error: 0.63
#> Estimated effects may be unbalanced

查看详细信息:

model.tables(aov(wt ~ factor(cyl), data = mtcars))
#> Tables of effects
#> 
#>  factor(cyl) 
#>           4       6      8
#>     -0.9315 -0.1001  0.782
#> rep 11.0000  7.0000 14.000

通常先用 lm() 函数对数据建立线性模型,再用 anova() 函数提取方差分析的信息更方便。

md <- lm(wt ~ factor(cyl), data = mtcars)
anova(md)
#> Analysis of Variance Table
#> 
#> Response: wt
#>             Df Sum Sq Mean Sq F value  Pr(>F)    
#> factor(cyl)  2   18.2    9.09    22.9 1.1e-06 ***
#> Residuals   29   11.5    0.40                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA 分析假设各组样本数据的方差是相等的,如果知道(或怀疑)不相等,可以使用 oneway.test() 函数。

oneway.test(wt ~ cyl, data = mtcars)
#> 
#>  One-way analysis of means (not assuming equal variances)
#> 
#> data:  wt and cyl
#> F = 20, num df = 2, denom df = 19, p-value = 2e-05

这与设定了 var.equal=FALSE 的 t.test 类似(两种方法都是 Welch 提出)。

多组样本的配对 t 检验

pairwise.t.test(mtcars$wt, mtcars$cyl)
#> 
#>  Pairwise comparisons using t tests with pooled SD 
#> 
#> data:  mtcars$wt and mtcars$cyl 
#> 
#>   4     6   
#> 6 0.01  -   
#> 8 6e-07 0.01
#> 
#> P value adjustment method: holm

可以自定义 p 值校正方法。

正态性检验

使用 Shapiro-Wilk 检验:

shapiro.test(rnorm(30))
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  rnorm(30)
#> W = 1, p-value = 1

可以通过 QQ 图辅助查看。

qqnorm(rnorm(30))
image

分布的对称性检验

用 Kolmogorov-Smirnov 检验查看一个向量是否来自对称的概率分布(不限于正态分布)。

ks.test(rnorm(10), pnorm)
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  rnorm(10)
#> D = 0.2, p-value = 0.7
#> alternative hypothesis: two-sided

函数第 1 个参数指定待检验的数据,第 2 个参数指定对称分布的类型,可以是数值型向量、指定概率分布函数的字符串或一个分布函数。

ks.test(rnorm(10), "pnorm")
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  rnorm(10)
#> D = 0.4, p-value = 0.09
#> alternative hypothesis: two-sided

ks.test(rpois(10, lambda = 1), "pnorm")
#> Warning in ks.test(rpois(10, lambda = 1), "pnorm"): ties should not be present
#> for the Kolmogorov-Smirnov test
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  rpois(10, lambda = 1)
#> D = 0.5, p-value = 0.01
#> alternative hypothesis: two-sided

检验两个向量是否服从同一分布

还是用上面的函数。

ks.test(rnorm(20), rnorm(30))
#> 
#>  Two-sample Kolmogorov-Smirnov test
#> 
#> data:  rnorm(20) and rnorm(30)
#> D = 0.1, p-value = 1
#> alternative hypothesis: two-sided

相关性检验

使用 cor.test() 函数。

cor.test(mtcars$mpg, mtcars$wt)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  mtcars$mpg and mtcars$wt
#> t = -10, df = 30, p-value = 1e-10
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.934 -0.744
#> sample estimates:
#>    cor 
#> -0.868

一共有 3 种方法,具体看选项 method 的说明。

cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = F)
#> 
#>  Spearman's rank correlation rho
#> 
#> data:  mtcars$mpg and mtcars$wt
#> S = 10292, p-value = 1e-11
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>    rho 
#> -0.886

不依赖分布的检验

均值检验

Wilcoxon 检验是 t 检验的非参数版本。

默认是秩和检验。

wilcox.test(1:10, 10:20)
#> Warning in wilcox.test.default(1:10, 10:20): cannot compute exact p-value with
#> ties
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  1:10 and 10:20
#> W = 0.5, p-value = 1e-04
#> alternative hypothesis: true location shift is not equal to 0

可以设定为符号检验。

wilcox.test(1:10, 10:19, paired = TRUE)
#> Warning in wilcox.test.default(1:10, 10:19, paired = TRUE): cannot compute exact
#> p-value with ties
#> 
#>  Wilcoxon signed rank test with continuity correction
#> 
#> data:  1:10 and 10:19
#> V = 0, p-value = 0.002
#> alternative hypothesis: true location shift is not equal to 0

多均值比较

多均值比较使 Kruskal-Wallis 秩和检验。

kruskal.test(wt ~ factor(cyl), data = mtcars)
#> 
#>  Kruskal-Wallis rank sum test
#> 
#> data:  wt by factor(cyl)
#> Kruskal-Wallis chi-squared = 23, df = 2, p-value = 1e-05

方差检验

使用Fligner-Killeen(中位数)检验完成不同组别的方差比较

fligner.test(wt ~ cyl, data = mtcars)
#> 
#>  Fligner-Killeen test of homogeneity of variances
#> 
#> data:  wt by cyl
#> Fligner-Killeen:med chi-squared = 0.5, df = 2, p-value = 0.8

尺度参数差异

R 有一些检验可以用来确定尺度参数的差异。分布的尺度参数确定分布函数的尺度,如 t 分布的自由度。

下面是针对两样本尺度参数差异的 Ansari-Bradley 检验。

## Hollander & Wolfe (1973, p. 86f):
## Serum iron determination using Hyland control sera
ramsay <- c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
            101, 96, 97, 102, 107, 113, 116, 113, 110, 98)
jung.parekh <- c(107, 108, 106, 98, 105, 103, 110, 105, 104,
            100, 96, 108, 103, 104, 114, 114, 113, 108, 106, 99)
ansari.test(ramsay, jung.parekh)
#> Warning in ansari.test.default(ramsay, jung.parekh): cannot compute exact p-
#> value with ties
#> 
#>  Ansari-Bradley test
#> 
#> data:  ramsay and jung.parekh
#> AB = 186, p-value = 0.2
#> alternative hypothesis: true ratio of scales is not equal to 1

还可以使用 Mood 两样本检验做。

mood.test(ramsay, jung.parekh)
#> 
#>  Mood two-sample test of scale
#> 
#> data:  ramsay and jung.parekh
#> Z = 1, p-value = 0.3
#> alternative hypothesis: two.sided

离散数据

比例检验

使用 prop.test() 比较两组观测值发生的概率是否有差异。

heads <- rbinom(1, size = 100, prob = .5)
prop.test(heads, 100)          # continuity correction TRUE by default
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  heads out of 100, null probability 0.5
#> X-squared = 0.2, df = 1, p-value = 0.6
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#>  0.370 0.572
#> sample estimates:
#>    p 
#> 0.47
prop.test(heads, 100, correct = FALSE)
#> 
#>  1-sample proportions test without continuity correction
#> 
#> data:  heads out of 100, null probability 0.5
#> X-squared = 0.4, df = 1, p-value = 0.5
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#>  0.375 0.567
#> sample estimates:
#>    p 
#> 0.47

可以给定概率值。

prop.test(heads, 100, p = 0.3, correct = FALSE)
#> 
#>  1-sample proportions test without continuity correction
#> 
#> data:  heads out of 100, null probability 0.3
#> X-squared = 14, df = 1, p-value = 2e-04
#> alternative hypothesis: true p is not equal to 0.3
#> 95 percent confidence interval:
#>  0.375 0.567
#> sample estimates:
#>    p 
#> 0.47

二项式检验

binom.test(c(682, 243), p = 3/4)
#> 
#>  Exact binomial test
#> 
#> data:  c(682, 243)
#> number of successes = 682, number of trials = 925, p-value = 0.4
#> alternative hypothesis: true probability of success is not equal to 0.75
#> 95 percent confidence interval:
#>  0.708 0.765
#> sample estimates:
#> probability of success 
#>                  0.737
binom.test(682, 682 + 243, p = 3/4)  # The same
#> 
#>  Exact binomial test
#> 
#> data:  682 and 682 + 243
#> number of successes = 682, number of trials = 925, p-value = 0.4
#> alternative hypothesis: true probability of success is not equal to 0.75
#> 95 percent confidence interval:
#>  0.708 0.765
#> sample estimates:
#> probability of success 
#>                  0.737

与其他的检验函数不同,这里的 p 值表示试验成功率与假设值的最小差值

列联表检验

用来确定两个分类变量是否相关。

对于小的列联表,试验 Fisher 精确检验获得较好的检验结果。

Fisher 检验有一个关于喝茶的故事。

## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
##  tea was added to the cup first.  To test, she was given 8 cups of
##  tea, in four of which milk was added first.  The null hypothesis
##  is that there is no association between the true order of pouring
##  and the woman's guess, the alternative that there is a positive
##  association (that the odds ratio is greater than 1).
TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
#> 
#>  Fisher's Exact Test for Count Data
#> 
#> data:  TeaTasting
#> p-value = 0.2
#> alternative hypothesis: true odds ratio is greater than 1
#> 95 percent confidence interval:
#>  0.314   Inf
#> sample estimates:
#> odds ratio 
#>       6.41
## => p = 0.2429, association could not be established

当列联表较大时,Fisher 计算量很大,可以使用卡方检验替代。

chisq.test(TeaTasting)
#> Warning in chisq.test(TeaTasting): Chi-squared approximation may be incorrect
#> 
#>  Pearson's Chi-squared test with Yates' continuity correction
#> 
#> data:  TeaTasting
#> X-squared = 0.5, df = 1, p-value = 0.5

对于三变量的混合影响,使用 Cochran-Mantel-Haenszel 检验。

## Agresti (1990), pages 231--237, Penicillin and Rabbits
## Investigation of the effectiveness of immediately injected or 1.5
##  hours delayed penicillin in protecting rabbits against a lethal
##  injection with beta-hemolytic streptococci.
Rabbits <-
array(c(0, 0, 6, 5,
        3, 0, 3, 6,
        6, 2, 0, 4,
        5, 6, 1, 0,
        2, 5, 0, 0),
      dim = c(2, 2, 5),
      dimnames = list(
          Delay = c("None", "1.5h"),
          Response = c("Cured", "Died"),
          Penicillin.Level = c("1/8", "1/4", "1/2", "1", "4")))
Rabbits
#> , , Penicillin.Level = 1/8
#> 
#>       Response
#> Delay  Cured Died
#>   None     0    6
#>   1.5h     0    5
#> 
#> , , Penicillin.Level = 1/4
#> 
#>       Response
#> Delay  Cured Died
#>   None     3    3
#>   1.5h     0    6
#> 
#> , , Penicillin.Level = 1/2
#> 
#>       Response
#> Delay  Cured Died
#>   None     6    0
#>   1.5h     2    4
#> 
#> , , Penicillin.Level = 1
#> 
#>       Response
#> Delay  Cured Died
#>   None     5    1
#>   1.5h     6    0
#> 
#> , , Penicillin.Level = 4
#> 
#>       Response
#> Delay  Cured Died
#>   None     2    0
#>   1.5h     5    0
## Classical Mantel-Haenszel test
mantelhaen.test(Rabbits)
#> 
#>  Mantel-Haenszel chi-squared test with continuity correction
#> 
#> data:  Rabbits
#> Mantel-Haenszel X-squared = 4, df = 1, p-value = 0.05
#> alternative hypothesis: true common odds ratio is not equal to 1
#> 95 percent confidence interval:
#>   1.03 47.73
#> sample estimates:
#> common odds ratio 
#>                 7

用 McNemar 卡方检验检验二维列联表的对称性。

## Agresti (1990), p. 350.
## Presidential Approval Ratings.
##  Approval of the President's performance in office in two surveys,
##  one month apart, for a random sample of 1600 voting-age Americans.
Performance <-
matrix(c(794, 86, 150, 570),
       nrow = 2,
       dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                       "2nd Survey" = c("Approve", "Disapprove")))
Performance
#>             2nd Survey
#> 1st Survey   Approve Disapprove
#>   Approve        794        150
#>   Disapprove      86        570
mcnemar.test(Performance)
#> 
#>  McNemar's Chi-squared test with continuity correction
#> 
#> data:  Performance
#> McNemar's chi-squared = 17, df = 1, p-value = 4e-05

列联表非参数检验

Friedman 秩和检验是一个非参数版本的双边 ANOVA 检验。

## Hollander & Wolfe (1973), p. 140ff.
## Comparison of three methods ("round out", "narrow angle", and
##  "wide angle") for rounding first base.  For each of 18 players
##  and the three method, the average time of two runs from a point on
##  the first base line 35ft from home plate to a point 15ft short of
##  second base is recorded.
RoundingTimes <-
matrix(c(5.40, 5.50, 5.55,
         5.85, 5.70, 5.75,
         5.20, 5.60, 5.50,
         5.55, 5.50, 5.40,
         5.90, 5.85, 5.70,
         5.45, 5.55, 5.60,
         5.40, 5.40, 5.35,
         5.45, 5.50, 5.35,
         5.25, 5.15, 5.00,
         5.85, 5.80, 5.70,
         5.25, 5.20, 5.10,
         5.65, 5.55, 5.45,
         5.60, 5.35, 5.45,
         5.05, 5.00, 4.95,
         5.50, 5.50, 5.40,
         5.45, 5.55, 5.50,
         5.55, 5.55, 5.35,
         5.45, 5.50, 5.55,
         5.50, 5.45, 5.25,
         5.65, 5.60, 5.40,
         5.70, 5.65, 5.55,
         6.30, 6.30, 6.25),
       nrow = 22,
       byrow = TRUE,
       dimnames = list(1 : 22,
                       c("Round Out", "Narrow Angle", "Wide Angle")))
friedman.test(RoundingTimes)
#> 
#>  Friedman rank sum test
#> 
#> data:  RoundingTimes
#> Friedman chi-squared = 11, df = 2, p-value = 0.004

最后分享一张图,帮助读者选择一个合适的统计检验:

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,233评论 6 495
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,357评论 3 389
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,831评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,313评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,417评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,470评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,482评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,265评论 0 269
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,708评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,997评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,176评论 1 342
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,827评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,503评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,150评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,391评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,034评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,063评论 2 352

推荐阅读更多精彩内容

  • 按照用途分类出以下统计函数: AVEDEV 用途:返回一组数据与其平均值的绝对偏差的平均值,该函数可以评测数据(例...
    四方院祭司阅读 2,890评论 0 3
  • 1、基石:大数定律和中心极限定律 大数定理。不管是强大数定理还是弱大数定理,都表达着这样一个意思:当样本数量足够大...
    龙鹰图腾223阅读 20,410评论 0 29
  • Chapter1 什么是统计学(statistics)?统计学是描述一系列可用于描述/整理/解释资料或数据的统计工...
    芒果芭乐阅读 5,154评论 0 18
  • 有人说,她是一颗飘渺的沙,随风飞逝。 有人说,她是一个孤独的人,沧桑憔颜。 我问她:"你去哪里啊?还会...
    Cry_5b90阅读 382评论 0 0
  • 我眼中的菊花 正值秋深霜浓、菊花盛开的时节,我怀着浓厚的兴趣在村里寻找有菊花的人家,欣赏一番,好大饱眼福。 村里种...
    回啦阅读 345评论 0 0