【R<-练习】生物统计学练习题3

Question1.

农场调查了A、B、C等3个品种草莓的维生素C含量,测定结果如表1所示。

表一

（1）检验维生素C含量对于品种A、B、C是否是正态的；如果是，是否满足方差齐次性要求;
（2）分析品种A、B、C的维生素C含量是否存在差异；如果是，那么分析各个品种两两之间存在差异与否。
注：正态性检验可用shapiro.test，本题数据可以直接从附件strawberry.txt读取。

解答：
采用shapiro.test进行正态性检验，原假设为‘A品种草莓的维生素C含量服从正态分布’，备择假设为‘A品种草莓的维生素C含量不服从正态分布’。

# 设置工作目录
> setwd("C:/Users/My/Desktop/")
# 读取数据并保存到strawberry中
> strawberry <- read.table("strawberry.txt",header = T)
# 查看strawberry文件的内容
> strawberry
   breed  vc
1      a 117
2      a  99
3      a 107
4      a 112
5      a 113
6      a 106
7      b  81
8      b  77
9      b  79
10     b  76
11     b  85
12     b  87
13     b  74
14     b  45
15     b  72
16     b  80
17     c  80
18     c  82
19     c  78
20     c  84
21     c  89
22     c  53
23     c  86
24     c  88
# 对A草莓的维生素C的含量进行正态性检验
> shapiro.test(strawberry$vc[strawberry$breed=='a'])

    Shapiro-Wilk normality test

data:  strawberry$vc[strawberry$breed == "a"]
W = 0.96749, p-value = 0.8752

p-value 值为 0.8752 > 0.05, 所以接受原假设，A种草莓维C含量符合正态分布

同样对B草莓和C草莓进行正态性检验

> shapiro.test(strawberry$vc[strawberry$breed == 'b'])

    Shapiro-Wilk normality test

data:  strawberry$vc[strawberry$breed == "b"]
W = 0.75208, p-value = 0.003783
# p 值结果为 0.003783 < 0.05, 所以B草莓维生素C含量不符合正态分布


# 对草莓C 进行状态性检验
> shapiro.test(strawberry$vc[strawberry$breed == 'c'])

    Shapiro-Wilk normality test

data:  strawberry$vc[strawberry$breed == "c"]
W = 0.72984, p-value = 0.004876

p < 0.05, 所以C草莓的维生素C含量不符合正态分布

对方差齐性做检验

> bartlett.test(vc~breed, data = strawberry)

    Bartlett test of homogeneity of variances

data:  vc by breed
Bartlett's K-squared = 2.0697, df = 2, p-value = 0.3553

# p-value= 0.3553  > 0.05, 所以拒绝原假设，所以满足方差齐次要求

采用方差分析的方法分析不同品种的维生素C含量是否相同，原假设为‘三种品种的均值相同’，备择假设为‘至少有一个品种的均值与其他品种不同’,由于数据不符合正态分布，我们采用kruskal.test。

> attach(strawberry)
> kruskal.test(vc~breed)

    Kruskal-Wallis rank sum test

data:  vc by breed
Kruskal-Wallis chi-squared = 14.344, df = 2, p-value = 0.0007676

#p-value < 0.05, 所以品种A、B、C的维生素C含量存在差异

# 分析各个品种两两之间存在差异与否
> pairwise.wilcox.test(vc,breed,p.adjust.method = "bonferroni")

    Pairwise comparisons using Wilcoxon rank sum test 

data:  vc and breed 

  a       b      
b 0.00075 -      
c 0.00200 0.39216

P value adjustment method: bonferroni 
Warning message:
In wilcox.test.default(xi, xj, paired = paired, ...) :
  无法精確計算带连结的p值

可以看出A与B、A与C之间成wilcox.test 检验的p值远小于0.05，而B与C之间则大于0.05，所以A品种的维生素C含量与其他品种不同，B、C品种则没有明显差异。

Question2:

we have data collected from the flammability test. a cloth has been cut into 55 pieces. there are 5 labs, and each lab received 11 pieces. these measure correspond to the length in the chart(fabric.csv). if the tests each of these labs have been set up same way would expect results to be similar. since all these measurements were taken from the same bolt of cloth.
1.use box plots to see the data outline
2.are the variances equal across groups according to the Bartlett Test?
3.are the mean lengths the same across groups?
4.if not, which lab is the difference? (pair wise comparison tests)

解：

> fabric <- read.table("fabric.txt",header = T);fabric
   Lab1 Lab2 Lab3 Lab4 Lab5
1   3.4  2.7  3.3  3.3  4.1
2   3.1  3.4  3.3  3.2  4.1
3   3.4  3.6  3.5  3.4  3.7
4   3.7  3.2  3.5  2.7  4.2
5   3.1  4.0  2.8  2.7  3.1
6   4.2  4.1  2.8  3.3  3.5
7   3.7  3.8  3.2  2.9  2.8
8   3.9  3.8  2.8  3.2  3.5
9   3.1  4.3  3.8  2.9  3.7
10  3.0  3.4  3.5  2.6  3.5
11  2.9  3.3  3.8  2.8  3.9
> boxplot(fabric)

What we see in the boxplot is there is some variation. The boxplot may suggest there is difference in the mean lengths across the labs So let's test that now use one-way ANOVA, cause there is one factor. we preprocess the data by rearranging this data to two columns(dependent variable, levels)

boxplot

# 将数据重新排列成两列
> stfabric<- stack(fabric)
# 修改列名为“len”和“lab”
> names(stfabric) <- c("len","lab")
> stfabric
   len  lab
1  3.4 Lab1
2  3.1 Lab1
3  3.4 Lab1
4  3.7 Lab1
5  3.1 Lab1
6  4.2 Lab1
7  3.7 Lab1
8  3.9 Lab1
9  3.1 Lab1
10 3.0 Lab1
11 2.9 Lab1
12 2.7 Lab2
13 3.4 Lab2
14 3.6 Lab2
15 3.2 Lab2
16 4.0 Lab2
17 4.1 Lab2
18 3.8 Lab2
19 3.8 Lab2
20 4.3 Lab2
21 3.4 Lab2
22 3.3 Lab2
23 3.3 Lab3
24 3.3 Lab3
25 3.5 Lab3
26 3.5 Lab3
27 2.8 Lab3
28 2.8 Lab3
29 3.2 Lab3
30 2.8 Lab3
31 3.8 Lab3
32 3.5 Lab3
33 3.8 Lab3
34 3.3 Lab4
35 3.2 Lab4
36 3.4 Lab4
37 2.7 Lab4
38 2.7 Lab4
39 3.3 Lab4
40 2.9 Lab4
41 3.2 Lab4
42 2.9 Lab4
43 2.6 Lab4
44 2.8 Lab4
45 4.1 Lab5
46 4.1 Lab5
47 3.7 Lab5
48 4.2 Lab5
49 3.1 Lab5
50 3.5 Lab5
51 2.8 Lab5
52 3.5 Lab5
53 3.7 Lab5
54 3.5 Lab5
55 3.9 Lab5
# 加载stfabric数据
> attach(stfabric)
> str(stfabric)
'data.frame':   55 obs. of  2 variables:
 $ len: num  3.4 3.1 3.4 3.7 3.1 4.2 3.7 3.9 3.1 3 ...
 $ lab: Factor w/ 5 levels "Lab1","Lab2",..: 1 1 1 1 1 1 1 1 1 1 ...
# 检验各组数据的正态性
> shapiro.test(stfabric[stfabric$lab=="Lab1",1])$p.value
[1] 0.3230597
> shapiro.test(stfabric[stfabric$lab=="Lab2",1])$p.value
[1] 0.9332868
> shapiro.test(stfabric[stfabric$lab=="Lab3",1])$p.value
[1] 0.1293238
> shapiro.test(stfabric[stfabric$lab=="Lab4",1])$p.value
[1] 0.2042771
> shapiro.test(stfabric[stfabric$lab=="Lab5",1])$p.value
[1] 0.4806684

# 其p-value均小于0.05，表明各组数据均服从正态分布

# 检验数据的方差齐性
> bartlett.test(len~lab)

    Bartlett test of homogeneity of variances

data:  len by lab
Bartlett's K-squared = 2.4179, df = 4, p-value = 0.6594

# 其中p-value < 0.05, 证明具有方差齐性

p-value is higher than 0.05 in each group. the result means that the variances are equal across groups according to the Bartlett Test.

# Are the mean lengths the same across groups? (4’)
# H0: There is no difference means across labs H1: there is at least one difference

# One way anova
> av1 <- aov(len~lab,data = stfabric)
> summary(av1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
lab          4  2.969  0.7423   4.676 0.00277 **
Residuals   50  7.936  0.1587                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#the F value is 4.5. P value is 0.003, which is highly significant. what we conclude is there is a difference in the means across labs. we reject the null hypothesis

which lab is the difference? (pair wise comparison tests) next obvious question to ask is where the difference lie? here we use the pairwise comparison test TukeyHSD, which gives a reasonable intervals and only works on balanced designs.

# 探究各变量之间的不同，进行各组均值差异的成对检验
> tk <- TukeyHSD(av1)
> tk
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = len ~ lab, data = stfabric)

$`lab`
                 diff        lwr         upr     p adj
Lab2-Lab1  0.19090909 -0.2898210  0.67163915 0.7932570
Lab3-Lab1 -0.10909091 -0.5898210  0.37163915 0.9673122
Lab4-Lab1 -0.40909091 -0.8898210  0.07163915 0.1299798
Lab5-Lab1  0.23636364 -0.2443664  0.71709370 0.6359580
Lab3-Lab2 -0.30000000 -0.7807301  0.18073006 0.4044590
Lab4-Lab2 -0.60000000 -1.0807301 -0.11926994 0.0076519
Lab5-Lab2  0.04545455 -0.4352755  0.52618461 0.9988350
Lab4-Lab3 -0.30000000 -0.7807301  0.18073006 0.4044590
Lab5-Lab3  0.34545455 -0.1352755  0.82618461 0.2654246
Lab5-Lab4  0.64545455  0.1647245  1.12618461 0.0034711

Rplot.jpeg

置信区间包含0组别说明差异不显著（p>0.05）,所以由图可知，Lab4-Lab2， Lab5-Lab4 之间有着显著差异，并且其p-adj值均小于0.05

3.Question

用大麦生产啤酒的过程中，需要研究烘烤方式（A）和大麦水分（B）对糖化时间的影响，选择两种烘烤方式、四种大麦水分，共八种处理方式，每一次处理重复三次，结果如表2所示。

表二.png

（1）根据课件双因素方法分析给出各组假设，列出方差分析表（给出表达式即可）并指明表中各值的意义；
（2）试分析烘烤方式A、水分B以及二者相互作用对糖化时间的影响；
（3）对AiBj条件下平均时间做多重比较，并指出与A1B3组有显著差异的条件;
（4）用图形展示交互效应并作说明。
注：假设数据满足正态性和方差齐次性；
本题数据可直接从附件beer.txt读取。

解：

(2)

> setwd("C:/Users/My/Desktop")
> beer <- read.table("beer.txt",header = T)
> head(beer)
   A  B time
1 A1 B1 12.0
2 A1 B1 13.0
3 A1 B1 14.5
4 A1 B2  9.5
5 A1 B2 10.0
6 A1 B2 12.5
> attach(beer)

下面对数据进行两因素的方差分析，原假设为‘因素A、B及其相互作用不影响糖化时间’，备择假设为‘存在某个因素影响糖化时间’。

# 双因素ANOVA分析

> ff <- aov(time~A*B,data = beer)
> summary(ff)
            Df Sum Sq Mean Sq F value   Pr(>F)    
A            1   5.51    5.51   4.008   0.0625 .  
B            3 228.86   76.29  55.482 1.12e-08 ***
A:B          3 107.61   35.87  26.088 2.12e-06 ***
Residuals   16  22.00    1.38                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

以上结果可以看出，烘烤方式A对应的p值大于0.05，不对糖化时间造成显著影响；而水分B以及A、B的相互作用对糖化时间有显著影响。

# 多重比较
> multi.compr <- TukeyHSD(ff)
> AB <- multi.compr$`A:B`
> AB[AB[,4]<0.05,]
                 diff          lwr       upr        p adj
A2:B1-A1:B1 -7.500000 -10.81475692 -4.185243 1.615920e-05
A2:B3-A1:B1  4.166667   0.85190974  7.481424 9.013947e-03
A1:B4-A1:B1  4.833333   1.51857641  8.148090 2.329391e-03
A1:B2-A2:B1  5.000000   1.68524308  8.314757 1.667245e-03
A2:B2-A2:B1  8.333333   5.01857641 11.648090 4.122553e-06
A1:B3-A2:B1  9.500000   6.18524308 12.814757 7.018864e-07
A2:B3-A2:B1 11.666667   8.35190974 14.981424 3.850844e-08
A1:B4-A2:B1 12.333333   9.01857641 15.648090 1.726279e-08
A2:B4-A2:B1 10.500000   7.18524308 13.814757 1.733446e-07
A2:B2-A1:B2  3.333333   0.01857641  6.648090 4.821463e-02
A1:B3-A1:B2  4.500000   1.18524308  7.814757 4.572530e-03
A2:B3-A1:B2  6.666667   3.35190974  9.981424 6.934742e-05
A1:B4-A1:B2  7.333333   4.01857641 10.648090 2.146571e-05
A2:B4-A1:B2  5.500000   2.18524308  8.814757 6.207373e-04
A2:B3-A2:B2  3.333333   0.01857641  6.648090 4.821463e-02
A1:B4-A2:B2  4.000000   0.68524308  7.314757 1.265663e-02

从上述结果中可以看出，和A1B3存在差异的的组有A2B1和A1B2

（3）

> par(mfrow=c(1,2))
> attach(beer)
> interaction.plot(A,B,time,type = "b",col = c("red","blue","green"))
> interaction.plot(B,A,time,type = "b",col = c("red","blue","green"))

交互作用图

从上图可以看出，固定一个因素（A或者B）时候，另外一个因素的变化线段彼此之间不平行，会受到前者（B或A）的影响。故而A与B存在一定的交互作用。

最后编辑于：2019.05.28 09:47:38

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 215,133评论 6赞 497
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 91,682评论 3赞 390
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 160,784评论 0赞 350
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 57,508评论 1赞 288
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 66,603评论 6赞 386
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 50,607评论 1赞 293
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 39,604评论 3赞 415
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 38,359评论 0赞 270
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 44,805评论 1赞 307
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 37,121评论 2赞 330
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 39,280评论 1赞 344
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 34,959评论 5赞 339
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 40,588评论 3赞 322
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 31,206评论 0赞 21
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 32,442评论 1赞 268
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 47,193评论 2赞 367
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 44,144评论 2赞 352

【R<-练习】生物统计学练习题3

Question1.

Question2:

3.Question

推荐阅读更多精彩内容