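In data analysis we often run statistical tests, but the results usually come back as long floating-point numbers that are hard to read at a glance. This section shows how to use the lucid() function from the lucid package to tidy up the output and make P values easier to read.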
Original article: Converting P values elegantly in R (R中优雅的对P值进行转换)
Install and load the R packages
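# Packages used in this post
package.list <- c("tidyverse", "lucid", "broom")
for (package in package.list) {
  # require() attaches the package and returns FALSE if it is not installed
  if (!require(package, character.only = TRUE, quietly = TRUE)) {
    install.packages(package)
    library(package, character.only = TRUE)
  }
}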
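Example data
# Fit circumference ~ age separately for each tree in the built-in Orange data set,
# then use broom::tidy() to turn each model's coefficient table into a data frame
Orange %>% group_by(Tree) %>%
  do(tidy(lm(circumference ~ age, data = .))) %>% as.data.frame()
As you can see, the P values come back in a format that is hard to read: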
Tree term estimate std.error statistic p.value
1 3 (Intercept) 19.20353638 5.863410215 3.275148 2.207255e-02
2 3 age 0.08111158 0.005628105 14.411881 2.901046e-05
3 1 (Intercept) 24.43784664 6.543311039 3.734783 1.350409e-02
4 1 age 0.08147716 0.006280721 12.972581 4.851902e-05
5 5 (Intercept) 8.75834459 8.176436207 1.071169 3.330518e-01
6 5 age 0.11102891 0.007848307 14.146861 3.177093e-05
7 2 (Intercept) 19.96090337 9.352361105 2.134317 8.593318e-02
8 2 age 0.12506176 0.008977041 13.931291 3.425041e-05
9 4 (Intercept) 14.63762022 11.233762751 1.303002 2.493507e-01
10 4 age 0.13517222 0.010782940 12.535748 5.733090e-05
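For readers who want to check these numbers without broom, here is a minimal base R sketch that fits the same per-tree regressions and pulls out the raw P value of the age slope. It is an alternative route to the same coefficients, not the method used in this post; `fits`, `coef_tabs` and `age_p` are illustrative names.

```r
# Base R only (stats + datasets); an alternative to group_by() + do() + tidy()
# Fit circumference ~ age separately for each tree
fits <- lapply(split(Orange, Orange$Tree),
               function(d) lm(circumference ~ age, data = d))

# coef(summary(fit)) holds Estimate, Std. Error, t value and Pr(>|t|)
coef_tabs <- lapply(fits, function(fit) coef(summary(fit)))

# Raw, unformatted P value of the age slope for each tree
age_p <- sapply(coef_tabs, function(tab) tab["age", "Pr(>|t|)"])
print(age_p)
```

The values printed by `age_p` should match the `p.value` column for the `age` rows in the table above.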
Formatting with lucid
# Same pipeline as above, with lucid() applied to the result for display
Orange %>% group_by(Tree) %>%
  do(tidy(lm(circumference ~ age, data = .))) %>% as.data.frame() %>% lucid()
Tree term estimate std.error statistic p.value
<ord> <chr> <chr> <chr> <chr> <chr>
1 3 (Intercept) "19.2 " " 5.86 " " 3.28" "0.0221 "
2 3 age " 0.0811" " 0.00563" "14.4 " "0.000029 "
3 1 (Intercept) "24.4 " " 6.54 " " 3.73" "0.0135 "
4 1 age " 0.0815" " 0.00628" "13 " "0.0000485"
5 5 (Intercept) " 8.76 " " 8.18 " " 1.07" "0.333 "
6 5 age " 0.111 " " 0.00785" "14.1 " "0.0000318"
7 2 (Intercept) "20 " " 9.35 " " 2.13" "0.0859 "
8 2 age " 0.125 " " 0.00898" "13.9 " "0.0000343"
9 4 (Intercept) "14.6 " "11.2 " " 1.3 " "0.249 "
10 4 age " 0.135 " " 0.0108 " "12.5 " "0.0000573"
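The lucid output above is consistent with rounding every number to three significant digits and dropping trailing zeros (for example, 12.972581 is shown as "13" and 19.96090337 as "20"). As a rough illustration of that rounding, and under the assumption that this is what lucid does by default, base R's signif() produces the same values as numbers rather than strings:

```r
# Raw values copied from the unformatted table above
raw <- c(19.20353638, 0.08111158, 12.972581, 2.901046e-05)

# Round to three significant digits; the result stays numeric
rounded <- signif(raw, digits = 3)
print(rounded)
```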
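After lucid(), the numbers are much easier to read. Note, however, that every numeric column has been converted to a character string, so we need to convert them back to numbers before doing any further calculation or comparison.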
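Why the conversion matters: when a string is compared with a number, R compares them as text, so a threshold test on the lucid columns can silently give the wrong answer. The sketch below uses padded strings like the ones lucid() printed above (the vector `lucid_p` is illustrative data) and shows that as.numeric() handles the extra spaces.

```r
# Illustrative data: padded P value strings like those in the lucid() output above
lucid_p <- c("0.0221 ", "0.000029 ", "0.333 ")

# String vs number: 0.001 is coerced to "0.001" and compared as text
text_cmp <- "1e-04" <= 0.001              # FALSE, because "1" sorts after "0"
num_cmp  <- as.numeric("1e-04") <= 0.001  # TRUE, because 0.0001 <= 0.001

# as.numeric() ignores leading/trailing spaces, so the strings convert directly
p_num <- as.numeric(lucid_p)
print(p_num)
```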
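Converting P values to significance stars
Use symnum() (from base R's stats package) to map each P value to asterisks (*):
Orange %>% group_by(Tree) %>%
  do(tidy(lm(circumference ~ age, data = .))) %>% as.data.frame() %>%
  lucid() %>%
  # lucid() returns strings, so convert p.value back to a number first
  mutate(pvalue = as.numeric(p.value),
         # as.character() drops the "noquote" class that symnum() returns
         signif = as.character(symnum(pvalue,
                                      cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                      symbols = c("***", "**", "*", " "))))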
Tree term estimate std.error statistic p.value pvalue signif
1 3 (Intercept) 19.2 5.86 3.28 0.0221 2.21e-02 *
2 3 age 0.0811 0.00563 14.4 0.000029 2.90e-05 ***
3 1 (Intercept) 24.4 6.54 3.73 0.0135 1.35e-02 *
4 1 age 0.0815 0.00628 13 0.0000485 4.85e-05 ***
5 5 (Intercept) 8.76 8.18 1.07 0.333 3.33e-01
6 5 age 0.111 0.00785 14.1 0.0000318 3.18e-05 ***
7 2 (Intercept) 20 9.35 2.13 0.0859 8.59e-02
8 2 age 0.125 0.00898 13.9 0.0000343 3.43e-05 ***
9 4 (Intercept) 14.6 11.2 1.3 0.249 2.49e-01
10 4 age 0.135 0.0108 12.5 0.0000573 5.73e-05 ***
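To see exactly how the cut-points map to symbols, here is a small standalone call to symnum() on a plain numeric vector, using the same cutpoints and symbols as the pipeline above. The intervals are right-closed, so a P value of exactly 0.01 falls in (0.001, 0.01] and gets "**".

```r
# P values chosen to hit different intervals, including the boundary 0.01
p <- c(0.0221, 0.000029, 0.333, 0.01)

# Intervals: (0, 0.001] -> "***", (0.001, 0.01] -> "**",
#            (0.01, 0.05] -> "*", (0.05, 1] -> " "
stars <- symnum(p,
                cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                symbols   = c("***", "**", "*", " "))

# symnum() returns a "noquote" object with a legend attribute;
# as.character() gives a plain character vector for a data frame column
stars_chr <- as.character(stars)
print(stars_chr)
```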
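Converting P values with a custom function and sapply()
# Return the significance stars for a single numeric P value
myfun <- function(pval) {
  if (is.na(pval)) return("")   # missing P values get no stars
  if (pval <= 0.001) {
    "***"
  } else if (pval <= 0.01) {
    "**"
  } else if (pval <= 0.05) {
    "*"
  } else {
    ""                          # not significant (P > 0.05)
  }
}
Orange %>% group_by(Tree) %>%
  do(tidy(lm(circumference ~ age, data = .))) %>% as.data.frame() %>%
  lucid() %>%
  # convert the lucid strings back to numbers before comparing them
  mutate(pvalue = as.numeric(p.value)) %>%
  # use the numeric pvalue column: p.value is still a string at this point
  mutate(signif = sapply(pvalue, myfun))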
Tree term estimate std.error statistic p.value pvalue signif
1 3 (Intercept) 19.2 5.86 3.28 0.0221 2.21e-02 *
2 3 age 0.0811 0.00563 14.4 0.000029 2.90e-05 ***
3 1 (Intercept) 24.4 6.54 3.73 0.0135 1.35e-02 *
4 1 age 0.0815 0.00628 13 0.0000485 4.85e-05 ***
5 5 (Intercept) 8.76 8.18 1.07 0.333 3.33e-01
6 5 age 0.111 0.00785 14.1 0.0000318 3.18e-05 ***
7 2 (Intercept) 20 9.35 2.13 0.0859 8.59e-02
8 2 age 0.125 0.00898 13.9 0.0000343 3.43e-05 ***
9 4 (Intercept) 14.6 11.2 1.3 0.249 2.49e-01
10 4 age 0.135 0.0108 12.5 0.0000573 5.73e-05 ***
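Calling myfun() through sapply() works one value at a time. A vectorised alternative, sketched here with base R's cut() and the same thresholds, handles the whole column in one call and returns NA for missing P values. The helper name `p_to_stars()` is hypothetical and not part of any package.

```r
# Hypothetical helper: vectorised star assignment with base R cut()
p_to_stars <- function(p) {
  stars <- cut(p,
               breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
               labels = c("***", "**", "*", ""),
               right  = TRUE)   # right-closed, so P = 0.05 still gets "*"
  as.character(stars)           # factor -> plain character vector
}

p_to_stars(c(0.0221, 0.000029, 0.333, 0.05))
```

Inside the pipeline it could replace the sapply() step, e.g. `mutate(signif = p_to_stars(pvalue))`.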
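If you like this post, follow my WeChat official account so you don't miss the next update.
R语言数据分析指南 (R Data Analysis Guide) regularly shares classic data-visualization examples and some bioinformatics knowledge; I hope it is useful to everyone.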