Today's focus is learning about R packages.
The mind map is as follows.
Installing and loading R packages
1. Mirror settings
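# options() sets global options that control how R behaves during a session
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) # Tsinghua University mirror
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") # USTC mirror
options()$BioC_mirror # check the current Bioconductor mirror
Write the settings to the configuration file
file.edit('~/.Rprofile') # opens ~/.Rprofile; put the two options() lines there so they run at every startup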
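A minimal sketch of confirming that the mirror settings took effect in the current session. It uses only base R; getOption() is simply another way of reading the same values that options()$... returns, and the variable names are made up for illustration.

```r
# Set the two mirrors for the current session (same URLs as in the notes above)
options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror = "https://mirrors.ustc.edu.cn/bioc/")

# Read the values back to confirm they are active
cran_mirror <- getOption("repos")[["CRAN"]]
bioc_mirror <- getOption("BioC_mirror")
print(cran_mirror)
print(bioc_mirror)
```

If the same two options() lines are saved in ~/.Rprofile, a fresh R session should print the same values without setting them by hand.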
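2. Installing and loading packages
install.packages("dplyr") # install from CRAN
install.packages("libs/xxxx.tgz", repos=NULL, type="source") # install a local package file
BiocManager::install() # packages hosted on Bioconductor
devtools::install_github("user/repo") # packages hosted on GitHub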
library(dplyr)
require(package)
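library() and require() both load a package, but they behave differently when the package is missing. The sketch below illustrates this with a deliberately non-existent package name (an assumption made only for the demo): require() returns FALSE with a warning, while library() raises an error.

```r
# A package name assumed not to exist, used only for illustration
pkg <- "aPackageThatDoesNotExist"

# require() returns FALSE (with a warning) when the package cannot be loaded
loaded <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(loaded)

# library() raises an error instead; tryCatch() converts it to a message here
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "library() failed with an error"
)
print(result)
```

Because of this, library() is usually the safer choice at the top of a script, while require() is handy inside an if() test.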
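Appendix: a tidy routine I put together for installing and loading packages
.pkgcheck <- function() {
  # Packages needed for this session
  # (named pkgs so it does not mask the base function .packages())
  pkgs <- c("ggplot2", "dplyr", "rms")
  # Ask whether missing CRAN packages should be installed
  selection <- readline("Check and install missing packages? yes/no: ")
  if (selection == "yes") {
    # Compare against the names of installed packages only
    to_install <- setdiff(pkgs, rownames(installed.packages()))
    if (length(to_install) > 0) install.packages(to_install)
  }
  # Load the packages into the session
  # (local variables disappear when the function returns, so no rm() is needed)
  invisible(lapply(pkgs, require, character.only = TRUE))
}
.pkgcheck()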
Basic dplyr functions
See the package cheatsheet
1. mutate()
mutate(data,new_col = ...)
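A small worked example of mutate(). The notes never show how `test` is created, so the definition below (six rows taken from the built-in iris data) is an assumption, as is the column name `new`.

```r
library(dplyr)

# Assumption: `test` is a small slice of the built-in iris data set
test <- iris[c(1:2, 51:52, 101:102), ]

# Add a column computed from two existing columns
test_new <- mutate(test, new = Sepal.Length * Sepal.Width)
print(test_new)
```

mutate() returns a new data frame; `test` itself is unchanged unless the result is assigned back to it.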
2. select()
select(data, column numbers or column names) # general form
select(test,c(1,5)) # by position
select(test, Petal.Length, Petal.Width) # by name
vars <- c("Petal.Length", "Petal.Width")
select(test, one_of(vars)) # select(test, vars) also works, but prints a message that the reference is ambiguous
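In recent versions of tidyselect, one_of() is superseded by all_of() and any_of(). The sketch below shows the same selections with all_of(); the definition of `test` is again an assumption, since the notes do not show it.

```r
library(dplyr)

# Assumption: `test` is a small slice of the built-in iris data set
test <- iris[c(1:2, 51:52, 101:102), ]
vars <- c("Petal.Length", "Petal.Width")

# Select by position: columns 1 and 5
by_position <- select(test, c(1, 5))

# Select the columns named in a character vector; all_of() errors if one is missing
by_name <- select(test, all_of(vars))

print(colnames(by_position))
print(colnames(by_name))
```

any_of() works the same way but silently skips names that are not present in the data.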
3. filter()
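The notes give no example for filter(), which keeps the rows that satisfy one or more conditions. A minimal sketch, assuming `test` is a six-row slice of iris (the notes do not show its definition):

```r
library(dplyr)

# Assumption: `test` is a small slice of the built-in iris data set
test <- iris[c(1:2, 51:52, 101:102), ]

# Keep rows of a single species
setosa_rows <- filter(test, Species == "setosa")

# Combine conditions with &
setosa_long <- filter(test, Species == "setosa" & Sepal.Length > 5)

# Match against several values with %in%
two_species <- filter(test, Species %in% c("setosa", "versicolor"))

print(setosa_rows)
print(setosa_long)
print(two_species)
```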
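4. arrange()
Works much like sort(), but orders the rows of a data frame by one or more columns
arrange(.data, ...)
arrange(test, Sepal.Length) # ascending order by default
arrange(test, desc(Sepal.Length)) # use desc() for descending order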
5. summarise()
summarise(test, mean(Sepal.Length), sd(Sepal.Length)) # mean and standard deviation of Sepal.Length
summarise(group_by(test, Species),mean(Sepal.Length), sd(Sepal.Length)) # the same statistics, computed per Species
VARIATIONS # variants of summarise()
- summarise_all() - Apply funs to every column.
iris %>%
group_by(Species) %>%
summarise_all(list(min, max,sd))
# Applies every function in the list to all non-grouping columns
- summarise_at() - Apply funs to specific columns.
iris %>%
summarise_at(vars(Sepal.Length:Petal.Width), list(mean,sd), na.rm = TRUE)
- summarise_if() - Apply funs to all cols of one type.
starwars %>%
summarise_if(is.numeric, mean, na.rm = TRUE)
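Since dplyr 1.0.0 the summarise_all(), summarise_at() and summarise_if() variants are superseded by across(). The sketch below shows roughly equivalent calls; the result names `by_species` and `selected` are made up for illustration.

```r
library(dplyr)

# Counterpart of summarise_if(): mean of every numeric column, per species
by_species <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), function(x) mean(x, na.rm = TRUE)))

# Counterpart of summarise_at(): several functions on chosen columns;
# a named list gives output columns such as Sepal.Length_mean
selected <- iris %>%
  summarise(across(c(Sepal.Length, Sepal.Width), list(mean = mean, sd = sd)))

print(by_species)
print(selected)
```

Naming the functions in the list keeps the output column names readable, which the unnamed list(min, max, sd) form does not.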
6. group_by()
iris %>% group_by(Species) %>%
summarise(mean(Sepal.Length),sd(Sepal.Width))
# A tibble: 3 x 3
  Species    `mean(Sepal.Length)` `sd(Sepal.Width)`
  <fct>                     <dbl>             <dbl>
1 setosa                     5.01             0.379
2 versicolor                 5.94             0.314
3 virginica                  6.59             0.322
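Two handy dplyr skills
1. The pipe operator %>% (Cmd/Ctrl + Shift + M)
It passes the result of the expression on its left as the first argument (the data) of the function on its right.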
In computer programming, especially in UNIX operating systems, a pipe is a technique for passing information from one program process to another.
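To make the idea concrete, the sketch below writes the same computation twice, once as nested calls and once with the pipe. The column name `mean_len` is made up for illustration.

```r
library(dplyr)

# Nested form: read from the inside out
nested <- summarise(group_by(iris, Species), mean_len = mean(Sepal.Length))

# Piped form: read from left to right, each result feeding the next call
piped <- iris %>%
  group_by(Species) %>%
  summarise(mean_len = mean(Sepal.Length))

print(piped)
```

Both forms produce the same table; the piped version simply lists the steps in the order they run.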
2. count()
count(iris,Species)
     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50
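count() can be read as shorthand for grouping and then counting rows. A minimal sketch of that equivalence, with illustrative variable names:

```r
library(dplyr)

# Shorthand: count rows per species
counted <- count(iris, Species)

# Long form: group, then count rows with n()
manual <- iris %>%
  group_by(Species) %>%
  summarise(n = n())

print(counted)
print(manual)
```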
Handling relational data with dplyr
Similar to table joins in SQL
1. inner_join()
2. left_join()
3. full_join()
4. semi_join()
Returns all rows of table x that have a match in table y
5. anti_join()
Returns all rows of table x that have no match in table y
6. Simple binding
- bind_rows()
- bind_cols()
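The join and bind functions listed above can be sketched on two small tables. The tables `test1` and `test2` and their contents are made up for illustration; they share the key column `x`.

```r
library(dplyr)

# Two small made-up tables sharing the key column `x`
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = c(1, 2, 3, 4, 5, 6))

inner <- inner_join(test1, test2, by = "x")  # rows whose key appears in both tables
left  <- left_join(test1, test2, by = "x")   # all rows of test1, NA where test2 has no match
full  <- full_join(test1, test2, by = "x")   # all rows of both tables
semi  <- semi_join(test1, test2, by = "x")   # rows of test1 that have a match in test2
anti  <- anti_join(test1, test2, by = "x")   # rows of test1 that have no match in test2

# bind_rows() stacks tables by column name; bind_cols() needs equal row counts
stacked <- bind_rows(test1, test1)
side_by_side <- bind_cols(test1, data.frame(w = 1:4))

print(inner)
print(anti)
```

Note that semi_join() and anti_join() only filter test1; they never add columns from test2.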