There may well be a reason it always seems to rain during the gaokao.

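R's greatest strength is its rich collection of packages.

Installing and loading R packages

1. Setting a mirror

Beginner mode

In RStudio: Tools -> Global Options -> Packages -> choose a mirror in China

Image credit: 生信星球

Drawback: this setting only covers CRAN, so it does not help when downloading Bioconductor packages.
Intermediate mode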

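# The options() function sets options that apply while R is running
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) # Tsinghua mirror
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") # USTC mirror
# You can of course switch to a mirror in another region

Check the result with options()$BioC_mirror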

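Expert mode
Use R's configuration file, .Rprofile.
R runs this configuration file automatically at startup.
First, open the file for editing with file.edit():

file.edit('~/.Rprofile')

Then add the two option lines from the intermediate mode to it.

Figure: the two lines of code

After restarting R, run options()$repos and options()$BioC_mirror to check the result.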
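As a rough sketch, the finished ~/.Rprofile could hold nothing more than the two option lines, followed by the check. The mirror URLs are the ones used above; getOption() is used here only as a tidier equivalent of options()$name.

```r
# Contents of ~/.Rprofile: R runs these lines at every startup
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror = "https://mirrors.ustc.edu.cn/bioc/")

# After a restart, confirm that both settings took effect
cran_mirror <- getOption("repos")[["CRAN"]]
bioc_mirror <- getOption("BioC_mirror")
print(cran_mirror)
print(bioc_mirror)
```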

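2. Installing

Make sure you are connected to the internet.

If the package is on CRAN: install.packages("pkg")
If the package is on Bioconductor: BiocManager::install("pkg")
Alternatively, search for the package name on Baidu; if the package is on Bioconductor, you can use the install command given on its website.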
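To avoid reinstalling a package every time a script runs, the two install commands can be wrapped in a check. This is only a sketch: install_if_missing is a hypothetical helper, not something from the notes, and "limma" is used purely as an example of a Bioconductor package name.

```r
# Hypothetical helper: call `installer` only when `pkg` is not installed yet.
# Returns TRUE (invisibly) if an install was attempted, FALSE otherwise.
install_if_missing <- function(pkg, installer = install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already available, nothing to do
  }
  installer(pkg)
  invisible(TRUE)
}

# Uncomment to run (these need an internet connection):
# install_if_missing("dplyr")        # a CRAN package
# install_if_missing("BiocManager")  # BiocManager itself comes from CRAN
# install_if_missing("limma", installer = BiocManager::install)  # illustrative Bioconductor package
```

requireNamespace() only checks whether the package can be found; it does not attach it, so library() is still needed afterwards.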

3. Loading a package

library(pkg)
require(pkg)
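The two functions differ mainly in what happens when the package is missing: library() stops with an error, while require() returns FALSE with a warning. A small sketch, using a package name that is deliberately made up so that it cannot be found:

```r
# "notARealPackage123" is a made-up name, chosen so that loading fails.
missing_pkg <- "notARealPackage123"

# require() returns FALSE (and warns) instead of stopping the script
ok <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
print(ok)

# library() raises an error, which is caught here so the script can continue
res <- tryCatch(
  library(missing_pkg, character.only = TRUE),
  error = function(e) "library() raised an error"
)
print(res)
```

In scripts, library() is usually the safer choice because a missing package is reported immediately rather than several lines later.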

Example

Install the dplyr package

options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) 
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") 
install.packages("dplyr")
library(dplyr)

For data, use a reduced version of the built-in iris dataset

test <- iris[c(1:2,51:52,101:102),]

1. mutate(): add a new column

mutate(test, new = Sepal.Length * Sepal.Width)

Result
  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
1          5.1         3.5          1.4         0.2     setosa 17.85
2          4.9         3.0          1.4         0.2     setosa 14.70
3          7.0         3.2          4.7         1.4 versicolor 22.40
4          6.4         3.2          4.5         1.5 versicolor 20.48
5          6.3         3.3          6.0         2.5  virginica 20.79
6          5.8         2.7          5.1         1.9  virginica 15.66
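mutate() can also add several columns in one call, and a later column may refer to one created earlier in the same call. A small sketch; the column names area and ratio are made up for illustration:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# `ratio` uses `area`, which is created one line earlier in the same call
out <- mutate(test,
              area  = Sepal.Length * Sepal.Width,
              ratio = area / Petal.Length)

names(out)   # the five original columns plus area and ratio
ncol(test)   # still 5: mutate() returns a new data frame, test is unchanged
```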

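2. select(): pick columns

select(test,1)

Figure: selecting by column

select(test,c(1,5))  # select the first and fifth columns

Figure: the first and fifth columns

select(test,Sepal.Length)   # select the Sepal.Length column

Figure: selecting by column name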
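Besides positions and exact names, select() accepts helper functions and negative selections. These are not covered in the notes above, so treat the following as an optional sketch:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Columns whose names start with "Sepal"
sepal_cols <- names(select(test, starts_with("Sepal")))

# Everything except Species
no_species <- names(select(test, -Species))

# Move Species to the front, then keep all remaining columns
species_first <- names(select(test, Species, everything()))

print(sepal_cols)
print(no_species)
print(species_first)
```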

3. filter(): pick rows

filter(test, Species == "setosa")
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
filter(test, Species == "setosa" & Sepal.Length > 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
filter(test, Species %in% c("setosa","versicolor"))
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
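Two details that may help when combining conditions: separating conditions with a comma behaves like &, and | expresses "or". A minimal sketch on the same six-row data:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# A comma between conditions means "and", just like &
with_amp   <- filter(test, Species == "setosa" & Sepal.Length > 5)
with_comma <- filter(test, Species == "setosa", Sepal.Length > 5)
isTRUE(all.equal(with_amp, with_comma))

# | means "or": keep setosa rows, plus any row with Sepal.Length above 6.5
either <- filter(test, Species == "setosa" | Sepal.Length > 6.5)
nrow(either)  # 3: the two setosa rows and the versicolor row with 7.0
```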

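4. arrange(): sort rows

arrange(test, Sepal.Length) # sorts in ascending order by default

Figure: ascending order

arrange(test, desc(Sepal.Length)) # use desc() to sort in descending order

Figure: descending order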
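arrange() also accepts several sort keys; later keys break ties in earlier ones. A sketch that sorts by Species in reverse order and then by Sepal.Length within each species:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Species is a factor, so desc(Species) reverses the order of its levels:
# virginica first, then versicolor, then setosa
sorted <- arrange(test, desc(Species), Sepal.Length)

sorted$Sepal.Length  # 5.8 6.3 6.4 7.0 4.9 5.1
```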

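5. summarise(): summarise data

summarise() collapses the data into summary values; it is most useful in combination with group_by.

summarise(test, mean(Sepal.Length), sd(Sepal.Length)) # mean and standard deviation of Sepal.Length

Figure: mean and standard deviation

First group by Species, then compute the mean and standard deviation of Sepal.Length within each group.

group_by(test, Species)   # group by Species
summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length))

Figure: grouped into three classes by species

Figure: mean and standard deviation within each species group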
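Without names, the result columns are labelled with the full expression, such as `mean(Sepal.Length)`. Naming the summaries makes the output easier to reuse. A sketch; mean_len, sd_len and n are arbitrary names chosen here:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

res <- summarise(group_by(test, Species),
                 mean_len = mean(Sepal.Length),  # group mean
                 sd_len   = sd(Sepal.Length),    # group standard deviation
                 n        = n())                 # number of rows in the group

res$mean_len  # 5.00 6.70 6.05
res$n         # 2 2 2
```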

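Two handy dplyr features

1. The pipe operator %>% (Cmd/Ctrl + Shift + M)

The pipe can be hard to grasp at first: it strings consecutive steps of code together, and reads a bit like "then".

test %>% group_by(Species) %>% summarise(mean(Sepal.Length), sd(Sepal.Length))

Figure: the pipe operator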
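One way to see what the pipe does is to write the same steps twice, once nested and once piped. x %>% f(y) is just another way of writing f(x, y), so both versions below should give the same result; the particular filter and sort steps are only an example.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Nested calls are read from the inside out
nested <- arrange(filter(test, Species != "setosa"), desc(Sepal.Length))

# Piped calls are read top to bottom: take test, then filter, then arrange
piped <- test %>%
  filter(Species != "setosa") %>%
  arrange(desc(Sepal.Length))

isTRUE(all.equal(nested, piped))
piped$Sepal.Length  # 7.0 6.4 6.3 5.8
```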

2. count(): count the unique values in a column

count(test,Species)

Figure: counts per value
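count() can be thought of as shorthand for grouping and then counting rows with n(). The following sketch compares the two on the same data:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

by_count <- count(test, Species)

# The long way round: group, then count the rows in each group
by_hand <- test %>%
  group_by(Species) %>%
  summarise(n = n())

by_count$n  # 2 2 2
by_hand$n   # 2 2 2
```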

Handling relational data with dplyr

That is, joining two tables. Note: do not introduce factors.

options(stringsAsFactors = FALSE)

test1 <- data.frame(x = c('b','e','f','x'), 
                    z = c("A","B","C",'D'),
                    stringsAsFactors = FALSE)

test1

test2 <- data.frame(x = c('a','b','c','d','e','f'), 
                    y = c(1,2,3,4,5,6),
                    stringsAsFactors = FALSE)

test2

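1. Inner join with inner_join: take the intersection

inner_join(test1, test2, by = "x")

This takes the intersection: only rows whose x value appears in both tables are kept.

Figure: the joined data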

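2. Left join with left_join

left_join(test1, test2, by = 'x')

Here test1 is the primary table.

Cells with no matching data are shown as NA.

left_join(test2, test1, by = 'x')

Here test2 is the primary table.

Figure: left join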

3. Full join with full_join

full_join(test1, test2, by = 'x')

Figure: full join
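Since the screenshots are not reproduced here, comparing row counts is a quick way to see how the three joins so far differ. A sketch that rebuilds the two tables so it runs on its own:

```r
library(dplyr)

test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

inner  <- inner_join(test1, test2, by = "x")  # keys in both tables: b, e, f
left12 <- left_join(test1, test2, by = "x")   # every row of test1
left21 <- left_join(test2, test1, by = "x")   # every row of test2
full   <- full_join(test1, test2, by = "x")   # every key from either table

c(inner = nrow(inner), left12 = nrow(left12),
  left21 = nrow(left21), full = nrow(full))   # 3, 4, 6, 7

left12$y  # 2 5 6 NA: the key "x" has no match in test2
```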

4. Semi join with semi_join: returns all records of table x that have a match in table y

semi_join(x = test1, y = test2, by = 'x')
##   x z
## 1 b A
## 2 e B
## 3 f C

5. Anti join with anti_join: returns all records of table x that have no match in table y

anti_join(x = test2, y = test1, by = 'x')
##   x y
## 1 a 1
## 2 c 3
## 3 d 4
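Semi and anti joins only filter the rows of x; no columns from y are added. One way to check that understanding is to reproduce them with filter() and %in%, as in this sketch (the tables are rebuilt so it runs on its own):

```r
library(dplyr)

test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# semi_join keeps the rows of test1 whose key exists in test2
semi      <- semi_join(test1, test2, by = "x")
by_filter <- filter(test1, x %in% test2$x)

# anti_join keeps the rows of test1 whose key does NOT exist in test2
anti <- anti_join(test1, test2, by = "x")

semi$x       # "b" "e" "f"
by_filter$x  # "b" "e" "f"
anti$x       # "x"
names(semi)  # "x" "z": the y column from test2 is not added
```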

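6. Simple binding

Tables with the same columns can be stacked top to bottom; tables with the same rows can be combined side by side.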
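The notes do not name the functions for this step. A plausible reading is dplyr's bind_rows() and bind_cols() (base R's rbind() and cbind() do much the same), so the sketch below is an assumption, and the small tables in it are made up for illustration:

```r
library(dplyr)

# Two made-up tables with the same columns: stack them vertically
top    <- data.frame(x = c("a", "b"), y = c(1, 2))
bottom <- data.frame(x = c("c", "d"), y = c(3, 4))
stacked <- bind_rows(top, bottom)

# Two made-up tables with the same number of rows: put them side by side
left  <- data.frame(x = c("a", "b"))
right <- data.frame(y = c(1, 2))
side <- bind_cols(left, right)

dim(stacked)  # 4 rows, 2 columns
dim(side)     # 2 rows, 2 columns
```

bind_cols() matches rows purely by position, so it is only safe when both tables are already in the same row order; when they share a key column, a join is the safer choice.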

Study group D6 - 雷声千嶂落，雨色万峰来 - 万重山