1. Open RStudio and report its working directory.
The command is getwd(), which here returns [1] "D:/R_data/R_exercise"
To change the working directory, use the setwd() command.
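A minimal sketch of switching the working directory and restoring it afterwards. The use of tempdir() as the target is only an assumption for illustration; any existing folder would do.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Switch to R's per-session temporary directory (illustrative target only)
setwd(tempdir())
new_wd <- getwd()

# Go back to where we started
setwd(old_wd)
restored <- identical(getwd(), old_wd)
restored
```

Saving the old directory first is a small habit that keeps relative file paths in the rest of a script working.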
2. Create 6 vectors based on different atomic types (focus on character, numeric, and logical).
c(1,2,3,3)
c("a","b","c","d")
3
TRUE
FALSE
3>5
5>3
> c(1,2,3,3)
[1] 1 2 3 3
> c("a","b","c","d")
[1] "a" "b" "c" "d"
> 3
[1] 3
> TRUE
[1] TRUE
> FALSE
[1] FALSE
> 3>5
[1] FALSE
> 5>3
[1] TRUE
3. Use typeof() or class() to check a data type; put the value to check inside the parentheses.
typeof(3)
typeof("a")
typeof(3>5)
class(3)
class("a")
class(3>5)
> typeof(3)
[1] "double"
> typeof("a")
[1] "character"
> typeof(3>5)
[1] "logical"
> class(3)
[1] "numeric"
> class("a")
[1] "character"
> class(3>5)
[1] "logical"
4. The as.* family of functions converts between data types.
as.numeric("a")
as.logical(3)
as.character(3)
as.character(3>5)
as.numeric(3>5)
as.numeric(5>3)
> as.numeric("a")
[1] NA
Warning message:
NAs introduced by coercion
> as.logical(3)
[1] TRUE
> as.character(3)
[1] "3"
> as.character(3>5)
[1] "FALSE"
> as.numeric(3>5)
[1] 0
> as.numeric(5>3)
[1] 1
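The same conversion rules apply when values of different types meet inside one vector. The sketch below uses made-up values to show implicit coercion in c() and an explicit conversion that produces NA for text that is not a number.

```r
# Mixing types in c(): everything is coerced to the most flexible type
mixed <- c(1, "a", TRUE)      # becomes character
nums  <- c(1, TRUE, FALSE)    # logicals become 1 and 0, result is double

typeof(mixed)
typeof(nums)

# Explicit conversion: text that is not a number turns into NA
x <- c("1", "2", "three")
converted <- suppressWarnings(as.numeric(x))  # hide the coercion warning
is.na(converted)                              # marks the failed element

# Because TRUE counts as 1, sum() counts how many values are TRUE
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true
```

Checking is.na() after an as.* call is a simple way to find values that could not be converted.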
5. The is.* family of functions tests a type and returns TRUE or FALSE.
is.numeric("a")
is.logical(3)
is.character("3")
is.logical(5>3)
> is.numeric("a")
[1] FALSE
> is.logical(3)
[1] FALSE
> is.character("3")
[1] TRUE
> is.logical(5>3)
[1] TRUE
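6. How vectors are created
(1) Combine values with c(), for example c(2,8,9,10,9) or c("ao","fe","d","b")
(2) Use a colon ":" for consecutive integers, for example 1:8
(3) Use rep() for repeated values, seq() for regular sequences, and rnorm() for random numbers
rep("gene",times=10) # repeat 10 times
seq(from=3, to=21,by=3) # from 3 to 21 in steps of 3
rnorm(n=5,mean=3, sd=5) # 5 random draws from a normal distribution with mean 3 and standard deviation 5
(4) Build more complex vectors by combining the functions above
paste0(rep("gene", times=10), 1:10)
paste0(rep("RNA", times=10), 1:15)
paste0(rep("DNA", times=15), 1:10) # recycling: the result has the length of the longer vector, and the shorter one is reused from its start
First look at the result, then change the input, then read the help documentation.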
> c(2,8,9,10,9)
[1]  2  8  9 10  9
> c("ao","fe","d","b")
[1] "ao" "fe" "d"  "b"
> 1:8
[1] 1 2 3 4 5 6 7 8
> rep("gene",times=10)
 [1] "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene" "gene"
> seq(from=3, to=21,by=3)
[1]  3  6  9 12 15 18 21
> rnorm(n=5,mean=3, sd=5)
[1]  4.411656 -5.929244  1.181725  4.919650  3.256978
> paste0(rep("gene", times=10), 1:10)
 [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7"  "gene8"
 [9] "gene9"  "gene10"
> paste0(rep("RNA", times=10), 1:15)
 [1] "RNA1"  "RNA2"  "RNA3"  "RNA4"  "RNA5"  "RNA6"  "RNA7"  "RNA8"  "RNA9"
[10] "RNA10" "RNA11" "RNA12" "RNA13" "RNA14" "RNA15"
> paste0(rep("DNA", times=15), 1:10)
 [1] "DNA1"  "DNA2"  "DNA3"  "DNA4"  "DNA5"  "DNA6"  "DNA7"  "DNA8"  "DNA9"
[10] "DNA10" "DNA1"  "DNA2"  "DNA3"  "DNA4"  "DNA5"
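The rnorm() output above changes on every run. A minimal sketch, assuming an arbitrary seed of 42, shows how set.seed() makes the draws repeatable, along with two other common arguments of rep() and seq(). The group labels are made up for illustration.

```r
# Fixing the seed makes random draws reproducible
set.seed(42)
a <- rnorm(n = 5, mean = 3, sd = 5)
set.seed(42)
b <- rnorm(n = 5, mean = 3, sd = 5)
identical(a, b)   # same seed, same numbers

# rep() with each= repeats every element in turn instead of the whole vector
groups <- rep(c("ctrl", "treat"), each = 3)
groups

# seq() with length.out= fixes the number of points instead of the step size
grid <- seq(from = 0, to = 1, length.out = 5)
grid
```

Setting the seed once at the top of a script is usually enough for the whole analysis to be repeatable.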
7. Operations on two vectors
x <- c(1,3,5,6,8)
y <- c(3,2,5)
x+y # recycling: the shorter vector y is reused to match the length of x
x == y
paste(x,y,sep = "/")
> x <- c(1,3,5,6,8)
> y <- c(3,2,5)
> x+y
[1]  4  5 10  9 10
Warning message:
In x + y : longer object length is not a multiple of shorter object length
> x == y
[1] FALSE FALSE  TRUE FALSE FALSE
Warning message:
In x == y : longer object length is not a multiple of shorter object length
> paste(x,y,sep = "/")
[1] "1/3" "3/2" "5/5" "6/3" "8/2"
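The warnings above appear because 5 is not a multiple of 3. A small sketch with a length-6 vector (values chosen only for illustration) shows that recycling is silent when the lengths divide evenly. It also contrasts the position-by-position `==` with the membership test `%in%`.

```r
x <- c(1, 3, 5, 6, 8, 10)   # length 6
y <- c(3, 2, 5)             # length 3, reused exactly twice

total <- x + y              # no warning: 6 is a multiple of 3
total

# Writing the recycling out explicitly gives the same result
y_full <- rep_len(y, length(x))
same <- identical(total, x + y_full)
same

# == compares element by element; %in% asks "is this value anywhere in y?"
member <- x %in% y
member
```

Because silent recycling can hide a length mismatch, checking length(x) and length(y) before combining two vectors is a reasonable precaution.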
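8. Getting to know data frames, matrices, and lists
- vector: one-dimensional
- matrix: two-dimensional; columns of equal length that all share one data type
- data.frame: columns of equal length whose data types may differ
- list: elements may differ in both length and data type
To find out which of these a data set is, just call class() on it.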
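As a sketch of the four structures listed above, the snippet below builds one small object of each kind and inspects it. All object names and values are made up for illustration.

```r
v  <- c(1, 2, 3)                                   # vector: one type, one dimension
mt <- matrix(1:6, nrow = 2)                        # matrix: one type, rows x columns
d  <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)         # columns may differ in type
l  <- list(v, mt, d, "text")                       # elements may differ in type and length

class(v)
class(mt)   # includes "matrix"; newer R versions also report "array"
class(d)
class(l)

dim(mt)     # 2 rows, 3 columns
length(l)   # 4 elements
```

A data frame is stored internally as a list of equal-length columns, which is why is.list(d) also returns TRUE.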
9. Slice the data frame you created: first take rows 1 and 3, then take columns 4 and 6.
df <- data.frame(gene = paste0("gene",1:3),
                 sample = paste0("sample",1:3),
                 exp = c(32,34,45),
                 p = c(0.001,0.05,0.1),
                 level = c(2,1,4),
                 change = c(4,2,1))
dim(df)
nrow(df)
ncol(df)
df[c(1,3),]
df[,c(4,6)]
> df <- data.frame(gene = paste0("gene",1:3),
+                  sample = paste0("sample",1:3),
+                  exp = c(32,34,45),
+                  p = c(0.001,0.05,0.1),
+                  level = c(2,1,4),
+                  change = c(4,2,1))
> dim(df)
[1] 3 6
> nrow(df)
[1] 3
> ncol(df)
[1] 6
> df[c(1,3),]
   gene  sample exp     p level change
1 gene1 sample1  32 0.001     2      4
3 gene3 sample3  45 0.100     4      1
> df[,c(4,6)]
      p change
1 0.001      4
2 0.050      2
3 0.100      1
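Besides numeric positions, rows and columns can be selected by name or by a logical condition. The sketch below rebuilds the same data frame; the cutoff p < 0.1 is an arbitrary choice for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sample = paste0("sample", 1:3),
                 exp = c(32, 34, 45),
                 p = c(0.001, 0.05, 0.1),
                 level = c(2, 1, 4),
                 change = c(4, 2, 1),
                 stringsAsFactors = FALSE)

# Columns by name: same result as df[, c(4, 6)], but robust to column order
by_name <- df[, c("p", "change")]

# Rows by condition: keep rows whose p value is below the cutoff
small_p <- df[df$p < 0.1, ]

# Combine both: genes that pass the cutoff, returned as a character vector
sig_genes <- df[df$p < 0.1, "gene"]
sig_genes
```

Selecting by name tends to be safer than by position when columns may later be added or reordered.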
10. Use the data() function to load R's built-in dataset rivers and describe it.
data("rivers")
rivers
length(rivers)
unique(rivers)
sort(rivers)
length(unique(rivers))
range(rivers)
which.max(rivers)
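Note that which.max() returns the position of the largest value, not the value itself. A short sketch of how the calls above could be combined into one description of rivers:

```r
data("rivers")   # built-in numeric vector of river lengths

# Collect a few descriptive numbers in one named vector
stats <- c(n        = length(rivers),
           n_unique = length(unique(rivers)),
           min      = min(rivers),
           max      = max(rivers))
stats

# which.max() gives an index; use it to pull out the value
idx     <- which.max(rivers)
longest <- rivers[idx]
longest

# Quartiles, mean and median in one call
summary(rivers)
```

Comparing n with n_unique shows at a glance whether any lengths occur more than once.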
11. Download the RunInfo Table from https://www.ncbi.nlm.nih.gov/sra?term=SRP133642
Read the file into R and explore the data frame: how many columns it has and what type of element each column holds.
df <- read.table(file = "c:/Users/Administrator/Desktop/SraRunTable.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
df
ncol(df)
nrow(df)
colnames(df)
str(df) # type of each column
Download the sample information file `sample.csv` from [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229), read it into R, and explore the data frame: how many columns it has and what type of element each column holds.
df1 <- read.table(file = "c:/Users/Administrator/Desktop/sample.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
df1
ncol(df1)
nrow(df1)
colnames(df1)
str(df1) # type of each column
Join the two tables from the previous steps (the RunInfo Table file and the sample information in sample.csv) with the merge() function.
m <- merge(df,df1,by.x="Sample_Name", by.y = "Accession")
The output is too long to list here.
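Since the real output is too large to show, here is a toy sketch of what merge() does with by.x and by.y. The two small tables and all their values are hypothetical; only the column names Sample_Name, Accession, MBases and Title are borrowed from the real files.

```r
# Hypothetical stand-ins for the RunInfo table and the sample sheet
runs <- data.frame(Run = c("R1", "R2", "R3"),
                   Sample_Name = c("S1", "S2", "S3"),
                   MBases = c(10, 20, 30),
                   stringsAsFactors = FALSE)
samples <- data.frame(Accession = c("S1", "S2", "S4"),
                      Title = c("cell_A_0048", "cell_B_0049", "cell_C_0048"),
                      stringsAsFactors = FALSE)

# Inner join (default): only keys present in both tables survive
merged <- merge(runs, samples, by.x = "Sample_Name", by.y = "Accession")
nrow(merged)
colnames(merged)   # the key column keeps the by.x name

# all.x = TRUE keeps every row of the first table, filling gaps with NA
merged_all <- merge(runs, samples,
                    by.x = "Sample_Name", by.y = "Accession", all.x = TRUE)
merged_all
```

Comparing nrow(merged) with nrow(runs) and nrow(samples) after a merge is a quick check for keys that failed to match.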
12. Statistical visualization, based on the afternoon session
df <- read.table(file = "c:/Users/Administrator/Desktop/SraRunTable.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
df
ncol(df)
nrow(df)
colnames(df)
df1 <- read.table(file = "c:/Users/Administrator/Desktop/sample.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
df1
ncol(df1)
nrow(df1)
colnames(df1)
m <- merge(df,df1,by.x="Sample_Name", by.y = "Accession")
m3 <- df[,"MBases"]
e <- m[c("MBases","Title")]
boxplot(m3)
fivenum(m3)
hist(m3)
plot(density(m3))
density(m3)
class(m3)
save(e,file = 'input.Rdata')
rm(list = ls())
options(stringsAsFactors = FALSE)
load(file = 'input.Rdata')
e[,2]
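# Extract the plate label: the third "_"-separated field of each title
plate <- unlist(lapply(e[,2], function(x){
  # x = e[1,2]  # uncomment to test the function body on a single title
  strsplit(x, '_')[[1]][3]
}))
# The same call without unlist() returns a list instead of a character vector
plate_list <- lapply(e[,2], function(x){
  strsplit(x, '_')[[1]][3]
})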
table(plate)
boxplot(e[,1]~plate)
t.test(e[,1]~plate)
e$plate <- plate
library(ggplot2)
colnames(e)
ggplot(e,aes(x=plate,y=MBases))+geom_boxplot()
library(ggpubr)
p <- ggboxplot(e, x = "plate", y = "MBases",
               color = "plate", palette = "jco",
               add = "jitter")
# Add p-value
p + stat_compare_means(method = 't.test')
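The script above depends on the downloaded files, so here is a self-contained sketch of its core steps using base R only: extracting the third field of each title and comparing MBases between plates. The titles and MBases values are made up, and the title layout is assumed to follow the same "_"-separated pattern the script relies on.

```r
# Hypothetical titles and sequencing yields (not real data)
titles <- c("SS2_15_0048_A1", "SS2_15_0048_A2", "SS2_15_0048_A3",
            "SS2_15_0049_A1", "SS2_15_0049_A2", "SS2_15_0049_A3")
mbases <- c(10, 12, 11, 20, 23, 19)

# vapply() is a stricter alternative to unlist(lapply()):
# it guarantees exactly one character value per title
plate <- vapply(strsplit(titles, "_"),
                function(parts) parts[3],
                character(1))
table(plate)

toy <- data.frame(MBases = mbases, plate = plate, stringsAsFactors = FALSE)

# The formula interface of t.test() needs exactly two groups
res <- t.test(MBases ~ plate, data = toy)
res$p.value

# Five-number summary per plate, the numbers behind a boxplot
tapply(toy$MBases, toy$plate, fivenum)
```

If table(plate) shows more than two plates, t.test() with a formula will stop with an error, so checking the table first is worthwhile.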