R语言练习题-初级

R语言基础知识的一些检验,最好是对照几本R基础语法书籍来理解。

全部答案及视频在:https://github.com/jmzeng1314/R_bilibili

首先做完了周末班工作, 包括软件安装以及R包安装:http://www.bio-info-trainee.com/3727.html

1. 打开 Rstudio 告诉我它的工作目录。

> getwd()
[1] "C:/Users/wlx/Documents"

2.新建6个向量,基于不同的原子类型。(重点是字符串,数值,逻辑值)

> a <- c(1:10)
> b <- c("a","b","c","d")
> c <- c(T,F,F,T)
> a
 [1]  1  2  3  4  5  6  7  8  9 10
> b
[1] "a" "b" "c" "d"
> c
[1]  TRUE FALSE FALSE  TRUE
> a <- c(1, 2, 5, 3, 6, -2, 4)
> b <- c("one", "two", "three")
> c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
> a
[1]  1  2  5  3  6 -2  4
> b
[1] "one"   "two"   "three"
> c
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE
> 

3 告诉我在你打开的rstudio里面 getwd() 代码运行后返回的是什么?

> getwd()
[1] "C:/Users/wlx/Documents"

4新建一些数据结构,比如矩阵,数组,数据框,列表等重点是数据框,矩阵)

4.1 矩阵

> cnames <- c("C1", "C2")
> rnames <- c("g1", "g2")
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
+                    dimnames=list(rnames, cnames))
> mymatrix
   C1 C2
g1  1 26
g2 24 68
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
+                    dimnames=list(rnames, cnames))
> mymatrix
   C1 C2
g1  1 24
g2 26 68

4.2 数组

dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
> z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24

4.3 数据框

patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
 patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor

4.4 因子

diabetes <- c("Type1", "Type2", "Type1", "Type1")
> diabetes
[1] "Type1" "Type2" "Type1" "Type1"
> diabetes <- factor(diabetes)
> diabetes
[1] Type1 Type2 Type1 Type1
Levels: Type1 Type2
> status <- c("Poor", "Improved", "Excellent", "Poor")
> status
[1] "Poor"      "Improved"  "Excellent" "Poor"     
> status <- factor(status, ordered=TRUE)
> status
[1] Poor      Improved  Excellent Poor     
Levels: Excellent < Improved < Poor
> status <- factor(status, order=TRUE,
+                  levels=c("Poor", "Improved", "Excellent"))
> status
[1] Poor      Improved  Excellent Poor     
Levels: Poor < Improved < Excellent
> sex <- factor(sex, levels=c(1, 2), labels=c("Male", "Female"))
Error in factor(sex, levels = c(1, 2), labels = c("Male", "Female")) : 
  object 'sex' not found
> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> diabetes <- factor(diabetes)
> status <- factor(status, order=TRUE)
> patientdata <- data.frame(patientID, age, diabetes, status)
> str(patientdata)
'data.frame':   4 obs. of  4 variables:
 $ patientID: num  1 2 3 4
 $ age      : num  25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> summary(patientdata)
   patientID         age         diabetes       status 
 Min.   :1.00   Min.   :25.00   Type1:3   Excellent:1  
 1st Qu.:1.75   1st Qu.:27.25   Type2:1   Improved :1  
 Median :2.50   Median :31.00             Poor     :2  
 Mean   :2.50   Mean   :34.75                          
 3rd Qu.:3.25   3rd Qu.:38.50                          
 Max.   :4.00   Max.   :52.00    

4.5 list

> g <- "My First List"
> h <- c(25, 26, 18, 39)
> j <- matrix(1:10, nrow=5)
> k <- c("one", "two", "three")
> mylist <- list(title=g, ages=h, j, k)
> mylist
$title
[1] "My First List"

$ages
[1] 25 26 18 39

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "one"   "two"   "three"

> mylist[[2]]
[1] 25 26 18 39
> mylist[["ages"]]
[1] 25 26 18 39

提醒程序员注意的一些事项
经验丰富的程序员通常会发现R语言的某些方面不太寻常。以下是这门语言中你需要了解
的一些特性。
❑对象名称中的句点(.)没有特殊意义,但美元符号()却有着和其他语言中的句点 类似的含义,即指定一个数据框或列表中的某些部分。例如,Ax是指数据框A中的变
量x。
❑R不提供多行注释或块注释功能。你必须以#作为多行注释每行的开始。出于调试目的,
你也可以把想让解释器忽略的代码放到语句if(FALSE){... }中。将FALSE改为TRUE
即允许这块代码执行。
❑将一个值赋给某个向量、矩阵、数组或列表中一个不存在的元素时,R将自动扩展这个
数据结构以容纳新值。举例来说,考虑以下代码:

> x <- c(8, 6, 4)
> x[7] <- 10
> x
[1] 8 6 4 NA NA NA 10

通过赋值,向量x由三个元素扩展到了七个元素。x <- x[1:3]会重新将其缩减回三个
元素。
❑R中没有标量。标量以单元素向量的形式出现。
❑R中的下标不从0开始,而从1开始。在上述向量中,x[1]的值为8。
❑变量无法被声明。它们在首次被赋值时生成。

5. 在你新建的数据框进行切片操作,比如首先取第1,3行, 然后取第4,6列

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> df[c(1,3), ]
Error in df[c(1, 3), ] : object of type 'closure' is not subsettable
> patientdata[c(1,3), ]
  patientID age diabetes    status
1         1  25    Type1      Poor
3         3  28    Type1 Excellent
> patientdata[c(1:3), ]
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
> patientdata[,c(2,4) ]
  age    status
1  25      Poor
2  34  Improved
3  28 Excellent
4  52      Poor

5. 使用data函数来加载R内置数据集 rivers 描述它。并且可以查看更多的R语言内置的数据集:https://mp.weixin.qq.com/s/dZPbCXccTzuj0KkOL7R31g

> data("rivers")
> class(rivers)
[1] "numeric"
> str(rivers)
 num [1:141] 735 320 325 392 524 ...
> length(rivers)
[1] 141
> summary(rivers)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  135.0   310.0   425.0   591.2   680.0  3710.0 
> head(rivers); tail(rivers)
[1] 735 320 325 392 524 450
[1]  500  720  270  430  671 1770

6下载 https://www.ncbi.nlm.nih.gov/sra?term=SRP133642 里面的 RunInfo Table 文件读入到R里面,了解这个数据框,多少列,每一列都是什么属性的元素。(参考B站生信小技巧获取runinfo table) 这是一个单细胞转录组项目的数据,共768个细胞,如果你找不到RunInfo Table 文件,可以点击下载,然后读入你的R里面也可以。

7下载 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229 里面的样本信息sample.csv读入到R里面,了解这个数据框,多少列,每一列都是什么属性的元素。(参考 https://mp.weixin.qq.com/s/fbHMNXOdwiQX5BAlci8brA 获取样本信息sample.csv)如果你实在是找不到样本信息文件sample.csv,也可以点击下载

> aread.table("SraRunTable.txt",sep="\t",header=T,check.names=F)
Error in aread.table("SraRunTable.txt", sep = "\t", header = T, check.names = F) : 
  could not find function "aread.table"
> a<- read.table("SraRunTable.txt",sep="\t",header=T,check.names=F)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'SraRunTable.txt': No such file or directory
> a<- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\SraRunTable.txt",sep="\t",header=T,check.names=F)
> b<- b <- read.csv("C:\\Users\\wlx\\Desktop\\scRNA\\sample.csv",sep="\t",header=T,check.names=F)
> dim(a)
[1] 768  31
> dim(b)
[1] 768   1
> b<- b <- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\sample.csv",sep="\t",header=T,check.names=F)
> b <- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\sample.csv",sep="\t",header=T,check.names=F)
> dim(b)
[1] 768   1
> b <- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\sample.csv")
> dim(b)
[1] 768   6
> a<- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\SraRunTable.txt")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 44 elements
> a<- read.table("C:\\Users\\wlx\\Desktop\\scRNA\\SraRunTable.txt",sep="\t",header=T,check.names=F)
> dim(a)
[1] 768  31
> str(a)
'data.frame':   768 obs. of  31 variables:
 $ BioSample         : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
 $ Experiment        : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ MBases            : int  16 16 8 8 11 7 18 5 11 15 ...
 $ MBytes            : int  8 8 4 4 5 4 9 3 6 8 ...
 $ Run               : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ SRA_Sample        : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
 $ Sample_Name       : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Assay_Type        : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
 $ AssemblyName      : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
 $ AvgSpotLen        : int  43 43 43 43 43 43 43 43 43 43 ...
 $ BioProject        : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
 $ Center_Name       : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
 $ Consent           : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_filetype: Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
 $ DATASTORE_provider: Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
 $ InsertSize        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Instrument        : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibraryLayout     : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySelection  : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
 $ LibrarySource     : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
 $ LoadDate          : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
 $ Organism          : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
 $ Platform          : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
 $ ReleaseDate       : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
 $ SRA_Study         : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
 $ age               : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
 $ cell_type         : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
 $ marker_genes      : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
 $ source_name       : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
 $ strain            : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
 $ tissue            : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
> str(b)
'data.frame':   768 obs. of  6 variables:
 $ Accession.Title.Sample                              : Factor w/ 1 level "musculus\",1,GPL13112,GSE111229,\"SRA": 1 1 1 1 1 1 1 1 1 1 ...
 $ Type.Taxonomy.Channels.Platform.Series.Supplementary: Factor w/ 1 level "Run": 1 1 1 1 1 1 1 1 1 1 ...
 $ Types.Supplementary                                 : Factor w/ 768 levels "Selector\",\"https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901\",SRX3749901,\"Kristian",..: 2 3 4 5 6 7 8 9 10 1 ...
 $ Links.SRA                                           : Factor w/ 1 level "Pietras\",\"Nov": 1 1 1 1 1 1 1 1 1 1 ...
 $ Accession.Contact.Release                           : Factor w/ 1 level "23,": 1 1 1 1 1 1 1 1 1 1 ...
 $ Date                                                : Factor w/ 1 level "2018\"": 1 1 1 1 1 1 1 1 1 1 ...

把前面两个步骤的两个表(RunInfo Table 文件,样本信息sample.csv)关联起来,使用merge函数。

基于下午的统计可视化

8对前面读取的 RunInfo Table 文件在R里面探索其MBases列,包括 箱线图(boxplot)和五分位数(fivenum),还有频数图(hist),以及密度图(density) 。

把前面读取的样本信息表格的样本名字根据下划线分割看第3列元素的统计情况。第三列代表该样本所在的plate

根据plate把关联到的 RunInfo Table 信息的MBases列分组检验是否有统计学显著的差异。

分组绘制箱线图(boxplot),频数图(hist),以及密度图(density) 。

使用ggplot2把上面的图进行重新绘制。

使用ggpubr把上面的图进行重新绘制。

rm(list = ls())
options(stringsAsFactors = F)
# 或者下载:http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt
a=read.table('SraRunTable.txt',sep = '\t',header = T)
# 或者下载:http://www.bio-info-trainee.com/tmp/5years/sample.csv 
b=read.csv('sample.csv')
colnames(a)
colnames(b)
d=merge(a,b,by.x = 'Sample_Name',by.y = 'Accession')
e=d[,c("MBases","Title")]
save(e,file = 'input.Rdata')

rm(list = ls())
options(stringsAsFactors = F)
load(file = 'input.Rdata')

e[,2]
plate=unlist(lapply(e[,2],function(x){
 # x=e[1,2]
 x
 strsplit(x,'_')[[1]][3]

}))
table(plate)
boxplot(e[,1]~plate)
t.test(e[,1]~plate)
e$plate=plate
library(ggplot2)
colnames(e)
ggplot(e,aes(x=plate,y=MBases))+geom_boxplot()

library(ggpubr)
p <- ggboxplot(e, x = "plate", y = "MBases",
 color = "plate", palette = "jco",
 add = "jitter")
# Add p-value
p + stat_compare_means(method = 't.test')
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容