Notes on practical R operations from Bioinformatics Data Skills

################## 2019-01-18 14:34:07 ##########################

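example("pheatmap") # run the examples from a function's help page
help.search("heatmap") # search the help system for functions matching a keyword
library(help="pheatmap") # show detailed information about a package
ls() # list the objects we've created in the global environment
length(x) # return the length of the vector x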

In RStudio, Alt + - on Windows is a shortcut that inserts "<-"

Features

  • R does not have a type for a single value (a scalar) such as 3.1 or "AGCTACGACT". Instead, such values are stored in a vector of length 1.
  • R's vectors are the basis of one of R's most important features: vectorization. Vectorization lets us apply an operation to every element of a vector without writing an explicit loop.
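A minimal sketch of both points, using made-up values (the names seq_lengths, gc_counts, gc_fraction and gc_loop are hypothetical and not from the book):

```r
# A "single value" is just a vector of length 1.
x <- 3.1
s <- "AGCTACGACT"
length(x)      # 1
length(s)      # 1 (one string, not ten characters)
nchar(s)       # 10

# Vectorized arithmetic: no explicit loop is needed.
seq_lengths <- c(100, 250, 400)
gc_counts   <- c(40, 125, 100)
gc_fraction <- gc_counts / seq_lengths   # elementwise division

# The equivalent explicit loop, for comparison.
gc_loop <- numeric(length(seq_lengths))
for (i in seq_along(seq_lengths)) {
  gc_loop[i] <- gc_counts[i] / seq_lengths[i]
}
identical(gc_fraction, gc_loop)
```

The vectorized form is shorter, and it is usually faster because the loop runs inside R's compiled code.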

################ 2019-01-22 09:48:01 #######################

  • When we assign a value in our R session, we're assigning it to an environment known as the global environment.
  • Calling search() returns the search path: the places R looks, in order, when resolving a variable name. It includes the global environment (.GlobalEnv) and the attached packages.
  • If one vector is longer than the other, R recycles the values of the shorter vector. This is intentional, so R does not warn you when the longer length is a multiple of the shorter one; it warns only when it is not, as in the last example below.
> x <- c(1,2,3)
> x + 1
[1] 2 3 4
> y <- c(1,2)
> x + y # the length of x is not a multiple of the length of y
[1] 2 4 4
Warning message:
In x + y : longer object length is not a multiple of shorter object length
  • R will return a missing value (NA; more on this later) if you try to access an element at a position greater than the number of elements.
> z <- c(3.4, 2.2, 0.4, -0.4, 1.2)
> z[c(2, 1, 10)]
[1] 2.2 3.4  NA
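To see when recycling is silent and when it warns, here is a small sketch with made-up vectors. The warned flag and the handler are illustrative scaffolding, not something from the book:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# length(x) is a multiple of length(y): y is recycled silently.
x + y          # 11 22 13 24

# Length 3 is not a multiple of length 2: R still recycles, but warns.
warned <- FALSE
result <- withCallingHandlers(
  c(1, 2, 3) + y,
  warning = function(w) {
    warned <<- TRUE                  # record that a warning was raised
    invokeRestart("muffleWarning")   # and keep it off the console
  }
)
result         # 11 22 13
warned         # TRUE

# Indexing past the end of a vector yields NA rather than an error.
x[6]           # NA
```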

It's also possible to exclude certain elements from a vector using negative indexes: a negative index drops the element at that position.
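A short sketch of negative indexing, assuming z holds the five values implied by the outputs in these notes:

```r
z <- c(3.4, 2.2, 0.4, -0.4, 1.2)

z[-1]          # drop the first element:      2.2 0.4 -0.4 1.2
z[-c(1, 2)]    # drop the first two elements: 0.4 -0.4 1.2
z[-length(z)]  # drop the last element:       3.4 2.2 0.4 -0.4
```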

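> order(z) # order() returns the indexes that would sort the vector
[1] 4 3 5 2 1
> z[order(z)]
[1] -0.4  0.4  1.2  2.2  3.4
> order(z, decreasing=TRUE)
[1] 1 2 5 3 4
> z[order(z, decreasing=TRUE)]
[1]  3.4  2.2  1.2  0.4 -0.4
> sort(b, decreasing=TRUE) # sort() returns the sorted values; b is a named numeric vector
  b  a1  a3  a2   c 
5.4 3.4 2.0 1.0 0.4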

Again, often we use functions to generate indexing vectors for us. For example, one
way to resample a vector (with replacement) is to randomly sample its indexes using
the sample() function:
[1] https://www.jianshu.com/p/38d0a44630f8
[2] https://bbs.pinggu.org/thread-3068145-1-1.html

> set.seed(0) # we set the random number seed so this example is reproducible
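> i <- sample(length(z), replace=TRUE) # replace=TRUE samples with replacement
> i # the exact indexes depend on the R version (the default sampling algorithm changed in R 3.6.0)
[1] 5 2 2 3 5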
> z[i]
[1] 1.2 2.2 2.2 0.4 1.2

NA is R's built-in value for representing missing data.
NULL represents not having a value at all, which is different from having a missing value (NA).
-Inf, Inf are just as they sound: negative and positive infinity.
NaN stands for "not a number"; it results from computations that have no numeric answer, e.g., 0/0 or Inf + -Inf.

> is.nan(0/0)
[1] TRUE
> x <- c()
> is.null(x)
[1] TRUE
> y <- c(1,2,3)
> is.na(y[4])
[1] TRUE
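The special values behave differently from one another in ways that are easy to mix up. A small sketch with a made-up vector v:

```r
v <- c(1, NA, 3)

length(v)              # 3: NA still occupies a position
length(NULL)           # 0: NULL has no elements at all
is.na(v)               # FALSE TRUE FALSE
mean(v)                # NA: missing values propagate through computations
mean(v, na.rm = TRUE)  # 2: drop NAs before computing

is.nan(NaN)            # TRUE
is.na(NaN)             # TRUE: NaN also counts as missing
is.nan(NA)             # FALSE: NA is not NaN
is.finite(c(1, Inf, NA, NaN))  # TRUE FALSE FALSE FALSE
```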

Because all elements in a vector must have the same data type, R silently coerces elements to a common type when a vector is built from mixed types.
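A brief sketch of the coercion order (logical, then numeric, then character), using made-up values; the variable names are only for illustration:

```r
int_dbl  <- c(1L, 2.5)       # integer is coerced to double
lgl_num  <- c(TRUE, 2)       # TRUE becomes 1
num_chr  <- c(1, "a")        # numbers become strings
mixed    <- c(1, "a", TRUE)  # everything becomes character

class(int_dbl)   # "numeric"
class(lgl_num)   # "numeric"
class(num_chr)   # "character"
mixed            # "1" "a" "TRUE"

# Coercion of logicals to numbers is what makes counting TRUEs work.
sum(c(TRUE, FALSE, TRUE))   # 2
```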

  • When called on numeric values, summary() returns a numeric summary: the minimum, the quartiles, the mean, and the maximum.
  • R's data-reading functions such as read.csv() can also read gzipped files directly, so there's no need to uncompress them first.
  • The reshape2 package provides functions to reshape data: melt() turns wide data into long data, and dcast() turns long data back into wide data (cast() is the name used by the older reshape package).
  • One nice feature of data.frame() is that if you provide vectors as named arguments, data.frame() uses those names as column names.
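The data.frame() and gzip points above can be sketched with base R alone. The dataframe contents and the temporary file are made up for illustration:

```r
# Named arguments become column names.
d <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(100, 500, 200),
                depth = c(3.2, 4.1, 2.8))
colnames(d)          # "chrom" "start" "depth"
summary(d$depth)     # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# Write a gzipped CSV to a temporary file, then read it back directly.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
write.csv(d, con, row.names = FALSE)
close(con)

d2 <- read.csv(gz_path)   # no manual decompression step
identical(colnames(d), colnames(d2))
unlink(gz_path)           # clean up the temporary file
```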
################ 2019-01-23 09:29:13 #######################

Omitting the row index retrieves all rows, and omitting the column index retrieves all columns. For example, y[, 'x1'] returns every row of column x1, and y[1, ] returns every column of row 1.
> y <- cbind(x1 = 3, x2 = c(4:1))
> y
     x1 x2
[1,]  3  4
[2,]  3  3
[3,]  3  2
[4,]  3  1
> y['x1'] # y is a matrix: a single index treats it as a plain vector, which has no element named 'x1'
[1] NA
> y[1,'x1']
x1 
 3 
> y[,'x1'] 
[1] 3 3 3 3
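  • It's a good idea to avoid referring to specific dataframe rows by position in your analysis code, because such code breaks if the rows are reordered or new rows are added.
  • From summary(), we see that the number of SNPs per window (total.SNPs) varies considerably across the windows on chromosome 20: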
> summary(d$total.SNPs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   3.000   7.000   8.906  12.000  93.000
  • Remember, columns of a dataframe are just vectors. If you only need the data from one column, subset it as you would a vector.
  • Note that there's no need for a comma in the brackets, because d$percent.GC is a vector, not a two-dimensional dataframe:
> d$percent.GC[d$Pi > 16]
[1] 39.1391 38.0380 36.8368 36.7367 43.0430 41.1411 [...]

Thus, d[d$Pi > 3, ] is identical to d[which(d$Pi > 3), ], provided d$Pi contains no missing values:

> d$Pi > 3
[1] FALSE TRUE FALSE TRUE TRUE TRUE [...]
> which(d$Pi > 3)
[1] 2 4 5 6 7 10 [...]
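The two forms differ only when the condition evaluates to NA. A sketch on a toy stand-in for d (the toy dataframe and its values are made up, not the book's data):

```r
toy <- data.frame(Pi = c(1.5, 4.2, NA, 7.0),
                  percent.GC = c(40.1, 38.0, 45.5, 36.7))

keep <- toy$Pi > 3        # FALSE TRUE NA TRUE
which(keep)               # 2 4: positions of TRUE; the NA is dropped

by_logical <- toy[keep, ]         # the NA produces an extra all-NA row
by_which   <- toy[which(keep), ]  # only rows where Pi > 3

nrow(by_logical)          # 3
nrow(by_which)            # 2

# Columns are vectors, so no comma is needed when subsetting one.
toy$percent.GC[which(keep)]   # 38.0 36.7
```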

subset() takes two main arguments: the dataframe to operate on, and the condition a row must satisfy to be included. With subset(), d[d$Pi > 16 & d$percent.GC > 80, ] can be expressed as:

> subset(d, Pi > 16 & percent.GC > 80)
start end total.SNPs total.Bases depth [...]
58550 63097001 63098000 5 947 2.39 [...]
  • Note that we (somewhat magically) don’t need to quote column names. This is
    because subset() follows special evaluation rules, and for this reason, subset() is
    best used only for interactive work.
> subset(d, Pi > 16 & percent.GC > 80,
+ c(start, end, Pi, percent.GC, depth)) # the optional third argument selects columns
start end Pi percent.GC depth
58550 63097001 63098000 41.172 82.0821 2.39
58641 63188001 63189000 16.436 82.3824 3.21
58642 63189001 63190000 41.099 80.5806 1.89
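For comparison, here are the bracket form and the subset() form side by side on a toy dataframe. The toy dataframe and its values are made up; only the column names mirror those in the notes:

```r
toy <- data.frame(start = c(1, 1001, 2001, 3001),
                  end = c(1000, 2000, 3000, 4000),
                  Pi = c(41.2, 3.5, 16.4, 20.0),
                  percent.GC = c(82.1, 45.0, 82.4, 60.3))

# Bracket form: column names must be quoted and prefixed with toy$.
rows <- toy$Pi > 16 & toy$percent.GC > 80
with_brackets <- toy[rows, c("start", "end", "Pi")]

# subset() form: bare column names, evaluated inside the dataframe.
with_subset <- subset(toy, Pi > 16 & percent.GC > 80, c(start, end, Pi))

# Both select rows 1 and 3 and the same three columns.
all.equal(with_brackets, with_subset)
```

In scripts and functions the bracket form is the safer choice, since it does not rely on subset()'s special evaluation rules.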

#####################ggplot2##################

  • ggplot2 works exclusively with dataframes, so you’ll need to get your data tidy and into a dataframe before visualizing it with ggplot2.
  • Each layer updates our plot by adding geometric objects such as the points in a scatterplot, or the lines in a line plot.
    geom = geometric object
    aes = aesthetic
  • We specify the mapping of aesthetic attributes to columns in our dataframe using the function aes().
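A minimal sketch of these ideas, assuming the ggplot2 package is installed. The toy dataframe and its columns position and diversity are hypothetical:

```r
library(ggplot2)

# ggplot2 needs a dataframe as input.
toy <- data.frame(position = c(1, 2, 3, 4, 5),
                  diversity = c(0.8, 1.5, 0.3, 2.1, 1.0))

# aes() maps dataframe columns to aesthetics (here the x and y positions);
# geom_point() adds a layer of points, giving a scatterplot.
p <- ggplot(toy) + geom_point(aes(x = position, y = diversity))

# print(p)  # renders the plot; at the console, typing p does the same
```

Further layers are added the same way, with + followed by another geom.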