How to Learn R Well

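My junior labmate recently started learning R, and now every day it's "Senior, senior...", "How do I do this...", "Why am I getting an error again...", "Senior, senior...". He was driving me up the wall, so I wrote this post.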

Beginners learning R are bound to run into all kinds of problems, enough to make you tear your hair out.

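But there is no need to be afraid or to think about giving up: using R is a continuous cycle of hitting errors and tracking down their causes. When a problem does come up, your first reaction should be to search online for an answer. Don't immediately ask the people around you, or you will miss the chance to think it through yourself. Learning bioinformatics is, at heart, a long process of self-study.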

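Recommended search engine: Bing, Bing, Bing! Please stop using a certain other search engine ("某度")! Of course, if you can find a way to use Google, so much the better.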

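Searching solves more than ninety percent of problems. When it doesn't, the likely reason is that your search skills still need work. For a newcomer, the process of searching, trying fixes, and thinking things through is a gain in itself, and so is the improvement in your ability to search.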

If you have tried for a long time and really cannot solve it, then... go and ask someone more experienced.

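I actually picked up this habit of searching and solving problems independently at Tongji University, in the research group of Professor Liu Xiaole (刘小乐), a leading figure in bioinformatics. The group runs a month-long bioinformatics training every year, and I was lucky enough to take part for a while. They give trainees a set of bioinformatics problems along with some teaching videos, and most of the training time is spent doing the assignments and working out solutions on your own. The senior students, all of them very capable, stayed with us throughout, but they would not tell us the answers directly; instead they guided us to think and to solve the problems ourselves. At the time it was truly overwhelming, because I knew nothing and had no idea where to start. A whole day could go by without my answering a single question.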

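Looking back now, though, I gained a great deal. I gradually learned to think independently, and these days, when I run into an R problem, I can handle almost all of them on my own with the help of a search engine.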

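This is probably what is meant by "teaching someone to fish is better than giving them a fish."
R is simple: as long as you want to learn it, you will.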

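Below are the first-week beginner exercises that Professor Liu Xiaole's group at Tongji University gives to newcomers during the training. I hope you find them helpful.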

First, install a few of the most commonly used data-processing packages:

install.packages(c("ggplot2", "dplyr", "tidyr", "HistData", "mvtnorm",
                   "reticulate"))   # commonly used plotting and data-processing packages
library(ggplot2) # for plotting
library(dplyr) # for data manipulation
library(reticulate) # needed to run python in RStudio
# these next two are not essential to this course
library(mvtnorm) # need this to simulate data from multivariate normal
library(HistData) # need this for data

Problem 2: Getting help

You can use the mean() function to compute the mean of a vector like
so:

x1 <- c(1:10, 50)
mean(x1)

However, this returns NA if the vector contains NAs:

x1_na <- c(1:10, 50, NA)
mean(x1_na)

Please use R documentation to find the mean after excluding NA's (hint: ?mean)

# na.rm = TRUE removes NA values before the mean is computed
mean(x1_na, na.rm = TRUE)
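
Besides opening the help page with ?mean, the argument list can also be inspected from code. The sketch below is only an illustration, not part of the original exercise: the names mean_args, with_na and without_na are made up for this example, and it looks at mean.default, the default method behind mean() for plain numeric vectors.

```r
# List the argument names of the default mean() method,
# which is where na.rm is defined
mean_args <- names(formals(mean.default))
print(mean_args)

# Same vector as in the exercise
x1_na <- c(1:10, 50, NA)

# Without na.rm, the NA propagates to the result
with_na <- mean(x1_na)

# With na.rm = TRUE, the NA is dropped before averaging
without_na <- mean(x1_na, na.rm = TRUE)

print(with_na)
print(without_na)
```

Seeing na.rm in the argument list is usually enough of a hint to go and read that entry in the help page.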

Part II: Data Manipulation

Problem 3: Basic Selection

In this question, we will practice data manipulation using a dataset
collected by Francis Galton in 1886 on the heights of parents and their
children. This is a very famous dataset, and Galton used it to come up
with regression and correlation.

The data is available as GaltonFamilies in the HistData package.
Here, we load the data and show the first few rows. To find out more
information about the dataset, use ?GaltonFamilies.

data(GaltonFamilies)
head(GaltonFamilies)

a. Please report the height of the 10th child in the dataset.


b. What is the breakdown of male and female children in the dataset?


c. How many observations are in Galton's dataset? Please answer this
question without consulting the R help.


d. What is the mean height for the 1st child in each family?


e. Create a table showing the mean height for male and female children.


f. What was the average number of children each family had?


g. Convert the children's heights from inches to centimeters and store
it in a column called childHeight_cm in the GaltonFamilies dataset.
Show the first few rows of this dataset.
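
The kinds of operations Problem 3 asks for (picking a row, counting categories, group means, adding a column) can be practised on a tiny data frame before touching the real data. The sketch below is not the exercise solution: toy_families and its heights are invented for illustration, with column names assumed to mirror GaltonFamilies (family, childNum, gender, childHeight). It uses base R only so that it runs without extra packages; dplyr's group_by() and summarise() would express the same steps.

```r
# Hypothetical toy data: two families, five children, heights in inches
toy_families <- data.frame(
  family      = c("001", "001", "001", "002", "002"),
  childNum    = c(1, 2, 3, 1, 2),
  gender      = c("male", "female", "female", "male", "female"),
  childHeight = c(70, 65, 64, 72, 66)
)

# Height of the 2nd child in the table (row selection)
second_height <- toy_families$childHeight[2]

# Breakdown of children by gender
gender_counts <- table(toy_families$gender)

# Number of observations (rows)
n_obs <- nrow(toy_families)

# Mean height of the first child in each family
first_mean <- mean(toy_families$childHeight[toy_families$childNum == 1])

# Mean height by gender
mean_by_gender <- tapply(toy_families$childHeight, toy_families$gender, mean)

# Average number of children per family
kids_per_family <- mean(table(toy_families$family))

# New column: height converted from inches to centimeters
toy_families$childHeight_cm <- toy_families$childHeight * 2.54

head(toy_families)
```

Once each step behaves as expected on the toy table, the same expressions can be pointed at GaltonFamilies.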


Problem 4: Spurious Correlation

# set seed for reproducibility
set.seed(1234)
N <- 25
ngroups <- 100000
sim_data <- data.frame(group = rep(1:ngroups, each = N),
                       X = rnorm(N * ngroups),
                       Y = rnorm(N * ngroups))

In the code above, we generate ngroups (100000) groups of N (25)
observations each. In each group, we have X and Y, where X and Y are
independent normally distributed data, so their true correlation is 0.

a. Find the correlation between X and Y for each group, and display
the highest correlations.

Hint: since the data is quite large and your code might take a few
moments to run, you can test your code on a subset of the data first
(e.g. you can take the first 100 groups like so):

sim_data_sub <- sim_data %>% filter(group <= 100)

In general, this is good practice whenever you have a large dataset:
If you are writing new code and it takes a while to run on the whole
dataset, get it to work on a subset first. By running on a subset, you
can iterate faster.

However, please do run your final code on the whole dataset.
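
The subset-first workflow described above can be sketched as follows. This is a minimal illustration rather than the intended answer: it simulates only 100 groups directly instead of filtering sim_data, the names sim_small and cors are made up for this example, and base R's split() and sapply() stand in for a dplyr pipeline so that the block runs without extra packages.

```r
# Small simulation with the same structure as sim_data,
# but only 100 groups so it runs instantly
set.seed(1234)
N <- 25
ngroups_small <- 100
sim_small <- data.frame(group = rep(1:ngroups_small, each = N),
                        X = rnorm(N * ngroups_small),
                        Y = rnorm(N * ngroups_small))

# Split the rows by group and compute cor(X, Y) within each group
cors <- sapply(split(sim_small, sim_small$group),
               function(g) cor(g$X, g$Y))

# Show the largest per-group correlations
head(sort(cors, decreasing = TRUE))
```

When the code works on the small version, the same logic can be applied to the full sim_data.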


b. The highest correlation is around 0.8. Can you explain why we see
such a high correlation when X and Y are supposed to be independent and
thus uncorrelated?

Part III: Plotting

Problem 5

Show a plot of the data for the group that had the highest correlation
you found in Problem 4.


Problem 6

We generate some sample data below. The data is numeric, and has 3
columns: X, Y, Z.

N <- 100
Sigma <- matrix(c(1, 0.75, 0.75, 1), nrow = 2, ncol = 2) * 1.5
means <- list(c(11, 3), c(9, 5), c(7, 7), c(5, 9), c(3, 11))
dat <- lapply(means, function(mu)
  rmvnorm(N, mu, Sigma))
dat <- as.data.frame(Reduce(rbind, dat)) %>%
  mutate(Z = as.character(rep(seq_along(means), each = N)))
names(dat) <- c("X", "Y", "Z")

a. Compute the overall correlation between X and Y.


b. Make a plot showing the relationship between X and Y. Comment on
the correlation that you see.


c. Compute the correlations between X and Y for each level of Z.


d. Make a plot showing the relationship between X and Y, but this
time, color the points using the value of Z. Comment on the result,
especially any differences between this plot and the previous plot.
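
The contrast between an overall correlation and per-group correlations, which parts a and c of Problem 6 compare, can be seen on a small deterministic example. The sketch below is an illustration only and does not use the simulated dat: toy_groups, offsets and centers are invented names and values, chosen so that the numbers can be checked by hand.

```r
# Three hypothetical groups; inside each group Y rises with X,
# but the group centers move down in Y as they move up in X
offsets <- c(-1, 0, 1)
centers <- list(c(6, 2), c(4, 4), c(2, 6))

toy_groups <- do.call(rbind, lapply(seq_along(centers), function(k) {
  data.frame(X = centers[[k]][1] + offsets,
             Y = centers[[k]][2] + offsets,
             Z = as.character(k))
}))

# Correlation ignoring the grouping variable
overall_cor <- cor(toy_groups$X, toy_groups$Y)

# Correlation within each level of Z
within_cor <- sapply(split(toy_groups, toy_groups$Z),
                     function(g) cor(g$X, g$Y))

print(overall_cor)
print(within_cor)
```

In this toy case the overall correlation is negative (-0.6) while every within-group correlation is exactly 1, which is why it is worth looking at the grouping variable before interpreting a pooled correlation.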

