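The source code was downloaded from GitHub and is analyzed here section by section against the original book.
Inside markdown code blocks, statements commented out with "#" render in a dim color, so every "#" has been replaced with "*" to make the listings easier to read. To copy the code into R and run it, reverse that substitution first.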
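The reverse substitution can be scripted instead of done by hand. The following is a minimal sketch and not part of the original post: `restore_comments` is a hypothetical helper name, and it assumes that `*` appears in the copied code only as the stand-in comment marker. That holds for this chapter's listings, since none of them multiply anything, but it would corrupt code that uses `*` as an operator.

```r
# Hypothetical helper: turn the "*" stand-ins back into "#" comment markers.
# Assumption: "*" is never used as the multiplication operator in the input.
restore_comments <- function(lines) {
  gsub("*", "#", lines, fixed = TRUE)
}

# Two lines written in the style used by the listings in this post.
copied <- c("a <- c(1, 2, 5, 3, 6, -2, 4) * numeric vector",
            "a[3] * access the 3rd element")

restored <- restore_comments(copied)
print(restored)

# The restored lines parse and run as ordinary R code.
eval(parse(text = restored))
```

For a whole script, the same function could be applied to the result of `readLines()` before writing the lines back out with `writeLines()`.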
Source code analysis
*------------------------------------------------------------*
* R in Action (3rd ed): Chapter 2 *
* Creating a dataset *
*------------------------------------------------------------*
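* Types of vectors
a <- c(1, 2, 5, 3, 6, -2, 4) * numeric vector
b <- c("one", "two", "three") * character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE) * logical vector
* subsetting vectors (the book's Chinese edition translates this as "accessing the elements of a vector")
a <- c("k", "j", "h", "a", "c", "m") * create the vector
a[3] * access the 3rd element
a[c(1, 3, 5)] * access elements 1, 3 and 5. The index c(1, 3, 5) is itself the vector "1 3 5"; writing a[1, 3, 5] instead raises an error, because the commas are read as separating dimensions
a[2:6] * access elements 2 through 6; equivalent to a[c(2, 3, 4, 5, 6)]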
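* Listing 2.1 Creating matrices
y <- matrix(1:20, nrow = 5, ncol = 4) * create a 5x4 matrix with matrix(); nrow sets the number of rows, ncol the number of columns; the matrix is filled by column by default
y * print y
cells <- c(1, 26, 24, 68) * create a vector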
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells,
nrow = 2, ncol = 2, byrow = TRUE,
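dimnames = list(rnames, cnames)) * create a 2x2 matrix; byrow = TRUE fills it by row (the default, byrow = FALSE, fills by column). In dimnames = list(r, c), the first element r supplies the row names and the second element c supplies the column names.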
mymatrix
mymatrix <- matrix(cells,
nrow = 2, ncol = 2, byrow = FALSE,
dimnames = list(rnames, cnames)) * the same matrix with byrow = FALSE, so it is filled by column
mymatrix
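* Listing 2.2 Using matrix subscripts
x <- matrix(1:10, nrow = 2) * create a 2xN matrix x
x
x[2, ] * select the elements of row 2
x[, 2] * select the elements of column 2
x[1, 4] * select the element in row 1, column 4
x[1, c(4, 5)] * select the elements in row 1, columns 4 and 5
* Listing 2.3 Creating an array
dim1 <- c("A1", "A2") * create the vectors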
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
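z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3)) * create the array z with array(). 1:24 supplies the values 1 to 24; c(2, 3, 4) means that each matrix making up z (a two-dimensional "slice" of the three-dimensional array) has 2 rows and 3 columns, for a total of 4 matrices; dimnames = list(dim1, dim2, dim3) labels the rows of each matrix with dim1, the columns with dim2, and the matrices themselves with dim3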
z
* Listing 2.4 Creating a data frame
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status) * create the data frame patientdata with data.frame(); the 4 vectors in () become its 4 columns. The columns may be of different types.
patientdata
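* Listing 2.5 Specifying elements of a data frame
patientdata[1:2] * select columns 1 and 2
patientdata$age * select the variable age from the data frame patientdata
patientdata["age"] * select the age column of the data frame * line added by the annotator; not in the book
patientdata[c("diabetes", "status")] * select the diabetes and status columns
* Listing 2.6 Using factors
* This section has been modified. The original code is kept below, "silenced" inside if(FALSE){...}: R has no multi-line comment syntax, so something like "***" cannot be used to comment out several lines at once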
if(FALSE){
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
diabetes <- factor(diabetes)
status <- factor(status, order = TRUE)
patientdata <- data.frame(patientID, age, diabetes, status)
str(patientdata)
summary(patientdata)
}
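* Below, each line of the source is wrapped in print() so that its result is shown line by line
print(patientID <- c(1,2,3,4)) * enter the data as vectors
print(age <- c(25,34,28,52))
print(diabetes <- c("Type1","Type2","Type1","Type1"))
print(status <- c("Poor","Improved","Excellent","Poor"))
print(diabetes <- factor(diabetes)) * use factor() to make diabetes an ordinary (unordered) factor
print(status <- factor(status)) * first make status an ordinary factor * line added by the annotator; not in the book
print(status <- factor(status,order=TRUE)) * then set order=TRUE to make status an ordered factor
print(patientdata <- data.frame(patientID,age,diabetes,status)) * combine the variables in () into a data frame
print(str(patientdata)) * display the structure of the object; str() prints the structure itself and returns NULL, so the outer print() also prints NULL
print(summary(patientdata)) * display a statistical summary of the object
* Listing 2.7 Creating a list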
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k) * create the list mylist with list(); g becomes the 1st component, named title, and h (named ages), j and k follow as components 2 to 4
mylist
mylist[[2]] * select the 2nd component
mylist[["ages"]] * select the component named ages, i.e. the 2nd component
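Listing 2.7 selects a component with `[[ ]]` by position and by name. As a supplement that is not in the original post, the sketch below contrasts that with `$` and with single brackets `[ ]`, which return a sub-list rather than the component itself. The variable names `ages_by_position`, `ages_by_name` and `ages_sublist` are introduced here for illustration only.

```r
# Rebuild the list from Listing 2.7.
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

# [[ ]] and $ both return the component itself (here a numeric vector).
ages_by_position <- mylist[[2]]
ages_by_name     <- mylist$ages
print(identical(ages_by_position, ages_by_name))

# [ ] returns a one-element list that still wraps the component.
ages_sublist <- mylist["ages"]
print(class(ages_by_position))
print(class(ages_sublist))

# Unnamed components get empty names, so j and k can only be
# reached by position.
print(names(mylist))
print(dim(mylist[[3]]))
```

The practical consequence is that arithmetic such as `mean()` works directly on `mylist[[2]]` or `mylist$ages`, while `mylist["ages"]` would first have to be unwrapped.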
Output
> *------------------------------------------------------------*
> * R in Action (3rd ed): Chapter 2 *
> * Creating a datas .... [TRUNCATED]
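> b <- c("one", "two", "three") * character vector
> c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE) * logical vector
> * subsetting vectors (the book's Chinese edition translates this as "accessing the elements of a vector")
> a <- c("k", "j", "h", "a", "c", "m") * create the vector
> a[3] * access the 3rd element
[1] "h"
> a[c(1, 3, 5)] * access elements 1, 3 and 5. The index c(1, 3, 5) is itself the vector "1 3 5"; writing a[1, 3, 5] instead raises an error, because the commas are read as separating dimensions
[1] "k" "h" "c"
> a[2:6] * access elements 2 through 6; equivalent to a[c(2, 3, 4, 5, 6)]
[1] "j" "h" "a" "c" "m"
> * Listing 2.1 Creating matrices
> y <- matrix(1:20, nrow = 5, ncol = 4) * create a 5x4 matrix with matrix(); nrow sets the number of rows, ncol the number of columns; the matrix is filled by column by default
> y * print y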
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
> cells <- c(1, 26, 24, 68) * create a vector
> rnames <- c("R1", "R2")
> cnames <- c("C1", "C2")
> mymatrix <- matrix(cells,
+ nrow = 2, ncol = 2, byrow = TRUE,
+ dimnames = list(rnames, cnames)) * create a 2x2 matrix; byrow = TRUE fills it by row (the default, byrow = FALSE, fills by column). .... [TRUNCATED]
> mymatrix
C1 C2
R1 1 26
R2 24 68
> mymatrix <- matrix(cells,
+ nrow = 2, ncol = 2, byrow = FALSE,
+ dimnames = list(rnames, cnames)) * the same matrix with byrow = FALSE, so it is filled by column
> mymatrix
C1 C2
R1 1 24
R2 26 68
> * Listing 2.2 Using matrix subscripts
> x <- matrix(1:10, nrow = 2) * create a 2xN matrix x
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
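> x[2, ] * select the elements of row 2
[1] 2 4 6 8 10
> x[, 2] * select the elements of column 2
[1] 3 4
> x[1, 4] * select the element in row 1, column 4
[1] 7
> x[1, c(4, 5)] * select the elements in row 1, columns 4 and 5
[1] 7 9
> * Listing 2.3 Creating an array
> dim1 <- c("A1", "A2") * create the vectors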
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
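> z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3)) * create the array z with array(). 1:24 supplies the values 1 to 24; c(2, 3, 4) means that each matrix making up z (a two-dimensional "slice" of the three-dimensional array) has 2 rows and 3 columns, for a total of .... [TRUNCATED]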
> z
, , C1
B1 B2 B3
A1 1 3 5
A2 2 4 6
, , C2
B1 B2 B3
A1 7 9 11
A2 8 10 12
, , C3
B1 B2 B3
A1 13 15 17
A2 14 16 18
, , C4
B1 B2 B3
A1 19 21 23
A2 20 22 24
> * Listing 2.4 Creating a data frame
> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status) * create the data frame patientdata with data.frame(); the 4 vectors in () become its 4 columns. The columns may be of different types.
> patientdata
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
> * Listing 2.5 Specifying elements of a data frame
> patientdata[1:2] * select columns 1 and 2
patientID age
1 1 25
2 2 34
3 3 28
4 4 52
> patientdata$age * select the variable age from the data frame patientdata
[1] 25 34 28 52
> patientdata["age"] * select the age column of the data frame * line added by the annotator; not in the book
age
1 25
2 34
3 28
4 52
> patientdata[c("diabetes", "status")] * select the diabetes and status columns
diabetes status
1 Type1 Poor
2 Type2 Improved
3 Type1 Excellent
4 Type1 Poor
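> * Listing 2.6 Using factors
> ---* this section was modified by the annotator; the original code follows, commented out line by line with the hash sign, because R has no multi-line comment syntax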
+ *patientID <- c(1, 2, 3, 4)
+ *age <- c(25, 34, 28, 52)
+ *diabetes < .... [TRUNCATED]
[1] 1 2 3 4
[1] 1 2 3 4
> print(age <- c(25,34,28,52))
[1] 25 34 28 52
> print(diabetes <- c("Type1","Type2","Type1","Type1"))
[1] "Type1" "Type2" "Type1" "Type1"
> print(status <- c("Poor","Improved","Excellent","Poor"))
[1] "Poor" "Improved" "Excellent" "Poor"
> print(diabetes <- factor(diabetes)) * use factor() to make diabetes an ordinary (unordered) factor
[1] Type1 Type2 Type1 Type1
Levels: Type1 Type2
> print(status <- factor(status)) * first make status an ordinary factor * line added by the annotator; not in the book
[1] Poor Improved Excellent Poor
Levels: Excellent Improved Poor
> print(status <- factor(status,order=TRUE)) * then set order=TRUE to make status an ordered factor
[1] Poor Improved Excellent Poor
Levels: Excellent < Improved < Poor
> print(patientdata <- data.frame(patientID,age,diabetes,status)) * combine the variables in () into a data frame
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
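> print(str(patientdata)) * display the structure of the object; str() prints the structure itself and returns NULL, so the outer print() also prints NULL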
'data.frame': 4 obs. of 4 variables:
$ patientID: num 1 2 3 4
$ age : num 25 34 28 52
$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
$ status : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
NULL
> print(summary(patientdata)) * display a statistical summary of the object
   patientID         age         diabetes       status
 Min.   :1.00   Min.   :25.00   Type1:3   Excellent:1
 1st Qu.:1.75   1st Qu.:27.25   Type2:1   Improved :1
 Median :2.50   Median :31.00             Poor     :2
 Mean   :2.50   Mean   :34.75
 3rd Qu.:3.25   3rd Qu.:38.50
 Max.   :4.00   Max.   :52.00
> * Listing 2.7 Creating a list
> g <- "My First List"
> h <- c(25, 26, 18, 39)
> j <- matrix(1:10, nrow = 5)
> k <- c("one", "two", "three")
> mylist <- list(title = g, ages = h, j, k) * create the list mylist with list(); g becomes the 1st component, named title, and h (named ages), j and k follow as components 2 to 4
> mylist
$title
[1] "My First List"
$ages
[1] 25 26 18 39
[[3]]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
[[4]]
[1] "one" "two" "three"
> mylist[[2]] * select the 2nd component
[1] 25 26 18 39
> mylist[["ages"]] * select the component named ages, i.e. the 2nd component
[1] 25 26 18 39
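One detail in the Listing 2.6 output above deserves a follow-up. `factor(status, order = TRUE)` produced the levels `Excellent < Improved < Poor`, because R sorts levels alphabetically unless told otherwise. The sketch below is a supplement, not part of the original post: it assumes the intended clinical order is Poor < Improved < Excellent and passes `levels =` to state it explicitly. It also spells the argument out as `ordered`; the listing's `order` works only because R matches argument names partially. The names `alphabetical` and `clinical` are illustrative.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")

# Default behaviour: levels are sorted alphabetically, as in the output above.
alphabetical <- factor(status, ordered = TRUE)
print(levels(alphabetical))

# Assumption: the meaningful order is Poor < Improved < Excellent.
# Supplying levels= makes that order explicit.
clinical <- factor(status,
                   levels = c("Poor", "Improved", "Excellent"),
                   ordered = TRUE)
print(clinical)

# Comparisons on an ordered factor follow the declared level order,
# and the underlying integer codes follow it too.
print(clinical < "Excellent")
print(as.integer(clinical))
```

With the levels declared this way, functions that respect factor order (sorting, comparisons, ordered contrasts in models) treat "Poor" as the lowest category instead of the highest.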