1. Understanding the Dataset
1.1 Vector
Vectors are one-dimensional arrays that hold numeric, character, or logical data; all elements of a given vector share the same mode. The combine function c() is used to form a vector.
> a = c(1, 2, 5, 3, 6, -2, 4)
> b = c("one", "two", "three")
> c = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
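As a small sketch of how such vectors behave, the following shows what happens when modes are mixed and how elements can be selected by position or by condition. The name mixed is illustrative and not part of the original notes.

```r
# A vector holds a single mode; mixing modes coerces everything
# to the most general one (here: character).
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"

a <- c(1, 2, 5, 3, 6, -2, 4)
a[3]                # third element: 5
a[c(1, 3, 5)]       # elements 1, 3 and 5: 1 5 6
a[2:4]              # elements 2 through 4: 2 5 3
a[a > 3]            # elements greater than 3: 5 6 4
```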
1.2 Matrix
A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the matrix() function.
> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE) # values 1 to 20 in 5 rows and 4 columns, filled row by row
> x
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20
> x[2,] # returns row 2 of x
[1] 5 6 7 8
> x[,2] # returns column 2 of x
[1]  2  6 10 14 18
> x[1,4] # returns the element in row 1, column 4
[1] 4
> x[2,c(2,4)] # returns the elements in row 2, columns 2 and 4
[1] 6 8
> x[3:5, 2] # returns rows 3 to 5 of column 2
[1] 10 14 18
> y = matrix(1:20, nrow=5, ncol=4, byrow=FALSE)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> rnames=c("apple","banana","orange","melon","corn")
> cnames=c("cat","dog","bird","pig")
> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
> rownames(x)=rnames
> colnames(x)=cnames
> x # x now carries the row and column names
       cat dog bird pig
apple    1   2    3   4
banana   5   6    7   8
orange   9  10   11  12
melon   13  14   15  16
corn    17  18   19  20
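Once a matrix has row and column names, elements can also be selected by name instead of by position. The sketch below rebuilds the same matrix in one call using the dimnames argument of matrix(), which is an alternative to the rownames()/colnames() assignments above.

```r
# Same 5 x 4 matrix as above, with names supplied at creation time.
x <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE,
            dimnames = list(c("apple", "banana", "orange", "melon", "corn"),
                            c("cat", "dog", "bird", "pig")))

dim(x)                        # 5 4
x["banana", "dog"]            # 6
x[c("apple", "corn"), "pig"]  # named vector: apple = 4, corn = 20
t(x)["dog", "orange"]         # transpose swaps rows and columns: 10
```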
1.3 Array
Arrays are similar to matrices but can have more than two dimensions. They are created with the array() function.
> dim1 = c("A1", "A2")
> dim2 = c("B1", "B2", "B3")
> dim3 = c("C1", "C2", "C3", "C4")
> dim4 = c("D1", "D2", "D3")
> z = array(1:72, c(2, 3, 4, 3), dimnames=list(dim1, dim2, dim3, dim4))
> z
, , C1, D1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2, D1

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3, D1

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4, D1

   B1 B2 B3
A1 19 21 23
A2 20 22 24

, , C1, D2

   B1 B2 B3
A1 25 27 29
A2 26 28 30

, , C2, D2

   B1 B2 B3
A1 31 33 35
A2 32 34 36

, , C3, D2

   B1 B2 B3
A1 37 39 41
A2 38 40 42

, , C4, D2

   B1 B2 B3
A1 43 45 47
A2 44 46 48

, , C1, D3

   B1 B2 B3
A1 49 51 53
A2 50 52 54

, , C2, D3

   B1 B2 B3
A1 55 57 59
A2 56 58 60

, , C3, D3

   B1 B2 B3
A1 61 63 65
A2 62 64 66

, , C4, D3

   B1 B2 B3
A1 67 69 71
A2 68 70 72

> z[1,2,3,] # A1,B2,C3
D1 D2 D3
15 39 63
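The same four-dimensional array can be indexed by dimension names as well as by position. The following sketch also shows the drop argument of the subsetting operator, which controls whether dimensions of length one are removed from the result.

```r
z <- array(1:72, c(2, 3, 4, 3),
           dimnames = list(c("A1", "A2"),
                           c("B1", "B2", "B3"),
                           c("C1", "C2", "C3", "C4"),
                           c("D1", "D2", "D3")))

dim(z)                                  # 2 3 4 3
z["A1", "B2", "C3", "D2"]               # single element: 39
dim(z[, , "C1", "D1"])                  # 2 3 (a plain matrix)
dim(z[, , "C1", "D1", drop = FALSE])    # 2 3 1 1 (still a 4-d array)
```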
1.4 Data Frame
A data frame is more general than a matrix in that different columns can contain different modes of data (numeric, character, etc.). It is similar to the datasets you would typically see in SAS, SPSS, and Stata. Data frames are the most common data structure you will deal with in R.
> patientID = c(1, 2, 3, 4)
> age = c(25, 34, 28, 52)
> diabetes = c("Type1", "Type2", "Type1", "Type1")
> status = c("Poor", "Improved", "Excellent", "Poor")
> patientdata = data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> swim = read.csv("http://www.macalester.edu/~kaplan/ISM/datasets/swim100m.csv")
> patientdata[1:2] # first 2 columns
  patientID age
1         1  25
2         2  34
3         3  28
4         4  52
> patientdata[1:3] # first 3 columns
  patientID age diabetes
1         1  25    Type1
2         2  34    Type2
3         3  28    Type1
4         4  52    Type1
> patientdata[1,1:3] # row 1, first 3 columns
  patientID age diabetes
1         1  25    Type1
> patientdata[c(1,3),1:3] # rows 1 and 3, first 3 columns
  patientID age diabetes
1         1  25    Type1
3         3  28    Type1
> patientdata[1:2,] # first 2 rows, all columns
  patientID age diabetes   status
1         1  25    Type1     Poor
2         2  34    Type2 Improved
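Besides positional indexing, data frame columns are usually reached by name with the $ operator, and rows are often selected with a logical condition. The sketch below rebuilds the same patient data so it can run on its own.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

patientdata$age                       # one column as a vector: 25 34 28 52
mean(patientdata$age)                 # 34.75

# Rows where a condition holds; the empty slot after the comma keeps all columns.
older <- patientdata[patientdata$age > 30, ]
older$patientID                       # 2 4

# Condition on rows plus a selection of columns by name.
type1 <- patientdata[patientdata$diabetes == "Type1", c("patientID", "status")]
nrow(type1)                           # 3
```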
1.5 Attach and Detach
The attach() function adds the data frame to the R search path, so its columns can be referred to by name alone (wt instead of mtcars$wt).
The detach() function removes the data frame from the search path again.
> attach(mtcars)
> layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
> hist(wt)
> hist(mpg)
> hist(disp)
> detach(mtcars)
> mtcars
mpg cyl disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
Duster 360 14.3 8 360.0 245 3.21
Merc 240D 24.4 4 146.7 62 3.69
Merc 230 22.8 4 140.8 95 3.92
Merc 280 19.2 6 167.6 123 3.92
Merc 280C 17.8 6 167.6 123 3.92
Merc 450SE 16.4 8 275.8 180 3.07
Merc 450SL 17.3 8 275.8 180 3.07
Merc 450SLC 15.2 8 275.8 180 3.07
Cadillac Fleetwood 10.4 8 472.0 205 2.93
Lincoln Continental 10.4 8 460.0 215 3.00
Chrysler Imperial 14.7 8 440.0 230 3.23
Fiat 128 32.4 4 78.7 66 4.08
Honda Civic 30.4 4 75.7 52 4.93
Toyota Corolla 33.9 4 71.1 65 4.22
Toyota Corona 21.5 4 120.1 97 3.70
Dodge Challenger 15.5 8 318.0 150 2.76
AMC Javelin 15.2 8 304.0 150 3.15
Camaro Z28 13.3 8 350.0 245 3.73
Pontiac Firebird 19.2 8 400.0 175 3.08
Fiat X1-9 27.3 4 79.0 66 4.08
Porsche 914-2 26.0 4 120.3 91 4.43
Lotus Europa 30.4 4 95.1 113 3.77
Ford Pantera L 15.8 8 351.0 264 4.22
Ferrari Dino 19.7 6 145.0 175 3.62
Maserati Bora 15.0 8 301.0 335 3.54
Volvo 142E 21.4 4 121.0 109 4.11
wt qsec vs am gear
Mazda RX4 2.620 16.46 0 1 4
Mazda RX4 Wag 2.875 17.02 0 1 4
Datsun 710 2.320 18.61 1 1 4
Hornet 4 Drive 3.215 19.44 1 0 3
Hornet Sportabout 3.440 17.02 0 0 3
Valiant 3.460 20.22 1 0 3
Duster 360 3.570 15.84 0 0 3
Merc 240D 3.190 20.00 1 0 4
Merc 230 3.150 22.90 1 0 4
Merc 280 3.440 18.30 1 0 4
Merc 280C 3.440 18.90 1 0 4
Merc 450SE 4.070 17.40 0 0 3
Merc 450SL 3.730 17.60 0 0 3
Merc 450SLC 3.780 18.00 0 0 3
Cadillac Fleetwood 5.250 17.98 0 0 3
Lincoln Continental 5.424 17.82 0 0 3
Chrysler Imperial 5.345 17.42 0 0 3
Fiat 128 2.200 19.47 1 1 4
Honda Civic 1.615 18.52 1 1 4
Toyota Corolla 1.835 19.90 1 1 4
Toyota Corona 2.465 20.01 1 0 3
Dodge Challenger 3.520 16.87 0 0 3
AMC Javelin 3.435 17.30 0 0 3
Camaro Z28 3.840 15.41 0 0 3
Pontiac Firebird 3.845 17.05 0 0 3
Fiat X1-9 1.935 18.90 1 1 4
Porsche 914-2 2.140 16.70 0 1 5
Lotus Europa 1.513 16.90 1 1 5
Ford Pantera L 3.170 14.50 0 1 5
Ferrari Dino 2.770 15.50 0 1 5
Maserati Bora 3.570 14.60 0 1 5
Volvo 142E 2.780 18.60 1 1 4
carb
Mazda RX4 4
Mazda RX4 Wag 4
Datsun 710 1
Hornet 4 Drive 1
Hornet Sportabout 2
Valiant 1
Duster 360 4
Merc 240D 2
Merc 230 2
Merc 280 4
Merc 280C 4
Merc 450SE 3
Merc 450SL 3
Merc 450SLC 3
Cadillac Fleetwood 4
Lincoln Continental 4
Chrysler Imperial 4
Fiat 128 1
Honda Civic 2
Toyota Corolla 1
Toyota Corona 1
Dodge Challenger 2
AMC Javelin 2
Camaro Z28 4
Pontiac Firebird 2
Fiat X1-9 1
Porsche 914-2 2
Lotus Europa 2
Ford Pantera L 4
Ferrari Dino 6
Maserati Bora 8
Volvo 142E 2
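> mpg # resolves only if a copy of mtcars is still attached; otherwise R reports that object 'mpg' is not found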
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4
[9] 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3
[25] 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
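A commonly used alternative to attach() and detach() is with(), which evaluates a single expression using the columns of a data frame and leaves the search path untouched. This is only a sketch of that alternative, using the built-in mtcars data; the names avg_mpg and wt_mpg_cor are illustrative.

```r
# Columns are visible by name only inside the with() call.
avg_mpg <- with(mtcars, mean(mpg))
wt_mpg_cor <- with(mtcars, cor(wt, mpg))   # heavier cars tend to have lower mpg

# Equivalent explicit form using $:
avg_mpg_explicit <- mean(mtcars$mpg)

avg_mpg
wt_mpg_cor
```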
1.6 List
Lists are the most complex of the R data types. Basically, a list is an ordered
collection of objects (components). A list allows you to gather a variety of
(possibly unrelated) objects under one name.
> mylist = list(patientdata, swim, x)
> mylist
[[1]]
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
[[2]]
year time sex
1 1905 65.80 M
2 1908 65.60 M
3 1910 62.80 M
4 1912 61.60 M
5 1918 61.40 M
6 1920 60.40 M
7 1922 58.60 M
8 1924 57.40 M
9 1934 56.80 M
10 1935 56.60 M
11 1936 56.40 M
12 1944 55.90 M
13 1947 55.80 M
14 1948 55.40 M
15 1955 54.80 M
16 1957 54.60 M
17 1961 53.60 M
18 1964 52.90 M
19 1967 52.60 M
20 1968 52.20 M
21 1970 51.90 M
22 1972 51.22 M
23 1975 50.59 M
24 1976 49.44 M
25 1981 49.36 M
26 1985 49.24 M
27 1986 48.74 M
28 1988 48.42 M
29 1994 48.21 M
30 2000 48.18 M
31 2000 47.84 M
32 1908 95.00 F
33 1910 86.60 F
34 1911 84.60 F
35 1912 78.80 F
36 1915 76.20 F
37 1920 73.60 F
38 1923 72.80 F
39 1924 72.20 F
40 1926 70.00 F
41 1929 69.40 F
42 1930 68.00 F
43 1931 66.60 F
44 1933 66.00 F
45 1934 65.40 F
46 1936 64.60 F
47 1956 62.00 F
48 1958 61.20 F
49 1960 60.20 F
50 1962 59.50 F
51 1964 58.90 F
52 1972 58.50 F
53 1973 57.54 F
54 1974 56.96 F
55 1976 55.65 F
56 1978 55.41 F
57 1980 54.79 F
58 1986 54.73 F
59 1992 54.48 F
60 1994 54.01 F
61 2000 53.77 F
62 2004 53.52 F
[[3]]
cat dog bird pig
apple 1 2 3 4
banana 5 6 7 8
orange 9 10 11 12
melon 13 14 15 16
corn 17 18 19 20
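List components can be given names, and there is a difference between single and double brackets when extracting them. The sketch below uses a small hypothetical list called record (not from the notes above) to keep the output short.

```r
# A list can mix a string, a numeric vector, and a matrix under one name.
record <- list(title = "My list",
               ages  = c(25, 34, 28, 52),
               m     = matrix(1:4, nrow = 2))

length(record)            # 3
names(record)             # "title" "ages" "m"

class(record["ages"])     # "list": single brackets return a sub-list
class(record[["ages"]])   # "numeric": double brackets return the component itself
record$ages[2]            # $ is shorthand for [[ ]] by name: 34
```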
2. Graphs
2.1 Graphical parameters
You can customize many features of a graph (fonts, colors, axes, titles) through options called graphical parameters. They are specified with the par() function.
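> par(mfrow=c(2,2)) # arrange the next four plots in a 2 x 2 grid
> plot(rnorm(50),pch=17) # pch=17 sets the plotting symbol (filled triangle); 50 normal random numbers with mean 0
> plot(rnorm(20),type="l",lty=5) # type="l" draws a line, lty=5 sets the line type (long dash); 20 random numbers
> plot(rnorm(100),cex=0.5) # cex=0.5 draws symbols at half the default size; 100 normal random numbers
> plot(rnorm(200),lwd=2) # lwd=2 doubles the default line width; 200 random numbers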
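Because par() changes settings for every later plot on the device, it is common to save the previous values and restore them afterwards. par() returns the old values of the parameters it sets, which makes this straightforward. A minimal sketch, with old_par as an illustrative name:

```r
# par() returns the previous values of the parameters being changed.
old_par <- par(mfrow = c(1, 2), pch = 17)

set.seed(1)                                     # reproducible random draws
plot(rnorm(50))                                 # uses pch = 17 from par()
plot(rnorm(50), type = "l", lty = 5, lwd = 2)   # per-plot settings still apply

par(old_par)                                    # restore the earlier settings
```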
2.2 Text, Axes, and Legends
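title() adds a main title, a subtitle, and axis labels to the current plot.
axis() adds an axis on a chosen side of the plot (1 = bottom, 2 = left, 3 = top, 4 = right).
legend() adds a legend.
> axis(1) # draws the bottom (x) axis on the current plot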
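The three functions are typically combined by first drawing a plot without default axes and annotations, then adding each piece explicitly. The data and labels below are made up for illustration only.

```r
xs <- 1:10
y1 <- xs^2
y2 <- 10 * xs

# axes = FALSE and ann = FALSE suppress the default axes and labels.
plot(xs, y1, type = "b", pch = 17, col = "red", axes = FALSE, ann = FALSE)
lines(xs, y2, type = "b", pch = 1, lty = 2, col = "blue")

axis(1, at = xs)          # bottom axis with a tick at every x value
axis(2, las = 1)          # left axis with horizontal tick labels
box()                     # frame around the plot region

title(main = "Two series", xlab = "x", ylab = "value")
legend("topleft", legend = c("x^2", "10 * x"),
       col = c("red", "blue"), pch = c(17, 1), lty = c(1, 2))
```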
2.3 Layout
The layout() function has the form layout(mat), where mat is a matrix object specifying the location of the multiple plots to combine.
> attach(mtcars)
The following objects are masked from mtcars (pos = 3):
am, carb, cyl, disp, drat, gear, hp,
mpg, qsec, vs, wt
> layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
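# layout() takes a matrix, here 2 rows by 2 columns;
# plot 1 fills both cells numbered 1 (the whole top row), plot 2 the cell numbered 2, and plot 3 the cell numbered 3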
> hist(wt)
> hist(mpg)
> hist(disp)
> detach(mtcars)
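Printing the layout matrix makes the arrangement easier to see: each cell holds the number of the plot that occupies it. The sketch below repeats the example without attach(), using mtcars$ to reach the columns; mat and n_plots are illustrative names.

```r
mat <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)
mat
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    3

n_plots <- layout(mat)    # layout() returns the number of figures

hist(mtcars$wt)           # plot 1: whole top row
hist(mtcars$mpg)          # plot 2: bottom left
hist(mtcars$disp)         # plot 3: bottom right

layout(matrix(1))         # back to a single plot per page
```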
> mylist[2] # single brackets return a one-element list, so the swim data frame is printed under [[1]]
[[1]]
year time sex
1 1905 65.80 M
2 1908 65.60 M
3 1910 62.80 M
4 1912 61.60 M
5 1918 61.40 M
6 1920 60.40 M
7 1922 58.60 M
8 1924 57.40 M
9 1934 56.80 M
10 1935 56.60 M
11 1936 56.40 M
12 1944 55.90 M
13 1947 55.80 M
14 1948 55.40 M
15 1955 54.80 M
16 1957 54.60 M
17 1961 53.60 M
18 1964 52.90 M
19 1967 52.60 M
20 1968 52.20 M
21 1970 51.90 M
22 1972 51.22 M
23 1975 50.59 M
24 1976 49.44 M
25 1981 49.36 M
26 1985 49.24 M
27 1986 48.74 M
28 1988 48.42 M
29 1994 48.21 M
30 2000 48.18 M
31 2000 47.84 M
32 1908 95.00 F
33 1910 86.60 F
34 1911 84.60 F
35 1912 78.80 F
36 1915 76.20 F
37 1920 73.60 F
38 1923 72.80 F
39 1924 72.20 F
40 1926 70.00 F
41 1929 69.40 F
42 1930 68.00 F
43 1931 66.60 F
44 1933 66.00 F
45 1934 65.40 F
46 1936 64.60 F
47 1956 62.00 F
48 1958 61.20 F
49 1960 60.20 F
50 1962 59.50 F
51 1964 58.90 F
52 1972 58.50 F
53 1973 57.54 F
54 1974 56.96 F
55 1976 55.65 F
56 1978 55.41 F
57 1980 54.79 F
58 1986 54.73 F
59 1992 54.48 F
60 1994 54.01 F
61 2000 53.77 F
62 2004 53.52 F
3. Next Topic
Operators, Control Flow & User-defined Functions