Lecture 02

1. Understanding the Dataset

1.1 Vector

Vectors are one-dimensional arrays that hold numeric, character, or logical data; all elements of a vector share the same mode. The combine function c() is used to form a vector.

> a = c(1, 2, 5, 3, 6, -2, 4)
> b = c("one", "two", "three")
> c = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
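
Because a vector holds a single mode, mixing types inside c() coerces every element to the most general one. The following is a minimal sketch of that behavior together with basic indexing; the name mixed is only an illustration and does not appear in the lecture.

```r
# Elements of different modes are coerced to one common mode
mixed = c(1, "two", TRUE)
mode(mixed)          # "character"

# Indexing starts at 1; a vector of positions selects several elements
a = c(1, 2, 5, 3, 6, -2, 4)
a[3]                 # third element
a[c(1, 3, 5)]        # first, third, and fifth elements
a[2:4]               # elements 2 through 4
length(a)            # number of elements
```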

1.2 Matrix

A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the matrix() function.

> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE)    # values 1 to 20 in 5 rows and 4 columns, filled row by row
> x
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

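> x[2,]      # returns row 2 of x
[1] 5 6 7 8

> x[,2]      # returns column 2 of x
[1]  2  6 10 14 18

> x[1,4]      # returns the element in row 1, column 4
[1] 4

> x[2,c(2,4)]      # returns the elements of row 2 in columns 2 and 4
[1] 6 8

> x[3:5, 2]      # returns rows 3 to 5 of column 2
[1] 10 14 18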
[1] 10 14 18


> y = matrix(1:20, nrow=5, ncol=4, byrow=FALSE)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20


> rnames=c("apple","banana","orange","melon","corn")
> cnames=c("cat","dog","bird","pig")
> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
> rownames(x)=rnames
> colnames(x)=cnames
> x      # x with its rows and columns named
       cat dog bird pig
apple    1   2    3   4
banana   5   6    7   8
orange   9  10   11  12
melon   13  14   15  16
corn    17  18   19  20
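
The effect of byrow and of the row and column names can be checked directly. The sketch below is one possible illustration, assuming the same values as above; it compares the row-wise fill with a transposed column-wise fill and then selects elements by name.

```r
# byrow=TRUE fills row by row; the default (byrow=FALSE) fills column by column
x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
y = matrix(1:20, nrow=4, ncol=5)     # column-wise fill, 4 rows by 5 columns
all(x == t(y))                       # TRUE: transposing y reproduces x

dim(x)                               # 5 4

# With dimnames set, elements can be selected by name
rownames(x) = c("apple", "banana", "orange", "melon", "corn")
colnames(x) = c("cat", "dog", "bird", "pig")
x["banana", "dog"]                   # 6
x[c("apple", "corn"), "pig"]         # named vector: apple = 4, corn = 20
```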

1.3 Array

Arrays are similar to matrices but can have more than two dimensions. They are created with the array() function.

> dim1 = c("A1", "A2")
> dim2 = c("B1", "B2", "B3")
> dim3 = c("C1", "C2", "C3", "C4")
> dim4 = c("D1", "D2", "D3")
> z = array(1:72, c(2, 3, 4, 3), dimnames=list(dim1, dim2, dim3, dim4))
> z
, , C1, D1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2, D1

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3, D1

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4, D1

   B1 B2 B3
A1 19 21 23
A2 20 22 24

, , C1, D2

   B1 B2 B3
A1 25 27 29
A2 26 28 30

, , C2, D2

   B1 B2 B3
A1 31 33 35
A2 32 34 36

, , C3, D2

   B1 B2 B3
A1 37 39 41
A2 38 40 42

, , C4, D2

   B1 B2 B3
A1 43 45 47
A2 44 46 48

, , C1, D3

   B1 B2 B3
A1 49 51 53
A2 50 52 54

, , C2, D3

   B1 B2 B3
A1 55 57 59
A2 56 58 60

, , C3, D3

   B1 B2 B3
A1 61 63 65
A2 62 64 66

, , C4, D3

   B1 B2 B3
A1 67 69 71
A2 68 70 72

> z[1,2,3,]      # A1, B2, C3 across every level of the fourth dimension
D1 D2 D3 
15 39 63 
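
The printed slices follow from how array() stores its values: the first index varies fastest. As a rough sketch, using the same dimensions and names as above, a single element can be selected by name and its value recomputed from its position.

```r
z = array(1:72, c(2, 3, 4, 3),
          dimnames=list(c("A1", "A2"),
                        c("B1", "B2", "B3"),
                        c("C1", "C2", "C3", "C4"),
                        c("D1", "D2", "D3")))
dim(z)                        # 2 3 4 3
z["A1", "B2", "C3", "D2"]     # 39, the same element as z[1, 2, 3, 2]

# Values are stored with the first index varying fastest, so the position of
# element [i, j, k, l] in 1:72 is i + 2*(j-1) + 6*(k-1) + 24*(l-1)
1 + 2*(2-1) + 6*(3-1) + 24*(2-1)   # 39
```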

1.4 Data Frame

A data frame is more general than a matrix in that different columns can contain different modes of data (numeric, character, etc.). It is similar to the datasets you would typically see in SAS, SPSS, and Stata. Data frames are the most common data structure you will deal with in R.

> patientID = c(1, 2, 3, 4)
> age = c(25, 34, 28, 52)
> diabetes = c("Type1", "Type2", "Type1", "Type1")
> status = c("Poor", "Improved", "Excellent", "Poor")
> patientdata = data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor

> swim = read.csv("http://www.macalester.edu/~kaplan/ISM/datasets/swim100m.csv")      # read a CSV file from a URL into a data frame (used again in section 1.6)

> patientdata[1:2]      # select the first 2 columns
  patientID age
1         1  25
2         2  34
3         3  28
4         4  52

> patientdata[1:3]      # select the first 3 columns
  patientID age diabetes
1         1  25    Type1
2         2  34    Type2
3         3  28    Type1
4         4  52    Type1

> patientdata[1,1:3]      # select the first 3 columns of row 1
  patientID age diabetes
1         1  25    Type1

> patientdata[c(1,3),1:3]      # select the first 3 columns of rows 1 and 3
  patientID age diabetes
1         1  25    Type1
3         3  28    Type1

> patientdata[1:2,]      # select the first 2 rows, all columns
  patientID age diabetes   status
1         1  25    Type1     Poor
2         2  34    Type2 Improved
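
Besides positional indexing, columns can be reached by name and rows can be filtered with a logical condition. The sketch below rebuilds the same patientdata frame so that it runs on its own; the age threshold of 30 is an arbitrary choice for illustration.

```r
patientdata = data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

patientdata$age                          # one column as a vector
patientdata[c("diabetes", "status")]     # columns selected by name
patientdata[patientdata$age > 30, ]      # rows where age exceeds 30 (patients 2 and 4)
mean(patientdata$age)                    # 34.75
str(patientdata)                         # compact summary of each column's type
```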

1.5 Attach and Detach

The attach() function adds a data frame to the R search path, so its columns can be referenced by name alone.
The detach() function removes the data frame from the search path again.

> attach(mtcars)
> layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
> hist(wt)
> hist(mpg)
> hist(disp)
> detach(mtcars)
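
Whether a data frame is currently attached can be checked with search(). The following sketch shows this with the built-in mtcars data. It also uses with(), which is not covered in this lecture and is included here only as an assumption about what a reader might find useful: it evaluates an expression inside a data frame without changing the search path.

```r
attach(mtcars)
"mtcars" %in% search()     # TRUE: the data frame is on the search path
mean(mpg)                  # mpg is found without writing mtcars$mpg
detach(mtcars)
"mtcars" %in% search()     # FALSE: the data frame was removed again

# with() evaluates an expression inside the data frame without changing
# the search path (an alternative not covered in the lecture)
with(mtcars, mean(mpg))
```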

> mtcars
                     mpg cyl  disp  hp drat
Mazda RX4           21.0   6 160.0 110 3.90
Mazda RX4 Wag       21.0   6 160.0 110 3.90
Datsun 710          22.8   4 108.0  93 3.85
Hornet 4 Drive      21.4   6 258.0 110 3.08
Hornet Sportabout   18.7   8 360.0 175 3.15
Valiant             18.1   6 225.0 105 2.76
Duster 360          14.3   8 360.0 245 3.21
Merc 240D           24.4   4 146.7  62 3.69
Merc 230            22.8   4 140.8  95 3.92
Merc 280            19.2   6 167.6 123 3.92
Merc 280C           17.8   6 167.6 123 3.92
Merc 450SE          16.4   8 275.8 180 3.07
Merc 450SL          17.3   8 275.8 180 3.07
Merc 450SLC         15.2   8 275.8 180 3.07
Cadillac Fleetwood  10.4   8 472.0 205 2.93
Lincoln Continental 10.4   8 460.0 215 3.00
Chrysler Imperial   14.7   8 440.0 230 3.23
Fiat 128            32.4   4  78.7  66 4.08
Honda Civic         30.4   4  75.7  52 4.93
Toyota Corolla      33.9   4  71.1  65 4.22
Toyota Corona       21.5   4 120.1  97 3.70
Dodge Challenger    15.5   8 318.0 150 2.76
AMC Javelin         15.2   8 304.0 150 3.15
Camaro Z28          13.3   8 350.0 245 3.73
Pontiac Firebird    19.2   8 400.0 175 3.08
Fiat X1-9           27.3   4  79.0  66 4.08
Porsche 914-2       26.0   4 120.3  91 4.43
Lotus Europa        30.4   4  95.1 113 3.77
Ford Pantera L      15.8   8 351.0 264 4.22
Ferrari Dino        19.7   6 145.0 175 3.62
Maserati Bora       15.0   8 301.0 335 3.54
Volvo 142E          21.4   4 121.0 109 4.11
                       wt  qsec vs am gear
Mazda RX4           2.620 16.46  0  1    4
Mazda RX4 Wag       2.875 17.02  0  1    4
Datsun 710          2.320 18.61  1  1    4
Hornet 4 Drive      3.215 19.44  1  0    3
Hornet Sportabout   3.440 17.02  0  0    3
Valiant             3.460 20.22  1  0    3
Duster 360          3.570 15.84  0  0    3
Merc 240D           3.190 20.00  1  0    4
Merc 230            3.150 22.90  1  0    4
Merc 280            3.440 18.30  1  0    4
Merc 280C           3.440 18.90  1  0    4
Merc 450SE          4.070 17.40  0  0    3
Merc 450SL          3.730 17.60  0  0    3
Merc 450SLC         3.780 18.00  0  0    3
Cadillac Fleetwood  5.250 17.98  0  0    3
Lincoln Continental 5.424 17.82  0  0    3
Chrysler Imperial   5.345 17.42  0  0    3
Fiat 128            2.200 19.47  1  1    4
Honda Civic         1.615 18.52  1  1    4
Toyota Corolla      1.835 19.90  1  1    4
Toyota Corona       2.465 20.01  1  0    3
Dodge Challenger    3.520 16.87  0  0    3
AMC Javelin         3.435 17.30  0  0    3
Camaro Z28          3.840 15.41  0  0    3
Pontiac Firebird    3.845 17.05  0  0    3
Fiat X1-9           1.935 18.90  1  1    4
Porsche 914-2       2.140 16.70  0  1    5
Lotus Europa        1.513 16.90  1  1    5
Ford Pantera L      3.170 14.50  0  1    5
Ferrari Dino        2.770 15.50  0  1    5
Maserati Bora       3.570 14.60  0  1    5
Volvo 142E          2.780 18.60  1  1    4
                    carb
Mazda RX4              4
Mazda RX4 Wag          4
Datsun 710             1
Hornet 4 Drive         1
Hornet Sportabout      2
Valiant                1
Duster 360             4
Merc 240D              2
Merc 230               2
Merc 280               4
Merc 280C              4
Merc 450SE             3
Merc 450SL             3
Merc 450SLC            3
Cadillac Fleetwood     4
Lincoln Continental    4
Chrysler Imperial      4
Fiat 128               1
Honda Civic            2
Toyota Corolla         1
Toyota Corona          1
Dodge Challenger       2
AMC Javelin            2
Camaro Z28             4
Pontiac Firebird       2
Fiat X1-9              1
Porsche 914-2          2
Lotus Europa           2
Ford Pantera L         4
Ferrari Dino           6
Maserati Bora          8
Volvo 142E             2

> mtcars$mpg      # after detach(), a column has to be accessed through its data frame
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4
 [9] 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3
[25] 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

1.6 List

Lists are the most complex of the R data types. Basically, a list is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.

> mylist = list(patientdata, swim, x)
> mylist
[[1]]
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor

[[2]]
   year  time sex
1  1905 65.80   M
2  1908 65.60   M
3  1910 62.80   M
4  1912 61.60   M
5  1918 61.40   M
6  1920 60.40   M
7  1922 58.60   M
8  1924 57.40   M
9  1934 56.80   M
10 1935 56.60   M
11 1936 56.40   M
12 1944 55.90   M
13 1947 55.80   M
14 1948 55.40   M
15 1955 54.80   M
16 1957 54.60   M
17 1961 53.60   M
18 1964 52.90   M
19 1967 52.60   M
20 1968 52.20   M
21 1970 51.90   M
22 1972 51.22   M
23 1975 50.59   M
24 1976 49.44   M
25 1981 49.36   M
26 1985 49.24   M
27 1986 48.74   M
28 1988 48.42   M
29 1994 48.21   M
30 2000 48.18   M
31 2000 47.84   M
32 1908 95.00   F
33 1910 86.60   F
34 1911 84.60   F
35 1912 78.80   F
36 1915 76.20   F
37 1920 73.60   F
38 1923 72.80   F
39 1924 72.20   F
40 1926 70.00   F
41 1929 69.40   F
42 1930 68.00   F
43 1931 66.60   F
44 1933 66.00   F
45 1934 65.40   F
46 1936 64.60   F
47 1956 62.00   F
48 1958 61.20   F
49 1960 60.20   F
50 1962 59.50   F
51 1964 58.90   F
52 1972 58.50   F
53 1973 57.54   F
54 1974 56.96   F
55 1976 55.65   F
56 1978 55.41   F
57 1980 54.79   F
58 1986 54.73   F
59 1992 54.48   F
60 1994 54.01   F
61 2000 53.77   F
62 2004 53.52   F

[[3]]
       cat dog bird pig
apple    1   2    3   4
banana   5   6    7   8
orange   9  10   11  12
melon   13  14   15  16
corn    17  18   19  20
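
Components of a list are extracted with double brackets or, when they are named, with $; single brackets return a shorter list instead. The sketch below illustrates the difference on a small list whose name (demo) and contents are made up for illustration, so that it runs without downloading the swim data.

```r
# A small list with named components (names chosen for illustration)
demo = list(title="Patients", ages=c(25, 34, 28, 52), m=matrix(1:4, nrow=2))

demo[[2]]          # the second component itself: a numeric vector
demo$ages          # the same component, selected by name
demo[2]            # a list of length 1 that contains the component
class(demo[[2]])   # "numeric"
class(demo[2])     # "list"
length(demo)       # 3
```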

2. Graphs

2.1 Graphical parameters

You can customize many features of a graph (fonts, colors, axes, titles) through options called graphical parameters. They can be set globally with the par() function or passed directly to a plotting function such as plot().

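> par(mfrow=c(2,2))      # arrange the next 4 plots in a 2-by-2 grid
> plot(rnorm(50),pch=17)    # pch=17 (filled triangle symbol): 50 random normal values with mean 0
> plot(rnorm(20),type="l",lty=5)    # type="l" draws lines, lty=5 sets the line type (long dash): 20 values
> plot(rnorm(100),cex=0.5)    # cex=0.5: 100 random normal values, symbols drawn at 0.5 times the default size
> plot(rnorm(200),lwd=2)    # lwd=2: 200 random values, line width twice the default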
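
Settings made with par() stay in effect for the rest of the session, so a common pattern is to save the current values first and restore them afterwards. The sketch below shows one way to do this; the seed and the 1-by-2 grid are arbitrary choices for illustration.

```r
opar = par(no.readonly=TRUE)      # save the current settings
par(mfrow=c(1, 2), pch=17)        # two plots side by side, triangle symbols
set.seed(1)                       # arbitrary seed so the random data are reproducible
plot(rnorm(50))
plot(rnorm(50), type="l", lty=5)
par(opar)                         # restore the saved settings
```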
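
2.2 Text, Axes, and Legends

The following functions add annotations to an existing plot:
title()     adds a main title, subtitle, and axis labels
axis()      adds a custom axis
legend()    adds a legend

> axis(1)      # draws an axis on side 1 (the bottom) of the current plot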
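
The three functions are typically combined by first drawing a plot without default axes and labels and then adding each piece. The following is a minimal sketch using the built-in mtcars data; the title, labels, and legend text are placeholders chosen for illustration.

```r
# Suppress the default axes and annotations, then add them one by one
plot(mtcars$wt, mtcars$mpg, pch=17, axes=FALSE, ann=FALSE)
axis(1)                                    # bottom axis
axis(2, las=1)                             # left axis with horizontal tick labels
box()                                      # frame around the plot region
title(main="Mileage vs. weight",
      xlab="Weight (1000 lbs)", ylab="Miles per gallon")
legend("topright", legend="mtcars cars", pch=17)
```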

2.3 Layout

The layout() function has the form layout(mat), where mat is a matrix object specifying the location of the multiple plots to combine.

> attach(mtcars)
> layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
# layout() takes a matrix, here with 2 rows and 2 columns:
# plot 1 fills both cells numbered 1 (the top row), plot 2 fills cell 2, and plot 3 fills cell 3
> hist(wt)
> hist(mpg)
> hist(disp)
> detach(mtcars)
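
The arrangement can be previewed before any data are plotted. The sketch below is one way to do that with layout.show(); the heights argument is an added assumption, used here only to show that rows need not be equally tall.

```r
n = layout(matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE),
           heights=c(1, 2))      # assumption: bottom row twice as tall as the top row
n                                # 3, the number of figures in the layout
layout.show(n)                   # outline the regions reserved for plots 1 to 3
```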
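
> mylist[2]      # single brackets return a list of length 1 holding the second component of mylist from section 1.6 (swim)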
[[1]]
   year  time sex
1  1905 65.80   M
2  1908 65.60   M
3  1910 62.80   M
4  1912 61.60   M
5  1918 61.40   M
6  1920 60.40   M
7  1922 58.60   M
8  1924 57.40   M
9  1934 56.80   M
10 1935 56.60   M
11 1936 56.40   M
12 1944 55.90   M
13 1947 55.80   M
14 1948 55.40   M
15 1955 54.80   M
16 1957 54.60   M
17 1961 53.60   M
18 1964 52.90   M
19 1967 52.60   M
20 1968 52.20   M
21 1970 51.90   M
22 1972 51.22   M
23 1975 50.59   M
24 1976 49.44   M
25 1981 49.36   M
26 1985 49.24   M
27 1986 48.74   M
28 1988 48.42   M
29 1994 48.21   M
30 2000 48.18   M
31 2000 47.84   M
32 1908 95.00   F
33 1910 86.60   F
34 1911 84.60   F
35 1912 78.80   F
36 1915 76.20   F
37 1920 73.60   F
38 1923 72.80   F
39 1924 72.20   F
40 1926 70.00   F
41 1929 69.40   F
42 1930 68.00   F
43 1931 66.60   F
44 1933 66.00   F
45 1934 65.40   F
46 1936 64.60   F
47 1956 62.00   F
48 1958 61.20   F
49 1960 60.20   F
50 1962 59.50   F
51 1964 58.90   F
52 1972 58.50   F
53 1973 57.54   F
54 1974 56.96   F
55 1976 55.65   F
56 1978 55.41   F
57 1980 54.79   F
58 1986 54.73   F
59 1992 54.48   F
60 1994 54.01   F
61 2000 53.77   F
62 2004 53.52   F

3. Next Topic

Operators, Control Flow & User-defined Function
