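Basic Operations in R

In this chapter we introduce some simple computations in R, including basic arithmetic, vector operations, and statistical calculations, to give readers a first impression of R's basic computing capabilities.

Simple numeric and string operations

Simple calculations in R use the arithmetic operators common to most programming languages.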

1+1

## [1] 2

1*3.4

## [1] 3.4

1/2

## [1] 0.5

1%/%2

## [1] 0

Remainder (modulo):

5 %% 2

## [1] 1

Trigonometric functions:

cos(1.0)

## [1] 0.5403023

Exponentiation:

2 ^ 0.5

## [1] 1.414214

sqrt(2)

## [1] 1.414214

Scientific notation:

x = 1.2e-5
x * 10000

## [1] 0.12

Comparison operations produce logical vectors:

x = c(1,2,3,4,5)
x>3

## [1] FALSE FALSE FALSE  TRUE  TRUE

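Solving equations

The uniroot() function finds a root of a one-variable function inside a given interval, so it can solve single-variable equations of any degree, one root at a time. A system of two linear equations in two unknowns is solved with solve() instead.

Linear equation in one variable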

f1<-function(x,a,b) a*x+b
a<-5;b<-12
result<-uniroot(f1,c(-10,10),a=a,b=b,tol = 0.0001)
result$root

## [1] -2.4

Quadratic equation in one variable (uniroot() returns only the root inside the search interval)

f2<-function(x,a,b,c) a*x^2+b*x+c
a<-1;b<-5;c<-6
result<-uniroot(f2,c(-4,-3),a=a,b=b,c=c,tol = 0.0001)
result$root

## [1] -3
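
The quadratic above has a second root that the interval c(-4,-3) does not cover. As a minimal sketch, assuming the same coefficients, the snippet below finds each root with its own interval and then cross-checks with polyroot() from base R. The names quad, root_left, root_right, and all_roots are made up for this illustration.

```r
# Same quadratic as above: x^2 + 5x + 6 (quad is a hypothetical helper name)
quad <- function(x) x^2 + 5 * x + 6

# uniroot() needs an interval whose endpoints give opposite signs
# (or a root exactly at an endpoint), so each root gets its own interval.
root_left  <- uniroot(quad, c(-4, -2.5), tol = 1e-8)$root   # near -3
root_right <- uniroot(quad, c(-2.5, -1), tol = 1e-8)$root   # near -2

# polyroot() takes coefficients in increasing order of power
# and returns all roots (as complex numbers) at once.
all_roots <- sort(Re(polyroot(c(6, 5, 1))))
all_roots
```

Splitting the search range this way only works when the approximate location of each root is already known; polyroot() avoids that requirement for polynomials.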

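System of two linear equations in two unknowns (solved with matrices). Here the system is 3x + 5y = 4 and x + 2y = 1.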

lf<-matrix(c(3,5,1,2),nrow = 2,byrow = TRUE)
rf<-matrix(c(4,1),nrow = 2)
result<-solve(lf,rf)
result

##      [,1]
## [1,]    3
## [2,]   -1

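Sequences: regular sets of numbers

To build regular sequences of numbers or vectors in R, use the following:

The : (sequence) operator, as in 1:9
seq(from, to, by = step): the sequence function; for example, seq(5) produces (1, 2, 3, 4, 5). Adding length.out = k (abbreviated to length in the examples below) produces k equally spaced values.
The rep() function, which repeats a value or a vector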

1:9

## [1] 1 2 3 4 5 6 7 8 9

x = 1:9
x

## [1] 1 2 3 4 5 6 7 8 9

1.5:10

## [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5

c(1.5:10,11)

##  [1]  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 11.0

prod(1:8)

## [1] 40320

seq(1,5)

## [1] 1 2 3 4 5

seq(1,5,by=0.5)

## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

seq(1,5,length = 7)

## [1] 1.000000 1.666667 2.333333 3.000000 3.666667 4.333333 5.000000

rep(10,5)

## [1] 10 10 10 10 10

rep(c("A","B","C","D"),2)

## [1] "A" "B" "C" "D" "A" "B" "C" "D"

rep(1:4,times = 3,each =2)

##  [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

rep(1:4,each =2,length = 12)

##  [1] 1 1 2 2 3 3 4 4 1 1 2 2

matrix(rep(0,16),nrow = 4)

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0

matrix(0,nrow = 4,ncol = 4)

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0

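Basic vector operations

The following functions can be applied to vectors:

length: number of elements in the vector
sum: sum of all elements
prod: product of all elements
cumsum, cumprod: cumulative sum and cumulative product
sort: sorts the elements of the vector
rank: returns the rank of each element in sorted order, as a vector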

x=c(1,2.0,3);x

## [1] 1 2 3

(x=c(1.0,2.3,3))

## [1] 1.0 2.3 3.0

x = c(1,2,3)
x + 1

## [1] 2 3 4

x - 1.2

## [1] -0.2  0.8  1.8

x * 2

## [1] 2 4 6

x * x

## [1] 1 4 9

y = c(4,5,6,7)
x * y

## Warning in x * y: longer object length is not a multiple of shorter object
## length

## [1]  4 10 18  7
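
The warning above comes from R's recycling rule: when two vectors differ in length, the shorter one is reused from its start until it covers the longer one. A small sketch with made-up vectors a and b, where the lengths divide evenly and no warning is raised:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 100)

# b is recycled to c(10, 100, 10, 100); no warning because 4 is a multiple of 2
a * b

# Spelling the recycling out explicitly gives the same result
a * rep(b, length.out = length(a))
```

When the longer length is not a multiple of the shorter one, as with x and y above, R still recycles but warns, which is often a sign of a length mismatch in the data.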

x = c(1,2,3,4)
y = c(5,6,7,8)
y / x

## [1] 5.000000 3.000000 2.333333 2.000000

y - x

## [1] 4 4 4 4

x ^ y

## [1]     1    64  2187 65536

cos(x*pi)+cos(y*pi)

## [1] -2  2 -2  2

s = c(1,2,3,4,5,6)
length(s)

## [1] 6

sum(s)

## [1] 21

prod(s)

## [1] 720

cumsum(s)

## [1]  1  3  6 10 15 21

x = c(1,2,3,4)
y = c(5,6,7)
z = c(x,y)
z

## [1] 1 2 3 4 5 6 7

Vector indexing

The i-th element of a vector x is written x[i]. Indexing past the end of the vector returns NA.

x = c(11,12,13)
x[2]

## [1] 12

x[4]

## [1] NA

x[c(1,3)]

## [1] 11 13

x[1:3]

## [1] 11 12 13

y = x[1:2]
y

## [1] 11 12

Basic statistical calculations

mean: sample mean (arithmetic average)
var: sample variance (divisor n - 1)
sd: sample standard deviation

x = c(11,12,13)
mean(x)

## [1] 12

max(x)

## [1] 13

min(x)

## [1] 11

var(x)

## [1] 1

sd(x)

## [1] 1

sum(x)

## [1] 36

Instead of sd(), the standard deviation can also be computed with a user-defined function:

my.sd <- function(y)
{
  n=length(y)
  s=sqrt((sum(y^2)-n*mean(y)^2)/(n-1))
  return(s)
}
my.sd(x)

## [1] 1
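
my.sd relies on the identity sum((y - mean(y))^2) = sum(y^2) - n * mean(y)^2. The sketch below checks that identity numerically on made-up data; the vector y and the names sd_definition and sd_shortcut are illustrative only.

```r
y <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data, not from the text
n <- length(y)

# Definition: squared deviations from the mean, divided by n - 1
sd_definition <- sqrt(sum((y - mean(y))^2) / (n - 1))

# Shortcut used by my.sd: sum of squares minus n times the squared mean
sd_shortcut <- sqrt((sum(y^2) - n * mean(y)^2) / (n - 1))

all.equal(sd_definition, sd_shortcut)
all.equal(sd_definition, sd(y))
```

The shortcut form subtracts two potentially large numbers, so for data with a large mean and a small spread it can lose precision; the deviation-based form is usually the safer choice.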

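Simulating height and weight data for 100 people (normally distributed). The draws are random and no seed is set, so the numbers below will differ from run to run.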

weight =rnorm(100,55,5)
height = rnorm(100,165,5)
plot(weight,height)

summary(lm(height~weight))

## 
## Call:
## lm(formula = height ~ weight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.4817  -3.9205   0.1501   3.8235  10.6129 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 164.84728    6.58191  25.046   <2e-16 ***
## weight       -0.01313    0.11738  -0.112    0.911    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.18 on 98 degrees of freedom
## Multiple R-squared:  0.0001277,  Adjusted R-squared:  -0.01008 
## F-statistic: 0.01252 on 1 and 98 DF,  p-value: 0.9111
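
To make a simulation like this repeatable, the random number generator can be seeded with set.seed(). A minimal sketch, assuming an arbitrary seed of 123 and illustrative variable names:

```r
set.seed(123)                      # arbitrary seed, chosen only for illustration
weight1 <- rnorm(100, 55, 5)
height1 <- rnorm(100, 165, 5)

set.seed(123)                      # same seed, so the same draws are repeated
weight2 <- rnorm(100, 55, 5)
height2 <- rnorm(100, 165, 5)

identical(weight1, weight2)        # the two runs match exactly

# Intercept and slope of the fitted line
fit <- lm(height1 ~ weight1)
coef(fit)
```

Because weight and height are generated independently here, the fitted slope is expected to be close to zero, which agrees with the non-significant weight coefficient in the summary above.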

Data objects

x <- seq(0,1,by = 0.2)
y <- seq(0,1,by = 0.2)
y[4]

## [1] 0.6

x[3]

## [1] 0.4

1 - x[3]

## [1] 0.6

y[4] > 1 - x[3]

## [1] TRUE
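
Both values print as 0.6, yet the comparison above returns TRUE. This is a floating-point effect: 0.2 has no exact binary representation, so the two results can differ in their last bits. A hedged sketch of how to inspect and compare such values (a and b are illustrative names):

```r
x <- seq(0, 1, by = 0.2)

a <- x[4]        # printed as 0.6
b <- 1 - x[3]    # also printed as 0.6

# Printing more digits can reveal a difference hidden by the default 7 digits
print(c(a, b), digits = 17)

a == b                     # exact comparison of the stored values
isTRUE(all.equal(a, b))    # comparison with a small numerical tolerance
```

For numeric results of arithmetic, all.equal() wrapped in isTRUE() is generally a safer test of equality than == or >.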

Vectors

Vector assignment

x <- c(1,3,5,7,9)
x

## [1] 1 3 5 7 9

v <- paste("x",1:5,sep="")
v

## [1] "x1" "x2" "x3" "x4" "x5"

Vector operations

x <- c(1,3,5,7,9)
y <- c(2,4,6,8,10)
x * y

## [1]  2 12 30 56 90

x %*% y

##      [,1]
## [1,]  190
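
The two operators above differ in kind: * multiplies element by element, while %*% on two equal-length vectors gives their inner product as a 1 x 1 matrix. A short sketch using the same x and y, with the outer product added for comparison:

```r
x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 6, 8, 10)

sum(x * y)          # inner product as a plain number: 190
drop(x %*% y)       # same value, with the 1 x 1 matrix structure dropped
x %o% y             # outer product: a 5 x 5 matrix with entries x[i] * y[j]
```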

Generating regular sequences

(t <- 1:10)

##  [1]  1  2  3  4  5  6  7  8  9 10

(r <- 5:1)

## [1] 5 4 3 2 1

2 * 1:5

## [1]  2  4  6  8 10

seq(1,10,2)

## [1] 1 3 5 7 9

seq(1,by = 2,length = 10)

##  [1]  1  3  5  7  9 11 13 15 17 19

Common vector functions

x <- c(1,3,5,7,9)
length(x)

## [1] 5

y <- c(2,6,3,7,5)
sort(y)

## [1] 2 3 5 6 7

rev(y)

## [1] 5 7 3 6 2

append(y,10:15,after = 3)

##  [1]  2  6  3 10 11 12 13 14 15  7  5

sum(x)

## [1] 25

max(y)

## [1] 7

Vector indexing

x <- c(1,3,5,7,9)
x[2]

## [1] 3

x[c(1,3)] <- c(9,11)
x

## [1]  9  3 11  7  9

x[x < 9]

## [1] 3 7

y <- 1:10
y[-(1:5)]

## [1]  6  7  8  9 10

Matrices

matrix(1:12,nrow = 4,ncol = 3)

##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

matrix(1:12,nrow = 4,ncol = 3,byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12

(A <- matrix(1:12,nrow = 3,ncol = 4))

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

t(A)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12

A * A

##      [,1] [,2] [,3] [,4]
## [1,]    1   16   49  100
## [2,]    4   25   64  121
## [3,]    9   36   81  144

A %*% t(A)

##      [,1] [,2] [,3]
## [1,]  166  188  210
## [2,]  188  214  240
## [3,]  210  240  270

diag(A)

## [1] 1 5 9

diag(diag(A))

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    5    0
## [3,]    0    0    9

diag(3)

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

(B <- matrix(rnorm(16),4,4))

##            [,1]       [,2]       [,3]       [,4]
## [1,] -1.9538051  0.5098616 -1.0881549  0.1604978
## [2,]  0.1839896 -0.3518385 -1.9846978 -0.8882620
## [3,] -0.9502093 -0.0159271  0.1742667 -1.3177211
## [4,] -0.6739234 -0.6401475 -0.8820410  0.8282945

solve(B)

##            [,1]        [,2]       [,3]        [,4]
## [1,] -0.2207098  0.21726896 -0.3533751 -0.28641333
## [2,]  0.6620299  0.04849994 -0.6097648 -1.04633505
## [3,] -0.1939894 -0.39835129  0.2810042  0.05744301
## [4,]  0.1254972 -0.20994032 -0.4595345  0.22677644
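
With one argument, solve() returns the matrix inverse; with two, it solves a linear system directly. The sketch below checks both uses on a small fixed matrix, so the result does not depend on random draws (M, M_inv, b, and x are illustrative names):

```r
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # small fixed matrix, chosen for illustration
M_inv <- solve(M)

# Multiplying a matrix by its inverse should give the identity,
# up to floating-point rounding
all.equal(M %*% M_inv, diag(2))

# solve(M, b) solves M %*% x = b directly, without forming the inverse
b <- c(3, 5)
x <- solve(M, b)
all.equal(drop(M %*% x), b)
```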

# Caution: B is not symmetric; symmetric = TRUE makes eigen() use only B's lower triangle
(B.eigen <- eigen(B,symmetric = TRUE))

## $values
## [1]  1.5897823  0.3723692 -0.6790538 -2.5861801
## 
## $vectors
##             [,1]       [,2]       [,3]        [,4]
## [1,] -0.04166695 -0.4274455  0.2105799 0.878185799
## [2,] -0.27336537 -0.3874562 -0.8803738 0.009544999
## [3,] -0.48567781  0.7610716 -0.1799085 0.390538136
## [4,]  0.82924803  0.2965435 -0.3850079 0.276004638
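
When symmetric = TRUE is passed, eigen() assumes a symmetric input and reads only its lower triangle. As a sketch of the symmetric case proper, the snippet below uses a small matrix that really is symmetric and rebuilds it from its eigendecomposition (S, e, and V are illustrative names):

```r
S <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric by construction
isSymmetric(S)

e <- eigen(S)          # symmetry is detected automatically when not specified
e$values               # 3 and 1 for this matrix

# Reconstruct S from its eigenvectors V and eigenvalues: S = V diag(values) V'
V <- e$vectors
all.equal(V %*% diag(e$values) %*% t(V), S)
```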

svd(B)

## $d
## [1] 2.8061853 1.9267986 1.7623006 0.6912582
## 
## $u
##            [,1]        [,2]        [,3]       [,4]
## [1,] -0.7253026  0.53084041  0.05008235 -0.4354726
## [2,] -0.5422299 -0.82459354 -0.11309539 -0.1150725
## [3,] -0.2291858  0.18823573 -0.80168579  0.5189808
## [4,] -0.3569268  0.05311558  0.58480859  0.7264853
## 
## $v
##            [,1]       [,2]       [,3]        [,4]
## [1,] 0.63276331 -0.7284282  0.1412889 -0.22145051
## [2,] 0.01892586  0.2718390 -0.1681148 -0.94735567
## [3,] 0.76270396  0.5422910 -0.2755316  0.21973967
## [4,] 0.13242007  0.3184593  0.9358724 -0.07205128
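
The components returned by svd() satisfy M = U diag(d) V'. A minimal sketch on a fixed, non-square matrix that checks this reconstruction (M, s, and M_rebuilt are illustrative names):

```r
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 x 3 illustrative matrix
s <- svd(M)

# M should equal U diag(d) V' up to rounding error
M_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(M, M_rebuilt)

# The squared singular values are the eigenvalues of M %*% t(M)
all.equal(s$d^2, eigen(M %*% t(M))$values)
```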

dim(A)

## [1] 3 4

nrow(B)

## [1] 4

det(B)

## [1] 6.586777

A[row(A) < col(A)] = 0
A

##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    2    5    0    0
## [3,]    3    6    9    0

apply(A,1,sum)

## [1]  1  7 18

apply(A,2,mean)

## [1] 2.000000 3.666667 3.000000 0.000000
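
In apply(), the second argument selects the dimension to work over: 1 for rows, 2 for columns. For sums and means, base R also provides dedicated helpers; the sketch below compares them on the original, unmodified A (row_totals and col_means are illustrative names):

```r
A <- matrix(1:12, nrow = 3, ncol = 4)

# apply() with MARGIN = 1 works over rows, MARGIN = 2 over columns
row_totals <- apply(A, 1, sum)
col_means  <- apply(A, 2, mean)

# Dedicated helpers give the same numbers
all.equal(row_totals, rowSums(A))
all.equal(col_means, colMeans(A))
```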

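Arrays

A matrix has exactly two dimensions, whereas an array can have any number. A one-dimensional array behaves like a vector, and a two-dimensional array is a matrix.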

(xx <- array(1:24,c(3,4,2)))

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24

xx[2,3,2]

## [1] 20

xx[2,1:3,2]

## [1] 14 17 20

dim(xx)

## [1] 3 4 2

Operations on arrays work much like those on matrices.
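
As a brief sketch of array operations, using the same 3 x 4 x 2 array xx: arithmetic is element-wise, and apply() accepts any dimension or set of dimensions as its margin.

```r
xx <- array(1:24, c(3, 4, 2))

xx[, , 1] + xx[, , 2]        # element-wise sum of the two 3 x 4 layers
apply(xx, 3, sum)            # total of each layer: 78 and 222
dim(apply(xx, c(1, 2), sum)) # summing over the third dimension leaves 3 x 4
```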

Factors

y <- c("male","female","female","male","female","male","male")
(f <- factor(y))

## [1] male   female female male   female male   male  
## Levels: female male

levels(f)

## [1] "female" "male"
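
A factor stores each value as an integer code that points into its levels, which are sorted alphabetically by default. A short sketch on the same data showing the counts and the underlying codes:

```r
y <- c("male", "female", "female", "male", "female", "male", "male")
f <- factor(y)

table(f)          # counts per level: female 3, male 4
as.integer(f)     # internal codes: 1 = "female", 2 = "male"
nlevels(f)        # number of distinct levels
```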

Lists

When a data object needs to hold several different data types, use a list.

x <- c(1,1,2,2,3,3,3)
y <- c("male","female","female","male","female","male","male")
z <- c(80,85,92,76,61,95,83)
(stu <- list(class = x, sex = y, score = z))

## $class
## [1] 1 1 2 2 3 3 3
## 
## $sex
## [1] "male"   "female" "female" "male"   "female" "male"   "male"  
## 
## $score
## [1] 80 85 92 76 61 95 83

stu[[3]]

## [1] 80 85 92 76 61 95 83

stu$sex

## [1] "male"   "female" "female" "male"   "female" "male"   "male"
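
Double brackets and $ extract a component itself, whereas single brackets return a shorter list. A minimal sketch of the difference, using a shortened version of the data for brevity:

```r
stu <- list(class = c(1, 1, 2), sex = c("male", "female", "female"),
            score = c(80, 85, 92))   # shortened illustrative data

class(stu["score"])     # "list": single brackets keep the list wrapper
class(stu[["score"]])   # "numeric": double brackets extract the element

# sapply() applies a function to every component of the list
sapply(stu, length)
```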

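Data frames

A data frame is matrix-like, rectangular data in which each column may hold a different type. It can be seen as a generalization of the matrix and resembles a table in a relational database.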

(student <- data.frame(class = x, sex = y, score = z))

##   class    sex score
## 1     1   male    80
## 2     1 female    85
## 3     2 female    92
## 4     2   male    76
## 5     3 female    61
## 6     3   male    95
## 7     3   male    83

row.names(student) <- c("zhao","qian","sun","li","zhou","wu","zhen")
student

##      class    sex score
## zhao     1   male    80
## qian     1 female    85
## sun      2 female    92
## li       2   male    76
## zhou     3 female    61
## wu       3   male    95
## zhen     3   male    83

student[,"score"]

## [1] 80 85 92 76 61 95 83

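# Output below is from R < 4.0.0; since R 4.0.0 data.frame() keeps strings as character unless stringsAsFactors = TRUE
student[,2]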

## [1] male   female female male   female male   male  
## Levels: female male

student$score

## [1] 80 85 92 76 61 95 83

student[["class"]]

## [1] 1 1 2 2 3 3 3

student[[3]]

## [1] 80 85 92 76 61 95 83
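
Beyond extracting whole columns, a data frame can be filtered by a logical condition on its rows and summarized by group. A sketch on the same data, rebuilt here so the snippet runs on its own (high is an illustrative name):

```r
student <- data.frame(
  class = c(1, 1, 2, 2, 3, 3, 3),
  sex   = c("male", "female", "female", "male", "female", "male", "male"),
  score = c(80, 85, 92, 76, 61, 95, 83)
)

# Rows whose score exceeds 80, keeping all columns
high <- student[student$score > 80, ]
nrow(high)

# Mean score within each sex
tapply(student$score, student$sex, mean)
```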

Attaching a data frame with the attach() function

#score
#Error: object 'score' not found
attach(student)
score

## [1] 80 85 92 76 61 95 83

detach()
#score
#Error: object 'score' not found
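
An alternative that avoids changing the search path is with(), which evaluates an expression with the data frame's columns temporarily visible. A minimal sketch, using a shortened version of the data:

```r
student <- data.frame(
  sex   = c("male", "female", "female", "male"),
  score = c(80, 85, 92, 76)
)   # shortened illustrative data

# Column names are visible only inside the with() call
avg <- with(student, mean(score))
avg
```

Since nothing is attached, there is no matching detach() to remember, and columns cannot silently mask other objects of the same name.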