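In this chapter we introduce some simple computations in R, including basic arithmetic, vector operations, and statistical computations, to give readers a first impression of R's basic computing capabilities.
Simple numeric and string operations
Simple computations in R use the operator symbols common to most programming languages.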
1+1
## [1] 2
1*3.4
## [1] 3.4
1/2
## [1] 0.5
1%/%2
## [1] 0
Remainder (modulo):
5 %% 2
## [1] 1
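The integer-division operator %/% and the remainder operator %% are linked by the identity a = (a %/% b) * b + a %% b. The sketch below checks it, including for a negative operand, where R rounds the quotient toward negative infinity. The names a, b, q and r are illustrative only.

```r
a <- c(5, -5, 7.5)
b <- 2

q <- a %/% b   # integer quotient: floor(a / b)
r <- a %% b    # remainder; takes the sign of b

q              # 2 -3 3
r              # 1 1 1.5
q * b + r      # reproduces a: 5 -5 7.5
```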
Trigonometric functions:
cos(1.0)
## [1] 0.5403023
Powers and roots:
2 ^ 0.5
## [1] 1.414214
sqrt(2)
## [1] 1.414214
Scientific notation:
x = 1.2e-5
x * 10000
## [1] 0.12
Comparison operators applied to a vector produce a logical vector:
x = c(1,2,3,4,5)
x>3
## [1] FALSE FALSE FALSE TRUE TRUE
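A logical vector is useful beyond printing. As a minimal sketch (the name big is hypothetical), it can be counted, turned into positions, or used directly as an index:

```r
x <- c(1, 2, 3, 4, 5)
big <- x > 3     # logical vector: FALSE FALSE FALSE TRUE TRUE

sum(big)         # TRUE counts as 1, so this is the number of matches: 2
which(big)       # positions of the TRUE values: 4 5
x[big]           # the matching elements themselves: 4 5
```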
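Solving equations
The uniroot() function finds a root of an equation in one unknown within a given search interval. A system of two linear equations in two unknowns can be solved with solve().
Linear equation in one unknown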
f1<-function(x,a,b) a*x+b
a<-5;b<-12
result<-uniroot(f1,c(-10,10),a=a,b=b,tol = 0.0001)
result$root
## [1] -2.4
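Quadratic equation in one unknown (uniroot() returns only the root inside the search interval, here [-4, -3])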
f2<-function(x,a,b,c) a*x^2+b*x+c
a<-1;b<-5;c<-6
result<-uniroot(f2,c(-4,-3),a=a,b=b,c=c,tol = 0.0001)
result$root
## [1] -3
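The quadratic x^2 + 5x + 6 has a second root at -2, which lies outside the interval passed to uniroot(). One way to obtain all roots of a polynomial at once is base R's polyroot(), which takes the coefficients in increasing order of power. This is a sketch of an alternative, not the approach used above; the name coefs is illustrative.

```r
# Coefficients of 6 + 5*x + 1*x^2, lowest power first
coefs <- c(6, 5, 1)

roots <- polyroot(coefs)   # complex vector of all roots
roots

# The roots are real here, so keep only the real parts
sort(Re(roots))
```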
System of two linear equations in two unknowns (solved with matrices)
lf<-matrix(c(3,5,1,2),nrow = 2,byrow = TRUE)
rf<-matrix(c(4,1),nrow = 2)
result<-solve(lf,rf)
result
## [,1]
## [1,] 3
## [2,] -1
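The system solved here is 3x + 5y = 4 and x + 2y = 1. A quick way to check a solution from solve() is to multiply the coefficient matrix by it and compare with the right-hand side. A minimal sketch, with sol as an illustrative name:

```r
lf <- matrix(c(3, 5, 1, 2), nrow = 2, byrow = TRUE)  # coefficient matrix
rf <- matrix(c(4, 1), nrow = 2)                      # right-hand side

sol <- solve(lf, rf)       # x = 3, y = -1
lf %*% sol                 # should reproduce rf
all.equal(lf %*% sol, rf)  # comparison with a numerical tolerance
```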
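Sequences: regularly patterned sets of numbers
To build regularly patterned numbers or vectors in R, use the following:
- the sequence operator (:)
- seq(from, to, by = increment): the sequence function. For example, seq(5) produces (1, 2, 3, 4, 5). Adding length = k produces k equally spaced values instead.
- the rep() function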
1:9
## [1] 1 2 3 4 5 6 7 8 9
x = 1:9
x
## [1] 1 2 3 4 5 6 7 8 9
1.5:10
## [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
c(1.5:10,11)
## [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 11.0
prod(1:8)
## [1] 40320
seq(1,5)
## [1] 1 2 3 4 5
seq(1,5,by=0.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
seq(1,5,length = 7)
## [1] 1.000000 1.666667 2.333333 3.000000 3.666667 4.333333 5.000000
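The argument written as length above is matched to seq()'s full argument name, length.out. The sketch below spells it out and also shows seq_len() and seq_along(), two base R helpers that behave more safely than 1:n when n may be zero. The name v is illustrative.

```r
seq(1, 5, length.out = 7)   # same result as seq(1, 5, length = 7)

seq_len(5)                  # 1 2 3 4 5
v <- c("a", "b", "c")
seq_along(v)                # 1 2 3, one index per element of v

seq_len(0)                  # integer(0): an empty sequence
1:0                         # 1 0: counts down, often not what a loop wants
```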
rep(10,5)
## [1] 10 10 10 10 10
rep(c("A","B","C","D"),2)
## [1] "A" "B" "C" "D" "A" "B" "C" "D"
rep(1:4,times = 3,each =2)
## [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
rep(1:4,each =2,length = 12)
## [1] 1 1 2 2 3 3 4 4 1 1 2 2
matrix(rep(0,16),nrow = 4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
matrix(0,nrow = 4,ncol = 4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
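Basic vector operations
The following functions can be applied to vector variables:
- length: number of elements in the vector
- sum: sum of all elements
- prod: product of all elements
- cumsum, cumprod: cumulative sum and cumulative product
- sort: sorts the elements of the vector
- rank: returns the rank of each element within the sorted order, as a vector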
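A short sketch of the ordering and cumulative functions from the list above, using a small made-up vector v. It also shows order(), a related base R function that gives the permutation that sorts the vector.

```r
v <- c(30, 10, 20, 40)

sort(v)                     # 10 20 30 40
sort(v, decreasing = TRUE)  # 40 30 20 10
rank(v)                     # 3 1 2 4: 30 is the 3rd smallest, 10 the smallest, ...
order(v)                    # 2 3 1 4: v[order(v)] equals sort(v)

cumsum(v)                   # 30 40 60 100
cumprod(1:5)                # 1 2 6 24 120
```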
x=c(1,2.0,3);x
## [1] 1 2 3
(x=c(1.0,2.3,3))
## [1] 1.0 2.3 3.0
x = c(1,2,3)
x + 1
## [1] 2 3 4
x - 1.2
## [1] -0.2 0.8 1.8
x * 2
## [1] 2 4 6
x * x
## [1] 1 4 9
y = c(4,5,6,7)
x * y
## Warning in x * y: longer object length is not a multiple of shorter object
## length
## [1] 4 10 18 7
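The warning above comes from R's recycling rule: when two vectors differ in length, the shorter one is repeated until it matches the longer one, and R warns if the longer length is not a multiple of the shorter. The sketch below makes the recycling explicit; x_recycled is an illustrative name.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6, 7)

# What R does implicitly: stretch x to the length of y
x_recycled <- rep(x, length.out = length(y))
x_recycled        # 1 2 3 1

x_recycled * y    # 4 10 18 7, the same values as x * y, without the warning
```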
x = c(1,2,3,4)
y = c(5,6,7,8)
y / x
## [1] 5.000000 3.000000 2.333333 2.000000
y - x
## [1] 4 4 4 4
x ^ y
## [1] 1 64 2187 65536
cos(x*pi)+cos(y*pi)
## [1] -2 2 -2 2
s = c(1,2,3,4,5,6)
length(s)
## [1] 6
sum(s)
## [1] 21
prod(s)
## [1] 720
cumsum(s)
## [1] 1 3 6 10 15 21
x = c(1,2,3,4)
y = c(5,6,7)
z = c(x,y)
z
## [1] 1 2 3 4 5 6 7
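Vector indexing
The i-th element of a vector x is written x[i]. Indices start at 1, and an index past the end of the vector returns NA.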
x = c(11,12,13)
x[2]
## [1] 12
x[4]
## [1] NA
x[c(1,3)]
## [1] 11 13
x[1:3]
## [1] 11 12 13
y = x[1:2]
y
## [1] 11 12
Basic statistical computations
- mean: sample mean (arithmetic average)
- var: sample variance
- sd: sample standard deviation
x = c(11,12,13)
mean(x)
## [1] 12
max(x)
## [1] 13
min(x)
## [1] 11
var(x)
## [1] 1
sd(x)
## [1] 1
sum(x)
## [1] 36
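The standard deviation can also be computed with a user-defined function instead of sd:
# Sample standard deviation via the shortcut formula
# s = sqrt((sum(y^2) - n * mean(y)^2) / (n - 1))
my.sd <- function(y) {
  n <- length(y)
  s <- sqrt((sum(y^2) - n * mean(y)^2) / (n - 1))
  return(s)
}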
my.sd(x)
## [1] 1
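The shortcut formula used in my.sd is algebraically equal to the definition based on squared deviations from the mean. As a sketch, the two hypothetical helpers below (sd_shortcut and sd_two_pass, not part of the original text) implement both forms and compare them with the built-in sd() on simulated data. The two-pass form is generally considered less prone to rounding error when the mean is large relative to the spread.

```r
# Shortcut form: sqrt((sum(y^2) - n * mean(y)^2) / (n - 1))
sd_shortcut <- function(y) {
  n <- length(y)
  sqrt((sum(y^2) - n * mean(y)^2) / (n - 1))
}

# Definition form: sqrt(sum((y - mean(y))^2) / (n - 1))
sd_two_pass <- function(y) {
  sqrt(sum((y - mean(y))^2) / (length(y) - 1))
}

set.seed(1)                        # fixed seed so the check is repeatable
v <- rnorm(50, mean = 10, sd = 2)  # simulated data, for illustration only

all.equal(sd_shortcut(v), sd(v))
all.equal(sd_two_pass(v), sd(v))
```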
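Simulating height and weight data for 100 people (normal distribution). Because no random seed is set, the numbers below change from run to run.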
weight =rnorm(100,55,5)
height = rnorm(100,165,5)
plot(weight,height)

summary(lm(height~weight))
##
## Call:
## lm(formula = height ~ weight)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.4817  -3.9205   0.1501   3.8235  10.6129
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 164.84728    6.58191  25.046   <2e-16 ***
## weight       -0.01313    0.11738  -0.112    0.911
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.18 on 98 degrees of freedom
## Multiple R-squared: 0.0001277, Adjusted R-squared: -0.01008
## F-statistic: 0.01252 on 1 and 98 DF, p-value: 0.9111
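Since weight and height are drawn independently, the near-zero slope and R-squared are what one would expect. The sketch below repeats the simulation with a fixed seed so that it is repeatable, and pulls individual quantities out of the fitted model. The seed value 123 and the name fit are arbitrary choices for illustration.

```r
set.seed(123)                      # arbitrary seed, only for repeatability
weight <- rnorm(100, 55, 5)
height <- rnorm(100, 165, 5)

fit <- lm(height ~ weight)

coef(fit)                          # intercept and slope
cor(weight, height)                # sample correlation, expected to be near 0
summary(fit)$r.squared             # equals cor(weight, height)^2 in simple regression
```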
Data objects
x <- seq(0,1,by = 0.2)
y <- seq(0,1,by = 0.2)
y[4]
## [1] 0.6
x[3]
## [1] 0.4
1 - x[3]
## [1] 0.6
y[4] > 1 - x[3]
## [1] TRUE
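Both sides of the last comparison print as 0.6, yet the comparison returns TRUE. The likely explanation is floating-point rounding: 0.2 has no exact binary representation, so the two values can differ in their last digits. A minimal sketch of how to inspect this and compare with a tolerance instead:

```r
x <- seq(0, 1, by = 0.2)
y <- seq(0, 1, by = 0.2)

# Print more digits than the default to reveal any tiny difference
print(y[4], digits = 17)
print(1 - x[3], digits = 17)

# all.equal() compares numbers up to a small tolerance
isTRUE(all.equal(y[4], 1 - x[3]))
```

For numeric results of arithmetic, comparing with all.equal() is usually safer than ==, < or >.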
Vectors
- Vector assignment
x <- c(1,3,5,7,9)
x
## [1] 1 3 5 7 9
v <- paste("x",1:5,sep="")
v
## [1] "x1" "x2" "x3" "x4" "x5"
- Vector arithmetic (* is element-wise; %*% is the inner product)
x <- c(1,3,5,7,9)
y <- c(2,4,6,8,10)
x * y
## [1] 2 12 30 56 90
x %*% y
## [,1]
## [1,] 190
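For two vectors of equal length, x %*% y returns the inner product as a 1 x 1 matrix. The sketch below shows the same number as a plain scalar, and contrasts it with the outer product %o%; outer_xy is an illustrative name.

```r
x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 6, 8, 10)

sum(x * y)        # inner product as a plain number: 190
drop(x %*% y)     # same value, with the matrix dimensions dropped

outer_xy <- x %o% y   # outer product: entry [i, j] is x[i] * y[j]
dim(outer_xy)         # 5 5
outer_xy[2, 3]        # 3 * 6 = 18
```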
- Generating regular sequences
(t <- 1:10)
## [1] 1 2 3 4 5 6 7 8 9 10
(r <- 5:1)
## [1] 5 4 3 2 1
2 * 1:5
## [1] 2 4 6 8 10
seq(1,10,2)
## [1] 1 3 5 7 9
seq(1,by = 2,length = 10)
## [1] 1 3 5 7 9 11 13 15 17 19
- Common vector functions
x <- c(1,3,5,7,9)
length(x)
## [1] 5
y <- c(2,6,3,7,5)
sort(y)
## [1] 2 3 5 6 7
rev(y)
## [1] 5 7 3 6 2
append(y,10:15,after = 3)
## [1] 2 6 3 10 11 12 13 14 15 7 5
sum(x)
## [1] 25
max(y)
## [1] 7
- Vector indexing
x <- c(1,3,5,7,9)
x[2]
## [1] 3
x[c(1,3)] <- c(9,11)
x
## [1] 9 3 11 7 9
x[x < 9]
## [1] 3 7
y <- 1:10
y[-(1:5)]
## [1] 6 7 8 9 10
Matrices
matrix(1:12,nrow = 4,ncol = 3)
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
matrix(1:12,nrow = 4,ncol = 3,byrow = TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
(A <- matrix(1:12,nrow = 3,ncol = 4))
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
t(A)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
A * A
## [,1] [,2] [,3] [,4]
## [1,] 1 16 49 100
## [2,] 4 25 64 121
## [3,] 9 36 81 144
A %*% t(A)
## [,1] [,2] [,3]
## [1,] 166 188 210
## [2,] 188 214 240
## [3,] 210 240 270
diag(A)
## [1] 1 5 9
diag(diag(A))
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 5 0
## [3,] 0 0 9
diag(3)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
(B <- matrix(rnorm(16),4,4))
## [,1] [,2] [,3] [,4]
## [1,] -1.9538051 0.5098616 -1.0881549 0.1604978
## [2,] 0.1839896 -0.3518385 -1.9846978 -0.8882620
## [3,] -0.9502093 -0.0159271 0.1742667 -1.3177211
## [4,] -0.6739234 -0.6401475 -0.8820410 0.8282945
solve(B)
## [,1] [,2] [,3] [,4]
## [1,] -0.2207098 0.21726896 -0.3533751 -0.28641333
## [2,] 0.6620299 0.04849994 -0.6097648 -1.04633505
## [3,] -0.1939894 -0.39835129 0.2810042 0.05744301
## [4,] 0.1254972 -0.20994032 -0.4595345 0.22677644
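Note: B is not symmetric. With symmetric = TRUE, eigen() uses only the lower triangle of B, so the values below are not the eigenvalues of B itself; call eigen(B) for a general matrix.
(B.eigen <- eigen(B,symmetric = TRUE))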
## $values
## [1] 1.5897823 0.3723692 -0.6790538 -2.5861801
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] -0.04166695 -0.4274455 0.2105799 0.878185799
## [2,] -0.27336537 -0.3874562 -0.8803738 0.009544999
## [3,] -0.48567781 0.7610716 -0.1799085 0.390538136
## [4,] 0.82924803 0.2965435 -0.3850079 0.276004638
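The symmetric = TRUE option is meant for matrices that really are symmetric. As a sketch under that assumption, the code below builds a symmetric matrix S from a random B via crossprod() and confirms that the eigenvectors and eigenvalues rebuild S. The seed and the names S, e and rebuilt are illustrative.

```r
set.seed(42)                       # arbitrary seed for repeatability
B <- matrix(rnorm(16), 4, 4)

S <- crossprod(B)                  # t(B) %*% B, symmetric by construction
isSymmetric(S)

e <- eigen(S, symmetric = TRUE)

# Spectral decomposition: S = V diag(lambda) V'
rebuilt <- e$vectors %*% diag(e$values) %*% t(e$vectors)
all.equal(rebuilt, S)
```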
svd(B)
## $d
## [1] 2.8061853 1.9267986 1.7623006 0.6912582
##
## $u
## [,1] [,2] [,3] [,4]
## [1,] -0.7253026 0.53084041 0.05008235 -0.4354726
## [2,] -0.5422299 -0.82459354 -0.11309539 -0.1150725
## [3,] -0.2291858 0.18823573 -0.80168579 0.5189808
## [4,] -0.3569268 0.05311558 0.58480859 0.7264853
##
## $v
## [,1] [,2] [,3] [,4]
## [1,] 0.63276331 -0.7284282 0.1412889 -0.22145051
## [2,] 0.01892586 0.2718390 -0.1681148 -0.94735567
## [3,] 0.76270396 0.5422910 -0.2755316 0.21973967
## [4,] 0.13242007 0.3184593 0.9358724 -0.07205128
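The three components returned by svd() satisfy B = U diag(d) V'. The sketch below checks this on a freshly simulated matrix (the original B is random and cannot be reproduced), and also checks that the product of the singular values equals the absolute value of the determinant. The names s and rebuilt are illustrative.

```r
set.seed(7)                        # arbitrary seed for repeatability
B <- matrix(rnorm(16), 4, 4)

s <- svd(B)

# Rebuild B from its singular value decomposition
rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(rebuilt, B)

# For a square matrix, |det(B)| is the product of the singular values
all.equal(prod(s$d), abs(det(B)))

# solve(B) is the inverse, so solve(B) %*% B is the identity up to rounding
all.equal(solve(B) %*% B, diag(4))
```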
dim(A)
## [1] 3 4
nrow(B)
## [1] 4
det(B)
## [1] 6.586777
A[row(A) < col(A)] = 0
A
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 2 5 0 0
## [3,] 3 6 9 0
apply(A,1,sum)
## [1] 1 7 18
apply(A,2,mean)
## [1] 2.000000 3.666667 3.000000 0.000000
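In apply(), the second argument selects the margin: 1 applies the function to each row, 2 to each column. For sums and means, base R also provides the dedicated functions rowSums() and colMeans(), and upper.tri() expresses the row(A) < col(A) mask directly. A minimal sketch:

```r
A <- matrix(1:12, nrow = 3, ncol = 4)
A[upper.tri(A)] <- 0       # same mask as row(A) < col(A)
A

rowSums(A)                 # same as apply(A, 1, sum): 1 7 18
colMeans(A)                # same as apply(A, 2, mean)
```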
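Arrays
A matrix is always two-dimensional, whereas an array can have any number of dimensions. A two-dimensional array is a matrix, and a one-dimensional array behaves much like a vector.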
(xx <- array(1:24,c(3,4,2)))
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 13 16 19 22
## [2,] 14 17 20 23
## [3,] 15 18 21 24
xx[2,3,2]
## [1] 20
xx[2,1:3,2]
## [1] 14 17 20
dim(xx)
## [1] 3 4 2
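Leaving an index empty selects everything along that dimension, and apply() accepts any dimension, or set of dimensions, as its margin. A short sketch on the same array:

```r
xx <- array(1:24, c(3, 4, 2))

xx[, , 1]                 # the first 3 x 4 slice, returned as a matrix
apply(xx, 3, sum)         # one sum per slice: 78 222
apply(xx, c(1, 2), sum)   # 3 x 4 matrix: the two slices added element-wise
```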
Array arithmetic works much like matrix arithmetic.
Factors
y <- c("male","female","female","male","female","male","male")
(f <- factor(y))
## [1] male female female male female male male
## Levels: female male
levels(f)
## [1] "female" "male"
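A factor stores each value as an integer code pointing into its levels, which are sorted alphabetically by default. The sketch below shows the codes, a frequency table, and how to set the level order explicitly; f2 is an illustrative name.

```r
y <- c("male", "female", "female", "male", "female", "male", "male")
f <- factor(y)

as.integer(f)     # codes into levels(f): 2 1 1 2 1 2 2
table(f)          # counts per level: female 3, male 4

# Choose the level order instead of the alphabetical default
f2 <- factor(y, levels = c("male", "female"))
levels(f2)        # "male" "female"
```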
Lists
When a data object needs to hold components of different data types, use a list.
x <- c(1,1,2,2,3,3,3)
y <- c("male","female","female","male","female","male","male")
z <- c(80,85,92,76,61,95,83)
(stu <- list(class = x, sex = y, score = z))
## $class
## [1] 1 1 2 2 3 3 3
##
## $sex
## [1] "male" "female" "female" "male" "female" "male" "male"
##
## $score
## [1] 80 85 92 76 61 95 83
stu[[3]]
## [1] 80 85 92 76 61 95 83
stu$sex
## [1] "male" "female" "female" "male" "female" "male" "male"
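Double brackets and $ extract a component itself, while single brackets return a shorter list that still wraps the component. A minimal sketch with a shortened version of the list above:

```r
stu <- list(class = c(1, 1, 2),
            sex   = c("male", "female", "female"),
            score = c(80, 85, 92))

names(stu)              # "class" "sex" "score"
class(stu[["score"]])   # "numeric": the component itself
class(stu["score"])     # "list": a one-element list
stu$score[2]            # 85
```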
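Data frames
A data frame is matrix-like data whose columns may hold different data types. It can be seen as a generalization of the matrix and resembles a table in a relational database.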
(student <- data.frame(class = x, sex = y, score = z))
##   class    sex score
## 1     1   male    80
## 2     1 female    85
## 3     2 female    92
## 4     2   male    76
## 5     3 female    61
## 6     3   male    95
## 7     3   male    83
row.names(student) <- c("zhao","qian","sun","li","zhou","wu","zhen")
student
##      class    sex score
## zhao     1   male    80
## qian     1 female    85
## sun      2 female    92
## li       2   male    76
## zhou     3 female    61
## wu       3   male    95
## zhen     3   male    83
student[,"score"]
## [1] 80 85 92 76 61 95 83
student[,2]
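## [1] male female female male female male male
## Levels: female male
Note: the factor output above comes from an R version before 4.0.0, where data.frame() converted strings to factors by default. From R 4.0.0 on, student[,2] is a character vector unless stringsAsFactors = TRUE is passed to data.frame().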
student$score
## [1] 80 85 92 76 61 95 83
student[["class"]]
## [1] 1 1 2 2 3 3 3
student[[3]]
## [1] 80 85 92 76 61 95 83
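Rows of a data frame can be selected with a logical condition in the row position, and a column can be summarized by group with tapply(). A minimal sketch using the same data; high and avg are illustrative names.

```r
student <- data.frame(
  class = c(1, 1, 2, 2, 3, 3, 3),
  sex   = c("male", "female", "female", "male", "female", "male", "male"),
  score = c(80, 85, 92, 76, 61, 95, 83)
)

# Rows where the score is above 80 (the empty column index keeps all columns)
high <- student[student$score > 80, ]
high

# Mean score per sex
avg <- tapply(student$score, student$sex, mean)
avg
```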
Attaching a data frame with attach()
#score
#Error: object 'score' not found
attach(student)
score
## [1] 80 85 92 76 61 95 83
detach()
#score
#Error: object 'score' not found
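attach() changes the search path for the whole session, which can lead to confusion when a column name matches an existing variable. As a sketch of a common alternative, with() evaluates a single expression inside the data frame and leaves the search path untouched:

```r
student <- data.frame(
  class = c(1, 1, 2, 2, 3, 3, 3),
  sex   = c("male", "female", "female", "male", "female", "male", "male"),
  score = c(80, 85, 92, 76, 61, 95, 83)
)

# Columns are visible by name only inside the with() call
with(student, mean(score))
with(student, max(score) - min(score))   # 95 - 61 = 34
```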