1 Probability functions
Correspondence between distributions and R functions:
[Screenshots: table of distributions and their corresponding R function names]
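A letter prefixed to the distribution's function name selects what is computed:
- d: probability density function
- p: cumulative distribution function
- q: quantile function (the inverse of the distribution function)
- r: random numbers drawn from the distribution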
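As a brief illustration of the four prefixes, the sketch below applies them to the normal distribution (`norm`). The choice of distribution, the seed, and the variable names are assumptions made for this example and do not come from the notes.

```r
# The four prefixes applied to the standard normal distribution
density_at_0 <- dnorm(0)        # d: density at x = 0
prob_below   <- pnorm(1.96)     # p: P(X <= 1.96)
quantile_975 <- qnorm(0.975)    # q: the x for which P(X <= x) = 0.975

set.seed(42)                    # fix the seed so the draws are reproducible
draws <- rnorm(5, mean = 0, sd = 1)  # r: five random draws

print(c(density_at_0, prob_below, quantile_975))
print(draws)
```

Because `q` inverts `p`, `pnorm(qnorm(0.975))` gives back 0.975. The same four prefixes attach to other distribution names, for example `dbinom`, `pbinom`, `qbinom`, and `rbinom` for the binomial distribution.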
Descriptive statistics functions
The summary function
> summary(myvars)
mpg hp wt am
Min. : 1.00 Min. : 52.0 Min. :1.513 Min. :0.0000
1st Qu.: 8.75 1st Qu.: 96.5 1st Qu.:2.581 1st Qu.:0.0000
Median :16.50 Median :123.0 Median :3.325 Median :0.0000
Mean :16.50 Mean :146.7 Mean :3.217 Mean :0.4062
3rd Qu.:24.25 3rd Qu.:180.0 3rd Qu.:3.610 3rd Qu.:1.0000
Max. :32.00 Max. :335.0 Max. :5.424 Max. :1.0000
# summary() reports a set of statistics for each column: the minimum and maximum, the quartiles, the median, and the mean.
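The definition of `myvars` is not shown in these notes. A minimal sketch, assuming it is a subset of the built-in `mtcars` data holding the four columns that appear in the output (the original session evidently held different `mpg` values, so the numbers printed here will not match exactly):

```r
# Assumption: myvars holds four columns of the built-in mtcars data set
myvars <- mtcars[c("mpg", "hp", "wt", "am")]

# summary() on a data frame summarizes each column separately
print(summary(myvars))

# On a single numeric vector it returns six named statistics
s <- summary(mtcars$hp)
print(names(s))
```

For a factor column, `summary` reports the count of each level instead of quantiles.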
The aggregate function (grouped analysis)
> aggregate(iris[c("Sepal.Length","Sepal.Width")],by=list(iris$Species),max)
Group.1 Sepal.Length Sepal.Width
1 setosa 5.8 4.4
2 versicolor 7.0 3.4
3 virginica 7.9 3.8
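# The first argument is the data to summarize, the second (by) is the list of grouping variables, and the third is the function to apply to each group.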
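The same grouped maximum can also be written with a named `by` list, which replaces the default `Group.1` header, or with the formula interface of `aggregate`. This is a sketch of equivalent calls on the built-in `iris` data; the result names `by_list` and `by_formula` are illustrative.

```r
# A named list in `by` gives the grouping column a readable name
by_list <- aggregate(iris[c("Sepal.Length", "Sepal.Width")],
                     by = list(Species = iris$Species),
                     FUN = max)

# Formula interface: variables to summarize ~ grouping variable
by_formula <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                        data = iris, FUN = max)

print(by_list)
print(by_formula)
```

Any function that reduces a vector to a single value, such as `mean`, `median`, or `sd`, can take the place of `max`.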
Frequency functions
Grouping requires a factor; frequencies can only be counted once the data has been grouped.
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 1 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 2 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 3 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 5 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars$cyl <- as.factor(mtcars$cyl)  # convert the grouping column to a factor
> split(mtcars, mtcars$cyl)  # use split to divide the data frame into one group per level
$`4`
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
$`6`
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
$`8`
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
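Once the data frame is split, each group can be summarized on its own. A minimal sketch, assuming the built-in `mtcars` data with its original `mpg` values; `cars`, `groups`, `group_sizes`, and `mean_mpg` are illustrative names.

```r
# Work on a copy so the built-in data set is left untouched
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# split() returns a named list with one data frame per factor level
groups <- split(cars, cars$cyl)

# Number of cars in each cylinder group
group_sizes <- sapply(groups, nrow)

# Mean mpg within each group
mean_mpg <- sapply(groups, function(g) mean(g$mpg))

print(group_sizes)
print(mean_mpg)
```

The group sizes agree with the row counts of the three data frames printed above (11, 7, and 14).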
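For data with no obvious factor (a continuous variable, for example), bin the values with the cut function before grouping.
> cut(mtcars$mpg,c(seq(10,50,10)))  # bin the target column from 10 to 50 in intervals of width 10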
[1] (20,30] (20,30] (20,30] (20,30] (10,20] (10,20] (10,20] (20,30] (20,30]
[10] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (30,40]
[19] (30,40] (30,40] (20,30] (10,20] (10,20] (10,20] (10,20] (20,30] (20,30]
[28] (30,40] (10,20] (10,20] (10,20] (20,30]
Levels: (10,20] (20,30] (30,40] (40,50]
> table(cut(mtcars$mpg,c(seq(10,50,10))))  # use table to count the frequency of each bin
(10,20] (20,30] (30,40] (40,50]
18 10 4 0
> prop.table(table(cut(mtcars$mpg,c(seq(10,50,10)))))  # use prop.table to convert the counts to relative frequencies
(10,20] (20,30] (30,40] (40,50]
0.5625 0.3125 0.1250 0.0000
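The binning and counting steps can be combined in one short script. The sketch below assumes the built-in `mtcars` data with its original `mpg` values; the variable names are illustrative.

```r
# Bin mpg into the intervals (10,20], (20,30], (30,40], (40,50]
breaks <- seq(10, 50, by = 10)
mpg_bins <- cut(mtcars$mpg, breaks = breaks)

# Frequency of each bin, then the same counts as proportions
counts <- table(mpg_bins)
proportions <- prop.table(counts)

print(counts)
print(proportions)
```

By default `cut` builds intervals that are closed on the right; passing `right = FALSE` closes them on the left instead. Values that fall outside the breaks become `NA` and are not counted by `table`.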
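Frequency counts for two-dimensional data (two columns)
> head(Arthritis)  # load the data set (Arthritis ships with the vcd package); Sex, Treatment, and Improved can all serve as factors, so pick any two to tabulate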
ID Treatment Sex Age Improved
1 57 Treated Male 27 Some
2 46 Treated Male 29 None
3 77 Treated Male 30 None
4 17 Treated Male 32 Marked
5 36 Treated Male 46 Marked
6 23 Treated Male 58 Marked
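> table(Arthritis$Treatment,Arthritis$Improved)  # tabulate Treatment against Improved; the argument before the comma gives the rows, the one after it gives the columns, and the headers are the levels of each factor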
None Some Marked
Placebo 29 7 7
Treated 13 7 21
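# Two-dimensional frequency counts can also be computed with the xtabs function:
> xtabs(~Treatment+Improved,data=Arthritis)  # list the two column names after ~ and pass the data set as data=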
Improved
Treatment None Some Marked
Placebo 29 7 7
Treated 13 7 21
# Use prop.table to compute proportions: margin 1 gives proportions within each row, 2 within each column
> x <- xtabs(~Treatment+Improved,data=Arthritis)
> prop.table(x,1)
Improved
Treatment None Some Marked
Placebo 0.6744186 0.1627907 0.1627907
Treated 0.3170732 0.1707317 0.5121951
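To experiment with row and column proportions without loading the data set, the counts printed above can be entered by hand. This is only a sketch; the object names `tab`, `row_props`, and `col_props` are illustrative, and the counts are copied from the Treatment by Improved table above.

```r
# Rebuild the Treatment x Improved table from the counts shown above
# (matrix() fills column by column: None, then Some, then Marked)
tab <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                       dimnames = list(Treatment = c("Placebo", "Treated"),
                                       Improved  = c("None", "Some", "Marked"))))

row_props <- prop.table(tab, 1)   # each row sums to 1
col_props <- prop.table(tab, 2)   # each column sums to 1

print(row_props)
print(col_props)
print(addmargins(tab))            # append row and column totals
```

Calling `prop.table(tab)` with no margin divides every cell by the grand total instead.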
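Frequency counts for three-dimensional data
> y <- xtabs(~Treatment+Improved+Sex,data=Arthritis)  # use xtabs just as for two-dimensional data
> ftable(y)  # use ftable to print the result as a single, more readable flat table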
Sex Female Male
Treatment Improved
Placebo None 19 10
Some 7 0
Marked 6 1
Treated None 6 7
Some 5 2
Marked 16 5
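The same pattern works with any three categorical columns. Below is a sketch on the built-in `mtcars` data, so that no extra package is needed; the choice of `cyl`, `gear`, and `am` and the object names are assumptions made for illustration.

```r
# Three-way contingency table of cylinders, gears, and transmission type
three_way <- xtabs(~ cyl + gear + am, data = mtcars)

# ftable() flattens the 3-D array into one printable table
flat <- ftable(three_way)
print(flat)

# Collapse the third dimension: counts by cyl and gear only
print(margin.table(three_way, c(1, 2)))
```

Printing `three_way` directly shows one two-way table per level of `am`, which is why `ftable` is usually easier to read.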