This post shows how the difference between ordinary (unordered) factors and ordered factors affects data analysis.
# Wrapping each statement in print() echoes its result step by step; the code can be pasted straight into a script and run
p=print
p(patientID <- c(1,2,3,4))# 1. Enter the data as vectors
p(age <- c(25,34,28,52))
p(diabetes <- c("Type1","Type2","Type1","Type1"))
p(status <- c("Poor","Improved","Excellent","Poor"))
p(diabetes <- factor(diabetes))
p(status <- factor(status))
p(status <- factor(status,order=TRUE))
p(patientdata <- data.frame(patientID,age,diabetes,status))
p(str(patientdata))# 2. Show the structure of the object
p(summary(patientdata))# 3. Show a statistical summary of the object
Output:
> p=print
> p(patientID <- c(1,2,3,4))# 1. Enter the data as vectors
[1] 1 2 3 4
> p(age <- c(25,34,28,52))
[1] 25 34 28 52
> p(diabetes <- c("Type1","Type2","Type1","Type1"))
[1] "Type1" "Type2" "Type1" "Type1"
> p(status <- c("Poor","Improved","Excellent","Poor"))
[1] "Poor"      "Improved"  "Excellent" "Poor"
> p(diabetes <- factor(diabetes))# Make diabetes an ordinary (unordered) factor
[1] Type1 Type2 Type1 Type1
Levels: Type1 Type2
> p(status <- factor(status))# First make status an ordinary factor
[1] Poor      Improved  Excellent Poor
Levels: Excellent Improved Poor
> p(status <- factor(status,order=TRUE))# Finally make status an ordered factor; with no levels given, the order is alphabetical
[1] Poor      Improved  Excellent Poor
Levels: Excellent < Improved < Poor
> p(patientdata <- data.frame(patientID,age,diabetes,status))# Combine the vectors in () into a data frame
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> p(str(patientdata))# 2. Show the structure of the object. str(object) reports the data types and how the object is encoded internally; the trailing NULL is str()'s return value, printed by p()
'data.frame':	4 obs. of  4 variables:
 $ patientID: num  1 2 3 4
 $ age      : num  25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
NULL
> p(summary(patientdata))# 3. Show a statistical summary of the object. summary() treats each variable by type: minimum, quartiles, mean and maximum for numeric columns, level counts for factors
   patientID         age         diabetes       status
 Min.   :1.00   Min.   :25.00   Type1:3   Excellent:1
 1st Qu.:1.75   1st Qu.:27.25   Type2:1   Improved :1
 Median :2.50   Median :31.00             Poor     :2
 Mean   :2.50   Mean   :34.75
 3rd Qu.:3.25   3rd Qu.:38.50
 Max.   :4.00   Max.   :52.00
Note: the source code comes from R语言实战 (R in Action), 3rd edition, p. 28.
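The transcript above shows that factor(status, order=TRUE) orders the levels alphabetically (Excellent < Improved < Poor), which is probably not the clinically meaningful order. The sketch below is not from the book. It assumes the intended order is Poor < Improved < Excellent and uses hypothetical variable names (status_raw, status_plain, status_alpha, status_ranked) to illustrate where ordinary and ordered factors behave differently: integer codes, comparisons, and the default contrasts that modeling functions would use.

```r
# Same status values as in the listing above
status_raw <- c("Poor", "Improved", "Excellent", "Poor")

# Ordinary (unordered) factor: levels are sorted alphabetically and carry no rank
status_plain <- factor(status_raw)

# Ordered factor with default levels: still alphabetical,
# i.e. Excellent < Improved < Poor
status_alpha <- factor(status_raw, ordered = TRUE)

# Ordered factor with explicit levels.
# ASSUMPTION: the intended order is worst to best, Poor < Improved < Excellent.
status_ranked <- factor(status_raw,
                        levels = c("Poor", "Improved", "Excellent"),
                        ordered = TRUE)

print(levels(status_alpha))
print(levels(status_ranked))

# The integer codes follow the level order
print(as.integer(status_alpha))   # 3 2 1 3
print(as.integer(status_ranked))  # 1 2 3 1

# Comparisons and max() are defined for ordered factors
print(status_ranked < "Excellent")  # TRUE TRUE FALSE TRUE
print(max(status_ranked))           # Excellent

# For an ordinary factor, `<` is not meaningful: R warns and returns NA
plain_cmp <- suppressWarnings(status_plain < "Improved")
print(plain_cmp)                    # NA NA NA NA

# With R's default options, modeling functions use treatment contrasts
# for ordinary factors and polynomial contrasts for ordered factors
print(colnames(contrasts(status_plain)))   # "Improved" "Poor"
print(colnames(contrasts(status_ranked)))  # ".L" ".Q"
```

Passing levels explicitly is a design choice that keeps the ranking independent of spelling; whether worst-to-best or best-to-worst is appropriate depends on how the variable will be read in later analysis.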