String manipulation functions
(1) Concatenating strings: paste()
> paste("abc", "bc", sep="-")  # join "abc" and "bc" with the separator "-"
[1] "abc-bc"
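As a small supplement to the example above, the sketch below shows three behaviors of paste() that the notes do not cover: vectorized recycling, the collapse argument, and the paste0() shorthand. The variable names (ids, joined, no_sep) are made up for illustration.

```r
# paste() is vectorized: the shorter argument is recycled
ids <- paste("sample", 1:3, sep = "_")
print(ids)

# collapse= joins all resulting elements into one single string
joined <- paste(c("a", "b", "c"), collapse = "+")
print(joined)

# paste0() behaves like paste(..., sep = "")
no_sep <- paste0("abc", "bc")
print(no_sep)
```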
(2) Case conversion: toupper(), tolower()
> toupper('aB cd')
[1] "AB CD"
> tolower(c('aB', 'cd'))
[1] "ab" "cd"
> toupper('jan')=='JAN'
[1] TRUE
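The comparison above suggests a common use of case conversion: matching labels regardless of how they were typed. A minimal sketch, using a hypothetical vector named months:

```r
# Hypothetical labels typed with inconsistent capitalization
months <- c("Jan", "JAN", "jan", "Feb")

# Normalize the case before comparing, so all spellings of "jan" match
is_jan <- tolower(months) == "jan"
print(is_jan)
```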
(3) String length: nchar()
> nchar("abcccc")
[1] 6
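One point that is easy to mix up: nchar() counts the characters in each string, while length() counts the elements of the vector. A short sketch with a hypothetical vector named words:

```r
words <- c("abc", "de", "")

# nchar() is vectorized and returns one count per element
n_chars <- nchar(words)
print(n_chars)

# length() returns the number of elements, not the number of characters
n_elems <- length(words)
print(n_elems)
```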
(4) Extracting substrings: substr(), substring()
> substr("abcdef", 2, 4)  # take characters 2 through 4
[1] "bcd"
> substring("abcdef", 3)  # take everything from character 3 to the end
[1] "cdef"
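Two further uses of these functions, sketched here as a supplement: substring() recycles its start and end positions, so one call can return several pieces, and substr() has a replacement form. The names x, pieces and y are illustrative only.

```r
x <- "abcdef"

# first/last are recycled, giving the pieces (1,3), (2,4) and (3,5)
pieces <- substring(x, 1:3, 3:5)
print(pieces)

# Replacement form: overwrite characters 1 to 2 of a copy of x
y <- x
substr(y, 1, 2) <- "XY"
print(y)
```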
(5) String replacement: gsub()
Note that the pattern to be replaced is interpreted as a regular expression.
> # replace every "abc" with "123" in each element of the vector
> gsub("abc", "123", c("abc", "abcc", "abcbc"))
[1] "123" "123c" "123bc"
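Because the pattern is a regular expression, metacharacters such as "." need care. The sketch below contrasts sub() (first match only) with gsub() (all matches), and shows two ways to match a literal dot: fixed = TRUE or escaping. The variable names are illustrative.

```r
x <- "a.b.c"

# sub() replaces only the first match; gsub() replaces every match
first_only  <- sub(".", "-", x, fixed = TRUE)
all_literal <- gsub(".", "-", x, fixed = TRUE)

# Without fixed = TRUE, "." is a regex that matches any character
as_regex <- gsub(".", "-", x)

# Escaping the dot makes the regex match a literal "."
escaped <- gsub("\\.", "-", x)

print(c(first_only, all_literal, as_regex, escaped))
```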
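(6) Splitting strings: strsplit()
Splits each string at the given separator. With fixed = TRUE the separator is matched literally; with fixed = FALSE (the default) it is treated as a regular expression. Note that the result is a list with one element per input string, and each element holds the pieces of that string.
> strsplit("a;b;c", ";", fixed = TRUE)
[[1]]
[1] "a" "b" "c"
> strsplit("a222b2.2c", "2.2", fixed = FALSE)
[[1]]
[1] "a" "b" "c"
> strsplit("a222b2.2c", "2.2", fixed = TRUE)
[[1]]
[1] "a222b" "c"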
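Since strsplit() returns a list, a follow-up step is usually needed to get at the pieces. A minimal sketch of three common ways to do that; the input vector and the names parts, first, flat and second_field are hypothetical.

```r
# Two input strings give a list of length 2
parts <- strsplit(c("a;b;c", "d;e"), ";", fixed = TRUE)

# [[1]] extracts the pieces of the first input string
first <- parts[[1]]

# unlist() flattens all pieces into a single character vector
flat <- unlist(parts)

# vapply() picks the second piece of every input string
second_field <- vapply(parts, function(p) p[2], character(1))

print(first)
print(flat)
print(second_field)
```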
(7) Sorting: sort()
> v <- c("a", "d", "z", "b")
> sort.result <- sort(v)  # sort() uses ascending order by default
> sort.result
[1] "a" "b" "d" "z"
> revsort.result <- sort(v, decreasing = TRUE)  # decreasing = TRUE sorts in descending order
> revsort.result
[1] "z" "d" "b" "a"
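One pitfall worth noting when sorting character vectors: strings that look like numbers are still compared as text. A small sketch with a hypothetical vector x:

```r
x <- c("10", "9", "2")

# Compared character by character, so "10" sorts before "2"
as_text <- sort(x)
print(as_text)

# Converting to numeric first sorts by value
as_number <- sort(as.numeric(x))
print(as_number)
```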
(8) order(): returns the indices that would put the elements in sorted order
> v <- c("a", "d", "z", "b")
> order.result <- order(v)
> order.result
[1] 1 4 2 3
> revorder.result <- order(v, decreasing = TRUE)
> revorder.result
[1] 3 2 4 1
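The relation between order() and sort() can be made explicit: indexing a vector with its own order() gives the sorted vector, and the same indices can reorder a second, parallel vector. In the sketch below, scores is a made-up vector aligned with v.

```r
v <- c("a", "d", "z", "b")
idx <- order(v)

# Indexing v with its order() is equivalent to sort(v)
sorted_via_order <- v[idx]
print(identical(sorted_via_order, sort(v)))

# The same indices reorder a parallel vector consistently
scores <- c(10, 40, 30, 20)   # hypothetical values, one per element of v
scores_by_name <- scores[idx]
print(scores_by_name)
```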
Application 1: sorting functions
Sort the rows of a data frame by height, from shortest to tallest.
> df
      height weight gender
tom      180     75   male
cindy    165     58 female
jimmy    175     72   male
sam      173     68   male
lucy     160     60 female
lily     165     55 female
> order(df$height)
[1] 5 2 6 4 3 1
> df[order(df$height),]
      height weight gender
lucy     160     60 female
cindy    165     58 female
lily     165     55 female
sam      173     68   male
jimmy    175     72   male
tom      180     75   male
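To make the example reproducible, the sketch below rebuilds df from the values printed above (the construction itself is an assumption, since the notes do not show how df was created). It then adds two variations not in the original: breaking ties with a second key, and sorting in descending order.

```r
# Rebuild the data frame from the printed values (assumed construction)
df <- data.frame(
  height = c(180, 165, 175, 173, 160, 165),
  weight = c(75, 58, 72, 68, 60, 55),
  gender = c("male", "female", "male", "male", "female", "female"),
  row.names = c("tom", "cindy", "jimmy", "sam", "lucy", "lily")
)

# Sort rows by height, ascending (same as the example above)
by_height <- df[order(df$height), ]

# Break ties in height by weight: lily (55) now comes before cindy (58)
by_height_weight <- df[order(df$height, df$weight), ]

# Negating a numeric key sorts it in descending order
tallest_first <- df[order(-df$height), ]

print(by_height_weight)
```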
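Application 2: string replacement with gsub()
Replace every Stage I and Stage II value (with or without an A/B suffix) with "early", and every Stage III and Stage IV value (with or without an A/B suffix) with "advanced".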
> df1$pathlogic_stage
[1] "Stage I" "Stage I" "Stage I" "Stage I" "Stage I" "Stage I" "Stage I" "Stage IVA"
[9] "Stage IVA" "Stage II" "Stage II" "Stage I" "Stage I" "Stage IIB" "Stage IIB" "Stage II"
[17] "Stage II" "Stage I" "Stage I" "Stage II" "Stage II" "Stage III" "Stage III" "Stage II"
[25] "Stage II" "Stage I" "Stage I" "Stage IV" "Stage IV" "Stage IV" "Stage IV" "Stage I"
[33] "Stage I" "Stage IVA" "Stage IVA" "Stage I" "Stage I" "Stage I" "Stage I" "Stage II"
[41] "Stage II" "Stage I" "Stage I" "Stage I" "Stage I" "Stage II" "Stage II" "Stage II"
[49] "Stage II" "Stage IVB" "Stage IVB" "Stage I" "Stage I" "Stage IVB" "Stage IVB" "Stage IVA"
[57] "Stage IVA" "Stage II" "Stage II" "Stage IVB" "Stage IVB" "Stage IVB" "Stage IVB" "Stage IIB"
[65] "Stage IIB" "Stage I" "Stage I" "Stage I" "Stage I" "Stage I" "Stage I" "Stage I"
[73] "Stage I" "Stage III" "Stage III" "Stage I" "Stage I" "Stage IVB" "Stage IVB" "Stage III"
[81] "Stage III" "Stage I" "Stage I" "Stage III" "Stage III" "Stage II" "Stage II" "Stage II"
[89] "Stage II" "Stage I" "Stage I" "Stage IIB" "Stage IIB" "Stage II" "Stage II"
> stage <- gsub("^Stage I{1,2}[AB]?$", "early", df1$pathlogic_stage)  # Stage I or II, with or without an A/B suffix
> stage <- gsub("^Stage.*", "advanced", stage)  # anything still starting with "Stage" is III or IV
> stage
[1] "early" "early" "early" "early" "early" "early" "early" "advanced" "advanced"
[10] "early" "early" "early" "early" "early" "early" "early" "early" "early"
[19] "early" "early" "early" "advanced" "advanced" "early" "early" "early" "early"
[28] "advanced" "advanced" "advanced" "advanced" "early" "early" "advanced" "advanced" "early"
[37] "early" "early" "early" "early" "early" "early" "early" "early" "early"
[46] "early" "early" "early" "early" "advanced" "advanced" "early" "early" "advanced"
[55] "advanced" "advanced" "advanced" "early" "early" "advanced" "advanced" "advanced" "advanced"
[64] "early" "early" "early" "early" "early" "early" "early" "early" "early"
[73] "early" "advanced" "advanced" "early" "early" "advanced" "advanced" "advanced" "advanced"
[82] "early" "early" "advanced" "advanced" "early" "early" "early" "early" "early"
[91] "early" "early" "early" "early" "early"
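The same grouping can also be written without chained replacements, by testing each value once with grepl() and choosing the label with ifelse(). This is an alternative sketch, not the method used above. The vector stages is a small made-up stand-in for df1$pathlogic_stage, and the labels follow the task statement.

```r
# Hypothetical stand-in for df1$pathlogic_stage
stages <- c("Stage I", "Stage IIB", "Stage III", "Stage IVA", "Stage IA")

# TRUE for Stage I or II, with or without a trailing A or B
is_early <- grepl("^Stage I{1,2}[AB]?$", stages)

# Choose the label per element; everything else counts as advanced
group <- ifelse(is_early, "early", "advanced")
print(group)

# Count how many values fall into each group
print(table(group))
```

With this form the original strings are left untouched, so a value that matches neither pattern as expected is easy to inspect afterwards.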