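R supports string processing through built-in functions such as grep and gsub, but these base functions have inconsistent names and argument orders, which makes them awkward to use. The stringr package is dedicated to string processing: its functions are concisely defined and share a uniform calling convention, and it is one of the more widely used R packages.
Most functions in stringr follow a consistent naming scheme and start with str_. Patterns are interpreted as regular expressions by default, so regular expressions work throughout the package.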
Installing the package
install.packages("stringr")
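String concatenation
The concatenation function str_c serves the same purpose as base R's paste and paste0; its default separator is the empty string, as in paste0.
library(stringr)
# Concatenate with the default (empty) separator
str_c("a", "b")
## [1] "ab"
# Specify the separator placed between the arguments
str_c("a", "b", sep = "_")
## [1] "a_b"
# Collapse a vector into a single string
str_c(c("a", "b", "c"), collapse = "_")
## [1] "a_b_c"
# Combine sep and collapse
str_c(c("a", "b"), c("c", "d"), sep = "/", collapse = "_")
## [1] "a/c_b/d"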
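One practical difference from paste0 is the handling of missing values. The sketch below is a minimal illustration, assuming a reasonably recent stringr; the variable names are made up for the example.

```r
library(stringr)

ids <- c("a", NA)

# str_c() propagates NA: any missing input makes that result element NA
with_str_c <- str_c("id_", ids)

# paste0() converts NA to the literal text "NA" before joining
with_paste0 <- paste0("id_", ids)

# str_replace_na() makes the paste0-like behaviour explicit in stringr
explicit <- str_c("id_", str_replace_na(ids))

with_str_c
with_paste0
explicit
```

If missing values should stay missing in the result, str_c is usually the safer choice of the two.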
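Counting characters
The counting function str_count counts how many times a pattern occurs in each string.
# Count a single target character
str_count(string = c("sql", "json", "java"), pattern = "s")
## [1] 1 1 0
# Count a different target character for each string (string and pattern are matched element by element)
str_count(string = c("sql", "json", "java"), pattern = c("s", "j", "a"))
## [1] 1 1 2
# Count a regex metacharacter literally (wrap it in fixed())
str_count(string = "a..b", pattern = fixed("."))
## [1] 2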
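The reason for wrapping a metacharacter in fixed() becomes clearer when the three ways of writing the pattern are compared side by side. This is a small sketch; the variable names are illustrative only.

```r
library(stringr)

x <- "a..b"

# As a regular expression, "." matches any character, so all 4 characters count
n_regex <- str_count(x, ".")

# fixed() treats the pattern as a literal string, so only the two dots count
n_fixed <- str_count(x, fixed("."))

# Escaping the dot in the regular expression gives the same result as fixed()
n_escaped <- str_count(x, "\\.")

c(regex = n_regex, fixed = n_fixed, escaped = n_escaped)
```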
Detecting characters
The detection function str_detect checks whether each string contains a pattern and returns a logical vector.
str_detect(string = c("sql", "json", "java"), pattern = "s")
## [1]  TRUE  TRUE FALSE
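Because the result is a logical vector, it can be used directly for indexing or counting. The sketch below assumes stringr 1.4.0 or later for the negate argument; the variable names are illustrative.

```r
library(stringr)

langs <- c("sql", "json", "java")

# Logical vector: does each element contain "s"?
has_s <- str_detect(langs, "s")

# Use the logical vector to keep only the matching elements
with_s <- langs[has_s]

# negate = TRUE flips the test and keeps elements that do NOT match
without_s <- langs[str_detect(langs, "s", negate = TRUE)]

# sum() on a logical vector counts the matching elements
n_with_s <- sum(has_s)
```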
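Duplicating strings
The duplication function str_dup repeats each string in a character vector a given number of times and returns the repeated strings.
str_dup(string = c("sql", "json", "java"), times = 2)
## [1] "sqlsql" "jsonjson" "javajava"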
Extracting characters
The extraction functions str_extract and str_extract_all extract matches from a string. str_extract returns only the first match, while str_extract_all returns every match.
# Extract the first match
str_extract(string = "banana", pattern = "a")
## [1] "a"
# Extract all matches (returns a list)
str_extract_all(string = "banana", pattern = "a")
## [[1]]
## [1] "a" "a" "a"
# Extract all matches (returns a matrix)
str_extract_all(string = "banana", pattern = "a", simplify = TRUE)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "a"
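Extraction is most useful with a real regular expression rather than a single letter. The following sketch pulls numbers out of free text; the input strings are invented for illustration.

```r
library(stringr)

msg <- c("order 66 shipped in 3 days", "no numbers here")

# First run of digits in each string; NA where nothing matches
first_num <- str_extract(msg, "\\d+")

# Every run of digits; one list element per input string,
# and an empty character vector where nothing matches
all_nums <- str_extract_all(msg, "\\d+")

first_num
all_nums
```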
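String formatting
The formatting function str_glue uses braces {} as placeholders. The expression inside each pair of braces is evaluated in the calling environment (the global environment in this example) and replaced by its value.
# Define global variables
name <- "jack"
age <- 12
# Format the string
str_glue("My name is {name},", "\nmy age is {age}.")
## My name is jack,
## my age is 12.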
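The placeholders are not limited to global variables. As a minimal sketch, the hypothetical helper greet() below shows that braces can hold any R expression and that local function arguments are visible to str_glue; named arguments can also supply values directly.

```r
library(stringr)

# Hypothetical helper: {name} and {age + 1} are looked up in the
# function's own environment, not in the global environment
greet <- function(name, age) {
  str_glue("{name} will be {age + 1} next year.")
}

msg <- greet("jack", 12)

# Values can also be passed as named arguments to str_glue()
hello <- str_glue("Hello, {who}!", who = "rose")

msg
hello
```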
String length
The length function str_length returns the number of characters in a string.
str_length(string = "banana")
## [1] 6
Locating characters
The location functions str_locate and str_locate_all return the start and end positions of the matches.
# Position of the first match
str_locate(string = "banana", pattern = "a")
##      start end
## [1,]     2   2
# Positions of all matches
str_locate_all(string = "banana", pattern = "a")
## [[1]]
##      start end
## [1,]     2   2
## [2,]     4   4
## [3,]     6   6
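The start and end columns can be fed back into str_sub to cut out the matched text. This is only a sketch with a two-character pattern chosen for illustration.

```r
library(stringr)

x <- "banana"

# Locate the first occurrence of "an": a one-row matrix with start and end
loc <- str_locate(x, "an")

# Use the located positions to extract the matched substring
piece <- str_sub(x, start = loc[, "start"], end = loc[, "end"])

# All non-overlapping occurrences: one row per match
all_loc <- str_locate_all(x, "an")[[1]]

loc
piece
all_loc
```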
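Matching characters
The matching functions str_match and str_match_all are similar to str_extract in that they return the matched text; the difference is the return format. They return a character matrix whose first column holds the full match, with one further column for each capture group in the pattern.
# Return the first match (matrix)
str_match(string = "banana", pattern = "a")
##      [,1]
## [1,] "a"
# Return all matches (list of matrices)
str_match_all(string = "banana", pattern = "a")
## [[1]]
##      [,1]
## [1,] "a"
## [2,] "a"
## [3,] "a"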
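The matrix format pays off once the pattern contains capture groups. The sketch below splits a date-like string into its parts; the input values are invented for the example.

```r
library(stringr)

dates <- c("2024-05-17", "not a date")

# Column 1 is the full match; columns 2 to 4 are the three capture groups.
# A string that does not match yields a row of NA.
parts <- str_match(dates, "(\\d{4})-(\\d{2})-(\\d{2})")

year <- parts[, 2]
day <- parts[, 4]

parts
```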
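Padding strings
The padding function str_pad pads a string to a given width with a single padding character. The side argument selects where the padding is added.
# Pad on the left (the default)
str_pad(string = "jack", width = 6, pad = "S")
## [1] "SSjack"
# Pad on the right
str_pad(string = "jack", width = 6, side = "right", pad = "S")
## [1] "jackSS"
# Pad on both sides
str_pad(string = "jack", width = 6, side = "both", pad = "S")
## [1] "SjackS"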
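A common use of padding is zero-filling identifiers to a fixed width. In this small sketch the ids are made up; note that strings already longer than width are returned unchanged rather than truncated.

```r
library(stringr)

ids <- c("7", "42", "128", "12345")

# Left-pad with "0" up to width 3; "12345" is longer than 3 and stays as is
padded <- str_pad(ids, width = 3, pad = "0")

padded
```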
Removing characters
The removal functions str_remove and str_remove_all delete matched parts of a string.
# Remove the first match
str_remove(string = "banana", pattern = "a")
## [1] "bnana"
# Remove all matches
str_remove_all(string = "banana", pattern = "a")
## [1] "bnn"
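Removal also accepts regular expressions, and it behaves like replacement with an empty string. The sketch below uses an invented input to show both points.

```r
library(stringr)

x <- "a1b22c333"

# Remove only the first run of digits
first_run <- str_remove(x, "\\d+")

# Remove every digit
no_digits <- str_remove_all(x, "\\d")

# Equivalent formulation: replace every digit with the empty string
via_replace <- str_replace_all(x, "\\d", "")

first_run
no_digits
```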
Replacing characters
The replacement functions str_replace, str_replace_all and str_replace_na replace parts of a string.
# Replace the first match
str_replace(string = "banana", pattern = "a", replacement = "A")
## [1] "bAnana"
# Replace all matches
str_replace_all(string = "banana", pattern = "a", replacement = "A")
## [1] "bAnAnA"
# Replace missing values (NA) with the string "NA"
str_replace_na(string = c("banana", NA))
## [1] "banana" "NA"
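Two further replacement patterns are worth a quick sketch: backreferences to capture groups in the replacement, and a named vector that performs several replacements in one call. The input strings are illustrative.

```r
library(stringr)

# \\1 and \\2 refer to the first and second capture group of the pattern
swapped <- str_replace("jack smith", "(\\w+) (\\w+)", "\\2 \\1")

# A named vector maps each pattern (name) to its replacement (value)
multi <- str_replace_all("banana", c("a" = "A", "n" = "N"))

swapped
multi
```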
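Sorting strings
The sorting functions str_sort and str_order sort a character vector. str_sort returns the sorted strings, while str_order returns the indices that would sort them.
# Sort in ascending order, returning a character vector
str_sort(c("sql", "json", "python"))
## [1] "json" "python" "sql"
# Sort in descending order, returning a character vector
str_sort(c("sql", "json", "python"), decreasing = TRUE)
## [1] "sql" "python" "json"
# Sort in ascending order, returning an index vector
str_order(c("sql", "json", "python"))
## [1] 2 3 1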
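The relationship between the two functions, and the numeric option for strings that contain numbers, can be sketched as follows. The file-like names are invented, and the example assumes the default "en" locale.

```r
library(stringr)

langs <- c("sql", "json", "python")

# Indexing with str_order() reproduces the result of str_sort()
by_order <- langs[str_order(langs)]
by_sort <- str_sort(langs)

files <- c("x10", "x2", "x1")

# Plain sorting compares character by character, so "x10" comes before "x2"
alphabetical <- str_sort(files)

# numeric = TRUE compares embedded digit runs as numbers
natural <- str_sort(files, numeric = TRUE)
```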
Splitting strings
The splitting functions str_split and str_split_fixed split a string into pieces.
# Split into characters, returning a list
str_split(string = "banana", pattern = "")
## [[1]]
## [1] "b" "a" "n" "a" "n" "a"
# Split into characters, returning a matrix
str_split(string = "banana", pattern = "", simplify = TRUE)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "b"  "a"  "n"  "a"  "n"  "a"
# Split into a fixed number of pieces
str_split_fixed(string = "banana", pattern = "", n = 3)
##      [,1] [,2] [,3]
## [1,] "b"  "a"  "nana"
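Splitting on a delimiter is the more typical case. The sketch below uses an invented comma-separated line and shows how n limits the number of pieces, with the remainder left in the last piece.

```r
library(stringr)

row <- "id,name,city"

# Split on every comma; take the first (and only) list element
fields <- str_split(row, ",")[[1]]

# Split into at most 2 pieces; the rest of the string stays in the last one
two <- str_split_fixed(row, ",", n = 2)

fields
two
```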
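String filtering
The filtering functions are str_sub and str_subset. str_sub extracts part of a string given a start and an end position. str_subset keeps the strings that match a pattern, and the related str_which returns the positions of those strings.
# Extract a substring (positive indices)
str_sub(string = "banana", start = 1, end = 3)
## [1] "ban"
# Extract a substring (negative indices count from the end)
str_sub(string = "banana", start = -2, end = -1)
## [1] "na"
# Extract a substring and assign to it
x <- "banana"
str_sub(string = x, start = 1, end = 1) <- "A"
print(x)
## [1] "Aanana"
# Filter strings (returns the matching strings)
str_subset(string = c("java", "sql", "python"), pattern = "^s")
## [1] "sql"
# Filter strings (returns the matching positions)
str_which(string = c("java", "sql", "python"), pattern = "^s")
## [1] 2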
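str_subset and str_which can both be understood in terms of str_detect, and str_sub is vectorised over its input. A brief sketch, with illustrative variable names:

```r
library(stringr)

langs <- c("java", "sql", "python")

# str_subset() keeps matching strings, like indexing with str_detect()
by_subset <- str_subset(langs, "^s")
by_detect <- langs[str_detect(langs, "^s")]

# str_which() gives the positions, like which() on str_detect()
idx <- str_which(langs, "^s")
idx_base <- which(str_detect(langs, "^s"))

# str_sub() works on every element of a vector at once
prefixes <- str_sub(langs, start = 1, end = 2)
```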
Other functions
stringr also provides other useful functions for common string-processing tasks.
# Remove whitespace from both ends of a string
str_trim(string = " you are beautiful! ")
## [1] "you are beautiful!"
# Remove surplus whitespace, including repeated spaces inside the string
str_squish(string = " you   are   beautiful! ")
## [1] "you are beautiful!"
# Convert to lower case
dog <- "The quick brown dog"
str_to_lower(dog)
## [1] "the quick brown dog"
# Convert to upper case
str_to_upper(dog)
## [1] "THE QUICK BROWN DOG"
# Convert to title case
str_to_title(dog)
## [1] "The Quick Brown Dog"
# Convert to sentence case
str_to_sentence(dog)
## [1] "The quick brown dog"
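The difference between trimming and squishing is easiest to see on a string with extra spaces both around and inside it. In this sketch the input is invented, and side is the str_trim argument that restricts trimming to one end.

```r
library(stringr)

x <- "  you   are beautiful!  "

# Trim only the left end; inner and trailing spaces are kept
left_only <- str_trim(x, side = "left")

# Trim both ends; the run of spaces inside the string is kept
both_ends <- str_trim(x)

# Trim both ends and collapse inner runs of whitespace to a single space
squished <- str_squish(x)

str_length(c(x, left_only, both_ends, squished))
```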