1 Extracting or replacing
1.1 Extracting or replacing the characters between a start and a stop position
substr(x, start=n1, stop=n2)
x <- c("howareyou","fine","thank")
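substr(x, 2, 4) # "owa" "ine" "han": characters 2 to 4 of each string
substr(x, 2, 4) <- "1234567" # "h123reyou" "f123" "t123k": characters 2 to 4 of each string are overwritten with the first three characters of the replacement; the extra characters are ignored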
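Two behaviours of substr() that the examples above do not show, as a minimal base-R sketch (the names parts and y are illustrative): start and stop are recycled along x, so each element can use its own range, and a replacement shorter than the range overwrites only as many characters as it contains.

```r
x <- c("howareyou", "fine", "thank")

# start and stop are recycled along x, so each element gets its own range
parts <- substr(x, c(1, 2, 3), c(3, 4, 5))
print(parts)  # "how" "ine" "ank"

# a replacement shorter than the range overwrites only as many characters as it has
y <- x
substr(y, 2, 4) <- "Z"
print(y)      # "hZwareyou" "fZne" "tZank"
```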
1.2 Replacing pattern matches
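sub replaces only the first match in each element; gsub is global and replaces every match.
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)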
x <- c("howareyouaaa","fine","thank")
sub("a",replacement = "A",x) # "howAreyouaaa" "fine" "thAnk"
gsub("a", replacement = "A", x) # "howAreyouAAA" "fine" "thAnk"
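A short base-R sketch contrasting sub and gsub on a regular expression, and showing what fixed = TRUE changes. The pattern "a+" and the names first_run, all_runs, any_char and literal are illustrative choices, not from the notes above. Note that "greedy" describes regex quantifiers such as +, which grab the longest possible run; it is a separate idea from gsub's replace-all behaviour.

```r
x <- c("howareyouaaa", "fine", "thank")

# "a+" matches a run of one or more "a"; the quantifier is greedy, so "aaa" is taken whole
first_run <- sub("a+", "-", x)    # "how-reyouaaa" "fine" "th-nk"
all_runs  <- gsub("a+", "-", x)   # "how-reyou-"   "fine" "th-nk"

# "." is a regex metacharacter (any character) unless fixed = TRUE
any_char <- gsub(".", "_", "a.b.c")                # "_____"
literal  <- gsub(".", "_", "a.b.c", fixed = TRUE)  # "a_b_c"
```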
2 Searching
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
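grep returns the positions (indices) of the elements of x that match the regular expression or, with value = TRUE, the matching elements themselves. For a logical result, use grepl (below).
invert → if TRUE, select the elements that do NOT contain pattern instead (their indices, or the elements themselves when value = TRUE).
value → if TRUE, return the matching elements rather than their indices.
fixed → if fixed = FALSE (the default), pattern is a regular expression; if fixed = TRUE, pattern is a literal string matched as is.
x <- c("howareyou","fine","thank")
grep("e", x) # 1 2: indices of the elements containing "e"
grep("e", x, invert = TRUE) # 3: indices of the elements not containing "e"
grep("e", x, value = TRUE) # "howareyou" "fine": the matching elements themselves
grep("e", x, value = TRUE, invert = TRUE) # "thank": the elements not containing "e"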
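The ignore.case and fixed arguments are listed in the signature but not demonstrated. A minimal base-R sketch; the fourth string "a.b" is added here only to give the literal dot something to match, and the names idx, re and lit are illustrative.

```r
x <- c("howareyou", "fine", "thank", "a.b")

# ignore.case = TRUE lets "E" match "e"
idx <- grep("E", x, ignore.case = TRUE)   # 1 2
# as a regex, "." means any character, so every non-empty string matches
re  <- grep(".", x)                       # 1 2 3 4
# with fixed = TRUE the dot is matched literally
lit <- grep(".", x, fixed = TRUE)         # 4
```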
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
Like grep, but returns a logical vector saying whether each element contains a match for pattern.
grepl("e", x) # TRUE TRUE FALSE: one logical value per element
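Because grepl returns one logical per element, its result can index x directly, be counted with sum(), or be negated. Here is a small base-R sketch (variable names are illustrative) showing how it relates to grep's value and invert options.

```r
x <- c("howareyou", "fine", "thank")
hits <- grepl("e", x)              # TRUE TRUE FALSE

matched   <- x[hits]               # "howareyou" "fine", like value = TRUE
unmatched <- x[!hits]              # "thank", like value = TRUE, invert = TRUE
n_hits    <- sum(hits)             # 2 elements contain "e"

# which() recovers the positions that grep() returns
same_as_grep <- identical(which(hits), grep("e", x))  # TRUE
```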
3 Concatenating and splitting strings
paste(..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)
paste("a","b",sep="-") # [1] "a-b"
paste("x",1:4,sep="") #"x1" "x2" "x3" "x4"
x <- c("howareyouaaa","fine","thank")
y <- c("de","ta")
paste(x, y, sep = "-") # "howareyouaaa-de" "fine-ta" "thank-de": the shorter y is recycled
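The collapse argument and paste0 appear in the signatures above but are not exercised. Here is a minimal base-R sketch (names illustrative): sep joins arguments element by element, while collapse then joins the resulting vector into a single string.

```r
# paste0() behaves like paste() with sep = ""
p0 <- paste0("x", 1:4)                               # "x1" "x2" "x3" "x4"
# collapse joins the resulting vector into one string
joined <- paste("x", 1:4, sep = "", collapse = "+")  # "x1+x2+x3+x4"
# sep works element-wise first, then collapse works across elements
both <- paste(c("a", "b"), 1:2, sep = "-", collapse = ", ")  # "a-1, b-2"
```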
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
strsplit(c("a1,a2"),split = "")
# [[1]]
# [1] "a" "1" "," "a" "2"
strsplit(c("a1","a2"),split = "")
# [[1]]
# [1] "a" "1"
# [[2]]
# [1] "a" "2"
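The examples above split on the empty string. Splitting on a delimiter is the more common case. Below is a minimal base-R sketch (the input strings and variable names are illustrative) showing that split is a regular expression unless fixed = TRUE, and that unlist() or lengths() help with the returned list.

```r
s <- c("a1,a2", "b1,b2,b3")
parts <- strsplit(s, split = ",")
# one list component per input string; components may differ in length
lens <- lengths(parts)   # 2 3
flat <- unlist(parts)    # "a1" "a2" "b1" "b2" "b3"

# split is a regular expression by default, so a character class splits on several delimiters
multi <- strsplit("a,b;c", split = "[,;]")[[1]]             # "a" "b" "c"
# fixed = TRUE makes "." a literal dot instead of "any character"
dots  <- strsplit("a.b.c", split = ".", fixed = TRUE)[[1]]  # "a" "b" "c"
```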
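Question: what do the [] and [[]] at the start of each output line mean?
Answer: strsplit() returns a list with one component per input string. [[1]] and [[2]] label the list components; [1] is the position, within that component's character vector, of the first element printed on the line. If a long vector wrapped onto several lines, each line would start with the index of its own first element.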
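The printed [[n]] and [n] labels match how the result is indexed in code. A small base-R sketch (names illustrative): [[ ]] extracts a list component, [ ] on a list returns a shorter list, and [ ] on the extracted character vector picks individual strings.

```r
res <- strsplit(c("a1", "a2"), split = "")

second_vec  <- res[[2]]      # the character vector itself: "a" "2"
second_list <- res[2]        # a list of length 1 still wrapping c("a", "2")
first_char  <- res[[2]][1]   # first element of that vector: "a"
```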
4 Upper and lower case
Function | Purpose |
---|---|
toupper(x) | convert to upper case |
tolower(x) | convert to lower case |
toupper(c("wo")) # returns "WO"
tolower("whNIL") # returns "whnil"
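A couple of related idioms, sketched with base R only (names illustrative): casefold() is a convenience wrapper around these two functions, and combining substr() with toupper() capitalises just the first letter. Characters that are not letters pass through unchanged.

```r
x <- c("howareyou", "fine", "thank")

# casefold() wraps toupper()/tolower()
up <- casefold(x, upper = TRUE)   # "HOWAREYOU" "FINE" "THANK"

# capitalise only the first letter of each string
cap <- x
substr(cap, 1, 1) <- toupper(substr(cap, 1, 1))   # "Howareyou" "Fine" "Thank"

# digits and punctuation are left as they are
mixed <- toupper("a1-b2")   # "A1-B2"
```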