strsplit(x, split, fixed=FALSE)
Split a character string or vector of character strings using a regular expression or a literal (fixed) string. The strsplit function outputs a list, where each list item corresponds to an element of x that has been split. In the simplest case, x is a single character string, and strsplit outputs a one-item list.
- x – A character string or vector of character strings to split.
- split – The character string at which to split x; it is interpreted as a regular expression unless fixed=TRUE. If split is an empty string (""), then x is split between every character.
- fixed – Whether split should be treated as a fixed (i.e., literal) string. The default is FALSE, which means that split is treated as a regular expression.
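As a minimal sketch of the points above (the variable names parts, regex_parts, and vec_parts are illustrative and not part of strsplit), the following shows that the result is a list even for a single string, that a vector input gives one list item per element, and how fixed changes the way a period is interpreted:

```r
# A single string still yields a list of length one;
# use [[1]] to get at the character vector inside it.
parts <- strsplit("a.b.c", ".", fixed = TRUE)
class(parts)   # "list"
parts[[1]]     # "a" "b" "c"

# With the default fixed = FALSE the period must be escaped
# to be matched literally.
regex_parts <- strsplit("a.b.c", "\\.")
identical(parts, regex_parts)   # TRUE

# Unescaped, the period matches every character, so only
# empty strings are left over.
strsplit("a.b.c", ".")[[1]]

# A vector input gives one list item per element of x.
vec_parts <- strsplit(c("a-b", "c-d-e"), "-")
lengths(vec_parts)   # 2 3
```

Escaping the period and setting fixed = TRUE give the same result here; fixed = TRUE is usually the simpler choice when no regular expression features are needed.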
Example. Several starter examples are shown below, followed by two more practical scenarios: dates are split into year, month, and day, and names in the form Last, First are split at the comma. Note that in a regular expression a period is a stand-in for "any character", so splitting on ". " also removes the character before each space unless fixed=TRUE is set.
> x <- "Split the words in a sentence."
> strsplit(x, " ")
[[1]]
[1] "Split" "the" "words" "in"
[5] "a" "sentence."
>
> x <- "Split at every character."
> strsplit(x, "")
[[1]]
[1] "S" "p" "l" "i" "t" " " "a" "t" " " "e" "v" "e" "r" "y"
[15] " " "c" "h" "a" "r" "a" "c" "t" "e" "r" "."
>
> x <- " Split at each space with a preceding character."
> strsplit(x, ". ")
[[1]]
[1] " Spli" "a" "eac" "spac"
[5] "wit" "" "precedin" "character."
>
> x <- "Do you wish you were Mr. Jones?"
> strsplit(x, ". ")
[[1]]
[1] "D" "yo" "wis" "yo" "wer" "Mr"
[7] "Jones?"
> strsplit(x, ". ", fixed=TRUE)
[[1]]
[1] "Do you wish you were Mr" "Jones?"
>
> #=====> Splitting Dates <=====#
> dates <- c("1999-05-23", "2001-12-30", "2004-12-17")
> temp <- strsplit(dates, "-")
> temp
[[1]]
[1] "1999" "05" "23"
[[2]]
[1] "2001" "12" "30"
[[3]]
[1] "2004" "12" "17"
> matrix(unlist(temp), ncol=3, byrow=TRUE)
     [,1]   [,2] [,3]
[1,] "1999" "05" "23"
[2,] "2001" "12" "30"
[3,] "2004" "12" "17"
>
> #=====> Cofounders of Google and Twitter <=====#
> Names <- c("Brin, Sergey", "Page, Larry",
+ "Dorsey, Jack", "Glass, Noah",
+ "Williams, Evan", "Stone, Biz")
> Cofounded <- rep(c("Google", "Twitter"), c(2,4))
> temp <- strsplit(Names, ", ")
> temp
[[1]]
[1] "Brin" "Sergey"
[[2]]
[1] "Page" "Larry"
[[3]]
[1] "Dorsey" "Jack"
[[4]]
[1] "Glass" "Noah"
[[5]]
[1] "Williams" "Evan"
[[6]]
[1] "Stone" "Biz"
> mat <- matrix(unlist(temp), ncol=2, byrow=TRUE)
> df <- as.data.frame(mat)
> df <- cbind(df, Cofounded)
> colnames(df) <- c("Last", "First", "Cofounded")
> df
      Last  First Cofounded
1     Brin Sergey    Google
2     Page  Larry    Google
3   Dorsey   Jack   Twitter
4    Glass   Noah   Twitter
5 Williams   Evan   Twitter
6    Stone    Biz   Twitter
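The matrix(unlist(temp), ...) pattern used in the two scenarios above can also be written in other ways. The sketch below is one possible alternative, not the only approach: it assumes every string splits into the same number of pieces, and the names pieces, years, date_mat, and date_df are illustrative.

```r
dates <- c("1999-05-23", "2001-12-30", "2004-12-17")
pieces <- strsplit(dates, "-")

# Pull out just the first piece (the year) of every element.
years <- sapply(pieces, `[`, 1)

# Stack the pieces row by row; this relies on every element
# of the list having the same length (3 here).
date_mat <- do.call(rbind, pieces)

# Convert the character pieces to integers in a data frame.
date_df <- data.frame(
  year  = as.integer(date_mat[, 1]),
  month = as.integer(date_mat[, 2]),
  day   = as.integer(date_mat[, 3])
)
date_df
```

If some strings could split into a different number of pieces, checking lengths(pieces) first would be a sensible precaution before building the matrix.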