4 String Functions
String functions allow us to combine, pattern-match, and substitute character vectors. These functions are useful for detecting and recoding specific values.
4.1 Concatenate Strings
There are two concatenation functions we can use: paste() and paste0(). The former separates the concatenated elements with a space by default (controlled by the sep argument), whereas the latter uses no separator.
paste('a', 'b')
## [1] "a b"
paste('a', 'b', sep = '-')
## [1] "a-b"
paste0('a', 'b')
## [1] "ab"
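Both functions are vectorised, and the collapse argument joins the elements of a result into a single string. A minimal sketch, where the vector name letters_abc and the values are illustrative only:

```r
# Illustrative input; the names and values are assumptions for this sketch
letters_abc <- c('a', 'b', 'c')

# Element-wise concatenation: one output element per input element
ids <- paste(letters_abc, 1:3, sep = '_')
ids
## [1] "a_1" "b_2" "c_3"

# collapse joins all elements into one string
joined <- paste(letters_abc, collapse = ', ')
joined
## [1] "a, b, c"
```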
4.2 Subset Strings
In Excel, we can subset strings with LEFT(), MID(), and RIGHT(). In R, we can subset strings with substr()/substring(), both of which behave much like Excel's MID().
x <- 'Albatross'
substr(x, 1, 4)
## [1] "Alba"
substring(x, 5) # Goes to the end by default
## [1] "tross"
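Base R has no direct LEFT() or RIGHT(), but both can be approximated by combining substr() with nchar(), which returns the number of characters in a string. A rough sketch, where n is an illustrative value:

```r
x <- 'Albatross'
n <- 3  # number of characters to keep (illustrative value)

# Roughly equivalent to Excel's LEFT(x, 3)
left_part <- substr(x, 1, n)
left_part
## [1] "Alb"

# Roughly equivalent to Excel's RIGHT(x, 3): start n - 1 characters before the end
right_part <- substr(x, nchar(x) - n + 1, nchar(x))
right_part
## [1] "oss"
```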
4.3 Split Strings
We can split strings with the strsplit() function. The output is a list with one element per input string, where each list element is a character vector.
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')
x
## [1] "This is a sentence." "This is another sentence."
## [3] "This is yet another sentence."
# Split vector elements by space
my_split <- strsplit(x, split = ' ')
# Output is a list
my_split
## [[1]]
## [1] "This" "is" "a" "sentence."
##
## [[2]]
## [1] "This" "is" "another" "sentence."
##
## [[3]]
## [1] "This" "is" "yet" "another" "sentence."
We can use do.call() and c() to combine these list elements into a single vector with a total of 13 elements. The function do.call() calls a function once, passing the elements of a list as its arguments, and c() (“combine”) combines its arguments into a vector.
do.call(c, my_split)
## [1] "This" "is" "a" "sentence." "This" "is"
## [7] "another" "sentence." "This" "is" "yet" "another"
## [13] "sentence."
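For a list of character vectors like this one, unlist() should give the same result as the do.call() approach, and lengths() shows how many pieces each string was split into. A short sketch that rebuilds the example so it runs on its own (the name words is illustrative):

```r
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')
my_split <- strsplit(x, split = ' ')

# Flatten the list into a single character vector
words <- unlist(my_split)
identical(words, do.call(c, my_split))
## [1] TRUE

# Number of words in each original string, and in total
lengths(my_split)
## [1] 4 4 5
length(words)
## [1] 13
```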
4.4 Substitute Strings
We can make character substitutions with gsub(), which replaces every match of a pattern.
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')
gsub('sentence', 'drink', x)
## [1] "This is a drink." "This is another drink."
## [3] "This is yet another drink."
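Two details are worth a quick illustration: sub() replaces only the first match in each string, and the pattern is treated as a regular expression unless fixed = TRUE is set. A small sketch, with an illustrative string y:

```r
y <- 'A sentence after a sentence.'  # illustrative input

# sub() replaces only the first match
first_only <- sub('sentence', 'drink', y)
first_only
## [1] "A drink after a sentence."

# gsub() replaces every match
all_matches <- gsub('sentence', 'drink', y)
all_matches
## [1] "A drink after a drink."

# fixed = TRUE treats '.' as a literal period rather than "any character"
literal <- gsub('.', '!', y, fixed = TRUE)
literal
## [1] "A sentence after a sentence!"
```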
4.5 Match String Patterns
We can pattern-match strings with grep() and grepl(). The former outputs the position (or, with value = TRUE, the value) of each pattern match, while the latter outputs a Boolean value (i.e. TRUE/FALSE) for every element.
# Cars that start with "M"
grep('^M', rownames(mtcars), value = TRUE)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Merc 240D" "Merc 230"
## [5] "Merc 280" "Merc 280C" "Merc 450SE" "Merc 450SL"
## [9] "Merc 450SLC" "Maserati Bora"
# Which cars start with and do not start with "M"?
grepl('^M', rownames(mtcars))
## [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# Selecting columns that start with "m".
# We set drop = FALSE to maintain a data frame.
head(mtcars[, grep('^m', names(mtcars)), drop = FALSE])
## mpg
## Mazda RX4 21.0
## Mazda RX4 Wag 21.0
## Datsun 710 22.8
## Hornet 4 Drive 21.4
## Hornet Sportabout 18.7
## Valiant 18.1
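Because grepl() returns one logical value per element, it pairs naturally with ifelse() for recoding values. A minimal sketch, assuming we want to flag the Mercedes models; the names car_names and make and the labels are illustrative:

```r
car_names <- rownames(mtcars)

# Recode each car as 'Mercedes' or 'Other' based on a pattern match
make <- ifelse(grepl('^Merc', car_names), 'Mercedes', 'Other')

# Count the cars recoded as Mercedes
sum(make == 'Mercedes')
## [1] 7
```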
Check out more regular expressions with RStudio’s cheat sheet on strings.
4.6 Summary
Function | Description | Example |
---|---|---|
paste(x, y)/paste0(x, y) | Concatenation of x and y. | paste('a', 'b'); paste0('a', 'b') |
substr(x, start, end) | Subset strings. | substr('Albatross', 1, 4) |
strsplit(x, split = ' ') | Split a string by a splitting character. | x <- c('This is a sentence.', 'This is another sentence.', 'This is yet another sentence.'); strsplit(x, split = ' ') |
gsub(pattern, replacement, x) | Substitute a portion of a string vector based on a given pattern. | gsub('sentence', 'drink', 'This is a sentence.') |
grep/grepl(pattern, vector) | Pattern match a string and output its position OR Boolean (i.e. TRUE/FALSE). | grep('^M', rownames(mtcars), value = TRUE) |