4 String Functions

String functions allow us to combine, pattern-match, and substitute character vectors. These functions are useful for detecting and recoding specific values.

4.1 Concatenate Strings

There are two concatenation functions we can use: paste() and paste0(). By default, paste() separates the concatenated elements with a space (sep = ' '), whereas paste0() uses no separator.

paste('a', 'b')

## [1] "a b"

paste('a', 'b', sep = '-')

## [1] "a-b"

paste0('a', 'b')

## [1] "ab"
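Both functions are also vectorized, which the single-letter examples above do not show. The following is a minimal sketch using two hypothetical vectors, first and second, that are not part of the original examples. It shows element-wise concatenation and the collapse argument, which joins the result into a single string.

```r
# Hypothetical example vectors (assumed for illustration only)
first  <- c('a', 'b', 'c')
second <- c('x', 'y', 'z')

# paste() works element by element on vectors
pairs <- paste(first, second, sep = '-')
pairs
## [1] "a-x" "b-y" "c-z"

# collapse joins all elements of the result into one string
joined <- paste0(first, collapse = ', ')
joined
## [1] "a, b, c"
```

Use sep to control what goes between the arguments and collapse to control what goes between the elements of the result.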

4.2 Subset Strings

In Excel, we can subset strings with LEFT(), MID(), and RIGHT(). In R, we can subset strings with substr() and substring(), both of which behave much like Excel's MID().

x <- 'Albatross'

substr(x, 1, 4)

## [1] "Alba"

substring(x, 5) # Goes to the end by default

## [1] "tross"
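Base R has no direct counterparts to LEFT() and RIGHT(), but both can be reproduced with substr() and nchar(). The sketch below is one possible approach, not the only one. The variable names n, left4, mid4, and right5 are assumptions made for illustration.

```r
x <- 'Albatross'
n <- nchar(x)  # number of characters in x (9)

# Like LEFT(x, 4): the first four characters
left4 <- substr(x, 1, 4)
left4
## [1] "Alba"

# Like MID(x, 3, 4): four characters starting at position 3.
# substr() takes an end position rather than a length.
mid4 <- substr(x, 3, 3 + 4 - 1)
mid4
## [1] "batr"

# Like RIGHT(x, 5): the last five characters
right5 <- substr(x, n - 5 + 1, n)
right5
## [1] "tross"
```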

4.3 Split Strings

We can split strings with the strsplit() function. The output is a list, where each list element is a character vector.

x <- c('This is a sentence.', 
       'This is another sentence.',
       'This is yet another sentence.')

x

## [1] "This is a sentence."           "This is another sentence."    
## [3] "This is yet another sentence."

# Split vector elements by space
my_split <- strsplit(x, split = ' ') 

# Output is a list
my_split

## [[1]]
## [1] "This"      "is"        "a"         "sentence."
## 
## [[2]]
## [1] "This"      "is"        "another"   "sentence."
## 
## [[3]]
## [1] "This"      "is"        "yet"       "another"   "sentence."

We can use do.call() and c() to combine these list elements into a single vector with a total of 13 elements. The function do.call() calls a function once, passing the elements of a list as its arguments, and c() (“combine”) combines its arguments into a vector.

do.call(c, my_split)

##  [1] "This"      "is"        "a"         "sentence." "This"      "is"       
##  [7] "another"   "sentence." "This"      "is"        "yet"       "another"  
## [13] "sentence."
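For a list of character vectors like this one, unlist() is a common alternative that should give the same flattened vector. The sketch below rebuilds my_split so that it runs on its own. The names words and words_per_sentence are assumptions made for illustration.

```r
x <- c('This is a sentence.',
       'This is another sentence.',
       'This is yet another sentence.')
my_split <- strsplit(x, split = ' ')

# unlist() flattens the list into a single character vector
words <- unlist(my_split)
length(words)
## [1] 13

# Same result as the do.call() approach
identical(words, do.call(c, my_split))
## [1] TRUE

# lengths() gives the number of words in each sentence
words_per_sentence <- lengths(my_split)
words_per_sentence
## [1] 4 4 5
```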

4.4 Substitute Strings

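We can make character substitutions with gsub(), which replaces every match of a pattern in each element of a character vector. The pattern is treated as a regular expression by default.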

x <- c('This is a sentence.', 
       'This is another sentence.',
       'This is yet another sentence.')

gsub('sentence', 'drink', x)

## [1] "This is a drink."           "This is another drink."    
## [3] "This is yet another drink."
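Two details of gsub() are easy to miss: its sibling sub() replaces only the first match, and special regular expression characters such as '.' need either escaping or fixed = TRUE to be matched literally. The sketch below illustrates both. The variable names are assumptions made for illustration.

```r
x <- 'This is a sentence.'

# sub() replaces only the first match; gsub() replaces all matches
first_only <- sub(' ', '_', x)
first_only
## [1] "This_is a sentence."

all_matches <- gsub(' ', '_', x)
all_matches
## [1] "This_is_a_sentence."

# In a regular expression, '.' matches any character,
# so this replaces every character in the string
every_char <- gsub('.', '!', x)

# To replace a literal period, escape it or use fixed = TRUE
escaped <- gsub('\\.', '!', x)
literal <- gsub('.', '!', x, fixed = TRUE)
literal
## [1] "This is a sentence!"
```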

4.5 Match String Patterns

We can pattern-match strings with grep() and grepl(). The former returns the positions (or, with value = TRUE, the values) of the elements that match a pattern, while the latter returns a logical vector (i.e. TRUE/FALSE) with one entry per element.

# Cars that start with "M"
grep('^M', rownames(mtcars), value = TRUE)

##  [1] "Mazda RX4"     "Mazda RX4 Wag" "Merc 240D"     "Merc 230"     
##  [5] "Merc 280"      "Merc 280C"     "Merc 450SE"    "Merc 450SL"   
##  [9] "Merc 450SLC"   "Maserati Bora"

# Which cars start with "M" (TRUE) and which do not (FALSE)?
grepl('^M', rownames(mtcars))

##  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

# Selecting columns that start with "m".
# We set drop = FALSE to maintain a data frame.
head(mtcars[, grep('^m', names(mtcars)), drop = FALSE])

##                    mpg
## Mazda RX4         21.0
## Mazda RX4 Wag     21.0
## Datsun 710        22.8
## Hornet 4 Drive    21.4
## Hornet Sportabout 18.7
## Valiant           18.1
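The examples above use value = TRUE, so it may help to see the default positional output of grep() as well, along with one way grepl() can be used to recode values. This is a minimal sketch. The variable names cars, m_positions, and make, and the labels 'Mercedes' and 'Other', are assumptions made for illustration.

```r
cars <- rownames(mtcars)

# Without value = TRUE, grep() returns the positions of the matches
m_positions <- grep('^M', cars)
m_positions
##  [1]  1  2  8  9 10 11 12 13 14 31

# grepl() pairs well with ifelse() for recoding:
# label each car by whether its name starts with "Merc"
make <- ifelse(grepl('^Merc', cars), 'Mercedes', 'Other')

# Number of cars recoded as "Mercedes"
sum(make == 'Mercedes')
## [1] 7
```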

Check out more regular expressions with RStudio’s cheat sheet on strings.

4.6 Summary

Table 4.1: Summary of String Functions
Function	Description	Example
paste(x, y)/paste0(x, y)	Concatenation of x and y.	paste('a', 'b'); paste0('a', 'b')
substr(x, start, end)	Subset strings.	substr('Albatross', 1, 4)
strsplit(x, split = ' ')	Split a string by a splitting character.	x <- c('This is a sentence.', 'This is another sentence.', 'This is yet another sentence.'); strsplit(x, split = ' ')
gsub(pattern, replacement, x)	Substitute a portion of a string vector based on a given pattern.	gsub('sentence', 'drink', 'This is a sentence.')
grep/grepl(pattern, vector)	Pattern match a string and output its position OR Boolean (i.e. TRUE/FALSE).	grep('^M', rownames(mtcars), value = TRUE)