9 Functionals
Functionals are functions that take a function as an input and output a value. They are useful for casting a function over all columns in a dataset or elements in a list.
This chapter will demonstrate a select handful of functionals–see ?lapply for more information.
9.1 lapply()
The lapply() function (“list apply”) casts a function over an object, such as a dataset or list, and outputs a list. This function is useful when you want to iterate over disparate elements and output similarly disparate results.
Below is a simple example of calculating the means for each column in the mtcars dataset. Note how the elements are not stored in a one- nor two-dimensional format like a vector or data frame.
# Means for each column in mtcars.
lapply(mtcars, mean)## $mpg
## [1] 20.09062
##
## $cyl
## [1] 6.1875
##
## $disp
## [1] 230.7219
##
## $hp
## [1] 146.6875
##
## $drat
## [1] 3.596563
##
## $wt
## [1] 3.21725
##
## $qsec
## [1] 17.84875
##
## $vs
## [1] 0.4375
##
## $am
## [1] 0.40625
##
## $gear
## [1] 3.6875
##
## $carb
## [1] 2.8125
For a more complex example, we can use lapply() to estimate several models based on different subsets of the same dataset. First, we’ll use split() to divide mtcars into smaller subsets.
# Split mtcars by gear
# i.e. each subset is based on a different number of gears.
subset_list <- split(mtcars, mtcars$gear)
subset_list## $`3`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
##
## $`4`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
##
## $`5`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
Then, we will execute lapply() over the subsets to estimate the same model. Unlike before in which we can simply input mean into lapply(), we have to tell R that we want to use a customized function by inputting function(x) instead–in other words, we have to pass in an anonymous function, which is a function that is not named beforehand.
# Estimate models
models <- lapply(subset_list, function(x) lm(mpg ~ wt + hp + disp, x))
models## $`3`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = x)
##
## Coefficients:
## (Intercept) wt hp disp
## 29.496821 -2.312668 -0.030449 0.002989
##
##
## $`4`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = x)
##
## Coefficients:
## (Intercept) wt hp disp
## 41.75225 -0.16968 -0.08850 -0.07198
##
##
## $`5`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = x)
##
## Coefficients:
## (Intercept) wt hp disp
## 42.47699 -7.99454 0.01085 -0.01073
Note that the x represents each dataset in subset_list. We can easily replace x with y and receive the same results. This is because x is merely a placeholder that represents each list element in subset_list. To demonstrate, below is the same as before but with y as the input for function().
# Same results as before
models <- lapply(subset_list, function(y) lm(mpg ~ wt + hp + disp, y))
models## $`3`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = y)
##
## Coefficients:
## (Intercept) wt hp disp
## 29.496821 -2.312668 -0.030449 0.002989
##
##
## $`4`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = y)
##
## Coefficients:
## (Intercept) wt hp disp
## 41.75225 -0.16968 -0.08850 -0.07198
##
##
## $`5`
##
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = y)
##
## Coefficients:
## (Intercept) wt hp disp
## 42.47699 -7.99454 0.01085 -0.01073
9.2 sapply()
The sapply() function (“simplified apply”) casts a function over a dataset and outputs a vector or matrix (or list, depending on the function). This function can be useful when you want a “cleaner” representation of the results (i.e. results in a vector or matrix format).
sapply(mtcars, mean)## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
As with lapply(), we can estimate several models iteratively with sapply(); however, the difference is that we can store coefficients in a matrix with the latter.
# Split mtcars by gear
# i.e. each subset is based on a different number of gears.
subset_list <- split(mtcars, mtcars$gear)
coefs <- sapply(subset_list, function(x) coef(lm(mpg ~ wt + hp + disp, x)))
coefs## 3 4 5
## (Intercept) 29.496821154 41.75225248 42.47698779
## wt -2.312668246 -0.16968386 -7.99453576
## hp -0.030449156 -0.08849952 0.01085093
## disp 0.002988812 -0.07197566 -0.01073200
9.3 apply()
The apply() function can cast a function over a dataset row-wise or column-wise, returning a vector or matrix. This function is useful when you want to apply a function over a specific dimension.
Below is an example of using this function to apply a function row-wise on the mtcars dataset.
# Row-wise means.
# Show only a few with head().
head(apply(mtcars, 1, mean))## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 29.90727 29.98136 23.59818 38.73955
## Hornet Sportabout Valiant
## 53.66455 35.04909
Again, but applying a function column-wise. Note that the results are similar to sapply(mtcars, mean).
# Column-wise means.
apply(mtcars, 2, mean)## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
9.4 vapply()
The vapply() function (“vectorized apply”) works similarly as sapply(); however, there is a type-checking component to it. In other words, one can set whether the output should be numeric or character, for example, beforehand. If the output does not match the set type, an error will occur. This function is useful for type-checking your results (i.e., making sure the output matches your expectations).
# Mean of all mtcars columns
# Type-check whether it is a numeric vector.
vapply(mtcars, mean, numeric(1))## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
Below is an example when vapply() throws an error due to an unexpected output type.
# Mean of all mtcars columns
# Type-check whether it is a character vector.
vapply(mtcars, mean, character(1))## Error in vapply(mtcars, mean, character(1)): values must be type 'character',
## but FUN(X[[1]]) result is type 'double'
9.5 mapply()/Map()
The functions mapply() and Map() allow us to compute a function iteratively over one or more data inputs. These functions are useful when we want to iterate over multiple datasets in a pairwise fashion.
9.5.1 Univariate Case
In the univariate case, mapply()/Map() work similarly as sapply()/lapply().
mapply(mean, mtcars)## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
head(Map(mean, mtcars)) # Just show a few.## $mpg
## [1] 20.09062
##
## $cyl
## [1] 6.1875
##
## $disp
## [1] 230.7219
##
## $hp
## [1] 146.6875
##
## $drat
## [1] 3.596563
##
## $wt
## [1] 3.21725
9.5.2 Multivariate Case
In the multivariate case, we can have multiple data inputs.
# Row bind mpg and wt from mtcars.
# Output = matrix
# Show only a few columns.
mapply(rbind, mtcars$mpg, mtcars$wt)[, 1:5]## [,1] [,2] [,3] [,4] [,5]
## [1,] 21.00 21.000 22.80 21.400 18.70
## [2,] 2.62 2.875 2.32 3.215 3.44
# Row bind mpg and wt from mtcars.
# Output = list.
# Show only a few rows.
head(Map(rbind, mtcars$mpg, mtcars$wt))## [[1]]
## [,1]
## [1,] 21.00
## [2,] 2.62
##
## [[2]]
## [,1]
## [1,] 21.000
## [2,] 2.875
##
## [[3]]
## [,1]
## [1,] 22.80
## [2,] 2.32
##
## [[4]]
## [,1]
## [1,] 21.400
## [2,] 3.215
##
## [[5]]
## [,1]
## [1,] 18.70
## [2,] 3.44
##
## [[6]]
## [,1]
## [1,] 18.10
## [2,] 3.46
9.6 rapply()
The rapply() function allows one to iterate over a list of datasets recursively. In effect, this function is useful when we want to execute a function over elements nested within other elements. For example, it allows us to compute the means for all columns in several datasets stored in a list simultaneously.
my_list <- list(mtcars, airquality, iris)
rapply(my_list, # For this list...
# Get all means...
mean,
# Remove missing values...
na.rm = TRUE,
# Calculate only for numeric columns
classes = 'numeric') ## mpg cyl disp hp drat wt
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
## qsec vs am gear carb Wind
## 17.848750 0.437500 0.406250 3.687500 2.812500 9.957516
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
9.7 tapply()
The function tapply() makes group-wise computations, outputting a vector as a result. The output being a vector can be useful when passing to other functions, such as barplot(). As such, you may want to use tapply() when (1) you want your grouped-computation output to be a vector of values and (2) you want to interact the output values with another function.
# Let's use iris, a pre-loaded dataset in R.
means <- with(iris, tapply(Sepal.Length, Species, mean))
means## setosa versicolor virginica
## 5.006 5.936 6.588
barplot(means, col = 'cyan4', ylab = 'Mean Sepal Length', xlab = 'Species')
9.8 aggregate()
Similar to tapply(), the function aggregate() allows you to make group-wise calculations; however, the output is a data frame rather than a vector. Additionally, you can input multiple independent variables (i.e. variables on the right-hand side of the formula syntax, y ~ x). This function may be preferred over tapply() when (1) you want multiple grouping variables and (2) you want your output to be in a 2-dimensional format.
# Get the mean MPG by gear and am.
my_agg <- aggregate(mpg ~ gear + am, mtcars, mean)
my_agg## gear am mpg
## 1 3 0 16.10667
## 2 4 0 21.05000
## 3 4 1 26.27500
## 4 5 1 21.38000
9.9 Summary
| Function | Description | Example |
|---|---|---|
| lapply(X, FUN) | Compute a function over data and output a list. | lapply(mtcars, mean) |
| sapply(X, FUN) | Compute a function over data and output a matrix (sometimes a list, depending on the function being passed). | sapply(mtcars, mean) |
| apply(X, MARGIN, FUN) | Compute a function row-wise or column-wise. | apply(mtcars, 1, mean); apply(mtcars, 2, mean) |
| vapply(X, FUN, FUN.VALUE) | Compute a function over data and check if the output matches a pre-specified type. | vapply(mtcars, mean, numeric(1)) |
| mapply(FUN, …) | Compute a function over one or more data inputs and output an array (vector or matrix). | mapply(rbind, mtcars$mpg, mtcars$wt) |
| Map(f, …) | Compute a function over one or more data inputs and output a list. | Map(rbind, mtcars$mpg, mtcars$wt) |
| rapply(object, f, classes) | Recursively compute a function over data and output a vector or list. | rapply(iris, mean, classes = “numeric”) |
| tapply(X, INDEX, FUN) | Generate grouped computations and output a vector. | with(iris, tapply(Sepal.Length, Species, mean)) |
| aggregate(formula, data, FUN) | Generate grouped computations and output a data frame. | aggregate(mpg ~ gear, mtcars, mean) |