13 Recommended R Libraries

The following is a list of recommended R libraries to install–they can be helpful for data management, graphing, and formatting.

13.1 `tidyverse`

The tidyverse package is a metapackage consisting of other libraries. The most useful ones for a beginner, I believe, are ggplot2, dplyr, tidyr, and purrr.

For more information, see the tidyverse website.

install.packages('tidyverse')

library(tidyverse)

13.1.1 `ggplot2`

The library ggplot2 offers visualization tools with a modern aesthetic. The following is an example of a small-multiples³ scatter plot. For more information, see the ggplot2 website.

ggplot(mtcars) + 
  aes(y = mpg, x = wt, col = factor(am), size = hp) + 
  geom_point(alpha = 0.5) + 
  labs(y     = 'MPG',
       x     = 'Weight',
       col   = 'Transmission',
       size  = 'Horsepower',
       title = 'MPG vs. Weight') +
  facet_wrap(~ gear) + 
  theme_light() + 
  theme(panel.grid.minor   = element_blank(),
        panel.grid.major.x = element_blank())

13.1.2 `dplyr`

The dplyr library provides aggregation tools for data management. The following is an example of calculating the mean and median MPG by gear.

For more information, see the dplyr website.

my_agg <- mtcars %>%
  select(mpg, gear) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg),
            median_mpg = median(mpg))

my_agg

## # A tibble: 3 x 3
##    gear mean_mpg median_mpg
##   <dbl>    <dbl>      <dbl>
## 1     3     16.1       15.5
## 2     4     24.5       22.8
## 3     5     21.4       19.7

13.1.3 `tidyr`

The tidyr library provides pivoting tools to reshape your dataset. The following are examples of how to reformat an aggregation from dplyr’s functions.

For more information, see the tidyr website.

# Aggregation
my_agg <- mtcars %>%
  select(mpg, gear, am) %>%
  group_by(gear, am) %>%
  summarise(mean_mpg = mean(mpg))

## `summarise()` has grouped output by 'gear'. You can override using the `.groups` argument.

# Pivot wide
my_agg2 <- my_agg %>%
  pivot_wider(id_cols     = gear, # rows
              names_from  = am, # columns
              values_from = mean_mpg) # values

my_agg2

## # A tibble: 3 x 3
## # Groups:   gear [3]
##    gear   `0`   `1`
##   <dbl> <dbl> <dbl>
## 1     3  16.1  NA  
## 2     4  21.0  26.3
## 3     5  NA    21.4

# Pivot long
my_agg2 %>%
  pivot_longer(2:3,
               names_to  = 'am',
               values_to = 'mpg',
               values_drop_na = TRUE) # drop NA values

## # A tibble: 4 x 3
## # Groups:   gear [3]
##    gear am      mpg
##   <dbl> <chr> <dbl>
## 1     3 0      16.1
## 2     4 0      21.0
## 3     4 1      26.3
## 4     5 1      21.4

13.1.4 `purrr`

The purrr library offers functionals similar to the *apply() functions (the former’s map() operates similarly as the latter’s lapply()); however, the former contains functions that maintain type consistency. For example, there is a function called map_dbl() that throws an error if the output is not a double vector (i.e., a numeric vector), which is useful when you want to catch your program’s errors.

The following are some examples from purrr. For more information on how to use these and other functions within the library, see the purrr website.

map(mtcars, mean) # == lapply(mtcars, mean)

## $mpg
## [1] 20.09062
## 
## $cyl
## [1] 6.1875
## 
## $disp
## [1] 230.7219
## 
## $hp
## [1] 146.6875
## 
## $drat
## [1] 3.596563
## 
## $wt
## [1] 3.21725
## 
## $qsec
## [1] 17.84875
## 
## $vs
## [1] 0.4375
## 
## $am
## [1] 0.40625
## 
## $gear
## [1] 3.6875
## 
## $carb
## [1] 2.8125

map_dbl(mtcars, mean) # == sapply(mtcars, mean)

##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500

map_df(mtcars, mean) # Maintains data frame class.

## # A tibble: 1 x 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  20.1  6.19  231.  147.  3.60  3.22  17.8 0.438 0.406  3.69  2.81

Below is an example of combining purrr, dplyr and ggplot2 to compare the scaled coefficients from several models using different subsets of mtcars.

# Split the dataset and scale the coefficients.
scaled <- mtcars %>%
  split(.$gear) %>%
  map(~ as.data.frame(scale(.x)))

map(scaled, head) # see 1st few rows for each subset.

## $`3`
##                           mpg        cyl       disp          hp        drat
## Hornet 4 Drive     1.56996818 -1.2353648 -0.7200636 -1.38675494 -0.19244960
## Hornet Sportabout  0.76916577  0.4492236  0.3552876 -0.02376495  0.06333784
## Valiant            0.59120968 -1.2353648 -1.0679713 -1.49160032 -1.36176363
## Duster 360        -0.53584556  0.4492236  0.3552876  1.44407041  0.28258422
## Merc 450SE         0.08700076  0.4492236 -0.5324043  0.08108043 -0.22899066
## Merc 450SL         0.35393489  0.4492236 -0.5324043  0.08108043 -0.22899066
##                           wt        qsec         vs  am gear       carb
## Hornet 4 Drive    -0.8134523  1.29489500  1.9321836 NaN  NaN -1.4182716
## Hornet Sportabout -0.5433420 -0.49780861 -0.4830459 NaN  NaN -0.5673086
## Valiant           -0.5193322  1.87270856  1.9321836 NaN  NaN -1.4182716
## Duster 360        -0.3872782 -1.37193681 -0.4830459 NaN  NaN  1.1346173
## Merc 450SE         0.2129670 -0.21630969 -0.4830459 NaN  NaN  0.2836543
## Merc 450SL        -0.1951998 -0.06815237 -0.4830459 NaN  NaN  0.2836543
## 
## $`4`
##                       mpg        cyl       disp         hp       drat
## Mazda RX4     -0.66960225  1.3540064  0.9505021  0.7917156 -0.4588273
## Mazda RX4 Wag -0.66960225  1.3540064  0.9505021  0.7917156 -0.4588273
## Datsun 710    -0.32848412 -0.6770032 -0.3859407  0.1351710 -0.6188834
## Merc 240D     -0.02526801 -0.6770032  0.6086811 -1.0620575 -1.1310627
## Merc 230      -0.32848412 -0.6770032  0.4570463  0.2124115 -0.3948049
## Merc 280      -1.01072038  1.3540064  1.1458283  1.2937791 -0.3948049
##                         wt       qsec         vs         am gear       carb
## Mazda RX4      0.005268687 -1.5521599 -2.1408721  0.6770032  NaN  1.2794158
## Mazda RX4 Wag  0.408323233 -1.2051700 -2.1408721  0.6770032  NaN  1.2794158
## Datsun 710    -0.468913132 -0.2199668  0.4281744  0.6770032  NaN -1.0235326
## Merc 240D      0.906214143  0.6413116  0.4281744 -1.3540064  NaN -0.2558832
## Merc 230       0.842989901  2.4382232  0.4281744 -1.3540064  NaN -0.2558832
## Merc 280       1.301365659 -0.4120504  0.4281744 -1.3540064  NaN  1.2794158
## 
## $`5`
##                       mpg cyl       disp         hp       drat         wt
## Porsche 914-2   0.6938001  -1 -0.7115728 -1.0171748  1.3195547 -0.6015200
## Lotus Europa    1.3545620  -1 -0.9297723 -0.8032375 -0.3748151 -1.3671575
## Ford Pantera L -0.8379663   1  1.2859917  0.6651507  0.7804370  0.6562258
## Ferrari Dino   -0.2522909   0 -0.4977027 -0.2003231 -0.7598992  0.1677809
## Maserati Bora  -0.9581049   1  0.8530562  1.3555848 -0.9652774  1.1446708
##                      qsec         vs  am gear      carb
## Porsche 914-2   0.9376493 -0.4472136 NaN  NaN -0.920358
## Lotus Europa    1.1145643  1.7888544 NaN  NaN -0.920358
## Ford Pantera L -1.0084153 -0.4472136 NaN  NaN -0.153393
## Ferrari Dino   -0.1238405 -0.4472136 NaN  NaN  0.613572
## Maserati Bora  -0.9199578 -0.4472136 NaN  NaN  1.380537

# Obtain the coefficients
coefs <- scaled %>%
  map(~ lm(mpg ~ wt + hp + disp, .x)) %>%
  map(~ coef(summary(.x))) 

coefs

## $`3`
##                  Estimate Std. Error       t value  Pr(>|t|)
## (Intercept)  2.525720e-16  0.1587713  1.590791e-15 1.0000000
## wt          -5.713684e-01  0.3115032 -1.834230e+00 0.0937885
## hp          -4.306828e-01  0.2493851 -1.726979e+00 0.1121072
## disp         8.408336e-02  0.3589857  2.342248e-01 0.8191129
## 
## $`4`
##                  Estimate Std. Error       t value   Pr(>|t|)
## (Intercept)  2.970767e-16  0.1187895  2.500866e-15 1.00000000
## wt          -2.034460e-02  0.3011412 -6.755835e-02 0.94779518
## hp          -4.342680e-01  0.2117560 -2.050795e+00 0.07441627
## disp        -5.307267e-01  0.3487121 -1.521962e+00 0.16651485
## 
## $`5`
##                  Estimate Std. Error       t value  Pr(>|t|)
## (Intercept)  3.863715e-17  0.1057642  3.653139e-16 1.0000000
## wt          -9.831730e-01  0.3021465 -3.253961e+00 0.1898125
## hp           1.675696e-01  0.3475996  4.820766e-01 0.7140264
## disp        -1.861314e-01  0.2721580 -6.839094e-01 0.6181268

# Create columns for the variables and 
# indicators for the subsets.
coefs2 <- coefs %>%
  map(as.data.frame) %>%
  map(~ mutate(.x, Term = rownames(.x))) %>%
  map2_df(3:5, ~ mutate(.x, Gear = .y)) %>%
  select(Gear, Term, everything())

coefs2

##                 Gear        Term      Estimate Std. Error       t value
## (Intercept)...1    3 (Intercept)  2.525720e-16  0.1587713  1.590791e-15
## wt...2             3          wt -5.713684e-01  0.3115032 -1.834230e+00
## hp...3             3          hp -4.306828e-01  0.2493851 -1.726979e+00
## disp...4           3        disp  8.408336e-02  0.3589857  2.342248e-01
## (Intercept)...5    4 (Intercept)  2.970767e-16  0.1187895  2.500866e-15
## wt...6             4          wt -2.034460e-02  0.3011412 -6.755835e-02
## hp...7             4          hp -4.342680e-01  0.2117560 -2.050795e+00
## disp...8           4        disp -5.307267e-01  0.3487121 -1.521962e+00
## (Intercept)...9    5 (Intercept)  3.863715e-17  0.1057642  3.653139e-16
## wt...10            5          wt -9.831730e-01  0.3021465 -3.253961e+00
## hp...11            5          hp  1.675696e-01  0.3475996  4.820766e-01
## disp...12          5        disp -1.861314e-01  0.2721580 -6.839094e-01
##                   Pr(>|t|)
## (Intercept)...1 1.00000000
## wt...2          0.09378850
## hp...3          0.11210723
## disp...4        0.81911285
## (Intercept)...5 1.00000000
## wt...6          0.94779518
## hp...7          0.07441627
## disp...8        0.16651485
## (Intercept)...9 1.00000000
## wt...10         0.18981249
## hp...11         0.71402636
## disp...12       0.61812678

# Plot the scaled coefficients.
## Set the canvas.
ggplot(coefs2) + 
## Set the aesthetics.  
  aes(y = Estimate,
      x = Term,
      fill = factor(Gear)) + # Different bars for different gears.
## Create the bars.  
  geom_col(position = 'dodge', # Set bars beside each other.
           alpha = 0.8) + # Set transparency.
## Color the bars.
   scale_fill_brewer(palette = 'Blues') +
## Relabel y-axis and other labels.  
  labs(y = 'Coefficient (Scaled)',
       fill = 'Number of Gears',
       title = 'Scaled Coefficient Comparisons by Number of Gears') + 
## Customize the background.  
  theme_light() + 
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

13.2 `knitr`

The knitr library is an “engine for dynamic report generation,” which allows for better formatted tables and documentation capabilities when using R Markdown.⁴ The following example demonstrates kable() to format a table.

install.packages('knitr')

library(knitr)

my_table <- with(mtcars, table(gear, am))

kable(my_table)

	0	1
3	15	0
4	4	8
5	0	5

13.3 `stargazer`

The stargazer library allows one to format a regression model to be closer to journal-quality guidelines.

For more information, see its documentation on CRAN.

install.packages('stargazer')

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

my_ols <- lm(mpg ~ wt + hp + disp + gear + am, mtcars)

If you are using RGui or R Studio and not R Markdown, I recommend to set type = 'text' so that only textual output will be produced instead of LaTeX or HTML code.

# If NOT using R Markdown...
stargazer(my_ols, type = 'text')

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 mpg            
## -----------------------------------------------
## wt                           -3.113**          
##                               (1.179)          
##                                                
## hp                           -0.043***         
##                               (0.014)          
##                                                
## disp                           0.005           
##                               (0.012)          
##                                                
## gear                           0.652           
##                               (1.212)          
##                                                
## am                             1.605           
##                               (1.782)          
##                                                
## Constant                     32.108***         
##                               (4.844)          
##                                                
## -----------------------------------------------
## Observations                    32             
## R2                             0.842           
## Adjusted R2                    0.812           
## Residual Std. Error       2.616 (df = 26)      
## F Statistic           27.709*** (df = 5; 26)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

If you happen to use R Markdown, then set type = 'html' for HTML documents and omit type for PDF documents.

# If using RMarkdown...
stargazer(my_ols, 
          type = 'html', # for html documents.
          # Set asterisks for each p-value level.
          notes.append = FALSE,
          notes = c("<sup>&sstarf;</sup>p<0.1; <sup>&sstarf;&sstarf;</sup>p<0.05; <sup>&sstarf;&sstarf;&sstarf;</sup>p<0.01"))


	Dependent variable:

	mpg

wt	-3.113^**
	(1.179)

hp	-0.043^***
	(0.014)

disp	0.005
	(0.012)

gear	0.652
	(1.212)

am	1.605
	(1.782)

Constant	32.108^***
	(4.844)


Observations	32
R²	0.842
Adjusted R²	0.812
Residual Std. Error	2.616 (df = 26)
F Statistic	27.709^*** (df = 5; 26)

Note:	^⋆p<0.1; ^⋆⋆p<0.05; ^⋆⋆⋆p<0.01

#stargazer(my_ols) # for PDF documents.

We can also input several models into stargazer().

mtcars %>%
  split(.$gear) %>%
  map(~ lm(mpg ~ wt + hp + disp, .x)) %>%
  stargazer(type = 'html', 
            column.labels = paste(3:5, 'Gears'),
            notes.append = FALSE,
            notes = c("<sup>&sstarf;</sup>p<0.1; <sup>&sstarf;&sstarf;</sup>p<0.05; <sup>&sstarf;&sstarf;&sstarf;</sup>p<0.01"))


	Dependent variable:

	mpg
	3 Gears	4 Gears	5 Gears
	(1)	(2)	(3)

wt	-2.313^*	-0.170	-7.995
	(1.261)	(2.512)	(2.457)

hp	-0.030	-0.088^*	0.011
	(0.018)	(0.043)	(0.023)

disp	0.003	-0.072	-0.011
	(0.013)	(0.047)	(0.016)

Constant	29.497^***	41.752^***	42.477^*
	(2.838)	(3.117)	(3.377)


Observations	15	12	5
R²	0.703	0.877	0.986
Adjusted R²	0.622	0.831	0.944
Residual Std. Error	2.073 (df = 11)	2.171 (df = 8)	1.575 (df = 1)
F Statistic	8.675^*** (df = 3; 11)	18.987^*** (df = 3; 8)	23.506 (df = 3; 1)

Note:	^⋆p<0.1; ^⋆⋆p<0.05; ^⋆⋆⋆p<0.01

For more on R Markdown, see the R Markdown book by Yihui Xie, J. J. Allaire, and Garrett Grolemund.

13.4 Summary

Table 13.1: Summary of Recommended Libraries
Library	Function	Description	Example
ggplot2	ggplot(data) + aes(y, x, …) + geom_point()	Scatter plot with ggplot2.	ggplot(mtcars) + aes(y = mpg, x = wt, col = factor(am), size = hp) + geom_point(alpha = 0.5)
dplyr	select(data, …), group_by(data, data…), summarise(data, …)	Select, group by, and summarise data.	mtcars %>% select(mpg, gear) %>% group_by(gear) %>% summarise(mean_mpg = mean(mpg), median_mpg = median(mpg))
tidyr	pivot_wider(data, …), pivot_longer(data, …)	Pivot data long or wide.	my_agg <- mtcars %>% select(mpg, gear, am) %>% group_by(gear, am) %>% summarise(mean_mpg = mean(mpg)) my_agg2 <- my_agg %>% pivot_wider(id_cols = gear, names_from = am, values_from = mean_mpg)
purrr	map(.x, .f)	Apply a function over a data’s elements iteratively.	map(mtcars, mean)
knitr	kable(x)	Format a table.	my_table <- with(mtcars, table(gear, am)) kable(my_table)
stargazer	stargazer(x)	Format a regression.	my_ols <- lm(mpg ~ wt + hp + disp + gear + am, mtcars) stargazer(my_ols, type = ‘text’)

13 Recommended R Libraries

13.1 tidyverse

13.1.1 ggplot2

13.1.2 dplyr

13.1.3 tidyr

13.1.4 purrr

13.2 knitr

13.3 stargazer