Using across function in R | dplyr package
AGRON Info-Tech
July 13, 2021
Introduction
The across() function in the dplyr package lets you use select() semantics inside "data-masking" functions such as summarise() and mutate(), so that the same transformation can be applied to several columns at once. By constraining your options, dplyr makes these operations quick and straightforward, and it helps you think about your data manipulation problems.
The across() function supersedes the older scoped variants such as summarise_at(), summarise_if() and summarise_all().
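As a rough illustration of what superseding means in practice, the sketch below computes the mean of every numeric column of the built-in iris data twice: once with the scoped variant summarise_if() and once with across(). The object names old_style and new_style are only illustrative, and the example assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Superseded scoped variant: summarise every numeric column
old_style <- iris %>%
  summarise_if(is.numeric, mean)

# across() equivalent: where() selects columns by a predicate
new_style <- iris %>%
  summarise(across(where(is.numeric), mean))

# Compare the two results
all.equal(old_style, new_style)
```

Both calls return one row with the four numeric columns; only the way the columns are selected differs.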
Let’s look at some examples of how to apply the across() function and how it may be used to change data.
Load the dplyr package with the library() or require() function.
library(dplyr)
We shall use the iris data set to demonstrate across() and to modify the data as required. The head() function prints the first six rows of the data set.
data("iris")
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
Data modification across multiple columns
The .cols argument specifies the columns to transform, while the .fns argument specifies the function or functions to apply to each selected column. For example, to round the sepal length and sepal width values to the nearest whole number, we can use the following code.
iris %>%
  mutate(
    across(.cols = c(Sepal.Length, Sepal.Width),
           .fns = round)
  )
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5 4 1.4 0.2 setosa
# 2 5 3 1.4 0.2 setosa
# 3 5 3 1.3 0.2 setosa
# 4 5 3 1.5 0.2 setosa
# 5 5 4 1.4 0.2 setosa
# 6 5 4 1.7 0.4 setosa
# 7 5 3 1.4 0.3 setosa
# 8 5 3 1.5 0.2 setosa
# 9 4 3 1.4 0.2 setosa
# 10 5 3 1.5 0.1 setosa
# 11 5 4 1.5 0.2 setosa
# 12 5 3 1.6 0.2 setosa
# 13 5 3 1.4 0.1 setosa
# 14 4 3 1.1 0.1 setosa
# 15 6 4 1.2 0.2 setosa
# 16 6 4 1.5 0.4 setosa
# 17 5 4 1.3 0.4 setosa
# 18 5 4 1.4 0.3 setosa
# 19 6 4 1.7 0.3 setosa
# 20 5 4 1.5 0.3 setosa
# 21 5 3 1.7 0.2 setosa
# 22 5 4 1.5 0.4 setosa
# 23 5 4 1.0 0.2 setosa
# 24 5 3 1.7 0.5 setosa
# 25 5 3 1.9 0.2 setosa
# 26 5 3 1.6 0.2 setosa
# 27 5 3 1.6 0.4 setosa
# 28 5 4 1.5 0.2 setosa
# 29 5 3 1.4 0.2 setosa
# 30 5 3 1.6 0.2 setosa
# 31 5 3 1.6 0.2 setosa
# 32 5 3 1.5 0.4 setosa
# 33 5 4 1.5 0.1 setosa
# 34 6 4 1.4 0.2 setosa
# 35 5 3 1.5 0.2 setosa
# 36 5 3 1.2 0.2 setosa
# 37 6 4 1.3 0.2 setosa
# 38 5 4 1.4 0.1 setosa
# 39 4 3 1.3 0.2 setosa
# 40 5 3 1.5 0.2 setosa
# 41 5 4 1.3 0.3 setosa
# 42 4 2 1.3 0.3 setosa
# 43 4 3 1.3 0.2 setosa
# 44 5 4 1.6 0.6 setosa
# 45 5 4 1.9 0.4 setosa
# 46 5 3 1.4 0.3 setosa
# 47 5 4 1.6 0.2 setosa
# 48 5 3 1.4 0.2 setosa
# 49 5 4 1.5 0.2 setosa
# 50 5 3 1.4 0.2 setosa
# 51 7 3 4.7 1.4 versicolor
# 52 6 3 4.5 1.5 versicolor
# 53 7 3 4.9 1.5 versicolor
# 54 6 2 4.0 1.3 versicolor
# 55 6 3 4.6 1.5 versicolor
# 56 6 3 4.5 1.3 versicolor
# 57 6 3 4.7 1.6 versicolor
# 58 5 2 3.3 1.0 versicolor
# 59 7 3 4.6 1.3 versicolor
# 60 5 3 3.9 1.4 versicolor
# 61 5 2 3.5 1.0 versicolor
# 62 6 3 4.2 1.5 versicolor
# 63 6 2 4.0 1.0 versicolor
# 64 6 3 4.7 1.4 versicolor
# 65 6 3 3.6 1.3 versicolor
# 66 7 3 4.4 1.4 versicolor
# 67 6 3 4.5 1.5 versicolor
# 68 6 3 4.1 1.0 versicolor
# 69 6 2 4.5 1.5 versicolor
# 70 6 2 3.9 1.1 versicolor
# 71 6 3 4.8 1.8 versicolor
# 72 6 3 4.0 1.3 versicolor
# 73 6 2 4.9 1.5 versicolor
# 74 6 3 4.7 1.2 versicolor
# 75 6 3 4.3 1.3 versicolor
# 76 7 3 4.4 1.4 versicolor
# 77 7 3 4.8 1.4 versicolor
# 78 7 3 5.0 1.7 versicolor
# 79 6 3 4.5 1.5 versicolor
# 80 6 3 3.5 1.0 versicolor
# 81 6 2 3.8 1.1 versicolor
# 82 6 2 3.7 1.0 versicolor
# 83 6 3 3.9 1.2 versicolor
# 84 6 3 5.1 1.6 versicolor
# 85 5 3 4.5 1.5 versicolor
# 86 6 3 4.5 1.6 versicolor
# 87 7 3 4.7 1.5 versicolor
# 88 6 2 4.4 1.3 versicolor
# 89 6 3 4.1 1.3 versicolor
# 90 6 2 4.0 1.3 versicolor
# 91 6 3 4.4 1.2 versicolor
# 92 6 3 4.6 1.4 versicolor
# 93 6 3 4.0 1.2 versicolor
# 94 5 2 3.3 1.0 versicolor
# 95 6 3 4.2 1.3 versicolor
# 96 6 3 4.2 1.2 versicolor
# 97 6 3 4.2 1.3 versicolor
# 98 6 3 4.3 1.3 versicolor
# 99 5 2 3.0 1.1 versicolor
# 100 6 3 4.1 1.3 versicolor
# 101 6 3 6.0 2.5 virginica
# 102 6 3 5.1 1.9 virginica
# 103 7 3 5.9 2.1 virginica
# 104 6 3 5.6 1.8 virginica
# 105 6 3 5.8 2.2 virginica
# 106 8 3 6.6 2.1 virginica
# 107 5 2 4.5 1.7 virginica
# 108 7 3 6.3 1.8 virginica
# 109 7 2 5.8 1.8 virginica
# 110 7 4 6.1 2.5 virginica
# 111 6 3 5.1 2.0 virginica
# 112 6 3 5.3 1.9 virginica
# 113 7 3 5.5 2.1 virginica
# 114 6 2 5.0 2.0 virginica
# 115 6 3 5.1 2.4 virginica
# 116 6 3 5.3 2.3 virginica
# 117 6 3 5.5 1.8 virginica
# 118 8 4 6.7 2.2 virginica
# 119 8 3 6.9 2.3 virginica
# 120 6 2 5.0 1.5 virginica
# 121 7 3 5.7 2.3 virginica
# 122 6 3 4.9 2.0 virginica
# 123 8 3 6.7 2.0 virginica
# 124 6 3 4.9 1.8 virginica
# 125 7 3 5.7 2.1 virginica
# 126 7 3 6.0 1.8 virginica
# 127 6 3 4.8 1.8 virginica
# 128 6 3 4.9 1.8 virginica
# 129 6 3 5.6 2.1 virginica
# 130 7 3 5.8 1.6 virginica
# 131 7 3 6.1 1.9 virginica
# 132 8 4 6.4 2.0 virginica
# 133 6 3 5.6 2.2 virginica
# 134 6 3 5.1 1.5 virginica
# 135 6 3 5.6 1.4 virginica
# 136 8 3 6.1 2.3 virginica
# 137 6 3 5.6 2.4 virginica
# 138 6 3 5.5 1.8 virginica
# 139 6 3 4.8 1.8 virginica
# 140 7 3 5.4 2.1 virginica
# 141 7 3 5.6 2.4 virginica
# 142 7 3 5.1 2.3 virginica
# 143 6 3 5.1 1.9 virginica
# 144 7 3 5.9 2.3 virginica
# 145 7 3 5.7 2.5 virginica
# 146 7 3 5.2 2.3 virginica
# 147 6 2 5.0 1.9 virginica
# 148 6 3 5.2 2.0 virginica
# 149 6 3 5.4 2.3 virginica
# 150 6 3 5.1 1.8 virginica
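By default, mutate() with across() overwrites the selected columns. If the original values should be kept, the .names argument of across() can give the transformed columns new names. The following is a minimal sketch; the "_rounded" suffix and the object name rounded are arbitrary choices made for illustration.

```r
library(dplyr)

rounded <- iris %>%
  mutate(
    across(.cols = c(Sepal.Length, Sepal.Width),
           .fns = round,
           # {.col} is replaced by the name of each selected column
           .names = "{.col}_rounded")
  )

# Original columns are untouched; two new columns are appended
head(rounded, 3)
```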
A purrr-style formula
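Take the iris data, group it by species with the group_by() function, and then summarise across the columns whose names start with "Sepal" to get the mean values for each species. Here .fns is written as a purrr-style formula, in which .x stands for the column currently being processed.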
iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = starts_with("Sepal"),
      .fns = ~ mean(.x, na.rm = TRUE)
    )
  )
# # A tibble: 3 x 3
# Species Sepal.Length Sepal.Width
# <fct> <dbl> <dbl>
# 1 setosa 5.01 3.43
# 2 versicolor 5.94 2.77
# 3 virginica 6.59 2.97
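The formula ~ mean(.x, na.rm = TRUE) is shorthand for an ordinary anonymous function. The sketch below writes the same summary both ways so the two forms can be compared side by side; the object names with_formula and with_function are illustrative only.

```r
library(dplyr)

# purrr-style formula: .x refers to the current column
with_formula <- iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))

# The same summary written with an explicit anonymous function
with_function <- iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), function(x) mean(x, na.rm = TRUE)))

# Compare the two results
all.equal(with_formula, with_function)
```

Which form to use is largely a matter of taste; the explicit function can be easier to read when the body grows beyond a single call.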
Named list of functions
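We can supply a named list of functions to compute several summary statistics for the selected variables at once. For example, we can use Species as the grouping variable and then summarise across the variable Sepal.Length to get some basic statistical measures, as shown below. The std.error() function comes from the plotrix package, and the names given in the list become suffixes of the output column names.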
library(plotrix)
iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = Sepal.Length,
      .fns = list(mean = mean,
                  sd = sd,
                  var = var,
                  se = std.error,
                  n = length)
    )
  )
# # A tibble: 3 x 6
# Species Sepal.Length_mean Sepal.Length_sd Sepal.Length_var Sepal.Length_se
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 0.352 0.124 0.0498
# 2 versicolor 5.94 0.516 0.266 0.0730
# 3 virginica 6.59 0.636 0.404 0.0899
# # ... with 1 more variable: Sepal.Length_n <int>
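The output column names follow the pattern column_function by default. As a small sketch of how this can be changed, the example below uses the .names argument to put the function name first. To avoid the plotrix dependency it also defines se_mean(), a hypothetical helper written for this illustration that computes the standard error as the standard deviation divided by the square root of the number of non-missing values.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or plotrix):
# standard error of the mean, ignoring missing values
se_mean <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

stats <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = Sepal.Length,
      .fns = list(mean = mean, se = se_mean),
      # {.fn} is the name from the list, {.col} the column name
      .names = "{.fn}_{.col}"
    )
  )

names(stats)
# "Species" "mean_Sepal.Length" "se_Sepal.Length"
```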
The summarise_each() function is an older way to get the same kind of result. Note that both summarise_each() and funs() are deprecated in recent versions of dplyr, so across() is the better choice for new code. Using the select() function, we first keep the grouping variable together with the variables for which we want statistical measures. Then, after grouping by Species, summarise_each() computes the same statistical measures for each selected variable.
iris %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  group_by(Species) %>%
  summarise_each(
    funs(mean = mean,
         sd = sd,
         var = var,
         se = std.error,
         n = length)
  )
# # A tibble: 3 x 11
# Species Sepal.Length_mean Petal.Length_mean Sepal.Length_sd Petal.Length_sd
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 1.46 0.352 0.174
# 2 versicolor 5.94 4.26 0.516 0.470
# 3 virginica 6.59 5.55 0.636 0.552
# # ... with 6 more variables: Sepal.Length_var <dbl>, Petal.Length_var <dbl>,
# # Sepal.Length_se <dbl>, Petal.Length_se <dbl>, Sepal.Length_n <int>,
# # Petal.Length_n <int>
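For comparison, the same kind of summary for two variables can be sketched with across(), without select(), summarise_each() or funs(). The object name by_species is illustrative, and the standard error is left out here only to keep the example free of the plotrix dependency.

```r
library(dplyr)

by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = c(Sepal.Length, Petal.Length),
      .fns = list(mean = mean,
                  sd = sd,
                  var = var,
                  n = length)
    )
  )

names(by_species)
```

One visible difference is the column order: across() returns all statistics for Sepal.Length first and then all statistics for Petal.Length, rather than interleaving the two variables.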
Filtering the output
We can keep only the rows that fall within a specific range of values by using the filter() function. Inside filter(), the if_all() function applies a condition to every column selected with ends_with("Length"), here Sepal.Length and Petal.Length, and keeps a row only when the condition holds for all of them. Setting .fns = ~ . > 4 retains the rows in which both length variables are greater than four.
iris %>%
  filter(
    if_all(.cols = ends_with("Length"),
           .fns = ~ . > 4)
  )
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 7.0 3.2 4.7 1.4 versicolor
# 2 6.4 3.2 4.5 1.5 versicolor
# 3 6.9 3.1 4.9 1.5 versicolor
# 4 6.5 2.8 4.6 1.5 versicolor
# 5 5.7 2.8 4.5 1.3 versicolor
# 6 6.3 3.3 4.7 1.6 versicolor
# 7 6.6 2.9 4.6 1.3 versicolor
# 8 5.9 3.0 4.2 1.5 versicolor
# 9 6.1 2.9 4.7 1.4 versicolor
# 10 6.7 3.1 4.4 1.4 versicolor
# 11 5.6 3.0 4.5 1.5 versicolor
# 12 5.8 2.7 4.1 1.0 versicolor
# 13 6.2 2.2 4.5 1.5 versicolor
# 14 5.9 3.2 4.8 1.8 versicolor
# 15 6.3 2.5 4.9 1.5 versicolor
# 16 6.1 2.8 4.7 1.2 versicolor
# 17 6.4 2.9 4.3 1.3 versicolor
# 18 6.6 3.0 4.4 1.4 versicolor
# 19 6.8 2.8 4.8 1.4 versicolor
# 20 6.7 3.0 5.0 1.7 versicolor
# 21 6.0 2.9 4.5 1.5 versicolor
# 22 6.0 2.7 5.1 1.6 versicolor
# 23 5.4 3.0 4.5 1.5 versicolor
# 24 6.0 3.4 4.5 1.6 versicolor
# 25 6.7 3.1 4.7 1.5 versicolor
# 26 6.3 2.3 4.4 1.3 versicolor
# 27 5.6 3.0 4.1 1.3 versicolor
# 28 5.5 2.6 4.4 1.2 versicolor
# 29 6.1 3.0 4.6 1.4 versicolor
# 30 5.6 2.7 4.2 1.3 versicolor
# 31 5.7 3.0 4.2 1.2 versicolor
# 32 5.7 2.9 4.2 1.3 versicolor
# 33 6.2 2.9 4.3 1.3 versicolor
# 34 5.7 2.8 4.1 1.3 versicolor
# 35 6.3 3.3 6.0 2.5 virginica
# 36 5.8 2.7 5.1 1.9 virginica
# 37 7.1 3.0 5.9 2.1 virginica
# 38 6.3 2.9 5.6 1.8 virginica
# 39 6.5 3.0 5.8 2.2 virginica
# 40 7.6 3.0 6.6 2.1 virginica
# 41 4.9 2.5 4.5 1.7 virginica
# 42 7.3 2.9 6.3 1.8 virginica
# 43 6.7 2.5 5.8 1.8 virginica
# 44 7.2 3.6 6.1 2.5 virginica
# 45 6.5 3.2 5.1 2.0 virginica
# 46 6.4 2.7 5.3 1.9 virginica
# 47 6.8 3.0 5.5 2.1 virginica
# 48 5.7 2.5 5.0 2.0 virginica
# 49 5.8 2.8 5.1 2.4 virginica
# 50 6.4 3.2 5.3 2.3 virginica
# 51 6.5 3.0 5.5 1.8 virginica
# 52 7.7 3.8 6.7 2.2 virginica
# 53 7.7 2.6 6.9 2.3 virginica
# 54 6.0 2.2 5.0 1.5 virginica
# 55 6.9 3.2 5.7 2.3 virginica
# 56 5.6 2.8 4.9 2.0 virginica
# 57 7.7 2.8 6.7 2.0 virginica
# 58 6.3 2.7 4.9 1.8 virginica
# 59 6.7 3.3 5.7 2.1 virginica
# 60 7.2 3.2 6.0 1.8 virginica
# 61 6.2 2.8 4.8 1.8 virginica
# 62 6.1 3.0 4.9 1.8 virginica
# 63 6.4 2.8 5.6 2.1 virginica
# 64 7.2 3.0 5.8 1.6 virginica
# 65 7.4 2.8 6.1 1.9 virginica
# 66 7.9 3.8 6.4 2.0 virginica
# 67 6.4 2.8 5.6 2.2 virginica
# 68 6.3 2.8 5.1 1.5 virginica
# 69 6.1 2.6 5.6 1.4 virginica
# 70 7.7 3.0 6.1 2.3 virginica
# 71 6.3 3.4 5.6 2.4 virginica
# 72 6.4 3.1 5.5 1.8 virginica
# 73 6.0 3.0 4.8 1.8 virginica
# 74 6.9 3.1 5.4 2.1 virginica
# 75 6.7 3.1 5.6 2.4 virginica
# 76 6.9 3.1 5.1 2.3 virginica
# 77 5.8 2.7 5.1 1.9 virginica
# 78 6.8 3.2 5.9 2.3 virginica
# 79 6.7 3.3 5.7 2.5 virginica
# 80 6.7 3.0 5.2 2.3 virginica
# 81 6.3 2.5 5.0 1.9 virginica
# 82 6.5 3.0 5.2 2.0 virginica
# 83 6.2 3.4 5.4 2.3 virginica
# 84 5.9 3.0 5.1 1.8 virginica
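The companion function if_any() keeps a row when the condition holds for at least one of the selected columns. The sketch below contrasts the two on the same condition, assuming dplyr 1.0.4 or later, where if_all() and if_any() are available; the object names all_long and any_long are illustrative.

```r
library(dplyr)

# Rows where BOTH length columns exceed 4
all_long <- iris %>%
  filter(if_all(ends_with("Length"), ~ .x > 4))

# Rows where AT LEAST ONE length column exceeds 4
any_long <- iris %>%
  filter(if_any(ends_with("Length"), ~ .x > 4))

nrow(all_long)  # 84
nrow(any_long)  # 150
```

Every flower in iris has a sepal length above 4, so if_any() keeps all 150 rows, while if_all() drops the rows whose petal length is 4 or less.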