Rapid publication-ready ANOVA table in R

Introduction
Creating for loop function
Printing publication ready ANOVA table

Introduction

In the biological sciences, statistical tables are an important part of data analysis and reporting. Traditional manual procedures for computing and presenting statistically significant findings are time-consuming and error-prone. One of the most important statistical tables is Analysis of variance (ANOVA) and may be found in a wide range of agricultural, biological and medicinal research.

Here, we shall focus on introducing a new method capable of generating publication ready ANOVA tables within few simple steps. You will be able to export this table in MS word, PDF and HTML document. None of the statistical software can generate such tables without advanced knowledge of the corresponding programming language. This will help researchers to quickly create ANOVA tables while saving time by eliminating the need to manually input each command. This has a significant benefit over typical manual procedures for computation and table display, which are not only time-consuming but also prone to errors.

So let’s get started!

The packages that must be loaded before utilizing certain functions are listed here. Load these packages by using library() function.

library(readxl)
library(dplyr)
library(tibble)
library(flextable)

The first variable in the data set specifies the number of replications, the following two are factor variables, and the remaining variables are response variables. To assess the relevance of treatments, we will use a two-factor randomised complete block design model. To import the data set, we should first locate the data file using the path parameter. When the col_names parameter is set to TRUE, the first row of the data set will be used as variable names. The elements of the data file will be stored in R as an object data.

data = readxl::read_excel(
          path = "Data.xlsx",
          col_names = TRUE
)
head(data)

# # A tibble: 6 x 10
#     Rep Water Priming `Plant height` Spikelets `Spike length` `Grain per spike`
#   <dbl> <chr> <chr>            <dbl>     <dbl>          <dbl>             <dbl>
# 1     1 CONV  NP                 100       204           12                  35
# 2     1 CONV  HP                 103       293           15.3                59
# 3     1 CONV  OP                  95       322           17.5                64
# 4     1 FLOOD NP                  99       284           12                  34
# 5     1 FLOOD HP                  86       304           12.2                44
# 6     1 FLOOD OP                  88       326           16.8                69
# # ... with 3 more variables: grain weight <dbl>, Biological yield <dbl>,
# #   Grain yield <dbl>

When you apply str() function to the object data then you will notice the two factor variables are being recognized as character variables in R. We can use as.factor() function to convert these variables to factors.

# Convert categorical variables to factor variables
data$Rep = as.factor(data$Rep)
data$Water = as.factor(data$Water)
data$Priming = as.factor(data$Priming)

str(data)

# tibble [27 x 10] (S3: tbl_df/tbl/data.frame)
#  $ Rep             : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
#  $ Water           : Factor w/ 3 levels "CONV","FLOOD",..: 1 1 1 2 2 2 3 3 3 1 ...
#  $ Priming         : Factor w/ 3 levels "HP","NP","OP": 2 1 3 2 1 3 2 1 3 2 ...
#  $ Plant height    : num [1:27] 100 103 95 99 86 88 91 89 95 97 ...
#  $ Spikelets       : num [1:27] 204 293 322 284 304 326 274 291 313 252 ...
#  $ Spike length    : num [1:27] 12 15.3 17.5 12 12.2 ...
#  $ Grain per spike : num [1:27] 35 59 64 34 44 69 32 37 48 32 ...
#  $ grain weight    : num [1:27] 37 46 56.5 37.5 49.8 59 39.5 44.8 50.6 40.5 ...
#  $ Biological yield: num [1:27] 13.3 16.5 17.4 12.3 15 ...
#  $ Grain yield     : num [1:27] 4.2 5.3 5.85 4.5 5.3 6.1 4.8 5.2 6.3 4.03 ...

Also apply attach() function to data object to mask its components.

attach(data)

Creating for loop function

The difficult part comes next, as we’ll build a for loop function that will ask for numerous parameters and, once answered, will generate a publication-ready ANOVA table for all of response variables in the data set.

Assign column names

To begin, we’ll need to build an object that has the column names for all of the data set’s response variables. The first response variable in this data set is in the fourth column. As a result, the values for response variables will be assigned from the 4th to the nth number of columns in the data set.

cols <- names(data)[4:ncol(data)]
cols

# [1] "Plant height"     "Spikelets"        "Spike length"     "Grain per spike" 
# [5] "grain weight"     "Biological yield" "Grain yield"

Analysis of variance model

The analysis of variance model will be implemented with the help of the lapply function, which returns a list of the same length as X containing column names for the response variables. The next step is to define the ANOVA model. The model in the termlabels argument is the component of the ANOVA model printed on the right side following the tilde. Depending on the experimental design, you may provide different model terms.

aov.model <- lapply(cols, FUN = function(x) 
                    aov(reformulate(termlabels = "Rep + Water*Priming", 
                                    response = x), 
                        data = data))
anova(aov.model[[2]])

# Analysis of Variance Table
# 
# Response: Spikelets
#               Df  Sum Sq Mean Sq F value    Pr(>F)    
# Rep            2 15894.9  7947.5 11.2003 0.0009083 ***
# Water          2   533.1   266.5  0.3756 0.6927466    
# Priming        2 28332.7 14166.3 19.9646 4.486e-05 ***
# Water:Priming  4   918.1   229.5  0.3235 0.8581023    
# Residuals     16 11353.2   709.6                      
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I’m going to choose the first, third, and fifth columns in the ANOVA table to represent degree of freedom, mean square, and probability value, respectively. If you wish to print the F-value column instead of the mean squares, use 4 instead of 3.

# print df, MS and Pvalue
final = anova(aov.model[[2]])[,c(1,3,5)]
final

#               Df Mean Sq    Pr(>F)    
# Rep            2  7947.5 0.0009083 ***
# Water          2   266.5 0.6927466    
# Priming        2 14166.3 4.486e-05 ***
# Water:Priming  4   229.5 0.8581023    
# Residuals     16   709.6              
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We also need to create an object that should specify rownames in ANOVA table because we need to rename the sources of variation in finally published ANOVA table.

# Getting rownames
rnames = rownames(final)
rnames

# [1] "Rep"           "Water"         "Priming"       "Water:Priming"
# [5] "Residuals"

Setting column names

Set the column names for the ANOVA table that match to the columns we’ve already chosen.

# Setting column names
colnames(final) = c("DF", "MS", "P-value")
colnames(final)[2] = cols[2]
final

#               DF Spikelets P-value
# Rep            2    7947.5 0.00091
# Water          2     266.5 0.69275
# Priming        2   14166.3 0.00004
# Water:Priming  4     229.5 0.85810
# Residuals     16     709.6

Rounding values

The data in the ANOVA table will then be rounded to three decimal places and the output will be saved as data frame.

# Rounding values to 2 decimal place
final = as.data.frame(round(final, digits = 2))
final

#               DF Spikelets P-value
# Rep            2   7947.47    0.00
# Water          2    266.55    0.69
# Priming        2  14166.35    0.00
# Water:Priming  4    229.53    0.86
# Residuals     16    709.58      NA

Assign astericks

Instead of adding a p-value column for each parameter, we’ll use single asterick, double asterick, and ns to indicate significant (\(p < 0.05\)), highly significant (\(p < 0.01\)), and non significant (\(p>0.05\)) impacts. After that, we’ll combine both the mean square column and the column where we just assign symbols. After giving the column numbers, both columns will be merged using the paste function.

# Assign astericks according to p values
final$sign[final$`P-value` < 0.05] <- "*" 
final$sign[final$`P-value` < 0.01] <- "**"
final$sign[final$`P-value` > 0.05] <- "ns"
          
# Merge MS and significance column together
final[[2]] = paste(final[[2]], ifelse(is.na(final[[4]]), "", final[[4]]))
final

#               DF   Spikelets P-value sign
# Rep            2  7947.47 **    0.00   **
# Water          2   266.55 ns    0.69   ns
# Priming        2 14166.35 **    0.00   **
# Water:Priming  4   229.53 ns    0.86   ns
# Residuals     16     709.58       NA <NA>

We shall delete the extra column from ANOVA table using subtraction sign before concatenate function.

final = final[-c(3,4)]
final

#               DF   Spikelets
# Rep            2  7947.47 **
# Water          2   266.55 ns
# Priming        2 14166.35 **
# Water:Priming  4   229.53 ns
# Residuals     16     709.58

Create list of tables in excel sheets

The next challenging aspect was to print this final object which if printed directly resulted in single variable output instead of list of all variables. So, I decided to first write this object in excel sheets separately for each variable by using write_excel() function. I specified path argument by pasting column names of each variable plus the additional text “-ANOVA” with extension .xlsx. This will result in printing the final shaped ANOVA table for each variable in excel sheets.

anova = writexl::write_xlsx(final, path = paste(cols[i], '-ANOVA.xlsx'))

Print final ANOVA object

I was able to print the final shaped ANOVA table in publication-ready format using the above above created object containing list of ANOVA tables in separate excel sheets. I made a list of all the excel sheets with the pattern -ANOVA and the extension .xlsx before printing the final table. Then, I used the lapply function to read this list by specifying FUN argument as read_excel.

# Print final ANOVA object
file.list <- list.files(pattern='*-ANOVA.xlsx')
          
df.list <- lapply(X = file.list, FUN = read_excel)
df.list

# [[1]]
# # A tibble: 5 x 2
#      DF `Biological yield`
#   <dbl> <chr>             
# 1     2 1.93 ns           
# 2     2 0.2 ns            
# 3     2 46.74 **          
# 4     4 0.18 ns           
# 5    16 0.7               
# 
# [[2]]
# # A tibble: 5 x 2
#      DF `Grain per spike`
#   <dbl> <chr>            
# 1     2 24.41 ns         
# 2     2 21.12 ns         
# 3     2 1374.04 **       
# 4     4 30.41 ns         
# 5    16 42.05            
# 
# [[3]]
# # A tibble: 5 x 2
#      DF `grain weight`
#   <dbl> <chr>         
# 1     2 79.42 **      
# 2     2 3.37 ns       
# 3     2 836.57 **     
# 4     4 7 ns          
# 5    16 5.23          
# 
# [[4]]
# # A tibble: 5 x 2
#      DF `Grain yield`
#   <dbl> <chr>        
# 1     2 0.27 **      
# 2     2 0.2 *        
# 3     2 8.58 **      
# 4     4 0.02 ns      
# 5    16 0.03         
# 
# [[5]]
# # A tibble: 5 x 2
#      DF `Plant height`
#   <dbl> <chr>         
# 1     2 0.45 ns       
# 2     2 29.93 ns      
# 3     2 0.5 ns        
# 4     4 15.85 ns      
# 5    16 15.5          
# 
# [[6]]
# # A tibble: 5 x 2
#      DF `Spike length`
#   <dbl> <chr>         
# 1     2 8.71 *        
# 2     2 1.1 ns        
# 3     2 40.3 **       
# 4     4 1.44 ns       
# 5    16 1.27          
# 
# [[7]]
# # A tibble: 5 x 2
#      DF Spikelets  
#   <dbl> <chr>      
# 1     2 7947.47 ** 
# 2     2 266.55 ns  
# 3     2 14166.35 **
# 4     4 229.53 ns  
# 5    16 709.58

Combine all variables

In the following stage, I used the list.cbind() function to join all of the previously generated lists. This combined ANOVA table contains DF and MS columns for all response variables in the data set. However, the DF column was duplicated for each variable, which we need to remove. In the following command, I created an object containing names of duplicated columns and subsequently deleted duplicated columns by using subtraction sign before this object.

# Combined ANOVA table for all variables
aov.table = rlist::list.cbind(df.list)
          
# Remove duplicate columns for DF
dup.cols = which(duplicated(names(aov.table)))
aov.table = aov.table[,-dup.cols]
aov.table

#   DF Biological yield Grain per spike grain weight Grain yield Plant height
# 1  2          1.93 ns        24.41 ns     79.42 **     0.27 **      0.45 ns
# 2  2           0.2 ns        21.12 ns      3.37 ns       0.2 *     29.93 ns
# 3  2         46.74 **      1374.04 **    836.57 **     8.58 **       0.5 ns
# 4  4          0.18 ns        30.41 ns         7 ns     0.02 ns     15.85 ns
# 5 16              0.7           42.05         5.23        0.03         15.5
#   Spike length   Spikelets
# 1       8.71 *  7947.47 **
# 2       1.1 ns   266.55 ns
# 3      40.3 ** 14166.35 **
# 4      1.44 ns   229.53 ns
# 5         1.27      709.58

In the final stage we just need to add rownames specifying the sources of variation in ANOVA table.

# Names for sources of variation in ANOVA
rownames(aov.table) = rnames
aov.table

#               DF Biological yield Grain per spike grain weight Grain yield
# Rep            2          1.93 ns        24.41 ns     79.42 **     0.27 **
# Water          2           0.2 ns        21.12 ns      3.37 ns       0.2 *
# Priming        2         46.74 **      1374.04 **    836.57 **     8.58 **
# Water:Priming  4          0.18 ns        30.41 ns         7 ns     0.02 ns
# Residuals     16              0.7           42.05         5.23        0.03
#               Plant height Spike length   Spikelets
# Rep                0.45 ns       8.71 *  7947.47 **
# Water             29.93 ns       1.1 ns   266.55 ns
# Priming             0.5 ns      40.3 ** 14166.35 **
# Water:Priming     15.85 ns      1.44 ns   229.53 ns
# Residuals             15.5         1.27      709.58

As I’ve finished my code, so I need to write a for loop function to repeat it for all of the response variables in the data set. This may be accomplished by enclosing all of the preceding instructions in the for loop function, where I’ve specified a range for i in 1 to n columns of the data set, except the first three variables.

Printing publication ready ANOVA table

Finally, in an MS Word document, we must print this publication-ready ANOVA table. To do so, we’ll utilise the flextable function from the flextable package. We will supply the aov.table object in the data parameter, and we may also give the column name for the table’s rownames following the pipe operator. By using the part parameter as header in the bold() function, you may bold the column names.

Click the compile report icon to produce this table in several document formats. You may choose from a variety of document types. I’m going to choose MS Word document.

This is how the ANOVA table will appear in the end. Do you believe it’s a matter of a few clicks to produce a publication-ready ANOVA table in R after such tedious work? The answer is yes you can get the same output by just altering few parameters for your own data.

You must provide the names of factor variables from your data set at the beginning part when we transformed variables from character to factor. Then you need to select a model term that best describes the experimental design. Here are some of the model terms for different experimental designs. You can choose the one that best describe your experimental design. Another thing that you have to alter is selecting the appropriate range of column names that may differ in your data set. That’s all!

Please comment below if you have any questions.

Download data file — Click_here

Download Rscript — Click_here

Download R program — Click_here

Download R studio — Click_here

Search This Blog

Data Analysis in R