Rapid publication-ready ANOVA table in R
AGRON Info-Tech
November 02, 2021
Introduction
In the biological sciences, statistical tables are an important part of data analysis and reporting. Traditional manual procedures for computing and presenting statistically significant findings are time-consuming and error-prone. One of the most important statistical tables is Analysis of variance (ANOVA) and may be found in a wide range of agricultural, biological and medicinal research.
Here, we shall focus on introducing a new method capable of generating publication ready ANOVA tables within few simple steps. You will be able to export this table in MS word, PDF and HTML document. None of the statistical software can generate such tables without advanced knowledge of the corresponding programming language. This will help researchers to quickly create ANOVA tables while saving time by eliminating the need to manually input each command. This has a significant benefit over typical manual procedures for computation and table display, which are not only time-consuming but also prone to errors.
So let’s get started!
The packages that must be loaded before utilizing certain functions are listed here. Load these packages by using library()
function.
library(readxl)
library(dplyr)
library(tibble)
library(flextable)
The first variable in the data set specifies the number of replications, the following two are factor variables, and the remaining variables are response variables. To assess the relevance of treatments, we will use a two-factor randomised complete block design model. To import the data set, we should first locate the data file using the path parameter. When the col_names parameter is set to TRUE, the first row of the data set will be used as variable names. The elements of the data file will be stored in R as an object data.
= readxl::read_excel(
data path = "Data.xlsx",
col_names = TRUE
)head(data)
# # A tibble: 6 x 10
# Rep Water Priming `Plant height` Spikelets `Spike length` `Grain per spike`
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 CONV NP 100 204 12 35
# 2 1 CONV HP 103 293 15.3 59
# 3 1 CONV OP 95 322 17.5 64
# 4 1 FLOOD NP 99 284 12 34
# 5 1 FLOOD HP 86 304 12.2 44
# 6 1 FLOOD OP 88 326 16.8 69
# # ... with 3 more variables: grain weight <dbl>, Biological yield <dbl>,
# # Grain yield <dbl>
When you apply str() function to the object data then you will notice the two factor variables are being recognized as character variables in R. We can use as.factor() function to convert these variables to factors.
# Convert categorical variables to factor variables
$Rep = as.factor(data$Rep)
data$Water = as.factor(data$Water)
data$Priming = as.factor(data$Priming)
data
str(data)
# tibble [27 x 10] (S3: tbl_df/tbl/data.frame)
# $ Rep : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
# $ Water : Factor w/ 3 levels "CONV","FLOOD",..: 1 1 1 2 2 2 3 3 3 1 ...
# $ Priming : Factor w/ 3 levels "HP","NP","OP": 2 1 3 2 1 3 2 1 3 2 ...
# $ Plant height : num [1:27] 100 103 95 99 86 88 91 89 95 97 ...
# $ Spikelets : num [1:27] 204 293 322 284 304 326 274 291 313 252 ...
# $ Spike length : num [1:27] 12 15.3 17.5 12 12.2 ...
# $ Grain per spike : num [1:27] 35 59 64 34 44 69 32 37 48 32 ...
# $ grain weight : num [1:27] 37 46 56.5 37.5 49.8 59 39.5 44.8 50.6 40.5 ...
# $ Biological yield: num [1:27] 13.3 16.5 17.4 12.3 15 ...
# $ Grain yield : num [1:27] 4.2 5.3 5.85 4.5 5.3 6.1 4.8 5.2 6.3 4.03 ...
Also apply attach() function to data object to mask its components.
attach(data)
Creating for loop function
The difficult part comes next, as we’ll build a for loop function that will ask for numerous parameters and, once answered, will generate a publication-ready ANOVA table for all of response variables in the data set.
Assign column names
To begin, we’ll need to build an object that has the column names for all of the data set’s response variables. The first response variable in this data set is in the fourth column. As a result, the values for response variables will be assigned from the 4th to the nth number of columns in the data set.
<- names(data)[4:ncol(data)]
cols cols
# [1] "Plant height" "Spikelets" "Spike length" "Grain per spike"
# [5] "grain weight" "Biological yield" "Grain yield"
Analysis of variance model
The analysis of variance model will be implemented with the help of the lapply function, which returns a list of the same length as X containing column names for the response variables. The next step is to define the ANOVA model. The model in the termlabels argument is the component of the ANOVA model printed on the right side following the tilde. Depending on the experimental design, you may provide different model terms.
<- lapply(cols, FUN = function(x)
aov.model aov(reformulate(termlabels = "Rep + Water*Priming",
response = x),
data = data))
anova(aov.model[[2]])
# Analysis of Variance Table
#
# Response: Spikelets
# Df Sum Sq Mean Sq F value Pr(>F)
# Rep 2 15894.9 7947.5 11.2003 0.0009083 ***
# Water 2 533.1 266.5 0.3756 0.6927466
# Priming 2 28332.7 14166.3 19.9646 4.486e-05 ***
# Water:Priming 4 918.1 229.5 0.3235 0.8581023
# Residuals 16 11353.2 709.6
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I’m going to choose the first, third, and fifth columns in the ANOVA table to represent degree of freedom, mean square, and probability value, respectively. If you wish to print the F-value column instead of the mean squares, use 4 instead of 3.
# print df, MS and Pvalue
= anova(aov.model[[2]])[,c(1,3,5)]
final final
# Df Mean Sq Pr(>F)
# Rep 2 7947.5 0.0009083 ***
# Water 2 266.5 0.6927466
# Priming 2 14166.3 4.486e-05 ***
# Water:Priming 4 229.5 0.8581023
# Residuals 16 709.6
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We also need to create an object that should specify rownames in ANOVA table because we need to rename the sources of variation in finally published ANOVA table.
# Getting rownames
= rownames(final)
rnames rnames
# [1] "Rep" "Water" "Priming" "Water:Priming"
# [5] "Residuals"
Setting column names
Set the column names for the ANOVA table that match to the columns we’ve already chosen.
# Setting column names
colnames(final) = c("DF", "MS", "P-value")
colnames(final)[2] = cols[2]
final
# DF Spikelets P-value
# Rep 2 7947.5 0.00091
# Water 2 266.5 0.69275
# Priming 2 14166.3 0.00004
# Water:Priming 4 229.5 0.85810
# Residuals 16 709.6
Rounding values
The data in the ANOVA table will then be rounded to three decimal places and the output will be saved as data frame.
# Rounding values to 2 decimal place
= as.data.frame(round(final, digits = 2))
final final
# DF Spikelets P-value
# Rep 2 7947.47 0.00
# Water 2 266.55 0.69
# Priming 2 14166.35 0.00
# Water:Priming 4 229.53 0.86
# Residuals 16 709.58 NA
Assign astericks
Instead of adding a p-value column for each parameter, we’ll use single asterick, double asterick, and ns to indicate significant (\(p < 0.05\)), highly significant (\(p < 0.01\)), and non significant (\(p>0.05\)) impacts. After that, we’ll combine both the mean square column and the column where we just assign symbols. After giving the column numbers, both columns will be merged using the paste function.
# Assign astericks according to p values
$sign[final$`P-value` < 0.05] <- "*"
final$sign[final$`P-value` < 0.01] <- "**"
final$sign[final$`P-value` > 0.05] <- "ns"
final
# Merge MS and significance column together
2]] = paste(final[[2]], ifelse(is.na(final[[4]]), "", final[[4]]))
final[[ final
# DF Spikelets P-value sign
# Rep 2 7947.47 ** 0.00 **
# Water 2 266.55 ns 0.69 ns
# Priming 2 14166.35 ** 0.00 **
# Water:Priming 4 229.53 ns 0.86 ns
# Residuals 16 709.58 NA <NA>
We shall delete the extra column from ANOVA table using subtraction sign before concatenate function.
= final[-c(3,4)]
final final
# DF Spikelets
# Rep 2 7947.47 **
# Water 2 266.55 ns
# Priming 2 14166.35 **
# Water:Priming 4 229.53 ns
# Residuals 16 709.58
Create list of tables in excel sheets
The next challenging aspect was to print this final object which if printed directly resulted in single variable output instead of list of all variables. So, I decided to first write this object in excel sheets separately for each variable by using write_excel() function. I specified path argument by pasting column names of each variable plus the additional text “-ANOVA” with extension .xlsx
. This will result in printing the final shaped ANOVA table for each variable in excel sheets.
= writexl::write_xlsx(final, path = paste(cols[i], '-ANOVA.xlsx')) anova
Print final ANOVA object
I was able to print the final shaped ANOVA table in publication-ready format using the above above created object containing list of ANOVA tables in separate excel sheets. I made a list of all the excel sheets with the pattern -ANOVA
and the extension .xlsx
before printing the final table. Then, I used the lapply function to read this list by specifying FUN argument as read_excel.
# Print final ANOVA object
<- list.files(pattern='*-ANOVA.xlsx')
file.list
<- lapply(X = file.list, FUN = read_excel)
df.list df.list
# [[1]]
# # A tibble: 5 x 2
# DF `Biological yield`
# <dbl> <chr>
# 1 2 1.93 ns
# 2 2 0.2 ns
# 3 2 46.74 **
# 4 4 0.18 ns
# 5 16 0.7
#
# [[2]]
# # A tibble: 5 x 2
# DF `Grain per spike`
# <dbl> <chr>
# 1 2 24.41 ns
# 2 2 21.12 ns
# 3 2 1374.04 **
# 4 4 30.41 ns
# 5 16 42.05
#
# [[3]]
# # A tibble: 5 x 2
# DF `grain weight`
# <dbl> <chr>
# 1 2 79.42 **
# 2 2 3.37 ns
# 3 2 836.57 **
# 4 4 7 ns
# 5 16 5.23
#
# [[4]]
# # A tibble: 5 x 2
# DF `Grain yield`
# <dbl> <chr>
# 1 2 0.27 **
# 2 2 0.2 *
# 3 2 8.58 **
# 4 4 0.02 ns
# 5 16 0.03
#
# [[5]]
# # A tibble: 5 x 2
# DF `Plant height`
# <dbl> <chr>
# 1 2 0.45 ns
# 2 2 29.93 ns
# 3 2 0.5 ns
# 4 4 15.85 ns
# 5 16 15.5
#
# [[6]]
# # A tibble: 5 x 2
# DF `Spike length`
# <dbl> <chr>
# 1 2 8.71 *
# 2 2 1.1 ns
# 3 2 40.3 **
# 4 4 1.44 ns
# 5 16 1.27
#
# [[7]]
# # A tibble: 5 x 2
# DF Spikelets
# <dbl> <chr>
# 1 2 7947.47 **
# 2 2 266.55 ns
# 3 2 14166.35 **
# 4 4 229.53 ns
# 5 16 709.58
Combine all variables
In the following stage, I used the list.cbind() function to join all of the previously generated lists. This combined ANOVA table contains DF and MS columns for all response variables in the data set. However, the DF column was duplicated for each variable, which we need to remove. In the following command, I created an object containing names of duplicated columns and subsequently deleted duplicated columns by using subtraction sign before this object.
# Combined ANOVA table for all variables
= rlist::list.cbind(df.list)
aov.table
# Remove duplicate columns for DF
= which(duplicated(names(aov.table)))
dup.cols = aov.table[,-dup.cols]
aov.table aov.table
# DF Biological yield Grain per spike grain weight Grain yield Plant height
# 1 2 1.93 ns 24.41 ns 79.42 ** 0.27 ** 0.45 ns
# 2 2 0.2 ns 21.12 ns 3.37 ns 0.2 * 29.93 ns
# 3 2 46.74 ** 1374.04 ** 836.57 ** 8.58 ** 0.5 ns
# 4 4 0.18 ns 30.41 ns 7 ns 0.02 ns 15.85 ns
# 5 16 0.7 42.05 5.23 0.03 15.5
# Spike length Spikelets
# 1 8.71 * 7947.47 **
# 2 1.1 ns 266.55 ns
# 3 40.3 ** 14166.35 **
# 4 1.44 ns 229.53 ns
# 5 1.27 709.58
In the final stage we just need to add rownames specifying the sources of variation in ANOVA table.
# Names for sources of variation in ANOVA
rownames(aov.table) = rnames
aov.table
# DF Biological yield Grain per spike grain weight Grain yield
# Rep 2 1.93 ns 24.41 ns 79.42 ** 0.27 **
# Water 2 0.2 ns 21.12 ns 3.37 ns 0.2 *
# Priming 2 46.74 ** 1374.04 ** 836.57 ** 8.58 **
# Water:Priming 4 0.18 ns 30.41 ns 7 ns 0.02 ns
# Residuals 16 0.7 42.05 5.23 0.03
# Plant height Spike length Spikelets
# Rep 0.45 ns 8.71 * 7947.47 **
# Water 29.93 ns 1.1 ns 266.55 ns
# Priming 0.5 ns 40.3 ** 14166.35 **
# Water:Priming 15.85 ns 1.44 ns 229.53 ns
# Residuals 15.5 1.27 709.58
As I’ve finished my code, so I need to write a for loop function to repeat it for all of the response variables in the data set. This may be accomplished by enclosing all of the preceding instructions in the for loop function, where I’ve specified a range for i in 1 to n columns of the data set, except the first three variables.
Printing publication ready ANOVA table
Finally, in an MS Word document, we must print this publication-ready ANOVA table. To do so, we’ll utilise the flextable function from the flextable package. We will supply the aov.table object in the data parameter, and we may also give the column name for the table’s rownames following the pipe operator. By using the part parameter as header in the bold() function, you may bold the column names.
Click the compile report icon to produce this table in several document formats. You may choose from a variety of document types. I’m going to choose MS Word document.
This is how the ANOVA table will appear in the end. Do you believe it’s a matter of a few clicks to produce a publication-ready ANOVA table in R after such tedious work? The answer is yes you can get the same output by just altering few parameters for your own data.
You must provide the names of factor variables from your data set at the beginning part when we transformed variables from character to factor. Then you need to select a model term that best describes the experimental design. Here are some of the model terms for different experimental designs. You can choose the one that best describe your experimental design. Another thing that you have to alter is selecting the appropriate range of column names that may differ in your data set. That’s all!
Please comment below if you have any questions.
Download data file — Click_here
Download Rscript — Click_here
Download R program —
Click_here
Download R studio —
Click_here
good
ReplyDeleteThanks for your support
DeleteThank you very much for the vital information. May you demystify on the ANOVA using alpha lattice design.
ReplyDeleteGood web site! I really love how it is simple on my eyes and the data are well written. I am wondering how I might be notified whenever a new post has been made. I've subscribed to your RSS feed which must do the trick! Have a great day!quite gaming mouse
ReplyDeleteHello Dear
ReplyDeleteIt is Good article Blog Thanks for sharing.
affordable video production
brand marketing video production washington dc
video production companies in washington dc
video production service washington dc
video production washington dc
affordable video marketing
affordable video production services
corporate video production dc
corporate video production washington dc
video production services dc
Thanks for sharing such an informative Article. It will be beneficial to those who seek information. Continue to share your knowledge through articles like these.
ReplyDeleteData Engineering Solutions
Artificial Intelligence Services
Data Analytics Services
Data Modernization Services
I appreciate you taking the time and effort to share your knowledge. This material proved to be really efficient and beneficial to me. Thank you very much for providing this information. Continue to write your blog.
ReplyDeleteData Engineering Services
Machine Learning Solutions
Data Analytics Solutions
Data Modernization Services