Elegant barplot using ggplot function in R
See Video ⮞ ☝ |
AGRON Stats
August 03, 2020
Introduction
You will learn how to visualize bar graph for main and interaction effects using R studio. I shall use two way analysis of variance technique to fit a model for the given data set. Least significant difference test will be applied for mean comparisons. Bar graphs will be plotted using the alphabets from group component of LSD test to place just above the error bars on the plot.
Let's get started!
Importing data file
I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE))
before importing the data set. You can also clear the plots using graphics.off()
and clear everything in console using shell()
function.
Now let’s import the data set using read.csv()
function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file
argument specify the file name with extension CSV.
In header
argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE
for this argument. The head()
function will print the first six rows of the data set.
# genotype gender activity
# 1 FF Female 3.39
# 2 FF Female 3.34
# 3 FF Female 3.59
# 4 FO Female 3.58
# 5 FO Female 4.12
# 6 FO Female 4.72
Before proceeding for analysis it is a better practice to have a look on the data set to see the structure of variables. The data looks fine as shown in below table. The second and third variables are being read as factor which is true for the given data set. In case if it is integer or character then you can change it to factor using as.factor()
function as shown below.
# 'data.frame': 18 obs. of 3 variables:
# $ genotype: Factor w/ 3 levels "FF","FO","OO": 1 1 1 2 2 2 3 3 3 1 ...
# $ gender : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 2 ...
# $ activity: num 3.39 3.34 3.59 3.58 4.12 4.72 5.06 4.05 4.09 2.2 ...
# 'data.frame': 18 obs. of 3 variables:
# $ genotype: Factor w/ 3 levels "FF","FO","OO": 1 1 1 2 2 2 3 3 3 1 ...
# $ gender : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 2 ...
# $ activity: num 3.39 3.34 3.59 3.58 4.12 4.72 5.06 4.05 4.09 2.2 ...
Fit analysis of variance model
First, you need to load the library agricolae
using require()
function. If this package is not installed then first install this package using install.pakcages("agricolae")
command in R console.
For analysis of variance of split-plot design assign model by using sp.plot()
function. In parenthesis for block
argument you can type the block or replication variable name from data set. For pplot
argument specify the main plot variable name (varieties var
in this example) and for splot
argument specify the sub-plot variable name (seedling age age
in this example). In Y
argument you can specify the response variable name as heading
in this example.
This will construct the analysis of variance table showing the significance of sources of variation.
attach(data)
library(stats)
aov.res <- aov(activity ~
genotype +
gender +
genotype:gender)
# OR
aov.res <- aov(activity ~
genotype*gender)
anova(aov.res)
# Analysis of Variance Table
#
# Response: activity
# Df Sum Sq Mean Sq F value Pr(>F)
# genotype 2 6.2525 3.12627 10.9923 0.001938 **
# gender 1 1.3833 1.38334 4.8640 0.047669 *
# genotype:gender 2 0.7911 0.39554 1.3908 0.286269
# Residuals 12 3.4129 0.28441
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
LSD mean comparison test
LSD applied on genotypes
Now let’s compute multiple comparison of treatments by using least significant difference test with grouping of treatments. To apply this test load agricolae
package using library()
function. LSD.test()
function is used for multiple comparisons of treatments.
This function require some arguments. In Y
argument you can specify model or use response variable name as activity in this case. In trt
argument specify the first factor variable. You can set the value for DFerror
by writing df.residual
with model object after dollar sign (aov.res$df.residual
).
To set the value of error mean square MSerror
divide deviance of model object with residual degree of freedom. I shall use the default value for alpha argument that represents the 5 percent level of significance. In method argument (p.adj
) for adjusting probability value, I shall use bonferroni which is most commonly used method. Set a logical value for group
argument to specify grouping of treatment means. Set a logical value for console
argument to print the outputs.
library(agricolae)
# First factor variable (factor A)
LSD_A = LSD.test(y = activity,
trt = genotype,
DFerror = aov.res$df.residual,
MSerror = deviance(aov.res)/aov.res$df.residual,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: activity ~ genotype
#
# LSD t Test for activity
# P value adjustment method: bonferroni
#
# Mean Square Error: 0.2844056
#
# genotype, means and individual ( 95 %) CI
#
# activity std r LCL UCL Min Max
# FF 2.973333 0.5458816 6 2.498968 3.447699 2.20 3.59
# FO 3.760000 0.5871967 6 3.285634 4.234366 3.12 4.72
# OO 4.415000 0.6889630 6 3.940634 4.889366 3.43 5.26
#
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.779473
#
# Minimum Significant Difference: 0.8557972
#
# Treatments with the same letter are not significantly different.
#
# activity groups
# OO 4.415000 a
# FO 3.760000 ab
# FF 2.973333 b
LSD applied on gender
To compute multiple comparisons of treatments for the second factor gender you will need to only change the trt
argument. Type the second variable factor name instead of first variable factor as used in previous command. Rest of all arguments will remain the same. This will print the outputs for the second variable factor that is gender.
# Second factor variable (factor B)
LSD_B = LSD.test(y = activity,
trt = gender,
DFerror = aov.res$df.residual,
MSerror = deviance(aov.res)/aov.res$df.residual,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: activity ~ gender
#
# LSD t Test for activity
# P value adjustment method: bonferroni
#
# Mean Square Error: 0.2844056
#
# gender, means and individual ( 95 %) CI
#
# activity std r LCL UCL Min Max
# Female 3.993333 0.5935908 9 3.606015 4.380651 3.34 5.06
# Male 3.438889 0.9770932 9 3.051571 3.826207 2.20 5.26
#
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.178813
#
# Minimum Significant Difference: 0.5477504
#
# Treatments with the same letter are not significantly different.
#
# activity groups
# Female 3.993333 a
# Male 3.438889 b
LSD applied on interaction term
To compute multiple comparison of treatments for interaction term again use the same arguments as used in previous command except the treatment trt
argument. In treatment argument specify the interaction term as used in model object for analysis of variance (genotype:gender
). This will print the outputs for the interaction effect.
LSD_AB = LSD.test(y = activity,
trt = genotype:gender,
DFerror = aov.res$df.residual,
MSerror = deviance(aov.res)/aov.res$df.residual,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: activity ~ genotype:gender
#
# LSD t Test for activity
# P value adjustment method: bonferroni
#
# Mean Square Error: 0.2844056
#
# genotype:gender, means and individual ( 95 %) CI
#
# activity std r LCL UCL Min Max
# FF:Female 3.440000 0.1322876 3 2.769146 4.110854 3.34 3.59
# FF:Male 2.506667 0.2722744 3 1.835812 3.177521 2.20 2.72
# FO:Female 4.140000 0.5702631 3 3.469146 4.810854 3.58 4.72
# FO:Male 3.380000 0.3218695 3 2.709146 4.050854 3.12 3.74
# OO:Female 4.400000 0.5719266 3 3.729146 5.070854 4.05 5.06
# OO:Male 4.430000 0.9267686 3 3.759146 5.100854 3.43 5.26
#
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 3.648889
#
# Minimum Significant Difference: 1.588854
#
# Treatments with the same letter are not significantly different.
#
# activity groups
# OO:Male 4.430000 a
# OO:Female 4.400000 a
# FO:Female 4.140000 a
# FF:Female 3.440000 ab
# FO:Male 3.380000 ab
# FF:Male 2.506667 b
Original order of groups (for lettering)
Original order of genotypes
Next step is to arrange the treatment groups in the original order of treatments rather than ranked order of LSD test result. This will help while adding alphabets from treatment group in the same order as the order of treatments in the plot. Connect the group component from LSD object (LSD_A$group
) with group_by()
and arrange()
functions for the same component using the %>%
pipe operator. Print function for this object will show the original order of treatments.
library(dplyr)
## Original order of LSD$group for factor A
ascend_A = LSD_A$groups %>%
group_by(rownames(LSD_A$groups)) %>%
arrange(rownames(LSD_A$groups))
print(ascend_A)
# # A tibble: 3 x 3
# # Groups: rownames(LSD_A$groups) [3]
# activity groups `rownames(LSD_A$groups)`
# <dbl> <fct> <chr>
# 1 2.97 b FF
# 2 3.76 ab FO
# 3 4.42 a OO
Original order of gender
Use the same commands to arrange treatment groups in original order for the second factor variable. For this use treatment group component of LSD_B
object (LSD_B$group
). Print function for this object will show the original order of treatments for second factor.
# Original order of LSD$group for factor B
ascend_B = LSD_B$groups %>%
group_by(rownames(LSD_B$groups)) %>%
arrange(rownames(LSD_B$groups))
print(ascend_B)
# # A tibble: 2 x 3
# # Groups: rownames(LSD_B$groups) [2]
# activity groups `rownames(LSD_B$groups)`
# <dbl> <fct> <chr>
# 1 3.99 a Female
# 2 3.44 b Male
Original order of interaction term
Use the same commands again to arrange treatment groups in original order for the interaction term. For this use treatment group component of LSD_AB
object (LSD_AB$group
). Print function for this object will show the original order of treatments.
## Original order of LSD$group for interaction
ascend_AB = LSD_AB$groups %>%
group_by(rownames(LSD_AB$groups)) %>%
arrange(rownames(LSD_AB$groups))
print(ascend_AB)
# # A tibble: 6 x 3
# # Groups: rownames(LSD_AB$groups) [6]
# activity groups `rownames(LSD_AB$groups)`
# <dbl> <fct> <chr>
# 1 3.44 ab FF:Female
# 2 2.51 b FF:Male
# 3 4.14 a FO:Female
# 4 3.38 ab FO:Male
# 5 4.40 a OO:Female
# 6 4.43 a OO:Male
Visualizing effects
Mean and standard error (genotype)
Before visualizing main and interaction effects first compute mean values and standard error. For this load the library dplyr
that states the grammar of data manipulation. Connect data
set to a command that will it by genotypes by using group_by()
function and then connect with summarise()
function to get mean (avg_A
) and standard error (se
) values. For connection we shall use %>%
pipe operator.
In summarize function the objects for mean and standard error will be created. Print function will display the results for this object.
library(dplyr)
### Mean and SE for first factor (Factor A)
MeanSE_A = data %>%
group_by(genotype) %>%
summarise(avg_A = mean(activity),
se = sd(activity)/sqrt(length(activity)))
print(MeanSE_A)
# # A tibble: 3 x 3
# genotype avg_A se
# <fct> <dbl> <dbl>
# 1 FF 2.97 0.223
# 2 FO 3.76 0.240
# 3 OO 4.42 0.281
Mean and standard error (gender)
Similarly, get mean values and standard error for second variable factor using the same command except connecting the data
set to group_by()
gender using %>%
pipe operator. Print function for this object will display the results.
### Mean and SE for second factor (Factor B)
MeanSE_B = data %>%
group_by(gender) %>%
summarise(avg_B = mean(activity),
se = sd(activity)/sqrt(length(activity)))
print(MeanSE_B)
# # A tibble: 2 x 3
# gender avg_B se
# <fct> <dbl> <dbl>
# 1 Female 3.99 0.198
# 2 Male 3.44 0.326
Mean and standard error (interaction)
Again use the same command for interaction term. Here you will connect the data
set to group_by()
genotype and gender using the %>%
pipe operator. Print function for this object will display the results.
## Mean and SE for interaction effect (AxB)
MeanSE_AB = data %>%
group_by(genotype, gender) %>%
summarise(avg_AB = mean(activity),
se = sd(activity)/sqrt(length(activity)))
print(MeanSE_AB)
# # A tibble: 6 x 4
# # Groups: genotype [3]
# genotype gender avg_AB se
# <fct> <fct> <dbl> <dbl>
# 1 FF Female 3.44 0.0764
# 2 FF Male 2.51 0.157
# 3 FO Female 4.14 0.329
# 4 FO Male 3.38 0.186
# 5 OO Female 4.40 0.330
# 6 OO Male 4.43 0.535
Plotting genotypes
Create plotting object
Let’s start to display the main and interaction effects using grammar of graphics ggplot()
function. This will require to load the package ggplot2
. Create an object p1
using the grammar of graphics function.
This function requires data set (MeanSE_A
) and some aesthetic mappings to use for the plot. In aesthetic mappings function the argument X
specify the variable genotype and argument Y
specify the variable mean values object (avg_A
) created earlier. Print function will print a blank plot showing X and Y axis labels.
Add bars
Now let’s add layers to this plot object. For plotting bars use geom_bar()
function. In stat
argument set the value as identity to leave the data as is. In color
argument specify the color for borders of the bars. The position
argument specify the Position adjustment, either as a string, or the result of a call to a position adjustment function. The argument width
specify the width of the bars. Printing this layer with plot object will display bars on the plot.
Add error bars
Let’s add the second layer to show error bars on the bars of the plot. For this use geom_errorbar()
function. In aesthetic mappings you can specify the maximum ymax
and minimum ymin
values to show the heights of the error bars. Use the same position
argument as used in previous command so that the error bars could place in the center of the bars. You can adjust width
of the error bars by setting the value for width argument.
Change title, X and Y axis labels
Let’s change the main title, X, and Y-axis labels of the plot. For this use labs()
function. In title
you can specify the main title of the plot. For X
and Y
arguments specify the labels to be used for the X and Y-axis.
Place alphabets on top of error bars
Now let’s add alphabets from the mean comparison test on top of bars. For this use geom_text()
function. In aesthetic mappings, you can specify X
, Y
and label
arguments. The X
argument specify the levels of the variable factor. In Y
argument specify the Y axis position of the alphabets. I am using mean values plus standard error (avg_A + se
) for this argument. This will place the alphabets on the top of error bars but not above of it.
To place the alphabets slightly above the error bars using vjust
vertical adjustment argument. Set the value minus 0.5 for this argument. In label
argument use the matrix of LSD group (ascend_A$groups
) alphabets as arranged in original order of the means. Use the same position argument as used in previous command for plotting bars. Printing this layer will show the alphabets on top of the error bars.
# Adding layers to p1 object
# Plotting bars
plotA = p1 +
geom_bar(stat = "identity",
color = "black",
position = position_dodge(width=0.9),
width = 0.8)
# Adding error bars
plotB = plotA +
geom_errorbar(aes(ymax = avg_A + se,
ymin = avg_A - se),
position = position_dodge(width=0.9),
width = 0.25)
# Changing main title, X & Y labels
plotC = plotB +
labs(title = "",
x = "genotype",
y = "MPI Activity")
# Adding lettering from the test applied (LSD$group)
plotD = plotC +
geom_text(aes(x = genotype,
y = avg_A + se,
label = as.matrix(ascend_A$groups)),
position = position_dodge(width = 0.9),
vjust = -(0.5))
Plotting gender
Create plotting object
The same functions will be used to plot bar graph for second factor. For plotting gender first create an object p2
using grammar of graphics function. Use same arguments in this object changing the values to represent the second factor data set (MeanSE_B
). Similarly, use gender
for x
argument and avg_B
for y
argument Print function will display the empty plot for this object just showing the X and Y axis labels.
Adding layers to the plotting object
For adding layers to this plot object use the same functions as used in adding layers for first variable factor. Just change the values of the arguments representing factor B instead of factor A. This will result in printing all the layers to the object plot.
# Adding layers to p2 object
# Adding bars
plotA = p2 +
geom_bar(stat = "identity",
color = "black",
position = position_dodge(width=0.9))
# Adding error bars
plotB = plotA +
geom_errorbar(aes(ymax = avg_B + se,
ymin = avg_B - se),
position = position_dodge(width=0.9),
width = 0.25)
# Changing main title, X and Y labels
plotC = plotB +
labs(title = "",
x = "gender",
y = "MPI Activity")
# Adding lettering from test applied
plotD = plotC +
geom_text(aes(x = gender,
y = avg_B + se,
label = as.matrix(ascend_B$groups)),
position = position_dodge(width = 0.9),
vjust = -(0.5))
Plotting interaction
Create plotting object
First create the plot object similar to the one created for main effects. However, you will change the arguments to represent the interaction data set (MeanSE_AB
). In aesthetic mappings define X
, Y
and fill
arguments. The argument X specify gender
or the variable name that you want to place on X axis. In Y
argument set the mean values for interaction term.
The argument fill
specify the second factor used to set the color and fill of geom elements on a plot. I shall set the genotype for this argument. Print function will display the empty plot for this object just showing the X and Y axis labels.
Add bars
Now add the layers to this plot object. Use same arguments for geom_bar()
as used in previous command to plot bars except replacing the values of arguments to represent the interaction term instead of main effects.
Changing fill color of bars
Now I shall type a different layer for adding color for fill and changing legend text. For this use scale_fill_manual()
function. In values
argument set the fill color for the bars. In labels
you can specify labels for the legend text. Print function for this layer will add fill color and labels for legend text.
Add error bars
To add error bars on top of the plot bars use geom_errorbar()
function. You can specify settings for aesthetic mappings (aes()
), position
and width
of the error bars. In aesthetic mapping set the maximum ymax
and minimum ymin
values for error bars on Y axis. I shall set these values by adding avg_AB + se
and subtracting avg_AB - se
standard error from height of the bars.
Specify position
argument same as used in previous command (commond for plotting bars geom_bar()
). This will result placing the error bars exactly on the same position as the bars were plotted. You can change the width of the error bars by setting the value for width
argument. Printing this layer will add error bars.
Add title, X and Y axis labels
To add main title
, x
and y
labels use the same arguments as used in previous commands while plotting main effects. An additional fill
argument can be specified for legend title. Printing this layer will add main title, X-axis and Y-axis labels and legend title to the plot.
Place alphabets on top of error bars
To add alphabets taken from group component of interaction LSD result use geom_text()
function. Use the same arguments which were specified while adding this layer in plotting main effects except specifying the interaction component instead of main effects. Printing this layer will display the alphabets on the top of error bars.
# Plotting bars
plotA = p3 +
geom_bar(stat = "identity",
color = "black",
position = position_dodge(width=0.9))
# Adding color for fill and changing legend text
plotB = plotA +
scale_fill_manual(values = gray(1:3/3),
labels = c("FF", "FO", "OO"))
# Adding error bars
plotC = plotB +
geom_errorbar(aes(ymax = avg_AB + se,
ymin = avg_AB - se),
position = position_dodge(width=0.9),
width = 0.25)
# Changing main title, X and Y labels and legend title
plotD = plotC +
labs(title = "",
x = "gender",
y = "MPI Activity",
fill = "genotype")
# Adding lettering from test applied
plotE = plotD +
geom_text(aes(x = gender,
y = avg_AB + se,
label = as.matrix(ascend_AB$groups)),
position = position_dodge(width = 0.9),
vjust = -(0.5))
I hope this tutorial will help you to understand how to visualize the bar plots for the main and interaction term.
Please comment below if you have any questions.
Download data file — Click_here
Download Rscript — Download Rscript
Download R program —
Click_here
Download R studio —
Click_here
Hi. The link for the data set is not working.
ReplyDeleteThanks for telling me. I shall renew it.
DeleteActually, all links are not hyperlinks.
DeleteHi Noah,
DeleteThanks for telling me. I have just corrected the links and now they are working. Please let me know if you still unable to downlead these files.
Hi. I've been trying to add the alphabets from LSD in by bar graph, however, I always encounter a problem. The app always prompt me this:
ReplyDeleteError: Aesthetics must be either length 1 or the same as the data (7): label
What should I do?
Make sure you have properly arranged the group component of LSD test result containing letters. In geom_text() function use the same object which was created while arranging the group component. Practice this example data set with Rscript to know why are you getting this error. Further you can send data through email and I shall see how to fix this error
DeleteHi
ReplyDeleteThanks for your efforts. It is very helpful and easy to follow. Could you please give me the code on how can I give LSD Bar (Interaction LSD Value) at the top of the graph?
Just as like this vedio https://youtu.be/s8MJdOTnrCA
You can download R script and data file from the download links provided at the end of this blog post.
DeleteThank you so much for the in depth tutorial! When I add the letters to my plot, how do I make it so that they are just above the SD line? Right now they are all in the same position but some are on top of the SD.
ReplyDeleteYou can send me the Rscript and data file at agron.infotech@gmail.com
DeleteI shall send you back the same after making necessary corrections
Hi, I would like to run Your script. That's exactly what I need. My dataset is very similar to Yours. Unfortunatelly after attach(MeanSE_A) I got an error massage:
ReplyDelete'The following objects are masked from MeanSE_A:' and it is followed by my grouping variable. The same happens in case of MeanSE_B and AB. What can be wrong? Could you help me with this please?
Thank You!
May be you have not converted grouping variable to factor variable before masking the components of data object.
Deletedata$factorA = as.factor(data$factorA)
data$factorB = as.factor(data$factorB)
attach(data)
Data is no longer available. Greetings!
ReplyDeleteThanks for letting me know. I have now embedded data file and Rscript files to the blog post. Hope this will fix the issue on permanent basis.
DeleteHello, cordial greetings. Excellent tutorial.
ReplyDeleteI am trying to graph the interaction with three factors (AxBxC) and I have not been able to.
You can guide me in that last part, I stay tuned.
Thanks a lot.
Just send me your data with one response variable at agron.infotech@gmail.com and I shall send you the code script file.
ReplyDeleteA bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. Here are few Bar Chart Examples for consideration.
ReplyDeleteThank you very much, this post solved my problem.
ReplyDeleteThank you this helps a lot! I just don't know to get a specific order of my factors in the tables (with the 'group_by()' and 'Arrange()' functions. I now get the barplots in the correct order, but the lettering from the LSD-test in the barplots is in the wrong order. Is there a way to manually reorder them (not alphabetically)?
ReplyDeleteHey thank you for sharing, but I have this error how can i fix it
ReplyDeleteError in Age:Height: NA/NaN argument
In addition: Warning messages:
1: In Age:Height:
numerical expression has 36 elements: only the first used
2: In data.frame(y, trt) : NAs introduced by coercion
3: In data.frame(y, trt) : NAs introduced by coercion
You can explore our enhanced website for additional tutorials on data analysis and visualizations utilizing more advanced techniques by accessing the following link:
Deletehttps://www.agroninfo.com/analysis/r-program/
I tried to use our data but, the LSD result for the interaction effect is giving me this error how can I fix it.
ReplyDeleteError in genotype:gender : NA/NaN argument
In addition: Warning messages:
1: In genotype:gender :
numerical expression has 18 elements: only the first used
2: In data.frame(y, trt) : NAs introduced by coercion
3: In data.frame(y, trt) : NAs introduced by coercion
Please share the code that you have used for conducting LSD test for interaction term?
DeleteThe message indicates that there is an error related to the 'Age:Height' argument, specifically mentioning that it has NA (Not Available) or NaN (Not a Number) values. Additionally, there are warning messages listed:
ReplyDelete1. The warning message indicates that in the 'Age:Height' expression, there are 36 elements, but only the first one is being utilized.
2. It suggests that there are NAs introduced due to coercion in the data frame with the variables 'y' and 'trt'.
3. Similar to the previous warning, it states that there are NAs introduced by coercion in the data frame with the variables 'y' and 'trt'.
To fix the error, you may need to check and handle the NA/NaN values in the 'Age:Height' argument and ensure that the data types in your data frame (specifically 'y' and 'trt') are compatible.
Make sure to convert factor variables to factors using the as.factor() function before conducting ANOVA or LSD tests. This should address the issue. If you encounter any further problems, please don't hesitate to ask.