Elegant barplot using ggplot function in R

See Video  ☝

Introduction

You will learn how to visualize bar graph for main and interaction effects using R studio. I shall use two way analysis of variance technique to fit a model for the given data set. Least significant difference test will be applied for mean comparisons. Bar graphs will be plotted using the alphabets from group component of LSD test to place just above the error bars on the plot.

Let's get started!

Importing data file

I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE)) before importing the data set. You can also clear the plots using graphics.off() and clear everything in console using shell() function.

rm(list = ls(all = TRUE))
graphics.off()
shell("cls")

Now let’s import the data set using read.csv() function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file argument specify the file name with extension CSV.

In header argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE for this argument. The head() function will print the first six rows of the data set.

data <- read.csv(file = "data_2way.csv", 
                 header = TRUE)
head(data)
#   genotype gender activity
# 1       FF Female     3.39
# 2       FF Female     3.34
# 3       FF Female     3.59
# 4       FO Female     3.58
# 5       FO Female     4.12
# 6       FO Female     4.72

Before proceeding for analysis it is a better practice to have a look on the data set to see the structure of variables. The data looks fine as shown in below table. The second and third variables are being read as factor which is true for the given data set. In case if it is integer or character then you can change it to factor using as.factor() function as shown below.

str(data)
# 'data.frame': 18 obs. of  3 variables:
#  $ genotype: Factor w/ 3 levels "FF","FO","OO": 1 1 1 2 2 2 3 3 3 1 ...
#  $ gender  : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 2 ...
#  $ activity: num  3.39 3.34 3.59 3.58 4.12 4.72 5.06 4.05 4.09 2.2 ...
data$genotype <- as.factor(x = data$genotype)
str(data)
# 'data.frame': 18 obs. of  3 variables:
#  $ genotype: Factor w/ 3 levels "FF","FO","OO": 1 1 1 2 2 2 3 3 3 1 ...
#  $ gender  : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 2 ...
#  $ activity: num  3.39 3.34 3.59 3.58 4.12 4.72 5.06 4.05 4.09 2.2 ...

Fit analysis of variance model

First, you need to load the library agricolae using require() function. If this package is not installed then first install this package using install.pakcages("agricolae") command in R console.

For analysis of variance of split-plot design assign model by using sp.plot() function. In parenthesis for block argument you can type the block or replication variable name from data set. For pplot argument specify the main plot variable name (varieties var in this example) and for splot argument specify the sub-plot variable name (seedling age age in this example). In Y argument you can specify the response variable name as heading in this example.

This will construct the analysis of variance table showing the significance of sources of variation.

attach(data)
library(stats)
aov.res <- aov(activity ~ 
                          genotype + 
                          gender + 
                          genotype:gender)
# OR
aov.res <- aov(activity ~ 
                          genotype*gender)
anova(aov.res)
# Analysis of Variance Table
# 
# Response: activity
#                 Df Sum Sq Mean Sq F value   Pr(>F)   
# genotype         2 6.2525 3.12627 10.9923 0.001938 **
# gender           1 1.3833 1.38334  4.8640 0.047669 * 
# genotype:gender  2 0.7911 0.39554  1.3908 0.286269   
# Residuals       12 3.4129 0.28441                    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LSD mean comparison test

LSD applied on genotypes

Now let’s compute multiple comparison of treatments by using least significant difference test with grouping of treatments. To apply this test load agricolae package using library() function. LSD.test() function is used for multiple comparisons of treatments.

This function require some arguments. In Y argument you can specify model or use response variable name as activity in this case. In trt argument specify the first factor variable. You can set the value for DFerror by writing df.residual with model object after dollar sign (aov.res$df.residual).

To set the value of error mean square MSerror divide deviance of model object with residual degree of freedom. I shall use the default value for alpha argument that represents the 5 percent level of significance. In method argument (p.adj) for adjusting probability value, I shall use bonferroni which is most commonly used method. Set a logical value for group argument to specify grouping of treatment means. Set a logical value for console argument to print the outputs.

library(agricolae)

# First factor variable (factor A)
LSD_A = LSD.test(y = activity,
                 trt = genotype,
                 DFerror = aov.res$df.residual,
                 MSerror = deviance(aov.res)/aov.res$df.residual,
                 alpha = 0.05,
                 p.adj = "bonferroni",
                 group = TRUE,
                 console = TRUE)
# 
# Study: activity ~ genotype
# 
# LSD t Test for activity 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  0.2844056 
# 
# genotype,  means and individual ( 95 %) CI
# 
#    activity       std r      LCL      UCL  Min  Max
# FF 2.973333 0.5458816 6 2.498968 3.447699 2.20 3.59
# FO 3.760000 0.5871967 6 3.285634 4.234366 3.12 4.72
# OO 4.415000 0.6889630 6 3.940634 4.889366 3.43 5.26
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.779473 
# 
# Minimum Significant Difference: 0.8557972 
# 
# Treatments with the same letter are not significantly different.
# 
#    activity groups
# OO 4.415000      a
# FO 3.760000     ab
# FF 2.973333      b

LSD applied on gender

To compute multiple comparisons of treatments for the second factor gender you will need to only change the trt argument. Type the second variable factor name instead of first variable factor as used in previous command. Rest of all arguments will remain the same. This will print the outputs for the second variable factor that is gender.

# Second factor variable (factor B)
LSD_B = LSD.test(y = activity,
                 trt = gender,
                 DFerror = aov.res$df.residual,
                 MSerror = deviance(aov.res)/aov.res$df.residual,
                 alpha = 0.05,
                 p.adj = "bonferroni",
                 group = TRUE,
                 console = TRUE)
# 
# Study: activity ~ gender
# 
# LSD t Test for activity 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  0.2844056 
# 
# gender,  means and individual ( 95 %) CI
# 
#        activity       std r      LCL      UCL  Min  Max
# Female 3.993333 0.5935908 9 3.606015 4.380651 3.34 5.06
# Male   3.438889 0.9770932 9 3.051571 3.826207 2.20 5.26
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.178813 
# 
# Minimum Significant Difference: 0.5477504 
# 
# Treatments with the same letter are not significantly different.
# 
#        activity groups
# Female 3.993333      a
# Male   3.438889      b

LSD applied on interaction term

To compute multiple comparison of treatments for interaction term again use the same arguments as used in previous command except the treatment trt argument. In treatment argument specify the interaction term as used in model object for analysis of variance (genotype:gender). This will print the outputs for the interaction effect.

LSD_AB = LSD.test(y = activity,
                  trt = genotype:gender,
                  DFerror = aov.res$df.residual,
                  MSerror = deviance(aov.res)/aov.res$df.residual,
                  alpha = 0.05,
                  p.adj = "bonferroni",
                  group = TRUE,
                  console = TRUE)
# 
# Study: activity ~ genotype:gender
# 
# LSD t Test for activity 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  0.2844056 
# 
# genotype:gender,  means and individual ( 95 %) CI
# 
#           activity       std r      LCL      UCL  Min  Max
# FF:Female 3.440000 0.1322876 3 2.769146 4.110854 3.34 3.59
# FF:Male   2.506667 0.2722744 3 1.835812 3.177521 2.20 2.72
# FO:Female 4.140000 0.5702631 3 3.469146 4.810854 3.58 4.72
# FO:Male   3.380000 0.3218695 3 2.709146 4.050854 3.12 3.74
# OO:Female 4.400000 0.5719266 3 3.729146 5.070854 4.05 5.06
# OO:Male   4.430000 0.9267686 3 3.759146 5.100854 3.43 5.26
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 3.648889 
# 
# Minimum Significant Difference: 1.588854 
# 
# Treatments with the same letter are not significantly different.
# 
#           activity groups
# OO:Male   4.430000      a
# OO:Female 4.400000      a
# FO:Female 4.140000      a
# FF:Female 3.440000     ab
# FO:Male   3.380000     ab
# FF:Male   2.506667      b

Original order of groups (for lettering)

Original order of genotypes

Next step is to arrange the treatment groups in the original order of treatments rather than ranked order of LSD test result. This will help while adding alphabets from treatment group in the same order as the order of treatments in the plot. Connect the group component from LSD object (LSD_A$group) with group_by() and arrange() functions for the same component using the %>% pipe operator. Print function for this object will show the original order of treatments.

library(dplyr)
## Original order of LSD$group for factor A
ascend_A = LSD_A$groups %>%
          group_by(rownames(LSD_A$groups)) %>%
          arrange(rownames(LSD_A$groups))

print(ascend_A)
# # A tibble: 3 x 3
# # Groups:   rownames(LSD_A$groups) [3]
#   activity groups `rownames(LSD_A$groups)`
#      <dbl> <fct>  <chr>                   
# 1     2.97 b      FF                      
# 2     3.76 ab     FO                      
# 3     4.42 a      OO

Original order of gender

Use the same commands to arrange treatment groups in original order for the second factor variable. For this use treatment group component of LSD_B object (LSD_B$group). Print function for this object will show the original order of treatments for second factor.

# Original order of LSD$group for factor B
ascend_B = LSD_B$groups %>%
          group_by(rownames(LSD_B$groups)) %>%
          arrange(rownames(LSD_B$groups))

print(ascend_B)
# # A tibble: 2 x 3
# # Groups:   rownames(LSD_B$groups) [2]
#   activity groups `rownames(LSD_B$groups)`
#      <dbl> <fct>  <chr>                   
# 1     3.99 a      Female                  
# 2     3.44 b      Male

Original order of interaction term

Use the same commands again to arrange treatment groups in original order for the interaction term. For this use treatment group component of LSD_AB object (LSD_AB$group). Print function for this object will show the original order of treatments.

## Original order of LSD$group for interaction
ascend_AB = LSD_AB$groups %>%
          group_by(rownames(LSD_AB$groups)) %>%
          arrange(rownames(LSD_AB$groups))

print(ascend_AB)
# # A tibble: 6 x 3
# # Groups:   rownames(LSD_AB$groups) [6]
#   activity groups `rownames(LSD_AB$groups)`
#      <dbl> <fct>  <chr>                    
# 1     3.44 ab     FF:Female                
# 2     2.51 b      FF:Male                  
# 3     4.14 a      FO:Female                
# 4     3.38 ab     FO:Male                  
# 5     4.40 a      OO:Female                
# 6     4.43 a      OO:Male

Visualizing effects

Mean and standard error (genotype)

Before visualizing main and interaction effects first compute mean values and standard error. For this load the library dplyr that states the grammar of data manipulation. Connect data set to a command that will it by genotypes by using group_by() function and then connect with summarise() function to get mean (avg_A) and standard error (se) values. For connection we shall use %>% pipe operator.

In summarize function the objects for mean and standard error will be created. Print function will display the results for this object.

library(dplyr)
### Mean and SE for first factor (Factor A)
MeanSE_A = data %>%
          group_by(genotype) %>%
          summarise(avg_A = mean(activity),
                    se = sd(activity)/sqrt(length(activity)))

print(MeanSE_A)
# # A tibble: 3 x 3
#   genotype avg_A    se
#   <fct>    <dbl> <dbl>
# 1 FF        2.97 0.223
# 2 FO        3.76 0.240
# 3 OO        4.42 0.281
attach(MeanSE_A)

Mean and standard error (gender)

Similarly, get mean values and standard error for second variable factor using the same command except connecting the data set to group_by() gender using %>% pipe operator. Print function for this object will display the results.

### Mean and SE for second factor (Factor B)
MeanSE_B = data %>%
          group_by(gender) %>%
          summarise(avg_B = mean(activity),
                    se = sd(activity)/sqrt(length(activity)))
print(MeanSE_B)
# # A tibble: 2 x 3
#   gender avg_B    se
#   <fct>  <dbl> <dbl>
# 1 Female  3.99 0.198
# 2 Male    3.44 0.326
attach(MeanSE_B)

Mean and standard error (interaction)

Again use the same command for interaction term. Here you will connect the data set to group_by() genotype and gender using the %>% pipe operator. Print function for this object will display the results.

## Mean and SE for interaction effect (AxB)
MeanSE_AB = data %>%
          group_by(genotype, gender) %>%
          summarise(avg_AB = mean(activity),
                    se = sd(activity)/sqrt(length(activity)))

print(MeanSE_AB)
# # A tibble: 6 x 4
# # Groups:   genotype [3]
#   genotype gender avg_AB     se
#   <fct>    <fct>   <dbl>  <dbl>
# 1 FF       Female   3.44 0.0764
# 2 FF       Male     2.51 0.157 
# 3 FO       Female   4.14 0.329 
# 4 FO       Male     3.38 0.186 
# 5 OO       Female   4.40 0.330 
# 6 OO       Male     4.43 0.535
attach(MeanSE_AB)

Plotting genotypes

Create plotting object

Let’s start to display the main and interaction effects using grammar of graphics ggplot() function. This will require to load the package ggplot2. Create an object p1 using the grammar of graphics function.

This function requires data set (MeanSE_A) and some aesthetic mappings to use for the plot. In aesthetic mappings function the argument X specify the variable genotype and argument Y specify the variable mean values object (avg_A) created earlier. Print function will print a blank plot showing X and Y axis labels.

library(ggplot2)

# For first factor (Factor A = genotype)
p1 = ggplot(MeanSE_A, aes(x = genotype,
                          y = avg_A))
print(p1)

Add bars

Now let’s add layers to this plot object. For plotting bars use geom_bar() function. In stat argument set the value as identity to leave the data as is. In color argument specify the color for borders of the bars. The position argument specify the Position adjustment, either as a string, or the result of a call to a position adjustment function. The argument width specify the width of the bars. Printing this layer with plot object will display bars on the plot.

Add error bars

Let’s add the second layer to show error bars on the bars of the plot. For this use geom_errorbar() function. In aesthetic mappings you can specify the maximum ymax and minimum ymin values to show the heights of the error bars. Use the same position argument as used in previous command so that the error bars could place in the center of the bars. You can adjust width of the error bars by setting the value for width argument.

Change title, X and Y axis labels

Let’s change the main title, X, and Y-axis labels of the plot. For this use labs() function. In title you can specify the main title of the plot. For X and Y arguments specify the labels to be used for the X and Y-axis.

Place alphabets on top of error bars

Now let’s add alphabets from the mean comparison test on top of bars. For this use geom_text() function. In aesthetic mappings, you can specify X, Y and label arguments. The X argument specify the levels of the variable factor. In Y argument specify the Y axis position of the alphabets. I am using mean values plus standard error (avg_A + se) for this argument. This will place the alphabets on the top of error bars but not above of it.

To place the alphabets slightly above the error bars using vjust vertical adjustment argument. Set the value minus 0.5 for this argument. In label argument use the matrix of LSD group (ascend_A$groups) alphabets as arranged in original order of the means. Use the same position argument as used in previous command for plotting bars. Printing this layer will show the alphabets on top of the error bars.

# Adding layers to p1 object
# Plotting bars
plotA = p1 + 
          geom_bar(stat = "identity",
                   color = "black",
                   position = position_dodge(width=0.9),
                   width = 0.8)
# Adding error bars       
plotB = plotA + 
          geom_errorbar(aes(ymax = avg_A + se,
                            ymin = avg_A - se), 
                        position = position_dodge(width=0.9), 
                        width = 0.25)
# Changing main title, X & Y labels
plotC = plotB +
          labs(title = "",
               x = "genotype",
               y = "MPI Activity")
# Adding lettering from the test applied (LSD$group)
plotD = plotC +
          geom_text(aes(x = genotype,
                        y = avg_A + se,
                        label = as.matrix(ascend_A$groups)),
                    position = position_dodge(width = 0.9),
                    vjust = -(0.5))

Plotting gender

Create plotting object

The same functions will be used to plot bar graph for second factor. For plotting gender first create an object p2 using grammar of graphics function. Use same arguments in this object changing the values to represent the second factor data set (MeanSE_B). Similarly, use gender for x argument and avg_B for y argument Print function will display the empty plot for this object just showing the X and Y axis labels.

# For second factor (Factor B = gender) 
p2 = ggplot(MeanSE_B, aes(x = gender,
                          y = avg_B))
print(p2)

Adding layers to the plotting object

For adding layers to this plot object use the same functions as used in adding layers for first variable factor. Just change the values of the arguments representing factor B instead of factor A. This will result in printing all the layers to the object plot.

# Adding layers to p2 object
# Adding bars
plotA = p2 + 
          geom_bar(stat = "identity",
                   color = "black",
                   position = position_dodge(width=0.9))
# Adding error bars  
plotB = plotA +
          geom_errorbar(aes(ymax = avg_B + se,
                            ymin = avg_B - se), 
                        position = position_dodge(width=0.9), 
                        width = 0.25)
# Changing main title, X and Y labels
plotC = plotB + 
          labs(title = "",
               x = "gender",
               y = "MPI Activity")
# Adding lettering from test applied
plotD = plotC + 
          geom_text(aes(x = gender,
                        y = avg_B + se,
                        label = as.matrix(ascend_B$groups)),
                    position = position_dodge(width = 0.9),
                    vjust = -(0.5))

Plotting interaction

Create plotting object

First create the plot object similar to the one created for main effects. However, you will change the arguments to represent the interaction data set (MeanSE_AB). In aesthetic mappings define X, Y and fill arguments. The argument X specify gender or the variable name that you want to place on X axis. In Y argument set the mean values for interaction term.

The argument fill specify the second factor used to set the color and fill of geom elements on a plot. I shall set the genotype for this argument. Print function will display the empty plot for this object just showing the X and Y axis labels.

library(ggplot2)
p3 = ggplot(MeanSE_AB, aes(x = gender,
                          y = avg_AB,
                          fill = factor(genotype)))
print(p3)

Add bars

Now add the layers to this plot object. Use same arguments for geom_bar() as used in previous command to plot bars except replacing the values of arguments to represent the interaction term instead of main effects.

Changing fill color of bars

Now I shall type a different layer for adding color for fill and changing legend text. For this use scale_fill_manual() function. In values argument set the fill color for the bars. In labels you can specify labels for the legend text. Print function for this layer will add fill color and labels for legend text.

Add error bars

To add error bars on top of the plot bars use geom_errorbar() function. You can specify settings for aesthetic mappings (aes()), position and width of the error bars. In aesthetic mapping set the maximum ymax and minimum ymin values for error bars on Y axis. I shall set these values by adding avg_AB + se and subtracting avg_AB - se standard error from height of the bars.

Specify position argument same as used in previous command (commond for plotting bars geom_bar()). This will result placing the error bars exactly on the same position as the bars were plotted. You can change the width of the error bars by setting the value for width argument. Printing this layer will add error bars.

Add title, X and Y axis labels

To add main title, x and y labels use the same arguments as used in previous commands while plotting main effects. An additional fill argument can be specified for legend title. Printing this layer will add main title, X-axis and Y-axis labels and legend title to the plot.

Place alphabets on top of error bars

To add alphabets taken from group component of interaction LSD result use geom_text() function. Use the same arguments which were specified while adding this layer in plotting main effects except specifying the interaction component instead of main effects. Printing this layer will display the alphabets on the top of error bars.

# Plotting bars
plotA = p3 + 
          geom_bar(stat = "identity",
                   color = "black",
                   position = position_dodge(width=0.9))
# Adding color for fill and changing legend text
plotB = plotA + 
          scale_fill_manual(values = gray(1:3/3),
                            labels = c("FF", "FO", "OO"))
# Adding error bars
plotC = plotB + 
          geom_errorbar(aes(ymax = avg_AB + se,
                            ymin = avg_AB - se), 
                        position = position_dodge(width=0.9), 
                        width = 0.25)
# Changing main title, X and Y labels and legend title
plotD = plotC + 
          labs(title = "",
               x = "gender",
               y = "MPI Activity",
               fill = "genotype")
# Adding lettering from test applied
plotE = plotD + 
          geom_text(aes(x = gender,
                        y = avg_AB + se,
                        label = as.matrix(ascend_AB$groups)),
                    position = position_dodge(width = 0.9),
                    vjust = -(0.5))

I hope this tutorial will help you to understand how to visualize the bar plots for the main and interaction term.

Please comment below if you have any questions.

Download data file — Click_here

Download Rscript — Download Rscript

Download R program — Click_here

Download R studio — Click_here

Comments

  1. Hi. The link for the data set is not working.

    ReplyDelete
    Replies
    1. Thanks for telling me. I shall renew it.

      Delete
    2. Actually, all links are not hyperlinks.

      Delete
    3. Hi Noah,

      Thanks for telling me. I have just corrected the links and now they are working. Please let me know if you still unable to downlead these files.

      Delete
  2. Hi. I've been trying to add the alphabets from LSD in by bar graph, however, I always encounter a problem. The app always prompt me this:

    Error: Aesthetics must be either length 1 or the same as the data (7): label

    What should I do?

    ReplyDelete
    Replies
    1. Make sure you have properly arranged the group component of LSD test result containing letters. In geom_text() function use the same object which was created while arranging the group component. Practice this example data set with Rscript to know why are you getting this error. Further you can send data through email and I shall see how to fix this error

      Delete
  3. Hi
    Thanks for your efforts. It is very helpful and easy to follow. Could you please give me the code on how can I give LSD Bar (Interaction LSD Value) at the top of the graph?
    Just as like this vedio https://youtu.be/s8MJdOTnrCA

    ReplyDelete
    Replies
    1. You can download R script and data file from the download links provided at the end of this blog post.

      Delete
  4. Thank you so much for the in depth tutorial! When I add the letters to my plot, how do I make it so that they are just above the SD line? Right now they are all in the same position but some are on top of the SD.

    ReplyDelete
    Replies
    1. You can send me the Rscript and data file at agron.infotech@gmail.com

      I shall send you back the same after making necessary corrections

      Delete
  5. Hi, I would like to run Your script. That's exactly what I need. My dataset is very similar to Yours. Unfortunatelly after attach(MeanSE_A) I got an error massage:

    'The following objects are masked from MeanSE_A:' and it is followed by my grouping variable. The same happens in case of MeanSE_B and AB. What can be wrong? Could you help me with this please?
    Thank You!

    ReplyDelete
    Replies
    1. May be you have not converted grouping variable to factor variable before masking the components of data object.

      data$factorA = as.factor(data$factorA)
      data$factorB = as.factor(data$factorB)
      attach(data)

      Delete
  6. Replies
    1. Thanks for letting me know. I have now embedded data file and Rscript files to the blog post. Hope this will fix the issue on permanent basis.

      Delete
  7. Hello, cordial greetings. Excellent tutorial.

    I am trying to graph the interaction with three factors (AxBxC) and I have not been able to.

    You can guide me in that last part, I stay tuned.
    Thanks a lot.

    ReplyDelete
  8. Just send me your data with one response variable at agron.infotech@gmail.com and I shall send you the code script file.

    ReplyDelete
  9. A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. Here are few Bar Chart Examples for consideration.

    ReplyDelete
  10. Thank you very much, this post solved my problem.

    ReplyDelete
  11. Thank you this helps a lot! I just don't know to get a specific order of my factors in the tables (with the 'group_by()' and 'Arrange()' functions. I now get the barplots in the correct order, but the lettering from the LSD-test in the barplots is in the wrong order. Is there a way to manually reorder them (not alphabetically)?

    ReplyDelete

Post a Comment

Popular posts from this blog

Two way repeated measures analysis in R

Split plot analysis in R