Bargarph with factor categories

See Video  ☝


Introduction

This tutorial is in continuation of the previous one on strip_plot_analysis

If you have not gone through the previous tutorial I would suggest first you should have a look at this.

Points of learning: * How to plot interaction bar graph with factor categories * How to show standard error bars on top of the graph bars.

Import data set

I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE)). You can also clear the plots using graphics.off() and clear everything in console using shell() function.

rm(list = ls(all = TRUE))
graphics.off()
shell("cls")

Now let’s import the data set using read.csv() function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file argument specify the file name with extension CSV. In header argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE for this argument. The head() function will print the first six rows of the data set.

data <- read.csv(file = "data_strip.csv", 
                 header = TRUE)
head(data)
#   block Varieties Fertilizer yield
# 1     1        V1         F1  10.2
# 2     1        V1         F2  11.1
# 3     1        V1         F3   6.8
# 4     1        V1         F4   5.3
# 5     1        V2         F1   8.0
# 6     1        V2         F2   9.7

Download the data set used in this example — Click_here.

Get summary statistics

In the previous tutorial, I plotted interaction bar graph which actually does not look decent. We can change the graph to reshape it in better look with categories for treatment factors. For this first, we need to aggregate our data by Varieties and Fertilizer and specify that we want to return the mean, standard deviation, and the number of observations for each group. By setting values for x, by and FUN argument as shown below you will be able to get the required results.

require(stats)

attach(data)

mydata <- aggregate(x = yield, 
                    by = list(Varieties = Varieties, 
                              Fertilizer = Fertilizer), 
                    FUN = function(x) c(mean = mean(x), 
                                        sd = sd(x), 
                                        n = length(x)))
head(mydata)
#   Varieties Fertilizer    x.mean      x.sd       x.n
# 1        V1         F1 11.175000  1.187083  4.000000
# 2        V2         F1  9.375000  1.946578  4.000000
# 3        V3         F1  4.900000  2.089657  4.000000
# 4        V1         F2  9.725000  1.043631  4.000000
# 5        V2         F2  9.775000  1.393736  4.000000
# 6        V3         F2  8.750000  1.826655  4.000000

After this, do a little manipulation since the previous function returned matrices instead of vectors. Use the base function of do.call() to get the results as vectors.

mydata <- do.call(data.frame, mydata)
head(mydata)
#   Varieties Fertilizer x.mean     x.sd x.n
# 1        V1         F1 11.175 1.187083   4
# 2        V2         F1  9.375 1.946578   4
# 3        V3         F1  4.900 2.089657   4
# 4        V1         F2  9.725 1.043631   4
# 5        V2         F2  9.775 1.393736   4
# 6        V3         F2  8.750 1.826655   4

Compute standard error

Next you need to compute the standard error for each group. We can then rename the columns just for ease of use. To calculate standard error divide the standard deviation x.sd with the square root of the number of observations x.n used to calculate the deviation.

mydata$se <- mydata$x.sd / sqrt(mydata$x.n)
print(mydata$se)
#  [1] 0.5935416 0.9732891 1.0448285 0.5218157 0.6968680 0.9133273 0.7652614
#  [8] 0.2954516 0.5893146 0.7226975 1.0680005 0.5121442

Define variable names

Use colnames() function to assign variable names in squence for the above data set containing summary statistics. You can assign variable names using concatinate function c() while the typing variable names in sequence in quotations.

colnames(mydata) <- c("Varieties", "Fertilizer", "mean", 
                      "sd", "n", "se")
colnames(mydata)
# [1] "Varieties"  "Fertilizer" "mean"       "sd"         "n"         
# [6] "se"

Give below command to paste in sequence the first and second factor variable from data. The head() function will print the first six rows of the data set.

mydata$names <- c(paste(mydata$Varieties, "Varieties/", 
                        mydata$Fertilizer, "Fertilizer"))
head(mydata)
#   Varieties Fertilizer   mean       sd n        se                       names
# 1        V1         F1 11.175 1.187083 4 0.5935416 V1 Varieties/ F1 Fertilizer
# 2        V2         F1  9.375 1.946578 4 0.9732891 V2 Varieties/ F1 Fertilizer
# 3        V3         F1  4.900 2.089657 4 1.0448285 V3 Varieties/ F1 Fertilizer
# 4        V1         F2  9.725 1.043631 4 0.5218157 V1 Varieties/ F2 Fertilizer
# 5        V2         F2  9.775 1.393736 4 0.6968680 V2 Varieties/ F2 Fertilizer
# 6        V3         F2  8.750 1.826655 4 0.9133273 V3 Varieties/ F2 Fertilizer

Upper bounds of error bars

It’s also a good habit to specify the upper bounds of your plot since the error bars are going to extend past the height of the bars.

plotTop <- max(mydata$mean) + 
           mydata[mydata$mean == max(mydata$mean), 6] * 3
plotTop
# [1] 12.95562

Matrix of mean and SE

Get the matrix of mean for interaction term with tapply() function. In list() function specify the levels of the group variable.

tapply(mydata$mean, 
       list(mydata$Fertilizer, mydata$Varieties),
       function(x) c(x = x))
#        V1    V2    V3
# F1 11.175 9.375 4.900
# F2  9.725 9.775 8.750
# F3  9.025 9.425 3.775
# F4  6.175 5.625 2.175

Use the same tapply() function to create the objects for mean and standard error.

tabbedMeans <- tapply(mydata$mean, 
                      list(mydata$Fertilizer, mydata$Varieties),
                      function(x) c(x = x))
head(tabbedMeans)
#        V1    V2    V3
# F1 11.175 9.375 4.900
# F2  9.725 9.775 8.750
# F3  9.025 9.425 3.775
# F4  6.175 5.625 2.175
tabbedSE <- tapply(mydata$se, 
                   list(mydata$Fertilizer, mydata$Varieties),
                   function(x) c(x = x))
head(tabbedSE)
#           V1        V2        V3
# F1 0.5935416 0.9732891 1.0448285
# F2 0.5218157 0.6968680 0.9133273
# F3 0.7652614 0.2954516 0.5893146
# F4 0.7226975 1.0680005 0.5121442

Plot bargraph for interaction term

Now plot the graph by using barplot() function. Set the values for different arguments of the bar graph such as height, beside, Y axis limits ylim, main title, Y and X axis labels (ylab and xlab), border and legend.

barCenters <- barplot(height = tabbedMeans,
                      beside = TRUE, 
                      las = 1,
                      ylim = c(0, plotTop),
                      cex.names = 0.75,
                      main = "Yield of strawberries",
                      ylab = "yield",
                      xlab = "Interaction",
                      border = "black", 
                      axes = TRUE,
                      legend.text = TRUE,
                      args.legend = list(title = "Fertilizer",
                                         x = "topright", 
                                         cex = .7,
                                         bty = "n"))

This will plot a nice bar plot but it lacks standard error bars. For this use segment() function to subtract and add the tabbedSE in tabbedMeans while keeping the error bars in the center of the graph bars.

segments(x0 = barCenters, y0 = tabbedMeans - tabbedSE,
         x1 = barCenters, y1 = tabbedMeans + tabbedSE,
         lwd = 1.5)

Now there is need to add arrows or cap for standard error bars. Use arrow() function to add the cap on standard error bars both on top and bottom.

arrows(x0 = barCenters, y0 = tabbedMeans - tabbedSE, 
       x1 = barCenters, y1 = tabbedMeans + tabbedSE, 
       lwd = 1.5, angle = 90, code = 3, length = 0.05)

Please comment below if you have any questions.


Download data file — Click_here

Download Rscript — Download Rscript


Download R program — Click_here

Download R studio — Click_here


Comments

  1. How do l manually assign colors to the Fertilizer variables

    ReplyDelete
    Replies
    1. Use below color argument in barplot function

      col = c("lightblue", "mistyrose", "lightcyan", "lavender")

      Delete
  2. Thank you for sharing your knowledge in R
    Please, is the a way to plot for example the fertilizer on x-axis and fill (legends) using the varieties??

    ReplyDelete
  3. Yes just reverse sequence. In place of fertilizer you can write varieties and for varieties type fertilizer

    ReplyDelete
  4. I really appreciate your nice presentation and share us your knowledge. I tried my data set in your command and now looks fine. But after exporting my barplot, few of the variety names in the Xlab disappeared. How can I solve this problem? Thank you for your time and effort.

    ReplyDelete
  5. Since I have ~43 varieties, putting their label horizontally is not displayed all. So how to display them at 90 degree on Xlab?

    ReplyDelete

Post a Comment

Popular posts from this blog

Two way repeated measures analysis in R

Split plot analysis in R

Visualizing clustering dendrogram in R | Hierarchical clustering