Bargarph with factor categories
See Video ⮞ ☝ |
AGRON Stats Lectures
July 23, 2020
Introduction
This tutorial is in continuation of the previous one on
strip_plot_analysis
If you have not gone through the previous tutorial I would suggest first you should have a look at this.
Points of learning: * How to plot interaction bar graph with factor categories * How to show standard error bars on top of the graph bars.
Import data set
I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE))
. You can also clear the plots using graphics.off()
and clear everything in console using shell()
function.
Now let’s import the data set using read.csv()
function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file
argument specify the file name with extension CSV. In header
argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE
for this argument. The head()
function will print the first six rows of the data set.
# block Varieties Fertilizer yield
# 1 1 V1 F1 10.2
# 2 1 V1 F2 11.1
# 3 1 V1 F3 6.8
# 4 1 V1 F4 5.3
# 5 1 V2 F1 8.0
# 6 1 V2 F2 9.7
Download the data set used in this example —
Click_here
.
Get summary statistics
In the previous tutorial, I plotted interaction bar graph which actually does not look decent. We can change the graph to reshape it in better look with categories for treatment factors. For this first, we need to aggregate
our data by Varieties
and Fertilizer
and specify that we want to return the mean, standard deviation, and the number of observations for each group. By setting values for x
, by
and FUN
argument as shown below you will be able to get the required results.
require(stats)
attach(data)
mydata <- aggregate(x = yield,
by = list(Varieties = Varieties,
Fertilizer = Fertilizer),
FUN = function(x) c(mean = mean(x),
sd = sd(x),
n = length(x)))
head(mydata)
# Varieties Fertilizer x.mean x.sd x.n
# 1 V1 F1 11.175000 1.187083 4.000000
# 2 V2 F1 9.375000 1.946578 4.000000
# 3 V3 F1 4.900000 2.089657 4.000000
# 4 V1 F2 9.725000 1.043631 4.000000
# 5 V2 F2 9.775000 1.393736 4.000000
# 6 V3 F2 8.750000 1.826655 4.000000
After this, do a little manipulation since the previous function returned matrices instead of vectors. Use the base function of do.call()
to get the results as vectors.
# Varieties Fertilizer x.mean x.sd x.n
# 1 V1 F1 11.175 1.187083 4
# 2 V2 F1 9.375 1.946578 4
# 3 V3 F1 4.900 2.089657 4
# 4 V1 F2 9.725 1.043631 4
# 5 V2 F2 9.775 1.393736 4
# 6 V3 F2 8.750 1.826655 4
Compute standard error
Next you need to compute the standard error for each group. We can then rename the columns just for ease of use. To calculate standard error divide the standard deviation x.sd
with the square root of the number of observations x.n
used to calculate the deviation.
# [1] 0.5935416 0.9732891 1.0448285 0.5218157 0.6968680 0.9133273 0.7652614
# [8] 0.2954516 0.5893146 0.7226975 1.0680005 0.5121442
Define variable names
Use colnames()
function to assign variable names in squence for the above data set containing summary statistics. You can assign variable names using concatinate function c()
while the typing variable names in sequence in quotations.
# [1] "Varieties" "Fertilizer" "mean" "sd" "n"
# [6] "se"
Give below command to paste
in sequence the first and second factor variable from data. The head()
function will print the first six rows of the data set.
mydata$names <- c(paste(mydata$Varieties, "Varieties/",
mydata$Fertilizer, "Fertilizer"))
head(mydata)
# Varieties Fertilizer mean sd n se names
# 1 V1 F1 11.175 1.187083 4 0.5935416 V1 Varieties/ F1 Fertilizer
# 2 V2 F1 9.375 1.946578 4 0.9732891 V2 Varieties/ F1 Fertilizer
# 3 V3 F1 4.900 2.089657 4 1.0448285 V3 Varieties/ F1 Fertilizer
# 4 V1 F2 9.725 1.043631 4 0.5218157 V1 Varieties/ F2 Fertilizer
# 5 V2 F2 9.775 1.393736 4 0.6968680 V2 Varieties/ F2 Fertilizer
# 6 V3 F2 8.750 1.826655 4 0.9133273 V3 Varieties/ F2 Fertilizer
Upper bounds of error bars
It’s also a good habit to specify the upper bounds of your plot since the error bars are going to extend past the height of the bars.
# [1] 12.95562
Matrix of mean and SE
Get the matrix of mean for interaction term with tapply()
function. In list()
function specify the levels of the group variable.
# V1 V2 V3
# F1 11.175 9.375 4.900
# F2 9.725 9.775 8.750
# F3 9.025 9.425 3.775
# F4 6.175 5.625 2.175
Use the same tapply()
function to create the objects for mean and standard error.
tabbedMeans <- tapply(mydata$mean,
list(mydata$Fertilizer, mydata$Varieties),
function(x) c(x = x))
head(tabbedMeans)
# V1 V2 V3
# F1 11.175 9.375 4.900
# F2 9.725 9.775 8.750
# F3 9.025 9.425 3.775
# F4 6.175 5.625 2.175
tabbedSE <- tapply(mydata$se,
list(mydata$Fertilizer, mydata$Varieties),
function(x) c(x = x))
head(tabbedSE)
# V1 V2 V3
# F1 0.5935416 0.9732891 1.0448285
# F2 0.5218157 0.6968680 0.9133273
# F3 0.7652614 0.2954516 0.5893146
# F4 0.7226975 1.0680005 0.5121442
Plot bargraph for interaction term
Now plot the graph by using barplot()
function. Set the values for different arguments of the bar graph such as height
, beside
, Y axis limits ylim
, main
title, Y and X axis labels (ylab
and xlab
), border
and legend
.
barCenters <- barplot(height = tabbedMeans,
beside = TRUE,
las = 1,
ylim = c(0, plotTop),
cex.names = 0.75,
main = "Yield of strawberries",
ylab = "yield",
xlab = "Interaction",
border = "black",
axes = TRUE,
legend.text = TRUE,
args.legend = list(title = "Fertilizer",
x = "topright",
cex = .7,
bty = "n"))
This will plot a nice bar plot but it lacks standard error bars. For this use segment()
function to subtract and add the tabbedSE
in tabbedMeans
while keeping the error bars in the center of the graph bars.
segments(x0 = barCenters, y0 = tabbedMeans - tabbedSE,
x1 = barCenters, y1 = tabbedMeans + tabbedSE,
lwd = 1.5)
Now there is need to add arrows or cap for standard error bars. Use arrow()
function to add the cap on standard error bars both on top and bottom.
arrows(x0 = barCenters, y0 = tabbedMeans - tabbedSE,
x1 = barCenters, y1 = tabbedMeans + tabbedSE,
lwd = 1.5, angle = 90, code = 3, length = 0.05)
Please comment below if you have any questions.
Download data file — Click_here
Download Rscript — Download Rscript
Download R program —
Click_here
Download R studio —
Click_here
How do l manually assign colors to the Fertilizer variables
ReplyDeleteUse below color argument in barplot function
Deletecol = c("lightblue", "mistyrose", "lightcyan", "lavender")
Thank you for sharing your knowledge in R
ReplyDeletePlease, is the a way to plot for example the fertilizer on x-axis and fill (legends) using the varieties??
Yes just reverse sequence. In place of fertilizer you can write varieties and for varieties type fertilizer
ReplyDeleteI really appreciate your nice presentation and share us your knowledge. I tried my data set in your command and now looks fine. But after exporting my barplot, few of the variety names in the Xlab disappeared. How can I solve this problem? Thank you for your time and effort.
ReplyDeleteSince I have ~43 varieties, putting their label horizontally is not displayed all. So how to display them at 90 degree on Xlab?
ReplyDelete