Split plot analysis in R
See Video ⮞ ☝ |
AGRON Stats
June 26, 2018
You will learn how to carry out split plot analysis, mean separation test using LSD.test()
function and plotting the main and interaction effects using base graphics functions in R studio.
Description of design
Split-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized, completely randomized block, and Latin square designs. The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.
For example, In case of rice crop if there are two varieties and three seedling age and we want more precision for the seedling age, then with randomized complete block design with three replications, the varieties are randomly assigned to the main plots within three blocks using a separate randomization for each. Then the levels of seedling age are randomly assigned to the subplots within the main plots using a separate randomization in each main plot. The layout is shown in the table below.
V1A1 | V2A2 | V2A1 | V1A3 |
V1A3 | V2A1 | V2A2 | V1A2 |
V1A2 | V2A3 | V2A3 | V1A1 |
The total number of treatments for this experiment are 2 x 3 = 6
. From the layout, it is clear that all the 6 treatments occur once in each block but within a block, all levels of two varieties occur together, so with respect to varieties, it is a randomized complete block with two varieties and three blocks. But is an incomplete block as for as the full set of treatments is concerned.
Split-plot designs are often called incomplete block designs.
Import data file
A researcher was interested to compare 2 varieties of rice and three seedling age (2 weeks old, 3 weeks old & 4 weeks old). An experiment was conducted in a completely randomized design with 3 replications while varieties were kept in the main plot and seedling age in subplots.
The response variable was recorded as time in days from heading to maturity of rice plants. I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE))
before importing the data set. You can also clear the plots using graphics.off()
and clear everyting in console using shell()
function.
rm(list = ls(all = TRUE))
graphics.off()
shell("cls")
Now let’s import the data set using read.csv()
function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file
argument specify the file name with extension CSV.
In header
argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE
for this argument. The head()
function will print the first six rows of the data set.
<- read.csv(file = "data_split.csv",
data header = TRUE)
head(data)
# block var age heading
# 1 1 Super 2W 89
# 2 1 Super 3W 94
# 3 1 Super 4W 96
# 4 1 Shaheen 2W 75
# 5 1 Shaheen 3W 80
# 6 1 Shaheen 4W 84
Observe variables
Before proceeding for analysis it is a better practice to have a look on the data set to see the structure of variables. The data looks fine as shown in below table. The second and third variables are being read as factor which is true for the given data set. In case if it is integer or character then you can change it to factor using as.factor()
function as shown below
$var = as.factor(data$var)
data$age = as.factor(data$age)
datastr(data)
# 'data.frame': 26 obs. of 4 variables:
# $ block : int 1 1 1 1 1 1 2 2 2 2 ...
# $ var : Factor w/ 3 levels "","Shaheen","Super": 3 3 3 2 2 2 3 3 3 2 ...
# $ age : Factor w/ 4 levels "","2W","3W","4W": 2 3 4 2 3 4 2 3 4 2 ...
# $ heading: int 89 94 96 75 80 84 85 99 101 72 ...
attach(data)
Fit analysis of variance model
First, you need to load the library agricolae
using require()
function. If this package is not installed then first install this package using install.pakcages("agricolae")
command in R console.
To fit the analysis of variance model for split-plot design assign a function to model
object by using sp.plot()
function. In parenthesis for block
argument you can type the block or replication variable name from data set. For pplot
argument specify the main plot variable name (varieties var
in this example) and for splot
argument specify the sub-plot variable name (seedling age age
in this example). In Y
argument you can specify the response variable name as heading
in this example.
This will construct the analysis of variance table showing the significance of sources of variation.
library(agricolae)
attach(data)
# The following objects are masked from data (pos = 4):
#
# age, block, heading, var
<- sp.plot(block = block,
model pplot = var,
splot = age,
Y = heading)
#
# ANALYSIS SPLIT PLOT: heading
# Class level information
#
# var : Super Shaheen
# age : 2W 3W 4W
# block : 1 2 3 4 NA
#
# Number of observations: 26
#
# Analysis of Variance Table
#
# Response: heading
# Df Sum Sq Mean Sq F value Pr(>F)
# block 3 28.46 9.49 0.4415 0.74034
# var 1 1365.04 1365.04 63.5314 0.00412 **
# Ea 3 64.46 21.49
# age 2 641.33 320.67 44.5714 2.789e-06 ***
# var:age 2 16.33 8.17 1.1351 0.35359
# Eb 12 86.33 7.19
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# cv(a) = 5.4 %, cv(b) = 3.1 %, Mean = 85.45833
The analysis of variance table showed that both the main effects significantly affected the time taken from heading to maturity in rice. While the interaction was not significant regarding days from heading to maturity. However, I shall proceed with the mean comparison test for both individual treatment and interaction effect in case you get the significant effect of the interaction term.
Mean separation test
Get error df and error MS values
Before proceeding for mean separation test using LSD.test()
function, you need to first assign Edf_a
as the value of first error degree of freedom (model$gl.a
) for main plot factor. Similarly, assign Edf_b
to represent error degree of freedom (model$gl.b
) for the subplot factor and interaction term.
# Get first error df
<- model$gl.a
Edf_a
Edf_a# Get second error df
<- model$gl.b
Edf_b
Edf_b# Get first error MS
<- model$Ea
EMS_a
EMS_a# Get second error MS
<- model$Eb
EMS_b EMS_b
# [1] 3
# [1] 12
# [1] 21.48611
# [1] 7.194444
LSD test on main effects
For main plot factor i.e. varieties LSD.test()
function require some arguments to be specified. In Y
argument specify the response variable name (heading). The trt
argument specify the main plot factor that is varieties (var
) in this case. For DFerror
and MSerror
arguments specify the main plot error df (Edf_a
) and error MS (EMS_a
).
Keep the alpha
level at 5 percent. For p.adj
argument you can specify different methods while I shall use the mostly commonly used method bonferroni. Type TRUE for group
argument to get the mean values with letters showing the rank of each treatment level.
<- LSD.test(y = heading,
out1 trt = var,
DFerror = Edf_a,
MSerror = EMS_a,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: heading ~ var
#
# LSD t Test for heading
# P value adjustment method: bonferroni
#
# Mean Square Error: 21.48611
#
# var, means and individual ( 95 %) CI
#
# heading std r LCL UCL Min Max
# Shaheen 77.91667 5.550730 12 73.65824 82.17510 67 87
# Super 93.00000 6.728501 12 88.74157 97.25843 81 101
#
# Alpha: 0.05 ; DF Error: 3
# Critical Value of t: 3.182446
#
# Minimum Significant Difference: 6.022327
#
# Treatments with the same letter are not significantly different.
#
# heading groups
# Super 93.00000 a
# Shaheen 77.91667 b
Similarly, for second factor or subplot factor just change the values of trt
, DFerror
and MSerror
arguments. In trt argument type the variable name of the subplot factor (age
). In DFerror argumnet set the value of second error df (Edf_b
) and second error MS (EMS_b
) in MSerror argument. Keep rest of the arguments same as in above command.
<- LSD.test(y = heading,
out2 trt = age,
DFerror = Edf_b,
MSerror = EMS_b,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: heading ~ age
#
# LSD t Test for heading
# P value adjustment method: bonferroni
#
# Mean Square Error: 7.194444
#
# age, means and individual ( 95 %) CI
#
# heading std r LCL UCL Min Max
# 2W 78.625 7.558108 8 76.55879 80.69121 67 89
# 3W 86.625 9.117291 8 84.55879 88.69121 74 99
# 4W 91.125 9.093758 8 89.05879 93.19121 79 101
#
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.779473
#
# Minimum Significant Difference: 3.727616
#
# Treatments with the same letter are not significantly different.
#
# heading groups
# 4W 91.125 a
# 3W 86.625 b
# 2W 78.625 c
LSD test on interaction term
Keep all the arguments same as used in subplot factor for LSD test except the trt
argument. Specify interaction term in trt
argument using the concatenate function. This will result as the output of LSD test for interaction term.
<- LSD.test(y = heading,
out3 trt = var:age,
DFerror = Edf_b,
MSerror = EMS_b,
alpha = 0.05,
p.adj = "bonferroni",
group = TRUE,
console = TRUE)
#
# Study: heading ~ var:age
#
# LSD t Test for heading
# P value adjustment method: bonferroni
#
# Mean Square Error: 7.194444
#
# var:age, means and individual ( 95 %) CI
#
# heading std r LCL UCL Min Max
# Shaheen:2W 72.25 3.774917 4 69.32794 75.17206 67 75
# Shaheen:3W 78.50 3.109126 4 75.57794 81.42206 74 81
# Shaheen:4W 83.00 3.366502 4 80.07794 85.92206 79 87
# Super:2W 85.00 3.265986 4 82.07794 87.92206 81 89
# Super:3W 94.75 2.872281 4 91.82794 97.67206 93 99
# Super:4W 99.25 2.362908 4 96.32794 102.17206 96 101
#
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 3.648889
#
# Minimum Significant Difference: 6.920609
#
# Treatments with the same letter are not significantly different.
#
# heading groups
# Super:4W 99.25 a
# Super:3W 94.75 a
# Super:2W 85.00 b
# Shaheen:4W 83.00 b
# Shaheen:3W 78.50 bc
# Shaheen:2W 72.25 c
Plot variation
To simply plot the main effects showing the variation as interquartile range use the plot()
function as shown in below command. In plot() function specify the object as created earlier in LSD.test()
function.
The out1
object from LSD test specify the main plot factor while out2
specify the subplot factor (seedling age). In xlab
and ylab
arguments type the labels in quotations that will be used for X and Y axis. For variation
argument specify IQR (interquartile range) or you can also specify the range or variance for this argument. Similarly you can also plot variation or interquartile range for the interaction term.
plot(out1,
xlab = "varieties",
ylab = "heading (days)",
las = 1,
variation = "IQR")
plot(out2,
xlab = "seedling age",
ylab = "heading (days)",
las = 1,
variation = "IQR")
plot(out3,
xlab = "var:age",
ylab = "heading (days)",
las = 1,
variation = "IQR")
Plot bargraphs
Main effects
To plot bargraph with standard error bars, use bar.error()
function and in parenthesis type out1$means
to get means for the height of the bars. In variation argument specify standard error variation = "SE"
as source for error bars.
Similary you can display bargraph for subplot factor by slightly changing few arguments. Use out2$means
instead of out1 and also change labels for sub plot factor in names.arg
argument.
To add title to this plot use title()
function. In main
argument type the title of the plot. You can set the value to control the size of the main title using cex.main
argument. To add X and Y axis labels use xlab
and ylab
arguments.
# Main plot factor (varieties)
bar.err(out1$means,
variation = "SE",
ylim = c(0, 100),
names.arg = c("Shaheen rice","Super rice"))
# Adding title, X and Y labels
title(main = "Barplot with standard error",
cex.main = 0.8,
xlab = "Varieties",
ylab = "heading (days)")
# Sub-plot factor (seedling age)
bar.err(out2$means,
variation = "SE",
ylim = c(0, 100),
names.arg = c("2-weeks", "3-weeks", "4-weeks"))
# Adding title, X and Y labels
title(main = "Barplot with standard error",
cex.main = 0.8,
xlab = "Seedling age",
ylab = "heading (days)")
Interaction effect
Similarly, by specifying the values of the interaction term for same arguments you can plot bargraph for interaction term.
# Main plot factor (varieties)
bar.err(out3$means,
variation = "SE",
ylim = c(0, 110),
names.arg = c("V1W1","V1W2", "V1W3", "V2W1", "V2W2", "V2W3"))
# Adding title, X and Y labels
title(main = "Barplot with standard error",
cex.main = 0.8,
xlab = "var:age",
ylab = "heading (days)")
For more enhanced visualization of bar graphs using ggplot() function as shown below then visit our blog on plotting bar graph for main and interaction effects using ggplot function in R —
Click_here
.
Watch video tutorials on advanced and interactive plots —
Bargraph_1
and —Bargraph_2
.
Download data file — Click_here
Download Rscript — Download Rscript
Download R program —
Click_here
Download R studio —
Click_here
How do i rearrange the x-axis? I found that if it is 25:a, 50:a, 100:a, it starts with 100:a, then 25:a, 50:a.
ReplyDeleteYou may reverse the sequence. Let me know which sequence you used first?
DeleteHi just wondering if I can ask you some questions for my R analysis of RCBD across locations. My email address is wangtao.home@gmail.com. Could you kindly drop me an email to connect? Appreciated!!
ReplyDeleteYou can use combined analysis of variance repeated over years. Click on the link given below for more details.
Deletehttps://youtu.be/NbrsGDFE1HQ
Hey, nice info!! How are the diagnostic plots for the linear mixed-effects fit obtained?
ReplyDeleteI have still not worked on this aspect. I shall work on it and will upload relevant content later on. Thanks for your suggestion.
Delete