Split plot analysis in R

See Video  ☝


You will learn how to carry out split plot analysis, mean separation test using LSD.test() function and plotting the main and interaction effects using base graphics functions in R studio.

Description of design

Split-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized, completely randomized block, and Latin square designs. The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.

For example, In case of rice crop if there are two varieties and three seedling age and we want more precision for the seedling age, then with randomized complete block design with three replications, the varieties are randomly assigned to the main plots within three blocks using a separate randomization for each. Then the levels of seedling age are randomly assigned to the subplots within the main plots using a separate randomization in each main plot. The layout is shown in the table below.

Block1
Block2
V1
V2
V2
V1
Layout split plot design
V1A1 V2A2 V2A1 V1A3
V1A3 V2A1 V2A2 V1A2
V1A2 V2A3 V2A3 V1A1

The total number of treatments for this experiment are 2 x 3 = 6. From the layout, it is clear that all the 6 treatments occur once in each block but within a block, all levels of two varieties occur together, so with respect to varieties, it is a randomized complete block with two varieties and three blocks. But is an incomplete block as for as the full set of treatments is concerned.

Split-plot designs are often called incomplete block designs.

Import data file

A researcher was interested to compare 2 varieties of rice and three seedling age (2 weeks old, 3 weeks old & 4 weeks old). An experiment was conducted in a completely randomized design with 3 replications while varieties were kept in the main plot and seedling age in subplots.

The response variable was recorded as time in days from heading to maturity of rice plants. I often recommend to first clear all the objects or values in global environment using rm(list = ls(all = TRUE)) before importing the data set. You can also clear the plots using graphics.off() and clear everyting in console using shell() function.

rm(list = ls(all = TRUE))
graphics.off()
shell("cls")

Now let’s import the data set using read.csv() function. I have already saved the data file as CSV (comma delimited file) in the working directory. The file argument specify the file name with extension CSV.

In header argument you can set a logical value that will indicate whether the data file contains first variable names as first row. In my data set the file contains the variable names in the first row, so I shall use TRUE for this argument. The head() function will print the first six rows of the data set.

data <- read.csv(file = "data_split.csv", 
                 header = TRUE)
head(data)
#   block     var age heading
# 1     1   Super  2W      89
# 2     1   Super  3W      94
# 3     1   Super  4W      96
# 4     1 Shaheen  2W      75
# 5     1 Shaheen  3W      80
# 6     1 Shaheen  4W      84

Observe variables

Before proceeding for analysis it is a better practice to have a look on the data set to see the structure of variables. The data looks fine as shown in below table. The second and third variables are being read as factor which is true for the given data set. In case if it is integer or character then you can change it to factor using as.factor() function as shown below

data$var = as.factor(data$var)
data$age = as.factor(data$age)
str(data)
# 'data.frame': 26 obs. of  4 variables:
#  $ block  : int  1 1 1 1 1 1 2 2 2 2 ...
#  $ var    : Factor w/ 3 levels "","Shaheen","Super": 3 3 3 2 2 2 3 3 3 2 ...
#  $ age    : Factor w/ 4 levels "","2W","3W","4W": 2 3 4 2 3 4 2 3 4 2 ...
#  $ heading: int  89 94 96 75 80 84 85 99 101 72 ...
attach(data)

Fit analysis of variance model

First, you need to load the library agricolae using require() function. If this package is not installed then first install this package using install.pakcages("agricolae") command in R console.

To fit the analysis of variance model for split-plot design assign a function to model object by using sp.plot() function. In parenthesis for block argument you can type the block or replication variable name from data set. For pplot argument specify the main plot variable name (varieties var in this example) and for splot argument specify the sub-plot variable name (seedling age age in this example). In Y argument you can specify the response variable name as heading in this example.

This will construct the analysis of variance table showing the significance of sources of variation.

library(agricolae)
attach(data)
# The following objects are masked from data (pos = 4):
# 
#     age, block, heading, var
model <- sp.plot(block = block, 
                 pplot = var, 
                 splot = age, 
                 Y = heading)
# 
# ANALYSIS SPLIT PLOT:  heading 
# Class level information
# 
# var   :  Super Shaheen  
# age   :  2W 3W 4W  
# block     :  1 2 3 4 NA 
# 
# Number of observations:  26 
# 
# Analysis of Variance Table
# 
# Response: heading
#         Df  Sum Sq Mean Sq F value    Pr(>F)    
# block    3   28.46    9.49  0.4415   0.74034    
# var      1 1365.04 1365.04 63.5314   0.00412 ** 
# Ea       3   64.46   21.49                      
# age      2  641.33  320.67 44.5714 2.789e-06 ***
# var:age  2   16.33    8.17  1.1351   0.35359    
# Eb      12   86.33    7.19                      
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# cv(a) = 5.4 %, cv(b) = 3.1 %, Mean = 85.45833

The analysis of variance table showed that both the main effects significantly affected the time taken from heading to maturity in rice. While the interaction was not significant regarding days from heading to maturity. However, I shall proceed with the mean comparison test for both individual treatment and interaction effect in case you get the significant effect of the interaction term.

Mean separation test

Get error df and error MS values

Before proceeding for mean separation test using LSD.test() function, you need to first assign Edf_a as the value of first error degree of freedom (model$gl.a) for main plot factor. Similarly, assign Edf_b to represent error degree of freedom (model$gl.b) for the subplot factor and interaction term.

# Get first error df
Edf_a <- model$gl.a
Edf_a
# Get second error df
Edf_b <- model$gl.b
Edf_b
# Get first error MS
EMS_a <- model$Ea
EMS_a
# Get second error MS
EMS_b <- model$Eb
EMS_b
# [1] 3
# [1] 12
# [1] 21.48611
# [1] 7.194444

LSD test on main effects

For main plot factor i.e. varieties LSD.test() function require some arguments to be specified. In Y argument specify the response variable name (heading). The trt argument specify the main plot factor that is varieties (var) in this case. For DFerror and MSerror arguments specify the main plot error df (Edf_a) and error MS (EMS_a).

Keep the alpha level at 5 percent. For p.adj argument you can specify different methods while I shall use the mostly commonly used method bonferroni. Type TRUE for group argument to get the mean values with letters showing the rank of each treatment level.

out1 <- LSD.test(y = heading, 
                 trt = var,
                 DFerror = Edf_a, 
                 MSerror = EMS_a,
                 alpha = 0.05,
                 p.adj = "bonferroni",
                 group = TRUE,
                 console = TRUE)
# 
# Study: heading ~ var
# 
# LSD t Test for heading 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  21.48611 
# 
# var,  means and individual ( 95 %) CI
# 
#          heading      std  r      LCL      UCL Min Max
# Shaheen 77.91667 5.550730 12 73.65824 82.17510  67  87
# Super   93.00000 6.728501 12 88.74157 97.25843  81 101
# 
# Alpha: 0.05 ; DF Error: 3
# Critical Value of t: 3.182446 
# 
# Minimum Significant Difference: 6.022327 
# 
# Treatments with the same letter are not significantly different.
# 
#          heading groups
# Super   93.00000      a
# Shaheen 77.91667      b

Similarly, for second factor or subplot factor just change the values of trt, DFerror and MSerror arguments. In trt argument type the variable name of the subplot factor (age). In DFerror argumnet set the value of second error df (Edf_b) and second error MS (EMS_b) in MSerror argument. Keep rest of the arguments same as in above command.

out2 <- LSD.test(y = heading, 
                 trt = age,
                 DFerror = Edf_b, 
                 MSerror = EMS_b,
                 alpha = 0.05,
                 p.adj = "bonferroni",
                 group = TRUE,
                 console = TRUE)
# 
# Study: heading ~ age
# 
# LSD t Test for heading 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  7.194444 
# 
# age,  means and individual ( 95 %) CI
# 
#    heading      std r      LCL      UCL Min Max
# 2W  78.625 7.558108 8 76.55879 80.69121  67  89
# 3W  86.625 9.117291 8 84.55879 88.69121  74  99
# 4W  91.125 9.093758 8 89.05879 93.19121  79 101
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.779473 
# 
# Minimum Significant Difference: 3.727616 
# 
# Treatments with the same letter are not significantly different.
# 
#    heading groups
# 4W  91.125      a
# 3W  86.625      b
# 2W  78.625      c

LSD test on interaction term

Keep all the arguments same as used in subplot factor for LSD test except the trt argument. Specify interaction term in trt argument using the concatenate function. This will result as the output of LSD test for interaction term.

out3 <- LSD.test(y = heading, 
                 trt = var:age,
                 DFerror = Edf_b, 
                 MSerror = EMS_b,
                 alpha = 0.05,
                 p.adj = "bonferroni",
                 group = TRUE,
                 console = TRUE)
# 
# Study: heading ~ var:age
# 
# LSD t Test for heading 
# P value adjustment method: bonferroni 
# 
# Mean Square Error:  7.194444 
# 
# var:age,  means and individual ( 95 %) CI
# 
#            heading      std r      LCL       UCL Min Max
# Shaheen:2W   72.25 3.774917 4 69.32794  75.17206  67  75
# Shaheen:3W   78.50 3.109126 4 75.57794  81.42206  74  81
# Shaheen:4W   83.00 3.366502 4 80.07794  85.92206  79  87
# Super:2W     85.00 3.265986 4 82.07794  87.92206  81  89
# Super:3W     94.75 2.872281 4 91.82794  97.67206  93  99
# Super:4W     99.25 2.362908 4 96.32794 102.17206  96 101
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 3.648889 
# 
# Minimum Significant Difference: 6.920609 
# 
# Treatments with the same letter are not significantly different.
# 
#            heading groups
# Super:4W     99.25      a
# Super:3W     94.75      a
# Super:2W     85.00      b
# Shaheen:4W   83.00      b
# Shaheen:3W   78.50     bc
# Shaheen:2W   72.25      c

Plot variation

To simply plot the main effects showing the variation as interquartile range use the plot() function as shown in below command. In plot() function specify the object as created earlier in LSD.test() function.

The out1 object from LSD test specify the main plot factor while out2 specify the subplot factor (seedling age). In xlab and ylab arguments type the labels in quotations that will be used for X and Y axis. For variation argument specify IQR (interquartile range) or you can also specify the range or variance for this argument. Similarly you can also plot variation or interquartile range for the interaction term.

plot(out1, 
     xlab = "varieties",
     ylab = "heading (days)",
     las = 1, 
     variation = "IQR")
plot(out2, 
     xlab = "seedling age",
     ylab = "heading (days)",
     las = 1, 
     variation = "IQR")
plot(out3, 
     xlab = "var:age",
     ylab = "heading (days)",
     las = 1, 
     variation = "IQR")

Plot bargraphs

Main effects

To plot bargraph with standard error bars, use bar.error() function and in parenthesis type out1$means to get means for the height of the bars. In variation argument specify standard error variation = "SE" as source for error bars.

Similary you can display bargraph for subplot factor by slightly changing few arguments. Use out2$means instead of out1 and also change labels for sub plot factor in names.arg argument.

To add title to this plot use title() function. In main argument type the title of the plot. You can set the value to control the size of the main title using cex.main argument. To add X and Y axis labels use xlab and ylab arguments.

# Main plot factor (varieties)
bar.err(out1$means, 
        variation = "SE", 
        ylim = c(0, 100),
        names.arg = c("Shaheen rice","Super rice"))

# Adding title, X and Y labels
title(main = "Barplot with standard error",
      cex.main = 0.8,
      xlab = "Varieties", 
      ylab = "heading (days)")

# Sub-plot factor (seedling age)
bar.err(out2$means, 
        variation = "SE", 
        ylim = c(0, 100),
        names.arg = c("2-weeks", "3-weeks", "4-weeks"))

# Adding title, X and Y labels
title(main = "Barplot with standard error",
      cex.main = 0.8,
      xlab = "Seedling age", 
      ylab = "heading (days)")

Interaction effect

Similarly, by specifying the values of the interaction term for same arguments you can plot bargraph for interaction term.

# Main plot factor (varieties)
bar.err(out3$means, 
        variation = "SE", 
        ylim = c(0, 110),
        names.arg = c("V1W1","V1W2", "V1W3", "V2W1", "V2W2", "V2W3"))

# Adding title, X and Y labels
title(main = "Barplot with standard error",
      cex.main = 0.8,
      xlab = "var:age", 
      ylab = "heading (days)")

For more enhanced visualization of bar graphs using ggplot() function as shown below then visit our blog on plotting bar graph for main and interaction effects using ggplot function in R — Click_here.


Watch video tutorials on advanced and interactive plots — Bargraph_1 and — Bargraph_2.


Download data file — Click_here

Download Rscript — Download Rscript


Download R program — Click_here

Download R studio — Click_here


Comments

  1. How do i rearrange the x-axis? I found that if it is 25:a, 50:a, 100:a, it starts with 100:a, then 25:a, 50:a.


    ReplyDelete
    Replies
    1. You may reverse the sequence. Let me know which sequence you used first?

      Delete
  2. Hi just wondering if I can ask you some questions for my R analysis of RCBD across locations. My email address is wangtao.home@gmail.com. Could you kindly drop me an email to connect? Appreciated!!

    ReplyDelete
    Replies
    1. You can use combined analysis of variance repeated over years. Click on the link given below for more details.

      https://youtu.be/NbrsGDFE1HQ

      Delete
  3. Hey, nice info!! How are the diagnostic plots for the linear mixed-effects fit obtained?

    ReplyDelete
    Replies
    1. I have still not worked on this aspect. I shall work on it and will upload relevant content later on. Thanks for your suggestion.

      Delete

Post a Comment

Popular posts from this blog

Two way repeated measures analysis in R

Principal component analysis in R