Customizing plot function in R



Introduction

You will learn methods for adjusting the attributes of a graph and for interacting with it, so that you can produce a better graphical display. The generic plot() function for plotting R objects is used throughout. It is a simple and widely used visualization function. The example data come from the http://www.baseball-reference.com website. The data file contains various measures of hitting for each season of professional baseball.

Import data

Now let's import the data set using the read.csv() function. I have already saved the data file as a CSV (comma-delimited) file in the working directory. The file argument specifies the file name, including the .csv extension.

The header argument takes a logical value indicating whether the first row of the file contains the variable names. In my data set the first row holds the variable names, so I use TRUE for this argument. The head() function prints the first six rows of the data set. Finally, attach() adds the data frame to the R search path so that its columns can be accessed simply by their names.

data <- read.csv(file = "sports_ball.csv", 
                 header = TRUE)
head(data)
#   Year BOS
# 1 2018  32
# 2 2017  93
# 3 2016  93
# 4 2015  78
# 5 2014  71
# 6 2013  97
attach(data)
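The CSV file itself is not reproduced in this post, so here is a minimal self-contained sketch that rebuilds only the six rows printed by head() using read.csv(text = ...), which parses a character string instead of a file. It also shows with(), which evaluates an expression inside the data frame, as an alternative to attach() that leaves the search path untouched. The object name sports_sample is a hypothetical choice, and the sketch covers the six rows shown above, not the full data set.

```r
# Stand-in for sports_ball.csv: only the six rows shown by head() above.
# read.csv(text = ...) parses a string instead of reading a file.
sports_sample <- read.csv(text = "Year,BOS
2018,32
2017,93
2016,93
2015,78
2014,71
2013,97", header = TRUE)

str(sports_sample)   # should report 6 obs. of 2 variables: Year and BOS

# with() finds Year and BOS inside the data frame without attach()
with(sports_sample, plot(x = Year, y = BOS, type = "b"))
```

Because with() does not modify the search path, no matching detach() call is needed afterwards.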

The variables Year and BOS contain, respectively, the baseball season and the average score for the Boston Red Sox.

Let's plot the data set and customize its various options.

A scatter plot

First, construct a time series plot of the BOS score against season using the plot() function. In general, the plot shows a fairly constant score with some spread across seasons.

plot(x = Year, y = BOS)

The plot type

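The type argument specifies what type of plot is to be drawn. The possible types are "n" for no plotting (only the axes and frame are drawn), "p" for points, "l" for lines, "b" for both points and lines, "c" for the lines part alone of "b", "o" for points and lines overplotted, "h" for histogram-like vertical lines, "s" for stair steps and "S" for the other stair-step type. Going from one point to the next with \(x_1 < x_2\), type "s" moves first horizontally and then vertically, whereas type "S" moves the other way around.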

# No plotting: only the axes and frame are set up
plot(x = Year, y = BOS,
     type = "n", main = "No plotting")
# Points
plot(x = Year, y = BOS,
     type = "p", main = "Points")
# Lines
plot(x = Year, y = BOS,
     type = "l", main = "Lines")
# Both points and lines
plot(x = Year, y = BOS,
     type = "b", main = "Points and lines")
# Lines part alone of type "b" (gaps where the points would be)
plot(x = Year, y = BOS,
     type = "c", main = "Lines alone")
# Points and lines overplotted
plot(x = Year, y = BOS,
     type = "o", main = "Points & lines overplotted")
# Histogram-like vertical lines
plot(x = Year, y = BOS,
     type = "h", main = "Histogram-like lines")
# Stair steps: horizontal first, then vertical
plot(x = Year, y = BOS,
     type = "s", main = 'Stair steps (type = "s")')
# Other stair type: vertical first, then horizontal
plot(x = Year, y = BOS,
     type = "S", main = 'Stair steps (type = "S")')
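The nine type values can also be compared side by side. The sketch below is a minimal example that assumes only the six seasons printed by head() as stand-in data (seasons and scores are hypothetical names, not columns of the original file). It draws every type in a 3 x 3 grid with par(mfrow) and restores the previous settings afterwards.

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

plot_types <- c("n", "p", "l", "b", "c", "o", "h", "s", "S")

# 3 x 3 grid with tighter margins; par() returns the old values
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (tp in plot_types) {
  plot(x = seasons, y = scores, type = tp,
       xlab = "", ylab = "",
       main = paste0('type = "', tp, '"'))
}
par(old_par)  # restore the previous layout and margins
```

Saving and restoring par() this way keeps the grid layout from leaking into later plots.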

Add title and labels

I shall use type "b" and then add a main title and axis labels with the title() function. Its first four arguments (main, sub, xlab and ylab) can also be passed as arguments to most high-level plotting functions. The main argument specifies the main title of the plot, and the xlab and ylab arguments specify the X- and Y-axis labels.

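# ann = FALSE suppresses plot()'s default annotations,
# so title() does not draw on top of them
plot(x = Year, y = BOS,
     type = "b", ann = FALSE)
title(main = "Average hitting score for Boston Red Sox", 
      xlab = "Year", ylab = "BOS")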
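The first four arguments of title() are main, sub, xlab and ylab. As a small sketch of the point that high-level functions accept them too, the same annotations can be passed directly to plot(), which also avoids drawing over the default labels. The six-row stand-in data and the subtitle text are assumptions for illustration only.

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

# main, sub, xlab and ylab passed straight to plot() instead of title()
plot(x = seasons, y = scores, type = "b",
     main = "Average hitting score for Boston Red Sox",
     sub  = "Six most recent seasons only",
     xlab = "Season", ylab = "BOS score")
```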

Customize symbols

The plotting symbols are customized with the pch argument, which takes either an integer code or a single character to be used as the plotting symbol. The full set of standard symbols is available with pch = 0:25; values pch = 26:31 are currently unused, and pch = 32:127 give the ASCII characters.

Note that only integers and single-character strings can be set as a graphics parameter (not NA or NULL).
If the supplied pch is a logical, integer or character NA, or an empty character string, the point is omitted from the plot.
If pch is NULL or otherwise of length 0, par("pch") is used.
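The symbol chart for pch = 0:25 can be regenerated directly in R. The sketch below is one possible layout (13 symbols per row and the orange fill are choices of this example, not part of the original); only symbols 21 to 25 use the bg fill color. It ends with a single-character example showing that an NA entry omits the point.

```r
# Draw the 26 standard symbols (pch = 0:25) with their codes underneath
pch_codes <- 0:25
x_pos <- pch_codes %% 13        # 13 symbols per row
y_pos <- 2 - pch_codes %/% 13   # codes 0-12 on the top row, 13-25 below

plot(x_pos, y_pos, pch = pch_codes, cex = 2,
     bg = "orange",             # fill color, used only by pch 21:25
     xlim = c(-0.5, 12.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Plotting symbols pch = 0:25")
text(x_pos, y_pos - 0.3, labels = pch_codes, cex = 0.8)

# Single characters work too; the NA entry omits the middle point
plot(1:3, rep(1, 3), pch = c("a", NA, "+"), cex = 2,
     main = 'pch = c("a", NA, "+")')
```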

Symbols can be enlarged or shrunk with the cex argument. It takes a numeric vector giving the symbol size as a multiple of par("cex"); the default value is 1. Points whose cex value is NA are omitted from the plot.

# pch = 0 (open square), cex = 1
plot(x = Year, y = BOS,
     pch = 0, cex = 1, main = "pch=0, cex=1")
# pch = 1 (open circle), cex = 1.5
plot(x = Year, y = BOS,
     pch = 1, cex = 1.5, main = "pch=1, cex=1.5")
# pch = 3 (plus sign), cex = 2
plot(x = Year, y = BOS,
     pch = 3, cex = 2, main = "pch=3, cex=2")
# pch = 19 (solid circle), cex = 2.5
plot(x = Year, y = BOS,
     pch = 19, cex = 2.5, main = "pch=19, cex=2.5")
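Because cex is a numeric vector, each point can get its own size. This minimal sketch, assuming the six rows from head() as stand-in data, scales the symbols by score and sets one entry to NA to show that the point is then omitted.

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

# cex is vectorised: scale each symbol by its score (largest = 3)
point_size <- 3 * scores / max(scores)
point_size[1] <- NA   # an NA in cex omits that point (here: 2018)

plot(x = seasons, y = scores, pch = 19, cex = point_size,
     main = "Symbol size proportional to score")
```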

Scatter Plot Smoothing

To better see the general pattern in a time series graph, it helps to apply a smoothing function to the data. Here smoothing is done with the lowess() function, and the lines() function overlays the smoothed curve on the plot. In lowess(), the arguments x and y are vectors giving the coordinates of the points in the scatter plot.

plot(x = Year, y = BOS,
     pch = 0, cex = 1)
lines(lowess(x = Year, y = BOS))
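It may help to see what lowess() actually returns: a list with components x (the x values sorted in increasing order) and y (the fitted values), which is why lines() can draw it directly. The sketch assumes the six-row stand-in data from head().

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

smooth_fit <- lowess(x = seasons, y = scores)
str(smooth_fit)   # list with x (sorted seasons) and y (fitted values)

plot(x = seasons, y = scores, pch = 0)
lines(smooth_fit) # lines() reads the x and y components of the list
```

The sorting matters here: the seasons are stored newest first, but lowess() returns them in increasing order, so the smooth is drawn left to right without doubling back.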

Smoother span

This line does not appear to represent the pattern of changes in the graph well. The degree of smoothness is controlled by the smoother span f, which gives the proportion of points in the plot that influence the smooth at each value. The default value is \(2/3\). Try several smaller values to see their effect on the smooth. For this plot I prefer the span f = 1/12, because this smallest value best matches the pattern of increases and decreases in the scatter plot.

# Larger values of f give a smoother curve
lines(lowess(Year, BOS, f = 1/3))
lines(lowess(Year, BOS, f = 1/9))
lines(lowess(Year, BOS, f = 1/12))
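To compare spans in one picture, the sketch below draws several smooths in different colors with a legend. Six points are too few to show the effect, so it uses R's built-in Nile data set (annual Nile flow, 1871 to 1970) as a stand-in series; this swap, like the chosen colors, is an assumption of the example and not the article's data.

```r
# Stand-in data: R's built-in Nile series (annual flow, 1871-1970)
year <- as.numeric(time(Nile))
flow <- as.numeric(Nile)

spans      <- c(2/3, 1/3, 1/9, 1/12)
span_label <- c("2/3 (default)", "1/3", "1/9", "1/12")
span_col   <- c("black", "blue", "darkgreen", "red")

plot(x = year, y = flow, pch = 1, col = "grey60",
     main = "lowess() with different spans f")
# One smooth per span, each in its own color
for (i in seq_along(spans)) {
  lines(lowess(year, flow, f = spans[i]), col = span_col[i], lwd = 2)
}
legend(x = "topright", legend = paste("f =", span_label),
       col = span_col, lwd = 2, inset = 0.02)
```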

Line style

The width of the smooth line is increased with the lwd argument, and the line style is set with the lty argument. The six line styles are numbered 1 to 6: 1 solid, 2 dashed, 3 dotted, 4 dot-dash, 5 long dash and 6 two-dash. The default value is 1, a solid line.

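# Dashed line (lty = 2), width 2
lines(lowess(Year, BOS, f = 1/12),
      lwd = 2, lty = 2)
# Dot-dash line (lty = 4), width 3
lines(lowess(Year, BOS, f = 1/12),
      lwd = 3, lty = 4)
# Two-dash line (lty = 6), width 4
lines(lowess(Year, BOS, f = 1/12),
      lwd = 4, lty = 6)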
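A quick reference chart of the six line types can be drawn with segments(), whose lty argument accepts a vector. The layout below is just one possible choice for this sketch.

```r
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Empty canvas, then one horizontal segment per line type
plot(x = c(0, 1), y = c(0.5, 6.5), type = "n", axes = FALSE,
     xlab = "", ylab = "", main = "Line types lty = 1:6")
segments(x0 = 0.4, y0 = 1:6, x1 = 1, y1 = 1:6, lty = 1:6, lwd = 2)
text(x = 0.2, y = 1:6, labels = paste0(1:6, ' = "', lty_names, '"'))
```

The lty argument also accepts these names directly, so lty = "dashed" is equivalent to lty = 2.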

Add legend

Add a legend to the plot with the legend() function. The x argument gives the position of the legend, here as a keyword such as "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" or "center". The legend argument is a character or expression vector of length \(\geq 1\) to appear in the legend. The inset argument gives the inset distance(s) from the margins as a fraction of the plot region when the legend is placed by keyword.

Note that a call to the function locator(1) can be used in place of the x and y arguments.
legend(x = "bottomright", 
       legend = "f = 1/12",
       lty = 6, lwd = 4,   # key matching the last smooth line drawn
       inset = 0.05)
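A legend is most useful when its keys match what is drawn. As a sketch, using the built-in Nile series as stand-in data (an assumption of this example), the legend below shows a point symbol for the raw data and a dashed line for the smooth; an NA in pch or lty suppresses that part of the key for the entry. legend() also invisibly returns the positions it used, captured here in leg.

```r
# Stand-in data: R's built-in Nile series (annual flow, 1871-1970)
year <- as.numeric(time(Nile))
flow <- as.numeric(Nile)

plot(x = year, y = flow, pch = 0, col = "grey40")
lines(lowess(year, flow, f = 1/12), lwd = 2, lty = 2, col = "blue")

# pch = NA hides the symbol and lty = NA the line for that entry
leg <- legend(x = "topright", inset = 0.05,
              legend = c("Observed", "lowess, f = 1/12"),
              col = c("grey40", "blue"),
              pch = c(0, NA), lty = c(NA, 2), lwd = c(1, 2))
```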

Change color

R has a large number of colors, accessible through the col argument of graphics functions; colors() lists the built-in color names. To make the colors more visible, the plot call uses the pch and cex arguments to draw large solid points.

# pch = 19 with cex = 1.5 gives large solid points
plot(x = Year, y = BOS,
     pch = 19, cex = 1.5, col = "red")
lines(lowess(Year, BOS, f = 1/12),
      lwd = 4, lty = 6, col = "blue")
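Colors can be given as names, as hexadecimal strings, or built with rgb(), whose alpha argument adds transparency. Since col is vectorised, it can also encode a condition. The sketch assumes the six-row stand-in data from head(); the threshold of 90 and the chosen colors are arbitrary.

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

head(colors())                         # a few of R's built-in color names
semi_red <- rgb(1, 0, 0, alpha = 0.5)  # "#FF000080": red at 50% opacity

# col is vectorised: highlight seasons with a score above 90
point_col <- ifelse(scores > 90, "red", "grey50")
plot(x = seasons, y = scores, pch = 19, cex = 2, col = point_col)
lines(x = seasons, y = scores, col = semi_red, lwd = 3)
```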

Adding and customizing text

Another important task is formatting text. The text() function lets you choose the font family, color, size and rotation of text through its arguments. The style and color of the text are changed with the font and col arguments. The font argument is an integer specifying the font face: \(1\) corresponds to plain text (the default), \(2\) to bold, \(3\) to italic and \(4\) to bold italic.

The text size is controlled by the cex argument, a numeric value giving the magnification relative to the default. The srt argument rotates the text by an angle given in degrees; its default value is 0.

Note that string/character rotation via argument srt to par does not affect the axis labels.
# Empty plot region (cex = NA omits the points) to hold the text
plot(x = Year, y = BOS,
     pch = 0, cex = NA)
# Change font of text
text(x = c(1920, 1940, 1960, 1980), y = 60, 
     labels = c("font=1", "font=2", "font=3", "font=4"),
     font = c(1, 2, 3, 4))
# Change color of text (fresh plot so the labels do not overlap)
plot(x = Year, y = BOS,
     pch = 0, cex = NA)
text(x = c(1920, 1940, 1960, 1980), y = 60, 
     labels = c("Red", "Blue", "Green", "Magenta"),
     col = c("red", "blue", "green", "magenta"))
# Change size of text
plot(x = Year, y = BOS,
     pch = 0, cex = NA)
text(x = c(1920, 1960, 2000), y = 60, 
     labels = c("cex=1", "cex=1.5", "cex=2"),
     cex = c(1, 1.5, 2))
# Change angle of text (srt takes one angle per call)
plot(x = Year, y = BOS,
     pch = 0, cex = NA)
text(x = 1920, y = 60, 
     labels = "srt=30",
     srt = 30)
text(x = 1960, y = 60, 
     labels = "srt=60",
     srt = 60)
text(x = 2000, y = 60, 
     labels = "srt=90",
     srt = 90)
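Two related options are often handy when annotating a plot: pos in text() places labels relative to their points (pos = 3 means above), and the las graphical parameter sets the orientation of the tick labels, which srt does not control. The font family is chosen with family, where "sans", "serif" and "mono" are the standard family names. The six-row stand-in data and the label positions are assumptions of this sketch.

```r
# Stand-in data: the six seasons printed by head() above
seasons <- c(2018, 2017, 2016, 2015, 2014, 2013)
scores  <- c(32, 93, 93, 78, 71, 97)

# las = 1 turns the tick labels horizontal (srt does not affect them)
plot(x = seasons, y = scores, pch = 19, ylim = c(20, 110), las = 1)

# pos = 3 places each label just above its point
text(x = seasons, y = scores, labels = scores, pos = 3, cex = 0.8)

# Standard font families, drawn in an empty area of the plot
text(x = 2016.5, y = 50, labels = 'family = "serif"', family = "serif")
text(x = 2014.5, y = 50, labels = 'family = "mono"',  family = "mono")
```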
