Customizing plot function in R
AGRON Lectures
August 14, 2020
Introduction
You will learn methods for adjusting the attributes of a graph and for interacting with it, which will help you produce a better graphical display. Here, the generic plot() function for plotting R objects will be used; it is a simple and widely used visualization function. The example data used here were obtained from the http://www.baseball-reference.com website. The data file contains various measures of hitting for each season of professional baseball.
Import data
Now let's import the data set using the read.csv() function. I have already saved the data file as a CSV (comma-delimited) file in the working directory. The file argument gives the file name, including the .csv extension.
The header argument takes a logical value indicating whether the first row of the file contains the variable names. In my data set the first row contains the variable names, so I shall use TRUE for this argument. The head() function prints the first six rows of the data set. The data frame is attached to the R search path using attach(), so that its variables can be accessed simply by their names.
data <- read.csv(file = "sports_ball.csv",
header = TRUE)
head(data)
# Year BOS
# 1 2018 32
# 2 2017 93
# 3 2016 93
# 4 2015 78
# 5 2014 71
# 6 2013 97
attach(data)
The variables Year and BOS contain, respectively, the baseball season and the average score for the Boston Red Sox.
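To see what the header argument changes, here is a small sketch. It writes a hypothetical three-row CSV (the first rows of the head() output above) to a temporary file created with tempfile(), so it runs without sports_ball.csv, and reads it both ways. It also shows with(), a base R alternative to attach() that evaluates an expression using the data frame's columns without changing the search path.

```r
# Hypothetical mini file: the first three rows shown by head(data) above
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Year,BOS", "2018,32", "2017,93", "2016,93"), csv_file)

# header = TRUE: the first row supplies the column names
with_header <- read.csv(file = csv_file, header = TRUE)
# header = FALSE: the first row is read as data and columns are named V1, V2
without_header <- read.csv(file = csv_file, header = FALSE)

print(names(with_header))     # "Year" "BOS"
print(names(without_header))  # "V1" "V2"
print(nrow(without_header))   # 4: the header line became a data row

# Access a column by name without attach()
bos_mean <- with(with_header, mean(BOS))
print(bos_mean)
```

Unlike attach(), with() leaves nothing on the search path, so there is no need to call detach() afterwards.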
Let's plot the data set and customize its various options.
A scatter plot
First, construct a time series plot of the BOS score against season using the plot() function. In general, the plot shows a roughly constant score with some spread across seasons.
plot(x = Year, y = BOS)
The plot type
The type argument specifies what type of plot should be drawn. The possible types are "n" for no plotting, "p" for points, "l" for lines, "b" for both points and lines, "c" for the lines part alone of "b", "o" for points and lines overplotted, "h" for histogram-like vertical lines, "s" for stair steps and "S" for the other stair type. Going from \((x_1, y_1)\) to \((x_2, y_2)\) with \(x_1 < x_2\), type "s" moves first horizontally and then vertically, whereas "S" moves the other way around.
# No plotting
plot(x = Year, y = BOS,
type = "n", main = "No plotting")
# Points
plot(x = Year, y = BOS,
type = "p", main = "Points")
# Lines
plot(x = Year, y = BOS,
type = "l", main = "Lines")
# Both points and lines
plot(x = Year, y = BOS,
type = "b", main = "Points and lines")
# Lines part alone of "b" (gaps remain where the points would be)
plot(x = Year, y = BOS,
type = "c", main = "Lines alone")
# Points and lines overplotted
plot(x = Year, y = BOS,
type = "o", main = "Points & lines overplotted")
# Stair steps: horizontal first, then vertical
plot(x = Year, y = BOS,
type = "s", main = "Stair steps (s)")
# Other stair type: vertical first, then horizontal
plot(x = Year, y = BOS,
type = "S", main = "Stair steps (S)")
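The sketch below draws every type value side by side in one window, which makes the differences easier to compare, including the histogram-like "h". It assumes the built-in airmiles time series as stand-in data (it is not the baseball data) and uses par(mfrow = ...) to arrange the panels in a grid.

```r
# Stand-in data: built-in airmiles series (annual, 1937-1960)
x <- as.numeric(time(airmiles))
y <- as.numeric(airmiles)

types <- c("n", "p", "l", "b", "c", "o", "h", "s", "S")

# 3 x 3 grid of panels; keep the old settings so they can be restored
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (tp in types) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)  # restore the previous layout and margins
```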
Add title and labels
I shall use type "b" for the plot and then add a main title and axis labels. These attributes can be added with the title() function. Its first four arguments (main, sub, xlab and ylab) can also be used as arguments in most high-level plotting functions. The main argument specifies the main title of the plot, and the xlab and ylab arguments specify the X and Y axis labels.
# ann = FALSE suppresses the default labels so title() does not draw over them
plot(x = Year, y = BOS,
type = "b", ann = FALSE)
title(main = "Average hitting score for Boston Red Sox",
xlab = "Year", ylab = "BOS")
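As a quick check of the statement about title(), the sketch below lists the first four formal arguments of title() and then draws the same annotated plot two ways: with the arguments passed straight to plot(), and with ann = FALSE followed by title(). The built-in airmiles series stands in for the baseball data, and the label text is illustrative.

```r
x <- as.numeric(time(airmiles))   # stand-in data
y <- as.numeric(airmiles)

# The first four formal arguments of title()
first_four <- names(formals(title))[1:4]
print(first_four)   # "main" "sub" "xlab" "ylab"

# Way 1: annotation arguments passed directly to the high-level function
plot(x, y, type = "b",
     main = "US airline passenger miles", xlab = "Year", ylab = "Passenger miles")

# Way 2: suppress the default annotation, then add it with title()
plot(x, y, type = "b", ann = FALSE)
title(main = "US airline passenger miles", sub = "Built-in airmiles data",
      xlab = "Year", ylab = "Passenger miles")
```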
Customize symbols
The plotting symbols can be customized using the pch argument. Its value is either an integer code or a single character to be used as the plotting symbol. The full set of standard symbols is available with pch = 0:25. Values pch = 26:31 are currently unused, and pch = 32:127 give the ASCII characters.
Note that only integers and single-character strings can be set as a graphics parameter (not NA or NULL).
If the supplied pch is a logical, integer or character NA, or an empty character string, the point is omitted from the plot.
If pch is NULL or otherwise of length 0, par("pch") is used.
Symbols can be enlarged using the cex argument, a numeric vector giving the symbol size as a multiple of par("cex"). The default value of this argument is 1. Setting cex = NA omits the points from the plot.
# pch=0, cex=1
plot(x = Year, y = BOS,
pch = 0, cex = 1)
# pch=1, cex=1.5
plot(x = Year, y = BOS,
pch = 1, cex = 1.5)
# pch=3, cex=2
plot(x = Year, y = BOS,
pch = 3, cex = 2)
title(main = "pch=3, cex=2")
# pch=19, cex=2.5
plot(x = Year, y = BOS,
pch = 19, cex = 2.5)
title(main = "pch=19, cex=2.5")
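A reference chart of all 26 standard symbols is handy when choosing pch. The sketch below places codes 0:25 in two rows on an empty canvas (no data needed), prints each code under its symbol, and fills symbols 21:25 using bg. The last line illustrates the note above: a character NA in pch omits that point.

```r
pch_codes <- 0:25
col_pos <- pch_codes %% 13          # 13 symbols per row
row_pos <- 2 - pch_codes %/% 13     # codes 0-12 on row 2, 13-25 on row 1

plot(col_pos, row_pos, pch = pch_codes, cex = 2, bg = "gold",
     xlim = c(-0.5, 12.5), ylim = c(0.4, 2.5),
     axes = FALSE, ann = FALSE)
# Print each code just below its symbol
text(col_pos, row_pos - 0.3, labels = pch_codes, cex = 0.8)
title(main = "Plotting symbols pch = 0:25 (bg fills 21:25)")

# A character NA in pch omits that point
plot(1:3, 1:3, pch = c("a", NA, "c"), cex = 2, main = "Middle point omitted")
```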
Scatter plot smoothing
To see the general pattern in a time series graph more clearly, it helps to apply a smoothing function to the data. Here, smoothing is done with the lowess() function, and the lines() function overlays the smoothed curve on the plot. In lowess(), the x and y arguments are vectors giving the coordinates of the points in the scatter plot.
plot(x = Year, y = BOS,
pch = 0, cex = 1)
lines(lowess(x = Year, y = BOS))
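lowess() does not draw anything itself: it returns a list with components x and y, and lines() draws that list. The sketch below inspects the returned object, using the built-in Nile series as stand-in data.

```r
x <- as.numeric(time(Nile))   # stand-in data: annual Nile flow
y <- as.numeric(Nile)

fit <- lowess(x = x, y = y)   # default span f = 2/3
str(fit)                      # list with $x (sorted) and $y (smoothed values)

plot(x, y, pch = 0, cex = 1, xlab = "Year", ylab = "Flow")
lines(fit)
```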
Smoother span
This line does not seem to represent the pattern of changes in the graph well. The degree of smoothness is controlled by the smoother span parameter f, which gives the proportion of points in the plot that influence the smooth at each value. The default value of f is \(2/3\). Try several smaller values to see how they affect the smooth. In this plot I prefer the smoother span f = 1/12, since this smallest value best matches the pattern of increases and decreases in the scatter plot.
# Larger values of f give a smoother curve
lines(lowess(Year, BOS, f = 1/3))
lines(lowess(Year, BOS, f = 1/9))
lines(lowess(Year, BOS, f = 1/12))
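To compare several spans at once, the sketch below overlays one lowess() curve per value of f, each in its own color. The Nile series again stands in for the baseball data, and the span values are only examples.

```r
x <- as.numeric(time(Nile))   # stand-in data
y <- as.numeric(Nile)

spans <- c(2/3, 1/3, 1/9, 1/12)             # from smoothest to most flexible
span_cols <- c("black", "blue", "darkgreen", "red")

plot(x, y, pch = 0, cex = 1, xlab = "Year", ylab = "Flow")
fits <- lapply(spans, function(f) lowess(x, y, f = f))
for (i in seq_along(fits)) {
  lines(fits[[i]], col = span_cols[i], lwd = 2)
}
```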
Line style
The width of the smooth line can be increased with the lwd argument. To specify the line style, use the lty argument. The six line styles are numbered 1 through 6 (1 = solid, 2 = dashed, 3 = dotted, 4 = dot-dash, 5 = long dash, 6 = two-dash), and 0 gives a blank line. The default value, 1, means a solid line.
lines(lowess(Year, BOS, f= 1/12),
lwd = 2, lty = 2)
lines(lowess(Year, BOS, f= 1/12),
lwd = 3, lty = 4)
lines(lowess(Year, BOS, f= 1/12),
lwd = 4, lty = 6)
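A small chart of the six line types makes the lty codes easier to remember. The sketch below sets up an empty coordinate system with plot.new() and plot.window(), then draws one segment per code; no data set is needed.

```r
lty_codes <- 1:6
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

plot.new()
plot.window(xlim = c(0, 1), ylim = c(0.5, 6.5))
# One horizontal segment per line type; lty is vectorised in segments()
segments(x0 = 0.35, y0 = lty_codes, x1 = 0.95, y1 = lty_codes,
         lty = lty_codes, lwd = 2)
text(x = rep(0.02, 6), y = lty_codes,
     labels = paste0("lty = ", lty_codes, " (", lty_names, ")"), adj = 0)
title(main = "Line types 1 to 6")
```

The names in lty_names are also accepted directly, for example lty = "dashed" instead of lty = 2.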
Add legend
Add a legend to this plot using the legend() function. The x argument specifies the position of the legend, here as a keyword string such as "bottomright", "topleft" or "center". The legend argument is a character or expression vector of length 1 or more to appear in the legend. The inset argument specifies the inset distance(s) from the margins as a fraction of the plot region when the legend is placed by keyword.
Note that a call to the function locator(1) can be used in place of the x and y arguments to place the legend interactively.
# lwd and lty make the legend show the same style as the last smooth line
legend(x = "bottomright",
legend = "f = 1/12",
lwd = 4, lty = 6,
inset = 0.05)
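When a plot shows both points and a line, a legend can give each entry its own key. The sketch below sets pch = NA or lty = NA to suppress the symbol or the line for an entry, and stores the value that legend() returns invisibly (the legend box and text positions). The Nile series stands in for the baseball data.

```r
x <- as.numeric(time(Nile))   # stand-in data
y <- as.numeric(Nile)

plot(x, y, pch = 0, cex = 1, xlab = "Year", ylab = "Flow")
lines(lowess(x, y, f = 1/12), lwd = 2, lty = 2, col = "red")

# pch = NA / lty = NA suppress the symbol or the line for that entry
leg <- legend(x = "topright",
              legend = c("Observed", "lowess, f = 1/12"),
              pch = c(0, NA), lty = c(NA, 2), lwd = 2,
              col = c("black", "red"), inset = 0.05)
str(leg)  # invisible return value: legend box (rect) and text positions
```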
Change color
R has a large number of colors that can be selected with the col argument of graphics functions. To make the colors more visible, the plot() call below uses the cex and pch arguments to draw large solid points.
plot(x = Year, y = BOS,
pch = 19, cex = 1.5, col = "red")
lines(lowess(Year, BOS, f = 1/12),
lwd = 4, lty = 6, col = "blue")
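A few base R helpers make the col argument easier to work with: colors() returns the built-in color names, col2rgb() converts a color to its red, green and blue components, and rgb() builds a color from components, optionally with an alpha (transparency) value. The sketch uses semi-transparent points so that overlapping values stay visible; the Nile series is stand-in data.

```r
print(head(colors()))             # a few of the built-in color names
red_rgb <- col2rgb("red")         # 3 x 1 matrix of red, green, blue (0-255)
print(red_rgb)

# Semi-transparent red: alpha is on the same 0-1 scale as the color values
semi_red <- rgb(1, 0, 0, alpha = 0.4)
print(semi_red)                   # "#FF000066"

x <- as.numeric(time(Nile))       # stand-in data
y <- as.numeric(Nile)
plot(x, y, pch = 19, cex = 1.5, col = semi_red, xlab = "Year", ylab = "Flow")
lines(lowess(x, y, f = 1/12), lwd = 4, lty = 6, col = "blue")
```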
Adding and customizing text
Another useful customization is the formatting of text. You can choose the font face, color, size and rotation of text through arguments of the text() function. The style and color of the text are changed with the font and col arguments. The font argument is an integer specifying the font face to be used: 1 corresponds to plain text (the default), 2 to bold face, 3 to italic and 4 to bold italic.
The text size is controlled by the cex argument, a numeric value giving the amount by which the text is magnified relative to the default. The srt argument rotates the text through an angle specified in degrees; the default is 0.
Note that string/character rotation via argument srt to par does not affect the axis labels.
# cex = NA omits the points, leaving an empty plot to write on
plot(x = Year, y = BOS,
pch = 0, cex = NA)
# Change font of text
text(x = c(1920, 1940, 1960, 1980), y = 60,
labels = c("font=1", "font=2", "font=3", "font=4"),
font = c(1, 2, 3, 4))
# Change color of text (on a fresh empty plot so the labels do not overlap)
plot(x = Year, y = BOS,
pch = 0, cex = NA)
text(x = c(1920, 1940, 1960, 1980), y = 60,
labels = c("Red", "Blue", "Green", "Magenta"),
col = c("red", "blue", "green", "magenta"))
# Change size of text
plot(x = Year, y = BOS,
pch = 0, cex = NA)
text(x = c(1920, 1960, 2000), y = 60,
labels = c("cex=1", "cex=1.5", "cex=2"),
cex = c(1, 1.5, 2))
# Change angle of text
plot(x = Year, y = BOS,
pch = 0, cex = NA)
text(x = 1920, y = 60,
labels = "srt=30",
srt = 30)
text(x = 1960, y = 60,
labels = "srt=60",
srt = 60)
text(x = 2000, y = 60,
labels = "srt=90",
srt = 90)
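The sketch below puts all four kinds of text formatting on a single blank canvas, giving each group its own row so that the labels do not overlap. It sets up empty coordinates with plot.new() and plot.window() instead of using the baseball data; the positions are arbitrary. The rotation is applied with one text() call per angle, and strwidth() confirms that cex scales the text width.

```r
plot.new()
plot.window(xlim = c(0, 5), ylim = c(0, 4.5))
box()

# font and col take one value per label
text(x = 1:4, y = 4, labels = paste0("font=", 1:4), font = 1:4)
text(x = 1:4, y = 3, labels = c("Red", "Blue", "Green", "Magenta"),
     col = c("red", "blue", "green", "magenta"))
text(x = c(1, 2.5, 4), y = 2, labels = c("cex=1", "cex=1.5", "cex=2"),
     cex = c(1, 1.5, 2))

# srt is given one angle per call here
angles <- c(30, 60, 90)
for (i in seq_along(angles)) {
  text(x = c(1, 2.5, 4)[i], y = 0.8, labels = paste0("srt=", angles[i]),
       srt = angles[i])
}

# cex scales the text: the width at cex = 2 is about twice the width at cex = 1
w1 <- strwidth("Text", cex = 1)
w2 <- strwidth("Text", cex = 2)
```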