
3. Visualization of data

This document lists and illustrates a number of basic plot functions. To replicate the example scripts, load the restructured version of the data set AirPassengers (see help(AirPassengers)) in R:

load('dat-AirPassengers.rda')
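If the file dat-AirPassengers.rda is not at hand, a data frame with a similar layout can probably be rebuilt from R's built-in time series. This is only a sketch: it assumes the restructured data has one row per month, sorted by year and then month, with numeric columns Year, Month (coded 1 to 12) and AirPassengers. The coding of the original file is an assumption, not something stated above.

```r
# Hypothetical reconstruction of 'dat' from the built-in AirPassengers series.
# Assumed layout: one row per month (Jan 1949 - Dec 1960), sorted by Year, then Month.
dat <- data.frame(
  Year          = rep(1949:1960, each = 12),  # 12 years of monthly data
  Month         = rep(1:12, times = 12),      # assumed coding: 1 = January
  AirPassengers = as.numeric(AirPassengers)   # drop the time-series attributes
)
str(dat)
```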

BASIC PLOT FUNCTIONS

plot

The function plot() sets up a plotting grid (the coordinate system and axes) and, depending on the argument type, draws the data as points, lines, or both. To set up an empty grid and add data later, use the argument type='n'. Here are some examples:

dat$Time <- dat$Year+dat$Month/12

# by default, type is set to 'p': points
plot(x=dat$Time, y=dat$AirPassengers)

# ... but lines may be more insightful here:
plot(x=dat$Time, y=dat$AirPassengers, type='l',
     main="Title", xlab="Year", ylab="AirPassengers")

# we could also set up an empty plot, to add points or lines later:
plot(x=c(), y=c(), type='n',
     xlim=range(dat$Time), ylim=range(dat$AirPassengers),
     main="Title", xlab="Year", ylab="AirPassengers",
     bty='n') # <- no box around plot

# the library plotfunctions provides a wrapper for doing this:
library(plotfunctions)
# note that we set the *range* rather than actual values here
emptyPlot( range(dat$Time), range(dat$AirPassengers)) 

# main, xlab, and ylab (and many other arguments) can also be included

See help(par) for graphical parameters that can be added to most plotting functions to change the layout and colors.
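Settings made with par() itself remain in effect for all subsequent plots on the same device. A common pattern, sketched here with the built-in AirPassengers series instead of dat, is to keep the old values that par() returns and restore them afterwards:

```r
# par() returns the previous values of the parameters it changes
op <- par(mfrow = c(1, 2), cex = 1.1, las = 1)

plot(AirPassengers, main = "Monthly totals")   # panel 1
plot(log(AirPassengers), main = "Log scale")   # panel 2

# restore the previous settings so that later plots are unaffected
par(op)
par("mfrow")
```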

points

The function points() adds points to an existing plot. The argument pch specifies the plotting symbol.

emptyPlot( range(dat$Time), range(dat$AirPassengers)) 
points(dat$Time, dat$AirPassengers, 
       pch=16, col=alpha(2)) # <- alpha() from package plotfunctions
                             #    adjusts the transparency of colors

Various types of points:

par(cex=1.1)
emptyPlot(c(0,6), c(0,6), axes=FALSE, bty='o')
for(i in 1:5){
  for(j in 1:5){
    points(i, j, pch=(i-1)*5+j, bg=alpha(2))
    text(i,j,labels=(i-1)*5+j, cex=.5, pos=3, col='red')
  }
}

lines

The function lines() adds lines to an existing plot. The argument lwd specifies the line width, and lty the line type.

emptyPlot( range(dat$Time), range(dat$AirPassengers)) 
points(dat$Time, dat$AirPassengers, pch=16, col=alpha(2)) 
lines(dat$Time, dat$AirPassengers, lwd=1, lty=5)

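# different types:
# type='b': points connected by line segments, with gaps around the points
emptyPlot( range(dat$Time), range(dat$AirPassengers)) 
lines(dat$Time, dat$AirPassengers, lwd=1, type='b', pch=21)

# type='o': points plotted over a continuous line
emptyPlot( range(dat$Time), range(dat$AirPassengers)) 
lines(dat$Time, dat$AirPassengers, lwd=1, type='o', pch=21)

# type='h': 'histogram-like' vertical lines from zero up to each value
emptyPlot( range(dat$Time), range(dat$AirPassengers)) 
lines(dat$Time, dat$AirPassengers, lwd=1, type='h')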
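To compare the type options side by side, one panel per value can be drawn in a loop. A sketch using one year of the built-in AirPassengers series, so that individual points stay visible; type 's' (stair steps) is an extra option not used above.

```r
# one year of data keeps the individual points distinguishable
y <- as.numeric(window(AirPassengers, start = c(1950, 1), end = c(1950, 12)))
types <- c("p", "l", "b", "o", "h", "s")

op <- par(mfrow = c(2, 3), mar = c(3, 3, 2, 1))
for (tp in types) {
  # pch and bg only matter for types that draw points
  plot(1:12, y, type = tp, pch = 21, bg = "gray",
       main = sprintf("type='%s'", tp), xlab = "", ylab = "")
}
par(op)  # back to a single panel
```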

Various types of lines:

emptyPlot(c(0,6), c(0,7), axes=FALSE, bty='o')
for(i in 1:6){
  abline(h=i, lty=i)
  text(1,i,labels=i, cex=.5, pos=3, col='red')
}



Parameters

Choosing colors

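Colors can also be specified as numbers, which index the current color palette (see palette()). The default palette contains 8 colors; higher numbers are recycled, so col=9 gives the same color as col=1: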

par(cex=1.1)
emptyPlot(c(0,4), c(0,4), axes=FALSE, bty='o')
for(i in 1:4){
  for(j in 1:3){
    points(i, j, pch=15, cex=2, col=(i-1)*3+j)
    text(i,j,labels=(i-1)*3+j, cex=.5, pos=3, col='red')
  }
}

In addition, R accepts 657 predefined color names, such as ‘red’, ‘black’, ‘cornflowerblue’, and ‘limegreen’. The full list of names is returned by the function colors().
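The palette behind the numeric colors can be inspected directly, and col2rgb() converts any color specification to RGB values, which makes the wrap-around visible. A small sketch; searching the color names with grep() is just one convenient way to find a suitable name.

```r
# the numbers 1-8 index the default palette
palette()
length(palette())

# col = 9 wraps around to the same RGB values as col = 1
col2rgb(c(1, 9))

# the named colors: how many there are, and a few that contain "blue"
length(colors())
head(grep("blue", colors(), value = TRUE))
```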

Labels and titles

Axis labels and titles can be provided to most generic plot functions, such as plot(), barplot(), emptyPlot(), using the following arguments:

  • main: adds an overall title for the plot;
  • sub: adds a subtitle for the plot;
  • xlab, ylab: add a title for the x-axis or y-axis, respectively.

Alternatively, the function title() adds labels and titles to an existing plot. The following two code snippets result in the same plot:

par(cex=1.1)
emptyPlot(1, 1, 
          main="Title", sub="Sub title", 
          xlab="X label", ylab="Y label")
par(cex=1.1)
emptyPlot(1, 1)
title(main="Title", sub="Sub title", xlab="X label", ylab="Y label")

Axes

Setting the argument axes to FALSE will exclude the axes. This is useful when you would like to specify your own axes: The function axis() will add an axis at a specific side of the plot.

In R, each side of the plot has a number: 1 is the bottom, 2 the left, 3 the top, and 4 the right side. Therefore, we can use the number 1 for adding an axis to the bottom: axis(1).

The function axis() gives more control over the layout of the axes. For example, the arguments at and labels specify the tick mark positions and their labels, and the arguments pos (a position in plot coordinates) and line (the number of text lines into the margin) set the distance from the plot region at which the axis is drawn.

Another useful argument is las (see help(par) for more information):

  • las=0 presents the axis labels always parallel to the axis (default);
  • las=1 presents the axis labels always horizontal;
  • las=2 presents the axis labels always perpendicular to the axis;
  • las=3 presents the axis labels always vertical.

emptyPlot(1,1,axes=FALSE)
# axis at bottom, range and labels automatically:
axis(1)
# axis at left, horizontal labels:
axis(2, las=1)
# axis at top, perpendicular and customized labels:
axis(3, at=c(.25,.75), labels=c("position 1", "position 2"), las=2)
# axis at right, no labels:
axis(4, labels = FALSE)

# different positions and layout:
axis(4, at=c(0.25,1), labels=c(1,6), pos=0.8, las=1, 
     col.ticks=2, col=2, col.axis=2, font=2, lwd=2)
axis(4, at=c(0.25,1), labels=FALSE, pos=-0.25, las=1, 
     col.ticks=2, col=2, col.axis=2, font=2, lwd=2)
for(i in c(-1:3)){
  axis(4, at=c(0.25,1), labels=FALSE, line=i, las=1, 
     col.ticks=2, col=2, col.axis=2, lwd=0.5)
}
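The at and labels arguments are also handy for time series. As a sketch, one year of the built-in AirPassengers series can be plotted without axes and then given month abbreviations on side 1 (month.abb is a built-in constant in R):

```r
y <- as.numeric(window(AirPassengers, start = c(1960, 1), end = c(1960, 12)))

plot(1:12, y, type = "b", pch = 16, axes = FALSE,
     xlab = "", ylab = "AirPassengers", main = "1960")
axis(1, at = 1:12, labels = month.abb, las = 2)  # one tick per month, perpendicular labels
axis(2, las = 1)                                  # default ticks, horizontal labels
box(bty = "l")                                    # lines at sides 1 and 2 only
```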

Border box

The argument bty determines the style of the border box around the plot region. Each option resembles the shape of the corresponding uppercase letter:

  • bty="o" : closed box (the default),
  • bty="l" : lines at sides 1 and 2,
  • bty="7" : lines at sides 3 and 4,
  • bty="c" : lines at sides 1, 2, 3,
  • bty="u" : lines at sides 1, 2, 4,
  • bty="]" : lines at sides 3, 4, 1,
  • bty="n" : no box.

The function box() draws a box around the plot region and accepts the same values to determine the style.

# Empty plot defaults to bty='n'
emptyPlot(1,1, bty='l')
box(bty="7", col=2, lty=3, lwd=3)



Other types of plots

Besides these basic building blocks, there is also a range of functions for making special plots. Here we illustrate barplot, hist, boxplot, contour, and image.

barplot

To illustrate the base R function barplot(), we first calculate the average number of air passengers per month. For more information on aggregating data, see the section on aggregation.

avg1 <- tapply(dat$AirPassengers, list(dat$Month), mean)
sd1 <- tapply(dat$AirPassengers, list(dat$Month), sd)

Basic use of barplot:

barplot(avg1, main="PLOT 1")

We could also add error bars (which are illustrated in more detail below). The important point here is that barplot() returns the x-positions of the bar midpoints, which we store in b and pass on to errorBars().

b <- barplot(avg1, main="PLOT 1", 
             col="cornflowerblue", ylim=c(0,600))
errorBars(b, avg1, sd1)

b
##       [,1]
##  [1,]  0.7
##  [2,]  1.9
##  [3,]  3.1
##  [4,]  4.3
##  [5,]  5.5
##  [6,]  6.7
##  [7,]  7.9
##  [8,]  9.1
##  [9,] 10.3
## [10,] 11.5
## [11,] 12.7
## [12,] 13.9
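These positions follow from the defaults width = 1 and space = 0.2 (the gap before each bar, as a fraction of the bar width): the i-th midpoint is 0.2 + 0.5 + (i - 1) * 1.2, giving 0.7, 1.9, 3.1, and so on. A sketch that checks this with plot = FALSE, which computes the positions without drawing, and then reuses them to print the values above the bars. It uses the built-in AirPassengers series rather than dat.

```r
v     <- as.numeric(AirPassengers)
month <- as.vector(cycle(AirPassengers))  # 1 = January, ..., 12 = December
heights <- tapply(v, month, mean)

# plot = FALSE returns the midpoints without drawing anything
mids <- barplot(heights, plot = FALSE)
expected <- 0.2 + 0.5 + (seq_along(heights) - 1) * 1.2
all.equal(as.vector(mids), expected)   # TRUE

# reuse the midpoints, e.g. to label each bar with its (rounded) value
b <- barplot(heights, ylim = c(0, 450), names.arg = month.abb, las = 2)
text(b, heights, labels = round(heights), pos = 3, cex = 0.7)
```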

A nice feature of barplot is that it can also handle two-dimensional data.

avg2 <- with(dat[dat$Year %in% c(1950, 1960),], 
             matrix(AirPassengers, ncol=12, nrow=2, byrow=TRUE))
rownames(avg2) <- c("1950", "1960")
colnames(avg2) <- 1:12
avg2
##        1   2   3   4   5   6   7   8   9  10  11  12
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1960 417 391 419 461 472 535 622 606 508 461 390 432

There are several ways to visualize two-dimensional data. Below we show three examples.

par(mfrow=c(1,3), cex=1.1)

# PLOT 1
# stacked (default)
barplot(avg2, main="PLOT 1")

# PLOT 2
# side by side (beside=TRUE)
barplot(avg2,beside=TRUE, main="PLOT 2")

# PLOT 3
# transposed dimensions
avg3 <- t(avg2)
dim(avg2)
## [1]  2 12
dim(avg3)
## [1] 12  2
barplot(avg3, beside=TRUE, main="PLOT 3")
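When bars are grouped, a legend identifying the groups is usually needed. barplot() can draw one itself through the arguments legend.text and args.legend (the latter is passed on to legend()). A sketch that rebuilds the same 1950/1960 matrix from the built-in AirPassengers series:

```r
# same values as avg2 above, taken from the built-in series
m <- rbind("1950" = as.numeric(window(AirPassengers, c(1950, 1), c(1950, 12))),
           "1960" = as.numeric(window(AirPassengers, c(1960, 1), c(1960, 12))))
colnames(m) <- month.abb

barplot(m, beside = TRUE, col = c("gray80", "gray30"),
        legend.text = rownames(m),                   # one legend entry per row (year)
        args.legend = list(x = "topleft", bty = "n"),
        ylim = c(0, 700), las = 2)
```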


hist

Histograms show the distribution of numerical data: each bar represents the number of observations that fall within a particular interval (bin). The binned data are also returned by hist(), as an object of class histogram.

h <- hist(dat$AirPassengers, col='gray')

… with the following information returned:

h
## $breaks
##  [1] 100 150 200 250 300 350 400 450 500 550 600 650
## 
## $counts
##  [1] 24 24 21 13 21 13 13  8  4  1  2
## 
## $density
##  [1] 0.0033333333 0.0033333333 0.0029166667 0.0018055556 0.0029166667
##  [6] 0.0018055556 0.0018055556 0.0011111111 0.0005555556 0.0001388889
## [11] 0.0002777778
## 
## $mids
##  [1] 125 175 225 275 325 375 425 475 525 575 625
## 
## $xname
## [1] "dat$AirPassengers"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"

Useful options include the argument breaks, which sets the interval boundaries (and thereby their width), and xlim, which sets the range of the x-axis.

hist(dat$AirPassengers, col='gray', xlim=c(0,750), breaks=seq(0,750, by=25))
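The elements of the object returned by hist() are related: $density equals $counts divided by the number of observations times the bin width (for the first bin above, 24 / (144 * 50) ≈ 0.00333), so the total area of the bars is 1. A sketch verifying this on the built-in series, with plot = FALSE so that only the bins are computed:

```r
h <- hist(as.numeric(AirPassengers), plot = FALSE)
n      <- sum(h$counts)       # 144 monthly observations
widths <- diff(h$breaks)      # bin widths (all 50 for this data)

all.equal(h$density, h$counts / (n * widths))  # TRUE
sum(h$density * widths)                        # total bar area: 1
```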


boxplot

Boxplots (i.e., box-and-whisker plots) summarize the distribution of numerical data in five values, namely the median, the lower and upper hinges of the box (approximately the first and third quartile), and the ends of the lower and upper whiskers. The box spans the IQR (inter-quartile range, from the first to the third quartile of the data), and the whiskers extend to the most extreme data point that lies no further than 1.5 times the length of the box (i.e., 1.5*IQR) away from the box. Data points outside this range are considered outliers and plotted as individual points.

boxplot(dat$AirPassengers, horizontal = TRUE, 
        col="red", main="AirPassengers")

boxplot(AirPassengers ~ Year, data=dat,  
        col="gray", main="Grouped data", las=1,
        xlab="Year", ylab="AirPassengers")
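The five values behind the box and whiskers can be obtained with boxplot.stats(), which boxplot() uses internally. Note that R's hinges are Tukey's hinges (computed by fivenum()), which can differ slightly from the quartiles returned by quantile(). A sketch on a small made-up vector with one extreme value:

```r
x <- c(1:10, 30)          # made-up data with one extreme value
s <- boxplot.stats(x)
s$stats                   # 1.0 3.5 6.0 8.5 10.0: whisker, hinge, median, hinge, whisker
s$out                     # 30: plotted individually as an outlier

# the whisker rule by hand: keep points within 1.5 * IQR of the box
hinges <- fivenum(x)[c(2, 4)]
iqr    <- diff(hinges)
inside <- x >= hinges[1] - 1.5 * iqr & x <= hinges[2] + 1.5 * iqr
range(x[inside])          # same as s$stats[c(1, 5)]
```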


contour plot

A contour plot represents a three-dimensional surface in two dimensions by drawing contour lines for the values of z. Each contour line connects the (x, y) coordinates at which z takes the same value.

x <- unique(dat$Year)
y <- unique(dat$Month)
z <- matrix(dat$AirPassengers, nrow=length(x), byrow=TRUE) # rows = years (x), columns = months (y)
contour(x, y, z,
        main="AirPassengers",
        xlab="Year", ylab="Month")



images

An image represents a three-dimensional surface by mapping the value of z onto a color spectrum.

x <- unique(dat$Year)
y <- unique(dat$Month)
z <- matrix(dat$AirPassengers, nrow=length(x), byrow=TRUE) # rows = years (x), columns = months (y)
image(x, y, z, col=terrain.colors(50),
      zlim=c(0,700),
      main="AirPassengers", 
      xlab="Year", ylab="Month")
gradientLegend(valRange = c(0,700),color = terrain.colors(50),
               side=3,pos=0.125,inside = FALSE)

In practice, the functions image() and contour() are often combined. For example, the function plotsurface() from plotfunctions combines both to visualize the contents of a data frame.

library(plotfunctions)
plotsurface(dat, view=c("Year", "Month"), predictor = "AirPassengers", color=terrain.colors(50), col=1, zlim=c(0,700))
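The same combination can be drawn with base R alone by adding contour lines on top of an image with add = TRUE. A sketch using the built-in AirPassengers series, with the rows of the matrix corresponding to the x values (years) and the columns to the y values (months), as image() and contour() expect:

```r
years  <- 1949:1960
months <- 1:12
# byrow = TRUE: each row holds the 12 months of one year
z <- matrix(as.numeric(AirPassengers), nrow = length(years), byrow = TRUE)

image(years, months, z, col = terrain.colors(50), zlim = c(0, 700),
      xlab = "Year", ylab = "Month", main = "AirPassengers")
contour(years, months, z, add = TRUE, labcex = 0.7)  # overlay the contour lines
```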



Other additions

legend

The function legend() adds a legend at the specified location within the plot region. Possible location keywords are:

‘topleft’      ‘top’      ‘topright’
‘left’         ‘center’   ‘right’
‘bottomleft’   ‘bottom’   ‘bottomright’

emptyPlot(1,1, bty='o')
legend('topright', legend=c("A", "B", "C"),
       col=c(2, 3, 4), lwd=c(1,2,3), lty=c(1,2,3),
       pch=c(16, NA, NA), merge=TRUE)
legend('center', legend=c("A", "B", "C"),
       col=c(2, 3, 4), lwd=c(1,2,3), lty=c(1,2,3),
       pch=c(16, NA, NA), merge=TRUE)

The function legend_margin() works the same, but plots the legend in the plot margins instead. The position is therefore specified relative to the figure region rather than relative to the plot region.

emptyPlot(1,1, bty='o')
legend_margin('topright', legend=c("A", "B", "C"),
       col=c(2, 3, 4), lwd=c(1,2,3), lty=c(1,2,3),
       pch=c(16, NA, NA), merge=TRUE)
legend('center', legend=c("A", "B", "C"),
       col=c(2, 3, 4), lwd=c(1,2,3), lty=c(1,2,3),
       pch=c(16, NA, NA), merge=TRUE)

text and mtext

The function text() is useful for adding labels to plots. It expects an x- and y-position and a label to plot. In addition, the following arguments control the layout:

  • argument pos: pos=1 below the coordinates (x,y); pos=2 to the left of (x,y); pos=3 above (x,y); pos=4 to the right of (x,y).
  • argument adj: horizontal alignment; adj=0 left-aligns the text, so that it starts at the position and extends to the right, whereas adj=1 right-aligns it, so that it ends at the position. Intermediate values center the text proportionally; by default the text is centered.
  • font: font=1 plain, font=2 bold, font=3 italic, and font=4 bold italic.

emptyPlot(1,1, las=1, v=0.5)
for(i in seq(0,1, by=0.2)){
  text(0.5, i, labels=sprintf("### adj=%01.1f ###", i), adj=i, col=alpha(2), xpd=TRUE)
}
points(0.2,0.2, pch=16)
for(i in 1:4){
  text(0.2,0.2, labels=sprintf("pos=%d",i), pos=i)
}

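The function mtext() adds text to the margins of the plot. The argument side selects the margin (numbered as for axis(): 1 bottom, 2 left, 3 top, 4 right), and line sets the distance from the plot region in lines of text; negative values move the text into the plot region.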

emptyPlot(1,1, axes=FALSE)
axis(1, at=c(0,1))
axis(2, at=c(0,1))
box(lty=3)

for(i in 1:4){
  for(j in -2:4){
    mtext(sprintf("%d | line=%d",i, j), side=i, line=j, col=j+3)
  }
}

mtext(c(0,1), side=1, line=2, at=c(0,1),col=2 )
mtext(c(0,1), side=2, line=2, at=c(0,1),col=2 )

segments and arrows

Segments (straight lines) and arrows are useful for highlighting aspects of a plot. The functions segments() and arrows() work in a similar way and expect the following information:

  • x0 : x-coordinate starting point;
  • y0 : y-coordinate starting point;
  • x1 : x-coordinate end point;
  • y1 : y-coordinate end point.

In addition, the following arguments determine the arrow head:

  • code: code=1 draws arrowhead at starting point, code=2 draws arrowhead at end point, code=3 draws arrowheads at both sides.
  • length: length of arrowhead in inches.
  • angle: angle from the shaft of the arrow to the edge of the arrow head.

Finally, the generic layout options such as col (color), lwd (line width), and lty (line type) apply.

emptyPlot(1,1)
# draw arrows of length=1
x2y <- function(x, l=1){
  y = sqrt(l^2-x^2)
  return(y)
}
x <- c(0.2,0.3)
y <- x2y(x)

segments(x0=x, y0=c(0,0), 
         x1=x, y1=y, 
         lty=3, col=2)
arrows(x0=c(0,0), y0=c(0,0),
       x1=x, y1=y,
       length=.15,
       lwd=c(1,2), col=c(1,4))

error bars

Error bars can be added using arrows. The function errorBars is a wrapper around arrows.

For plotting symmetric error bars, the function expects a series of x-coordinates (or y-coordinates when horiz=TRUE), a series of means, and a series of confidence intervals (arguments x, mean, and ci).

For plotting asymmetric errorbars, the optional argument ci.l represents the value for the lower confidence intervals, whereas the argument ci is taken as the value for the upper confidence band.

Note that the values for ci and ci.l in any case will be added to the mean.

library(plyr)
library(plotfunctions)

# calculate means and sd and quantiles:
means <- ddply(dat, "Year", summarise,
               mean = mean(AirPassengers),
               sd   = sd(AirPassengers),
               min  = min(AirPassengers),
               max  = max(AirPassengers))

# Two different plots:
# PLOT 1:
b <- barplot(means$mean, names.arg = means$Year, beside = TRUE,
        ylim=c(0,600), las=2,
        main="AirPassengers")
with(means, errorBars(b, mean, sd))
text(min(b), 600, labels=expression("" %+-% "1SD"),
     adj=0, xpd=TRUE)

# PLOT 2:
emptyPlot(c(0,600), range(means$Year), 
          ylab="Year", xlab="AirPassengers", las=1)
with(means, {
    errorBars(Year, mean, ci=mean-min, ci.l=max-mean, horiz = TRUE, length=.05)
    points(mean, Year, pch=15)
  })
legend("bottomright", legend=c("mean", "min-max"),
              lwd=c(0, 1), pch=c(15, NA),
              merge=TRUE, bty='n')
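Because errorBars() is described above as a wrapper around arrows(), a plain base-R version is easy to sketch when plotfunctions is not available. This minimal sketch assumes symmetric bars of ±1 SD and uses the built-in AirPassengers series: angle = 90 turns the arrowheads into flat caps, and code = 3 draws them at both ends.

```r
yr <- rep(1949:1960, each = 12)   # year of each monthly value
v  <- as.numeric(AirPassengers)
means <- tapply(v, yr, mean)
sds   <- tapply(v, yr, sd)

b <- barplot(means, ylim = c(0, 600), las = 2, col = "gray")
# vertical segments from mean - sd to mean + sd, with flat caps at both ends
arrows(x0 = b, y0 = means - sds, x1 = b, y1 = means + sds,
       angle = 90, code = 3, length = 0.05)
```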

adding images to a plot

The function plot_image() can add images (png, jpg, gif, or matrices) to a plot, either as background or at specific coordinates. See help(plot_image) for more information and examples.

library(plotfunctions)

# see Volcano example at help(image)
# create image object:
myimg <- list(image=volcano-min(volcano), col=topo.colors(max(volcano)-min(volcano)))
# create empty plot window:
emptyPlot(1,1, main="Volcano images")
# add image topleft corner:
plot_image(img=myimg, xrange=c(0,.25), yrange=c(.75,1), add=TRUE)