Chapter 5 Base R graphics

In Section 3.10 you saw a brief introduction to the ggplot2 graphics framework, which has become increasingly popular in recent years. It is still useful to be familiar with the base R graphics functionality, which existed long before ggplot2, as both frameworks have their advantages and disadvantages.

There are three kinds of plotting functions in base R:

  • Functions that generate a new plot, e.g. hist() and plot().
  • Functions that add extra things to an existing plot, e.g. lines() and text().
  • Functions that allow you to interact with the plot, e.g. locator() and identify().

The usual way to make a graph in R is to start with a fairly simple initial plot and then add points, lines, text, etc. step by step, preferably in a script so that the whole sequence can be rerun.
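As a minimal sketch of this workflow, the following uses simulated data (the vectors x and y are made up for illustration and are not part of the births data). The first call draws the basic plot, and each later call adds one element to it.

```r
set.seed(1)                      # reproducible simulated data
x <- 1:20                        # hypothetical predictor
y <- 2 * x + rnorm(20, sd = 3)   # hypothetical response, true slope 2

plot(x, y)                                         # 1. simple initial plot
lines(x, 2 * x, col = "red")                       # 2. add the true line
abline(h = mean(y), lty = 2)                       # 3. add a dashed horizontal line at the mean
text(2, max(y), "red: true line y = 2x", pos = 4)  # 4. add a text label to the right of (2, max(y))
```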

5.1 Simple plot on the screen

Load the births data and get an overview of the variables:

library(Epi)
data(births)
str(births)

Now look at the birth weight distribution with

hist(births$bweight)

The histogram can be refined – take a look at the possible options with

help(hist)

and try some of the options, for example:

hist(births$bweight, col = "gray", border = "white")
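Two further options worth knowing are breaks=, which controls the bin boundaries, and plot = FALSE, which makes hist() return the bin counts without drawing anything. The sketch below uses simulated weights (the vector bw is hypothetical) so that it runs without the Epi package; with the real data you would use births$bweight instead.

```r
set.seed(2)
bw <- rnorm(500, mean = 3100, sd = 600)   # hypothetical birth weights (g)

# Bins of fixed width 250 g, covering the whole range of the data
brks <- seq(floor(min(bw) / 250) * 250, ceiling(max(bw) / 250) * 250, by = 250)
hist(bw, breaks = brks, col = "gray", border = "white",
     main = "Simulated birth weights", xlab = "Birth weight (g)")

# plot = FALSE: only compute the bins, do not draw
h <- hist(bw, breaks = brks, plot = FALSE)
head(cbind(lower = head(h$breaks, -1), count = h$counts))
```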

To look at the relationship between birth weight and gestational weeks, try

with(births, plot(gestwks, bweight))

You can change the plotting symbol with the pch= argument. To see all 25 plot symbols, try:

plot(1:25, pch = 1:25)
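Symbols 21 to 25 are special: they have a border colour, set by col=, and a separate fill colour, set by bg=. A sketch that shows this and labels each symbol with its pch code:

```r
# Symbols 21-25: col = border colour, bg = fill colour (not used by symbols 0-20)
plot(1:25, pch = 1:25, cex = 2, col = "black", bg = "orange",
     xlab = "pch value", ylab = "")
text(1:25, 1:25, labels = 1:25, pos = 3, cex = 0.7)  # write each code above its symbol
```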
  • Make a plot of the birth weight versus maternal age with
with(births, plot(matage, bweight))
  • Label the axes with
with(births,
  plot(matage, bweight,
    xlab = "Maternal age",
    ylab = "Birth weight (g)"
  )
)

5.2 Colours

R recognizes many named colours. You can list them all with colours() or, equivalently, colors() (R accepts both British and American spelling). Some web pages let you actually see the available colours, for instance https://r-charts.com/colors/. To colour the points of birth weight versus gestational weeks, try, for instance:
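Because colors() returns an ordinary character vector, you can search it like any other vector, and col2rgb() shows how a colour name translates into red, green and blue components on a 0 to 255 scale. A short sketch:

```r
length(colors())                       # number of named colours
grep("green", colors(), value = TRUE)  # all colour names containing "green"
col2rgb(c("green", "darkgreen"))       # their red/green/blue components (0-255)
```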

with(births, plot(gestwks, bweight, pch = 16, col = "green"))

This creates a solid mass of colour in the centre of the cluster of points and it is no longer possible to see individual points. You can recover this information by overwriting the points with black circles using the points() function.

with(births, plot(gestwks, bweight, pch = 16, col = "green"))
with(births, points(gestwks, bweight, pch = 1))

Note: when a scatter plot has many points, you may also want to reduce the point size. The cex= argument scales the symbols relative to their default size, so cex = 0.5 gives points half the usual size (any value below 1 shrinks them).
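A sketch with many simulated points (gw and bw are hypothetical stand-ins for gestational weeks and birth weight) comparing small symbols with an alternative: semi-transparent colour created by adjustcolor() from the grDevices package, so that dense regions appear darker. Transparency is supported by most modern devices, including pdf().

```r
set.seed(3)
n  <- 5000                                  # many hypothetical points
gw <- rnorm(n, mean = 39, sd = 2)           # simulated gestational weeks
bw <- 150 * gw - 2700 + rnorm(n, sd = 450)  # simulated birth weight (g)

# Smaller symbols reduce overlap
plot(gw, bw, pch = 16, cex = 0.5, col = "green")

# Alternative: 20% opaque colour, so overlapping points accumulate
plot(gw, bw, pch = 16, cex = 0.5, col = adjustcolor("darkgreen", alpha.f = 0.2))
```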

5.3 Adding to a plot

The points() function just used is one of several functions that add elements to an existing plot. By using these functions, you can create quite complex graphs in small steps.

Suppose we wish to recreate the plot of birth weight vs gestational weeks using different colours for male and female babies. We start with an empty plot, using the type = "n" argument: this sets up the axes and coordinate system from the data but draws no points.

Then we add the points for each sex with the points() function.

with(births, plot(gestwks, bweight, type = "n"))
with(
  births, 
  points(gestwks[sex == 1], bweight[sex == 1], col = "blue")
)
with(
  births, 
  points(gestwks[sex == 2], bweight[sex == 2], col = "red")
)

To add a legend explaining the colours, try

with(births, plot(gestwks, bweight, type = "n"))
with(
  births, 
  points(gestwks[sex == 1], bweight[sex == 1], col = "blue")
)
with(
  births, 
  points(gestwks[sex == 2], bweight[sex == 2], col = "red")
)
legend(
  "topleft", 
  pch = 1, 
  legend = c("Boys", "Girls"), 
  col = c("blue", "red")
)

which puts the legend in the top left-hand corner.
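Besides "topleft", legend() accepts the position keywords "bottomright", "bottom", "bottomleft", "left", "top", "topright", "right" and "center". The arguments inset= (distance from the plot edge, as a fraction of the plot region) and bty = "n" (no box) are often useful as well. A sketch on simulated data (x, y and grp are hypothetical):

```r
set.seed(5)
x   <- runif(50)          # hypothetical x values
y   <- runif(50)          # hypothetical y values
grp <- rep(1:2, 25)       # hypothetical group code, 1 or 2

plot(x, y, type = "n")                            # empty plot, axes only
points(x[grp == 1], y[grp == 1], col = "blue")
points(x[grp == 2], y[grp == 2], col = "red")
lg <- legend("bottomright",                       # position keyword
             inset = 0.02,                        # small gap from the plot edge
             bty = "n",                           # no box around the legend
             pch = 1,
             legend = c("Group 1", "Group 2"),
             col = c("blue", "red"))
```

legend() also returns, invisibly, the coordinates it used, which can help when positioning other elements next to it.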

Finally, we can add a title to the plot with

with(births, plot(gestwks, bweight, type = "n"))
with(
  births, 
  points(gestwks[sex == 1], bweight[sex == 1], col = "blue")
)
with(
  births, 
  points(gestwks[sex == 2], bweight[sex == 2], col = "red")
)
legend(
  "topleft", 
  pch = 1, 
  legend = c("Boys", "Girls"), 
  col = c("blue", "red")
)
title(
  "Birth weight vs gestational weeks in 500 singleton births"
)

5.4 Using indexing for plot elements

One of the most powerful features of R is the ability to index vectors, not only to extract subsets of them but also to repeat their elements in arbitrary sequences.

Colouring males and females separately, as above, would become very clumsy if we had a five-level factor instead of sex.

Instead of specifying one color for all points, we may specify a vector of colours of the same length as the gestwks and bweight vectors. This is rather tedious to do directly, but R allows you to specify an expression anywhere, so we can use the fact that sex takes the values 1 and 2, as follows:

First create a colour vector with two colours, and take a look at sex:

c("blue", "red")
births$sex

Now see what happens if you index the colour vector by sex:

c("blue", "red")[births$sex]

For every occurrence of a 1 in sex you get "blue", and for every occurrence of 2 you get "red", so the result is a long vector of "blue"s and "red"s corresponding to the males and females. This can now be used in the plot:

with(
  births, 
  plot(gestwks, bweight, pch = 16, col = c("blue", "red")[sex])
)

The same trick can be used if we want to have a separate symbol for different categories (use pch=c(15,17)[sex], for instance).
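The same idea scales to factors with more levels, which is exactly the situation where separate points() calls become clumsy. as.integer() converts a factor to its level codes 1, 2, ..., which can then index a vector of colours or symbols. A sketch with a hypothetical five-level factor agegrp and simulated x and y:

```r
set.seed(6)
lev    <- c("<20", "20-24", "25-29", "30-34", "35+")          # hypothetical age groups
agegrp <- factor(sample(lev, 100, replace = TRUE), levels = lev)
x <- rnorm(100)                                               # hypothetical x values
y <- rnorm(100)                                               # hypothetical y values

pal  <- c("black", "blue", "red", "darkgreen", "orange")      # one colour per level
sym  <- c(15, 16, 17, 18, 8)                                  # one symbol per level
code <- as.integer(agegrp)                                    # level codes 1..5

plot(x, y, col = pal[code], pch = sym[code])
legend("topright", legend = levels(agegrp), col = pal, pch = sym)
```

Using the same pal and sym vectors in legend() keeps the legend consistent with the points.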

5.5 Saving your graphs for use in other documents

If you need to use the plot in a report or presentation, you can save it in a graphics file. Once you have written the script (sequence of R commands) that produces the graph, and it looks fine on screen, you can open a non-interactive graphics device and rerun the script. Instead of appearing on the screen, the plot is then written directly to a file. When the plot is complete, you need to close the device with dev.off() before you can open the file. Try:

pdf(file = "bweight_gwks.pdf", height = 4, width = 4)
with(births, plot(gestwks, bweight, col = c("blue", "red")[sex]))
legend(
  "topleft", 
  pch = 1, 
  legend = c("Boys", "Girls"), 
  col = c("blue", "red")
)
dev.off()

This gives you a PDF (Portable Document Format) file bweight_gwks.pdf containing a graph that is 4 inches tall and 4 inches wide.

Other formats are available through the corresponding device functions, such as png(), jpeg(), tiff() and svg(). See help(Devices) for the full list of options.
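For bitmap formats such as PNG, the size is given in pixels by default, and res= sets the resolution in pixels per inch. A sketch writing a PNG to a temporary file (the name comes from tempfile() only to keep the example self-contained; in practice you would give a name such as "bweight_gwks.png"):

```r
out <- tempfile(fileext = ".png")          # hypothetical output file
png(filename = out, width = 800, height = 800, res = 150)  # 800 x 800 pixels at 150 ppi
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
dev.off()                                  # close the device so the file is completed
file.exists(out)
```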

In window-based environments (R GUI for Windows, RStudio) you can also use the menu (File → Save as... in the R GUI, or Export in the RStudio Plots pane) to save the active graph as a file, and even copy-paste may work (from the R graphics window to Word, for instance). However, writing the graph to a file from your script is recommended for reproducibility, in case you need to redraw it with some modifications.

5.6 Interacting with a plot

The locator() function allows you to interact with the plot using the mouse. Typing locator(1) shifts you to the graphics window and waits for one click of the left mouse button. When you click, it will return the corresponding coordinates.

You can use locator() inside other graphics functions to position graphical elements exactly where you want them. Recreate the birth-weight plot,

with(births, plot(gestwks, bweight, col = c("blue", "red")[sex]))

and then add the legend where you wish it to appear by typing

legend(
  locator(1), 
  pch = 1, 
  legend = c("Boys", "Girls"), 
  col = c("blue", "red")
)

The identify() function allows you to find out which records in the data correspond to points on the graph. Try

with(births, identify(gestwks, bweight))

When you click the left mouse button, a label appears on the graph showing the row number of the nearest point in the births data frame. If there is no point close enough, R prints a warning message on the console instead. To end the interaction, right-click in the graphics window (in some environments you may need to press Esc instead); identify() then returns a vector with the row numbers of the identified points.

  • Use identify() to find which records correspond to the smallest and largest number of gestational weeks and view the corresponding records:
with(births, births[identify(gestwks, bweight), ])
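Clicking is convenient for exploring, but the clicks cannot be rerun from a script. A non-interactive way to find records with extreme values uses which.min() and which.max(), which return the position of the (first) smallest and largest value, ignoring NAs. The sketch below uses a small made-up data frame df in place of births; with the real data you would index births in the same way.

```r
# Hypothetical data standing in for births
df <- data.frame(gestwks = c(38.5, 24.7, 41.2, 43.1, 36.0),
                 bweight = c(3200, 630, 3650, 3900, 2500))

# Row positions of the smallest and largest gestational age
idx <- c(which.min(df$gestwks), which.max(df$gestwks))
df[idx, ]
```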

5.7 A completely custom plot

A useful feature of base R graphics is that you can plot almost anything, adding text or even mathematical formulas and symbols (see help(plotmath)), even without any data. The following code creates one “pointless example with two points”, just to give you an idea. This flexibility is arguably an advantage over ggplot2, which can be harder to adapt to such free-form drawings.

plot(0:100, 0:100, type = "n", axes = FALSE, xlab = "", ylab = "") # empty plot: sets up a 0-100 coordinate system
points(30, 30, pch = 16, cex = 10, col = "blue")
text(30, 50, "big blue dot")
points(60, 60, pch = 16, cex = 2, col = "red")
text(60, 70, "small red dot")
arrows(40, 40, 55, 55) # arrow appearance can be customized (see help(arrows))
text(45, 50, expression(beta))
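A few more plotmath constructs, as a sketch: expression() builds a formula from R syntax (hat(), sum(), subscripts with [ ], powers with ^), and bquote() inserts the value of an R variable where .( ) appears. The value b below is hypothetical.

```r
plot(0:10, 0:10, type = "n", axes = FALSE, xlab = "", ylab = "")  # empty coordinate system
text(5, 8, expression(hat(beta) == sum(x[i] * y[i], i == 1, n)))  # hat, sum with limits, subscripts
b <- 0.42                                                         # hypothetical estimate
text(5, 5, bquote(hat(beta) == .(b)))                             # value of b inserted into the formula
text(5, 2, expression(alpha^2 + sqrt(gamma)))                     # power and square root
```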