ggplot2

Layered Grammar of Graphics in R

In this series of three posts, we’ll look at colours in R graphics produced with ggplot2: which colour schemes are available, and how do we choose the colour palette most suitable for a particular graphic?

In kindergarten, choosing a colour was easy: palettes were limited to a few classics. As cool kids grow older and use R, the spectrum expands to present us with an overwhelming choice of millions of colours, most of them with poorly defined labels such as "#A848F2" or "lavenderblush3". Inasmuch as scientific graphics resemble a paint-by-numbers game, R can help us design more elegant palettes with pertinent colour choices based on the data to display.

Overview of basic colour functions in R

Base graphics rely mostly on the grDevices package for the selection of colours, with a few palettes to choose from:

(some palettes can have many more colours, this image is only an illustration of their structure)

The package also provides a number of basic operations to convert colours (adjustcolor, col2rgb, make.rgb, rgb2hsv, convertColor) and create interpolating palettes (rgb, hsv, hcl, gray, colorRamp, colorRampPalette, densCols, gray.colors).
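
As a rough illustration of a few of these helpers, here is a minimal sketch using only base R. The colour names and the number of steps are arbitrary choices made for the example.

```r
# grDevices ships with base R and is attached by default.

# Build an interpolating palette function, then ask it for 7 colours.
ramp <- colorRampPalette(c("navy", "white", "firebrick"))
pal  <- ramp(7)

# Convert a named colour to RGB (3 x 1 matrix, values 0-255),
# then to hue/saturation/value (values in [0, 1]).
rgb_mat <- col2rgb("lavenderblush3")
hsv_mat <- rgb2hsv(rgb_mat)

# Make a colour semi-transparent.
translucent <- adjustcolor("firebrick", alpha.f = 0.5)
```

The palette function returned by colorRampPalette can be called again with any other number of colours.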

Beyond that, a good resource is the colorspace package, which provides further utilities to convert from one colour space to another (HLS, HSV, LAB, LUV, RGB, sRGB, XYZ) and perform various operations on colours. Of special note are a few palette functions, “diverge_hcl”, “diverge_hsv”, “heat_hcl”, “rainbow_hcl”, “sequential_hcl”, “terrain_hcl”, which provide an easy way to produce colour palettes following a particular path in the colour space (varying hue with constant luminance and saturation, for example).
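
A minimal sketch of three of these palette functions, assuming the colorspace package is installed. Only the number of colours is passed and all other arguments are left at their defaults.

```r
library(colorspace)

seq_cols <- sequential_hcl(5)  # single-hue style ramp, light to dark
div_cols <- diverge_hcl(7)     # two hues diverging from a neutral middle
hue_cols <- rainbow_hcl(4)     # hue varies, chroma and luminance held constant
```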

Other packages such as RColorBrewer, munsell and dichromat provide more colour palettes and utilities.

While the combination of these tools is quite flexible, the user interface becomes a little bit chaotic. More recently, the scales package has provided wrappers around these functions to provide some consistency in the naming schemes and organise the different categories of palettes in a structured way:

Utility functions, such as col2hcl, fullseq, muted, rescale, rescale_mid, rescale_none, rescale_pal, seq_gradient_pal, show_col.

Palettes with a consistent interface: brewer_pal, dichromat_pal, gradient_n_pal, div_gradient_pal, hue_pal, grey_pal, identity_pal, manual_pal.
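
The consistent interface means that each *_pal function returns another function, which is then called to obtain the colours. A minimal sketch, assuming the scales package is installed; the palette names and colour counts are arbitrary choices for the example.

```r
library(scales)

# Discrete palettes: call the returned function with the number of colours.
blues <- brewer_pal(palette = "Blues")(5)
greys <- grey_pal()(4)

# Continuous palettes: call the returned function with values in [0, 1].
ramp    <- gradient_n_pal(c("darkgreen", "yellow", "red"))
ramp_5  <- ramp(seq(0, 1, length.out = 5))

# Utilities: a desaturated version of a colour, and a quick swatch display.
soft_red <- muted("red")
show_col(c(blues, greys))
```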

The ggplot2 package uses scales internally, and mirrors this structure. In this first part, we’ll review the basic commands to assign colours in ggplot2.

Colours in ggplot2

Let’s consider three plots for illustration:

p1 maps the colour of points to a continuous variable, p2 maps the fill of bars to a discrete variable, and p3 maps the fill of tiles to a continuous variable.
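
The code for the original plots is not shown in this post. As a stand-in, here is a minimal sketch of three plots matching the description, using simulated data for p1 and the mpg and faithfuld datasets that ship with ggplot2 for p2 and p3. These data choices are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(100), y = rnorm(100), z = runif(100))

# p1: colour of points mapped to a continuous variable
p1 <- ggplot(d, aes(x, y, colour = z)) + geom_point()

# p2: fill of bars mapped to a discrete variable
p2 <- ggplot(mpg, aes(class, fill = drv)) + geom_bar()

# p3: fill of tiles mapped to a continuous variable
p3 <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) + geom_tile()
```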

Colour vs fill aesthetic

Fill and colour scales in ggplot2 can use the same palettes. Some shapes such as lines only accept the colour aesthetic, while others, such as polygons, accept both colour and fill aesthetics. In the latter case, the colour refers to the border of the shape, and the fill to the interior.
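
A minimal sketch of the difference, using a small made-up data frame: the bars take both a border colour and a fill, while the line only has a colour.

```r
library(ggplot2)

d <- data.frame(g = c("a", "b", "c"), n = c(3, 5, 2))

# Bars are polygons: colour sets the border, fill sets the interior.
bars <- ggplot(d, aes(g, n)) +
  geom_col(colour = "black", fill = "grey80")

# Lines have no interior, so only colour applies.
lines <- ggplot(d, aes(g, n, group = 1)) +
  geom_line(colour = "steelblue")
```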

Aesthetic mapping vs set values

Another common source of confusion, general to ggplot2, is the distinction between set values and mapped values in a layer. Consider the following example,

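 library(ggplot2)
 library(gridExtra)  # provides grid.arrange()

 d <- data.frame(x = 1:10, y = rnorm(10), z = gl(5, 2))
 a <- ggplot(d, aes(x, y, group = z))

 ## left: colour set to a constant; right: colour mapped to the variable z
 grid.arrange(a + geom_path(colour = "red"),
              a + geom_path(aes(colour = z)),
              nrow = 1)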

Continuous scales

The default continuous scale in ggplot2 is a blue gradient, from low = "#132B43" to high = "#56B1F7" which can be reproduced as

  scales::seq_gradient_pal(low = "#132B43", high = "#56B1F7", space = "Lab")
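
The call above returns a palette function rather than colours. A minimal sketch of how it might be used, assuming the scales package is installed: the function takes values between 0 and 1 and returns the corresponding colours along the gradient.

```r
library(scales)

pal  <- seq_gradient_pal(low = "#132B43", high = "#56B1F7", space = "Lab")

# Five evenly spaced positions along the gradient, from dark to light blue.
cols <- pal(seq(0, 1, length.out = 5))

# Display the colours as swatches.
show_col(cols)
```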

Discrete scales

The default discrete scale in ggplot2 is a range of evenly spaced hues in the hcl colour space,

  scales::hue_pal(h = c(0, 360) + 15, c = 100, l = 65, h.start = 0, 
                          direction = 1)
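
As with the continuous case, this returns a palette function. A minimal sketch of its use, assuming the scales package is installed: calling it with the number of levels gives that many colours spread around the hue circle.

```r
library(scales)

pal <- hue_pal(h = c(0, 360) + 15, c = 100, l = 65, h.start = 0,
               direction = 1)

three <- pal(3)  # colours for a factor with three levels
eight <- pal(8)  # with more levels the hues sit closer together

show_col(eight)
```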

In the next post of this series we’ll describe how one can fine-tune or change altogether these default colours, and, perhaps more importantly, give some pointers on choosing an appropriate colour scheme for a particular graphic.


Source code for the graphs

Ben Schmid took ship’s log data (previously visualized in static form on the Spatial Analysis blog) and used ggplot and ffmpeg to animate the paths of individual voyages from 1750 to 1850. The images above come from the animation that combines all years to emphasize seasonal patterns. Another animation depicts the data year by year. I especially like how the name of the month is positioned to reflect the sun’s meridian.

The syntax of ggplot2 emphasizes constructing plots by adding components, or layers, using +.

Possibly one of the most useful, but least remarked upon, consequences of this syntax is that it allows for an incredible degree of flexibility in saving and reusing components of plots. Here are two very simple examples that come up frequently for me.

I frequently make line plots where the x axis is categorical. For instance, consider the following example data:

x <- factor(paste(1990:2005,1991:2006,sep = "-"))
dat <- data.frame(x = x,y = rnorm(length(x)))

which we can plot like so:

p <- ggplot(dat,aes(x = x,y = y)) + geom_line(aes(group = 1))
p

Obviously, we can’t keep those x axis labels like that, they’re unreadable! So I’m frequently doing something like the following:

# theme() and element_text() replace the older opts() and theme_text()
p + theme(axis.text.x = element_text(size = 7,
                                     hjust = 0,
                                     vjust = 1,
                                     angle = 310))

But who wants to type all that over and over for each plot? So instead, I just store the result of that theme() call:

x_angle <- theme(axis.text.x = element_text(size = 7,
                                            hjust = 0,
                                            vjust = 1,
                                            angle = 310))
p + x_angle

While in this case it doesn’t look too bad, if my x axis has even more values, showing all of the labels can seem a little excessive. Maybe we really only need to show every other x axis tick label:

l <- levels(dat$x)[seq(1,length(levels(dat$x)),by = 2)]
p + x_angle + scale_x_discrete(breaks = l,labels = l)

Again, this kind of thing comes up a lot, and typing this over and over can get a bit tedious. But you can write a simple function that takes the axis tick labels (in the correct order) and returns the scale_x_discrete object as needed:

every_other <- function(labs,side = "x",...){
    l <- labs[seq(1,length(labs),by = 2)]
    if (side == 'x'){
        return(scale_x_discrete(breaks = l,labels = l,...))
    }
    if (side == 'y'){
        return(scale_y_discrete(breaks = l,labels = l,...))
    }
}

So in the end you can do all that simply with the following code:

p + x_angle + every_other(levels(dat$x))

These examples are fairly simple, but perhaps they’ll get you thinking about components of your plots that can be stored and reused, or generated by functions.

A question was raised today on the mailing list: Is there an easy way to add a watermark to a ggplot?

There are several options, depending on the type of watermark and the required level of control over the output,

  • add a text label using annotate (the original idea of the poster)

  • add a custom grob (graphical object from the Grid package), using annotation_custom

In either case, the placement of a watermark at an absolute location on the plot is greatly facilitated if you use +/- Inf values, which correspond to the extreme edges of the plot panel.

Here is an example with annotate

 library(ggplot2)
 library(grid)

 qplot(1:10, rnorm(10)) +
   annotate("text", x = Inf, y = -Inf, label = "PROOF ONLY",
            hjust=1.1, vjust=-1.1, col="white", cex=6,
            fontface = "bold", alpha = 0.8)

where the label is placed at the bottom-right, and the justification is adjusted to make sure the label stays in the panel area.

Below is a fancier example with a custom grob, which we define such that its width spans the full plot panel, even after resizing the interactive plot window,

 watermarkGrob <- function(lab = "PROOF ONLY"){
   grob(lab=lab, cl="watermark") 
 }

 ## custom draw method to
 ## calculate expansion factor on-the-fly
 drawDetails.watermark <- function(x, rot = 45, ...){
   ## ratio of the panel width to the natural width of the label
   cex <- convertUnit(unit(1, "npc"), "mm", valueOnly = TRUE) /
     convertUnit(unit(1, "grobwidth", textGrob(x$lab)), "mm", valueOnly = TRUE)

   grid.text(x$lab, rot = rot, gp = gpar(cex = cex, col = "white",
                                         fontface = "bold", alpha = 0.5))
 }

 qplot(1:10, rnorm(10)) +
   annotation_custom(xmin=-Inf, ymin=-Inf, xmax=Inf, ymax=Inf, watermarkGrob())

You can of course replace this grob with a more complex one, e.g. a table of labels to tile the panel with multiple repetitions of the watermark, or an external graphic (consider the annotation_raster function), etc. As an example, the following code uses rpatternGrob from the gridExtra package to tile multiple copies of the R logo, imported as a raster image,

 library(png)
 library(gridExtra)
 ## import logo as raster image
 m <- readPNG(system.file("img", "Rlogo.png", package="png"), FALSE)
 w <- matrix(rgb(m[,,1],m[,,2],m[,,3], m[,,4] * 0.2), # adjust alpha
             nrow=dim(m)[1])


 qplot(1:10, rnorm(10), geom = "blank") +
      annotation_custom(xmin=-Inf, ymin=-Inf, xmax=Inf, ymax=Inf, 
         rpatternGrob(motif=w, motif.width = unit(1, "cm"))) +
 geom_point()

This time we made sure that the logo was the first layer plotted, so that it doesn’t obscure the data but stays in the background.
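
The annotation_raster function mentioned earlier follows the same layering logic. Here is a minimal sketch in which a tiny made-up checkerboard of semi-transparent colours stands in for an imported image (an assumption for the example; a matrix read with readPNG could be used instead). Note that the raster is stretched over the whole panel rather than tiled.

```r
library(ggplot2)

# Hypothetical 2 x 2 raster of hex colours with alpha, standing in for a logo.
w <- matrix(c("#0000FF33", "#FFFFFF00",
              "#FFFFFF00", "#0000FF33"), nrow = 2)

d <- data.frame(x = 1:10, y = rnorm(10))

# The raster layer comes first so that the points are drawn on top of it.
p <- ggplot(d, aes(x, y)) +
  annotation_raster(w, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
  geom_point()
```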

Christopher Gandrud uses ggplot2 to visualize potential partisan bias in US Federal Reserve inflation forecasts as a PhD student at the London School of Economics.

Next week I’ll present a glimpse of R and ggplot2 graphics at VUW. This is a MESA seminar on 'Data analysis and plotting with free and open source tools' where we’ll present spreadsheet alternatives based on gnuplot, Python, and R.

Presentation

Neat demo reel of d3 (js & svg powered interactive graphics in the browser). Hopefully there will be ggplot2 integration one day!

ggplot2 in the Atlantic