ggplot2

Layered Grammar of Graphics in R

In my previous blog post, I explored what is needed to create a new transformation for the scales package and gave an example of a mathematical transformation. In this post, I want to show an example of the other use case mentioned there (mapping a continuous-like variable that has a specific structure and formatting). I also want to extend that example into new scale functions that integrate with ggplot even more directly.

Time

Dates and times are tricky to work with because they have detailed external constraints and conventions. Within the R ecosystem, several packages exist solely to deal with dates and times (chron, lubridate, date, mondate, timeDate, TimeWarp, etc.), and an article has appeared in R News on the topic (Brian D. Ripley and Kurt Hornik. Date-time classes. R News, 1(2):8-11, June 2001.).

There is already support for dates (the Date class, via date_trans in scales and scale_*_date in ggplot2) and for datetimes (the POSIXt classes, via time_trans in scales and scale_*_datetime in ggplot2). What is missing is support for time of day independent of any date: “clock time”, if you will.
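For comparison, the existing date transformation uses the same idea this post applies to times. A Date is stored as a count of days since 1970-01-01, and date_trans() maps it to that plain number and back. The following is a minimal sketch that assumes your installed version of scales still exports date_trans():

```r
library(scales)

dt <- date_trans()
dt$name                          # "date"

d <- as.Date(c("1970-01-02", "2001-06-01"))
num <- dt$transform(d)           # plain numbers: days since 1970-01-01
num[1]                           # 1
dt$inverse(num)                  # back to the original Dates
```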

Existing solutions

In the spirit of laziness, the first of the three great virtues of a programmer, it is worth seeing what has already been done (classes and functions) to deal with clock time.

The chron package has a class, times, that represents times of day independent of any date. It also has many supporting methods:

> methods(class="times")
 [1] [.times*             [[.times*            [<-.times*          
 [4] as.character.times*  as.data.frame.times* axis.times*         
 [7] Axis.times*          c.times*             diff.times*         
[10] format.times*        hist.times*          identify.times*     
[13] is.na.times*         lines.times*         Math.times*         
[16] mean.times*          Ops.times*           plot.times*         
[19] points.times*        pretty.times*        print.times*        
[22] quantile.times*      summary.times*       Summary.times*      
[25] trunc.times*         unique.times*        xtfrm.times*        

   Non-visible functions are asterisked

Following the pattern of the previous post, each of the parts of the transformation can be determined.

transform and inverse

When the variable has a class, transform must take that class's representation and convert it to a simple numeric one (in mathematical terms, map it onto [part of] the real line); inverse performs the opposite mapping. Generally, this requires delving into the structure of the class to see how it is really put together. To do that, let’s create some data. The times documentation says it can convert a character vector to times (by default, 24-hour hours, minutes, and seconds separated by colons).

library(chron)

Time <- times(c("18:37:11", "16:51:34", "15:05:57", "13:20:20",
                "11:34:43", "09:49:06", "08:03:29", "06:17:52",
                "04:32:15", "02:46:38", "01:01:01"))

which if printed gives

> Time
 [1] 18:37:11 16:51:34 15:05:57 13:20:20 11:34:43 09:49:06
 [7] 08:03:29 06:17:52 04:32:15 02:46:38 01:01:01

So far, so good. But what does this object/class really look like?

> str(Time)
Class 'times'  atomic [1:11] 0.776 0.702 0.629 0.556 0.482 ...
  ..- attr(*, "format")= chr "h:m:s"
> dput(Time)
structure(c(0.775821759259259, 0.702476851851852, 0.629131944444444, 
0.555787037037037, 0.48244212962963, 0.409097222222222, 0.335752314814815, 
0.262407407407407, 0.1890625, 0.115717592592593, 0.0423726851851852
), format = "h:m:s", class = "times")

times objects are just numeric vectors with a format attribute and a class. A little more digging and testing shows that the numeric part is the fraction of a day that each time represents.

> str(times(c("00:00:00","6:00:00","12:00:00","23:59:59")))
Class 'times'  atomic [1:4] 0 0.25 0.5 1
  ..- attr(*, "format")= chr "h:m:s"
> dput(times(c("00:00:00","6:00:00","12:00:00","23:59:59")))
structure(c(0, 0.25, 0.5, 0.999988425925926), format = "h:m:s", class = "times")
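You can check the arithmetic behind those numbers by hand. A time of day is the number of seconds since midnight divided by the 86,400 seconds in a day. In the sketch below, hms_to_day_fraction() is a hypothetical helper written only for this check:

```r
library(chron)

# Hypothetical helper: seconds since midnight divided by the
# 86400 seconds in a day gives the fraction of a day
hms_to_day_fraction <- function(h, m, s) (h * 3600 + m * 60 + s) / 86400

hms_to_day_fraction(18, 37, 11)   # 0.7758218, the first element of Time
hms_to_day_fraction(23, 59, 59)   # 0.9999884, just short of a full day

# chron's own parsing of the string agrees with the hand calculation
all.equal(as.numeric(times("18:37:11")), hms_to_day_fraction(18, 37, 11))
```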

Most of the work of creating a mapping to numeric values is already done; all that is needed is to strip off the class and attributes. as.numeric() does that nicely.

> as.numeric(Time)
 [1] 0.77582176 0.70247685 0.62913194 0.55578704 0.48244213
 [6] 0.40909722 0.33575231 0.26240741 0.18906250 0.11571759
[11] 0.04237269

That is only half the mapping; we also need to get from this representation back to a times object. The constructor for times can take a numeric vector representing the “number of days since an origin.” It’s not stated, but perhaps a time is then just a fraction of a day?

> times(as.numeric(Time))
 [1] 18:37:11 16:51:34 15:05:57 13:20:20 11:34:43 09:49:06
 [7] 08:03:29 06:17:52 04:32:15 02:46:38 01:01:01

Sure looks like it.

> identical(Time, times(as.numeric(Time)))
[1] TRUE

So transform is just the as.numeric function and inverse is the times function.
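As a quick sanity check of that pair, the sketch below applies as.numeric and times as the forward and inverse functions to a few round times. The names fwd and inv are local to this example:

```r
library(chron)

fwd <- as.numeric   # times -> plain fraction of a day
inv <- times        # fraction of a day -> times

x <- times(c("00:00:00", "06:00:00", "12:00:00"))
num <- fwd(x)
num                 # 0.00 0.25 0.50, with no class or attributes left
inv(num)            # prints as 00:00:00 06:00:00 12:00:00
```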

breaks

Getting breaks on time right is important: an axis with ticks every 7 seconds is going to look odd (unless there is a really compelling reason), as would one with ticks every 25 seconds. In base graphics, the generic function pretty is responsible for finding “nice” breaks. Among the methods for times there is a pretty.times. Does it work (well enough)?

> pretty(Time)
[1] 03:00:00 06:00:00 09:00:00 12:00:00 15:00:00 18:00:00
attr(,"labels")
[1] 03:00 06:00 09:00 12:00 15:00 18:00

That’s pretty reasonable. Under the hood, chron::pretty.times calls chron::pretty.chron, which calls grDevices::pretty.POSIXt, which in turn calls grDevices::prettyDate. Looking at the code for prettyDate, the allowed (sub-day) break spacings are 1, 2, 5, 10, 15, and 30 seconds; 1, 2, 5, 10, 15, and 30 minutes; and 1, 3, 6, and 12 hours. I might have added a 2-hour option, but it is not worth throwing away others’ work because of that. pretty_breaks already wraps pretty in the form expected by scales, so we can just use pretty_breaks() as the breaks function.

> pretty_breaks()(range(Time))
   03:00    06:00    09:00    12:00    15:00    18:00 
03:00:00 06:00:00 09:00:00 12:00:00 15:00:00 18:00:00 
attr(,"labels")
[1] 03:00 06:00 09:00 12:00 15:00 18:00
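The breaks above come back as a named vector, and those names matter for labelling. scales' default formatter, format_format(), returns the names of its input when they are present. However, as.numeric(), which is exactly our transform, strips them. The break values in this sketch are typed in by hand for illustration:

```r
library(scales)

# Named breaks, shaped like pretty_breaks() output (values are fractions of a day)
brks <- c(`03:00` = 0.125, `06:00` = 0.25, `09:00` = 0.375)

default_fmt <- format_format()
default_fmt(brks)               # "03:00" "06:00" "09:00": the names are used

stripped <- as.numeric(brks)    # what passing through the transform does
names(stripped)                 # NULL
default_fmt(stripped)           # falls back to formatting the bare numbers
```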

format

Here, we almost get another lucky break. In principle, when format is not defined, the names attached to the values returned by breaks are used as labels. Unfortunately, this does not happen in practice. Inside ggplot, the breaks get transformed back and forth between data space and transformed space, and they lose their attributes (including their names) along the way. If we want the default formatting (hours, minutes, and seconds), the format argument can simply be the format function. If we want seconds to appear only when they are needed (when breaks are less than 1 minute apart), we have to write our own function. That function passes the appropriate value of the simplify flag, which controls whether the seconds are suppressed.

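fmt <- function(x) {
    # Drop seconds (simplify = TRUE) unless some breaks are less than one
    # minute (1/(24*60) of a day) apart; na.rm guards against NA breaks
    format(x, simplify = !any(diff(as.numeric(x)) < 1/(24*60), na.rm = TRUE))
}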
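You can try the condition inside fmt on its own. In this sketch, wants_simplify() is a hypothetical helper that pulls out the same test. Breaks spaced at least a minute apart get simplified labels, while sub-minute spacing keeps the seconds:

```r
library(chron)

one_minute <- 1 / (24 * 60)   # one minute as a fraction of a day

# Hypothetical helper: TRUE (drop seconds) unless some spacing is under a minute
wants_simplify <- function(x) !any(diff(as.numeric(x)) < one_minute, na.rm = TRUE)

hourly      <- times(c("03:00:00", "06:00:00", "09:00:00"))
half_minute <- times(c("12:00:00", "12:00:30", "12:01:00"))

wants_simplify(hourly)        # TRUE: 3-hour spacing, seconds can be dropped
wants_simplify(half_minute)   # FALSE: 30-second spacing, seconds are kept
format(half_minute, simplify = wants_simplify(half_minute))
```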

domain

Since a times value is defined as a fraction of a day, it is only meaningful in the range 0 to 1 (inclusive on the left, exclusive on the right). domain has no way of expressing whether the endpoints are included, so the domain is simply c(0, 1).

name

The transformation for datetime (POSIXt) objects already uses the name “time”, so the obvious name “times” would be confusing. I’ve chosen “chrontimes” instead, to indicate that it is the times class from the chron package.

Putting it all together

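library(chron)
library(scales)

times_trans <- function() {
    # Show seconds in labels only when breaks are less than a minute apart
    fmt <- function(x) {
        format(x, simplify = !any(diff(as.numeric(x)) < 1/(24*60), na.rm = TRUE))
    }
    trans_new("chrontimes",
              transform = as.numeric,  # times -> fraction of a day
              inverse = times,         # fraction of a day -> times
              breaks = pretty_breaks(),
              format = fmt,
              domain = c(0, 1))
}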

Using the transformation in ggplot

Using the Time values created earlier and some arbitrary values, make a data frame to plot.

dat <- data.frame(time = Time,
                  value = c(7L, 6L, 9L, 11L, 10L, 1L,
                            4L, 2L, 3L, 5L, 8L))

A default plot of this data is made with

library(ggplot2)

ggplot(dat, aes(time, value)) + geom_point()

and the same plot using the new transformation on the x axis is

ggplot(dat, aes(time, value)) + geom_point() +
  scale_x_continuous(trans=times_trans())

Integrating as a ggplot scale

If you want to go a step further and create scale_*_times functions to use directly in ggplot, you can. In doing so, you may realize that along a y axis you would expect time to run from top to bottom, not bottom to top as the y axis typically runs. Using the idea of the reversed scale described in the previous post, you can also make a reversed times transformation. Making scale_x_times and scale_y_times is then just a matter of passing the right transformation to scale_x_continuous and scale_y_continuous.

timesreverse_trans <- function() {
    # Negate so that later times map to smaller values,
    # which places them lower on a y axis
    trans <- function(x) {-as.numeric(x)}
    inv <- function(x) {times(-x)}
    fmt <- function(x) {
        format(x, simplify = !any(diff(as.numeric(x)) < 1/(24*60), na.rm = TRUE))
    }
    trans_new("chrontimes-reverse",
              transform = trans,
              inverse = inv,
              breaks = pretty_breaks(),
              format = fmt,
              domain = c(0, 1))
}
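Negating the values is what makes time run downward. On a y axis, larger transformed values are drawn higher, so mapping a time t to -t puts later times lower. The check below uses local copies of the transform/inverse pair from timesreverse_trans(), under the hypothetical names rev_transform and rev_inverse:

```r
library(chron)

rev_transform <- function(x) -as.numeric(x)   # same as trans in timesreverse_trans()
rev_inverse   <- function(x) times(-x)        # same as inv in timesreverse_trans()

x <- times(c("06:00:00", "18:00:00"))
rev_transform(x)                  # -0.25 -0.75: the later time gets the smaller value
rev_inverse(rev_transform(x))     # back to 06:00:00 18:00:00
```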


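# trans is accepted and ignored, so a user-supplied trans cannot
# clash with the times transformation fixed here
scale_x_times <- function(..., trans=NULL) {
    scale_x_continuous(trans=times_trans(), ...)
}

# On the y axis, use the reversed transformation so time runs downward
scale_y_times <- function(..., trans=NULL) {
    scale_y_continuous(trans=timesreverse_trans(), ...)
}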

Examples of plots with times on each axis in full ggplot syntax

ggplot(dat, aes(time, value)) + geom_point() +
  scale_x_times()

ggplot(dat, aes(value, time)) + geom_point() +
  scale_y_times()
