In my previous blog post, I explored what is needed to create a new transformation for the scales package and gave an example of a mathematical transformation. In this post, I want to show an example of the other use case mentioned there (mapping a continuous-like variable with specific structure and formatting) and then extend it into new scale functions that integrate with ggplot even more directly.
Dates and times are tricky to work with because they have detailed external constraints and conventions. Within the R ecosystem, several packages exist solely to deal with dates and times (chron, lubridate, date, mondate, timeDate, TimeWarp, etc.), and an article has appeared in R News on the topic (Brian D. Ripley and Kurt Hornik. Date-time classes. R News, 1(2):8-11, June 2001.).
There is already support for dates (using the Date class, via date_trans in scales and scale_*_date in ggplot2) and datetimes (using the POSIXt class, via time_trans in scales and scale_*_datetime in ggplot2). The piece that is missing is for time, separate from any date; “clock time”, if you will.
Exercising the first of the three great virtues of a programmer, laziness, it is worth seeing what has already been done (classes and functions) to deal with clock time.
The chron package has a class times which can specify times of day, independent of a date. Additionally, there are many supporting functions for this class:
> methods(class="times")
[1] [.times* [[.times* [<-.times*
[4] as.character.times* as.data.frame.times* axis.times*
[7] Axis.times* c.times* diff.times*
[10] format.times* hist.times* identify.times*
[13] is.na.times* lines.times* Math.times*
[16] mean.times* Ops.times* plot.times*
[19] points.times* pretty.times* print.times*
[22] quantile.times* summary.times* Summary.times*
[25] trunc.times* unique.times* xtfrm.times*
Non-visible functions are asterisked
Following the pattern of the previous post, each of the parts of the transformation can be determined.
transform and inverse
When dealing with a variable that has a class, transform must take the specific representation and convert it to a simple numeric representation (map it to [part of] the real line, in mathematical terms); inverse does the opposite mapping. Generally, this requires delving into the structure of the class to see how it is really put together. To do that, let’s create some data. The times documentation says it can convert a character vector (by default in 24-hour hour:minute:second format, separated by colons) to times.
library(chron)  # provides the times class
Time <- times(c("18:37:11", "16:51:34", "15:05:57", "13:20:20",
                "11:34:43", "09:49:06", "08:03:29", "06:17:52",
                "04:32:15", "02:46:38", "01:01:01"))
which if printed gives
> Time
[1] 18:37:11 16:51:34 15:05:57 13:20:20 11:34:43 09:49:06
[7] 08:03:29 06:17:52 04:32:15 02:46:38 01:01:01
So far, so good. But what does this object/class really look like?
> str(Time)
Class 'times' atomic [1:11] 0.776 0.702 0.629 0.556 0.482 ...
..- attr(*, "format")= chr "h:m:s"
> dput(Time)
structure(c(0.775821759259259, 0.702476851851852, 0.629131944444444,
0.555787037037037, 0.48244212962963, 0.409097222222222, 0.335752314814815,
0.262407407407407, 0.1890625, 0.115717592592593, 0.0423726851851852
), format = "h:m:s", class = "times")
times objects are just numeric vectors with an attribute and a class. A little more digging and testing shows that the numeric part is just the fraction of a day that the time represents.
> str(times(c("00:00:00","6:00:00","12:00:00","23:59:59")))
Class 'times' atomic [1:4] 0 0.25 0.5 1
..- attr(*, "format")= chr "h:m:s"
> dput(times(c("00:00:00","6:00:00","12:00:00","23:59:59")))
structure(c(0, 0.25, 0.5, 0.999988425925926), format = "h:m:s", class = "times")
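The fraction-of-a-day reading can also be checked by hand, without chron at all. The sketch below uses a hypothetical helper, hms_to_fraction (not part of chron or of the code in this post), that converts "HH:MM:SS" strings using plain arithmetic. It assumes well-formed input and does no validation.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to a fraction of a day
# using base R only (no chron). Assumes every string has three fields.
hms_to_fraction <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # seconds since midnight divided by seconds in a day
    (p[1] * 3600 + p[2] * 60 + p[3]) / 86400
  }, numeric(1))
}

hms_to_fraction(c("00:00:00", "06:00:00", "12:00:00", "18:37:11"))
```

For 18:37:11 this is 67031 / 86400, which agrees with the first value in the dput(Time) output above.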
Most of the work of creating a mapping to numeric values is already done; all that is needed is to strip off the class and attributes. as.numeric() does that nicely.
> as.numeric(Time)
[1] 0.77582176 0.70247685 0.62913194 0.55578704 0.48244213
[6] 0.40909722 0.33575231 0.26240741 0.18906250 0.11571759
[11] 0.04237269
That is only half the mapping. We also need to go from this representation back to a times object. Looking at the constructor for times, it can take a numeric vector representing “number of days since an origin.” It’s not stated, but maybe times are then just fractions of a day?
> times(as.numeric(Time))
[1] 18:37:11 16:51:34 15:05:57 13:20:20 11:34:43 09:49:06
[7] 08:03:29 06:17:52 04:32:15 02:46:38 01:01:01
Sure looks like it.
> identical(Time, times(as.numeric(Time)))
[1] TRUE
So transform is just the as.numeric function and inverse is the times function.
Getting breaks on time right is important; an axis where the ticks are every 7 seconds is going to look odd (unless there is a really compelling reason), as would 25 seconds. In base graphics, the generic function pretty has the responsibility to find “nice” breaks. Looking at the methods for times, there is a pretty.times. Does it work (well enough)?
> pretty(Time)
[1] 03:00:00 06:00:00 09:00:00 12:00:00 15:00:00 18:00:00
attr(,"labels")
[1] 03:00 06:00 09:00 12:00 15:00 18:00
That’s pretty reasonable. Checking under the hood to see what is going on, chron::pretty.times calls chron::pretty.chron, which calls grDevices::pretty.POSIXt, which calls grDevices::prettyDate. Looking at the code for prettyDate, the allowed (sub-day) breaks are 1 second, 2 seconds, 5 seconds, 10 seconds, 15 seconds, 30 seconds, 1 minute, 2 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, and 12 hours. I might have added a 2-hour option, but that is not worth throwing away others’ work over. pretty_breaks already wraps pretty in the format expected by scales, so we can just use pretty_breaks() as the breaks function.
> pretty_breaks()(range(Time))
03:00 06:00 09:00 12:00 15:00 18:00
03:00:00 06:00:00 09:00:00 12:00:00 15:00:00 18:00:00
attr(,"labels")
[1] 03:00 06:00 09:00 12:00 15:00 18:00
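To get a feel for why 3-hour breaks come out for this data, here is a deliberately simplified sketch. It takes the sub-day step sizes listed above and picks the one that yields a tick count closest to a target. The helper pick_step and the target of 5 ticks are assumptions made for illustration; this is not the algorithm prettyDate actually uses.

```r
# Sub-day step sizes listed in the text, expressed in seconds.
steps <- c(1, 2, 5, 10, 15, 30,            # seconds
           60, 120, 300, 600, 900, 1800,   # minutes
           3600, 10800, 21600, 43200)      # hours

# Hypothetical helper: choose the listed step whose implied number of
# intervals is closest to n. A simplification, NOT the prettyDate algorithm.
pick_step <- function(range_days, n = 5) {
  span <- diff(range_days) * 86400          # span of the data in seconds
  steps[which.min(abs(span / steps - n))]
}

# Smallest and largest values of Time, as fractions of a day.
pick_step(c(0.0423726851851852, 0.775821759259259)) / 3600  # step in hours
```

Under these assumptions the chosen step is 3 hours, the same spacing that pretty(Time) produced above.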
Here, we almost catch another break. When format is not defined, the names associated with what breaks returns are, in principle, used as labels. Unfortunately, in practice this is not the case: inside the ggplot code, the breaks get transformed back and forth between data spaces and lose their attributes (names). If we want the default formatting (full hours, minutes, and seconds), then format itself will do. If we only want seconds to appear when they are not all 0 (that is, when the increment is less than 1 minute), then we have to write our own function that passes the appropriate flag (simplify) indicating whether the seconds should be suppressed.
fmt <- function(x) {
  # 1/(24*60) is one minute expressed as a fraction of a day
  format(x, simplify = !any(diff(x) < 1/(24*60)))
}
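The condition passed to simplify can be examined on its own with plain numbers. The sketch below pulls it out into a hypothetical helper, wants_simplify (a name introduced here only for illustration), and applies it to two sets of breaks expressed as fractions of a day.

```r
# One minute as a fraction of a day.
one_minute <- 1 / (24 * 60)

# Hypothetical helper: TRUE when no two adjacent breaks are less than
# one minute apart, i.e. when the seconds can safely be dropped.
wants_simplify <- function(x) {
  !any(diff(as.numeric(x)) < one_minute)
}

wants_simplify(c(0.125, 0.25, 0.375))   # breaks 3 hours apart
wants_simplify(c(0, 30, 60) / 86400)    # breaks 30 seconds apart
```

The first call returns TRUE (seconds can be suppressed) and the second returns FALSE (seconds are needed to tell the labels apart).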
Since times is defined in terms of a fraction of a day, it is only meaningful in the range 0 to 1 (inclusive on the left, exclusive on the right). domain does not have a way of defining inclusivity or exclusivity of the endpoints, so the domain is just c(0,1).
The transform object for datetime (POSIXt) objects already uses the name “time”, so the obvious name “times” would be confusing. I’ve chosen “chrontimes” as the name, to indicate that it is for the times object from the chron package.
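library(scales)  # trans_new(), pretty_breaks()

times_trans <- function() {
  # Drop the seconds from labels unless some breaks are under 1 minute apart
  fmt <- function(x) {
    format(x, simplify = !any(diff(x) < 1/(24*60)))
  }
  trans_new("chrontimes",
            transform = as.numeric,   # times -> fraction of a day
            inverse = times,          # fraction of a day -> times
            breaks = pretty_breaks(),
            format = fmt,
            domain = c(0, 1))
}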
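Before plotting, it can be reassuring to poke at a transformation object directly. The sketch below builds a stripped-down object with the same transform and inverse as above (the formatter is left out to keep it short) and calls its pieces by hand. It assumes the chron and scales packages are installed; the variable name tr is arbitrary.

```r
library(chron)
library(scales)

# Same transform/inverse pair as in the post, without the custom formatter.
tr <- trans_new("chrontimes",
                transform = as.numeric,
                inverse = times,
                breaks = pretty_breaks(),
                domain = c(0, 1))

tr$name                               # the name given to the transformation
tr$transform(times("12:00:00"))       # noon as a fraction of a day
format(tr$inverse(0.25))              # a quarter of a day, back as a time
```

Noon maps to 0.5, and 0.25 maps back to 06:00:00, so the two functions undo each other as the transformation requires.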
Using the Time values previously created, and some other random data, make a data frame to plot.
dat <- data.frame(time = Time,
                  value = c(7L, 6L, 9L, 11L, 10L, 1L,
                            4L, 2L, 3L, 5L, 8L))
A default plot of this gives
library(ggplot2)
ggplot(dat, aes(time, value)) + geom_point()
Specifying the transformation explicitly gives
ggplot(dat, aes(time, value)) + geom_point() +
  scale_x_continuous(trans=times_trans())
If you want to go the next step and create scale_*_times functions to use directly in ggplot, you can. In doing so, you may realize that, along a y-axis, you would expect time to run from top to bottom, not bottom to top as the y-axis typically runs. Using the ideas of the reversed scale described in the previous post, a reversed times transformation can also be made. Then making scale_x_times and scale_y_times is just a matter of passing the right transformation to scale_x_continuous and scale_y_continuous.
timesreverse_trans <- function() {
  # Negate the day fraction so that later times sit lower on the axis
  trans <- function(x) {-as.numeric(x)}
  inv <- function(x) {times(-x)}
  fmt <- function(x) {format(x, simplify = !any(diff(x) < 1/(24*60)))}
  trans_new("chrontimes-reverse",
            transform = trans,
            inverse = inv,
            breaks = pretty_breaks(),
            format = fmt,
            domain = c(0, 1))
}
# trans is captured as a named argument so that it is not passed on via ...
scale_x_times <- function(..., trans=NULL) {
  scale_x_continuous(trans=times_trans(), ...)
}
scale_y_times <- function(..., trans=NULL) {
  scale_y_continuous(trans=timesreverse_trans(), ...)
}
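To see why negating the day fraction flips the direction of the axis, the forward and inverse maps of the reversed transformation can be applied by hand. This is a small sketch that assumes chron is installed; the variable names fwd and back are arbitrary.

```r
library(chron)

x <- times(c("06:00:00", "12:00:00", "18:00:00"))

# Forward map of the reversed transformation: negate the day fraction.
fwd <- -as.numeric(x)
fwd                     # later times map to smaller numbers

# Inverse map: negate again, then rebuild the times object.
back <- times(-fwd)
format(back)            # the original clock times, recovered
```

Because later times get smaller numeric values, a continuous axis that increases upward places them lower down, which gives the top-to-bottom reading order.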
Examples of plots with times on each axis in full ggplot syntax:
ggplot(dat, aes(time, value)) + geom_point() +
  scale_x_times()
ggplot(dat, aes(value, time)) + geom_point() +
  scale_y_times()