ggplot2

Layered Grammar of Graphics in R

During a staff meeting at work, the price of solid state hard drives came up (what are the prices, are they nonlinear in size, etc.). I sampled 120 solid state hard drives from newegg.com and recorded their size (in GB), price (in USD), and class (SATA II or SATA III). Note that the sampling was semi-random: I had no particular agenda, but I did not go to great lengths to sample randomly. To look at the data, I used ggplot2.

 ssd <- read.csv("http://joshuawiley.com/files/ssd.csv")
 ssd$class <- factor(ssd$class)

 library(ggplot2)
 ## first pass
 p <- ggplot(ssd, aes(x = price, y = size, colour = class)) +
     geom_point()
 print(p)

Scatter plot of Size and Price of SSDs

Not too bad, but the data are sparser at higher sizes and prices, so we can use a log-log scale to make the plot a little easier to read, and add locally weighted regression (loess) lines to assess linearity (or lack thereof).

 ## add smooths and log to make clearer
 ## (breaks start at 100 because 0 cannot be shown on a log scale)
 p <- p +
  stat_smooth(method = "loess", se = FALSE) +
  scale_x_log10(breaks = seq(100, 1000, 100)) +
  scale_y_log10(breaks = seq(100, 600, 100))
 print(p)

Scatter plot of Size and Price of SSDs on a log10 scale with loess smooth lines
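
Because ggplot2 applies scale transformations before it computes statistics, the smooth in this plot is fitted to the log10 values rather than to the raw ones. Here is a minimal sketch of the roughly equivalent base R computation. It uses simulated data: the `sim` data frame and the numbers that generate it are made up for illustration and are not the SSD sample.

```r
## simulated stand-in for the SSD data (hypothetical values)
set.seed(1)
sim <- data.frame(price = runif(60, 50, 900))
sim$size <- 0.8 * sim$price^0.95 * exp(rnorm(60, sd = 0.2))

## loess on the log10 scale, which is what the smoother sees
## once scale_x_log10() and scale_y_log10() are in place
fit <- loess(log10(size) ~ log10(price), data = sim)

## fitted values, back-transformed to GB so they can be read directly
sim$size_hat <- 10^fitted(fit)
head(sim[order(sim$price), ])
```

If the back-transformed fit bends noticeably when plotted on log-log axes, that hints that a single straight line may not describe the relationship well.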

Okay, that is nice. Lastly, let’s add better labels, make the x-axis text not overlap, and include the intercept and slope parameters for the linear lines of best fit for each class of hard drive.

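 ## fit separate intercept and slope model
 ## (class + class:price gives one intercept and one slope per class;
 ##  class*price would report the second slope as a difference)
 m <- lm(size ~ 0 + class + class:price, data = ssd)
 est <- round(coef(m), 2)

 ## est[1:2] are the intercepts, est[3:4] the slopes
 size2 <- paste0("II Size = ", est[1], " + ", est[3], " * price")
 size3 <- paste0("III Size = ", est[2], " + ", est[4], " * price")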

 ## finalize
 ## (theme() and element_text() replace opts() and theme_text(),
 ##  which newer ggplot2 versions no longer provide)
 p <- p +
  annotate("text", x = 100, y = 600, label = size2) +
  annotate("text", x = 100, y = 500, label = size3) +
  labs(x = "Price in USD", y = "Size in GB",
       title = "Log-Log Plot of SSD Size and Price") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
 print(p)

Fancy Scatter plot of Size and Price of SSDs on a log10 scale with loess smooth lines
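
One way to check which coefficient is which: a formula of the form `size ~ 0 + class + class:price` yields one intercept and one slope per class, whereas `size ~ 0 + class * price` reports the second class's slope as a difference from the first. The sketch below compares both against separate per-class fits. It runs on simulated data; the `sim` data frame, its class labels, and the generating coefficients are assumptions for illustration, not values from the SSD sample.

```r
## simulated two-class data (hypothetical values, not the SSD sample)
set.seed(42)
sim <- data.frame(
  class = factor(rep(c("SATA II", "SATA III"), each = 40)),
  price = runif(80, 50, 900)
)
sim$size <- ifelse(sim$class == "SATA II",
                   10 + 0.5 * sim$price,
                   20 + 0.7 * sim$price) + rnorm(80, sd = 15)

## one model with a separate intercept and slope per class
m_joint <- lm(size ~ 0 + class + class:price, data = sim)

## the same two lines from separate per-class fits
m_by_class <- sapply(split(sim, sim$class),
                     function(d) coef(lm(size ~ price, data = d)))

coef(m_joint)   # two intercepts first, then two slopes
m_by_class      # rows: intercept and slope; columns: class

## with class * price, the 4th coefficient is the difference in slopes
m_star <- lm(size ~ 0 + class * price, data = sim)
coef(m_star)
```

The two parameterizations describe the same fitted lines; only the meaning of the reported coefficients differs, which matters when they are pasted into a label.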

(guest post by Joshua Wiley)
