Show all data in the background of your faceted ggplot

One of the game-changing features of ggplot2 was the ease with which one can explore the dimensions of the data using small multiples.¹ There is a small trick that I was to share today – put all the data in background of every panel. This can considerably improve comparability of the data across the dimension which splits the dataset into the subsets for the small multiples. Better to show right away what I mean and then explain in details.

¹ At some point lattice became pretty popular for the task, but then ggplot2 entered the scene.

² Images are clickable

There is a weekly dataviz challenge organized by Cole Knaflic. One particular challenge stroke me as an ultimate case to showcase this background data trick. Here are the two plots:² the challenge one and my version.

It is impossible to meaningfully compare and distinguish multiple spaghetti lines at once. Thus, my choice here was to use small multiples and look at the lines one by one. But we still want to compare the lines. For this, I added the pale background lines that show the spread of all data. Note that I also sorted the small multiples in the decreasing order and added the average line in yellow.

But the main trick here is adding all the data in the background. And with ggplot2 it’s super easy to do. All we need is to add a layer to the plot in which we modify the data by removing the variable that was used for faceting.

Here we use the nice feature of ggplot2 – the layers inherit whatever you specify in the main ggplot() call. In this case our background layer is inheriting the data parameter and all we need is just to remove the variable that is later used for faceting. Consider the following pseudo-code:

df |> 
    ggplot(PLOT_PARAMETERS)+
    geom_WHATEVER(data = df |> select(-FACETING_VARIABLE))+
    facet_wrap(~ FACETING_VARIABLE)

Once we’ve done this, gpplot2 no longer knows how to assign subsets of data to the corresponding small multiples. Note that this only happens in the layer where we perform the trick and explicitly throw out the faceting variable. As the result, in each small multiple we end up with all the data in this layer. Put this layer in the background, make it appropriately pale/transparent – and that’s it. I find this dataviz trick amazingly straightforward, simple, and powerful.

You can replicate the full figure above using the R code from this gist

This post is one in the faceting series. Other posts:

Save space in faceted plots

--- title: "Show all data in the background of your faceted ggplot" description-meta: "{{< meta website.description >}}" date: "2020-06-19" image: teaser.png aliases: - "/2023/background-data/" categories: [r, faceting, dataviz, ggplot2, trick] --- ```{r, include=FALSE} knitr::opts_chunk$set( message = FALSE, warning = FALSE ) ``` *** One of the game-changing features of `ggplot2` was the ease with which one can explore the dimensions of the data using **small multiples**.[^1] There is a small trick that I was to share today -- put all the data in background of every panel. This can considerably improve comparability of the data across the dimension which splits the dataset into the subsets for the small multiples. Better to show right away what I mean and then explain in details. There is a weekly dataviz challenge organized by Cole Knaflic. [One particular challenge][swd] stroke me as an ultimate case to showcase this background data trick. Here are the two plots:[^2] **the challenge one** and **my version**. [![Image to improve](swd-challenge.png)](https://ikashnitsky.github.io/2023/background-data/swd-challenge.png){width=70%} [![My version](improved.png)](https://ikashnitsky.github.io/2023/background-data/improved.png) It is impossible to meaningfully compare and distinguish multiple spaghetti lines at once. Thus, my choice here was to use small multiples and look at the lines one by one. But we still want to compare the lines. For this, I added the pale background lines that show the spread of all data. Note that I also sorted the small multiples in the decreasing order and added the average line in yellow. But the **main trick** here is adding all the data in the background. And with `ggplot2` it's super easy to do. All we need is to add a layer to the plot in which we modify the data by removing the variable that was used for faceting. Here we use the nice feature of `ggplot2` -- the layers inherit whatever you specify in the main `ggplot()` call. In this case our background layer is inheriting the `data` parameter and all we need is just to remove the variable that is later used for faceting. Consider the following pseudo-code: ```{r, eval=FALSE} df |> ggplot(PLOT_PARAMETERS)+ geom_WHATEVER(data = df |> select(-FACETING_VARIABLE))+ facet_wrap(~ FACETING_VARIABLE) ``` Once we've done this, `gpplot2` no longer knows how to assign subsets of data to the corresponding small multiples. Note that this only happens in the layer where we perform the trick and explicitly throw out the faceting variable. As the result, in each small multiple we end up with all the data in this layer. Put this layer in the background, make it appropriately pale/transparent -- and that's it. I find this dataviz trick amazingly straightforward, simple, and powerful. *** ::: {.callout-tip} # You can replicate the full figure above using the `R` code from this [gist][gist] ::: ::: {.callout-note} # This post is one in the **faceting series**. Other posts: - [Save space in faceted plots][shrink] ::: [^1]: At some point [`lattice`][lat] became pretty popular for the task, but then `ggplot2` entered the scene. [^2]: Images are clickable [cole]: https://twitter.com/storywithdata [swd]: https://twitter.com/storywithdata/status/1274019877779619841 [lat]: https://cran.r-project.org/web/packages/lattice [gist]: https://gist.github.com/ikashnitsky/ee73b39e93f9d074d3362c7fb0d6c815 [shrink]: https://ikashnitsky.github.io/2023/shrink-space