[This is a guest post by Severino Ribecca*, part of a series dedicated to each kind of chart he has studied as part of his main research project.]
Box Plots, also known as Box & Whisker Plots, are a type of chart ideal for visually displaying the distribution of numerical data through their quartiles (or percentiles) and averages. Typically, Box Plots are used in descriptive statistics, as they are a great way to quickly get an overview of one or more sets of data and their range. While they may seem primitive in comparison to a histogram or density plot, they have the advantage of being more compact.
Below is a diagram on how to read a Box Plot:
If you need to test yourself on reading Box Plots, you can use Khan Academy’s section on Box Plots to improve your skill.
There are a number of observations one can make from viewing a box plot:
• What the key values are, such as the average, the median and the lower and upper quartiles.
• If there are any outliers and what their values are.
• If the data is symmetrical or not.
• How tightly the data is grouped.
• If the data is skewed at all and if so, in what direction.
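To make these observations concrete, here is a minimal Python sketch that computes the values a box plot typically draws for a single group. Two things in it are assumptions rather than something this post specifies: the whiskers end at the most extreme values within 1.5 times the interquartile range of the box (a common convention, but not the only one), and the quartiles use the "inclusive" method from Python's statistics module. The function name box_plot_summary and the sample data are made up for illustration.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the values a box plot typically draws for one group.

    Assumes at least two values and the common 1.5 x IQR whisker rule.
    """
    data = sorted(values)
    # Lower quartile, median and upper quartile (the box and its centre line).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: how tightly the middle half is grouped

    # Anything beyond the fences is treated as an outlier.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "mean": statistics.mean(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme values that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
        # Positive: the upper half of the box is longer than the lower half
        # (a hint of skew towards high values); negative: the opposite.
        "box_asymmetry": (q3 - median) - (median - q1),
    }


sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(name, value)
```

In this sample the box itself is symmetrical around the median, yet the single high value is flagged as an outlier and pulls the mean above the median, which is the kind of difference a box plot lets you spot at a glance.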
History
Box Plots were invented by John Wilder Tukey, an American mathematician. Tukey first developed the Box Plot in 1970* as part of his toolkit for exploratory data analysis. However, his technique didn’t become widely known until he formally published it in his book Exploratory Data Analysis in 1977.
*Date reference: source 1, source 2
Different types
Since Tukey introduced the Box Plot, a number of variations have been developed:
Variable Width Box Plots use the width of the box to represent the size of each group. So a group with more data points will have a wider box.
Notched Box Plots have a narrowing of the box around the median. This is a useful way to compare the differences between median values, as the “notches” act as a visual guide.
Violin Plots are a pair of mirrored kernel density plots joined together, and Vase Plots and Bean Plots are another couple of variations of the Box Plot.
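As a rough illustration of the first two variations, the Python sketch below computes where a notch would start and end, and how wide each box would be drawn relative to the largest group. The specific formulas are assumptions and are not taken from this post: the notch spans the median plus or minus 1.57 times the interquartile range divided by the square root of the group size, and box widths are proportional to the square root of the group size. Both are widely used rules of thumb, but charting tools differ in the details. The function names and the sample groups are hypothetical.

```python
import math
import statistics


def notch_interval(values):
    """Return (low, high) for the notch around the median.

    Assumes the rule of thumb: median +/- 1.57 * IQR / sqrt(n).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def relative_widths(groups):
    """Return each group's box width as a fraction of the widest box.

    Assumes width is proportional to the square root of the group size.
    """
    roots = {name: math.sqrt(len(vals)) for name, vals in groups.items()}
    largest = max(roots.values())
    return {name: root / largest for name, root in roots.items()}


groups = {  # hypothetical data
    "large group": list(range(1, 17)),  # 16 values
    "small group": [1, 2, 3, 4],        # 4 values
}

widths = relative_widths(groups)
low, high = notch_interval(groups["large group"])
print(widths)
print(low, high)
```

When the notches of two boxes do not overlap, that is usually read as a sign that the two medians differ, which is why the notches work as a visual guide.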
While Box Plots are great for showing the different ranges in a data set, their structure is not intuitive and reading them takes time to learn. Box Plots are primarily used for statistical insights, so they are unlikely to be understood by an audience that is not literate in statistics. This is most of the population, so if you plan to design for a wider audience, avoid Box Plots.
In my next post I will be looking at Bubble Charts.
Further reading on Box Plots:
• Box and Whisker Plot Reference Page – The Data Visualisation Catalogue
• How to Read and Use a Box-and-Whisker Plot – Flowing Data
• 40 years of boxplots – Hadley Wickham and Lisa Stryjewski
• Box Plots – Khan Academy
• Boxes of Insight – Stephen Few
• Wikipedia entry on box plots
*Severino Ribecca is a British graphic and information designer interested in data visualization. Currently he’s building an online library of different information visualization methods called The Data Visualisation Catalogue. You can follow the project’s updates on Twitter (@dataviz_catalog) and support further developments on the Patreon Page.