A column chart is the staple of data analytics. Everyone has them, and everyone knows how to read them. But in a world of ever more information, can they leave a lasting data impression?
What does the data above tell you? Could the X axis be items? Is the Y axis a quantitative value? Believe it or not, it is a column chart visualization of Anscombe’s Quartet, and this is not the best way to visualize this data.
In the previous blog post in this series, we discussed the abilities, and limitations, of our brains to retain the information they are presented with. Humans have limited short-term memory capacity and duration, and even shorter attention spans. By representing data visually, we attempt to overcome these limitations and leave a lasting data impression on users.
Data Impression Concept 1: Classifications of Data
Data is consumed faster when we can group similar data points into categories we have experienced before[1]. These groupings are data classifications, and there are four ways we classify data:
Nominal data is data that can be grouped by name, such as Fast or Slow.
Ordinal data is data where the order, or rank, matters, but the distance between values has no meaning, such as 1st, 2nd, and 3rd place in a race.
Interval data is data measured on a scale where the difference between data points is numerically meaningful, such as 10 kph versus 5 kph being 5 kph apart.
Ratio data is measured data with a true zero, so one point can be directly compared to another as a multiple, such as 10 kph being twice as fast as 5 kph.
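One way to make these classifications concrete is to ask which comparisons each one supports. The Python sketch below assumes the four classifications form a cumulative hierarchy, as in the standard levels of measurement, where each level supports every comparison of the levels before it. The names Level, MIN_LEVEL, and meaningful are hypothetical, chosen only for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered so each level supports those below it."""
    NOMINAL = 1   # names only, e.g. Fast or Slow
    ORDINAL = 2   # rank matters, e.g. 1st, 2nd, 3rd place
    INTERVAL = 3  # differences matter, e.g. 10 kph vs 5 kph is 5 kph apart
    RATIO = 4     # true zero, e.g. 10 kph is twice as fast as 5 kph


# Lowest level at which each comparison becomes meaningful (hypothetical labels).
MIN_LEVEL = {
    "equal": Level.NOMINAL,        # same category or not
    "order": Level.ORDINAL,        # which comes first
    "difference": Level.INTERVAL,  # how far apart
    "ratio": Level.RATIO,          # how many times larger
}


def meaningful(level: Level, operation: str) -> bool:
    """Return True if `operation` is a meaningful comparison for data at `level`."""
    return level >= MIN_LEVEL[operation]


if __name__ == "__main__":
    # Print the comparisons each classification supports.
    for level in Level:
        ops = [op for op in MIN_LEVEL if meaningful(level, op)]
        print(f"{level.name:<9}{', '.join(ops)}")
```

Using an IntEnum lets the hierarchy be checked with a plain >= comparison, which mirrors the idea that ratio data can do everything interval data can, and more.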
Data Impression Concept 2: Encoding of Data
Now that we have defined the different types of data, it is important to know how to make the best use of our sensory register. Encoding[1] is the practice of representing data visually. There are 10 different ways to encode data: Position, Length, Slope, Area, Volume, Shape, Color Hue, Color Saturation, Contrast, and Texture.
Position involves placing data points along at least one scale so that the location of one point can be compared to others.
Length is an encoding technique where a single scale can depict various quantitative values.
Slope, sometimes referred to as angle, depicts values on a single quantitative scale with the general understanding that each value logically comes before the next.
Area displays quantitative data as shapes whose size varies with the value.
Volume is a combination of length and area where the 3-dimensional size of the object represents an X, Y, and Z value.
Shape is a method of encoding data into objects that can be easily distinguished from one another.
Color Hue assigns data points distinct colors, which also signals that the values are categorically different.
Color Saturation maps data onto a color’s gradient, indicating not only a category but also a quantitative value.
Contrast is similar to Color Saturation, but works on a black-to-white scale with a color in the middle rather than on the saturation of a single color.
Texture is encoding data on a scale that goes from a dense texture (such as crosshatch), to a less dense texture of the same, or similar, pattern.
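Several of these encodings come down to mapping a data value onto a visual property. The Python sketch below shows one plausible way to do that, assuming a simple linear mapping and RGB colors; linear_scale, saturation_color, and contrast_color are hypothetical names, not functions from any particular charting library. A linear scale covers Position and Length, and the two color functions reuse it for Color Saturation and Contrast.

```python
def linear_scale(value, domain, range_):
    """Map `value` from the data domain onto an output range, e.g. pixels.

    Position and Length encodings both rely on this kind of mapping.
    """
    d0, d1 = domain
    r0, r1 = range_
    t = (value - d0) / (d1 - d0)  # fraction of the way through the domain
    return r0 + t * (r1 - r0)


def _fraction(value, domain):
    """Position of `value` in `domain` as a number clamped to [0, 1]."""
    return min(max(linear_scale(value, domain, (0.0, 1.0)), 0.0), 1.0)


def saturation_color(value, domain, base_rgb):
    """Color Saturation: blend from white toward `base_rgb` as `value` grows."""
    t = _fraction(value, domain)
    white = (255, 255, 255)
    return tuple(round(w + t * (b - w)) for w, b in zip(white, base_rgb))


def contrast_color(value, domain, mid_rgb):
    """Contrast: run from black, through `mid_rgb`, to white across the domain."""
    t = _fraction(value, domain)
    black, white = (0, 0, 0), (255, 255, 255)
    if t <= 0.5:
        start, end, u = black, mid_rgb, t / 0.5
    else:
        start, end, u = mid_rgb, white, (t - 0.5) / 0.5
    return tuple(round(s + u * (e - s)) for s, e in zip(start, end))


if __name__ == "__main__":
    # A value of 25 on a 0-100 scale, drawn on a 400-pixel axis.
    print(linear_scale(25, (0, 100), (0, 400)))
    print(saturation_color(25, (0, 100), (0, 0, 255)))
    print(contrast_color(25, (0, 100), (255, 0, 0)))
```

Values outside the domain are clamped for the color encodings, since a color cannot be "more saturated than full"; a position scale is left unclamped so out-of-range points remain visible as outliers.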
Examples:
To bring data types and data encoding together, we’ll examine the average high and low Fahrenheit temperatures in November for 5 major cities in the US. The easiest way to interpret the differences between the data points is to plot them so that their positions can be compared with one another, as in a scatter chart.
We represent the nominal “City” data with a label, and the nominal “Temperature” label as a color hue. The High and Low temperatures are depicted as interval data on a single scale, which helps us recognize that position matters. An even better approach would be to order the cities from highest to lowest average high temperature, so that the nominal “City” data becomes ordinal.
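The reordering step that turns nominal city names into ordinal ranks can be sketched in a few lines of Python. This assumes the temperatures are held in a dictionary keyed by city name with (average high, average low) tuples; rank_by_high is a hypothetical helper, and no actual temperature values are included here.

```python
from typing import Dict, List, Tuple


def rank_by_high(temps: Dict[str, Tuple[float, float]]) -> List[Tuple[int, str]]:
    """Turn nominal city names into ordinal ranks by average high temperature.

    `temps` maps a city name to its (average high, average low) in Fahrenheit.
    Returns (rank, city) pairs, with rank 1 being the warmest average high.
    """
    # Sort city names by their average high, warmest first.
    ordered = sorted(temps, key=lambda city: temps[city][0], reverse=True)
    return [(rank, city) for rank, city in enumerate(ordered, start=1)]
```

Given the November highs and lows for the five cities, the returned order is the order in which the cities would be listed along the chart’s axis. Because Python’s sort is stable, cities with identical highs keep their original relative order.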
To show that not every visualization suits the data, the next two examples demonstrate how visualizations can be misleading.
If we were to represent these points as slope, we could still tell which value is which. However, the data becomes harder to interpret because we assume the rise and fall between points means something, which, for the average high temperatures of different cities, it does not.
If we skip to the hardest way to interpret the data, we can color the individual values. This makes differences between values hard to distinguish, for example the low temperature in Los Angeles versus the high temperature in New York. This is not to mention that Color Vision Deficiency (CVD) affects almost 5% of the population, with 1 in 12 men and 1 in 200 women affected[2].
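To see how the per-group figures relate to the overall “almost 5%”, here is a quick check in Python. It assumes an even split between men and women, which is a simplification; the exact figure depends on the real population mix.

```python
from fractions import Fraction

# Prevalence figures quoted above.
men = Fraction(1, 12)
women = Fraction(1, 200)

# Assumption: the population is split evenly between men and women.
overall = (men + women) / 2

print(f"{float(overall):.1%}")  # prints 4.4%
```

Using Fraction keeps the arithmetic exact (53/1200) until the final conversion to a percentage.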
We consume categorized and encoded data daily, whether we realize it or not. Street signs use position to help drivers understand the order of upcoming cities. Hills or inclines are evaluated to determine whether we need to be cautious. We use colors to tell us whether it is safe to continue or whether we need to pay attention.
Conclusion: Data that Leaves Impressions
Now we know how our memory works and how we best perceive information, so we can encode data for the best end-user data impression. The next installment of this blog post series discusses two new visualization types, packed bubble charts versus tree maps, and identifies the merits and drawbacks of each.
1. Qlik: Create Visualizations with Qlik Sense. QlikTech International AB. Published March 2016.
2. Colour Blind Awareness: Color Blindness. Retrieved November 2018. http://www.colourblindawareness.org/colour-blindness/