Across the globe, businesses are at a simultaneously empowering and frightening place where data is concerned. They have more of it, more avenues to access it, and more ways to share it. One such way to share data has been a longstanding centerpiece of business: the data chart.
It is fair to say that these are a favorite way to visually represent data for a company. It would also be fair to say that companies like it when these charts depict the numbers favorably—quarterly profits that show an upward trend, for example. However, the pull toward hyperbole might be too strong for a business to resist.
In order to avoid misrepresenting data, follow these steps.
Recognize Framing and Don’t Exaggerate
The main vehicle for misrepresentation in many cases is what statisticians and psychologists call framing. This is an example of cognitive bias where the presentation of information, whether done consciously or unconsciously, impacts the interpretation of that information.
This phenomenon is easier to show than to tell. In the above example, there is an increase in sales figures from 2017 to 2018, and boy, it looks like a big one. Sales for 2018 (orange) are more than twice those of 2017 (blue). Or are they? If anyone looks closely at the numbers, there is nowhere near a doubling of figures. That is because the y-axis on the left has been strategically scaled to make incremental changes look dramatic.
We can also put a negative spin on the same data.
Suddenly, the picture is looking a lot grimmer.
“You mean to say you came nowhere near 50 million in sales for the last two years?” investors might ask.
The input data is the exact same—approximately 5 million in sales for 2017 and approximately 6 million in sales for 2018, but the scales for the charts make a huge difference in framing this data.
Realistically, there is about a 10 percent increase in sales—but neither graph represents that number very well. Whereas the first makes it look more like a 150 percent increase, the second makes the increase look closer to 2 percent.
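To put a number on that distortion, here is a minimal Python sketch. The sales figures and the axis starting point are hypothetical values picked to match the rough percentages above, not the data behind the original charts, and the function names are just illustrative.

```python
def true_increase(old, new):
    """Actual relative change between two values (0.10 means 10 percent)."""
    return (new - old) / old


def apparent_increase(old, new, axis_min):
    """Relative change in drawn bar height when the y-axis starts at axis_min."""
    old_height = old - axis_min
    new_height = new - axis_min
    if old_height <= 0:
        raise ValueError("axis_min must be below the smaller value")
    return (new_height - old_height) / old_height


# Hypothetical sales in thousands of dollars (assumed, not from the article's charts).
sales_2017 = 5400
sales_2018 = 5940

print(f"Actual change:        {true_increase(sales_2017, sales_2018):.0%}")
print(f"Axis starting at 5040: {apparent_increase(sales_2017, sales_2018, 5040):.0%}")
print(f"Axis starting at 0:    {apparent_increase(sales_2017, sales_2018, 0):.0%}")
```

With these assumed numbers, the real change is 10 percent, but starting the axis at 5,040 makes the second bar 150 percent taller than the first. Only the zero baseline keeps the bar heights in step with the math.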
So how can you make this graph a more accurate reflection of reality?
First, be conscientious in your choices. Would you believe that Excel framed that first chart for me? I just put in the numbers, and, voilà, it decided that looked great! Why is that? It automatically guessed the minimum and maximum values I’d like to use on the y-axis. Let this be a lesson: don’t rely on default settings.
Second, let the visual reflect the math. It is about a 10 percent increase, so should the graph numbers look like they’ve doubled visually? No. Should they look like they’ve barely moved at all? Not really.
It should look like we’ve added a tenth onto our original bar. It should look something like this.
That is quite a bit more realistic!
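If you would rather set the axis yourself than trust a default, one simple rule is to start bar charts at zero and leave a little room above the tallest bar. The sketch below assumes that rule. The helper names, the 10 percent headroom, and the sales figures are all hypothetical.

```python
def zero_based_limits(values, headroom=0.10):
    """Return (y_min, y_max): a zero baseline plus some space above the tallest bar."""
    if not values or min(values) < 0:
        raise ValueError("expects a non-empty list of non-negative values")
    return 0.0, max(values) * (1 + headroom)


def bar_fill(value, y_min, y_max):
    """Fraction of the axis height that a bar for `value` occupies."""
    return (value - y_min) / (y_max - y_min)


# Hypothetical sales in thousands of dollars (assumed for illustration).
sales = [5400, 5940]
y_min, y_max = zero_based_limits(sales)

fills = [bar_fill(v, y_min, y_max) for v in sales]
print(f"y-axis runs from {y_min:.0f} to {y_max:.0f}")
print(f"second bar is {fills[1] / fills[0] - 1:.0%} taller than the first")
```

The two limits can then be handed to whatever charting tool you use (in matplotlib, for instance, through `Axes.set_ylim`). Because the baseline is zero, a 10 percent rise in the numbers shows up as a 10 percent rise in bar height.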
Framing isn’t the only thing you should worry about, however.
Don’t Imply That Correlation Equals Causation
Correlation does not equal causation. Repeat that. Correlation does not equal causation. It is a mantra that all data analysts and statisticians know by heart.
Correlation means that two things tend to move together. Causation, on the other hand, means that one thing happens “as a direct result of” another. Do you see the distinction? If apples are fruits and oranges are fruits, should I eat both of them? No. I happen to like apples, but not oranges. I don’t eat apples because they’re fruits; the two are simply related in that they’re both fruits. Correlation and causation are something like that: a relationship between two things is not proof that one produces the other.
Based on the above graph, it’s probably tempting to say that the average annual stock price went up because a company sold more units over time. That’s not necessarily true, so don’t say as much. There is a positive correlation between units sold and average annual stock price. Leave it at that.
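For what it's worth, a correlation is easy to measure. The rough sketch below computes a Pearson correlation coefficient in plain Python. The yearly figures are hypothetical stand-ins, not the data behind the graph, and `pearson_r` is simply an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series never changes")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical yearly figures (assumed for illustration only).
units_sold = [120, 150, 170, 210, 260]
avg_stock_price = [14.0, 15.5, 15.0, 18.0, 21.0]

r = pearson_r(units_sold, avg_stock_price)
print(f"r = {r:.2f}")
```

A value close to 1 says the two series rise together. It says nothing about why: the market as a whole, a product launch, or plain coincidence could be behind both.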
Gather Data Appropriately
Let’s use surveys as an example for this one.
Surveys are a popular data collection technique for many companies. The people whose answers you actually collect are called the sample, a subset of the larger population you want to learn about, and not all samples are created equal.
Let’s say a company sends out a satisfaction survey to 40,000 customers. For whatever reason, it only nets about 200 responses. Is that going to be enough to have confidence in the results?
The answer is that it depends. It is important to be transparent about how the survey was conducted, how results were obtained, the population size, and more importantly, the sample size and make-up.
Sample size should be large enough relative to the population to inspire reasonable confidence. For this, one can use the sample-size and confidence-level calculator tools available online.
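Those calculators typically rest on a standard margin-of-error formula for a proportion, with a correction for a finite population. The Python sketch below is one common version of it, not a definitive method. It assumes a 95 percent confidence level (z = 1.96), the most cautious proportion of 0.5, and, importantly, a simple random sample, which a voluntary survey may not be. The function names are illustrative.

```python
import math

Z_95 = 1.96  # z-score for a 95 percent confidence level (assumed default)


def margin_of_error(sample_size, population_size, proportion=0.5, z=Z_95):
    """Margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


def required_sample_size(population_size, margin, proportion=0.5, z=Z_95):
    """Smallest sample that reaches the target margin of error."""
    unlimited = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    return math.ceil(unlimited / (1 + (unlimited - 1) / population_size))


# The survey from the example: 200 responses from 40,000 customers.
print(f"Margin of error: +/- {margin_of_error(200, 40_000):.1%}")
print(f"Responses needed for +/- 5%: {required_sample_size(40_000, 0.05)}")
```

Under these assumptions, 200 responses give a margin of error of roughly 7 percentage points either way, and a 5-point margin would take about 381 responses. Whether that is good enough depends on how precise the conclusion needs to be and on whether the people who answered resemble the people who didn't.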
The survey should be optional and should not include coercion tactics to elicit favorable responses. That would taint the sample and make the results invalid.
Finally, the sample should include some diversity. That is, if the company is asking questions about how the general public feels about its products, it should expand the sample beyond the people on its mailing list. Ideally, the sample tested should be representative of the population at large.
The point to remember here is that how data is gathered matters a great deal to how impactful and truthful a company can be in its visual representations of that data. Revealing exactly how numbers for graphs were obtained can go a long way in building confidence in your results, so try not to omit those crucial details.
Graphs have long been a staple of how businesses present data, whether to the public, investors, or other businesses. It’s all too easy to frame data in an unnaturally flattering light, be hyperbolic, lead people to believe that a relationship between two things proves that one caused the other, or get lazy in the collection of that data. There are far more ways to misrepresent data than this article discusses, but not leading people on with graphs boils down to honesty. That means honesty in methodology, honesty in visual presentation, honesty in the choices you make by reflex, and much more.
Presenting honest, transparent graphs is a good way to show people your company is trustworthy, and that’s worth the time.
Quincy Smith is part of the marketing team at Springboard, an online training company that provides mentor-led courses like the Data Science Career Track. He’s passionate about strong coffee, challenging hikes, and clean data.