If you’ve spent much time in the data reporting business, you know it takes maintenance to keep your analytics engine running like a well-oiled machine. Whether it’s cleaning up your data source integration or simply refining the process your organization uses to deploy data to users, your data structure requires some ongoing work.
There are two types of data maintenance: data cleaning and data tidying. At a glance, these processes may seem like the same thing. In reality, however, they are very different. Below, you’ll find a breakdown of the difference between data cleaning and data tidying, along with an actionable plan to keep your data neat and tidy as your organization – and your data volume – grows.
Data Cleaning and Data Tidying – What’s the difference?
Data cleaning (or data cleansing) is the process of identifying and fixing or removing inaccurate information from your data ecosystem. Data cleaning is closely related to data quality – and your organization should have a plan in place to build and maintain it.
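To make that concrete, here is a minimal sketch of a cleaning pass in Python. The records, the field names, and the rules for what counts as "inaccurate" (a missing name, a non-numeric or negative amount, an exact duplicate) are all assumptions made for illustration. Your own data quality rules will differ.

```python
# Hypothetical sales records; field names and validity rules are
# assumptions made for illustration only.
raw_records = [
    {"rep": "Avery", "region": "East", "amount": "1200.50"},
    {"rep": "Avery", "region": "East", "amount": "1200.50"},  # exact duplicate
    {"rep": "Blake", "region": "West", "amount": "-300"},     # negative sale
    {"rep": "", "region": "West", "amount": "450"},           # missing rep name
    {"rep": "Casey", "region": "North", "amount": "n/a"},     # not a number
    {"rep": " Drew ", "region": "South", "amount": "980"},    # stray whitespace
]


def clean(records):
    """Return (cleaned, rejected) lists under the assumed rules above."""
    cleaned, rejected, seen = [], [], set()
    for record in records:
        rep = record["rep"].strip()
        region = record["region"].strip()
        try:
            amount = float(record["amount"])
        except ValueError:
            rejected.append((record, "amount is not numeric"))
            continue
        if not rep:
            rejected.append((record, "missing rep"))
            continue
        if amount < 0:
            rejected.append((record, "negative amount"))
            continue
        key = (rep, region, amount)
        if key in seen:
            rejected.append((record, "duplicate"))
            continue
        seen.add(key)
        cleaned.append({"rep": rep, "region": region, "amount": amount})
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
for row in cleaned:
    print(row)
print(f"{len(rejected)} records rejected")
```

Keeping the rejected records along with a reason, rather than silently dropping them, makes it easier to trace bad data back to its source.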
According to Hadley Wickham’s “Tidy Data” paper in the Journal of Statistical Software, data tidying is the process of structuring your data in a way that makes it easy to analyze and use. Instead of focusing on ridding your data of inaccuracies, data tidying focuses on ensuring the accurate data that remains is easy for users to inspect and use. Tidying should happen after you clean your data; it bridges the gap between “How can we make sure our data is accurate?” and “How can we make our accurate data useful?”
Building Data Structure from the Ground Up
Spreadsheets are one of the simplest examples of data structure. If you’re organizing sales value by individual team members, for example, you’ll need to decide how that data will be organized. Most likely, Column A will contain the names of each person, while subsequent columns will hold the specific sales values that correlate to each individual. The sales amounts, then, could be sorted by revenue, date, etc. In a situation like this, the structure of your data is probably instinctual. In more complex scenarios, it’s important to analyze exactly how the data should be organized based on how users are likely to read and interpret it. Defining KPIs and your end users’ “questions to be answered” will guide this process.
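The spreadsheet layout described above can be sketched in a few lines of Python. The names, month columns, and figures below are invented for illustration; the point is only that the first column identifies each person and the remaining columns hold that person's sales values, which can then be sorted by revenue.

```python
# Hypothetical team sales sheet: the first column holds names, the
# remaining columns hold one sales value each. All figures are invented.
header = ["name", "jan", "feb", "mar"]
rows = [
    ["Avery", 1200, 950, 1100],
    ["Blake", 800, 1400, 1000],
    ["Casey", 1500, 700, 900],
]

# Total revenue per person, computed from the value columns.
totals = {row[0]: sum(row[1:]) for row in rows}

# Sort team members by total revenue, highest first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for name, total in ranked:
    print(f"{name}: {total}")
```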
Spring Cleaning: How to Tidy Up Messy Data
There are a few ways to ensure your data is tidy, and most of these start at the tabular level. Five of the most common errors that keep data from being tidy are:
- Headers contain values instead of categories or names
- Columns contain more than one set of data
- Rows and columns are both used to store variables
- Charts store and display multiple data fields
- One observation/conclusion is stored across multiple tables
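As a rough illustration of the first two errors in the list, the sketch below reshapes a small table whose headers contain values (month names) and whose first column packs two variables together. The table, the column names, and the "/" separator are hypothetical; this is one possible way to do the reshaping, not a prescribed method.

```python
# Hypothetical "messy" table: the month headers are really values of a
# "month" variable, and "rep_region" packs two variables into one column.
messy = [
    {"rep_region": "Avery/East", "jan": 1200, "feb": 950},
    {"rep_region": "Blake/West", "jan": 800, "feb": 1400},
]


def tidy(rows, id_column="rep_region", separator="/"):
    """Reshape to one row per (rep, region, month) observation."""
    tidy_rows = []
    for row in rows:
        # Split the combined column into its two underlying variables.
        rep, region = row[id_column].split(separator, 1)
        # Turn each month header into a value in a "month" column.
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append(
                {"rep": rep, "region": region, "month": column, "sales": value}
            )
    return tidy_rows


tidy_rows = tidy(messy)
for row in tidy_rows:
    print(row)
```

After reshaping, every column holds a single variable and every row holds a single observation, which is the form most charting tools expect.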
Messy data causes two problems: it’s hard to interpret (confusing to users) and it’s difficult to visualize (hard to execute in a dashboard). If you include multiple data observations in one chart or graph, you run the risk of confusing users or distracting them from one of the information sets. By nature, users will assume that each chart or table serves one purpose. By including multiple objectives in one data set, you compromise the importance of at least one, if not both, pieces of information.
How to Identify Messy Data with Your Data Narrative
Your data should tell a story. By combining data visualization best practices with accurate and timely information, you can guide users to actionable conclusions and, in the end, informed decisions. In order to do this, you’ll need to combine the right narrative strategy with the right type of chart. Pie charts, for example, are best used for showing percentages of a whole, while bar graphs can compare a specific set of data against a timeline or other variable. Additionally, each carefully selected graph should be singularly focused, drawing users toward one conclusion.
With this understanding, you can pinpoint messy data by identifying the charts and graphs in your data visualization that display more than one logical conclusion. Ask yourself the following questions to gauge the “tidiness” of the data behind a given chart or graph:
- What is the most important piece of information users can glean from this chart?
- Is there more than one conclusion or piece of information that it is trying to communicate?
If you answer “yes” to the second question, you’re probably dealing with messy data. In some cases, fixing this scenario is as easy as splitting the data into two graphs so each emphasizes separate, equally important pieces of data. In other cases, you may need to deconstruct larger pieces of your dashboard and reorganize the structure behind the narrative.
How Tidy Data and Data Manipulation Work Together
Sometimes, data is more complicated than the narrative example above. You may need to provide information so that users can manipulate it, cross-reference variables, and explore data sets from multiple angles. This is where data visualization is most crucial: when you can use it to simplify otherwise complicated information for users.
Here are two ways you can keep data tidy without over-simplifying it:
- Drilldowns – Drilldowns are reports that allow the user to click through a top-level chart to see specific, detailed information related to a larger, more general data set. Drilldowns should not be displayed on the front page of a dashboard. Instead, they should be used to support a compelling initial view, encourage data exploration, and demonstrate transparency.
- Filters – Unlike drilldowns, filters don’t provide users with more data; they help them see it in a different way. With filters, you can remove or manage conditions within a chart or graph to help users exclude data they don’t need and focus on the data they want. This could mean adding variables, modifying them, or applying conditions to the report.
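The difference between the two can be sketched in plain Python. The rows and the helper names (`summarize`, `drill_down`, `apply_filter`) are hypothetical and stand in for whatever your dashboard tool does behind the scenes. Here, a drilldown reveals the detail behind one bar of a top-level chart, while a filter narrows the rows before the same top-level view is rebuilt.

```python
from collections import defaultdict

# Hypothetical tidy sales rows; names and figures are invented.
sales = [
    {"region": "East", "rep": "Avery", "amount": 1200},
    {"region": "East", "rep": "Drew", "amount": 980},
    {"region": "West", "rep": "Blake", "amount": 800},
    {"region": "West", "rep": "Casey", "amount": 1500},
]


def summarize(rows, by):
    """Top-level view: total amount per value of the `by` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row["amount"]
    return dict(totals)


def drill_down(rows, parent_field, parent_value, child_field):
    """Detail view: break one top-level bar down by a finer-grained field."""
    subset = [row for row in rows if row[parent_field] == parent_value]
    return summarize(subset, child_field)


def apply_filter(rows, predicate):
    """Filtered view: the same rows, narrowed by a user-chosen condition."""
    return [row for row in rows if predicate(row)]


# Front-page chart: one bar per region.
by_region = summarize(sales, "region")

# Drilldown: the user clicks the "East" bar to see its reps.
east_by_rep = drill_down(sales, "region", "East", "rep")

# Filter: the user keeps only sales of 1000 or more, then views by region.
large_sales = apply_filter(sales, lambda row: row["amount"] >= 1000)
by_region_large = summarize(large_sales, "region")

print(by_region)
print(east_by_rep)
print(by_region_large)
```

Both views work off the same tidy rows, which is why tidying the underlying data first makes drilldowns and filters straightforward to add later.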
Keep Your Data Tidy with iDashboards
Messy data is a common problem, but that doesn’t mean you can’t find a solution. In fact, understanding the need for tidy data is the first step toward avoiding future mishaps and confusion in your organization’s reporting. Tidy data starts with a clear strategy and the right tools. At iDashboards, our data visualization software can help you realize your data narrative so you can deliver the data your users need. Get in touch with a representative from iDashboards today and ask about our free, 30-day trials.