Stop Good Data From Going Bad
Data can be an invaluable resource that informs decisions and leads to better management, increased productivity and a stronger culture of transparency. The key words here are “can be.” Unfortunately, in today’s data-saturated environment, data is easily mismanaged. In the quest for more facts and figures, organizations are collecting, tracking and trending data at a faster rate than ever before, but they aren’t necessarily doing it well. In the worst cases, they’re misusing it outright.
A worrisome trend in the Big Data age is for business users to misuse data in order to support a previously conceived course of action. Instead of tailoring their decisions to the found data, they find data to support their decisions. Data has become a means to justify an end, but that defeats its purpose entirely. Data should not be cited as justification to spend money or make rash business decisions, nor should random information be used to fill in the blanks when there is no data to support a desired course of action.
At iDashboards, we love data because it is pure, unbiased, and unwilling to tell a lie, and we want to keep it that way. If you agree, and want to make sure you’re preserving your data integrity (or you’re just looking for a better way to collect and manage your information), here are a few tips that you can use to keep good data from going bad.
Identify Trustworthy Data Sources
Identifying trustworthy data sources is an extremely important, yet often overlooked, task. Pulling data from a reputable source is just as important as hiring reputable vendors, working with trustworthy manufacturers or investing in reliable equipment. You wouldn’t hire a vendor without first checking their references, so why would you rely on data without knowing where it came from?
While some data sources are instantly credible (the American Bar Association, the U.S. Bureau of Labor Statistics, and the National Institutes of Health, for instance), others are more questionable, and some are downright unreliable. It is your job to sift through the sources and determine which are trustworthy and which are not. Here are some ways to gauge a data source’s trustworthiness:
- Consider where the information is published. Is it published in a government publication or a peer-reviewed journal, or is it an uncited opinion found on a small website’s blog?
- Consider who funded the collection. The organization that published the data might not have any particular bias, but the organization that collected and analyzed the data might.
- Consider whether the data was found online. If it was, use the Stanford Guidelines for Web Credibility to determine the source’s trustworthiness.
If your data source is internal, such as a different department or a CRM, it’s imperative to check its validity before moving forward with a data visualization project. Particularly when merging multiple data sources into a single dashboard, you want to ensure that data points align and that nothing falls outside the bounds of logic because of a merge error or a mismatch in units.
Verifying the trustworthiness of a source might seem like extra work, but it can help you avoid mistakes that will waste your time in the long run. Maintaining data integrity will go a long way toward ensuring that the data you cite is unimpeachable.
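The internal-data checks described above (aligned data points, values within logical bounds, consistent units) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names, the revenue figures, the `merge_checks` function and its `b_scale` and `tolerance` parameters are all hypothetical, and a real project would adapt the checks to its own sources.

```python
# Hypothetical figures from two internal sources, keyed by region.
crm_revenue = {"north": 120_000.0, "south": 95_000.0, "west": 101_500.0}  # dollars
finance_revenue = {"north": 121.0, "south": 95.2, "east": 88.0}  # thousands of dollars


def merge_checks(a, b, b_scale=1.0, tolerance=0.05):
    """Return a list of problems found when lining up two sources.

    b_scale converts the values in `b` to the units used by `a`.
    tolerance is the largest relative difference treated as agreement.
    """
    problems = []
    # Keys that appear in only one source will not align after a merge.
    for key in sorted(set(a) ^ set(b)):
        problems.append(f"{key}: present in only one source")
    for key in sorted(set(a) & set(b)):
        left, right = a[key], b[key] * b_scale
        if left < 0 or right < 0:
            # Revenue below zero is outside the bounds of logic here.
            problems.append(f"{key}: negative value")
        elif abs(left - right) > tolerance * max(left, right):
            problems.append(f"{key}: sources disagree ({left} vs {right})")
    return problems


# Forgetting the unit conversion makes every shared region look wrong.
print(merge_checks(crm_revenue, finance_revenue))
# With the conversion applied, only the unmatched regions remain.
print(merge_checks(crm_revenue, finance_revenue, b_scale=1000.0))
```

Running the check once without the conversion and once with it shows how a units mismatch can masquerade as a disagreement between sources.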
Identify the Stakes
As fun as it is to randomly collect and analyze data, in business there is always a higher purpose. Are you collecting data to identify areas for improvement? To gauge the success of a desired action? To measure department productivity levels? There are countless specific reasons, but there is always something at stake. The best way to identify the stakes is to determine why the data is being collected in the first place, and what there is to be gained from it.
If your goal for the data is to measure the success of a desired action, the stakes might be time, money, and adverse change if you ignore pertinent data. If your goal is to identify areas of improvement but you tailor your data to target only certain issues in order to save time and money, you stand to miss other issues that may eventually become bigger and more costly when left unresolved. Keeping priorities in mind can provide important perspective on data practices. By identifying what’s at stake, you’re more likely to mine data honestly and avoid making costly mistakes.
Neutralize the Biases
When mining for data, it’s essential that you ask the right questions regarding how the information was collected in order to eliminate any remaining biases. Oftentimes, data analysts will unintentionally seek out information that supports a certain theory, belief, or action. This is called confirmation bias, and though it isn’t as overt as the intentional tailoring of data to decisions, it is just as detrimental.
There are many types of bias that can interfere with data-driven decision making. Selection bias, for example, occurs when facts or figures are skewed because of the way in which they were collected. It wouldn’t be sound to rely on a study of orca behavior that follows only two pods, as that leaves hundreds of other pods entirely unrepresented. To combat selection bias, make sure that each relevant group is fairly represented, which gives you the most accurate results.
To ensure that your data is unbiased from a collection standpoint, be sure to ask how the data was collected and organized. Additionally, compare data sets from multiple sources and look for any major discrepancies. If three data sets say one thing but the fourth set says something else entirely, it could mean that the fourth set was the subject of confirmation or selection bias. However, if the results vary greatly between each report, it could indicate that something is wrong with the underlying data itself.
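The comparison across sources described above can be illustrated with a small Python sketch. It assumes each source reports a single figure for the same measure and flags any source that strays too far from the median. The source names, the figures, the `flag_discrepancies` function and the 10 percent `tolerance` are hypothetical choices made for illustration, not an established standard.

```python
from statistics import median


def flag_discrepancies(estimates, tolerance=0.10):
    """Return the names of sources that differ from the median value
    by more than `tolerance`, expressed as a fraction of the median."""
    center = median(estimates.values())
    return sorted(
        name
        for name, value in estimates.items()
        if abs(value - center) > tolerance * abs(center)
    )


# Hypothetical figures for the same measure from four sources.
reports = {"source_a": 4.1, "source_b": 4.3, "source_c": 4.2, "source_d": 6.0}

print(flag_discrepancies(reports))  # ['source_d']
```

A single flagged source is a prompt to ask how that data set was collected. If most or all of the sources are flagged, the disagreement is more likely to point to a problem with the underlying data.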
Appoint a Data Steward
Each data project needs one individual to ensure the quality and credibility of data, but who that person will be depends on the size of your organization. If your enterprise is small to mid-sized, appoint an employee to be responsible for data governance as each new data project arises. If your organization is larger, consider investing in a Chief Data Officer who can oversee the collection and handling of data. Whoever you appoint will be responsible for all things data, including but not limited to data quality and life cycle management, information protection and privacy, collection policies and procedures, and data exploitation. The ultimate goal of appointing a data steward is to ensure that bad data is not used to substantiate important business decisions.
When businesses rely on bad data, bad things happen. To avoid falling victim to the bad data trap, invest in sound data mining strategies and data governance. Additionally, understand what is at stake if you make important decisions based on faulty information. Doing each of these things can help you bring your data back from the dark side and make sure that it remains good for good.