I began evaluating self-service ETL (extract, transform, load) tools at the beginning of 2015, and my lack of knowledge about the industry meant I was the right person to evaluate ease of use and general functionality.
I researched more than a dozen applications before finding the tool that worked perfectly for our needs with iDashboards. Additional research, sales calls and emails narrowed the field to about six installs of free trials or open-source tools. We sought out to find an application that could help our customers and technical services team members transform raw data into various chart-ready formats.
Excel users may be familiar with transposing or building a pivot table to accomplish these tasks. And database users are able to write custom SQL statements or implement views to solve similar problems. So naturally, I was skeptical at first that I would find a simple-to-use, no-coding-required interface to accomplish these tasks and, while most over-promised, a few truly delivered.
So really, what is ETL?
ETL is the process of moving data from one location to another (extracting and loading), while potentially reformatting or aggregating (transforming) the data in the process. This process is usually scheduled to occur automatically and requires maintenance as technologies or business needs change. ETL is used to move data into or around a data warehouse when integrating across applications or systems, and is almost always needed for reporting.
Once reserved for large corporations with data warehouses and expensive IT budgets, ETL projects are now possible for organizations, even departments, of any size. Data analysts or others with knowledge of the data are now able to build these jobs using self-service ETL tools. These tools have gone through many iterations of marketing-hype, but they are most commonly labeled as data integration or data blending applications – and most recently “data wrangling” (yes, it’s actually a thing). Many of these tools are advertised as lightweight and are installed on a user’s workstation with no expensive server or IT support required.
As a thirst for real-time, consolidated data from multiple sources becomes more prevalent, data blending tools are becoming more and more popular. Want to get a mid-day snapshot or make a change to how the numbers are reported? Do you have 35 Excel reports that all need to be combined each week for a standardized report? Gone are the days of manually combining all the data or waiting days jumping through IT hoops just to be told it’s not possible.
A user – with understanding of the data, where it’s located, and what KPIs are important – can take control by building data workflows and even automatically running them on a scheduled basis. These workflows contain multiple tasks to merge, join, combine, append, find or remove duplicates, and aggregate data. New columns can also be created, de-granulizing the data into useful dimensions. Best of all, the end-user is not generally a highly technical, expensive resource. Data scientists and database administrators are freed up to focus on projects requiring their in-demand skillsets. End-users will generally attend training sessions spread over just one or two days to become self-sufficient with the data integration tool.
When evaluating the products myself, I approached each tool the same. Without training or support, I’d attempt to build a specific job. I had a couple database tables and an Excel file that I wanted to blend together and aggregate in four different ways, outputting four datasets for dashboarding.
The open source tools caused issues from the start. Many were manual installs and didn’t provide much documentation for when I got stuck building the job. In contrast, I found most of the free trials to be pretty simple. The installers worked well and there were generally tutorials or documents for getting started. Support was also available if I really got stuck.
Many of the limitations fell into two buckets – scheduling and ease of use. Scheduling often was not available in desktop/workstation versions – expensive server licensing would be required to accomplish this. Each interface worked basically the same – connect data sources, drag tools into the workspace to accomplish various tasks, then output to a desired data type. I came across one tool that specifically stood out, as I couldn’t figure out how to perform a join – basically the “blend” in data blending. When I got stuck, I would reach out to support to solve the single issue then move on. If I kept getting stuck, that was my indication that the tool wasn’t that easy to use.
Save Hours of Data Manipulation with iDashboards Data Integrator!
The result of this research was the recommendation to leverage the data blending tool that integrated the most effectively with iDashboards and met our needs. Through a partnership, the iDashboards Data Integrator (iDI) was born. Those familiar with iDashboards’ data and their drag-and-drop interface will be quick power users of iDI. It provides all the tools needed to collect, combine, clean and integrate data sources. The workflows can be saved and scheduled to run again and again – repeatable results with no programming. iDashboards Data Integrator also provides built-in scheduling without the need for a server or database repository needed.
I highly recommend the 30-day evaluation of iDI to anyone with direct data challenges or issues with limited IT personnel resources. Interested in a free trial? Fill out this short form and mention iDI in the comment section.