Dashboards and Data Viz | Data Strategy | Tips & Tricks

While dealing with data is a necessary part of any business, it isn’t always easy to work with. Oftentimes, data needs to be transformed or set up in a way that allows it to be grouped, aggregated, and filtered before it becomes useful.

You may want to filter by the year a transaction took place but only have the transaction date. Or, you may need to be able to trend the data based on the fiscal quarter.  Data transformations can be created, to do this for you.  They are ways to extract or create data based on existing data.

That’s where an ETL tool, like the iDashboards Data Hub, can come in handy. With ETL tools, data is extracted from a system, either through a cloud API, Excel file, CSV file, or database connection. Then, it is transformed to match the data set needed to build dashboards from. The end result data is loaded into a data store.

Basically, the iDashboards Data Hub application can create transformations on your existing data, so you can more easily build the charts and dashboards necessary to run your business. In this post, we’ll go over how to effectively transform data with Data Hub transformations.

Add Column

This is a very flexible node, which will allow you to use Javascript to write expressions to create data.  Creating “If then” situations allows you to set up the proper fiscal year, based July to June, by evaluating the month number, which could be done in a variable. A script like this would work to determine the quarter based on month:

if ($val[MonthNum]=7) {‘Q1’}

else if ($val[“MonthNum”]=8) {‘Q1’}

else if ($val[“MonthNum”]=9) {‘Q1’}

else if ($val[“MonthNum”]=10) {‘Q2’}

else if ($val[“MonthNum”]=11) {‘Q2’}

else if ($val[“MonthNum”]=12) {‘Q2’}

else if ($val[“MonthNum”]=1) {‘Q3’}

else if ($val[“MonthNum”]=2) {‘Q3’}

else if ($val[“MonthNum”]=3) {‘Q3’}

else if ($val[“MonthNum”]=4) {‘Q4’}

else if ($val[“MonthNum”]=5) {‘Q4’}

else if ($val[“MonthNum”]=6) {‘Q4’}

else {null}

 

The end result would be a column named something like “Quarter”, showing the appropriate quarter for the month.

Delete Column

Quite simply, this transformation will delete columns.  If you are only keeping a few columns, I recommend selecting “Delete all” and adding back in the few columns that you need.

Rename Columns

This will give you the chance to rename column headers.  In the imported data, there may be 2 columns named “Type”.   One is a type of machinery and the other is a Type of Payment.  You can rename these to be more specific by using this transformation.

Reorder Columns

This will put the data columns in a specific order.  This is especially useful if you are going to use a Concatenate Rows transform later on. In order to use it, you need the columns from multiple data sources to be in the same order.  It’s also helpful to reorder columns if you want to put your dimensions at the top of the data set and your filters are the end of the data set.

Split Columns

Split columns will allow you to take 1 column, and extract data to make multiple columns.  You may want to extract a part and lot number from 1 code, or be able to split up a column with someone’s full name into separate first and last name columns. This can be done three different ways:

  1. You can use a Delimiter, which is a way to separate the data. You specify the character that you would like to split the data on. Next, towards the bottom, you need to name the columns that you are creating, along with their data types. I would also recommend checking the box that says “Retain Original Column,” if you will need the original column for something in the future.
  2. Set columns based on fixed position of data. This can only be done if the data is consistently in the same format. Years set at YYYY/MM/DD, etc. In this example, the ManufacturerID is 39056-994. Specify that the Part_Number is 39056 and the Lot_Number is 994, excluding the -, because it is not needed.

  3. Use Regular Expressions. If you are familiar with Regular Expressions, you will be able to enter more complicated expressions to split the data.  For example, to split a cell based on a Space, you can use the RegEx (.+)\s(.+).  For this example, there might be a Company Name and City and you want to separate them out.

Read Next: Data Analysis with iDashboards Expressions

Combine Columns

This will do the opposite and allow you to bring 2 columns together.  For example, LastName and FirstName columns can be combined to create a Full_Name column.

This can be done two ways:

  1. The first is Delimited, where you specify the character to separate the values.
  2. The second way is to create a template that will be allied to the new column. A good reason to do this is that you may want to have a space after the comma in the Full_Name column.

 

Aggregate Columns

Aggregating columns allows you to sum, average, get the minimum number, the maximum number, or a count of what is in the data set.

Change Case

This allows you to set the case of a column, for text-based columns.  The options are:

  • UPPERCASE
  • lowercase
  • Title Case

This could come in handy with your Full_Name field created earlier.  We can specifiy it as title case and automatically have a properly formatted name.

Change Data Type

This node will take a column of data and convert it to another data type. The destination data types are string, number, and date-time.  This is particularly helpful if you are using CSV as a data source, as their files all come into Data Hub as a string.  This will allow you to set all of the columns properly.

Truncate Dates

This is the tool you can use to either extract year, month, date, hour, minute, week, quarter, fiscal year, or fiscal quarter.

There are a few that need additional configurations. For week, you will need to specify what day of the week to start on. For fiscal quarter and fiscal year, you need to note which month is your year start or year end.

Date Difference

This will calculate the difference between two dates. The difference unit options are: years, months, weeks, days, hours, and minutes.

Input columns will allow you to select specific columns from the data. There are also options for the current date, as well as an offset and a difference unit. The final option is to select a specific date.

Pivot Transformation

The pivot transformation takes one column and breaks it apart into multiple columns.  You may want a stacked column chart of your different vendors, but in your database, they all reside in a column titled “Vendors.”  The pivot transformation will allow you to do this. Here’s how:

  1. Select a column to pivot. In this case, Vendor.
  2. Next, specify each vendor name that exists in the data. The list is one vendor per line. It will be separated by commas when complete.
  3. Select the column to aggregate against. In this case it’s the column, “Total Due.” We will sum it up to get a total due, for each Vendor, which will now be in their own columns.

Filter Rows

The purpose of this transformation is for you to be able to easily exclude specific values from a column of data.

You may want to remove blanks or show data from only a specific region or product. Clicking on the + near the bottom of the screen will bring up the filter box. Specifiy your columns to filter against, select your operator, and pick the value(s) that you would like to include or exclude from the list.

Concatenate Rows

This will allow you to stack data sets, perhaps from different sources or systems into one data set. The requirments are that there are the same number of colums in the same order with the same column headers. If you are familiar with database scripting, this is a Union. This transform requires at least two, but can have as many nodes connected as needed.

Join Rowsets

Join Rowsets, just as it sounds, allows two data sets to be brought together. There will be a key, that exists in both data sets, to join on. This is what we use as the value to know what rows belong together.  If you are familiar with Excel formulas, this would be similar to a VLOOKUP with two excel sheets.

You can select left, inner, or right join. A left join will give you all of the results form the left data set, and the matching results from the right data set.  Inner join will only provide results that exist in both data sets.  Right join is the exact opposite from the left join, in that it will give you all of the results from the right data set and only the matching data points from the left data set. You can also uncheck columns that are duplicates or simply not necessary.

Remove Duplicate Rows

Here, you will be able to get a single value from your columns of data.  If you are creating a data set to be used for a picklist, this may be helpful.  You can get a list of distinct results and select to keep one or keep none.

Sort Rows

Sorting rows will come in handy if you want your data in a specific order, such as ascending or descending.  If you would like to show who has the highest sales, you can sort your rows by descending sales amounts.

If you would like to show people in alphabetic order by last name, this is where that would be set up.  Simply select the column that you would like to sort by and choose between ascending or descending.

Limit Rows

This is a good node for testing a job that has a lot of data running through it. It will allow you to keep all rows or only keep the first or last amount of rows. When testing, this may be the second node in a job. For example, you may want to limit the rows to 25 so you can test the new job that you are building relatively quickly. Once the data has been verified to be correct, it can be changed to “Keep all rows” and saved.

Geocode

Just like it sounds, this node will allow you to feed a street address, city, zip/postal code, state/province, and country from your data set to the Data Hub.

If the data is known to be within a specific country, you can select that country in the Accuracy list. This will pull back the latitude and longitude for each row of data, which provides the data to plot points on the GeoPlot map.

Create Job Variables

This allows you to create a variable, from your data, that can be used later. Whether in the current Data Hub job, in an expression using the add node, or as a parameter in a new data set.

Combine Job Variables

As the name states, you can combine multiple job variables together.  You will need to define at least two job variables as separate nodes and combine them.

Want to learn more about how the iDashboards Data Hub can help you prepare your data for visualization? Click here to contact us today.

Alternative Text

Jim Qussar

Jim Qussar is our Sr. Pre-Sales Engineer, with over 20 years IT experience. He’s known as the office blacksmith. Every office has one right?

Get the Guide - Psychology of Data Vizualization

Get the Guide Fundamental Design Principles for Dashboards

Even if you’re not the artistic type, this guide will have you thinking like a graphic designer and making informed choices that support your data narrative.

Comments

Your email address will not be published. Required fields are marked *