Data Normalization Made Easy With GTM

In data collection and analysis, there is a well-known saying: garbage in, garbage out. Read on to see what a recent Google Tag Manager update brings to the table for improving data quality through normalization.

What is data normalization?

Normalization is typically a process applied in relational database design to organise data, aiming to reduce redundancy and improve consistency and accuracy. Generally, the process takes place after the data has been collected. The main goal is to consolidate variant forms of the same data item into a single form and provide a clean dataset which you can query and analyse.

However, normalization does not always have to be performed on data that has already been collected – you can apply it during the collection stage.

Data normalization in Analytics tools

Normalization is equally important in tools such as Google Analytics (which essentially relies on data stored in a database). One example of where normalization is required is case formatting. In Google Analytics, string values are case-sensitive: “String”, “string” and “STRING” are three distinct entries. This is why it is best practice to always account for casing and have a tracking setup that normalizes values during data collection.

Think about capturing a custom dimension that takes the value of a user entry. The user can enter it in UPPERCASE, in lowercase, or leave it empty. You then end up with multiple variations of what is potentially the same entry, which makes analysis harder and means the data has to be cleaned each time it is used.
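To make the problem concrete, here is a minimal sketch that counts distinct entries before and after normalization. The sample values and the helper names (tally, lowercaseOrSkip) are hypothetical and only serve the illustration; treating an empty entry as "do not send" is also an assumption.

```javascript
// Hypothetical raw values for one custom dimension, as users typed them.
var rawValues = ['London', 'london', 'LONDON', '', 'Paris'];

// Count occurrences per distinct value, optionally normalizing first.
function tally(values, normalize) {
  var counts = {};
  values.forEach(function (value) {
    var key = normalize ? normalize(value) : value;
    if (key === undefined) return; // treated as "not sent at all"
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Assumption: empty entries are dropped, everything else is lowercased.
function lowercaseOrSkip(value) {
  return value === '' ? undefined : value.toLowerCase();
}

var rawCounts = tally(rawValues);
var cleanCounts = tally(rawValues, lowercaseOrSkip);

console.log(Object.keys(rawCounts).length); // 5 distinct entries
console.log(cleanCounts);                   // { london: 3, paris: 1 }
```

Five raw entries collapse into two once casing is normalized and the empty value is dropped, which is the kind of clean-up you would otherwise repeat at analysis time.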

Additionally, in the example above, if the value returned for the custom dimension is null (i.e. it is empty), we have to convert it to undefined so that GA ignores it rather than recording it as the string “null”.

Historically, in the case of a hard-coded implementation, applying the above would require additional on-page JavaScript to be written to do the checks and transformations. If Google Tag Manager is used for tracking, using Custom JavaScript variables that would return the formatted values is the go-to method (if allowed by the client’s policies). There’s nothing wrong with these methods, but essentially the only reason for this code to exist is to change the output of other variables to lowercase, or undefined, to prevent them being set in Google Analytics.
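As a rough sketch of the kind of helper described above, the function below lowercases strings and turns empty values into undefined. The name normalizeForGa is hypothetical, and treating an empty string the same as null is an assumption. In GTM itself, a Custom JavaScript variable would be an anonymous function that reads another variable rather than taking a parameter.

```javascript
// Sketch only. In a GTM Custom JavaScript variable the equivalent would be an
// anonymous function reading another variable via the double-curly-brace
// syntax; here the value is passed as a parameter so the code runs anywhere.
function normalizeForGa(value) {
  // Assumption: null, undefined and empty strings should not be sent at all.
  if (value === null || value === undefined || value === '') {
    return undefined;
  }
  // Only strings have a case to normalize; other types pass through unchanged.
  return typeof value === 'string' ? value.toLowerCase() : value;
}

console.log(normalizeForGa('STRING')); // "string"
console.log(normalizeForGa(null));     // undefined
```

Every variable that needs this treatment would need its own wrapper like this one, which is the overhead the rest of this post is about.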

Recent GTM update that made the process easy

A recent update to Google Tag Manager added the Format Value option to all of its variables. Format Value allows you to modify the output of the variable with a number of predefined transformations which are executed by the gtm.js library itself, instead of by additional user-added code (which can be prone to errors and in some cases is not even an option).

GTM Format Value options
The options that are currently available are:

  • Change Case to… – Allows you to change the case of the string output of the variable to either lowercase or UPPERCASE.
  • Convert null to… – Allows you to convert null values to another value (e.g. undefined), or set the null output to fall back to another variable.
  • Convert undefined to… – Similarly to convert null, you can convert undefined values into strings or the returned value of other variables.
  • Convert true to… – Allows you to convert Boolean true value to a string or the returned value of another variable.
  • Convert false to… – Same as the above, except now it’s for a Boolean false value.
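To illustrate what these options do to a variable's output, here is a small model of the behaviour written as a plain function. This is not how gtm.js implements Format Value; the function name, the option names and the order in which the checks run are all assumptions made for the sketch.

```javascript
// Illustrative model only, not GTM's implementation. Option names are
// hypothetical: changeCase, convertNull, convertUndefined, convertTrue,
// convertFalse.
function formatValue(value, options) {
  options = options || {};

  // Conversions: applied only when the matching option has been configured.
  // Assumption: a converted value is returned as-is, without a case change.
  if (value === null && 'convertNull' in options) return options.convertNull;
  if (value === undefined && 'convertUndefined' in options) return options.convertUndefined;
  if (value === true && 'convertTrue' in options) return options.convertTrue;
  if (value === false && 'convertFalse' in options) return options.convertFalse;

  // Change Case: only meaningful for string output.
  if (typeof value === 'string') {
    if (options.changeCase === 'lowercase') return value.toLowerCase();
    if (options.changeCase === 'uppercase') return value.toUpperCase();
  }

  return value;
}

console.log(formatValue('Hello', { changeCase: 'lowercase' })); // "hello"
console.log(formatValue(null, { convertNull: undefined }));     // undefined
console.log(formatValue(true, { convertTrue: 'yes' }));         // "yes"
```

The point of the feature is that none of this code has to be written or maintained by you: the same result comes from ticking options on the variable.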

With these new options available, tackling problems such as those mentioned above is a lot more streamlined. Hopefully the list of formatting options and the flexibility of use will only grow moving forward.

Another example use of Format Value is when it’s applied to URLs:

GTM Format Value applied to Full URL
Being able to format the Full URL like this means that less technical users no longer need to use RegEx to account for case differences when using the Full URL variable as a trigger condition, for example.

Using contains/equals/starts with has always been the preferred option for non-devs who do not want to mess with RegEx.
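The difference can be sketched in plain JavaScript. The URL and the function names below are hypothetical; the first check mirrors a case-insensitive RegEx condition, while the second mirrors a simple "contains" condition applied to a URL that has already been lowercased.

```javascript
// Hypothetical page URL with mixed casing.
var pageUrl = 'https://www.example.com/Products/Shoes?Ref=Email';

// Without formatting: a case-insensitive regular expression is needed.
function matchesWithRegex(url) {
  return /\/products\//i.test(url);
}

// With the URL lowercased first, a plain "contains" check is enough.
function matchesWithContains(url) {
  return url.toLowerCase().indexOf('/products/') !== -1;
}

console.log(matchesWithRegex(pageUrl));    // true
console.log(matchesWithContains(pageUrl)); // true
```

Both checks give the same answer, but only the second can be expressed without writing a pattern.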

Conclusion

By using best practices and advanced tools when setting up tracking, you produce cleaner data, which leads to better analysis and fewer inconsistencies to work around. Additionally, taking advantage of feature-rich platforms such as Google Tag Manager removes many complexities and makes configurations easier to understand for less technical users.

This blog can also be found on my employer’s website here
