Tingono's Blog

4 Signs You Aren’t As Data Driven As You Think: Part 2 - Messy Data

Written by Parry Bedi | Jan 19, 2023 10:38:00 PM

Sami and I started Tingono to deal with a specific problem: boosting recurring revenue growth. You can read more about it in my very first blog post.

The most elegant, and perhaps the only, way to solve this problem is by leveraging unified data from across an organization.


Now, as self-admitted data nerds, we should find it easy to locate relevant data and analyze it, right?

Unfortunately, that is not the case.

Even with all the advances in tools and technologies over the last few years, like ETL, Reverse ETL, and CDPs, it’s still too time-consuming. There are too many hurdles involved.

This has been super frustrating.

And we know we’re not the only ones who have been frustrated by this.

So we have put together a few suggestions on how you can overcome these challenges, so you can learn from our experiences and become truly data driven.

In the first post in this series, I discussed how to solve the data silo challenge.

In this post, I will address the myth that some data is just “too messy” to be of any use.

 

What is messy data? How is it holding you back from being data-driven? 

So, what do I mean by messy data? It’s data that seems like it can’t tell a story, no matter how you look at it.

Technically, it’s data that is viewed as incomplete, inaccurate, or obsolete, or some combination of the three.

You might have heard it referred to as “rogue” or “dirty” data. Both are pretty negative descriptions!

It’s easy to understand why it’s viewed so negatively: messy data often feels unmanageable. It leaves you not quite sure what you’re looking at.

Because of this, it can lead you back down the data silo rabbit hole, and it becomes one of the major contributors to having too many versions of the truth.

Examples of messy data  

Messy data can be a broad and hard-to-grasp concept. Let’s take a look at a few examples to make it more concrete:

  • To err is human, right? And so, human error is a classic cause of data messiness. For example, when data is manually entered into a spreadsheet or CRM, mistakes can be, and are, made.

  • Data can get mangled when it is transmitted from one app to another through an automated process.

  • The same data is entered twice, creating duplicates.

  • Another aspect of messiness is incomplete data. Maybe you didn’t capture customer interactions at a particular touchpoint. Or maybe they were captured but stored in a way that’s incompatible with the data you are looking to analyze.

  • Different formats are used for different entries, creating inconsistencies. For instance, sometimes a date is listed as month, day, year. Other times it is listed as year, month, day.

In other words, the problem with the label “messy data” is that it masks the real issue. Or, multiple issues get lumped together under that one label.
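Untangling that label starts with counting each issue separately. Here’s a minimal Python sketch that profiles a small table and reports incomplete rows, duplicates, and inconsistent date formats individually, instead of giving one “messy” verdict. The records, the field names (`id`, `email`, `signup`), and the two date layouts it recognizes are hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Hypothetical CRM export; field names and values are made up for illustration.
records = [
    {"id": "C1", "email": "ana@example.com", "signup": "01/15/2023"},
    {"id": "C2", "email": "bo@example.com", "signup": "2023-01-16"},
    {"id": "C3", "email": "ana@example.com", "signup": "01/15/2023"},
    {"id": "C4", "email": "", "signup": "2023-01-18"},
]

# The date layouts this sketch knows how to recognize (an assumption).
DATE_PATTERNS = {
    "MM/DD/YYYY": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def profile(rows):
    """Count each kind of messiness separately instead of one 'messy' label."""
    report = {}
    # Incomplete data: rows with at least one empty field.
    report["incomplete_rows"] = sum(1 for r in rows if any(v == "" for v in r.values()))
    # Duplicates: extra occurrences of the same non-empty email.
    emails = Counter(r["email"] for r in rows if r["email"])
    report["duplicate_emails"] = sum(c - 1 for c in emails.values() if c > 1)
    # Inconsistent formats: which date layouts appear in the signup column.
    formats = Counter()
    for r in rows:
        layout = next((name for name, p in DATE_PATTERNS.items() if p.match(r["signup"])), "unknown")
        formats[layout] += 1
    report["date_formats"] = dict(formats)
    return report


print(profile(records))
# {'incomplete_rows': 1, 'duplicate_emails': 1, 'date_formats': {'MM/DD/YYYY': 2, 'YYYY-MM-DD': 2}}
```

Each number points to a different fix: missing values, duplicates, and format drift rarely share a root cause.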

How you deal with messy data  

So “messy data” becomes a catch-all. Rather than diagnosing the individual issues, it’s easy to throw in the towel altogether, or to leave it for another day.

Really, though, this leads us further and further away from our goal of being more data driven.

 

It may be daunting at first to get down to the nitty-gritty of what’s (unfortunately) wrong with the data you have. 

But figuring this out will allow you to be better prepared as your business moves forward. 

And the good news is there are three tangible, concrete ways you can fix messy data: 

  1. Consolidate your data silos.
  2. Clean your data.
  3. Use synthetic data to fill your data gaps. 

So let’s dive in. Let’s look at these three things that can turn your data from a mess into your most valuable resource. 😊 

Removing data silos can help fix “messy data” 

Many times, data can just seem messy because we only see a portion of it. This can happen when data lives in different places across an organization (i.e., data silos).

Remember when I talked about how everyone has their own version of the truth?

For instance, your sales rep might only have access to the CRM. But CRM data doesn’t help you understand how customers are actually using the product.

 

These limited views often result in different teams believing different “truths” about the customer. And this can lead to arguments over who is right. And then there is frustration when it’s difficult to find “the right answer”. 

 

It can feel like going in a circle. For everyone. The end result is often finger pointing, blaming, and dysfunctional GTM activities. 

These types of breakdowns cause big issues in the long run.  But as we discussed in my previous post, it’s solvable. 
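Here’s what combining two of those partial views can look like, as a minimal Python sketch: it joins a hypothetical CRM extract with hypothetical product usage data. The field names and the assumption that both systems share an `account_id` key are illustrative; in practice, matching records across systems is often the hard part.

```python
# Hypothetical extracts from two siloed systems; the field names and the shared
# "account_id" key are assumptions for illustration.
crm = [
    {"account_id": "A1", "name": "Acme", "renewal_date": "2023-03-01"},
    {"account_id": "A2", "name": "Globex", "renewal_date": "2023-06-01"},
]
product_usage = [
    {"account_id": "A1", "weekly_active_users": 4},
    {"account_id": "A3", "weekly_active_users": 57},
]


def unify(crm_rows, usage_rows):
    """Left-join usage onto CRM accounts and report accounts missing from the CRM."""
    usage_by_id = {u["account_id"]: u for u in usage_rows}
    crm_ids = {c["account_id"] for c in crm_rows}
    unified = []
    for account in crm_rows:
        usage = usage_by_id.get(account["account_id"])
        unified.append({
            **account,
            # None means "no usage data", which is different from zero usage.
            "weekly_active_users": usage["weekly_active_users"] if usage else None,
        })
    # Accounts that only exist in the product data hint that one silo is incomplete.
    orphans = sorted(aid for aid in usage_by_id if aid not in crm_ids)
    return unified, orphans


unified, orphans = unify(crm, product_usage)
for row in unified:
    print(row["name"], row["renewal_date"], row["weekly_active_users"])
print("In product data but not in CRM:", orphans)
```

With this made-up data, the combined view shows an account coming up for renewal with only four weekly active users, a signal neither the sales view nor the product view shows on its own.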

 

Removing data silos can be done in two ways.  Using both methods together is ideal. 

1. Make data sharing part of your corporate culture 

Start by changing how people think about and treat data in your company. Make it clear that everyone can benefit by sharing data and information freely.  

 

You can help formalize this by scheduling data syncs (e.g., regular check-ins) between teams. Then add a regular stream of data updates to this. Make sure these updates share all data learnings with the entire organization. 

 

These types of changes need to come from management.

 

When teams become more transparent with their data and the things they learn from data, good things happen. It builds a common understanding. The old 1 + 1 = 3. 

2. Adopt the right set of data tools 

There is another way to fix messy data caused by data silos: adopt a system that makes it easy for data to be understood across teams.

At Tingono, we seamlessly unify data from customers, sales, and products. This creates insights that are helpful to many teams.

Using systems like this makes messy data analysis a bit more hands-off. It simplifies how your teams access data.

 

So, in short, there are two simple steps to creating more value for everyone.

 

Step 1: Remove organizational barriers.

Step 2: Build a common understanding. Simple, right?

 

Then you’ll quickly realize that the data you already have is useful. This data can be leveraged to drive revenue.   

In fact, most organizations these days are swimming in data! It’s quite common for data to be collected at every step in the revenue chain.  

 

So now you just need to make the time and resources you’ve invested in collecting data pay off! And to do that, you have to make the data you already have useful.

Cleaning the business data you have is incredibly valuable  

This leads us to another issue: People often feel their data is simply inaccurate.  And maybe it’s not just a feeling. Maybe the data really is wrong! 

 

It might have been manually entered and then not double-checked. 

 

Or maybe it was corrupted in some way. 

 

There are many different ways that data may have made it to you for analysis. But along the way, there are also many ways it could have been mishandled. So, you’ve got some data that might as well have come out of thin air.  

 

To address this, you can of course use off-the-shelf data cleaning tools like Talend or OpenRefine.

 

You can also implement policies to ensure that data is manually confirmed as accurate. Then you can use these tools as an additional check. 

 

These tools can help you identify and fix both inconsistencies and errors in your data. They make it easier to focus on data deduplication, standardization, and enrichment.  
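As a rough illustration of what deduplication and standardization mean in practice, here’s a minimal Python sketch using only the standard library. It is not how Talend or OpenRefine work internally; the fields, sample rows, and the list of accepted date layouts are assumptions for illustration. Note that it treats slashed dates as US-style month/day/year, so a value like 03/04/2023 becomes March 4, which may be wrong for your data.

```python
from datetime import datetime

# Date layouts to accept, tried in order. Assumption: slashed dates are US-style
# month/day/year. A value like "03/04/2023" is ambiguous, so this order is a choice.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no known layout fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable values for manual review rather than guessing


def clean(rows):
    """Standardize emails and dates, then drop later duplicates of the same email."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if email and email in seen:
            continue  # duplicate customer: keep the first occurrence only
        if email:
            seen.add(email)
        cleaned.append({"email": email, "signup": standardize_date(row["signup"])})
    return cleaned


# Hypothetical raw rows showing the issues from the examples earlier in this post.
raw = [
    {"email": "Ana@Example.com ", "signup": "01/15/2023"},
    {"email": "ana@example.com", "signup": "2023-01-15"},
    {"email": "bo@example.com", "signup": "2023-01-16"},
    {"email": "cy@example.com", "signup": "Jan 17"},
]
for row in clean(raw):
    print(row)
# {'email': 'ana@example.com', 'signup': '2023-01-15'}
# {'email': 'bo@example.com', 'signup': '2023-01-16'}
# {'email': 'cy@example.com', 'signup': None}
```

Returning `None` for “Jan 17” instead of guessing a year is deliberate: it turns a silent error into a visible one that someone can confirm by hand.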

 

Finding the bumps (and knots) in your data will not only help you get more out of your data right now, but it also teaches you what to look for going forward. 

Carefully analyzing previous mistakes in data collection will save you time in the future. It’ll free up time previously spent on correcting mistakes. 

 

However, in my experience, this is only half the battle. You can do all of this yourself, but it may demand more time and expertise than you can give, and it may be more valuable not to do it all yourself.

You may need to invest in more advanced analytics tools, or consider working with a data specialist to help you extract more actionable insights from your data.

 

No one said data was easy! 😉 

 

But, getting your data from messy and hopeless to useful and insightful is worthwhile. It’s a tremendously positive use of your time and resources! 

 

By taking these steps, you can ensure that your insights can be put into action to drive meaningful change and improve your business. 

 

So no, it might not be the easiest thing to clean messy data. But it’s well worth it. The data you have has tons of value!  

Synthetic Data Generation Can Solve Some Business Data Woes 

Sometimes, the data you have really is incomplete, and it might not be useful as is.

This can happen if you’re moving into a different industry, or perhaps launching a new offering.

The data you have might not be sufficient to perform a usable analysis. But this doesn’t have to be the end of the road for your data.

 

In these instances, you can try creating synthetic data.  

Synthetic data is artificially generated data. It's designed to mimic the characteristics of real-world data.  

 

It’s nice to use when the data you have is scarce. Maybe you just don’t have enough. 

 

Synthetic data might also be a good fit when the data you want is too difficult to find. 

 

It can also help when real-world data is not right for your use case.  

 

Synthetic data can be a great tool to help you look at the validity of the data you have. 

 

It’s typically used to train machine learning models. But it can also be used to perform predictive analyses.  

 

Here are a few common ways of creating synthetic data: 

  • You can sample from a distribution. Basically, you estimate the probability distribution of the data you have, then generate new data points by sampling from that distribution.

  • You can use a generative model, such as a generative adversarial network (GAN), or a language model to create new, similar data points from scratch. These models are trained on real data and can then generate new data points that resemble the original data.

  • Use a synthetic data generator library! These libraries can generate synthetic data for different types of datasets, such as names, addresses, or financial data. Not sure where to find a good library? Try Faker or Synthpop.

  • Data augmentation: You can expand your dataset by applying random but realistic transformations to the existing data.

  • You can use a noise injection algorithm to add “noise” to the existing data. This can be useful when the amount of data is small: it lets you increase the number of samples while preserving the main characteristics of the original data.
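Two of these techniques are simple enough to sketch with Python’s standard library: sampling from a distribution fitted to your data, and noise injection. This is a minimal sketch, assuming the data is roughly normally distributed and that a small multiplicative jitter is realistic; the sample values and the 2% noise scale are hypothetical. GANs and libraries like Faker or Synthpop are not shown here.

```python
import random
import statistics

# Hypothetical scarce sample: monthly contract values for seven accounts.
observed = [1200.0, 950.0, 1100.0, 1300.0, 1050.0, 980.0, 1250.0]


def sample_from_distribution(data, n, seed=0):
    """Fit a normal distribution to the data, then draw n synthetic points from it.

    Assumption: the data is roughly normal. Skewed data (common for revenue)
    may call for a different distribution.
    """
    rng = random.Random(seed)
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def inject_noise(data, copies, scale=0.02, seed=0):
    """Return the real data plus `copies` jittered versions of every point.

    Each copy multiplies a value by (1 + e), with e drawn from N(0, scale).
    The 2% default scale is an arbitrary illustration, not a recommendation.
    """
    rng = random.Random(seed)
    augmented = list(data)
    for _ in range(copies):
        augmented.extend(x * (1 + rng.gauss(0, scale)) for x in data)
    return augmented


synthetic = sample_from_distribution(observed, 1000)
print(round(statistics.mean(observed)), round(statistics.mean(synthetic)))
print(len(inject_noise(observed, copies=3)))  # 7 real + 21 noisy = 28
```

Whichever technique you use, it’s worth checking that the synthetic data’s summary statistics still match the real data, as the first print line does for the mean.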

So there are all kinds of ways to use synthetic data. It can help you paint a bigger picture with the data you have.

 

It might also help fill in some of the gaps that exist in the data you’ve collected.  

 

You just need to evaluate which tool (or tools) is best suited to your data situation.

Messy data can still lead to being more data-driven, as well as revenue growth 

So messy data doesn’t have to spell the end of your business. It can certainly be a challenge to fix, but there is still value in using whatever data you’ve got.

It can still help you uncover insights about your customers and business, and it can help you forecast the future.

It’s also great for learning about mistakes you may have made previously, so you can avoid repeating them.

As I’ve said, most SaaS companies have plenty of data to look at. And even if yours doesn’t, there are ways to make use of what you do have.

 

Using your data to its full potential, “messy” or not, can help you bring insights to your teams. These insights help drive revenue. 

 

Being data-driven is what we’re all about. We think every company has great potential to become data-driven. In fact, that’s a challenge that motivates us daily.

 

Wondering if you’re at a good place with your data? Want to see if it can do a little more for you? We’d love to talk to you. Or, check out our demo!