How we Build

Charting the Data: Three Takeaways from Working on an Analytics Product

If a picture is worth a thousand words, then a good chart is definitely a match for even the most verbose images. This might help explain why analytics are everywhere.

Take Twitter as an example. It would never be as popular as it is without showing its users how well their tweets are doing. Analytics are central to Twitter and other social media companies: they may be fairly crude, but they are also crucial for triggering that dopamine hit that brings people back.

The ubiquity of analytics hides the potential complexity underneath, especially when you start to consider products where the analytics are more front and centre. Rather than Twitter, think of something like Hootsuite’s Analyze, which offers vastly more information and – hopefully at least – insights. Creating and maintaining pretty charts isn’t as easy as it seems, and I will touch on three things that weren’t obvious to me before I started working in the space.

#1: The charts are just the tip of the iceberg

I had already learned this lesson in my research days, before I moved into product management. After completing my data collection, I would be keen to run my analyses and contribute to scientific knowledge. Yet I quickly learned that there was still a lot of work to do before the data would be ready for analysis. I had greatly underestimated the amount of data processing that was required. When I eventually got to run the analyses themselves, they turned out to be anticlimactically quick.

iceberg image
About ninety percent of an iceberg is under the surface. Think of these proportions the next time you see a dashboard.

The data crunching can be divided into two stages. The first is the initial processing, where you get the data into a state in which calculations can be run on them, and the second is doing said calculations. Let’s look at the latter stage first. The metric you are going for will determine the complexity of the calculation. Working out something like the number of blog post views in a given month is relatively straightforward once you have the number of visits and when they took place. In contrast, calculating the customer churn rate of a SaaS business is a more involved process. You need to take into account how many customers you had at the start of the month and how many canceled that same month, counting only the cancelations from customers who were already with you before the start of the month. You may also want to control for any cancelations that were followed by reactivations by those customers in that same period.
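To make the churn example concrete, here is a minimal Python sketch of the rules described above. The `Customer` record and its field names are assumptions made for illustration, as is the simplification that each customer has at most one cancelation and one reactivation. Customers who canceled before the month started are treated as gone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout, chosen only for this illustration.
    signed_up: date
    canceled: Optional[date] = None      # cancelation date, if any
    reactivated: Optional[date] = None   # reactivation date, if any


def monthly_churn_rate(customers, month_start, next_month_start):
    """Share of start-of-month customers who canceled during the month."""
    # Only customers who were already active at the start of the month count.
    at_start = [
        c for c in customers
        if c.signed_up < month_start
        and (c.canceled is None or c.canceled >= month_start)
    ]
    if not at_start:
        return 0.0

    def churned(c):
        if c.canceled is None:
            return False
        if not (month_start <= c.canceled < next_month_start):
            return False
        # Ignore cancelations undone by a reactivation in the same month.
        came_back = (
            c.reactivated is not None
            and c.canceled <= c.reactivated < next_month_start
        )
        return not came_back

    return sum(churned(c) for c in at_start) / len(at_start)


customers = [
    Customer(date(2023, 1, 10)),                              # stays
    Customer(date(2023, 2, 1), canceled=date(2023, 3, 15)),   # churns in March
    Customer(date(2023, 2, 20), canceled=date(2023, 3, 5),
             reactivated=date(2023, 3, 20)),                  # canceled, came back
    Customer(date(2023, 3, 3), canceled=date(2023, 3, 25)),   # joined mid-month
    Customer(date(2022, 11, 1), canceled=date(2023, 2, 10)),  # gone before March
]
march_churn = monthly_churn_rate(customers, date(2023, 3, 1), date(2023, 4, 1))
```

Of the five customers, only three were active on 1 March, and only one of those counts as churned, so the rate is one third. Change any of the rules (for example, counting the reactivated customer as churned) and the number changes with it.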

Whether you are dealing with blog post views or customer churn rates, the calculation itself may be the easy bit. Imagine you want to know which are the top five countries your users are based in. If you already have the country information for each one, this task is fairly easy. Now imagine you only have their GPS location. You will first need to figure out the country a user is in. If we want to make it really complicated, let’s assume you only have the raw GPS data. The amount of processing to determine the countries just increased manyfold.
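The gap between the two scenarios shows up clearly in code. In the sketch below, `top_countries` is the easy case where the country is already known. `top_countries_from_coordinates` needs a resolver that turns coordinates into a country, and that resolver is where the real work would be. Both function names are hypothetical, and `toy_resolver` is a deliberately crude stand-in for a proper reverse-geocoding step.

```python
from collections import Counter


def top_countries(countries, n=5):
    """Easy case: one country code per user is already available."""
    return Counter(countries).most_common(n)


def top_countries_from_coordinates(coordinates, resolve_country, n=5):
    """Harder case: each (lat, lon) pair must first be mapped to a country."""
    return top_countries(
        (resolve_country(lat, lon) for lat, lon in coordinates), n
    )


# Toy lookup keyed on rounded coordinates; a real product would need an
# actual reverse-geocoding service or dataset here.
TOY_LOOKUP = {(60, 25): "FI", (49, -123): "CA"}


def toy_resolver(lat, lon):
    return TOY_LOOKUP.get((round(lat), round(lon)), "unknown")


known = ["FI", "CA", "FI", "US", "CA", "FI"]
easy = top_countries(known, n=2)

points = [(60.17, 24.94), (49.28, -123.12), (60.2, 24.9)]
hard = top_countries_from_coordinates(points, toy_resolver, n=2)
```

The counting is a one-liner in both cases. Everything that makes the second version hard is hidden inside `resolve_country`.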

Let’s hope you don’t have to do the equivalent of determining your users’ locations from scratch, but bear in mind that some of your product’s allure will come from information you are providing which isn’t easily available elsewhere. Doing the hard work for someone else is one of the reasons people will come to your product. Just make sure you are providing metrics that people actually use and care about.

#2: The virtues of transparency

Since your data processing is one of your competitive advantages, you might be tempted to keep it under wraps, as you might otherwise feel like you are revealing the ingredients for your secret sauce. However, opaqueness is not a good way to build trust in your numbers. After all, you aren’t providing numbers pulled out of nowhere, but rather something that would be too much work for your users to do themselves. The way in which it is calculated is not going to be a trade secret.

folder image
“I could tell you how I calculated your numbers, but I am afraid that’s a secret. Don’t worry, just trust me.”

Moreover, how different metrics are calculated can be a source of heated debate. The different methods will tend to have their own strengths and weaknesses, so you should make it clear which method you are using. People will then also know how to better interpret the data you are providing. Think of the churn rate example from earlier. People are better served if they can follow your approach.

A good example of this in the world of SaaS metrics is around ARR (annual recurring revenue) and how it is often used interchangeably with annual run rate. The confusion is not surprising considering the shared initials. The former is the recurring revenue that comes from your annual plans, whereas the latter is how much recurring revenue you would get from all your plans – regardless of their duration – in a year (or MRR x 12 for short). One can argue about the merits of both and when each one is useful, but the main thing is not to confuse the two. That being said, it could be worse… I once heard a podcast state ARR as average recurring revenue.

https://twitter.com/Mqsley/status/1403371831646642176
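To see how far apart the two readings of ARR can land, here is a small Python sketch that follows the definitions given above. The subscription records, their field names, and the prices are made up for illustration.

```python
def arr_from_annual_plans(subscriptions):
    """Annual recurring revenue: revenue from annual plans only."""
    return sum(
        s["price"] for s in subscriptions if s["billing_period"] == "annual"
    )


def annual_run_rate(subscriptions):
    """Annual run rate: MRR across all plans, multiplied by 12."""
    mrr = sum(
        s["price"] / 12 if s["billing_period"] == "annual" else s["price"]
        for s in subscriptions
    )
    return mrr * 12


# Hypothetical subscriptions: price is charged once per billing period.
subscriptions = [
    {"billing_period": "annual", "price": 1200},
    {"billing_period": "monthly", "price": 50},
    {"billing_period": "monthly", "price": 30},
]

arr = arr_from_annual_plans(subscriptions)
run_rate = annual_run_rate(subscriptions)
```

With these numbers the annual-plan figure is 1200, while the run rate is 2160, because the two monthly plans add 80 a month. Same initials, same customers, very different number.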

As a final point on this, the building blocks of your metrics have value of their own. If you are assessing the loading time of your application, the average time will be interesting, but you also want to dig into the numbers that make up this average, and so identify the major sources of latency. This is key for identifying which performance improvements will yield the highest return on investment. Datadog does this well, as they make it really easy to explore what their aggregate charts consist of.
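As a rough illustration of why the building blocks matter, the sketch below computes both the headline average and a per-component breakdown from the same data. The component names and timings are invented, and this is not a description of how any particular monitoring tool works.

```python
from collections import defaultdict
from statistics import mean


def average_load_time(requests):
    """Headline number: mean total load time per request, in ms."""
    return mean(sum(r.values()) for r in requests)


def slowest_components(requests):
    """Breakdown: mean time per component, slowest first."""
    timings = defaultdict(list)
    for r in requests:
        for component, ms in r.items():
            timings[component].append(ms)
    averages = {c: mean(values) for c, values in timings.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical timings (ms) for two page loads, split by component.
requests = [
    {"network": 100, "database": 300, "rendering": 50},
    {"network": 120, "database": 500, "rendering": 30},
]

headline = average_load_time(requests)
breakdown = slowest_components(requests)
```

The headline average of 550 ms says that the page is slow. The breakdown says where to look: the database accounts for 400 ms of it on average.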

#3: The challenges of improvement

As with any product out in the world, there will always be things you would like to add or change. For an analytics product this is harder than you might expect, so be careful if you are someone who likes to “move fast and break things”, as some things are harder to repair than others.

When you tweak the way you calculate a metric in order to account for a new scenario, you are in a bit of a tricky situation. If you reprocess pre-existing data, then your previous values will also change (and people don’t take kindly to that when it happens out of the blue). If you don’t reprocess, and the changes apply only going forward, then you may have inconsistent numbers, and it is hard to make it clear why your January numbers are different from your February ones. Also, if you are actually improving the calculations, should you not provide this improved data?

There are multiple ways of dealing with or mitigating these challenges, including adding new data settings, coordinated data reprocessing, and being transparent on how the numbers are obtained (as mentioned above). A clear data processing pipeline will also make any improvements easier to implement. Yet the best cure is prevention, so plan with care how you process your data and calculate your metrics.
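One possible way to combine reprocessing with transparency is to store, alongside every value, the version of the calculation that produced it. The sketch below is an assumption about how that could look rather than a recommended design. `MetricPoint`, `calc_version`, and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    period: str        # e.g. "2023-01"
    value: float
    calc_version: int  # which version of the calculation produced the value


def version_changes(points):
    """Periods where the calculation differs from the previous period."""
    return [
        cur.period
        for prev, cur in zip(points, points[1:])
        if cur.calc_version != prev.calc_version
    ]


def reprocess(raw_by_period, calculate, version):
    """Recompute every period from raw data with one calculation version."""
    return [
        MetricPoint(period, calculate(raw), version)
        for period, raw in raw_by_period.items()
    ]


# Changes applied going forward only: February uses a newer calculation.
series = [
    MetricPoint("2023-01", 4.0, calc_version=1),
    MetricPoint("2023-02", 3.1, calc_version=2),
]
flagged = version_changes(series)

# Coordinated reprocessing: every period is recomputed with version 2.
raw = {"2023-01": [1.5, 2.0], "2023-02": [1.0, 2.1]}
consistent = reprocess(raw, calculate=sum, version=2)
```

With the version stored, a chart can at least flag that February was calculated differently from January, and a full reprocess yields a series in which nothing needs flagging.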

Final thoughts

Working on a data product can present its own unexpected challenges, even for those who have worked with data before. Not only do you have to make the underlying complexity clear and actionable, but you also have to provide the right amount of depth and transparency. All the while, you are dealing with a constant flow of ever-changing data. As we have seen, it can get complicated quickly!

But it’s worth it. Information is key for empowering people to make the best decisions, and being able to provide those insights is a rewarding thing indeed.


Sebastian Sandoval Similä

Product Manager, Analytics