When the numbers don't add up: tips for reconciling different analytics tools

When the numbers just don’t add up, what can you do?  The image at right illustrates this perfectly.  Notice the two blue dots?  Is one bigger than the other?  Look carefully.

The answer: it’s all about perspective. I recently received an email from a reader that went like this:

[quotebox]For 8 years we’ve been using Urchin 6, provided by our web hosting company. Now that we’ve switched hosts, we started using Google Analytics, and instead of reporting 50,000 total “sessions” a month, we now see 10,000 total “visits” a month.  How is it possible for the numbers to be so different, when both companies are owned by Google? Which numbers are correct?[/quotebox]

This is a great question, and not an uncommon one to hear.  I want to delve into the differences that commonly arise between web analytics tools, even ones from the same company with a shared past (Urchin and Google Analytics being a case in point).

Know Your Tools

With this question in particular, the products being compared (Urchin vs. Google Analytics) are likely not very close, despite both being owned by Google.  Given that the asker noted using Urchin for 8 years and that it was provided by their hosting company, my guess is that they were on Urchin 5, or even 4, for much of that time, since Urchin 6 wasn’t released until April 2008.  The hosting company may have upgraded since then, but even so, it’s a far cry from saying the tools are the same.

Know What You’re Comparing

There are two key ways to get web analytics data:

  1. From web server log files
  2. From JavaScript tags that use cookies and tracking pixels

I like to think of the difference between these two methods as looking at the top side versus the underside of the same rug.  The underside shows a complex mish-mash of threads, while the top side shows a beautiful pattern.  Web server logfiles are literally the server’s perspective on what happened, while the tag-based method is literally the user’s perspective on what they did while on the site.  This is the most critical distinction to keep in mind.  If you want to answer questions about what the servers were doing, look at server logs.  If you want to answer questions about what people were doing, look at tag-based data.  Brian Clifton does a great job going into more detail on this topic in his book.

Back to the question at hand: I’ve found hosting companies are notorious for providing “plain vanilla” stats packages.  A “stock” Urchin profile reports on the web server logfiles, NOT the JavaScript tags and cookies that Google Analytics and Urchin UTM reporting are based on.  This is a totally different way to analyze usage; it’s useful for server operators, but it has little to do with the reality of website traffic from real people.

Understanding Data Sources

The server log contains ALL hits, whether from humans or non-humans.  In my experience, most logs are made up of roughly 60% non-human traffic: search engine robots, content-scraping bots, etc.  An improperly configured Urchin profile will report all of this as the same, identifying sessions based on simple IP address + User-Agent combinations of hits within a 30-minute window.  The result is usually a much larger number of reported “sessions” than is actually happening.
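To make the mechanics concrete, here is a minimal sketch of how a log-based tool might build “sessions” this way.  The field layout and sample data are illustrative assumptions, not Urchin’s actual implementation; the point is that every hit counts, bots included, and a 30-minute gap starts a new session.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def count_sessions(hits):
    """hits: iterable of (timestamp, ip, user_agent), sorted by timestamp."""
    last_seen = {}  # (ip, user_agent) -> timestamp of most recent hit
    sessions = 0
    for ts, ip, ua in hits:
        key = (ip, ua)
        # A hit from a never-seen key, or after a >30-minute gap,
        # starts a new "session" -- even if it came from a bot.
        if key not in last_seen or ts - last_seen[key] > SESSION_TIMEOUT:
            sessions += 1
        last_seen[key] = ts
    return sessions

hits = [
    (datetime(2010, 1, 1, 9, 0),  "1.2.3.4", "Mozilla/5.0"),      # human: session 1
    (datetime(2010, 1, 1, 9, 5),  "5.6.7.8", "Googlebot/2.1"),    # bot: session 2
    (datetime(2010, 1, 1, 9, 10), "1.2.3.4", "Mozilla/5.0"),      # within 30 min: still session 1
    (datetime(2010, 1, 1, 9, 50), "1.2.3.4", "Mozilla/5.0"),      # 40-min gap: session 3
]
print(count_sessions(hits))  # 3 sessions, though only one human was here
```

Notice that the single human visitor produced two of the three sessions, and the bot produced the third; scale that up and the inflation becomes obvious.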

Google Analytics, and Urchin when using UTM tags, will report only people visiting your site, since the reporting mechanism depends on the visitor’s web browser executing JavaScript on each page load.  It also uses cookies, so it is precise to the computer/browser rather than relying on the less precise IP address.  These numbers are almost always lower than what you see in server-log-based data, usually at least 50% lower; I’ve seen them as much as 90% lower depending on the site.
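Here’s a toy illustration (with made-up data) of why cookie-based counting and IP-based counting diverge even for purely human traffic: several people behind one office NAT share a single IP, while one person who browses from home and the office shows up as two IPs.

```python
# Hypothetical hits: "cookie" stands in for a per-browser identifier
# like the one a tag-based tool sets; "ip" is what a logfile sees.
hits = [
    {"ip": "10.0.0.1",    "cookie": "alice"},  # office NAT
    {"ip": "10.0.0.1",    "cookie": "bob"},    # same IP, different person
    {"ip": "10.0.0.1",    "cookie": "carol"},  # same IP, third person
    {"ip": "192.168.1.5", "cookie": "alice"},  # alice again, from home
]

unique_by_ip = len({h["ip"] for h in hits})          # 2 "visitors"
unique_by_cookie = len({h["cookie"] for h in hits})  # 3 actual people
print(unique_by_ip, unique_by_cookie)
```

Neither count is wrong; they’re just answering different questions, which is the theme of this whole post.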

Truth is Rather Gray

It’s not that IP + User-Agent data is “wrong” per se, but it must be interpreted in context.  If it were up to me, it would be a crime for hosting companies not to make this clearly known: you’ve been thinking you were reporting “people who visit the site” when in reality you were vastly over-reporting that number, because the bots were probably included.

All this to say, a disparity is common.

The Solution

If you have questions about the quality of your data, I recommend conducting a full audit of it.  Sometimes, if your GA/UTM tags aren’t placed on all parts of your site, the data will be falsely low because it won’t be complete.  To unravel the knot and explain it to executives, you’ll need to show why the numbers have changed, explain what goes into the numbers from each system, and help guide the transition to better data.

If you still have your old server logfiles around and your Urchin profile allows filtering and re-processing, you can take some measures to filter out bots, but the numbers still won’t line up well.  Think of it like measuring the height of your desk in centimeters vs. inches: same desk, different scale.
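For the bot-filtering step, a rough sketch looks like the following.  The signature list is my own assumption and only catches well-behaved crawlers that identify themselves in the User-Agent string; stealth bots will still slip through, which is one more reason the two tools will never match exactly.

```python
# Substrings commonly found in self-identifying crawler User-Agents.
# This list is illustrative, not exhaustive.
BOT_SIGNATURES = ("bot", "crawler", "spider", "slurp", "fetch")

def is_probably_human(user_agent):
    """Crude heuristic: empty or bot-flagged User-Agents are non-human."""
    ua = user_agent.lower()
    return bool(ua) and not any(sig in ua for sig in BOT_SIGNATURES)

log_user_agents = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",   # likely human
    "Googlebot/2.1 (+http://www.google.com/bot.html)",   # crawler
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",            # crawler
]

human_hits = [ua for ua in log_user_agents if is_probably_human(ua)]
print(len(human_hits))  # only 1 of the 3 hits survives the filter
```

Even this crude filter knocks out two-thirds of the sample hits, which tracks with the roughly 60% non-human share mentioned earlier.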

A few tools you can use to help audit your data:

  1. ObservePoint’s tag auditing tool
  2. Analytics HealthCheck for checking the integrity of your Google Analytics data
  3. Get a free demo of the Urchin software and run your server logs through it
  4. Get an expert to help – there are over 150 certified companies for Google Analytics worldwide, including yours truly

I hope this helps and look forward to your comments and questions on this topic!

