Analytics Pros

Excluding Robots, Bots, Spiders & Crawlers from Logfile data with Urchin 6

"60,000 out of 73,000 file downloads were from bots"

Web analytics based on web server logfile data, processed with Urchin 6, provides some very useful information that Google Analytics and other tag-based web analytics solutions simply can't deliver. However, one of the biggest challenges with logfile-based analytics is pollution of the data by "bots": search engine robots, crawlers, spiders, scrapers, and so on. I've found that the data generated by non-human activity can easily account for 60% of the hits in your logfiles, and if you don't exclude it, the reports built on your server logs can be off by that margin. Typical installations of popular logfile analysis tools like Webtrends, AWStats, and Webalizer, and even most Urchin installations, won't exclude robot-generated data by default.

Old vs. New Filtering Options

I was recently working on a client project using Urchin 6 in our cloud-based hosting environment and needed to process some old logfile data to cross-analyze and validate Google Analytics data. Out of 73,000 hits for download files in the log data, 60,000 were from bots. That's a problem! So, I thought, "I have to exclude all those bots". Versions of Urchin prior to 6.6 have always had a "robots report", but no easy way to exclude robots. Well, I took a look at the filtering options in our hosted version of Urchin 6.6.02 and found a convenient filter field called "robot_agent". This field contains the user-agent for hits that were generated by a bot. Nice!
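To get a feel for how large the bot share is in your own logs before touching any filters, a rough count can be sketched in Python. This is only an illustration: it assumes logs in the Apache Combined Log Format, and the `BOT_HINTS` substring list, `is_bot`, and `bot_share` are hypothetical names made up for this sketch. It is not how Urchin itself identifies robots.

```python
import re

# Combined Log Format: the user agent is the last quoted field on the line.
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical heuristic: substrings that often appear in crawler user agents.
# This is an assumption for illustration, not Urchin's robot detection.
BOT_HINTS = ("bot", "crawl", "spider", "slurp")


def is_bot(agent: str) -> bool:
    """Return True if the user agent contains one of the hint substrings."""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def bot_share(lines):
    """Return (bot_hits, total_hits, bot_fraction) for parseable log lines."""
    total = bots = 0
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        total += 1
        if is_bot(match.group("agent")):
            bots += 1
    return bots, total, (bots / total if total else 0.0)


# Two made-up sample hits: one crawler, one browser.
sample = [
    '66.249.66.1 - - [10/Mar/2010:10:00:00 -0800] "GET /files/guide.pdf HTTP/1.1" '
    '200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [10/Mar/2010:10:00:05 -0800] "GET /files/guide.pdf HTTP/1.1" '
    '200 1234 "http://example.com/" "Mozilla/5.0 (Windows; U; Windows NT 6.1) Firefox/3.6"',
]

print(bot_share(sample))

# The figures from this project: 60,000 bot hits out of 73,000 download hits.
print(f"{60000 / 73000:.1%}")  # 82.2%
```

A substring heuristic like this will miss bots that disguise their user agent, so treat the result as a lower bound rather than an exact figure.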

Creating the anti-bot filter

So, I created a simple filter: exclude all hits where "robot_agent" matches ".*" (i.e. any value). After applying the filter and re-processing the data, the reports were now completely free of bot-generated data. (Yep, re-processing: you can't do that with Google Analytics! That's one reason I love backing up my Google Analytics data to our analytics data warehouse.)
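The logic of that exclude filter can be sketched outside Urchin in a few lines of Python. This is a minimal illustration under stated assumptions: hits are represented as plain dictionaries, and `exclude_robots` is a hypothetical helper; only the field name "robot_agent" and the pattern ".*" come from the filter described above. Note that in plain regular-expression terms ".*" also matches an empty string, so the sketch explicitly treats an empty or missing "robot_agent" as a human hit. Whether Urchin handles empty fields the same way internally is an assumption here.

```python
import re

# The pattern used in the filter: matches any value.
ANY_VALUE = re.compile(r".*")


def exclude_robots(hits):
    """Keep only hits whose robot_agent field is empty or missing.

    Assumption for this sketch: a hit is a dict, and robot_agent is
    populated only when the hit was identified as coming from a bot.
    """
    kept = []
    for hit in hits:
        agent = hit.get("robot_agent") or ""
        if agent and ANY_VALUE.fullmatch(agent):
            continue  # exclude: the field holds some value, so it is a bot hit
        kept.append(hit)
    return kept


# Made-up example hits.
hits = [
    {"path": "/files/guide.pdf", "robot_agent": "Googlebot/2.1"},
    {"path": "/files/guide.pdf", "robot_agent": ""},
    {"path": "/index.html"},
]

clean = exclude_robots(hits)
print(len(clean))
```

The explicit emptiness check is the design choice worth noticing: without it, a match-anything pattern would exclude every hit, not just the robotic ones.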

 

[Screenshot: Urchin 6 "exclude all robots" filter]

Nice, clean, pristine logfile data without bot pollution!

Next... analyzing and making sense of all the data.

