Microsoft has finally officially commented on the fake traffic they’ve been sending to websites recently.

Recently I commented about how Microsoft Live Search has been stuffing our log files with bot traffic pretending to be human.

Basically they say, “we’ve been doing bad stuff for the last EIGHT months that screws up your metrics and disregards internet respect standards, but it’s OK, because we are going to stop soon.”

If anything, their statement strengthens my mistrust.

At least now I know how far back I can’t trust any traffic from MSN. They offer no solutions for how to scrub our log files (which I’ve already spent time trying to do myself).

Maybe the real problem is one of scale. Google and Yahoo probably do this too, but it’s different. Google and Yahoo (to a lesser extent) send real traffic volumes. Anything like this that Google and Yahoo do ends up being noise that goes unnoticed.

With MSN, these queries are more than just noise; they can end up being the bulk of the MSN traffic coming in. In one case, the bogus queries continually use a keyword phrase that I am trying to optimize for and measure results around.

The only reason I even noticed this problem is I saw a large spike in traffic to that phrase, but could not figure out any ranking change driving the increase. The bogus Microsoft traffic was enough to cut through the Yahoo and Google traffic for that phrase, and lead me down the path to discovering this problem.

At least Microsoft has commented. They have not apologized or offered any solutions. In my mind they’ve only gone as far as saying they are playing dirty pool (but plan to stop soon!). The damage is done. I’ve said my piece. What are your thoughts on the matter?

On one of the large sites I SEO and monitor traffic for, we’ve seen a large bump in search.live.com traffic over the past few months. The keywords used don’t always make sense for the page being landed on, and the keyword phrase is always a single word.

I’ve assumed (shame on me?) that it was a bug somewhere related to search.live.com that is only allowing the first keyword through to our WebTrends Analytics tool. Andrew Urquhart has commented on his similar theory for the problem.

For yesterday’s stats, MSN had a significantly large volume of traffic that just didn’t seem to make any sense. If it’s wrong, it’s no longer statistical background noise, but rather seriously impactful data that puts the legitimacy of the log data into question.

Looking for answers, I directed my browser over to WebmasterWorld to check out the MSN forum. The Strange Referrer Activity post was right near the top. Among other things, this post on Sept. 5th 2007 was interesting, and a major cause for concern:

Thanks for all the feedback on this thread.

First, we appreciate the concerns and issues that have been raised and apologize for any inconvenience this might have caused.

Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.

thanks
- msndude (msd)

That seems to confirm that at least some of this may be bogus traffic. How much is bogus? We have no easy way to know…

That should be very concerning to any webmaster who values the credibility of their log data. If nothing else, that puts thousands of historical search.live.com referrals for the past few months into question.

It concerns me that:

  1. Microsoft does not seem to care about messing with the validity of log files for websites on a global scale.
  2. Microsoft is making it look like websites are getting human traffic from search.live.com when that traffic may in fact come from a Microsoft bot. I’ve heard the phrase “fake it until you make it”, but that should not be a valid search engine market share tactic!
  3. There is no obvious and/or official way to correctly cleanse the corrupted log files of this bogus traffic. (Not to mention the time I need to spend to do it.)
  4. Microsoft has mostly been silent on this issue.
  5. The bot is ignoring the robots.txt rules, and does not identify itself as a bot.
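
Since there is no official way to scrub the logs, here is a rough sketch of the kind of filter I have in mind. It is only a heuristic built on the one pattern I can see: referrals from search.live.com where the keyword phrase is a single word. It assumes the log is in the common "combined" format (referrer and user agent as the last two quoted fields) and that the search phrase rides in a "q" parameter on the referrer URL. The function names are made up for illustration, and it will also flag real people who searched for a single word, so treat the "suspect" pile as something to review, not as proof.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: combined log format, so each line ends with
# "referrer" "user-agent". Only the referrer is captured.
LOG_TAIL = re.compile(r'"(?P<referrer>[^"]*)" "[^"]*"\s*$')


def is_suspect_referrer(referrer):
    """Heuristic: a search.live.com referral whose query is one word."""
    parsed = urlparse(referrer)
    if parsed.hostname != "search.live.com":
        return False
    # Assumption: the search phrase is carried in the "q" parameter.
    phrase = parse_qs(parsed.query).get("q", [""])[0]
    return len(phrase.split()) == 1


def split_log(lines):
    """Split log lines into (kept, suspect) without changing any line."""
    kept, suspect = [], []
    for line in lines:
        match = LOG_TAIL.search(line)
        if match and is_suspect_referrer(match.group("referrer")):
            suspect.append(line)
        else:
            kept.append(line)
    return kept, suspect
```

Run `split_log` over the lines of a log file and compare the size of the two piles per day. If the suspect pile is most of your search.live.com traffic, that at least gives you a number to argue with.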

I don’t follow MSN very much since they don’t send very much traffic, so I don’t really know where to look for these things, but I’ve found no official mention of this other than the WebmasterWorld post.

Frankly, this seems to me to be very unethical of Microsoft on a number of levels. For one thing, many website operators probably think they actually ARE getting an increase in traffic from search.live.com.

If you run a website, this should concern you greatly. We need answers and solutions. Please comment if you have either.