Gmail and Web Spam

by Ruud Hein April 2nd, 2007 

I can't help but think that one reason Google offers Gmail is spam, or rather, to help them learn and test how to get rid of spam.

Yahoo had email, Microsoft had email. Google didn't, now they do. And Google learned a lot about getting rid of spam in a short time.

My normal Gmail account receives hardly any spam. An account I set up specifically to be spammed receives 5000 spam messages per week. Over 90% of it gets caught in Gmail's spam filter.

Learning about spam, and how to recognize it for what it is, is interesting: apart from capturing a bunch of URLs you can leave out of your search engine index, it can help you learn to get rid of search engine spam.

"If you can't solve search engine spam, and the other guy does, your search engine is done at this point"
– Bill Yerazunis, chairman of the annual MIT Spam Conference.

True, search engine spam is a somewhat different beast. There is the actual spammy page. There is the page which gets boosted for a while by a whole bunch of spammy links. But in the end it is the same. You end up at something which is not relevant at all.

A list of stop words simply doesn't cut it anymore. You have to look at the reason, the intent. A post about using stop words is something different from a page selling stop-word-related products.
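
To make that concrete, here is a minimal sketch in Python. The word lists, the sample texts, and the function names are all made up for illustration; this is not how Gmail or any search engine is known to work. It only shows that a plain word list flags an article about a word and a page selling something in exactly the same way, while even a crude look at selling-related words starts to tell them apart.

```python
import re

# Hypothetical blocklist; the words are illustrative assumptions.
STOP_WORDS = {"casino", "poker", "pills"}

# Hypothetical cues hinting that a page is trying to sell something.
COMMERCIAL_CUES = {"buy", "order", "discount", "cheap", "price"}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def flagged_by_word_list(text):
    """Naive filter: flag any text containing a blocklisted word."""
    return any(word in STOP_WORDS for word in tokens(text))


def commercial_cue_count(text):
    """Very rough intent signal: count selling-related words."""
    return sum(1 for word in tokens(text) if word in COMMERCIAL_CUES)


article = "How a spam filter treats the word casino, and why word lists fail."
sales_page = "Buy casino chips now! Cheap price, order today for a discount."

# The word list cannot tell the two apart: both are flagged.
print(flagged_by_word_list(article), flagged_by_word_list(sales_page))

# Counting commercial cues separates them, at least in this toy case.
print(commercial_cue_count(article), commercial_cue_count(sales_page))
```

A real system would need far more than a handful of cue words, but the gap between "contains the word" and "is trying to sell it" is the point.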

For obvious reasons search engines don't share a lot of information about this.

"Think of this. If you were Google and you came up with a solution to solve search engine spam, would you publish it? It's a race among the big three. If Microsoft has solved it, they are going to knock out Google. If Yahoo has it, they are going to knock Microsoft out. The stakes here are billions of dollars,"
– Bill Yerazunis, chairman of the annual MIT Spam Conference.

Yet you can see some progress. Microsoft's adCenter Labs is one of those places where you can have a look in the kitchen.

Their Detecting Online Commercial Intent tool gives you a glimpse of what is already possible. And it's pretty good. It knows that this online casino page is different from this one.

Likewise, its Detecting Sensitive Webpages tool offers a bit of insight.

Bill Slawski has another look into Microsoft's anti-spam kitchen that's worth a read.

I like these Microsoft guys.

Ruud Hein

My paid passion at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. People who know me know I love coffee.
