
Is Google a Scraper Website?

Remember when Google released the Panda algorithm?

This was when content penalties were introduced, aimed primarily at eliminating scraper websites that would rip content from one site and republish it as their own. The practice was especially frowned upon because these scraper sites would often rank higher for that content than the original site or author. Since then, Google has made it clear that if you have duplicate content - essentially, if you are copying from some other source - you are going to feel its wrath.

Why, then, is it acceptable for Google to do the very thing it tells everybody else not to do? Consider what happens when you search for "search engine marketing": at the top of the results page, Google displays a definition pulled straight from Wikipedia.

This is, by definition, what a scraper website does. Google has taken content directly from Wikipedia and put it on its own search results page. And that content ranks higher than Wikipedia itself, the original author. How is this any different from what Google penalizes other sites for doing?

It could be said that Google is giving attribution where it's due, but this is a tenuous argument. Other websites that get penalized for duplicate content may very well provide attribution, but the fact is, they are still offering up duplicate content. And Google is doing the same, only nobody is holding it accountable.

The User Experience Argument

Content taken from other sites is becoming more prominent as Google increases the number of web definitions (such as the example above), direct answers, and Knowledge Graph results it provides on its search results pages.

In Google's eyes, this is a way to better serve its users with relevant results. And to be fair, the argument has some validity: sometimes people just need a quick answer, and admittedly, I often find this way of presenting results useful.

But that doesn't mean it isn't hypocritical. The user experience argument looks like a convenient, self-serving excuse that lets Google say, "it's OK for us, but not for anybody else." Presenting this kind of information directly on the results page may indeed be useful, but the fact remains that it is scraped content - and Google penalizes scraped content.

A Breach Of Contract?

Perhaps more problematic is that this practice can significantly harm websites that rely on Google as a source of traffic. The unspoken agreement between websites and search engines rests on a mutually beneficial exchange: the search engine fills its results pages with content from websites, and those websites benefit from the traffic that being listed sends their way. Direct answers, definitions, and Knowledge Graph features breach this agreement, because placing such content directly on SERPs eliminates the need for users to click through to the original page. As a result, the sites the content came from never receive the traffic.

This has a variety of implications. When original publishers lose the traffic coming in from search engines, they lose the opportunity to convert that traffic into leads, generate advertising revenue, build brand recognition among users, and develop relationships with visitors. Essentially, the very reason publishers wanted to be listed in SERPs in the first place is being circumvented.

Improved user experience or not, that doesn't sound like a fair exchange to me.