Toll Free: 1-877-695-7388

GTA: (647) 699-2838

Search Engine People

5 Million Spam Pages I Found in a Couple of Hours That Google Has Missed All Week

Donna Fontenot | June 24th, 2006

A friend of mine, Michael VanDeMar (aka mvandemar), alerted me to a situation that highlights the sub-subdomain spam problem that Google "supposedly" fixed, but apparently has not. Here is his research and report.
***
Last Saturday, June 17th 2006, an article was posted on how to get 5 Billion Pages indexed in Google in less than 30 days. The report was based around a series of domains from one particular spammer.

Google responded that the reported counts were simply the result of a bug in the site: command combined with what they were calling a "bad data push". Here’s what Google spokesperson Adam Lasnik, assistant to Matt Cutts, had to say on Tuesday:

I've long been a lurker / occasional commenter for quite some time here, and I figured I might as well offer a few clarifications on the "5 billion" issue :-).

I work with Matt Cutts and other engineers in the Search Quality Team at Google. And yes, we noticed that lots of subdomains got indexed last week -- and sometimes listed in search results -- that shouldn't have been. Compounding the issue, our result count estimates in these contexts was MANY orders of magnitude off. For example, the one site that supposedly had 5.5 billion pages in the index actually had under 1/100,000th of that.

So how did this happen? We pushed some corrupted data with our index. Once we diagnosed the problem, we started rolling the data back and pushed something better... and we've been putting in place checks so that this kind of thing doesn't happen again.

So, according to Google, the original site in question and the bad data had been corrected, and they were well on their way to making sure it didn’t happen again.

I did a little looking around, and found that there actually did seem to be quite a bit of this spam still there. I pointed this out to Adam on Threadwatch, and offered to write a bot to help dig out some of the flotsam I was finding. The offer, much like the numerous spam reports I and many other webmasters have sent in to Google, was of course ignored.

Last night, while running various searches on Google, I noticed far more spam of the same variety than there should have been, especially considering how hard the Google team had worked to assure us that this was merely a minor flaw, easily corrected. So, out of curiosity, I went ahead and wrote the bot for my own use. Coding and running it took right around 2 hours. This was just a rough draft of the bot; it could of course be refined to be much more accurate and to dig deeper, but I just wanted to see what a quick look around would return.

What I found was approximately 10,902,060 pages of spam spread across 157 domains, each with a minimum of 5,000 pages indexed in Google. The vast majority of these domains are less than 2 months old, and some are as recent as 4 days. This comes shortly on the heels of Matt Cutts' response to webmasters who asked why so many of their sites were being deindexed: that what they needed was better quality links to get indexed in Google since Big Daddy.

75 of the domains have more than 55,000 pages indexed, which is about the number of pages that the original domain was said to have. 26 of them have 3-5 times that number of pages.
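
For a sense of where the 55,000 figure comes from, and how counts like these can be tallied, here is a minimal Python sketch. It is not the bot itself, only the adding-up step. The function name `summarize`, its parameters, and the sample domains are made up for illustration; the 5.5 billion and 1/100,000th figures come from Adam's comment quoted above.

```python
# Hypothetical sketch of the tallying step only; not the bot described above.

# Inflated page count that was reported for the original site.
REPORTED_PAGES = 5_500_000_000
# Google said the real figure was "under 1/100,000th of that".
ORIGINAL_SITE_PAGES = REPORTED_PAGES // 100_000


def summarize(page_counts, min_pages=5_000, big_threshold=ORIGINAL_SITE_PAGES):
    """Tally domains whose indexed-page estimate is at least min_pages."""
    kept = {domain: n for domain, n in page_counts.items() if n >= min_pages}
    return {
        "domains": len(kept),
        "total_pages": sum(kept.values()),
        # Domains with more pages than the original site was said to have.
        "over_threshold": sum(1 for n in kept.values() if n > big_threshold),
    }


# Made-up example data (placeholder domains, not taken from the report).
sample = {
    "spam-a.example": 6_200,
    "spam-b.example": 70_000,
    "spam-c.example": 210_000,
    "small.example": 1_200,
}

print(ORIGINAL_SITE_PAGES)   # 55000
print(summarize(sample))     # 3 domains kept, 286200 pages, 2 over the threshold
```

The 5,000-page minimum and the 55,000-page comparison are passed as parameters so the same tally can be rerun with different cutoffs.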

Since Google apparently still relies on blogs such as this to oust spam, instead of finding it on its own, the domains will probably be banned in a few days. For now, however, you can click the link below to view The Spam That Google Couldn’t Find:
http://googlespam.giantshoutbox.com/

-Michael VanDeMar
***
Michael is the owner of Better Mortgage Refinancing, a site which offers hassle-free Mortgage Quotes.

Note from DazzlinDonna to Matt, Adam, and Google gang: I'd really love some no-nonsense comments from you guys about this. And remember, when you give us a comment, don't forget that we aren't your everyday clueless users, so please give us some credit. So, really, why is it that an average guy (sorry, Michael, you are above-average, not to mention a really cute, single, minor programming deity) can whip this up in 2 hours, and yet Google is unable to catch this?

Posted in SEO

3 thoughts on “5 Million Spam Pages I Found in a Couple of Hours That Google Has Missed All Week”

  1. robert paulson says:
    June 24, 2006 at 5:05 pm

    Let’s hope the folks at G have some heavily callused hands, because they’re going to be giving handjobs from dawn ’til dusk for the foreseeable future.

    It’s sad to see someone with the slightly condescending manner of Mr. Cutts (as seen in his public flogging of specific webmasters) be in the position of having to cover for a flawed product.

    Maybe he could venture outside G headquarters to L.A. and pull a bunch of fluffers from production to carry out the handjobs until the algo is straightened around.

Comments are closed.

© Search Engine People Inc. 2023 – Canada’s Top Digital Agency