Search Engine People Blog

Search Engine Test Case: New Data, New Findings

Three months ago, Joel Bresler launched a web site presenting a cultural history of the American folk song, "Follow the Drinking Gourd." His web site presents an interesting test case for search engine performance.

We covered his preliminary findings on March 16th, with a discussion of how long it takes the major search engines to find and present new content. We're back for a more extensive look at the new information Joel has unearthed. Thanks to Joel, first for doing the research, and second for allowing me to summarize it here.

We presented the following background information back in March.

  1. There has been virtually no new research on Follow the Drinking Gourd in twelve years.
  2. The site represents the most comprehensive research published on the song.
  3. Some web pages that ranked highly on a "follow the drinking gourd" search were removed in February.

The entire article may be found here and the details are below.

>> See how quickly the engines found the site and how its rank changed after launch

As we wrote earlier, Yahoo and MSN ranked the site in the top 10 within three weeks of launch.

UPDATE: Google has finally done the same, eleven weeks later, jumping inexplicably from 35th on 19 April to 4th on 21 April. The site is now a top 5 result on all three engines for a "follow the drinking gourd" search (without the quotes).

>> How much of the site is covered (indexed)?

NEW FINDING: A significant percentage of the pages on the site that Google reports caching _are not actually searchable_. In other words, it is possible to navigate to a page from the site in the Google cache, select a unique string of text from that page and then search for that same text in Google, and Google will not find it! Strange but true; the screen shots are here. At various points, these unsearchable pages accounted for more than half the site. By contrast, when either Yahoo or MSN cached a page, its contents were always searchable.

BOTTOM LINE: the percentage of pages actually searchable in Google is dramatically lower than in Yahoo or MSN.

>> How often was the site visited and its pages indexed?

NEW FINDING: On average, Google lags the other two engines significantly in caching the site's content. The average site-wide cache lag for Google is 30 days, vs. 15 for Yahoo and 10 for MSN.
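The summary does not say exactly how the site-wide cache lag was calculated. One plausible reading is the mean number of days between the date each page was checked and the date shown on the engine's cached copy. Below is a minimal Python sketch under that assumption; the function name and the dates are hypothetical, chosen only to illustrate the arithmetic, and are not Joel's data.

```python
from datetime import date


def average_cache_lag(cache_dates, checked_on):
    """Mean age, in days, of the cached copies as of the day they were checked.

    Assumption: "cache lag" for one page is the number of days between the
    check date and the date on the engine's cached copy of that page.
    """
    lags = [(checked_on - cached).days for cached in cache_dates]
    return sum(lags) / len(lags)


# Hypothetical cache dates for three pages, all checked on the same day.
example_cache_dates = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 21)]
example_check_date = date(2024, 1, 31)

# Individual lags are 30, 20 and 10 days.
print(average_cache_lag(example_cache_dates, example_check_date))  # 20.0
```

Running the same calculation once per engine over the same set of pages would give figures comparable to the 30, 15 and 10 day averages reported above.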

>> How quickly did the search engines drop other sites' pages once they were removed?

NEW FINDING: Yahoo did a great job delisting pages that had been removed from the web. Google did a fine one. It's now over two months since these pages were removed, and they are STILL APPEARING IN MSN SEARCH RESULTS. This is even though MSN's bot has "known" for over a month that these pages were gone. (Meaning, the cached pages were removed way back in March.) MSN's performance in clearing away outdated listings can only be characterized as abysmal. Score another one for Yahoo.

>> A Cautionary Tale About Electronic Reprints

NEW FINDING: According to Yahoo and MSN, the first choice on the web for a "follow the drinking gourd" search is the same page from NASA. Curiously, until recently this important page didn't show up at all on Google. It turns out the NASA content was identical to a page from an elementary school that used NASA material, with permission. Since Google rated the elementary school's page more highly, and since the two pages' content was the same, Google didn't even present a link to NASA's page.

>> How Volatile were the Search Engine Ratings?

NEW FINDING: On April 6th, the site ranked 26th for a Google search on "Follow the Drinking Gourd" (without the quotes). The next afternoon, it was 44th. The next evening, it was 29th. These are fairly volatile shifts for an obscure corner of the web, one where the other relevant pages are changing very little, if at all. The average shift that day among the top 50 sites was a small change in rank of just two places. The site's 18-place swing was twice that of any other site in the top 50.
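To make the arithmetic behind the 18-place figure concrete, here is a small Python sketch. It uses only the three rank observations quoted above (26th, 44th, 29th). The helper names are hypothetical, and defining "swing" as the gap between the best and worst observed rank is an assumption about how the figure was derived.

```python
def rank_swing(ranks):
    """Gap between the worst and best rank seen across the observations."""
    return max(ranks) - min(ranks)


def step_changes(ranks):
    """Change in rank between consecutive observations (positive = dropped)."""
    return [later - earlier for earlier, later in zip(ranks, ranks[1:])]


# The three Google ranks reported for the site, in the order observed.
observed_ranks = [26, 44, 29]

print(rank_swing(observed_ranks))    # 18
print(step_changes(observed_ranks))  # [18, -15]
```

The same two helpers could be applied to each of the other top 50 pages to compare swings, though those individual ranks are not given in this summary.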

>> Charting the Leading Search Engines on Various Performance Parameters

SUMMARY: As might be expected, no single search site leads on every factor. Still, Joel was impressed with both Yahoo's and MSN's performance, with the edge going to Yahoo. Both sites do a significantly better job than Google at showcasing new information and keeping site-wide content current. Yahoo search results were much less volatile than Google's. Google users should strongly consider supplementing their search results with those from Yahoo or MSN.