This post is part of an ongoing series: How Search Really Works.
Last week: The Compressed Index.

While human beings can scan a page and see if the whole phrase "a grandiloquent dictionary" appears on it, a search engine can't.

A search engine needs to:

  1. Look up the occurrences for each word in the phrase
  2. See if the positions of words in the document fit the phrase
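
To make those two steps concrete, here is a minimal sketch in Python. It assumes a toy positional index, one that remembers at which word positions each word occurs in each document. The function names, the three sample documents, and the whitespace tokenizing are all made up for illustration; a real search engine stores and scans its index very differently.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (hypothetical toy index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_match(index, phrase):
    """Return the ids of documents that contain the exact phrase."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []

    # Step 1: look up each word and keep only documents containing all of them.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])

    # Step 2: check that the words sit at consecutive positions.
    hits = []
    for doc_id in sorted(candidates):
        following = [set(index[w][doc_id]) for w in words[1:]]
        for start in index[words[0]][doc_id]:
            if all(start + i + 1 in positions
                   for i, positions in enumerate(following)):
                hits.append(doc_id)
                break
    return hits

# Made-up sample documents, for illustration only.
docs = {
    1: "a grandiloquent dictionary is a fine thing",
    2: "a dictionary for a grandiloquent speaker",
    3: "a plain dictionary",
}
index = build_positional_index(docs)
matches = phrase_match(index, "a grandiloquent dictionary")
print(matches)
```

Document 2 contains all three words but not next to each other in the right order, so only the position check in step 2 rules it out.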

Because a search engine isn't smart, it needs to work smart.

Leverage Keyword Frequency


By storing how frequently each word appears in the whole index, we can cut down right away to the smallest set from which to draw results.

Instead of selecting the 15,570,000,000 documents in which "a" occurs and then checking which of them also contain "grandiloquent" and "dictionary", we can immediately limit the set to 222,000 documents: those that contain the relatively rare "grandiloquent".
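
The idea of starting with the rarest word can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the postings (which documents contain which word) are tiny made-up sets, the stored frequency is simply the number of documents per word, and the names `order_by_frequency` and `intersect_rarest_first` are hypothetical.

```python
def order_by_frequency(terms, doc_freq):
    """Sort query terms from rarest to most common."""
    return sorted(terms, key=lambda term: doc_freq.get(term, 0))

def intersect_rarest_first(terms, postings, doc_freq):
    """Intersect document sets, starting from the smallest one."""
    ordered = order_by_frequency(terms, doc_freq)
    if not ordered:
        return set()
    result = set(postings.get(ordered[0], ()))
    for term in ordered[1:]:
        if not result:
            break  # nothing left to match, so stop early
        result &= postings.get(term, set())
    return result

# Made-up postings: word -> ids of the documents that contain it.
postings = {
    "a": {1, 2, 3, 4, 5, 6},
    "dictionary": {2, 3, 6},
    "grandiloquent": {2, 5},
}
# The stored frequency per word (here: how many documents contain it).
doc_freq = {term: len(ids) for term, ids in postings.items()}

query = ["a", "grandiloquent", "dictionary"]
order = order_by_frequency(query, doc_freq)
result = intersect_rarest_first(query, postings, doc_freq)
print(order)
print(result)
```

Because the candidate set can only shrink with each intersection, beginning with the two documents for "grandiloquent" means the very common "a" is only ever checked against that small set.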

About the Author: Ruud Hein

I love helping to make web sites make it. From the ground up if needed. CSS challenges, server-side scripting, user and device friendly JavaScript tricks search engines have no problems with. Tracking how the sites perform and then figuring out how to make that performance and the tracking better. I'm passionate about information. No matter how often I trim my feeds in my feed readers (yes, I use more than one), I always have a couple of hundred in there covering topics ranging from design to usability, from SEO to SEM, from life hacks to productivity blogs, from.... Well, you get the idea, I guess. Knowledge and information management is close to my heart. Has to be with the amount of information I track. My "trusted system" is usually in flux but always at hand and fully searchable. My paid passion job at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. It's good to have you here: pleased to meet you!