This post is part of an ongoing series: How Search Really Works.
Last week: The Compressed Index.
While human beings can scan a page and see if the whole phrase "a grandiloquent dictionary" appears on it, a search engine can’t.
A search engine needs to:
Because a search engine isn’t smart, it needs to work smart.
By storing how often each word appears across the whole index, we can immediately narrow the search down to the smallest set of documents from which to draw results.
Instead of selecting the 15,570,000,000 documents in which "a" occurs and then checking which of them also contain "grandiloquent" and "dictionary", we can immediately limit the set to the 222,000 documents that contain the relatively rare "grandiloquent".
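To make that concrete, here is a minimal sketch in Python, assuming a toy in-memory index that maps each word to the set of documents containing it. The names `build_index`, `order_by_rarity` and `candidates` are made up for this illustration; a real engine stores its index and its frequencies very differently. The idea is simply to look up the rarest word first and shrink the set from there.

```python
from typing import Dict, List, Optional, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of document ids that contain it (toy index)."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def order_by_rarity(index: Dict[str, Set[int]], query: str) -> List[str]:
    """Sort the query words so the one found in the fewest documents comes first."""
    terms = query.lower().split()
    # A word that is not in the index at all counts as frequency 0.
    return sorted(terms, key=lambda term: len(index.get(term, ())))


def candidates(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return the documents that contain every query word, rarest word first."""
    result: Optional[Set[int]] = None
    for term in order_by_rarity(index, query):
        postings = index.get(term, set())
        # Start from the smallest set, then only ever make it smaller.
        result = set(postings) if result is None else result & postings
        if not result:
            break  # no document can match any more, so stop early
    return result or set()


# Hypothetical sample data, just for the illustration.
docs = {
    1: "a grandiloquent dictionary",
    2: "a dictionary of a language",
    3: "a grandiloquent speech",
    4: "a cat",
    5: "the dictionary",
}
index = build_index(docs)
print(order_by_rarity(index, "a grandiloquent dictionary"))
print(candidates(index, "a grandiloquent dictionary"))
```

Note that this only narrows things down to documents containing all three words. Checking that they appear next to each other as a phrase is a separate step, which now has far fewer documents to look at.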
[…] How Search Really Works: Simple Query Optimization […]
[…] This post is part of an ongoing series: How Search Really Works. Previously: Simple Query Optimization. […]
March 24th, 2008 at 10:24 am
I wish I wrote this! Nice work explaining a techie concept Ruud!
Next up, I’d love to see you explain query-dependent and query-independent stuff, because I don’t understand that very well, personally. My understanding is limited to some factors being processed ahead of time (prior to the search occurring, and being general relevance factors like PR and domain trust/age) and others being calculated on the fly (intitle, inanchor etc.)
March 24th, 2008 at 10:24 am
Oh, and - Sphunn!
March 24th, 2008 at 2:14 pm
Thanks Gab. I can’t promise your topic is “next up” but I do have a whole slew of posts still to go!
April 13th, 2008 at 1:30 am
I haven’t been able to catch up on your posts for a while but I’m doing so now. I hope you keep up this series for a while.
April 13th, 2008 at 9:51 am
Happy to see you like it Jordan. Thanks for adding me on Twitter, by the way!
I hope to keep the series going for a while, yes.