How Search Really Works: Grabbing Most Red M&M's

This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (2)

Instead of painstakingly grabbing the absolute best matches for your query and then ranking them with infinite precision, one time-saving strategy has search engines go for "close enough".

Painstaking Precision

Given all the time, money and resources in the world, here's what we'd normally do.

Word by word, you go through the search query. You look in your documents and see which ones contain word one, word two, word three... You get the picture.

You award points every time a searched word appears in a document. How many points? That depends on the TFxIDF score for that specific word.
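To make the point-giving concrete, here is a minimal sketch of one common TFxIDF variant (raw term count times the log of inverse document frequency). The toy corpus and the names `DOCS`, `idf` and `tf_idf` are made up for illustration; real engines use their own, usually more refined, weighting.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
DOCS = {
    "d1": "red candy in a bowl",
    "d2": "red red candy",
    "d3": "search engines rank documents",
}

def idf(term, docs):
    """Inverse document frequency: rarer words are worth more points."""
    df = sum(1 for text in docs.values() if term in text.split())
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc_id, docs):
    """Points one document earns for one searched word."""
    tf = Counter(docs[doc_id].split())[term]  # how often the word appears
    return tf * idf(term, docs)

# "red" appears twice in d2, once in d1 and never in d3.
points = {doc_id: tf_idf("red", doc_id, DOCS) for doc_id in DOCS}
```

With this corpus, d2 earns the most points for "red", d1 earns fewer, and d3 earns none.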

You combine those per-word scores into one set of weights for the document, again as a measure of relevance. Do the same for the query itself (basically treating it like a very short document). In short: you're calculating their vector space scores.

Measure the mathematical similarity between document 1 and the query, document 2 and the query, document 3 and... Yup.
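The post doesn't name the similarity measure. A usual choice in the vector space model is cosine similarity, so this sketch assumes that. The term weights below are invented numbers, not real TFxIDF output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical weights: the query is treated as a very short document.
query = {"red": 1.0, "candy": 1.0}
doc_vectors = {
    "d1": {"red": 1.0, "candy": 1.0, "bowl": 2.0},
    "d2": {"red": 2.0, "candy": 1.0},
    "d3": {"search": 2.0, "rank": 1.0},
}

# One similarity calculation per document: this is the expensive part.
scores = {doc_id: cosine(query, vec) for doc_id, vec in doc_vectors.items()}
```

Note that every document gets compared against the query, even d3, which shares no words with it and ends up with a score of zero.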

And then you can't just slap all those on the screen. You have to tailor the results to the searcher's need: pick the top-scoring documents and sort them!

Now, you could sort all your scored documents at once, or just go for the number of top documents needed: say the first or next 10, because the searcher has that set as the maximum results per page.

Instead of doing this HUGE sorting routine, you throw all the values together in a big black hat (computer scientists call this a "heap"), pull out the top 10 documents or so, and only then sort them.
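In Python the big black hat comes with the standard library: `heapq.nlargest` keeps a small heap while scanning the scores and returns only the best few, already sorted. The scores and the `top_k` helper below are hypothetical, just to show the idea.

```python
import heapq

def top_k(scores, k=10):
    """Return the k best (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Hypothetical similarity scores for four documents.
scores = {"d1": 0.58, "d2": 0.95, "d3": 0.0, "d4": 0.71}

best = top_k(scores, k=2)
```

For n scored documents and k results, this takes roughly n log k steps instead of the n log n of a full sort, which matters when n is in the millions and k is 10.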

Perceived Relevance

What struck me as funny is that this high-precision, high-cost way of doing things doesn't necessarily get you the most bang for your buck: the best quality results for your searcher's patience.

No, the mathematical similarity between our search and those documents is only something we perceive as relevant.

That's a low payback for such a high cost: comparing a huge number of documents, calculating mathematical similarities, grabbing the top of the heap...

A search engine can put that perception of relevance to use, though, by going for "good enough".

The Inexact Sort Of Top 10-ish Documents You Might Want

Instead of calculating the top 10 with high precision, why not grab a bundle of documents that will most probably be in that top 10?

Just grab a bunch of documents that are in the race to be the answer to the searcher's query and take the top 10 of that bunch!

Even though this top 10 is not The Top 10 we would have found using our Painstaking Precision method, it will contain many documents that would have been in that top 10 or near it.
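The post doesn't say how that bunch gets picked. One well-known way, assumed here purely for illustration, is "champion lists": ahead of time, keep for every word only the few documents that score best for it, and at query time score nothing but those. The index, the weights and the function names are all hypothetical.

```python
import heapq

# Hypothetical index: word -> {doc_id: weight of that word in the document}.
INDEX = {
    "red": {"d1": 0.2, "d2": 0.9, "d3": 0.7, "d4": 0.1},
    "candy": {"d1": 0.8, "d3": 0.6, "d5": 0.3},
}

def champion_lists(index, r):
    """Precompute, per word, the r documents with the highest weight."""
    return {
        term: set(heapq.nlargest(r, postings, key=postings.get))
        for term, postings in index.items()
    }

def approximate_top_k(query_terms, index, champions, k):
    """Score only documents on a champion list, then take the top k."""
    candidates = set()
    for term in query_terms:
        candidates |= champions.get(term, set())
    scores = {
        doc: sum(index[t].get(doc, 0.0) for t in query_terms if t in index)
        for doc in candidates
    }
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

CHAMPIONS = champion_lists(INDEX, r=2)
result = approximate_top_k(["red", "candy"], INDEX, CHAMPIONS, k=2)
```

Here d4 and d5 are never scored at all. With this toy data the shortcut happens to return the same two documents an exact ranking would, but nothing guarantees that in general: that's the "top 10-ish" trade-off.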

It's like having a bowl of M&M's and wanting to eat the red ones. You could painstakingly sort them all and then go for the red ones... or you could grab from the area where most of the red ones seem to be.

In order of appearance, images courtesy of westpark, Irina Souiki and jacalynsnana

About the Author: Ruud Hein

I love helping to make web sites make it. From the ground up if needed. CSS challenges, server-side scripting, user and device friendly JavaScript tricks search engines have no problems with. Tracking how the sites perform and then figuring out how to make that performance and the tracking better. I'm passionate about information. No matter how often I trim my feeds in my feed readers (yes, I use more than one), I always have a couple of hundred in there covering topics ranging from design to usability, from SEO to SEM, from life hacks to productivity blogs, from.... Well, you get the idea, I guess. Knowledge and information management is close to my heart. Has to be with the amount of information I track. My "trusted system" is usually in flux but always at hand and fully searchable. My ~~paid passion~~ job at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. It's good to have you here: pleased to meet you!
