This post is part of an ongoing series: How Search Really Works.
Previously: Simple Query Optimization.
Search is always boolean: yes or no. True or false.
Either the words are in the document or not.
But as you see, not all documents are "born alike". Some are about our topic, some just mention it.
What we need, what we want, is not just a big list of results -- we want a relevant list of results, preferably sorted so that the best bet appears on top.
Boolean Zone Scoring
Zone scoring uses multipliers (weights) to calculate the "relevance" of each occurrence of our search term, based on which zone of the document it appears in.
The document zones we're all familiar with are header/title, body/content, and footer.
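To make that concrete, here is a minimal sketch in Python. The zone names follow the ones just mentioned; the weight values, the `zone_score` function and the two sample documents are made up for illustration, and a real engine would tokenize text far more carefully than a simple split on whitespace.

```python
# Hypothetical weights: illustrative values only, chosen to sum to 1.
ZONE_WEIGHTS = {"title": 0.5, "body": 0.3, "footer": 0.2}


def zone_score(term, document, weights=ZONE_WEIGHTS):
    """Add up the weight of every zone that contains the term.

    The match inside each zone is still boolean (the term is there or it
    is not); only the weighted sum turns it into a graded score.
    """
    term = term.lower()
    score = 0.0
    for zone, weight in weights.items():
        words = document.get(zone, "").lower().split()
        if term in words:
            score += weight
    return score


# Two made-up documents: one about the topic, one that only mentions it.
doc_about = {
    "title": "Knitting patterns",
    "body": "Free knitting patterns for beginners",
    "footer": "Contact us",
}
doc_mention = {
    "title": "Weekend notes",
    "body": "I tried knitting once",
    "footer": "Contact us",
}

# Sort so the best bet appears on top.
ranked = sorted(
    [("mention", doc_mention), ("about", doc_about)],
    key=lambda pair: zone_score("knitting", pair[1]),
    reverse=True,
)
```

With these assumed weights, the document that has the term in both its title and its body outranks the one that only mentions it in the body.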
These weights are generally machine-learned by running test queries on a clean, non-spammed, non-gamed index. Relevance judges rate how relevant the test results are, and the weights are tuned so that the scores agree with those ratings as closely as possible.
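As a rough idea of what "learning" a weight can look like, the sketch below tries out candidate weights and keeps the one that disagrees least with the judges. Everything in it is an assumption made for the example: it uses only two zones (title and body), the judgments are invented, and a plain grid search stands in for whatever method a real search engine uses.

```python
# Hypothetical judged examples:
# (term in title?, term in body?, judge says relevant?) with 1 = yes, 0 = no
JUDGMENTS = [
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
]


def squared_error(g, judgments):
    """Total squared error when title gets weight g and body gets 1 - g."""
    return sum((g * t + (1 - g) * b - r) ** 2 for t, b, r in judgments)


def learn_title_weight(judgments, steps=100):
    """Grid-search g in [0, 1] and return the weight with the lowest error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda g: squared_error(g, judgments))


title_weight = learn_title_weight(JUDGMENTS)
body_weight = 1 - title_weight
```

On this invented data the title ends up with the larger weight, because the judges marked every title match as relevant but only some of the body-only matches.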
Next week: Term Weight Scoring
Hey man,
that’s a pretty nice post. got to sign up for the RSS feed and stumble :).
I’m looking forward to the rest of the series.
Interesting post. It depends in what context the keyword phrase is being used as well.
Didn’t know that it was called ‘zone scoring’. Thanks for the informative post Ruud
Yep, it depends on the keyword phrase…
Relevance is the reason why many of us prefer del.icio.us to Google when we want good results. Instead of a search term, you use a tag. You can also check which are the “best” results by seeing how many people have linked the same URL to that tag. And you can feed the links for a given tag into your RSS aggregator to check for new content. Maybe we just need to look at alternatives to search sites.
By non-spammed/gamed, you mean NOT like the knitting for grandkids SERPs?
Thanks for the comments and feedback. It always helps to see which posts, or part of a post, resonate.
Wouldn’t search engines use synonymy in conjunction with exact match keyphrasing to weigh relevance?
I guess the ultimate question is could a document rank highly for a long tail phrase merely based off semantically related natural language with no direct keyphrase mention?
@jordan Good questions to which I don’t want to say yes or no; not showing the hand of future posts 🙂 (but psst… in general… no)
Well, now I know better how to search. Thanks for the information.