This post is part of an ongoing series: How Search Really Works.
Previously: Simple Query Optimization.

At its most basic, search is boolean: yes or no. True or false.

Either the words are in the document or not.
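To make that concrete, here is a minimal sketch of a boolean match in Python. The function name `matches`, the whitespace tokenization, and the three sample documents are all made up for illustration; a real engine would look terms up in an inverted index rather than scan text.

```python
def matches(query: str, document: str) -> bool:
    """Return True only if every query term occurs in the document."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    doc_terms = set(document.lower().split())
    return all(term in doc_terms for term in query.lower().split())


# Hypothetical mini-collection, invented for this example.
docs = {
    "a": "zone scoring ranks search results",
    "b": "a post that mentions search once",
    "c": "a recipe for lemon cake",
}

# Every document is either in or out; there is no notion of "better".
hits = [doc_id for doc_id, text in docs.items() if matches("search", text)]
print(hits)
```

Documents "a" and "b" both come back, and nothing in the result says which of the two is the better answer.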

[Figure: boolean-search]

But as you see, not all documents are "born alike". Some are about our topic, some just mention it.

What we need, what we want, is not just a big list of results — we want a relevant list of results, preferably sorted so that the best bet appears on top.

Boolean Zone Scoring

Zone scoring gives each zone of the document a multiplier (a weight) and calculates the "relevance" of a document from the zones in which our search term occurs.

The document zones we're all familiar with are header/title, body/content, and footer.

[Figure: boolean-zones]
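Here is a rough sketch of how a zone score could be computed. The weights (0.5 for title, 0.3 for body, 0.2 for footer) are invented for this example, as are the names `ZONE_WEIGHTS`, `zone_contains` and `zone_score`. Each zone is still a plain boolean match; the score is the sum of the weights of the zones that matched.

```python
# Assumed weights for illustration only; real weights are tuned, not guessed.
ZONE_WEIGHTS = {"title": 0.5, "body": 0.3, "footer": 0.2}


def zone_contains(query: str, text: str) -> bool:
    """Boolean match of all query terms against one zone's text."""
    terms = set(text.lower().split())
    return all(term in terms for term in query.lower().split())


def zone_score(query: str, doc: dict, weights: dict = ZONE_WEIGHTS) -> float:
    """Sum the weights of every zone in which the query matches."""
    return sum(
        weight
        for zone, weight in weights.items()
        if zone_contains(query, doc.get(zone, ""))
    )


# Two hypothetical documents: one about the topic, one that just mentions it.
about = {"title": "zone scoring explained", "body": "how zone scoring works", "footer": "contact us"}
mention = {"title": "my week in review", "body": "nothing new", "footer": "related: zone scoring"}

ranked = sorted([("about", about), ("mention", mention)],
                key=lambda pair: zone_score("scoring", pair[1]),
                reverse=True)
print([name for name, _ in ranked])
```

Both documents pass the boolean test, but the one with the term in its title and body outranks the one that only mentions it in the footer.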

These weights are generally machine-learned by running test queries on a clean, non-spammed, non-gamed index. Human relevance judges rate how relevant the test results are, and the weights are tuned until the scores agree with those ratings as closely as possible.
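As a toy illustration of that tuning step, the sketch below learns a single weight `g` for the title zone (the body zone gets `1 - g`) by trying candidate values and keeping the one with the smallest squared error against the judges' ratings. The judgments are fabricated for this example, and the grid search is a simple stand-in, not a description of how any particular search engine trains its weights.

```python
def score(g: float, in_title: int, in_body: int) -> float:
    """Two-zone score: weight g for the title, 1 - g for the body."""
    return g * in_title + (1 - g) * in_body


def total_error(g: float, judgments) -> float:
    """Squared difference between our scores and the judges' ratings."""
    return sum((score(g, t, b) - relevant) ** 2 for t, b, relevant in judgments)


def learn_title_weight(judgments, steps: int = 100) -> float:
    """Try g = 0.00, 0.01, ..., 1.00 and keep the value with the least error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda g: total_error(g, judgments))


# Made-up judgments: (term in title?, term in body?, judged relevant?)
judgments = [
    (1, 1, 1),
    (1, 0, 1), (1, 0, 1), (1, 0, 0),
    (0, 1, 0), (0, 1, 0), (0, 1, 1),
    (0, 0, 0),
]

g = learn_title_weight(judgments)
print(f"title weight: {g:.2f}, body weight: {1 - g:.2f}")
```

In this invented data, a title match is judged relevant more often than a body match, so the learned title weight ends up above one half.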

Next week: Term Weight Scoring