This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (2)
Instead of painstakingly grabbing the absolute best matches for your query and then ranking them with infinite precision, one time-saving strategy has search engines settle for "close enough".

Given all the time, money and resources in the world, here's what we'd normally do.
Word by word, you go through a search. You look in your documents and see which has word one… word two… word three… You get the picture.
This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (1)
Another way we can assess the relevance of a document is by term weighting.
From the keyword density myth we know that true term weighting is done collection wide.
By looking at the number of documents in the index that a term appears in we can make a measurement of information: how good, how special… how meaningful is this word?
The word "the" would not be special at all: it appears in far too many documents, so its worth would be close to zero.
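One common way to turn this idea into a number is inverse document frequency: the logarithm of the total number of documents divided by the number of documents containing the term. The excerpt does not give a formula, so the sketch below is an assumption about how such a weight could be computed; the function name and the tiny sample collection are made up for illustration.

```python
import math

def inverse_document_frequency(term, documents):
    """Return log(N / df), where N is the number of documents
    and df is the number of documents that contain `term`."""
    containing = sum(1 for doc in documents if term in doc.lower().split())
    if containing == 0:
        return 0.0  # term never seen; this sketch simply gives it no weight
    return math.log(len(documents) / containing)

# Hypothetical three-document collection.
documents = [
    "the engine was found after a search",
    "people use the search engine",
    "the weather is nice",
]

idf_the = inverse_document_frequency("the", documents)          # in every document
idf_weather = inverse_document_frequency("weather", documents)  # in one document
```

In this toy collection "the" appears in every document and gets a weight of zero, while "weather" appears in only one and gets the highest weight.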
This post is part of an ongoing series: How Search Really Works.
Previously: Simple Query Optimization.
Search is always boolean: yes or no. True or false.
Either the words are in the document or not.

But as you see, not all documents are "born alike". Some are about our topic, some just mention it.
What we need, what we want, is not just a big list of results: we want a relevant list of results, preferably sorted so that the best bet appears on top.
This post is part of an ongoing series: How Search Really Works.
Last week: The Compressed Index.
While human beings can scan a page and see if the whole phrase "a grandiloquent dictionary" appears on it, a search engine can't.
A search engine needs to:
Because a search engine isn't smart, it needs to work smart.
This post is part of an ongoing series: How Search Really Works.
Last week: Recognize this index?
Memory is much faster than looking things up.
For a search engine in high demand to serve its users efficiently, it should keep things in memory instead of looking them up on disk.
Traditionally, large-scale search engines keep their complete dictionary in memory and the posting lists on disk.
Obviously the more you can keep in memory and the more information can be read back with one disk action, the better.
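As a rough illustration of that split, the sketch below keeps a small dictionary in memory that maps each term to the position of its posting list in a file, so a lookup costs one seek and one read. The file layout (unsigned 32-bit document IDs written back to back), the function names, and the sample postings are all assumptions made for this example, not a description of any particular engine.

```python
import os
import struct
import tempfile

def build_index(postings, path):
    """Write posting lists to `path` and return the in-memory dictionary:
    term -> (byte offset in the file, number of document IDs)."""
    dictionary = {}
    with open(path, "wb") as f:
        for term, doc_ids in sorted(postings.items()):
            dictionary[term] = (f.tell(), len(doc_ids))
            # Each document ID is stored as a little-endian unsigned 32-bit int.
            f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))
    return dictionary

def lookup(term, dictionary, path):
    """Find the term in memory, then read its posting list with one seek and one read."""
    if term not in dictionary:
        return []  # answered from memory alone, no disk access needed
    offset, count = dictionary[term]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(4 * count)
    return list(struct.unpack(f"<{count}I", data))

# Hypothetical postings: term -> sorted document IDs.
postings = {"search": [3, 42, 77], "engine": [42, 90], "people": [7, 42]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "postings.bin")
    dictionary = build_index(postings, path)
    engine_docs = lookup("engine", dictionary, path)
    missing_docs = lookup("zebra", dictionary, path)
```

Because the whole posting list for a term is stored contiguously, it comes back in a single read, which is the "more information per disk action" point made above.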
This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (2).
Oversimplified: we have at least a few pages in our index, have extracted every single word from those pages, and have written down in an index which pages those words occur in.
Want to talk numbers? We have some very precise ones for the English language.
"We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times."
This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (1).
Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.
But what if you're not looking for any document that has the words search AND people AND engine, but you're looking for Search Engine People?
Well, if document 42 in our example reads "the engine was found after a search by some people" or "people use a search engine such as Google", then a traditional inverted index would think it's spot-on for your search. Ouch…
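The problem can be reproduced with a minimal inverted index. This is only a sketch: the whitespace tokenizer, the function names, and the document IDs 41 and 43 are assumptions for illustration (the post itself only mentions document 42).

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def and_query(index, query):
    """Return the IDs of documents that contain every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Hypothetical documents; only 41 contains the actual phrase.
documents = {
    41: "Search Engine People is a company",
    42: "the engine was found after a search by some people",
    43: "people use a search engine such as Google",
}

index = build_inverted_index(documents)
matches = and_query(index, "Search Engine People")
```

All three documents match, because the index records only which documents contain a word, not where the word sits or what surrounds it.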
This post is part of an ongoing series: How Search Really Works.
Previous Instalment: The Keyword Density Myth.
If a search engine searched "live" through the documents it knows about for occurrences of the word we're looking for, it could take its time and then simply report where it found our word.
In this example our search engine has only one index: the documents themselves.
However, time is something a search engine doesn't have; the query needs to be answered now.
What we need is a real index!
This post is part of an ongoing series: How Search Really Works.
Last week: Keyword Stuffing.
Keyword density is a function, a calculation, of keyword frequency.
It's calculated as number of occurrences divided by number of words and is usually expressed as a percentage.
Nothing much, really.
Keyword density can help in readability calculations.
Keyword density is also sometimes used as a simplified way to introduce local keyword weight, but the two should never be confused.
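The calculation described above (occurrences divided by number of words, as a percentage) fits in a few lines. The whitespace tokenizer, the function name, and the sample sentence are assumptions made for this sketch.

```python
def keyword_density(text, keyword):
    """Occurrences of `keyword` divided by total word count, as a percentage."""
    words = text.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    return 100.0 * words.count(keyword.lower()) / len(words)

# Hypothetical sentence: 10 words, 3 of them "search".
density = keyword_density(
    "search engines search the web so people can search less", "search"
)
```

Note that the number depends only on this one document, which is why it says nothing about how special the word is across the whole collection.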
This post is part of an ongoing series: How Search Really Works.
Last week: Keyword Links.
Left to their own devices, people will assign keywords (tag or link) as they please.
They paint a rich picture of the linked content.
Keyword stuffing is the unnatural repetitive use of a specific word or phrase.
In your content…

…or your links…
