# How Search Really Works: "The" Index (2)

This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (1).

Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.

But what if you're not looking for any document that has the words search AND people AND engine, but for the exact phrase "Search Engine People"?

Well, if document 42 in our example reads "the engine was found after a search by some people" or "people use a search engine such as Google", then a traditional inverted index would think it's spot-on for your search. Ouch...
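To make that false positive concrete, here is a minimal sketch of a plain inverted index with an AND query. The function names and the second sample document are made up for illustration; this is not how any particular search engine implements it.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def and_query(index, words):
    """Return ids of documents containing every query word, in any order."""
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample documents: 42 only contains the words, 43 contains the phrase.
docs = {
    42: "the engine was found after a search by some people",
    43: "search engine people",
}
index = build_inverted_index(docs)
matches = and_query(index, ["search", "engine", "people"])
print(matches)  # both 42 and 43 match: word order is invisible to this index
```

Because the index only records that a word occurs somewhere in a document, it has no way to tell document 42 from document 43.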

### (Extended) Biword Inverted Index

One fix is to index every pair of consecutive words (a biword) as a term of its own, so that "search engine" gets its own list of documents.

Problem: if you index two-word phrases, a search for a phrase of three or more words becomes just another AND query that combines the parts of the phrase: "Long John" AND "Silver".

Indexing three-word phrases simply moves the problem to searches for phrases of four or more words, and so on.

Problem: the inverted index becomes huge, because it lists every word in every document plus every two-word (three-word? four-word?) phrase in every document.
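A rough sketch of a biword index shows the first problem in action. It assumes the common approach of splitting a longer phrase into overlapping word pairs and AND-ing them together; the names and sample documents are invented for the example.

```python
from collections import defaultdict

def build_biword_index(docs):
    """Map each pair of consecutive words to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for first, second in zip(words, words[1:]):
            index[(first, second)].add(doc_id)
    return index

def biword_phrase_query(index, phrase):
    """Split the phrase into overlapping word pairs and AND their postings."""
    words = phrase.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return set()
    return set.intersection(*(index.get(pair, set()) for pair in pairs))

# Hypothetical sample documents.
docs = {
    1: "long john silver was a pirate",
    2: "long john met john silver",
}
index = build_biword_index(docs)
print(biword_phrase_query(index, "long john silver"))
# Document 2 matches as well: it contains "long john" and "john silver",
# but never the full three-word phrase.
```

The second problem is visible too: the index holds one entry for every distinct word pair, on top of whatever you store for single words.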

### Positional Inverted Index

The only real solution is to store not only whether a word occurs in a document but also the exact position(s) of the word in that document.

In this example, document 42 is identified for "search engine people" because the words appear in that order, in positions 1, 2 and 3.
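As a sketch of the idea, the code below stores positions per word and per document, then checks whether the query words sit in consecutive positions. The layout (word → document → list of positions), the function names and document 43 are assumptions for the example, not a description of a real engine's storage format.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]}, with positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, phrase):
    """Return documents where the words of the phrase occur consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return set()
    # AND step: only documents containing every word can match.
    candidates = set.intersection(*(set(index[w]) for w in words))
    hits = set()
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            # Word number `offset` of the phrase must sit at start + offset.
            if all(start + offset in index[w][doc_id]
                   for offset, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents.
docs = {
    42: "search engine people",
    43: "people use a search engine such as google",
}
index = build_positional_index(docs)
print(phrase_query(index, "search engine people"))  # {42}
```

Document 43 contains all three words and survives the AND step, but "people" sits at position 1 instead of right after "engine", so it is rejected.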

Advantage: because the positional index is similar in construction to the traditional inverted index, it inherits the same advantage. That is, when doing an AND query it can jump ahead whenever one of the words doesn't occur in the document it is looking at.

Advantage: simply by looking at words occurring in the right order, any phrase of any length can be found even though it isn't indexed as such.

Advantage: by having precise position information we can do proximity queries.

Advantage: phrase matching and query word proximity can also be used to rank search results.
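The last two advantages can be sketched together: find documents where two words occur within k positions of each other, then use the smallest distance as a crude ranking signal. The scoring here is an invented simplification; real ranking combines many more signals.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]}, with positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def proximity_query(index, word_a, word_b, k):
    """Return {doc_id: smallest distance} for documents where the two
    words occur within k positions of each other."""
    postings_a = index.get(word_a, {})
    postings_b = index.get(word_b, {})
    result = {}
    for doc_id in postings_a.keys() & postings_b.keys():
        best = min(abs(a - b)
                   for a in postings_a[doc_id]
                   for b in postings_b[doc_id])
        if best <= k:
            result[doc_id] = best
    return result

# Hypothetical sample documents.
docs = {
    1: "search engine people",
    2: "people who search for a good engine",
    3: "the engine was found after a very long and tiring search",
}
index = build_positional_index(docs)
close = proximity_query(index, "search", "engine", k=4)
ranked = sorted(close, key=close.get)  # closest pair first
print(close, ranked)
```

Document 3 contains both words but nine positions apart, so it drops out; document 1 outranks document 2 because its words are adjacent.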

### The Winner

Although a positional index is 2 to 4 times larger than a traditional inverted index (compressed, it still comes to roughly a third to half the size of the original text), the payoff is so large that this is the type of index commercial search engines use for phrases. In general, that is.

Frequently searched phrases are still better stored in a biword index; less frequently searched phrases are better processed with a positional inverted index.

### Index Type & SEO

The fun (warning: geek talking!) is of course that knowing this kind of stuff implicitly explains things to you.

For example, knowing that a positional inverted index only really works when all words, including so-called "stop words", are indexed makes it less surprising that stop words are dead.

Positional indexing and retrieval also make it not only logical but expected that "shop in new york" and "shop new york" give different results.
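A toy comparison shows why. It scans token lists directly instead of using a real index, and the stop word list and the two documents are made up, but the effect is the same: drop the stop words and both queries collapse into one; keep them and each query finds a different document.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"in", "the", "a", "of"}

def tokens(text, drop_stop_words):
    """Lowercase and split the text, optionally removing stop words."""
    words = text.lower().split()
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words

def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens appear consecutively somewhere in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

def matching_docs(docs, query, drop_stop_words):
    """Return the ids of documents that contain the query as a phrase."""
    query_tokens = tokens(query, drop_stop_words)
    return {doc_id for doc_id, text in docs.items()
            if contains_phrase(tokens(text, drop_stop_words), query_tokens)}

# Hypothetical sample documents.
docs = {
    "A": "coffee shop in new york",
    "B": "pet shop new york",
}

# Stop words kept: each query finds a different document.
print(matching_docs(docs, "shop in new york", drop_stop_words=False))  # {'A'}
print(matching_docs(docs, "shop new york", drop_stop_words=False))     # {'B'}

# Stop words dropped: both queries become "shop new york" and match A and B alike.
print(matching_docs(docs, "shop in new york", drop_stop_words=True))
```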

Different results = different thinking, different SEO... different opportunities.

It's all in the index 🙂

#### About the Author: Ruud Hein

My paid passion at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. People who know me know I love coffee.
