Search Engine People - Search Engine Positioning, Placement Service

How Search Really Works: "The" Index (2)

by Ruud Hein
February 29, 2008

This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (1).

Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.

[Figure: inverted index]
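To make the AND query concrete, here is a minimal sketch, assuming each word's list of document ids is kept sorted. An AND query then boils down to intersecting those lists, jumping ahead in one list whenever the other is further along. The posting lists and the function name `intersect` are made up for illustration; this is not any search engine's actual code.

```python
from bisect import bisect_left

def intersect(p1, p2):
    """Intersect two sorted lists of document ids."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Both words occur in this document.
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # p1[i] lacks the other word: jump to the first id >= p2[j].
            i = bisect_left(p1, p2[j], i + 1)
        else:
            j = bisect_left(p2, p1[i], j + 1)
    return result

# Hypothetical posting lists: the documents in which each word appears.
search_docs = [2, 4, 8, 16, 32, 42, 64]
engine_docs = [1, 3, 42, 64, 70]
print(intersect(search_docs, engine_docs))  # [42, 64]
```

For a query with more words, the same function can be applied to the result and the next word's list.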

But what if you’re not looking for just any document that has the words search AND people AND engine, but for the exact phrase "Search Engine People"?

Well, if document 42 in our example reads "the engine was found after a search by some people" or "people use a search engine such as Google", then a traditional inverted index would think it’s spot-on for your search. Ouch….
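The problem is easy to reproduce. The sketch below builds a tiny traditional inverted index and runs the AND query against it. The document ids 1 to 3 and the first document's text are invented for the example; the other two texts are the sentences quoted above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def and_query(index, words):
    """Return ids of documents containing every query word, in any order."""
    postings = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "search engine people help sites rank",  # invented text
    2: "the engine was found after a search by some people",
    3: "people use a search engine such as google",
}
index = build_inverted_index(docs)
matches = and_query(index, ["search", "engine", "people"])
print(matches)  # [1, 2, 3]
```

All three documents come back, although only document 1 contains the phrase itself.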

(Extended) Biword Inverted Index

One way to go about this would be to somehow generate an inverted list of phrases.

[Figure: biword phrase index]

Problem: if you index phrases of 2 words, a phrase search of 3 or more words becomes just another AND query that combines the parts of the phrase: "Long John" AND "Silver".

Indexing 3-word phrases simply moves the problem to phrase searches of 4 or more words, and so on.

Problem: the inverted index becomes huge, listing every word in every document and every 2-word (3-word? 4-word?) phrase in every document….
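Here is a minimal sketch of the first problem, assuming a plain biword index that stores only consecutive word pairs. A 3-word phrase is then looked up as its overlapping pairs ANDed together, "long john" AND "john silver". The two documents and all function names are invented for illustration.

```python
from collections import defaultdict

def biwords(text):
    """Return the consecutive word pairs of a text."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

def build_biword_index(docs):
    """Map each word pair to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pair in biwords(text):
            index[pair].add(doc_id)
    return index

def biword_phrase_query(index, phrase):
    """AND together the biwords of the phrase; may return false positives."""
    pairs = biwords(phrase)
    if not pairs:
        return []
    postings = [index.get(p, set()) for p in pairs]
    return sorted(set.intersection(*postings))

docs = {
    1: "long john silver was a pirate",
    2: "john silver sold a long john to a sailor",
}
index = build_biword_index(docs)
result = biword_phrase_query(index, "long john silver")
print(result)  # [1, 2]
```

Document 2 contains both "long john" and "john silver" but never "long john silver", so it is a false positive.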

Positional Inverted Index

The only real solution is to store not only whether a word occurs in a document but also the exact position(s) at which it occurs in that document.

[Figure: positional index]

In this example document 42 is identified for "search engine people" because the words appear in that order, in adjacent positions: 1, 2 and 3.
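A minimal sketch of that lookup, assuming positions are counted from 1 as in the example. Document 42 opens with the phrase; documents 7 and 8, their texts, and the function names are invented for illustration.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]}, positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, phrase):
    """Return ids of documents where the phrase words occur consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Only documents containing every word can match (the AND step).
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in sorted(candidates):
        # A start position p matches if word i sits at position p + i.
        later = [set(index[w][doc_id]) for w in words[1:]]
        if any(all(p + i in s for i, s in enumerate(later, start=1))
               for p in index[words[0]][doc_id]):
            hits.append(doc_id)
    return hits

docs = {
    42: "search engine people help sites rank",
    7: "the engine was found after a search by some people",
    8: "people use a search engine such as google",
}
index = build_positional_index(docs)
print(index["search"][42], index["engine"][42], index["people"][42])  # [1] [2] [3]
print(phrase_query(index, "search engine people"))  # [42]
```

The same function handles a phrase of any length, since it only compares positions.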

Advantage: because the positional index is similar in construction to the traditional inverted index, it inherits the same advantage. That is, when doing an AND query it can jump ahead whenever one of the words doesn’t occur in the document it is looking at.

Advantage: simply by checking that the words occur next to each other in the right order, any phrase of any length can be found even though it isn’t indexed as such.

Advantage: by having precise position information we can do proximity queries.

Advantage: phrase matching and query word proximity can also be used to rank search results.
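The last two advantages can be sketched together. Assuming a positional index as described above, a proximity query keeps the documents in which two words occur within `k` positions of each other, and the smallest distance found can double as a very naive ranking signal. The documents, the function names and the "closest first" ordering are illustrative assumptions, not how any particular engine ranks.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]}, positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def min_distance(positions_a, positions_b):
    """Smallest gap between any occurrence of a and any occurrence of b."""
    return min(abs(a - b) for a in positions_a for b in positions_b)

def proximity_query(index, a, b, k):
    """Docs where a and b occur within k positions, closest first."""
    docs_a, docs_b = index.get(a, {}), index.get(b, {})
    scored = []
    for doc_id in set(docs_a) & set(docs_b):
        d = min_distance(docs_a[doc_id], docs_b[doc_id])
        if d <= k:
            scored.append((d, doc_id))
    return [doc_id for d, doc_id in sorted(scored)]

docs = {
    1: "the engine was found after a search by some people",
    2: "people use a search engine such as google",
    3: "search for a better engine",
}
index = build_positional_index(docs)
print(proximity_query(index, "search", "engine", 2))  # [2]
print(proximity_query(index, "search", "engine", 5))  # [2, 3, 1]
```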

The Winner

Although a positional index is, as a rule of thumb, 2 to 4 times larger than a traditional inverted index (even compressed it can run to a third or up to half the size of the original text), the payoff is so large that this is the type of index commercial search engines use for phrases. In general, that is….

Frequently searched phrases are still better stored in a biword index; less frequently searched phrases are better processed with a positional inverted index.

Index Type & SEO

The fun (warning: geek talking!) is of course that knowing this kind of stuff implicitly explains other things.

For example, a positional inverted index only really works if all words are indexed, including so-called "stop words". Knowing that makes it less surprising that stop words are dead.

Positional indexing and retrieval also make it not only logical but expected that "shop in new york" and "shop new york" give different results.
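A toy illustration of both points, assuming exact phrase matching on word positions. With every word indexed the two queries match different documents; once stop words are thrown away they collapse into the same query. The documents, the tiny stop word list and the function names are made up for the example.

```python
from collections import defaultdict

STOP_WORDS = frozenset({"in", "the", "our", "to"})  # tiny illustrative list

def tokens(text, drop_stop_words):
    """Split text into lowercase words, optionally dropping stop words."""
    words = text.lower().split()
    return [w for w in words if not (drop_stop_words and w in STOP_WORDS)]

def phrase_matches(docs, phrase, drop_stop_words=False):
    """Ids of docs containing the phrase as consecutive indexed words."""
    index = defaultdict(lambda: defaultdict(set))  # word -> doc -> positions
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokens(text, drop_stop_words)):
            index[word][doc_id].add(pos)
    query = tokens(phrase, drop_stop_words)
    hits = []
    for doc_id in sorted(docs):
        starts = index[query[0]][doc_id] if query else set()
        # Start position p matches if query word i sits at position p + i.
        if any(all(p + i in index[w][doc_id] for i, w in enumerate(query))
               for p in starts):
            hits.append(doc_id)
    return hits

docs = {
    1: "visit our shop in new york today",
    2: "the best bagel shop new york has to offer",
}
print(phrase_matches(docs, "shop in new york"))  # [1]
print(phrase_matches(docs, "shop new york"))     # [2]
print(phrase_matches(docs, "shop in new york", drop_stop_words=True))  # [1, 2]
```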

Different results = different thinking, different SEO… different opportunities.

It’s all in the index :)

5 Responses to “How Search Really Works: "The" Index (2)”

  1. Johan Krost (1 comments.) Says:
    March 1st, 2008 at 3:09 pm

    Wow, this is advanced search engine stuff. Thanks for the info

  2. Utah SEO Pro (79 comments.) Says:
    March 2nd, 2008 at 1:05 am

    Another great post Ruud! Stop words are definitely not dead. They are such a large part of natural language search queries.

  3. Shana Albert (1 comments.) Says:
    March 4th, 2008 at 10:04 am

    There’s a trick, you know.

