Search Engine People - Search Engine Positioning, Placement Service

How Search Really Works: "The" Index (1)

by Ruud Hein
February 22, 2008

This post is part of an ongoing series: How Search Really Works.
Previous Instalment: The Keyword Density Myth.

If a search engine searched "live" through every document it knows about for the word we’re looking for, it could take its time and then simply report where it found our word.

In this example our search engine has only one index: the documents themselves.

 document-only-index
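To make the "live" search concrete, here is a minimal sketch in Python. The three documents and the function name `live_search` are made up for illustration; they are not the documents from the figure.

```python
# Hypothetical mini-collection: three documents, four distinct words.
DOCUMENTS = {
    1: "search engine",
    2: "search index compression",
    3: "index compression",
}


def live_search(word, documents):
    """Scan every document at query time and return the IDs containing the word."""
    hits = []
    for doc_id, text in documents.items():
        # Every query re-reads and re-splits every document.
        if word in text.lower().split():
            hits.append(doc_id)
    return hits


print(live_search("compression", DOCUMENTS))
```

The work grows with the total size of the collection on every single query, which is exactly the time we don’t have.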

However, time is something a search engine doesn’t have; the query needs to be answered now.

What we need is a real index!

Boolean Index - Talk about the Matrix

boolean-index

The problem with a boolean index, where we set a little flag (1) or not (0) for every word in every document, is that it quickly grows far too large.

Three documents sharing just four distinct words already take 12 ones or zeros, on top of the bits and bytes we need to store the words themselves. Now imagine a matrix where one of the sides is 13,940,000,000 columns wide…
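A rough sketch of such a matrix, again with made-up documents and names (`build_boolean_index` is hypothetical, and the one-bit-per-flag storage is an assumption for the arithmetic only):

```python
# Hypothetical mini-collection: three documents, four distinct words.
DOCUMENTS = {
    1: "search engine",
    2: "search index compression",
    3: "index compression",
}


def build_boolean_index(documents):
    """Return (doc_ids, matrix) with one 0/1 flag per word per document."""
    doc_ids = sorted(documents)
    vocabulary = sorted(
        {word for text in documents.values() for word in text.lower().split()}
    )
    matrix = {
        word: [1 if word in documents[d].lower().split() else 0 for d in doc_ids]
        for word in vocabulary
    }
    return doc_ids, matrix


doc_ids, matrix = build_boolean_index(DOCUMENTS)
for word, row in matrix.items():
    print(f"{word:12} {row}")

# 4 words x 3 documents = 12 flags, most of which only say "not here".
flags = sum(len(row) for row in matrix.values())
print("flags stored:", flags)

# Assuming one bit per flag, a single row across 13,940,000,000 columns:
bytes_per_row = 13_940_000_000 // 8
print("bytes for one row:", bytes_per_row)
```

Under that one-bit assumption, every single row of the wide matrix would need about 1.7 GB, and most of it would be zeros.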

The Inverted Index

 inverted-index

In the inverted index we record only the places (documents) where a word does occur.

It’s called inverted because instead of the documents providing the occurrences of a word, the word points to which documents it occurs in.
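Sketched in Python, with the same kind of made-up documents (the name `build_inverted_index` is an assumption, not anything a real engine exposes):

```python
from collections import defaultdict

# Hypothetical mini-collection: three documents, four distinct words.
DOCUMENTS = {
    1: "search engine",
    2: "search index compression",
    3: "index compression",
}


def build_inverted_index(documents):
    """Map each word to the sorted list of document IDs it occurs in."""
    index = defaultdict(list)
    # Visiting documents in ID order keeps every list sorted by document pointer.
    for doc_id in sorted(documents):
        for word in set(documents[doc_id].lower().split()):
            index[word].append(doc_id)
    return dict(index)


index = build_inverted_index(DOCUMENTS)
for word in sorted(index):
    print(word, "->", index[word])

# Only actual occurrences are stored: no entry for "word is not here".
postings = sum(len(ids) for ids in index.values())
print("entries stored:", postings)
```

For this toy collection the inverted index holds 7 entries where the boolean matrix needed 12 flags, and the gap widens as documents and vocabulary grow.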

Sorted by document pointer, the inverted index is extremely efficient at performing AND queries.

Let’s reshuffle our example a little bit to make this visually clear:

 intersect-search

If we search for documents that contain both words of the query "search compression", we walk down these two rows at the same time. As soon as one row jumps to a higher document ID, we can jump forward in the other row as well: there is no use checking the intermediate IDs, as we now know those documents won’t have both words.
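The walk down two rows can be sketched as a simple two-pointer merge. The document IDs below are invented for the example, and `intersect` is a hypothetical helper; this is the plain textbook version, advancing one entry at a time rather than leaping ahead.

```python
def intersect(row_a, row_b):
    """Return the document IDs present in both sorted rows."""
    result = []
    i = j = 0
    while i < len(row_a) and j < len(row_b):
        if row_a[i] == row_b[j]:
            # Both words occur in this document.
            result.append(row_a[i])
            i += 1
            j += 1
        elif row_a[i] < row_b[j]:
            # Row A is behind: its current document cannot contain both words.
            i += 1
        else:
            # Row B is behind: move it forward instead.
            j += 1
    return result


# Made-up rows, each sorted by document ID.
search_row = [2, 4, 8, 16, 32, 64]
compression_row = [1, 2, 3, 5, 8, 13, 21, 34, 64]

print(intersect(search_row, compression_row))
```

Each row is read at most once, and the loop stops as soon as either row runs out. That only works because both rows are sorted by document ID.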

Because it records only whether a word occurs in a document, not where, this inverted index is horrible at phrase and proximity matching:

paris-hilton
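A small sketch of the problem, using three invented documents (not the ones in the figure) and hypothetical helper names:

```python
from collections import defaultdict

# Hypothetical documents: only the first contains the phrase "paris hilton".
DOCUMENTS = {
    1: "paris hilton arrives at the premiere",
    2: "the hilton hotel in paris",
    3: "a weekend in paris",
}


def build_inverted_index(documents):
    """Map each word to the sorted list of document IDs it occurs in."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        for word in set(documents[doc_id].lower().split()):
            index[word].append(doc_id)
    return dict(index)


def and_query(index, words):
    """Return the IDs of documents that contain every word, in any order."""
    rows = [set(index.get(word, [])) for word in words]
    return sorted(set.intersection(*rows)) if rows else []


index = build_inverted_index(DOCUMENTS)

# The index can only say which documents contain both words...
candidates = and_query(index, ["paris", "hilton"])
print("contain both words:", candidates)

# ...so checking the phrase itself means going back to the document text.
true_matches = [d for d in candidates if "paris hilton" in DOCUMENTS[d].lower()]
print("contain the phrase:", true_matches)
```

The index alone cannot tell document 1 from document 2: both contain both words, and nothing in the index says whether they sit next to each other.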

To be continued…


As posted in How Search Really Works.


4 Responses to “How Search Really Works: "The" Index (1)”

  1. Utah SEO Pro (64 comments.) Says:
    February 24th, 2008 at 11:14 pm

    Excellent post on co-occurrence in search. Interested in seeing the follow ups. Information retrieval should be on the “must-know” list for all SEOs but amazing how many don’t completely grasp it.

  2. Make Money Blogging (35 comments.) Says:
    February 25th, 2008 at 5:41 am

    I thought I understood a little about search engines but now I’m confused. Eagerly awaiting part 2.

  3. Geld Lenen (3 comments.) Says:
    February 25th, 2008 at 9:08 am

    I’m really looking forward to reading more of this series. I could make some people very happy if I referred them here!
