<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Engine People Blog &#187; How Search Really Works</title>
	<atom:link href="http://www.searchenginepeople.com/blog/category/how-search-really-works/feed" rel="self" type="application/rss+xml" />
	<link>http://www.searchenginepeople.com</link>
	<description>Canada's Search and Social Media Authority</description>
	<lastBuildDate>Fri, 19 Mar 2010 11:50:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How Search Really Works: Grabbing Most Red M&amp;M&#039;s</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html#comments</comments>
		<pubDate>Fri, 02 May 2008 20:19:41 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (2)
Instead of painstakingly grabbing the absolute best matches for your query to then rank those with infinite precision, one time-saving strategy has search engines go for &#034;close enough&#034;.
Painstaking Precision

Given all the time, money and resources in the world, here&#039;s what we&#039;d [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html">How Search Really Works: Grabbing Most Red M&amp;M&#039;s</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.<br />
Previously: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html">Relevance (2)</a></em></p>
<p>Instead of painstakingly grabbing the absolute best matches for your query to then rank those with infinite precision, one time-saving strategy has search engines go for &#034;close enough&#034;.</p>
<h3>Painstaking Precision</h3>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/05/sorted-mm.jpg" alt="sorted-mm" border="0" height="294" width="440" /></p>
<p>Given all the time, money and resources in the world, here&#039;s what we&#039;d normally do.</p>
<p>Word by word you go through a search. You look in your documents and see which has word one&#8230;. word two&#8230; word three&#8230;. You get the picture.</p>
<p>You give some plus points for every time a searched word appears in a document. How many points? Depends on the <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html">TFxIDF score</a> for that specific word.</p>
<p>You add those points up into one relevance score per document. Do the same for the query itself (basically treating it like a very short document). In short: you&#039;re calculating their <a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html">vector space scores</a>.</p>
<p>Measure the mathematical similarity between document 1 and the query, document 2 and the query, document 3 and&#8230; Yup.</p>
<p>And <em>then</em> you can&#039;t just slap all those on the screen. You have to tailor to the searcher&#039;s need and pick and sort the top scoring documents!</p>
<p>Now you could sort all your scored documents at once or just go for the number of top documents needed; say the first or next 10 because the searcher has that set as the maximum number of results per page.</p>
<p>Instead of doing this HUGE sorting routine you throw all values together in a big black hat (programmers call this a &#034;heap&#034;), pull out the top 10 documents or so and only <em>then</em> sort them.</p>
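<p>A minimal sketch of that black hat in Python, assuming the scores are already calculated. The document names and score values are made up for illustration; <code>heapq.nlargest</code> is the standard library helper that picks the top few without sorting everything.</p>

```python
import heapq

def top_k(scores, k=10):
    """Return the k best (doc_id, score) pairs, highest score first.

    For a small k, heapq.nlargest avoids sorting the full list of
    scored documents; only the k winners come back in order.
    """
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

# Hypothetical similarity scores for five documents.
scores = {"doc1": 0.12, "doc2": 0.87, "doc3": 0.45, "doc4": 0.91, "doc5": 0.30}
best = top_k(scores, k=3)
print(best)  # [('doc4', 0.91), ('doc2', 0.87), ('doc3', 0.45)]
```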
<h3>Perceived Relevance</h3>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/05/mms-in-bowl.jpg" alt="mms-in-bowl" border="0" height="293" width="440" /></p>
<p>What struck me as funny is that this high precision, high-cost way of doing things doesn&#039;t necessarily mean you get the most bang for your buck, the best quality results for your searcher&#039;s patience.</p>
<p>No, the mathematical similarity between our search and those documents is <u>something we <em>perceive</em> as relevant</u>.</p>
<p>That&#039;s a low payback to work for when the cost is so high; comparing a huge number of documents, calculating mathematical similarities&#8230;. grabbing the top of the heap&#8230;</p>
<p>The <em>perception of relevance</em> is something a search engine can use though by going for &#034;good enough&#034;.</p>
<h3>The Inexact Sort Of Top 10-ish Documents You Might Want</h3>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/05/mixed-mm.jpg" alt="mixed-mm" border="0" height="272" width="440" /></p>
<p>Instead of calculating the top 10 with high precision, why not grab a bundle of documents that will most probably be in that top 10?</p>
<p>Just grab a bunch of documents that are in the race to be the answer to the searcher&#039;s query and take the top 10 of that bunch!</p>
<p>Even though this top 10 is not The Top 10 we would have found using our Painstaking Precision method, it <em>will</em> contain many documents that would have been in that top 10 or near it.</p>
<p>It&#039;s like having a bowl of M&amp;M&#039;s and wanting to eat red ones. You could sort them out painstakingly and then go for the red ones&#8230; or you could <em>grab</em> in that area where you see <em>most</em> of the red ones seem to be.</p>
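<p>One way to picture the grab, as a rough sketch only. Here the &#034;bunch&#034; is every document that contains the rarest query word; that selection rule, the toy posting lists and the toy scores are all assumptions for illustration, not how any particular engine does it.</p>

```python
import heapq

def approximate_top_k(query_terms, postings, score, k=10):
    """Score only a likely-looking bunch of documents, then take its top k.

    The bunch here is every document containing the rarest query term;
    that choice is an assumption made for illustration.
    """
    known = [t for t in query_terms if t in postings]
    if not known:
        return []
    rarest = min(known, key=lambda t: len(postings[t]))
    candidates = postings[rarest]
    return heapq.nlargest(k, candidates, key=score)

# Toy posting lists (term -> document IDs) and toy relevance scores.
postings = {
    "red": {1, 2, 3, 4, 5, 6},
    "candy": {2, 4, 6, 7},
    "bowl": {4, 6},
}
toy_scores = {1: 0.2, 2: 0.5, 3: 0.1, 4: 0.9, 5: 0.3, 6: 0.7, 7: 0.8}
result = approximate_top_k(["red", "candy", "bowl"], postings, toy_scores.get, k=2)
print(result)  # [4, 6]
```

<p>The exact top 2 in this toy data would be documents 4 and 7. The grab returns 4 and 6: not The Top 2, but close, and only two documents had to be looked at instead of seven.</p>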
<p>In order of appearance, images courtesy of <a href="http://flickr.com/photos/west-park/" target="_blank">westpark</a>, <a href="http://flickr.com/photos/stillmemory/" target="_blank">Irina Souiki</a> and <a href="http://flickr.com/photos/jacalynsnana/" target="_blank">jacalynsnana</a></p>
<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html">How Search Really Works: Grabbing Most Red M&amp;M&#039;s</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-grabbing-most-red-mms.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: Relevance (2) &#8211; Vector Space</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comments</comments>
		<pubDate>Fri, 11 Apr 2008 21:37:38 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (1)
Another way we can assess the relevance of a document is by term weighting.
From the keyword density myth we know that true term weighting is done collection wide.
By looking at the number of documents in the index that a term appears in [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html">How Search Really Works: Relevance (2) &#8211; Vector Space</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.<br />
Previously: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html">Relevance (1)</a></em></p>
<p>Another way we can assess the relevance of a document is by <em>term weighting</em>.</p>
<p>From the <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html">keyword density myth</a> we know that true term weighting is done collection wide.</p>
<p>By looking at the number of documents in the index that a term appears in we can make a measurement of information: how good, how special&#8230; how <em>meaningful</em> is this word?</p>
<p>The word <em>the</em> would not be special at all, appearing in way too many documents. Its worth would be close to zero.</p>
<p>But <a href="http://www.google.com/search?q=klebenleiben">klebenleiben</a> (&#034;<em>the reluctance to stop talking about a certain subject</em>&#034; &#8230;) would be very special indeed! Because it appears in only 18 documents among millions, its worth, its weight, would automatically be very high.</p>
<p>The measure is called <em>inverse document frequency.</em></p>
<p>This measure is our weight; it is what we use to judge the relevance of a document.</p>
<h3>Term Frequency Times</h3>
<p>We do so by counting the number of times a word appears in a document. We <em>normalize</em> that count; we adjust it so that the length of a document doesn&#039;t matter that much anymore.</p>
<p>We then <em>multiply</em> it by our weight measurement: <strong>TF x IDF. Term Frequency times Inverse Document Frequency</strong>.</p>
<p>In other words, a high count of a rare word = a high score <em>for that document,  for that word</em>. But&#8230; a high count of a common word = not so high score <em>for that document, for that word</em>.</p>
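<p>A small sketch of that multiplication. The formulas are common textbook choices and an assumption here: TF as count divided by document length, IDF as the logarithm of (number of documents / documents containing the word). The three mini documents are invented.</p>

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, all_docs):
    """TF x IDF for one term in one document.

    TF is the term count divided by document length (one simple way to
    normalize); IDF is log(N / df). Both are textbook choices assumed
    here for illustration.
    """
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in all_docs if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(all_docs) / df)

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the klebenleiben of the speaker".split(),
]
common = tf_idf("the", docs[2], docs)           # appears in every document
rare = tf_idf("klebenleiben", docs[2], docs)    # appears in one document
print(common, rare)
```

<p>Even though <em>the</em> appears twice in the last document and <em>klebenleiben</em> only once, the common word scores zero and the rare word does not.</p>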
<h3>Vectors</h3>
<p>A vector is a line of a certain <em>length</em> in a certain <em>direction.</em></p>
<p>Both the <em>length</em> and the <em>direction</em> of the line represent important information.</p>
<p>Vectors enable us to represent, to talk about, size and direction when position is irrelevant. Wind speed, velocity, force, acceleration; all these are good candidates to be represented as a vector.</p>
<p><u>TFxIDF scores are perfectly suited to be represented as vectors.</u></p>
<h3>Vector Space</h3>
<p>Think of the words that make up our index as axes of a space.</p>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/04/vector-space.jpg" alt="vector-space" border="0" height="270" width="315" /></p>
<p>Of course in a real index this space would consist of thousands upon thousands of axes&#8230;</p>
<h3>Documents as Vectors</h3>
<p>For each word in our document we can draw a line (vector) along that word&#039;s axis; its length shows the word&#039;s TFxIDF score.</p>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/04/vector-space-documents.jpg" alt="vector-space-documents" border="0" height="270" width="315" /></p>
<h3>Queries as Vectors</h3>
<p>Every word in a query can <em>also</em> be shown as a vector.</p>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/04/vector-space-documents-queries.jpg" alt="vector-space-documents-queries" border="0" height="270" width="315" /></p>
<p>By looking at documents that are &#034;near&#034; our query we can rank (sort) documents in our result set.</p>
<h3>TFxIDF Vector Space Ranking</h3>
<p>If a document is close to our query it answers our query.</p>
<p>But better yet: documents close to ours are <em>similar documents</em>. They&#039;re talking about roughly the same thing.</p>
<p>This makes TFxIDF vector space ranking extremely useful to find sets of similar documents through &#034;closeness&#034;.</p>
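<p>&#034;Closeness&#034; is often measured as the cosine of the angle between two vectors; using that measure here is an assumption, as are the weights below. A sketch with each vector stored as a small dictionary of word to weight:</p>

```python
import math

def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical TFxIDF weights; each key is one axis of the space.
query = {"search": 1.0, "engine": 1.0}
documents = {
    "doc_a": {"search": 0.8, "engine": 0.6},
    "doc_b": {"search": 0.1, "cooking": 0.9},
}
ranked = sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

<p>The same function works for document against document, which is how sets of similar documents could be found.</p>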
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: Relevance (1)</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html#comments</comments>
		<pubDate>Fri, 04 Apr 2008 21:52:54 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.
Previously: Simple Query Optimization.
Search is always boolean: yes or no. True or false.
Either the words are in the document or not.

But as you see, not all documents are &#034;born alike&#034;. Some are about our topic, some just mention it.
What we need, what we want, [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html">How Search Really Works: Relevance (1)</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.<br />
Previously: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html">Simple Query Optimization</a>.</em></p>
<p>Search is always boolean: yes or no. True or false.</p>
<p>Either the words are in the document or not.</p>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/04/boolean-search.jpg" alt="boolean-search" border="0" height="348" width="440" /></p>
<p>But as you see, not all documents are &#034;born alike&#034;. Some <em>are</em> about our topic, some just <em>mention</em> it.</p>
<p>What we need, what we <em>want</em>, is not just a big list of results &#8212; we want a <em>relevant</em> list of results, preferably sorted so that the best bet appears on top.</p>
<h3>Boolean Zone Scoring</h3>
<p>Zone scoring uses multiplication values (<em>weights</em>) to calculate the &#034;relevance&#034; of the occurrences of our search term based on <em>how</em> it appears in <em>which zone</em> of the document.</p>
<p>Document zones we&#039;re all familiar with are header/title, body/content, footer.</p>
<p><img src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/04/boolean-zones.jpg" alt="boolean-zones" border="0" height="234" width="312" /></p>
<p>These weights are generally machine-learned by running test queries on a clean, non-spammed, non-gamed index. Relevance judges gauge how relevant the test results are.</p>
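<p>A toy version of zone scoring. The three weights are hand-picked, hypothetical values purely for illustration; as said above, real weights would be learned.</p>

```python
# Hypothetical zone weights; in practice they would be learned, not hand-picked.
ZONE_WEIGHTS = {"title": 0.5, "body": 0.3, "footer": 0.2}

def zone_score(term, document):
    """Sum the weights of every zone in which the term occurs."""
    return sum(
        weight
        for zone, weight in ZONE_WEIGHTS.items()
        if term in document.get(zone, "").lower().split()
    )

about = {"title": "Chocolate candy guide", "body": "all about candy", "footer": "contact"}
mention = {"title": "Road trip", "body": "we bought candy once", "footer": "contact"}
print(zone_score("candy", about), zone_score("candy", mention))
```

<p>Both documents are a boolean &#034;yes&#034; for <em>candy</em>, but the one that has it in the title as well as the body scores higher.</p>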
<p><em>Next week: Term Weight Scoring</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-1.html/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: Simple Query Optimization</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html#comments</comments>
		<pubDate>Sat, 22 Mar 2008 00:16:04 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: The Compressed Index.
While human beings can scan a page and see if the whole phrase &#34;a grandiloquent dictionary&#34; appears on it, a search engine can&#039;t.
A search engine needs to:

Lookup the occurrences for each word in [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html">How Search Really Works: Simple Query Optimization</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html">The Compressed Index</a>.</em></p>
<p>While human beings can scan a page and see if the whole phrase &quot;<em>a grandiloquent dictionary</em>&quot; appears on it, a search engine can&#039;t.</p>
<p>A search engine needs to:</p>
<ol>
<li>Look up the occurrences for <em>each</em> word in the phrase</li>
<li>See if the positions of words in the document fit the phrase</li>
</ol>
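<p>The two steps above can be sketched for a single document like this. The helper names and the sample sentence are made up for illustration.</p>

```python
def index_positions(text):
    """Record at which word positions each word of one document occurs."""
    positions = {}
    for i, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(i)
    return positions

def phrase_match(phrase, positions):
    """Return True if the words of the phrase occur right after each other."""
    words = phrase.lower().split()
    # Step 1: look up the occurrences for each word in the phrase.
    if not words or any(w not in positions for w in words):
        return False
    # Step 2: see if the positions fit the phrase.
    for start in positions[words[0]]:
        if all(start + i in positions[w] for i, w in enumerate(words)):
            return True
    return False

doc = index_positions("she owns a grandiloquent dictionary and a plain one")
print(phrase_match("a grandiloquent dictionary", doc))  # True
print(phrase_match("a plain dictionary", doc))          # False
```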
<p>As a search engine isn&#039;t smart it needs to <em>work</em> smart.</p>
<h3>Leverage Keyword Frequency</h3>
<p><img height="98" alt="sort-by-frequency" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/sort-by-frequency.jpg" width="440" border="0" />&#160;</p>
<p>By storing the frequency with which a word appears in the whole index we can right away <u>cut down to the smallest set</u> from which to draw results.</p>
<p>Instead of selecting 15,570,000,000 documents in which &quot;<em>a</em>&quot; occurs and then checking which have the words <em>grandiloquent</em> and <em>dictionary</em> we can <strong>immediately</strong> limit the set to 222,000 documents; those documents that contain the relatively rare <em>grandiloquent</em>.</p>
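<p>A sketch of that ordering trick. The posting lists are tiny toy data, and the length of each list stands in for the stored frequency.</p>

```python
def intersect_rarest_first(terms, postings):
    """AND the terms together, starting from the shortest posting list."""
    if not terms:
        return [], set()
    ordered = sorted(terms, key=lambda t: len(postings.get(t, ())))
    result = set(postings.get(ordered[0], ()))
    for term in ordered[1:]:
        result.intersection_update(postings.get(term, ()))
        if not result:
            break  # no document can match any more
    return ordered, result

# Toy posting lists: term -> set of document IDs.
postings = {
    "a": set(range(1, 1001)),
    "grandiloquent": {7, 42, 300},
    "dictionary": {3, 7, 42, 99, 512},
}
order, matches = intersect_rarest_first(["a", "grandiloquent", "dictionary"], postings)
print(order, sorted(matches))  # ['grandiloquent', 'dictionary', 'a'] [7, 42]
```

<p>The working set never grows beyond the three documents that contain <em>grandiloquent</em>; the thousand documents with <em>a</em> are only used to check membership.</p>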
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-simple-query-optimization.html/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: The Compressed Index</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html#comments</comments>
		<pubDate>Sat, 15 Mar 2008 05:47:02 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: Recognize this index?
Memory is much faster than looking things up.
In order for a search engine in high demand to serve its users efficiently it should keep things in memory instead of looking it up on [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html">How Search Really Works: The Compressed Index</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html">Recognize this index?</a></em></p>
<p>Memory is <em>much</em> faster than looking things up.</p>
<p>In order for a search engine in high demand to serve its users efficiently it should keep things in memory instead of looking them up on disk.</p>
<p>Traditionally <u>large scale search engines will keep their complete dictionary in memory and the posting list on disk</u>.</p>
<p><img height="235" alt="dictionary-in-memory-postings-on-disk" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/dictionary-in-memory-postings-on-disk.jpg" width="440" border="0" /> </p>
<h3>Inefficient Storage</h3>
<p>Obviously the more you can keep in memory and the more information can be read back with one disk action, the better.</p>
<p>Unfortunately <u>computer information gets inefficiently stored in boxes with fixed dimension</u>: if a box is 10 characters wide a 4 character word still takes up 1 box of 10 characters.</p>
<h3>Compression</h3>
<p>The solution is to <u>squeeze information together so the least amount of space contains the maximum amount of information</u>.</p>
<h3>Big String Compressed Dictionary</h3>
<p align="center">&#160;<img height="175" alt="uncompressed" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/uncompressed.jpg" width="315" border="0" />&#160; </p>
<p><strong>could become:</strong></p>
<p align="center">&#160;<img height="121" alt="big-string" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/big-string.jpg" width="293" border="0" /> </p>
<p>By adding the length of each word to every entry we can make the list of words hundreds of characters shorter.</p>
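<p>A rough sketch of the big string idea. Storing each length as one byte is an assumption made for this example, and the 10-character box is the one from the example earlier in this post.</p>

```python
def pack(words):
    """Store words back to back, each preceded by a one-byte length."""
    out = bytearray()
    for word in words:
        data = word.encode("utf-8")
        if len(data) > 255:
            raise ValueError("word too long for a one-byte length prefix")
        out.append(len(data))
        out += data
    return bytes(out)

def unpack(blob):
    """Walk the blob, using each length byte to find the next word."""
    words, i = [], 0
    while i < len(blob):
        n = blob[i]
        words.append(blob[i + 1 : i + 1 + n].decode("utf-8"))
        i += 1 + n
    return words

words = ["a", "engine", "people", "search"]
FIXED_WIDTH = 10  # the 10-character box from the example above
blob = pack(words)
print(len(words) * FIXED_WIDTH, "bytes fixed-width vs", len(blob), "bytes packed")
```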
<h3>Gap Compressed Posting List</h3>
<p>&#160;<img height="203" alt="uncompressed-postings" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/uncompressed-postings.jpg" width="473" border="0" /> </p>
<p><strong>could become:</strong></p>
<p><strong><img height="203" alt="compressed-postings" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/compressed-postings.jpg" width="473" border="0" /> </strong></p>
<p>By storing the difference between the document IDs (the <strong>gaps</strong>) we can save hundreds of characters.</p>
<p>The same could be done for storing the gaps between the index numbers for the occurrence positions in each document.</p>
<p>This &quot;<em>compressed representation encodes occurrences of a term as a pointer to the next occurrence of the term to facilitate rapid enumeration of the occurrences of the term</em>&quot;.</p>
<p>You could search &quot;<em>occurrences of the terms in the set of documents by following pointers through the compressed representation</em>&quot;.</p>
<h3>Partial Decompression</h3>
<p>An index stored this way is completely lossless: it retains all information from document identifier to document positional identifier.</p>
<p>By starting with the least frequently used term in the search it is very easy to do a <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&amp;r=1&amp;p=1&amp;f=G&amp;l=50&amp;d=PTXT&amp;S1=7,319,994.PN.&amp;OS=pn/7,319,994&amp;RS=PN/7,319,994">partial decompression</a> of this index by</p>
<blockquote><p>&quot;<em>identifying occurrences of the terms in the set of documents</em>&quot; </p>
</blockquote>
<p>(the dictionary) &#8211; to then use;</p>
<blockquote><p>&quot;<em>the corresponding term identifiers for the terms in the search request to look up a term offset table for a pointer to a first occurrence of the terms in the compressed representation of the set of documents</em>&quot;</p>
</blockquote>
<p>(the very first posting document ID);</p>
<blockquote><p>&quot;<em>and following a chain of pointers starting at the first occurrence to identify other occurrences of the terms in the compressed representation of the set of documents</em>&quot;</p>
</blockquote>
<p>(the gap compressed document ID list)</p>
<p><strong>Recommended reading:</strong></p>
<ul>
<li><a href="http://www.seobythesea.com/?p=978">New Google Approach to Indexing and Stopwords</a> </li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-the-compressed-index.html/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: Recognize This Index?</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html#comments</comments>
		<pubDate>Sat, 08 Mar 2008 04:27:24 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: &#34;The&#34; Index (2).
Oversimplified: we have at least a few pages in our index, have extracted every single word from those pages and have written down in an index where in which pages those words occur.
Want [...]<p>Post from: Search Engine People <a href="http://www.searchenginepeople.com">SEO</a> Blog<br/><br/><a href="http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html">How Search Really Works: Recognize This Index?</a></p>
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html">&quot;The&quot; Index (2)</a>.</em></p>
<p>Oversimplified: we have at least a <a href="http://www.google.com/search?q=the">few pages</a> in our index, have extracted every single word from those pages and have written down in an index where in which pages those words occur.</p>
<p>Want to talk numbers? We have some very precise ones for the English language.</p>
<p><a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">Google says</a>;</p>
<blockquote><p>&quot;<em>We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times</em>.&quot;</p>
</blockquote>
<p>And that&#039;s just a <em>part</em> of their index&#8230;</p>
<p>Now comes the fun&#8230;</p>
<h3>I have to sort what?!</h3>
<p><a href="http://flickr.com/photos/randysonofrobert/" target="_blank"><img style="margin: 0px 0px 0px 8px" height="180" alt="292020324_286705be9f_m" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/292020324-286705be9f-m.jpg" width="240" align="right" border="0" /></a> That list of words in the index (the dictionary as they call it) together with the document ID numbers they have as a pointer <em>and</em> the positional information needs to be sorted.</p>
<p>Uhuh. Sorted.</p>
<p>Let&#039;s say each of the above mentioned unique words (13,588,391) is 5 characters long. That&#039;s about 68 megabytes right there. Say each unique word is found in one unique document and the document pointer is 5 digits wide: that&#039;s another 68 megabytes to store the occurrence of each unique word in one document each. Now imagine the word <em>the</em>, which most probably appears at least once in every document as well&#8230;</p>
<p>As you see, the memory requirements are <strong>huge</strong> and we haven&#039;t even started factoring in the storage requirements for the in-document positional pointers for the <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html">positional inverted index</a> we know search engines use.</p>
<p>And <em>once</em> we do &#8212; we still need to factor in temporary memory to actually <em>do</em> something with that list; like sorting it&#8230;</p>
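<p>The back-of-the-envelope numbers from this section, written out. One byte per character and per digit is an assumption; the word count is the Google figure quoted above.</p>

```python
UNIQUE_WORDS = 13_588_391   # from the Google figure quoted above
BYTES_PER_WORD = 5          # assumed word length, one byte per character
BYTES_PER_POINTER = 5       # a document ID written as 5 digits

dictionary_bytes = UNIQUE_WORDS * BYTES_PER_WORD
postings_bytes = UNIQUE_WORDS * BYTES_PER_POINTER  # one document per word
total_bytes = dictionary_bytes + postings_bytes
print(f"{dictionary_bytes:,} + {postings_bytes:,} = {total_bytes:,} bytes")
# 67,941,955 + 67,941,955 = 135,883,910 bytes
```

<p>And that is before any positional information and before the working memory needed for sorting.</p>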
<h3>Bit by Bit</h3>
<p>The only way to handle this is to work with chunks of data which you combine later on. </p>
<p align="center"><img height="302" alt="block sorting" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/block-sorting.jpg" width="224" border="0" />&#160;</p>
<p>A chunk, or block, is read into memory, sorted, written back. At one point you can start to <em>merge</em> the pre-sorted blocks and write them back into one sorted super-index.</p>
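<p>A small sketch of sort-in-blocks-then-merge. Plain Python lists stand in for the blocks a real indexer would write to disk, and the (word, document ID) pairs are invented.</p>

```python
import heapq

def sort_in_blocks(pairs, block_size):
    """Sort (term, doc_id) pairs block by block, then merge the sorted blocks.

    Lists stand in for the files a real indexer would write to disk.
    """
    blocks = []
    for start in range(0, len(pairs), block_size):
        blocks.append(sorted(pairs[start : start + block_size]))
    # heapq.merge reads the pre-sorted blocks lazily, one item at a time.
    return list(heapq.merge(*blocks))

pairs = [("search", 3), ("engine", 1), ("people", 2),
         ("engine", 3), ("search", 1), ("people", 1)]
merged = sort_in_blocks(pairs, block_size=2)
print(merged)
```

<p>Only one block has to fit in memory while sorting, and the merge step only ever looks at the front item of each block.</p>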
<p>In a small setup this is one machine reading and writing blocks but in a large scale setup this is a whole bunch of <a href="http://labs.google.com/papers/mapreduce.html">machines working with chunks of chunks</a>.</p>
<p align="center"><img height="383" alt="distributed-indexing" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/03/distributed-indexing.jpg" width="270" border="0" /> </p>
<h3>Recognize This?</h3>
<p>In such an index you can&#039;t randomly insert new or updated documents or remove deleted ones. You would have to re-sort on every update.</p>
<p>So what do you do?</p>
<p>You sort your index and use it: this is your main index. New stuff you find on the web goes into another, more temporary index. Call it the <a href="http://www.mattcutts.com/blog/google-hell/">supplemental index</a>. In order to deliver complete and up to date results, when people search you have to return results from <a href="http://googlewebmastercentral.blogspot.com/2007/12/ultimate-fate-of-supplemental-results.html">both indexes</a>.</p>
<p>Every once in a while you&#039;ll need to merge the new stuff from the supplemental index into the main one. If you find a <em>lot</em> of new stuff every day you&#039;ll need some kind of <a href="http://www.mattcutts.com/blog/bot-obedience-herding-googlebot/#comment-45561">priority setup</a> which says <em>these</em> entries in the supplemental index are worth the CPU time of merging them back in the main index and <em>these</em> are not &#8230; yet.</p>
<p>Of course <a href="http://www.dummies.com/WileyCDA/DummiesArticle/Timing-Google-s-Crawl.id-2555.html">back in the old days</a> you would have just gone out and re-indexed everything thoroughly&#8230;</p>
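<p>The two-index setup from this section as a toy sketch. Both function names and the <code>worth_merging</code> priority test are hypothetical, and the decision is made per word here only to keep the example short.</p>

```python
def search(term, main_index, supplemental_index):
    """Return document IDs for a term from both indexes, without duplicates."""
    found = set(main_index.get(term, ())) | set(supplemental_index.get(term, ()))
    return sorted(found)

def merge_supplemental(main_index, supplemental_index, worth_merging):
    """Move selected terms' postings into the main index.

    worth_merging is a hypothetical priority test; entries it rejects stay
    in the supplemental index for a later merge.
    """
    for term in list(supplemental_index):
        if worth_merging(term):
            merged = set(main_index.get(term, ())) | set(supplemental_index.pop(term))
            main_index[term] = sorted(merged)

main_index = {"search": [1, 4, 9]}
supplemental_index = {"search": [12], "engine": [12, 13]}
before = search("search", main_index, supplemental_index)
merge_supplemental(main_index, supplemental_index, lambda term: term == "search")
print(before, main_index, supplemental_index)
```

<p>Searching gives the same answer before and after the merge; the merge only changes which index the answer comes from.</p>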
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-recognize-this-index.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: &quot;The&quot; Index (2)</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html#comments</comments>
		<pubDate>Fri, 29 Feb 2008 22:08:10 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: &#34;The&#34; Index (1).
Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.
 
But what if [...]
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-index-1.html">&quot;The&quot; Index (1)</a>.</em></p>
<p>Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.</p>
<p><img height="130" alt="inverted index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/inverted-index1.jpg" width="444" border="0" /> </p>
<p>But what if you&#039;re not looking for any document that has the words <em>search</em> <strong>AND</strong> <em>people </em><strong>AND</strong> <em>engine</em> but you&#039;re looking for <u>Search Engine People</u>?</p>
<p>Well, if document 42 in our example reads &quot;<em>the <u>engine</u> was found after a <u>search</u> by some <u>people</u></em>&quot; or &quot;<em><u>people</u> use a <u>search</u> <u>engine</u> such as Google</em>&quot; then a traditional inverted index would think it&#039;s spot-on for your search. Ouch&#8230;</p>
<h3>(Extended) Biword Inverted Index</h3>
<p>One way to go about this would be to somehow generate an inverted list of <em>phrases</em>.</p>
<p align="center"><img height="140" alt="biword phrase index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/biword-index.jpg" width="342" border="0" /> </p>
<p><strong>Problem</strong>: if you index phrases two words long, phrase searches of three or more words become just another AND query that combines the parts of the phrase: &quot;<u>Long John</u>&quot; AND &quot;<u>Silver</u>&quot;.</p>
<p>Indexing phrases three words long simply moves the problem to phrase searches of four or more words, and so on.</p>
<p><strong>Problem</strong>: the inverted index becomes <em>huge</em>, listing every word in every document <em>and</em> every 2 (3? 4?) word phrase in every document&#8230;</p>
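<p>A small Python sketch of a biword index shows the first problem in action. The documents and function names are invented for this example, and the phrase is split into overlapping word pairs (a common textbook approach, assumed here rather than taken from any real engine).</p>

```python
from collections import defaultdict


def build_biword_index(docs):
    """Map each pair of adjacent words to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for first, second in zip(words, words[1:]):
            index[(first, second)].add(doc_id)
    return index


def biword_phrase_search(index, phrase):
    """AND together the biwords of a phrase of two or more words."""
    words = phrase.lower().split()
    pairs = list(zip(words, words[1:]))
    result = set(index.get(pairs[0], set()))
    for pair in pairs[1:]:
        result = result.intersection(index.get(pair, set()))
    return sorted(result)


docs = {
    1: "long john silver was a pirate",
    2: "long john met john silver",
}
biwords = build_biword_index(docs)
print(biword_phrase_search(biwords, "long john silver"))  # [1, 2]
```

<p>Document 2 never contains the phrase &quot;long john silver&quot;, but it does contain both &quot;long john&quot; and &quot;john silver&quot;, so the AND query lets it through.</p>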
<h3>Positional Inverted Index</h3>
<p>The only <em>real</em> solution is to store not only whether a word occurs in a document but also the exact position(s) of the word in that document.</p>
<p><img height="114" alt="positional index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/positional-index.jpg" width="438" border="0" /> </p>
<p>In this example document 42 is identified for &quot;search engine people&quot; because the words appear in that order: they appear in positions 1, 2 and 3.</p>
<p><strong>Advantage</strong>: because the positional index is similar in construction to the traditional inverted index, it inherits the same advantage. That is, when doing an AND query it can jump ahead whenever one of the words doesn&#039;t occur in the document it is looking at.</p>
<p><strong>Advantage</strong>: simply by looking at words occurring in the right order, <em>any</em> phrase of <em>any</em> length can be found even though it isn&#039;t indexed as such.</p>
<p><strong>Advantage</strong>: by having precise position information we can do <em>proximity</em> queries.</p>
<p><strong>Advantage</strong>: phrase <em>matching</em> and query word <em>proximity</em> <u>can also be used to rank search results</u>. </p>
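<p>The phrase and proximity advantages can be sketched in a few lines of Python. This is a toy version under simple assumptions: words are split on whitespace, positions are counted from 1, and the function names and sample documents are hypothetical.</p>

```python
from collections import defaultdict


def build_positional_index(docs):
    """word -> {doc_id: [positions]}, positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return index


def phrase_search(index, phrase):
    """Return doc IDs where the words occur at consecutive positions."""
    words = phrase.lower().split()
    # Only documents containing every word can match (the AND step).
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates = candidates.intersection(index.get(word, {}))
    matches = []
    for doc_id in sorted(candidates):
        starts = index[words[0]][doc_id]
        if any(
            all(start + offset in index[word][doc_id]
                for offset, word in enumerate(words))
            for start in starts
        ):
            matches.append(doc_id)
    return matches


def within(index, first, second, max_distance):
    """Proximity query: docs where the two words are at most max_distance apart."""
    shared = set(index.get(first, {})).intersection(index.get(second, {}))
    return sorted(
        doc_id for doc_id in shared
        if any(abs(a - b) in range(1, max_distance + 1)
               for a in index[first][doc_id]
               for b in index[second][doc_id])
    )


docs = {
    42: "search engine people help people search",
    7: "the engine was found after a search by some people",
}
positional = build_positional_index(docs)
print(phrase_search(positional, "search engine people"))  # [42]
print(within(positional, "search", "people", 3))          # [7, 42]
```

<p>Both documents pass the plain AND step, but only document 42 has the three words in positions 1, 2 and 3. The same position lists answer the proximity query without any extra index.</p>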
<h3>The Winner</h3>
<p>Although a positional index is typically 2 to 4 times larger than a traditional inverted index (even compressed it can run to as much as 50% of the size of the original text), the payoff is so large that this is the type of index in use by commercial search engines &#8212; for phrases. &#8230; In general&#8230;</p>
<p>Frequently searched phrases are still better stored in a biword index; less frequently searched phrases are better processed with a positional inverted index.</p>
<h3>Index Type &amp; <a href="http://www.searchenginepeople.com/">SEO</a></h3>
<p>The fun (<strong>warning:</strong> geek talking!) is of course that knowing this kind of stuff implicitly explains things to you.</p>
<p>For example, a positional inverted index only <em>really</em> works when <em>all</em> words, including so-called &quot;stop words&quot;, are indexed. Knowing that makes it less surprising that <a href="http://www.seofaststart.com/blog/stop-words-are-dead">stop words are dead</a>.</p>
<p>Positional indexing and retrieval also makes it not only logical but <em>expected</em> that <a href="http://www.google.com/ie?q=shop+in+new+york">shop <strong>in</strong> new york</a> and <a href="http://www.google.com/ie?q=shop+new+york">shop new york</a> give different results.</p>
<p>&#160;<img height="164" alt="" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/search.jpg" width="440" border="0" /> </p>
<p>Different results = different thinking, different SEO&#8230; different opportunities.</p>
<p>It&#039;s all in the index <img src='http://www.searchenginepeople.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-2.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: &quot;The&quot; Index (1)</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-1.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-1.html#comments</comments>
		<pubDate>Fri, 22 Feb 2008 23:47:36 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-the-index-1.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Previous Instalment: The Keyword Density Myth.
If a search engine would search &#34;live&#34; through the documents it knows about for the occurrence of the word we&#039;re looking for it could take its time and then simply report where [...]
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Previous Instalment: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html">The Keyword Density Myth</a>.</em></p>
<p>If a search engine searched &quot;live&quot; through the documents it knows about for occurrences of the word we&#039;re looking for, it could take its time and then simply report where it found our word.</p>
<p>In this example our search engine has only one index: the documents themselves.</p>
<p>&#160;<img height="217" alt="document-only-index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/document-only-index.jpg" width="248" border="0" /> </p>
<p>However, time is something a search engine doesn&#039;t have; the query needs to be answered <em>now</em>.</p>
<p>What we need is a <em>real</em> index!</p>
<h3>Boolean Index &#8211; Talk about the Matrix</h3>
<p><img height="100" alt="boolean-index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/boolean-index.jpg" width="440" border="0" /> </p>
<p>The problem with a boolean index, where we put a little flag (1) or not (0) for every word in every document, is that it quickly grows <b>way, way too large</b>.</p>
<p>Three documents with just four words among them take 12 1&#039;s or 0&#039;s &#8212; apart from the bits and bytes we need to store the words themselves. Now imagine a matrix where one of the sides is <a href="http://www.google.com/search?q=a">13,940,000,000</a> columns wide&#8230;</p>
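<p>The arithmetic is easy to check with a toy boolean matrix in Python. The three documents below are invented to match the &quot;three documents, four words&quot; example; they are not the ones from the picture.</p>

```python
# Hypothetical mini-collection: three documents, four distinct words.
docs = {
    1: "search engine",
    2: "search compression",
    3: "people search engine",
}

vocabulary = sorted({word for text in docs.values() for word in text.split()})

# One row per word, one column per document: 1 if the word occurs, else 0.
matrix = {
    word: [1 if word in docs[doc_id].split() else 0 for doc_id in sorted(docs)]
    for word in vocabulary
}

cells = len(vocabulary) * len(docs)              # every word/document pair is stored
ones = sum(sum(row) for row in matrix.values())  # pairs that actually occur

for word, row in matrix.items():
    print(word, row)
print(cells, ones)  # 12 7
```

<p>Even here 5 of the 12 cells are zeros. With a realistic vocabulary and billions of documents nearly every cell is a zero, which is exactly the waste the next index type avoids.</p>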
<h3>The Inverted Index</h3>
<p>&#160;<img height="229" alt="inverted-index" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/inverted-index.jpg" width="440" border="0" /> </p>
<p>In the inverted index we record only the places (documents) where a word <em>does</em> occur. </p>
<p>It&#039;s called inverted because instead of the documents providing the occurrences of a word, the word points to which documents it occurs in.</p>
<p>Sorted by document pointer, the inverted index is <u>extremely efficient in performing AND queries</u>.</p>
<p>Let&#039;s reshuffle our example a little bit to make this visually clear: <img height="79" alt="intersect-search" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/intersect-search.jpg" width="440" border="0" /> </p>
<p>If we search for documents that contain the words &quot;search compression&quot; and we walk down these rows at the same time, then as soon as one row jumps to a higher document ID we can jump forward in the other row as well: no use checking the intermediate ones, as we now <em>know</em> that those won&#039;t have <em>both</em> words.</p>
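<p>The walk described above can be sketched in Python roughly as follows. The document IDs are made up, and the &quot;jump forward&quot; is done with a binary search from the standard <code>bisect</code> module as a stand-in; a real engine would more likely store skip pointers in the lists themselves.</p>

```python
from bisect import bisect_left


def intersect(postings_a, postings_b):
    """Intersect two sorted lists of doc IDs, jumping over useless entries."""
    result = []
    i = j = 0
    while i != len(postings_a) and j != len(postings_b):
        a, b = postings_a[i], postings_b[j]
        if a == b:
            # Both words occur in this document.
            result.append(a)
            i += 1
            j += 1
        elif a > b:
            # Row b is behind: jump to its first ID that is not below a.
            j = bisect_left(postings_b, a, j + 1)
        else:
            # Row a is behind: jump to its first ID that is not below b.
            i = bisect_left(postings_a, b, i + 1)
    return result


search = [1, 2, 4, 11, 31, 45, 173, 174]
compression = [2, 31, 54, 101]
print(intersect(search, compression))  # [2, 31]
```

<p>After matching document 2, the &quot;search&quot; row jumps straight from 4 to 31 without ever looking at 11, because the other row has already moved on to 31.</p>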
<p>Knowing only about yes/no occurrences, an inverted index is <u><em>horrible</em> at phrase and proximity matching</u>:</p>
<p><img height="383" alt="paris-hilton" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/paris-hilton.jpg" width="439" border="0" /> </p>
<p><em>To be continued&#8230;</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-the-index-1.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: The Keyword Density Myth</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html#comments</comments>
		<pubDate>Sat, 02 Feb 2008 01:19:15 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: Keyword Stuffing.
What is Keyword Density?
Keyword Density is a function, a calculation, of keyword frequency.
It&#039;s calculated as number of occurrences divided by number of words and is usually expressed as a percentage.
&#160;
What is Keyword Density Used [...]
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-keyword-stuffing.html">Keyword Stuffing</a>.</em></p>
<h3>What is Keyword Density?</h3>
<p>Keyword Density is a function, a calculation, of <strong>keyword frequency</strong>.</p>
<p>It&#039;s calculated as <u><em>number of occurrences</em> divided by <em>number of words</em></u> and is usually expressed as a percentage.</p>
<p align="center"><img height="221" alt="keyword density example" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/keyword-density-example.jpg" width="256" border="0" />&#160;</p>
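<p>As a quick Python sketch of that calculation (the function name and sample sentence are invented, and words are simply split on whitespace):</p>

```python
def keyword_density(text, keyword):
    """Occurrences of keyword divided by total word count, as a percentage."""
    words = text.lower().split()
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


sample = "search engines index pages so people can search"
print(keyword_density(sample, "search"))  # 25.0
```

<p>Two occurrences in eight words gives 25%. Note that nothing outside the document itself enters the calculation.</p>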
<h3>What is Keyword Density Used For?</h3>
<p>Nothing much, really.</p>
<p>Keyword density can help in readability calculations.</p>
<p>Keyword density is also sometimes used as a simplified way to introduce <strong>local keyword weight</strong> but should never be confused with it.</p>
<h3>Why don&#039;t Search Engines use Keyword Density?</h3>
<p align="center"><img height="224" alt="local-keyword-density" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/local-keyword-density.jpg" width="422" border="0" /> </p>
<p>Search engines deal with calculations that say something about words in a document <strong>in relation to the index</strong> that document appears in.</p>
<p>Keyword density says something about words in a document <strong>in relation to the document</strong> itself. It doesn&#039;t help you to compare and thus sort or rank a set of documents.</p>
<h3>Frequency &lt;&gt; Relevance</h3>
<p>The fact is that frequency in and of itself doesn&#039;t equate to relevance.</p>
<p>The word <em>the</em> is the most commonly used English word: it appears with the highest frequency. If a search engine calculated relevance as frequency, all documents in its index would have <em>the</em> as their topic.</p>
<p>Likewise the word <em>time</em> is the most commonly used English noun. This would make a multitude of documents relevant to <em>time</em> before anything else.</p>
<h3>Keyword Weight</h3>
<p>To make sense of word occurrences in a document a search engine has to see those words <u>in the context of its index</u>.</p>
<p>This is done by calculating the overall importance of words both in the document <em>and</em> in the index.</p>
<p><u>This importance is called term weight</u>.</p>
<p>To calculate the importance of a word in a document, 3 variables are needed:</p>
<ul>
<li><strong>local weight</strong>: a calculation <em>based</em> on keyword frequency in this document. This variable can be calculated in many ways but <em>not</em> as a straightforward count of how many times the word appears in the document. </li>
<li><strong>global weight</strong>: calculated based upon <em>number of documents in index</em> divided by <em>number of documents with the keyword</em>. </li>
<li><strong>normalization</strong>: a calculation designed to remove the unfair advantages and disadvantages of document length. Usually you work to express the end values between 0 and 1. </li>
</ul>
<p><u>None of the search engines have ever disclosed which published or unpublished scales they use for local weight or global weight</u>.</p>
<p>What we&#039;re looking to achieve is to get high values for terms (words/phrases) that occur a lot of times in the relevant documents but infrequently in the index as a whole.</p>
<p><img height="313" alt="term-weight" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/term-weight.jpg" width="440" border="0" /> </p>
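<p>To see how the three variables fit together, here is a Python sketch that uses one well-known textbook combination: a log-dampened count for local weight, the log of <em>documents in index</em> divided by <em>documents with the term</em> for global weight, and vector-length normalization. These specific formulas are an assumption chosen for illustration; as noted above, the scales search engines actually use are not public. The sample documents are invented.</p>

```python
import math
from collections import Counter


def term_weights(docs):
    """Return {doc_id: {term: weight}} using a textbook tf-idf scheme."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_count = len(tokenized)
    # Number of documents in the index that contain each term.
    doc_freq = Counter(term for words in tokenized.values() for term in set(words))
    weights = {}
    for doc_id, words in tokenized.items():
        raw = {}
        for term, count in Counter(words).items():
            local_weight = 1.0 + math.log(count)                   # dampened frequency
            global_weight = math.log(doc_count / doc_freq[term])   # rare terms score higher
            raw[term] = local_weight * global_weight
        # Normalization: divide by vector length so values land between 0 and 1.
        length = math.sqrt(sum(w * w for w in raw.values()))
        weights[doc_id] = {t: (w / length if length else 0.0) for t, w in raw.items()}
    return weights


docs = {
    1: "the search engine the index",
    2: "the people the search",
    3: "the compression of the index",
}
weights = term_weights(docs)
for term, weight in sorted(weights[3].items()):
    print(term, round(weight, 3))
```

<p>In this sketch <em>the</em> ends up with a weight of 0 in every document, however often it appears, because it occurs in all of them. <em>compression</em>, which occurs in only one document, outweighs <em>index</em>, which occurs in two.</p>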
<h3>Keyword Density Myth Summary</h3>
<p>Search engines use <em>term weight</em> to rank documents by relevance.</p>
<p><em>Term weight</em> is calculated from the result of two other calculations, <em>local weight</em> and <em>global weight</em>, and then normalized for document length.</p>
<p>Without knowing the function used for <em>local weight</em> we can&#039;t calculate it &#8212; but we <em>do</em> know that it&#039;s not just pure keyword frequency.</p>
<p>Without knowing the size of the index, the number of documents containing the term, <em>and</em> the function used for <em>global weight</em> we can&#039;t calculate it.</p>
<p>Using <em>keyword density</em> as a guesstimator of <em>weight</em> or <em>relevance</em> is therefore utterly useless. It&#039;s like being given only the height of a three-dimensional object and being asked not only for its volume but also whether it is larger or smaller than any other unseen object in a collection you don&#039;t know about.</p>
<p align="center"><img height="198" alt="keyword-density-calcultation" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/02/keyword-density-calcultation.jpg" width="302" border="0" /> </p>
<p><em>Hungry for more? I recommend <a href="http://www.miislita.com/fractals/keyword-density-optimization.html">The Keyword Density of Non-Sense</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-the-keyword-density-myth.html/feed</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>How Search Really Works: Keyword Stuffing</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-keyword-stuffing.html</link>
		<comments>http://www.searchenginepeople.com/blog/how-search-really-works-keyword-stuffing.html#comments</comments>
		<pubDate>Sat, 26 Jan 2008 03:33:21 +0000</pubDate>
		<dc:creator>Ruud Hein</dc:creator>
				<category><![CDATA[How Search Really Works]]></category>

		<guid isPermaLink="false">http://www.searchenginepeople.com/blog/how-search-really-works-keyword-stuffing.html</guid>
		<description><![CDATA[This post is part of an ongoing series: How Search Really Works.       Last week: Keyword Links.
Left to their own devices, people will assign keywords (tag or link) as they please. 
They paint a rich picture of the linked content.
 
Keyword stuffing is the unnatural repetitive use of a specific [...]
]]></description>
			<content:encoded><![CDATA[<p><em>This post is part of an ongoing series: <a href="http://www.searchenginepeople.com/blog/category/how-search-really-works">How Search Really Works</a>.       <br />Last week: <a href="http://www.searchenginepeople.com/blog/how-search-really-works-keyword-links.html">Keyword Links</a>.</em></p>
<p>Left to their own devices, people will assign keywords (tag or link) as they please. </p>
<p>They paint a rich picture of the linked content.</p>
<p><img height="217" alt="natural linking" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/01/tag-cloud.jpg" width="430" border="0" /> </p>
<p><u>Keyword stuffing is the unnatural, repetitive use of a specific word or phrase.</u></p>
<p><strong><font color="#ff0000">In your content&#8230;.</font></strong></p>
<p><img height="257" alt="keyword-stuffing" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/01/keyword-stuffing.jpg" width="440" border="0" /></p>
<p><strong><font color="#ff0000">&#8230;or your links&#8230;</font></strong></p>
<p>&#160;<img height="233" alt="keyword-stuffing2" src="http://www.searchenginepeople.com/blog/wp-content/uploads/2008/01/keyword-stuffing2.jpg" width="442" border="0" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.searchenginepeople.com/blog/how-search-really-works-keyword-stuffing.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
