<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: How Search Really Works: Relevance (2) - Vector Space</title>
	<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html</link>
	<description>Canada's Search and Social Media Authority</description>
	<pubDate>Sat, 19 Jul 2008 23:20:25 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-2370</link>
		<dc:creator>SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 03 Jul 2008 13:24:25 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-2370</guid>
		<description>[...] Others have claimed that it is not possible to evaluate the IDF of a phrase. Even some who plan to teach IR have claimed that calling log(N/n) &#8220;inverse document frequency&#8221; is an &#8220;insult to students&#8221;. Before making fools of themselves, they should read the legacy papers of Robertson and Sparck Jones on the topic. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Others have claimed that it is not possible to evaluate the IDF of a phrase. Even some who plan to teach IR have claimed that calling log(N/n) &#8220;inverse document frequency&#8221; is an &#8220;insult to students&#8221;. Before making fools of themselves, they should read the legacy papers of Robertson and Sparck Jones on the topic. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vector Space Models and Search Engines &#171; IR Thoughts</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1473</link>
		<dc:creator>Vector Space Models and Search Engines &#171; IR Thoughts</dc:creator>
		<pubDate>Mon, 21 Apr 2008 15:36:31 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1473</guid>
		<description>[...] That said, today&#8217;s post is in reaction to the article at http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] That said, today&#8217;s post is in reaction to the article at <a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html">http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html</a> [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How Search Engines Do Not Work &#171; IR Thoughts</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1474</link>
		<dc:creator>How Search Engines Do Not Work &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 17 Apr 2008 12:40:37 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1474</guid>
		<description>[...] 1. http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] 1. <a href="http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html">http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html</a> [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Malte Landwehr</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1469</link>
		<dc:creator>Malte Landwehr</dc:creator>
		<pubDate>Wed, 16 Apr 2008 21:06:29 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1469</guid>
		<description>An excellent analysis of how to weight terms by their frequency. But I doubt that a two-dimensional space is enough to represent the complexity needed to maintain an index of millions of documents.</description>
		<content:encoded><![CDATA[<p>An excellent analysis of how to weight terms by their frequency. But I doubt that a two-dimensional space is enough to represent the complexity needed to maintain an index of millions of documents.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dev Basu</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1470</link>
		<dc:creator>Dev Basu</dc:creator>
		<pubDate>Mon, 14 Apr 2008 21:22:14 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1470</guid>
		<description>As usual, Ruud, this is a great post. It's always interesting to learn the inner workings of an SE :)</description>
		<content:encoded><![CDATA[<p>As usual Ruud this is a great post. It&#8217;s always interesting to learn the inner workings of an SE <img src='http://www.searchenginepeople.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruud Hein</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1471</link>
		<dc:creator>Ruud Hein</dc:creator>
		<pubDate>Sat, 12 Apr 2008 03:20:14 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1471</guid>
		<description>Good indeed to point that out. Doing any of this at run time is extremely costly. There are cost-reducing procedures, such as working with the top N documents or leader/follower samples.

Yet I too think that this isn't used at run time (read: query time) because the TFxIDF vector space model is geared towards words. The IDF of a word is computed, not that of a phrase. All in all it doesn't deliver enough bang for its buck.

Worse: it's typically a model for a clean index. Boosting TF for a high-IDF word is too easy when you have search access to the whole collection.

It's interesting though to see how this model can find related documents.</description>
		<content:encoded><![CDATA[<p>Good indeed to point that out. Doing any of this at run time is extremely costly. There are cost-reducing procedures, such as working with the top N documents or leader/follower samples.</p>
<p>Yet I too think that this isn&#8217;t used at run time (read: query time) because the TFxIDF vector space model is geared towards words. The IDF of a word is computed, not that of a phrase. All in all it doesn&#8217;t deliver enough bang for its buck.</p>
<p>Worse: it&#8217;s typically a model for a clean index. Boosting TF for a high-IDF word is too easy when you have search access to the whole collection.</p>
<p>It&#8217;s interesting though to see how this model can find related documents.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hamlet Batista</title>
		<link>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1472</link>
		<dc:creator>Hamlet Batista</dc:creator>
		<pubDate>Fri, 11 Apr 2008 22:08:39 +0000</pubDate>
		<guid>http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html#comment-1472</guid>
		<description>Hi Ruud,

Excellent post as usual. It is important to mention that the vector space model for ranking is not currently practical for the top search engines due to the size of their index (and the corresponding size of the document vectors). While they use huge matrices for computing the importance of links (PageRank), that process is done offline and is query-independent. Computing such vectors at query time would be prohibitively expensive in time and resources.

Cheers</description>
		<content:encoded><![CDATA[<p>Hi Ruud,</p>
<p>Excellent post as usual. It is important to mention that the vector space model for ranking is not currently practical for the top search engines due to the size of their index (and the corresponding size of the document vectors). While they use huge matrices for computing the importance of links (PageRank), that process is done offline and is query-independent. Computing such vectors at query time would be prohibitively expensive in time and resources.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
</channel>
</rss>
