The web is a collection of "things" with links between them.

In math, a collection of "objects" with links between them is called a graph.

In computer science, a graph is a data structure modeled on the mathematical graph concept.

The World Wide Web, then, both looks like a graph and can be operated on mathematically as one.



Graph algorithms (series of mathematical calculations that extract data from, or draw conclusions about, a graph) are heavily used in computer science.


At the heart of Google was the PageRank algorithm.

PageRank is a graph algorithm that extracts meaning from a graph (the web) by looking at the structure of that graph: it examines the links (edges, in math) between pages (nodes) and assigns a numerical value to each page based on the links found. By repeating this process, going over the same graph time and time again, the most connected pages can be found: Google considers these pages the most important (#1 ranking), and their links the most influential.
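The core of that idea fits in a few lines of code. Below is a minimal power-iteration sketch over a made-up four-page web; the page names and damping factor are illustrative, and this is not Google's actual implementation:

```python
# Minimal PageRank sketch: repeatedly let each page share its rank
# with the pages it links to, until the values settle.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:             # each outlink passes on a share
                new_rank[target] += share
        rank = new_rank
    return rank

# A hypothetical four-page web (invented for illustration).
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(web)
# "home" ends up with the highest rank: every other page links to it.
```

Running the iteration over the same graph again and again is exactly the "time and time again" step described above; the values converge toward a stable ranking.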

The correlation Google draws between links, the value of those links, and the importance of a page was an intuitive one. People who used Google found search results ordered (ranked) this way very, very relevant, confirming the intuitive correlation.


We just know in our gut that there has to be a correlation between the number of people who like (Facebook), tweet (Twitter), or Plus One (Google) a page and the value of that page.

That's the answer.

The question is: what is the nature or the meaning of this correlation?

Is there a correlation between relevance and social shares? Traffic and social shares? Are social shares perhaps only relevant and correlated within one's social network: you visit what I visit, but outside of our relationship people couldn't care less? Do pages with more links get proportionally more social shares? Are too many social shares a sign of web spam?


When you have a lot of related data in rows and columns (a table of data for you and me; a matrix in math), you know the data points are related in the sense that they belong together.

If you want to explore or confirm relationships in, especially, massive data sets with lots of "noise" (bad data), unknown connections, and all kinds of cross-linked interactions, then you would use factor analysis.

Factor analysis is a mathematical tool that can show patterns of occurrence.

For example, in a large data set of everyday objects, factor analysis can reveal patterns of occurrence: that beds appear with houses but are less likely to appear with offices. An insurance expert could combine that data set with one of everyday accidents, and factor analysis could show or confirm that ladders appear a lot alongside broken bones.
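The co-occurrence idea can be illustrated with a toy computation. Real factor analysis extracts latent factors from many variables at once; the sketch below only computes pairwise Pearson correlations over invented binary observations (all data here is made up), but it shows the kind of pattern the technique surfaces:

```python
# Toy co-occurrence check: does "bed" tend to appear in the same
# observations as "house"? As "office"?
def pearson(xs, ys):
    """Pearson correlation between two equal-length number sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Each column: does one observed scene contain the object? (invented data)
observations = {
    "bed":    [1, 1, 0, 1, 0, 0, 1, 0],
    "house":  [1, 1, 0, 1, 0, 1, 1, 0],
    "office": [0, 0, 1, 0, 1, 1, 0, 1],
}
bed_house = pearson(observations["bed"], observations["house"])    # positive
bed_office = pearson(observations["bed"], observations["office"])  # negative
```

A strongly positive value means the objects co-occur; a negative one means they tend to exclude each other, which is the "beds appear with houses but not with offices" pattern in miniature.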

Combining social and economic data, social scientists, economists, and politicians can explore which patterns seem beneficial for economic growth or which promote peace and stability.


Google will go through a period where they collect +1 data but not act on it.

Once they have a large data set, they can apply factor analysis and let the data tell them what the +1 variable means and what its various values seem to imply.

The patterns of occurrence found can be compared with models Google might have made or with a similar data set from a smaller collection which contains more trusted data.

With the +1 meanings known, Google can then codify them.

Aware of what the different values stand for, which frequencies are normal, and which patterns of appearance are odd, Google can set floor and ceiling triggers and create a negative filter, a normative filter, and a positive filter.
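How such triggers might work can be sketched as a simple z-score filter. Google's actual filters and thresholds are unknown; the floor/ceiling values and the mapping to "negative/normative/positive" below are assumptions for illustration only:

```python
import statistics

# Hypothetical floor/ceiling triggers: flag +1 counts that fall outside
# a "normal" band derived from a history of comparable pages.
def classify(plus_ones, history, z_low=-2.0, z_high=2.0):
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0   # avoid division by zero
    z = (plus_ones - mean) / stdev
    if z < z_low:
        return "negative"     # below the floor trigger: oddly few votes
    if z > z_high:
        return "positive"     # above the ceiling trigger: oddly many votes
    return "normative"        # inside the normal band

history = [10, 12, 11, 9, 13, 10, 11]   # invented +1 counts for similar pages
```

With this toy history, `classify(11, history)` lands in the normative band, while `classify(50, history)` trips the ceiling trigger, the kind of spike that might warrant a closer look (recall the web-spam question above).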


I expect that the primary occurrence patterns will be:

  • Social Influencer: this is where +1 starts today and will start every time. The pattern is localized in the social network of the user casting the +1 vote. From that user the effect ripples through her network, causing additional +1 votes from other people in the network; the +1'd site is very valuable or relevant to that network for that query.
    Other terms relevant here: localized expert; trust.
  • Network Hopping: when the initial +1 has traveled through the localized network and the subsequent +1's cause people in the connected networks (friends-of-friends, etc.) to +1 the result.
    Associative terms: viral; expert; meme; middle-of-the-road.
  • Query Relevant: the result of the first two is that the +1 is highly relevant according to ever-growing numbers of searchers.
    So: expert advice; "best of" (not necessarily "best")

The meaning of these patterns, or their value, seems to be inversely related to the level of connectedness: the less connected the voters are to one another, the more their votes are worth.

That is, if a PHP expert finds a page with good code and +1's it, and someone performing a similar query sees that result and +1's it too, and the two people are unrelated, then those +1's carry more weight as a signal of query-relevant value than they would have had the two known each other.

This in turn suggests that low numbers of +1's can be just as valuable as high numbers of +1's, depending on the connectedness of the voters.

Perhaps this can be used to add an inverse corrective filter, one that says: the more votes from within the immediate network and from within the directly neighboring networks, the less value the result carries for non-connected networks, until a certain level is passed.
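One way to read that inverse corrective filter in code (my interpretation, not a known Google mechanism; the discount weights and the threshold are invented):

```python
# Sketch of an inverse corrective filter: votes from unconnected voters
# count fully, votes from inside the same or a neighboring network are
# discounted, until total volume passes a threshold and the discounts lift.
def vote_value(votes, threshold=100):
    """votes: network distances between each voter and the original +1'er.
    0 = same network, 1 = neighboring network, 2+ = unconnected."""
    weights = {0: 0.2, 1: 0.5}              # assumed discounts, not real values
    if len(votes) >= threshold:             # "until a certain level is passed"
        return float(len(votes))
    return sum(weights.get(d, 1.0) for d in votes)
```

Three votes from strangers would thus outweigh three votes from one tight-knit circle, matching the observation that low numbers of unconnected +1's can be as telling as high numbers of connected ones.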

Suggested reading: Bin Gao, Tie-Yan Liu, Wei Wei, Taifeng Wang, and Hang Li. Semi-Supervised Ranking on Very Large Graph with Rich Metadata [PDF]. Microsoft Research. March 2011.