Discover What Google Really Thinks Of Your Pages

by Donna Fontenot November 23rd, 2009 

What does Google REALLY think of your site's pages? Think PageRank gives a clue? Perhaps if we were shown true PageRank, we could use it as a somewhat reliable gauge, but unfortunately, we don't have that luxury. But we don't need PageRank. We have a better way to determine how much importance Google places on each of those coveted pages of ours, but it takes a little effort on our part to uncover this goldmine of information.

What To Look For

Long ago (January 27, 2007, in fact), Aaron Wall said in a post titled "Cache Date as the New Google PageRank":

Given Google's reliance on core domain authority and displaying outdated PageRank scores, cache date is a much better measure of the authority of a particular page or site than PageRank is.

What Google frequently visits (and spends significant resources to keep updated) is what they consider important.

[Image: google cache rank]
*Fake photoshopped image depicting what I imagine CacheRank might look like if it actually existed.

Then, in a guest post I wrote on directom.com in November of 2008, I discussed a few ways to track cache date, including Michael Gray's method, some WordPress plugins and some paid tools. As I mentioned in that post, I wasn't particularly happy with any of them. Because of that, I kept searching.

The Right Tool For The Job

Earlier this year, I ran across the free CJ Googlebot Activity script created by James Crooke of cj-design.com. I started running it on one of my sites in mid-May, let it accumulate data for four and a half months, and then started analyzing that data. The results were very, very interesting.

The bottom line? I now have a very clear perspective on which pages on my site Google thinks are most worthy of its time – and which aren't! And while a few of the pages were obviously ones that Google appreciated, I was very surprised by quite a few that Google placed at the bottom of its priority list. Many of those at the bottom of the heap were ones that I personally think are some of the better pages on the site, so I obviously need to make sure Google appreciates them as well. I now have a very clear idea of where my focus needs to be, and which pages I can spend less time on for now.

[Image: cacherank bar chart]

Making It Work

Now, I want to give you some technical tips on getting this working for you. The script comes with easy installation instructions, so you should have no issues getting it set up. Like most scripts, you edit a config file to answer a few basic questions, and then simply place a short PHP include snippet in the files you want to track. If you have a header or footer file that gets included in all your templated pages, you can place the snippet there and know that it will automatically be included everywhere (similar to how including analytics code works).

Once it's running, you have an administration area that you can go to any time you wish to view the activity. That's useful, but it's limiting in giving you "the big picture". Luckily, it also gives you an option in the admin area to export to a csv file, and this is where the data can start to be meaningful.

We'll get to that in a minute, but first you should take a look at a screenshot of the demo account listed on James Crooke's site. You can see more of the demo here.

[Image: crawl screenshot]

Tweaking and Geeking

So, although I love seeing that data, I knew I needed to delve a bit deeper. I'm not a spreadsheet geek, though I wish I were, so instead of analyzing the data there, I imported the csv data into a MySQL database table. I was only concerned with 3 pieces of data (Crawl Date, Crawl Time, and URL Crawled), so the table was a very simple one. (I know a lot of people might not be MySQL geeks, but if you ever wanted to learn the basics, this task is a good place to start, since it's a relatively simple one.)

The steps basically involved creating a database and then, using phpMyAdmin, creating a table (mine is named crawls, which is the name the query further down uses) with 3 fields – crawldate (date field), crawltime (time field), and crawlurl (varchar field). (You simply type the names in, choose the type of field from a dropdown, and then click the Go button to create the table.)
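If you'd rather type than click, here is a minimal sketch of the same three-field table. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL and phpMyAdmin, and the function name create_crawls_table is made up for this example. SQLite has no dedicated date or time column types, so the comments note what each column would be in MySQL.

```python
import sqlite3


def create_crawls_table(conn):
    """Create the three-field crawls table (sqlite3 stand-in for MySQL)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS crawls (
            crawldate TEXT,  -- a date field in MySQL
            crawltime TEXT,  -- a time field in MySQL
            crawlurl  TEXT   -- a varchar field in MySQL
        )
        """
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
create_crawls_table(conn)

# List the column names to confirm the table looks as intended.
columns = [row[1] for row in conn.execute("PRAGMA table_info(crawls)")]
print(columns)
```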

Once the table is created, click on it, and then choose the Import tab. Upload the csv file using the Browse button and select CSV Using LOAD DATA. Since my csv file was comma-delimited, I put a comma (,) in the Fields terminated by box, left the Fields enclosed by and Fields escaped by boxes blank, kept the defaults for everything else, and clicked the Go button.

In a second, everything was imported into the table. The whole process takes just a few minutes.
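The import can be sketched in code as well. This is a rough equivalent, not what phpMyAdmin does internally: it reads comma-delimited rows with Python's csv module and inserts them into a sqlite3 table standing in for MySQL. The function name import_crawl_csv and the sample rows are invented for illustration, and the date and time formats in the sample are assumptions.

```python
import csv
import io
import sqlite3


def import_crawl_csv(conn, csv_file):
    """Load comma-delimited date,time,url rows into the crawls table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawls "
        "(crawldate TEXT, crawltime TEXT, crawlurl TEXT)"
    )
    reader = csv.reader(csv_file, delimiter=",")
    # Keep only well-formed rows with exactly three fields.
    rows = [tuple(field.strip() for field in row)
            for row in reader if len(row) == 3]
    conn.executemany("INSERT INTO crawls VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "2009-05-14,08:15:02,/somepage.php\n"
    "2009-05-14,09:40:11,/thatpage.php\n"
    "2009-05-15,08:20:45,/somepage.php\n"
)

conn = sqlite3.connect(":memory:")
imported = import_crawl_csv(conn, sample)
print(imported, "rows imported")
```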

Once the data was in the table, I ran the following SQL statement by clicking on the SQL tab in phpMyAdmin:

SELECT crawlurl, COUNT(crawlurl) FROM crawls GROUP BY crawlurl ORDER BY COUNT(crawlurl) DESC;

This particular SQL statement doesn't even concern itself with dates or times. It simply counts the number of times each URL was crawled within the entire span of data (which was four and a half months for me), and then sorts each page URL by the number of times it was crawled, showing the most-crawled pages first.
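To see the counting and sorting in action without a MySQL server, here is a small sketch that runs the same kind of query against an in-memory sqlite3 database. The rows are made up, and the function name crawl_counts and the crawl_count alias are just labels chosen for this example.

```python
import sqlite3

# Same idea as the statement above: count crawls per URL, most-crawled first.
QUERY = """
SELECT crawlurl, COUNT(crawlurl) AS crawl_count
FROM crawls
GROUP BY crawlurl
ORDER BY crawl_count DESC
"""


def crawl_counts(conn):
    """Return (url, times_crawled) pairs, most-crawled first."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawls (crawldate TEXT, crawltime TEXT, crawlurl TEXT)")
conn.executemany(
    "INSERT INTO crawls VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real crawl data.
        ("2009-05-14", "08:15:02", "/somepage.php"),
        ("2009-05-14", "09:40:11", "/thatpage.php"),
        ("2009-05-15", "08:20:45", "/somepage.php"),
        ("2009-05-16", "10:02:13", "/sampleurl.php"),
        ("2009-05-17", "07:45:00", "/thatpage.php"),
        ("2009-05-18", "08:30:27", "/somepage.php"),
    ],
)

results = crawl_counts(conn)
for url, count in results:
    print(count, url)
```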

Here's an example of how my data looked afterwards:

Crawled   PageName
-------   ------------------
52        somepage.php
44        thatpage.php
36        anotherpage.php
8         sampleurl.php
2         yetanotherpage.php

With a large site, the list can be quite long, but that's OK. Because it's sorted, it's easy to scan and see where the problems are, and which pages need more tender loving care (as well as link love).

I'll be working more with the dates and times in the future to see if there are any more insights I can glean from the data, but for now, this very simple analysis proved to be extremely useful as an analytical tool.
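As one possible starting point for the date side of things (an assumption about where this could go, not an analysis that has actually been run), the sketch below finds the most recent crawl date for each URL and lists the stalest pages first. It uses sqlite3 as a stand-in for MySQL, assumes dates are stored as YYYY-MM-DD text so they sort correctly, and uses invented rows and a made-up function name, last_crawl_dates.

```python
import sqlite3

# Latest crawl date per URL, with the longest-neglected pages first.
LAST_CRAWL_QUERY = """
SELECT crawlurl, MAX(crawldate) AS last_crawled
FROM crawls
GROUP BY crawlurl
ORDER BY last_crawled ASC
"""


def last_crawl_dates(conn):
    """Return (url, last_crawl_date) pairs, stalest first."""
    return conn.execute(LAST_CRAWL_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawls (crawldate TEXT, crawltime TEXT, crawlurl TEXT)")
conn.executemany(
    "INSERT INTO crawls VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real crawl data.
        ("2009-05-14", "08:15:02", "/somepage.php"),
        ("2009-09-28", "09:40:11", "/somepage.php"),
        ("2009-06-02", "11:05:30", "/sampleurl.php"),
        ("2009-08-19", "07:55:10", "/thatpage.php"),
    ],
)

stalest_first = last_crawl_dates(conn)
for url, last_crawled in stalest_first:
    print(last_crawled, url)
```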

Why Bother?

If you want to see clearly which of your pages Google appreciates the most – and more importantly, which it appreciates the least – I recommend doing a similar analysis on your site. Once you have this information, you can schedule your time so that the pages you work on are the ones that need the most help.

Information yields power. Data is good. :)

*hat tip to Michael VanDeMar for his help.

UPDATE: I may have forgotten one step in the process above. When I said I exported to a csv file, it actually exports to a regular txt file. I then did a quick find/replace to replace the line endings with commas, and another find/replace to delete the extraneous words "Crawled:" and "Bot:". THEN I saved that as a .csv file.
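That cleanup could also be scripted. The sketch below is only a guess at how it might look: the exact layout of the exported txt file isn't shown in this post, so the sample text is invented, and it assumes each crawl record spans several lines with a blank line between records. The function name export_to_csv_lines is made up for this example.

```python
import re

# The two extraneous labels mentioned above.
LABELS = re.compile(r"\b(?:Crawled|Bot):\s*")


def export_to_csv_lines(text):
    """Strip the labels and join each record's lines with commas."""
    csv_lines = []
    # Assumption: records are separated by one or more blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        fields = [LABELS.sub("", line).strip() for line in record.splitlines()]
        csv_lines.append(",".join(field for field in fields if field))
    return csv_lines


# Hypothetical export layout, invented purely for illustration.
sample_export = (
    "Crawled: 2009-05-14\n"
    "08:15:02\n"
    "Bot: /somepage.php\n"
    "\n"
    "Crawled: 2009-05-15\n"
    "08:20:45\n"
    "Bot: /thatpage.php\n"
)

cleaned = export_to_csv_lines(sample_export)
for line in cleaned:
    print(line)
```

If your export looks different, the record separator and the label pattern are the two things to adjust.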

UPDATE 2: Apparently I didn't make things really clear. This analyzes crawl data, not actual cache data. I make the leap in my own brain that crawls lead to caches, but it's only crawl data that is being analyzed here. Apologies for any confusion.

8 Responses to “Discover What Google Really Thinks Of Your Pages”

  1. You could do most of this using a Google Doc Spreadsheet. I THINK you can scrape the crawl date using importXML so you could use the spreadsheet itself to import the data.

    Check out my post on creating a server monitor program using a Google Docs spreadsheet. It's not quite the same thing, but I think you'll see how one may apply much of the same method to your dream app.

    http://stephenakins.blogspot.com/2009/04/google-docs-server-monitoring_8546.html

  2. shuvo says:

    Great article, but will it work for blogspot blogs?

  3. amir says:

    Is there a solution for .NET that does the same?

  4. AlessioWeb says:

    I actually think the page rank no longer has any importance, and perhaps even less of the trust rank, which can be better considered. Many people consider the page rank essential, but it is no longer even a true result. It's better to rely on good content and reputable sites, because Google no longer takes it seriously.

  5. Mark Vozzo says:

    Great post, but am I missing something? Google provides us with a lot of insight on its crawler activity via Google Webmaster Tools (http://www.google.com/webmaster).

    What are the main advantages of running the script vs. Google Webmaster Tools reports?

    If there is a major advantage, then, like Amir, I too am interested in a .NET version.

    Regards,
    Mark Vozzo – Sydney, Australia

  6. Cata says:

    Interesting experiment, but quick question: you said that you had a few surprises in the sense that you expected Google to appreciate certain pages of yours more. Did those pages have more links to them, or why were you expecting to be appreciated more?

  7. Neilson says:

    Really informative post. It's not very often that people think about the pages that Google views as weak, and more importantly how they can make them more powerful. Too much emphasis is put on the powerful pages, making the site very one-sided.

  8. Philippe says:

    I wrote a post on my blog some months ago about how to track spider activity on your site using Google Analytics. It's an amazing Google Analytics hack, as it tells you which spiders went to your site, when, how often, and which pages they visited.

    http://philippeog.com/seo-analytics-how-to-track-search-engine-bots-with-google-analytics