Learn SEO: Maximize Your Time with Google's Spiders

by Stephanie Woods October 13th, 2009 

Let's face it. You wish your website ranked in the number one spot on Google. Yes, there are more important things, like overall online visibility and ROI. And yes, there are other major players like Yahoo! and Bing. But today we are going to focus on Google. In fact, I am going to make a bold statement and say that anyone who tells you they don't care about ranking high in Google is lying.

Now let's start with the basics.

Software robots, or bots, called spiders build lists of words they find on websites. This process is called crawling. Spiders crawl through web pages and index what they find. They also follow every link found within a web page. This practice of following links enables spiders to quickly travel across the web indexing pages.

There are millions of websites and billions of web pages. Because of this, Google has sophisticated algorithms that determine how much time a spider can spend on your site. In order for your site to be displayed in the results pages, it is important that the spiders properly (and fully) index your website.

Spider Friendly Checklist

For those of you well versed in all things SEO, nothing here will be new to you. For the rest of you newbs out there (we all have to start somewhere), please keep in mind that some factors have more importance than others and are listed in no particular order. In my humble opinion it is beneficial to execute all of the items listed below. Every little bit helps. With constantly changing algorithms, it is imperative to have all the bases (and basics) covered.

Individually, each of the tasks below won't have a huge impact. Collectively they will help your site rank, especially if you're in a niche and not-so-competitive industry (like crumb rubber in Penticton). If you are in a competitive industry (like New York real estate), these tactics are small (but necessary) stepping stones to compete with the big boys. At the end of the day, no matter how competitive your industry is, if spiders are unable to index your site, you won't be found in Google. Simple as that.

All of the items below can (and do) have blog posts of their own describing each task in detail. For the sake of brevity, each tactic is described in basic top level terms.

  • Create good site architecture and link structure: Two to three clicks to reach a destination (e.g. important product or service landing pages) from the homepage is optimal. If a spider has to crawl too deep it may never get to those pages. Also, be sure you don't have any orphan pages (i.e. pages that no other page on your site links to). This seems glaringly obvious, but it happens.
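    One rough way to check click depth and spot orphan pages is to walk your own link structure the way a spider would. The sketch below assumes a hypothetical site described as a simple mapping of each page to the pages it links to; the page names and the three-click threshold are illustrative only.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first walk from the homepage; returns {page: clicks from home}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: each page maps to the pages it links to.
site_links = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/large"],
    "/about": [],
    "/old-promo": ["/"],  # no page links here, so it is an orphan
}

depths = click_depths(site_links, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
orphans = sorted(set(site_links) - set(depths))

print("Deeper than three clicks:", too_deep)
print("Orphan pages:", orphans)
```

    Any page that never shows up in the walk cannot be reached by following links from the homepage, which is exactly the situation a spider would be in.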

  • Avoid the use of dynamic URLs: A dynamic URL is a URL that is built from query parameters rather than written in plain English. An example of a dynamic URL is this: www.mysite.com/url.com?id=4&ses=aa#. One of the problems associated with dynamic URLs is that too many parameters can cause a spider trap. This happens when a spider gets trapped in an endless loop of generated URLs. What you want to use is a URL that has your chosen keywords written in plain English. This is referred to as a canonical URL. Dynamic URLs are handy for tracking things, so if you do need to use them, make sure you use mod_rewrite to ensure the spiders track the canonical URL.

    Another tip: use hyphens instead of underscores. Spiders read underscores as all one word, whereas hyphens are read as separate words.
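    As a small illustration of the plain English, hyphen-separated style, here is a sketch of a helper that turns a page title into a URL slug. The function name and the sample titles are made up for this example; it is one possible approach, not a standard.

```python
import re

def slugify(title):
    """Turn a page title into a lowercase, hyphen-separated URL slug."""
    slug = title.lower()
    # Any run of characters other than letters and digits (spaces,
    # underscores, punctuation) collapses into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")

print(slugify("Crumb Rubber in Penticton"))  # crumb-rubber-in-penticton
print(slugify("blue_widgets (large)"))       # blue-widgets-large
```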

  • Beware of duplicate content issues: Duplicate content can waste valuable spider time. However, there are instances where it is unavoidable, so ensure that it is dealt with correctly. In many cases the best solution is to use 301 redirects to point all of the duplicate content to one page. Your date with spidey is limited so make sure you're giving him new content to index. It's a waste of time having a spider index pages that it already has. Also, Google gets to pick which of the duplicated pages it wants to index and it just might end up picking the wrong one.

    There are more advanced options to consider as well. These include adding a canonical tag (a page-level link tag placed in the head of the page) to specify which version is the canonical page (aka the plain English URL). The downside is that the spiders have to crawl the page first to read the tag, so it's not necessarily maximizing your time with Google's spider. Google recently added another option: in Webmaster Tools you can tell Google's robots to ignore any dynamic parameters and have the spiders crawl only the canonical version of the page. The benefit of doing this is that it can reduce crawling of unnecessary pages and free up bandwidth for other pages to get crawled.
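    To make the 301 idea concrete, here is a minimal sketch of how a site might decide whether a requested URL is a duplicate and where to redirect it. The list of ignored parameters (`ses`, `sessionid` and so on) is an assumption for illustration; which parameters are safe to drop depends entirely on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking/session parameters that do not change the page content.
IGNORED_PARAMS = {"ses", "sessionid", "utm_source", "utm_medium"}

def canonical_url(url):
    """Drop ignored parameters and any fragment, keeping everything else."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query)
            if key.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

def redirect_for(url):
    """Return (301, target) if the URL is a duplicate, otherwise None."""
    target = canonical_url(url)
    return (301, target) if target != url else None

print(redirect_for("http://www.mysite.com/widgets?id=4&ses=aa#"))
print(redirect_for("http://www.mysite.com/widgets?id=4"))
```

    The first call reports a 301 to the version without the session parameter; the second is already canonical, so no redirect is needed.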

  • Create a robots.txt file: This file creates an opportunity to tell the spiders which parts of your site are not important for them to check out (such as folders where your images are contained). This helps to ensure you're not wasting valuable face time having unnecessary files checked. Robots.txt can also help you tell spiders which pages they shouldn't check to avoid duplicate content issues. Be careful though. An incorrect robots.txt can also make your site uncrawlable.
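    Before uploading a robots.txt it is worth testing it against a few URLs you care about. The sketch below uses Python's standard `urllib.robotparser` module; the two robots.txt files and the URLs are hypothetical examples, and the second file shows how one stray rule can block an entire site.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, agent="Googlebot"):
    """Check whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Keeps spiders out of the image and print-version folders only.
GOOD = """\
User-agent: *
Disallow: /images/
Disallow: /print/
"""

# A single slash blocks the whole site.
BAD = """\
User-agent: *
Disallow: /
"""

print(allowed(GOOD, "http://www.mysite.com/products/widgets"))  # True
print(allowed(GOOD, "http://www.mysite.com/images/logo.png"))   # False
print(allowed(BAD, "http://www.mysite.com/products/widgets"))   # False
```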

  • Generate an XML sitemap: It's true that Google will eventually find your site and spider it. It's also true that this can take some time if you have a brand new site with little or no external links pointing towards it. Submitting an XML sitemap to Google helps speed up the process and is generally considered good practice.
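    For a small site, a sitemap can be generated with a few lines of code. This is a minimal sketch using Python's standard library and the sitemaps.org namespace; the page URLs and dates are placeholders, and most sites would pull the list from their CMS or database instead.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return header + ET.tostring(urlset, encoding="unicode")

# Placeholder pages for illustration.
pages = [
    ("http://www.mysite.com/", "2009-10-13"),
    ("http://www.mysite.com/products/widgets", "2009-10-01"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

    The output would typically be saved as sitemap.xml at the root of the site and then submitted through Webmaster Tools.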

  • Utilize onpage SEO tactics: Not all onpage tactics are equal. However, it's inevitable that algorithms will change and covering all the bases is encouraged. Onpage tactics help busy little spiders building their lists of words to distinguish which keywords are central to each page on your site.

    Your goal is to create content that people will like to read and share with others (and hopefully link to). Make sure the following tactics are used so that the inclusions of keywords still sound as natural as possible (including the use of modifiers and synonyms of the selected phrase).

    • Use your most important keywords at the front of your page title.
    • Utilize relevant keywords for the H1 tag for page headlines.
    • Adjust your internal linking structure so that you are linking using relevant anchor text.
    • Label images and photos with your targeted keywords only if relevant (i.e. no unnecessary keyword stuffing).
    • There is no magical formula for how many times you should repeat your selected keyword phrase. However, it's safe to say that using it at least two or three times throughout the body copy makes sense.
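    A few of these checks are easy to automate. The sketch below, built on Python's standard `html.parser`, looks at whether a keyword leads the page title, appears in the H1, and how many images lack alt text. The class name, the sample page and the chosen checks are illustrative assumptions; they say nothing about how Google actually weighs these elements.

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Collects the text of <title> and <h1>, plus image alt attributes."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.alts = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
        elif tag == "img":
            self.alts.append(dict(attrs).get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data

def audit(html, keyword):
    parser = OnPageAudit()
    parser.feed(html)
    kw = keyword.lower()
    return {
        "title_starts_with_keyword": parser.title.strip().lower().startswith(kw),
        "keyword_in_h1": kw in parser.h1.lower(),
        "images_missing_alt": sum(1 for alt in parser.alts if not alt.strip()),
    }

# Hypothetical page for illustration.
page = """
<html><head><title>Crumb Rubber Penticton | Acme Recycling</title></head>
<body>
<h1>Crumb Rubber Suppliers in Penticton</h1>
<img src="pile.jpg" alt="Crumb rubber stockpile">
<img src="logo.png">
</body></html>
"""

result = audit(page, "crumb rubber")
print(result)
```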

  • Ensure your pages have a fast download time: Google says that spiders will crawl as many pages as they can without overwhelming your server. Most often they only crawl a portion of your site before they move on. There is a direct correlation between page download time and how many pages are crawled that day. Make sure that your pages are not too big and load quickly.

  • Be careful when using Flash: Yes, it is true that spiders have come a long way in their ability to index Flash. At present, Google's Flash algorithms extract text and links only, which all sounds good. However, the problem is that Google's spiders will not crawl or index any Flash executed using JavaScript (which a lot of Flash uses). At this point in time it is still best practice to be careful when using Flash for integral parts of your website, i.e. links, navigation and important content.

  • Create custom 404 pages: If a spider is on your site and hits a 404 page (i.e. page not found) with no links on it, then it's the end of the visit for you. It has nowhere to go. A custom 404 page ensures that there are links for the spider to continue indexing your site.
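    Here is a minimal sketch of the idea using Python's built-in `http.server`. The `PAGES` dictionary stands in for a real site and is purely hypothetical. Note that the handler still sends a 404 status code with the custom page, so the spider knows the page is missing while being given links to follow.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical pages; a real site would look these up on disk or in a database.
PAGES = {"/": "<h1>Home</h1>", "/products": "<h1>Products</h1>"}

def not_found_page():
    """Build a 404 body that links back into the site."""
    links = "".join(
        f'<li><a href="{path}">{path}</a></li>' for path in sorted(PAGES)
    )
    return f"<h1>Page not found</h1><p>Try one of these instead:</p><ul>{links}</ul>"

def respond(path):
    """Return (status code, html) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, not_found_page()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, html = respond(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

status, html = respond("/no-such-page")
print(status)
print(html)
```

    To try it locally, the handler could be passed to `http.server.HTTPServer` on a port of your choice.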

  • Add your site to Google Webmaster Tools: Google will let you know if there are things wrong with your site, like crawl errors or duplicate meta information. This information helps you keep your site Google friendly.

  • Do not hide behind logins if you want your content to be seen: If a user needs to login to view content, then so will the spider. This means that any content hidden behind a login will not get indexed.

  • Do not require that cookies or session IDs be enabled to view your site: Our friendly crawlers do not have the ability to access cookies or session IDs. If this is a requirement on your site, it won't get indexed by spiders. There are parts of a site where it's okay, like the checkout. There are parts of a site where it's bad, like your homepage.

And there you have it: A solid platform upon which to build a successful (and well spidered) website. What are some of your tips for maximizing your time with Google's spiders? We'd love it if you'd share them.

Stephanie Woods is a freelance SEO/SEM consultant (www.stephwoodsseo.wordpress.com) in Kelowna, BC, Canada. You can follow her on Twitter (www.twitter.com/steph_woods).

19 Responses to “Learn SEO: Maximize Your Time with Google's Spiders”

  1. "Another tip: use underscores instead of hyphens. Spiders read underscores as all one word, whereas dashes are read as separate words."

    Don't you mean "use hyphens instead of underscores" ?

    Good information. Sometimes we go into all the advanced stuff without looking at the simple, which can be as important or even more so.

  2. Stephanie Woods says:

    @RoofingGuy9000 Thanks for pointing out the typo. You are right, that is what I meant to say!

  3. KJ says:

    The rule about KW repetition only holds true if you aren't talking about Bing. They LOVE KW spamming.

  4. I think sometimes people over think SEO to be something that is super technical that requires a programmer. Yes SEO in copy writing will require the writer to be more literal than cute. There is plenty of room for the two to mesh together.

  5. Guys, every time I read these comments they get me furious with those people who claim to know it all but know nothing at all, things like how many keywords to put in the title, description, keyword tag, alt tag, and what density you should have, this is all bogus stuff. Please let me explain, if you've done any type of SEO then you probably know the basics, if you do then that's all you need, the rest is all tweaks, adjustments and monitoring the site to see what works and what does not. Patience is said to be a virtue and it's completely true in SEO. It's also good to look at those sites that rank high for very competitive keywords, like Internet marketing, search engine optimization, and many others, look at their source code and see some of the techniques they have employed like instead of titles in links, alt tags and descriptions on images, these are just some of the things you can learn, stay up to date with good SEO information not the nonsense you get from forums and a few out of touch blogs, trust me I was there before I became enlightened, and my advice to anyone is that if the initial thought was negative then it is. Spamming, Link Farms stay away from Black Hat SEO.

    • Jeremy says:

      "…these are just some of the things you can learn, stay up to date with good SEO information not the nonsense you get from forums and a few out of touch blogs…"

      Care to share where you keep up to date with such information short of these blogs?

  6. To tell you the truth, there are not too many articles out there that cover such seemingly technical stuff in such an easy to read format. I actually learned something!

    Your Net Biz law of attraction marketing blog

  7. [...] Presenting my first post on SEO Scoop: Learn SEO: Maximize Your Time with Google’s Spiders [...]

  8. [...] article from Search Scoop is a basic introduction to how to get your site noticed and how to make the most [...]

  9. [...] Learn SEO: Maximize Your Time with Google’s Spiders – Stephanie Woods dropped in to the SEO Scoop with a ‘Spider Friendly Check List’ which are some common sense tips for spider management. [...]

  10. [...] Learn SEO: Maximize Your Time with Google’s Spiders – Stephanie Woods dropped in to the SEO Scoop with a ‘Spider Friendly Check List’ which are some common sense tips for spider management. [...]

  11. Stephanie Woods says:

    KJ – I haven’t had time to delve too deeply into Bing as of late. The word on the street though is it’s something we need to start paying more attention to. That’s an interesting comment about keyword spamming I will look further into. Not that I have any intentions of KW spamming!

    Nick – SEO can be done by anybody if they learn how to do it. Same goes with building a website. All it takes is time. As for writing, engines are looking for good content, not keywords. Getting keywords into your content should more or less occur naturally since your onpage optimization should be a direct reflection on what the page is actually about!

    Mark – You are correct in saying that it doesn’t matter how many keywords are in the title tag (although it certainly does help to have your targeted keywords in the page title). If you look at sites that are doing well in the search engines, you will see that they have done the nominal amount of onpage optimization mentioned here. This article makes no mention of keyword density; it just suggests it doesn’t hurt to include the keywords a couple of times in the body copy (which shouldn’t be hard if you’re targeting the right keywords in the first place).

    NetBiz – Although you sound like a bot, thanks for the compliment. Glad to provide information that you can use. Feel free to drop me a line if there is anything else you’d like to see covered.

  12. [...] Learn SEO: Maximize Your Time with Google’s Spiders – Stephanie Woods dropped in to the SEO Scoop with a ‘Spider Friendly Check List’ which are some common sense tips for spider management. [...]

  13. Neuro SEO says:

    Ensure your pages have a fast download time – that's a nice point every website should have…

    Nice article.

  14. Herman says:

    Came across your post when Googling "how to get google to spider your links". I've built 100s of links to a friend's site over several weeks but Google hasn't respidered it. How do I speed up respidering when I've followed the checklist?

  15. Stephanie Woods: Yes you're right on keywords in title absolutely, but let me just share this with you, no key phrase is optimized equally, what I mean by that is if you take for argument's sake "search engine optimization" and "website internet marketing" and compare the top ten for each you will find that the methods that got them there are completely different, search engine optimization could be heavily weighted on optimization compared to website internet marketing where the method could be backlinks driven, because of the theme, "marketing" although they both need every element to rank well by nature you have to understand what makes your phrase rank well. Here is another example, have you had your site rank for keywords you were not targeting? and your target keywords are nowhere to be found? well that is because you're not using the right method to target the keyword. Maybe the top ten does not have their keyword in H1, Alt Tags, Title, whatever the reason, so never assume that just because we know what a good optimization needs that every key phrase needs it. It's all about tweaking to find the right balance…..

  16. Stephanie Woods says:

    Herman – My apologies for the late response. You are building up your external links already and that is a great way to get your pages indexed more quickly. In terms of Google finding the links that are linking back to you, there is not much you can do. The reason is that the spiders need to find the link back to you on other people's sites, not your own site. Hence you have no control over how other people's sites (and the links back to you) are spidered.

    Mark – You are absolutely correct. There are various paths to take when marketing your site. As I mentioned, the items are only small steps towards a bigger goal. These alone won't make your site rank #1 in a competitive industry, but they can certainly help. As the title of the post implies, this article is about what you can do with your site to help expedite the spidering process. It's not about how to market your site. That's something entirely different. For the sake of brevity, it's not possible to cover every angle of online marketing in one post without confusing the hell out of people just starting out.

    When you end up ranking for terms that you're not even going for, you're getting into the theory of the longtail. Some people argue that optimizing your site for longtail phrases that only a few people search for can collectively bring you lots of traffic. There are many facets to SEO and online marketing; however, my intention for the article was to provide information to people on a pretty basic level – without causing too much confusion!

  17. Point taken Stephanie: Thanks

  18. [...] a new article, they are often Indexed within 2-3 hours depending on time of day. This shows that the Spider pays me a visit on a daily basis, looking for new Content to give to their hungry [...]