This post is part of an ongoing series: How Search Really Works.
Last week: Keyword Stuffing.
What is Keyword Density?
Keyword Density is a function, a calculation, of keyword frequency.
It's calculated as number of occurrences divided by number of words and is usually expressed as a percentage.
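As a minimal sketch of that calculation, the snippet below counts occurrences and divides by the total word count. The function name `keyword_density`, the simple regex tokenizer, and the sample sentence are assumptions made for illustration; real tools differ in how they split words and handle phrases.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of `keyword` divided by total words, as a percentage."""
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    occurrences = words.count(keyword.lower())
    return 100.0 * occurrences / len(words)


# Hypothetical sample text: 8 words, 2 of which are "cheap".
sample = "Cheap flights to Rome. Book cheap flights today."
density = keyword_density(sample, "cheap")
print(f"{density:.1f}%")
```

Note that the number only describes this one text; nothing in it refers to any other document.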
What is Keyword Density Used For?
Nothing much, really.
Keyword density can help in readability calculations.
Keyword density is also sometimes used as a simplified way to introduce local keyword weight, but the two should never be confused.
Why don't Search Engines use Keyword Density?
Search engines deal with calculations that say something about the words in a document in relation to the index that document appears in.
Keyword density says something about words in a document in relation to the document itself. It doesn't help you to compare and thus sort or rank a set of documents.
Frequency <> Relevance
The fact is that frequency in and of itself doesn't equate to relevance.
The word "the" is the most commonly used English word: it appears with the highest frequency. If a search engine calculated relevance as frequency, every document in its index would have "the" as its topic.
Likewise, the word "time" is the most commonly used English noun. This would make a multitude of documents relevant to "time" before anything else.
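A small sketch makes the point concrete. The two document names and texts below are made up for illustration, and `top_word` is a hypothetical helper that treats the most frequent word as the "topic":

```python
import re
from collections import Counter

# Made-up documents, used only to illustrate the argument.
docs = {
    "clock-repair": "the time on the clock is wrong so the repair of the clock takes time",
    "pasta-recipe": "boil the pasta in the pot and stir the sauce until the sauce thickens",
}


def top_word(text: str) -> str:
    """Return the single most frequent word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(1)[0][0]


# If relevance were plain frequency, this would be each document's topic.
topics = {name: top_word(text) for name, text in docs.items()}
print(topics)
```

Both documents come out as being about "the", which says nothing about clocks or pasta.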
To make sense of word occurrences in a document a search engine has to see those words in the context of its index.
This is done by calculating the overall importance of words both in the document and in the index.
This importance is called term weight.
To calculate the importance of a word in a document, 3 variables are needed:
- local weight: a calculation based on keyword frequency in this document. This variable can be calculated in many ways but not as a straightforward count of how many times the word appears in the document.
- global weight: a calculation based on the number of documents in the index divided by the number of documents containing the keyword.
- normalization: a calculation designed to remove the unfair advantages and disadvantages of document length. Usually you work to express the end values between 0 and 1.
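The three variables above can be sketched with common textbook choices. These are assumptions for illustration, not the formulas any search engine is known to use: local weight as `1 + log(tf)`, global weight as `log(N / df)`, and cosine normalization to bring values between 0 and 1. The function `term_weights` and the three-document `index` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def term_weights(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return {doc_name: {term: weight}} using assumed textbook formulas."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    weights = {}
    for name, term_counts in counts.items():
        raw = {}
        for term, tf in term_counts.items():
            local = 1.0 + math.log(tf)             # assumed local weight
            global_w = math.log(n_docs / df[term])  # assumed global weight
            raw[term] = local * global_w
        # Cosine normalization removes the advantage of longer documents.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        weights[name] = {t: (w / norm if norm else 0.0) for t, w in raw.items()}
    return weights


# Made-up miniature index.
index = {
    "clock-repair": "the time on the clock is wrong so the repair of the clock takes time",
    "pasta-recipe": "boil the pasta in the pot and stir the sauce until the sauce thickens",
    "train-times": "the train leaves on time and the next train is on time",
}
w = term_weights(index)
for term in ("the", "time", "clock"):
    print(term, round(w["clock-repair"][term], 3))
```

In this toy index "the" appears in every document, so its global weight is zero however often it occurs. "clock" appears in only one document and outweighs "time", which appears in two, even though both occur twice in the clock document.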
None of the search engines have ever disclosed which published or unpublished scales they use for local weight or global weight.
What we're looking to achieve is to get high values for terms (words/phrases) that occur a lot of times in the relevant documents but infrequently in the index as a whole.
Keyword Density Myth Summary
Search engines use term weight to rank documents by relevance.
Term weight is calculated from the result of two other calculations, local weight and global weight, and is then normalized for document length.
Without knowing the function used for local weight we can't calculate it -- but we do know that it's not just pure keyword frequency.
Without knowing the size of the index, the number of documents relevant to the term, and the function used for global weight we can't calculate it.
Using keyword density as a guesstimator of weight or relevance is therefore utterly useless. It's like being given only the height of a three-dimensional object and being asked not only for its volume but also whether it is larger or smaller than any other unseen object in a collection you know nothing about.
Hungry for more? I recommend The Keyword Density of Non-Sense.
21 thoughts on “How Search Really Works: The Keyword Density Myth”
Once again Ruud, you dispel the myths that a lot of beginners (including myself a few moons back) think equals SEO, and expand upon to show the true path.
I love your writing style. I wish I had your patience to lay out the fundamentals so well. I know I am partially responsible for having the appearance of jumping to conclusions and calling them “rules of thumb” but the most important factor “in my summation” I have found that translates into high ranking SERPs is global weight in conjunction with back link relevance and authority (that produce ranking rock stars for pages).
Thanks for breaking it down so well. Love the writing style.
Keyword Density = High Rankings. I am so sick of hearing this, nice to see a post explaining that it is a myth and maybe people will learn what it really takes to rank a website.
You are right, it is great for beginners, and websites which provide all these keyword density tools just waste people’s time.
There are millions of examples where new websites with a small number of backlinks got #1 positions for keywords which seem to be difficult. On the other side, large companies with popular websites can’t rank for specific keywords…
I always say that domain power and trustrank are the most important factors in SEO. A good reputation will give you great results over time and amazing average traffic from all search engines.
Good thinking man … And it’s very hard for me to like something …
Very good that you shed light on this for people. Thanks.
Of course KWD matters. It doesn’t matter more than trust and links/anchor text, but of course it matters.
now of course every time I post an example some engineer comes along and kills it, even though it was working fine for months or years before.
lol, graywolf, good one 🙂
I have to agree with graywolf on this. KWD matters, but to what extent it matters is up in the air. I always explain SEO as hundreds of little things all done the right ways at the right times that help you get ranked.
If you do not put any thought into KWD for your campaigns what do you recommend for keyword content? How do you attack on page SEO in relation to keywords?
I suspect he was making a joke 🙂
From the field of information retrieval we know, don’t suspect but know, that keyword density cannot be used to rank documents according to relevance. Except for very basic in-classroom kind of search engines no-one knows of any type of commercial search engine using KD as a relevance factor.
As a non-relevance/spam factor it makes even less sense, increasing the number of calculations a search engine has to do.
Keyword frequency, keyword distribution, keyword distance, topic relevance, etc.: these all matter. But keyword *density*?
Take your favorite KD analyzer. Do 20 searches. See if the ranking matches the KD.
Nice post… Am trying a lot to get my pages listed in search engines. Even tried this keyword density stuff long back. It would be more helpful if you could explain in simple terms, what is it, that we should do to get our pages more relevant? In terms of content and articles how must we use keywords?
@cipher As the series progresses we’ll see that information come forward more and more.
Sounds really lame — but the best way to be “more relevant” is to *be* relevant. Seed & promote that and backlinks confirming and voting for that relevance come into play.
But if KWD affects local weight and global weight (which is basically the density of pages in the index that mention a given KW), how can you say that it is not important? It is not a main factor in ranking but is still part of the puzzle. You cannot say that if your KWD is too high you will not be affected. You will be considered spammy and drop in rankings. That being said, KWD is a factor, not only in the body but in the Title tag and in the code (such as alt and hyperlink title tags).
Your reasoning about the words “the” and “time” makes sense, but I feel that maybe Google is sophisticated enough not to use KWD for these types of terms. Rather, it is used more for KW phrases.
You confuse, or mix up, keyword density (words:keywords ratio) and keyword frequency. Keyword frequency is part of the tf*IDF calculation: keyword density isn’t.
I’ve held on to the idea of keyword density as a spam measure for a while but Dr. E. Garcia does an amazingly eloquent job of dispelling that notion in Keyword Density Myth – The Devil’s Advocate and Keyword Density (KD): Revisiting an SEO Myth.
With term weighting it’s easy to see where a search engine gets its values from. Using keyword density as a relevance measure (or spam measure), where does it get its values from? How would it come up with x% is relevant, y% is not and z% is spam? If these are absolute values, how does that relate to long/short content? If these are variable, again, where do those numbers come from?
As for keyword *phrases*, read the paragraphs about linearization and “burning the trees” in Garcia’s article.
Quote: “Two term sequences illustrate the point: “Find Information About Food on sale!” and “Clients Visit our Partners”. This state of the content is probably hidden from the untrained eyes of average users. Clearly, linearization has a detrimental effect on keyword positioning, proximity, distribution and on the effective content to be “judged” and scored. The effect worsens as more nested tables and html tags are used, to the point that after linearization content perceived as meritorious by a human can be interpreted as plain garbage by a search engine. Thus, computing localized KD values is a futile exercise.”
So what if I was to show a blog post where all of the text was a 4 word phrase repeated 10 times? What if I was also to tell you these keywords were part of an SEO contest? What if I was to tell you my page wasn’t created until after the contest was over, with no active link building on my part (I get scraped to death)? What if I was to tell you this page ranked for its term until I mentioned it and some engineer came along and killed it … twice?
Of course google should be “smart” enough to catch this but they aren’t
Graywolf, excellent example of Google not using KD as an anti-spam measure. Or as a relevance measure.
On the whole I think Google errs on the side of safety. They can do more to remove a lot of possible noise but then you end up throwing out way too much good stuff too.
“smart” is as smart does. With a clean database it’s already hard enough to return and rank *really* relevant data. With a tainted set like Google’s… wow.
Your article is nice, with very useful information, but if possible could you explain what normalization means and how it is calculated?
:O you’ll be telling us that the meta keywords tag doesn’t mean anything next! 😉
Great article, I get so sick of explaining why keyword density would be a useless measure of a page’s content even if it were to be used. SEs are so much smarter than that these days.
Glad you found it helpful!
Ruud, you’re exactly on point as usual!
Keyword density is phrenology!
Very nicely explained and detailed post. Thanks a lot Mr. Ruud.
Search engines have always improved themselves by learning from their previous mistakes and data collected.