Site icon Search Engine People Blog

Using Excel Formulas To Group Your Google Analytics Long Tail Keywords Into Themes

We all know that long tail keywords can be as valuable, if not more so, than one or two word phrases, right? It is easy to start to get traffic from them while building for something more competitive, they are a perfect fit for content strategies and as most SEM professionals would tell you, they are great indicators of consumer intent. Individually though they don't send a lot of traffic. This can make it a little harder to analyse their impact compared to high volume phrases. Especially when you treat them as individual keywords.

Of Landing Pages and Themes

Fortunately long tail keywords tend to match the subject of the landing page, and can also be grouped by common terms. Assuming the content is tightly themed and the searches it appears for are relatively focused, treating the landing page's organic traffic on aggregate can be a viable strategy. This approach will leave out a lot of detail, such as qualifying terms that indicate different intents from the user. However (Not Provided) won't cause any issues with this model, as it is assumed that they are much the same as the other search terms sending traffic.

In practice, this model can be used to assess the effectiveness of a Long Tail (or Content) strategy by what areas of the site are gaining traffic, how productive it is and which distinct query group it receives traffic from. While it is less affected by (Not Provided) traffic, it is also overly broad and does not provide insights for future optimisation work, and it really one works if each page ranks for one theme, such as "Red Thingy's". Pages that get traffic for "Blue Thingy's" as well will require more work.

Themes, Entities and Intent

Wagging the long tail

There is a difference between a search for a "Red Thingy" and "Adjective Red Thingy", or "Location Red Thingy", not to mention the difference between "Blue Thingy" and "Red Thingy". Each phrase can signal a different intent on the part of the user, and as a result, probably should be treated as a part of a distinct group, based on it's attributes.

A search term has a number of attributes including the number of words, subject or entity it refers to, stop words and descriptive terms that put it into context or communicate the searcher's intent. What these will vary from industry to industry, but they often include words like the city or state's name, adjectives like "cheap", alternative descriptive terms and verbs.

Unfortunately natural language processing tools are not as widely available as we would like and frankly word clouds just do not cut it. This leaves us having to find some kind of a workable kludge to make sense of the information Google Analytics provides. In this case, it is going to be Excel's formulas. A sample spreadsheet is provided at the end of this post.

Once you have exported the data you need from Google Analytics, creating a word count column is easy:

=IFERROR((LEN(A2)-LEN(SUBSTITUTE(A2," ","")))+1,0)
This formula displays word count by adding one to the number of spaces in the phrase.

From here however it becomes a little more involved. You will need to create a number of arrays of words. Each array should be of words that rarely appear together in a search phrase and relate to one subject area, such as location or product. Excel supports array formulas such as the one below. This will search a cell (A2) for one of a list of words (I$2:I$3) and add it to the cell the formula is in. The formula is not perfect and if it finds more than one word from the list in the targeted cell, will return the one lower down the list.

The formula finds the row number of the matching word from the keyword and with the array set just after INDEX to display the term it found. If you want to group similar words together or account for variant spellings, simply use two columns, with one of the words you want to find and the other as what you want displayed for them. Change the column reference at the start of the INDEX formula to match to the second column.

{=INDEX(I$2:I$3,MAX(IF(ISERROR(FIND(I$2:I$3,A2)),-1,1)*(ROW(I$2:I$3)-ROW(I$2)+1)))}
This formula will preference the last item in the array it is searching from. It is an array formula and after editing it in a cell, you must click CTRL+SHIFT+ENTER.

As per the attached screenshot, this will give you a few columns labelling each keyword with a term from one of the arrays you create. Using CONCATENATE to merge more than one of these columns into a new label can be useful for creating labels for classifying keywords and creating groups of aggregated data for related terms.

While this approach does not allow for (Not Provided) it makes it relatively easy to define query groups by common shared words and possibly even user intent. While it can be hard to analyse long tail keywords individually, grouping them can make this easier and applying the same rules over time simplifies tracking the effectiveness of your long tail strategy.

Taking it Further

There is a limit to how small each group can be. The less data available to determine the distinctiveness of a group, the more likely that its differences from another a merely the result of chance. There are a number of different ways to check top see if the query groups developed for analysis this way actually do represent distinctive behaviours, the Chi square test being one of them.

Download the example spreadsheet