Match and Game! Google Automatic Keyword Match

by The Doug.

So there I was, minding my own business, when all of a sudden I had a conversion on one of my campaigns (ALWAYS exciting) but I couldn’t attribute it to any of the keywords in the adgroup it was showing up for.  “Well!  How can THAT be?!”, says I.

After running a search query to investigate further, I found the little miracle keyword that so silently and wonderfully threw a conversion my way.  I found it, and it was not a broad, exact, or even phrase match.  It was Automatic.  That’s right!  It’s not an option that you just choose like the others, but in fact a beta that Google inserted on various accounts with the “you need to do nothing in order for this feature to be turned on” type of message.

Hey, Yo Soy Chévere

by Martha.

This is a Spanish translation of: Dude, I’m Phaaaaaat! written by Jennifer Osborne.

When I was a girl, I was trained to search by category using the Dewey Decimal System. Later, as I grew up, the Yellow Pages reinforced this kind of category search.

When I needed a haircut, I let my fingers do the walking and looked under Beauty Salons. If I needed to lose a few pounds, I looked under Weight Loss.

And I’m not the only one.

How Search Really Works: Grabbing Most Red M&M’s

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (2)

Instead of painstakingly grabbing the absolute best matches for your query and then ranking them with infinite precision, one time-saving strategy has search engines go for “close enough”.

Painstaking Precision

[Image: sorted-mm]

Given all the time, money and resources in the world, here’s what we’d normally do.

Word by word, you go through a search. You look in your documents and see which has word one… word two… word three… You get the picture.

How Search Really Works: Relevance (2) - Vector Space

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Previously: Relevance (1)

Another way we can assess the relevance of a document is by term weighting.

From the keyword density myth we know that true term weighting is done collection-wide.

By looking at the number of documents in the index that a term appears in, we can make a measurement of information: how good, how special… how meaningful is this word?

The word "the" would not be special at all, appearing in way too many documents. Its worth would be close to zero.
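
As a rough illustration of collection-wide weighting, here is a minimal sketch using one common formulation, inverse document frequency (the logarithm of total documents divided by documents containing the term). The post does not say which formula it has in mind, so the function name, the formula choice and the three sample documents are all assumptions made for the example.

```python
import math

def inverse_document_frequency(term, documents):
    """Weight a term by how few documents in the collection contain it.

    Assumed formula: log(N / df). Returns 0.0 if the term appears nowhere.
    """
    total = len(documents)
    containing = sum(1 for doc in documents if term in doc.lower().split())
    if containing == 0:
        return 0.0
    return math.log(total / containing)

# Hypothetical three-document collection.
docs = [
    "the search engine ranks the pages",
    "the index lists the words",
    "the grandiloquent dictionary",
]

idf_the = inverse_document_frequency("the", docs)             # in every document
idf_rare = inverse_document_frequency("grandiloquent", docs)  # in one document
```

In this toy collection "the" appears in every document, so its weight is zero, while the rarer word gets a positive weight.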

How Search Really Works: Relevance (1)

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Previously: Simple Query Optimization.

Search is always boolean: yes or no. True or false.

Either the words are in the document or not.
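
A minimal sketch of that yes-or-no test, assuming whitespace tokenization and three made-up documents (the names `matches_all` and `docs` are illustrative, not from the series):

```python
def matches_all(query, document):
    """Return True if every query word occurs in the document, else False."""
    words = set(document.lower().split())
    return all(term in words for term in query.lower().split())

# Hypothetical documents: one about the topic, one that merely mentions it,
# and one that has nothing to do with it.
docs = {
    "about": "search engines index pages so search engines can answer a search",
    "mention": "we used a search engine once",
    "unrelated": "red candy in a bowl",
}

hits = [name for name, text in docs.items() if matches_all("search", text)]
```

Both the document about search and the one that only mentions it come back as a plain "yes"; the boolean test by itself has no notion of which is the better result.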

[Image: boolean-search]

But as you see, not all documents are “born alike”. Some are about our topic, some just mention it.

What we need, what we want, is not just a big list of results — we want a relevant list of results, preferably sorted so that the best bet appears on top.

How Search Really Works: Simple Query Optimization

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Last week: The Compressed Index.

While human beings can scan a page and see if the whole phrase "a grandiloquent dictionary" appears on it, a search engine can’t.

A search engine needs to:

  1. Look up the occurrences for each word in the phrase
  2. See if the positions of words in the document fit the phrase
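
The two steps above can be sketched as follows. This is only an illustration: the index layout (word, then document id, then a list of word positions) and the two tiny documents behind it are assumptions, not the series' actual data structure.

```python
# Hypothetical positional index: word -> {document id: [word positions]}.
# Document 1 reads "a grandiloquent dictionary"; document 2 contains the
# same three words, but in a different order.
index = {
    "a": {1: [0], 2: [0, 3]},
    "grandiloquent": {1: [1], 2: [4]},
    "dictionary": {1: [2], 2: [1]},
}

def phrase_match(phrase, index):
    """Return ids of documents in which the words appear consecutively."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    # Step 1: look up each word and keep the documents that contain them all.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    # Step 2: check that the positions line up as start, start+1, start+2, ...
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        if any(
            all(start + i in index[term][doc_id] for i, term in enumerate(terms))
            for start in starts
        ):
            hits.append(doc_id)
    return hits

result = phrase_match("a grandiloquent dictionary", index)
```

Both documents survive step 1, but only document 1 passes the position check in step 2.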

Because a search engine isn’t smart, it needs to work smart.

Leverage Keyword Frequency

[Image: sort-by-frequency]

How Search Really Works: The Compressed Index

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Last week: Recognize this index?

Reading from memory is much faster than looking things up on disk.

For a search engine in high demand to serve its users efficiently, it should keep things in memory instead of looking them up on a disk.

Traditionally, large-scale search engines keep their complete dictionary in memory and the posting lists on disk.
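
A toy sketch of that split, assuming a JSON-encoded posting list per term and a temporary file standing in for the disk. The file layout and the helper names `write_postings` and `read_postings` are invented for the example; a real engine would use a far more compact format.

```python
import json
import os
import tempfile

def write_postings(postings, path):
    """Write every posting list to disk.

    Returns the in-memory dictionary: term -> (byte offset, byte length).
    """
    dictionary = {}
    with open(path, "wb") as f:
        for term in sorted(postings):
            data = json.dumps(postings[term]).encode("utf-8")
            dictionary[term] = (f.tell(), len(data))
            f.write(data)
    return dictionary

def read_postings(term, dictionary, path):
    """Find the term in memory, then fetch its list with one seek and one read."""
    if term not in dictionary:
        return []
    offset, length = dictionary[term]
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length).decode("utf-8"))

# Hypothetical posting lists (term -> ids of documents containing it).
postings = {"search": [1, 4, 42], "engine": [4, 42], "people": [42]}
path = os.path.join(tempfile.mkdtemp(), "postings.bin")
dictionary = write_postings(postings, path)
result = read_postings("engine", dictionary, path)
```

Only the small term-to-offset dictionary stays in memory; each lookup costs a single disk read for the posting list itself.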

[Image: dictionary-in-memory-postings-on-disk]

Inefficient Storage

Obviously the more you can keep in memory and the more information can be read back with one disk action, the better.

How Search Really Works: Recognize This Index?

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (2).

Oversimplified: we have at least a few pages in our index, have extracted every single word from those pages, and have written down in an index which pages those words occur in, and where.
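
A minimal sketch of building such an index, assuming lowercase whitespace tokenization and two made-up pages (the function name `build_index` and the nested-dictionary layout are illustrative choices):

```python
from collections import defaultdict

def build_index(pages):
    """Map every word to the pages it occurs in and its positions there."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][page_id].append(position)
    return index

# Hypothetical pages.
pages = {1: "search engine people", 2: "people use a search engine"}
index = build_index(pages)
```

Here `index["search"]` records that the word is at position 0 on page 1 and at position 3 on page 2.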

Want to talk numbers? We have some very precise ones for the English language.

Google says:

"We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times."
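
To make the idea of counting word sequences and discarding the rare ones concrete, here is a toy sketch. It is not Google's pipeline, which the quote does not describe; the function name is hypothetical, and the example uses two-word sequences with a threshold of 3 instead of five-word sequences with a threshold of 40.

```python
from collections import Counter

def ngram_counts(words, n, min_count):
    """Count every n-word sequence and keep those seen at least min_count times."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}

# Tiny stand-in for "running text".
words = "to be or not to be or not to be".split()
counts = ngram_counts(words, n=2, min_count=3)
```

Only the pair ("to", "be") occurs three times in this text, so it is the only sequence that survives the cut.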

How Search Really Works: "The" Index (2)

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Last week: "The" Index (1).

Last week we saw how an inverted index (where a list of words points to a list of documents in which they appear) is insanely useful for doing AND queries.
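
A minimal sketch of such an AND query, assuming the inverted index is a plain dictionary from word to a list of document ids. The ids below are made up for illustration.

```python
def and_query(terms, index):
    """Return the ids of documents that contain every one of the terms."""
    posting_sets = [set(index.get(term, ())) for term in terms]
    if not posting_sets:
        return []
    return sorted(set.intersection(*posting_sets))

# Hypothetical inverted index: word -> ids of documents in which it appears.
index = {"search": [3, 7, 42], "people": [7, 42, 99], "engine": [1, 42]}
result = and_query(["search", "people", "engine"], index)
```

Answering the query takes three dictionary lookups and a set intersection, with no need to read the documents themselves; here only document 42 contains all three words.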

[Image: inverted index]

But what if you’re not looking for any document that has the words search AND people AND engine but you’re looking for Search Engine People?

Well, if document 42 in our example reads "the engine was found after a search by some people" or "people use a search engine such as Google" then a traditional inverted index would think it’s spot-on for your search. Ai…
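
The mismatch can be shown directly on the two example sentences. This sketch works on the raw text rather than on an index, and the two helper functions are hypothetical, but the outcome is the same one a word-only inverted index would produce.

```python
def has_all_words(query, text):
    """What a word-only index can answer: are all the words present?"""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())

def has_phrase(query, text):
    """What the searcher meant: do the words appear consecutively, in order?"""
    words = text.lower().split()
    terms = query.lower().split()
    return any(
        words[i:i + len(terms)] == terms
        for i in range(len(words) - len(terms) + 1)
    )

doc_a = "the engine was found after a search by some people"
doc_b = "people use a search engine such as Google"
query = "Search Engine People"

all_words = (has_all_words(query, doc_a), has_all_words(query, doc_b))
as_phrase = (has_phrase(query, doc_a), has_phrase(query, doc_b))
```

Both sentences pass the all-words test, yet neither contains the phrase itself.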

How Search Really Works: "The" Index (1)

by Ruud Hein.

This post is part of an ongoing series: How Search Really Works.
Previous Instalment: The Keyword Density Myth.

If a search engine searched "live" through the documents it knows about for occurrences of the word we’re looking for, it could take its time and then simply report where it found our word.

In this example our search engine has only one index: the documents themselves.
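
A minimal sketch of such a "live" search, assuming three made-up documents and whitespace tokenization (the name `live_search` is illustrative):

```python
def live_search(word, documents):
    """Scan every document, word by word, and report each place the word occurs."""
    word = word.lower()
    found = []
    for doc_id, text in documents.items():
        for position, token in enumerate(text.lower().split()):
            if token == word:
                found.append((doc_id, position))
    return found

# Hypothetical documents.
documents = {1: "the index is the key", 2: "no match here", 3: "an index of words"}
hits = live_search("index", documents)
```

Every query reads every word of every document, so the work grows with the size of the whole collection rather than with the number of matches.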

[Image: document-only-index]

However, time is something a search engine doesn’t have; the query needs to be answered now.

What we need is a real index!