Warning: Many of the links in this post might be considered unsafe for work due to profane language and, in at least one case, "hate speech." Don’t click on them if your company has restrictions on Internet use or if you’re easily offended.
One of the features that, as best I can recall, Google pioneered in the search engine world is showing your search terms in the context of each result, by including a brief excerpt of the page below its title. Prior to that, in the dark ages of Internet search, you had to click on the link to see this information.
But it appears that, in at least some cases, Google is "quoting" content that is not actually on the page. It is no longer safe to presume that the excerpt shown in Google’s search results has ever actually appeared on the page itself.
On to the particulars. A few years back, sex columnist Dan Savage started a campaign (NSFW) to make U.S. Senator Rick Santorum’s last name a slang term for something the senator himself would probably find unspeakable. Supporters of the effort Google-bombed Santorum’s name, and Savage’s site has been at the top of the Google search results for "Santorum" (NSFW) for a couple of years now.
The homepage of Savage’s site (NSFW) has a mock-dictionary excerpt with the word "santorum" followed by its "new" definition. The senator himself is listed as "definition #2." You can then click through to see the site itself. Now, the "splash page" (no pun intended!) is just a graphic with no alt text (Savage should spank his web designer!), and the text of the graphic appears only in the CONTENT attribute of a META tag in the page header, not in the BODY of the page. But Google never had a problem figuring this out in the past, and it displayed the META CONTENT text as the excerpt in its search results.
Now, however, you see some text that does not come from Savage’s site at all. Instead, it appears to come from the Open Directory Project’s political humor page (NSFW).
I can make a couple of guesses as to why Google does this. One guess is that, since the BODY of the page contains no actual text, Google is substituting the page’s Open Directory description. However, the search term is highlighted within the "excerpt," so it looks like a quotation from the page, and that makes the substitution dangerously misleading.
The other guess is that, since this is a well-known instance of Google-bombing, Google has intentionally changed the text. So I tried a couple of other well-known Google bombs (limiting the discussion here to those that still "work"). "miserable failure" returns the White House biography of George W. Bush as the #1 result, with Jimmy Carter and Michael Moore in slots #2 and #3, reflecting a "counter-Google-bombing" campaign. This is a bit different from the "Santorum" search, because the pages returned don’t contain the search terms. Nevertheless, in the Bush and Carter results we once again see text not present in the actual pages, even though those pages, unlike the Dan Savage site, have plenty of text that Google could use for an excerpt. Once again, the text appears to come from the Open Directory. With the Moore result, however, things are different: the term "failure" actually appears on Moore’s site, and the excerpt Google displays does come from Moore’s home page.
By comparison, look at (or, better still, don’t look at) the search results for "Jew" (linked result NSFW). This is a particularly noxious case of Google-bombing by a white supremacist group. In this case Google displays an "offensive search results" box at the top of the page. The excerpt itself looks like it might come from the actual page, but, sorry, I can’t bring myself to click on it to find out. I do note that the term "Jew" doesn’t appear in the excerpt (i.e., there’s no highlighted word in it), so it appears that Google has selected an excerpt from the site itself even though it couldn’t find one containing the search term. This is unlike the Bush and Carter results noted above, where a description of the page from the Open Directory was used.
So the long and the short of it is: don’t presume that the "excerpt" in Google’s search results has ever actually appeared on the page itself. With no warning or indication of when or how often Google does this, we can’t ever really be sure without looking.
One last note: Google’s result for "Google" also includes an "excerpt" not in the page source, and again it comes from the Open Directory. Now, obviously "www.google.com" is a special case, and it’s another example of a page which doesn’t contain a lot of text, although it does contain the word "Google."