Thursday, August 31, 2006

How to Cloak Page Content from Search Engine Spiders

Cloaking is the term for showing different content depending on who your visitor is. Don't confuse cloaking with hidden text: hidden text is present in the page but concealed from view, while cloaked content is never even loaded into the page for that visitor.

There are some legitimate uses of cloaking, like disabling features that might not be cross browser compatible, or showing different news headlines based on the geo-location of the visitor to your web site.
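As a quick sketch of the legitimate side, a classic ASP page might send an extra stylesheet only to Internet Explorer visitors (the ie-only.css filename is just a placeholder for this example):

<%
' Send an IE-specific stylesheet only when the user agent reports MSIE
If InStr(LCase(Request.ServerVariables("HTTP_USER_AGENT")), "msie") <> 0 Then
    Response.Write "<link rel=""stylesheet"" type=""text/css"" href=""ie-only.css"">"
End If
%>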

The most renowned use of cloaking, however, is in attempts to manipulate a web site's ranking in search engine results by showing the search engine spiders one thing (usually highly optimized keyword content) and showing human visitors something else. Webmasters might show extra content to spiders, or withhold it from them, in the hopes that their site will rank better in organic results.

Here's a fairly simple way to cloak content so it gets seen by search engine spiders but not by human visitors. You might, for example, place this snippet at the bottom of the page. Or you can see my post on hiding web site content and put the cloaked content in a div that is styled to keep it hidden from view:
<%
Dim strUA, vShowIt

' Grab the visitor's user agent string, lower-cased for easier matching
strUA = LCase(Request.ServerVariables("HTTP_USER_AGENT"))

' Assume a human visitor until a spider's name turns up
vShowIt = 1
If InStr(strUA,"google") <> 0 Then vShowIt = 0
If InStr(strUA,"msnbot") <> 0 Then vShowIt = 0
If InStr(strUA,"yahoo") <> 0 Then vShowIt = 0
If InStr(strUA,"inktomi") <> 0 Then vShowIt = 0
If InStr(strUA,"snapbot") <> 0 Then vShowIt = 0
If InStr(strUA,"irlbot") <> 0 Then vShowIt = 0
If InStr(strUA,"turnitinbot") <> 0 Then vShowIt = 0
If InStr(strUA,"cjnetwork") <> 0 Then vShowIt = 0
If InStr(strUA,"myfamilybot") <> 0 Then vShowIt = 0
If InStr(strUA,"geniebot") <> 0 Then vShowIt = 0
If InStr(strUA,"wiki") <> 0 Then vShowIt = 0

' vShowIt = 0 means a spider was detected, so output the cloaked content
If vShowIt = 0 Then
%>
Cloaked Content Here
<%
End If
%>
So if vShowIt = 0, the visitor was one of the bots you checked for, and the content gets shown. You could reverse it as well: by checking whether vShowIt = 1 instead, you would show content only to human visitors and hide it from search engine spiders. This could be useful for putting links on a page that you want human visitors to see, but not search engines.
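The reversed check, reusing the strUA and vShowIt variables from the snippet above, would look like this:

<%
' Reversed check: vShowIt is still 1 only if none of the bot
' strings above matched, i.e. the visitor looks human
If vShowIt = 1 Then
%>
Links or content for human visitors only
<%
End If
%>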

A more advanced way of cloaking is to cloak based on IP address. There are services you can subscribe to that will provide you with updated lists of the IPs that search engine spiders originate from. Cloaking based on those IPs is more reliable than simply checking the User Agent name, since the User Agent string is trivial to fake.
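A minimal sketch of the idea, checking the visitor's IP against a couple of prefixes (the prefixes shown are illustrative only; a real implementation would check against a full, regularly updated list from one of those services):

<%
Dim strIP, vIsSpider

' The visitor's IP address as seen by the server
strIP = Request.ServerVariables("REMOTE_ADDR")

' Example prefixes only -- substitute a complete, current spider IP list
vIsSpider = 0
If Left(strIP, 7) = "66.249." Then vIsSpider = 1
If Left(strIP, 7) = "207.46." Then vIsSpider = 1

If vIsSpider = 1 Then
%>
Cloaked Content Here
<%
End If
%>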

Now you're probably asking, "How can someone detect cloaked content?"

The search engine sees the cloaked content, so if the webmaster didn't hide it as well, you can see the cloaked content when you view a cached snapshot of the page. This is why some webmasters add the "noarchive" robots meta tag to keep search engines from showing a cached copy of the page.
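The tag goes in the page's head section and looks like this:

<meta name="robots" content="noarchive">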

Another way is to use a tool that lets you impersonate a different User Agent. WANNABrowser lets you do just that: type Google or MSN or Yahoo in the HTTP User Agent field, then the URL of the page you want to view.
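If you'd rather test from your own server, a rough classic ASP sketch can fetch a page while pretending to be a spider and display its HTML source (the user agent string is an example, and example.com is a placeholder for the page you want to check):

<%
Dim objHTTP
Set objHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
objHTTP.open "GET", "http://www.example.com/page.asp", False
' Send a spoofed User-Agent so the server thinks we're Googlebot
objHTTP.setRequestHeader "User-Agent", "Googlebot/2.1 (+http://www.google.com/bot.html)"
objHTTP.send
Response.Write "<pre>" & Server.HTMLEncode(objHTTP.responseText) & "</pre>"
%>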

Cloaking had its heyday when page content was the main driver of search engine rank. Although it can still be useful, for both legitimate and shady purposes, it's not nearly the tool it once was for driving major changes in search engine results.
