Thursday, October 26, 2006

How to Filter out Bad Bots in ASP with Global.asa

Bad bots? What are those? And why should I keep them out of my web site?

Glad you asked....

Do a search for "bad bot" and you'll find various lists, most written by web site owners who've noticed strange behavior among the many bots visiting their sites. Although the definition of a "bad bot" varies depending on who you ask, there are several behaviors that are generally considered bad:

not reading robots.txt
disobeying robots.txt
requesting too many pages in a short time span
revisiting pages too often
e-mail harvesting
guestbook spamming
log spamming
munging URLs
scraping content
and more....

Robots.txt is useless against bad bots that either don't read it or won't obey it, so you'll have to use other methods against them. Although some bad bots change both their User Agent and IP address, banning them by one or the other (or both) remains an easily implemented solution. If you want more advanced methods, do a search for "robot trap".
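For example, here's a minimal sketch of banning by IP address on a single ASP page. The address shown is only a placeholder; substitute addresses you've actually caught misbehaving in your own logs:

<%
' Compare the visitor's IP address against a placeholder banned address
Dim vRemoteIP
vRemoteIP = Request.ServerVariables("REMOTE_ADDR")

If vRemoteIP = "192.0.2.10" Then
    ' Refuse the request outright
    Response.Status = "403 Forbidden"
    Response.End
End If
%>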

Using Global.asa to Stop Bad Bots

Global.asa is an optional file that can contain declarations of objects, variables, and methods that can be accessed by every page in an ASP application. Any scripting language the server supports (VBScript, JScript, JavaScript, PerlScript, etc.) can be used within Global.asa.

The Global.asa file can contain only the following:
  • Application events
  • Session events
  • <object> declarations
  • TypeLibrary declarations
  • the #include directive
Note: The Global.asa file must be stored in the root directory of the ASP application, and each application can only have one Global.asa file.
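
For reference, a bare-bones Global.asa with the four event handlers looks something like this (a minimal sketch; only Session_OnStart matters for what follows):

<script language="vbscript" runat="server">
sub Application_OnStart
    ' Runs once, when the first page of the application is requested
end sub

sub Application_OnEnd
    ' Runs when the application ends (for example, when the web server shuts down)
end sub

sub Session_OnStart
    ' Runs each time a new visitor requests his or her first page
end sub

sub Session_OnEnd
    ' Runs when a visitor's session times out or is abandoned
end sub
</script>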

This isn't an article about Global.asa, so I'm not going to go into depth here. Suffice it to say that in Global.asa there's a section reserved for Session_OnStart. Session_OnStart occurs EVERY time a NEW visitor (including a bot) requests his or her (or its) first page in the ASP application.

That means you can have a bit of code executed every time a new visitor arrives on your site, no matter what the entry page is.

Here's an example of a Global.asa file that will redirect bad bots, detected by User Agent, to a non-existent address. The names here are actual bad bots I've found poking around my various sites. If you do a search for "bad bot lists", you'll find plenty of resources to help you identify bad bots.
<script language="vbscript" runat="server">
sub Session_OnStart
dim vUserAgent, vBad

' Grab the visitor's User Agent and assume it's good until proven otherwise
vUserAgent = Request.ServerVariables("HTTP_USER_AGENT")
vBad = 0

' Compare the User Agent against the list of known bad bots (exact match)
Select Case vUserAgent
Case "larbin_2.6.3 larbin2.6.3@unspecified.mail"
vBad=1
Case "larbin_test nobody@airmail.etn"
vBad=1
Case "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS agent"
vBad=1
Case "Missigua Locator 1.9"
vBad=1
Case "EmeraldShield.com WebBot"
vBad=1
Case "<unknown user agent>"
vBad=1
Case "Web Downloader/6.5"
vBad=1
Case "Xenu Link Sleuth 1.2f"
vBad=1
Case "Zeus 46694 Webster Pro V2.9 Win32"
vBad=1
Case "LMQueueBot/0.1"
vBad=1
Case "Zeus 28879 Webster Pro V2.9 Win32"
vBad=1
Case "Offline Explorer/2.1"
vBad=1
Case "HTMLParser/1.4"
vBad=1
Case "Zeus 94377 Webster Pro V2.9 Win32"
vBad=1
Case "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; but first you must bring me a shrubbery)"
vBad=1
Case "W3CRobot/5.4.0 libwww/5.4.0"
vBad=1
Case "updated/0.1beta (updated.com; http://www.updated.com; crawler@updated.com)"
vBad=1
Case "mogren_5.4.1 mogrn@mail.ru"
vBad=1
Case "Holmes/1.0"
vBad=1
Case "Ken"
vBad=1
Case "Cuasarbot/0.9b http://www.cuasar.com/spider_beta/"
vBad=1
Case "EmailSiphon"'2/22
vBad=1
Case "Java/1.4.1_04"
vBad=1
Case "OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Rentals Crawler"
vBad=1
Case "BigCliqueBOT/1.03-dev (bigclicbot; http://www.bigclique.com; bot@bigclique.com)"
vBad=1
Case "combine/0.0"
vBad=1
Case "Avant Browser (http://www.avantbrowser.com)"
vBad=1
Case "Anonymous/Password"
vBad=1
Case "Mozilla/4.0 pradipjadav@gmail"
vBad=1
Case "aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)"
vBad=1
Case "abot/0.1 (abot; http://www.abot.com; abot@abot.com)"'4/8
vBad=1
Case "OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Boats Crawler"
vBad=1
Case "AtlocalBot/1.1 +(http://www.atlocal.com/local-web-site-owner.html)"
vBad=1
Case "Java/1.4.2_04"
vBad=1
Case "Web Downloader/6.3"
vBad=1
Case "sbSrer33n qeeyuSrSy hna"
vBad=1
Case "Java/1.4.2_06"
vBad=1
Case "MVAClient"
vBad=1
Case "4"
vBad=1
Case "versus crawler eda.baykan@epfl.ch"
vBad=1
Case "telnet0.1 noone@example.org"
vBad=1
Case "ssquidagent pradipjadav@gmail"
vBad=1
Case "Zeus 83206 Webster Pro V2.9 Win32"
vBad=1
Case "ichiro/1.0 (ichiro@nttr.co.jp) 1 1 0.12%"
vBad=1
Case "larbin_2.6.3 larbin2.6.3@unspecified.mail"
vBad=1
Case "larbin_2.6.3 (larbin2.6.3@unspecified.mail)"
vBad=1
Case "MediaMirror (0.1a)"
vBad=1
Case "Missigua Locator 1.9"
vBad=1
Case "test/0.1"
vBad=1
Case "larbin_2.6.3 (wgao@genieknows.com)"
vBad=1
Case "larbin_2.6.3 wgao@genieknows.com"
vBad=1
Case "Java/1.5.0_02"
vBad=1
Case "noxtrumbot/1.0 (crawler@noxtrum.com)"
vBad=1
Case "versus crawler eda.baykan@epfl.ch"
vBad=1
Case "updated/0.1beta (updated.com; http://www.updated.com; crawler@updated.com)"
vBad=1
Case "0.1 noone@example.org"
vBad=1
Case "Mozilla/4.0 (compatible; BorderManager 3.0)"
vBad=1
Case "Missigua Locator 1.9"
vBad=1
Case "BigCliqueBOT/1.03-dev (bigclicbot; http://www.bigclique.com; bot@bigclique.com)"
vBad=1
Case "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)"
vBad=1
Case "RPT-HTTPClient/0.3-3"
vBad=1
Case "SeznamBot/1.0 ( http://fulltext.seznam.cz/)"
vBad=1
Case "Zeus 2339 Webster Pro V2.9 Win32"
vBad=1
Case "telnet0.1 (noone@example.org)"
vBad=1
Case "Wget"
vBad=1
Case "Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0)"
vBad=1
Case "HLoader"
vBad=1
Case "Java/1.4.1_04"
vBad=1
End Select

' The User Agent matched a known bad bot: drop the session and send the bot away.
' Response.Redirect ends processing, so nothing after it runs.
If vBad=1 then
Session.Abandon
Response.Redirect("http://www.no-where-at-all-in-the-universe.com/")
End If
end sub
</script>
To use this, simply create a file named Global.asa, paste in the above script, and save it in the root of your web site. Any bot with a User Agent matching any of the above will be sent to the non-existent web site "http://www.no-where-at-all-in-the-universe.com". You can use the same bad bots I'm using or build your own list.
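
Keep in mind that Select Case only catches a User Agent that matches character for character. If you'd rather catch every variation of a bot (any larbin release, say), a substring check along these lines could be used in place of the Select Case above. This is only a sketch: the fragments in the array are examples, so fill it with whatever turns up in your own logs.

<script language="vbscript" runat="server">
sub Session_OnStart
    dim vUserAgent, vFragments, i, vBad

    ' Lower-case the User Agent so the comparison is case-insensitive
    vUserAgent = LCase(Request.ServerVariables("HTTP_USER_AGENT"))

    ' Example fragments -- any User Agent containing one of these is treated as bad
    vFragments = Array("larbin", "missigua locator", "emailsiphon")
    vBad = 0

    For i = 0 To UBound(vFragments)
        If InStr(vUserAgent, vFragments(i)) > 0 Then
            vBad = 1
            Exit For
        End If
    Next

    If vBad = 1 Then
        Session.Abandon
        Response.Redirect("http://www.no-where-at-all-in-the-universe.com/")
    End If
end sub
</script>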

1 comment:

Prefabrik said...

page_test larbin2.6.3@unspecified.mail
this bot is very bad :[ I want to disable it..