Threat Level: green Handler on Duty: Guy Bruneau

SANS ISC: Search engines that are no search engines SANS ISC InfoSec Forums

Watch ISC TV. Great for NOCs, SOCs and Living Rooms:

Sign Up for Free!   Forgot Password?
Log In or Sign Up for Free!
Search engines that are no search engines
The DShield database was running a bit "hot" earlier today, so I took a closer look at the web log and found that one particular "search engine" was indexing the site rather aggressively:

a.b.c.d - - [09/Nov/2007:15:24:35 +0000] "GET /portreportascii.html?date=2007-11-09 HTTP/1.0" 200 500572 "-" "gsa-crawler (Enterprise; S5-FTNF3BWZPUJAS;" "-"

At first, I thought "oh well, its google". But looking at the user agent string closer, reveals some subtle differences. This is a Google search appliance, not the uber-google-bot we all love. The regular Google bot looks like this: - - [09/Nov/2007:15:24:37 +0000] "GET /date.html?port=47109&date=2007-10-25 HTTP/1.1" 200 7538 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +" "-"

I have seen similar cases a few times now. While this one was not malicious, in some cases attacks used google's (or other search engine) user agent strings. I can only assume that this is an attempt to fit in better, and maybe retrieve a search engine version of the page. If anybody knows a good reference where to find IP address ranges used by certain search engines: let us know.

(and btw... if you need bulk data access to dshield data: Please ask. Spidering the site is just not very efficient and you will run into some anti-harvesting traps sending you in circles)

Johannes B. Ullrich
Chief Research Officer, SANS Technology Institute
I will be teaching next: Defending Web Applications Security Essentials - SANS Cyber Defense Initiative 2021


4301 Posts
ISC Handler
Nov 9th 2007

Sign Up for Free or Log In to start participating in the conversation!