The DShield database was running a bit "hot" earlier today, so I took a closer look at the web log and found that one particular "search engine" was indexing the site rather aggressively:
At first, I thought "oh well, its google". But looking at the user agent string closer, reveals some subtle differences. This is a Google search appliance, not the uber-google-bot we all love. The regular Google bot looks like this:
I have seen similar cases a few times now. While this one was not malicious, in some cases attacks used google's (or other search engine) user agent strings. I can only assume that this is an attempt to fit in better, and maybe retrieve a search engine version of the page. If anybody knows a good reference where to find IP address ranges used by certain search engines: let us know.
(and btw... if you need bulk data access to dshield data: Please ask. Spidering the site is just not very efficient and you will run into some anti-harvesting traps sending you in circles)
Nov 9th 2007
1 decade ago