Reader's tip of the day: ratios vs. raw counts
Today, I'd like to present another of our tips of the day (see the whole series here). This one was provided by one of our faithful readers, Dai Morgan, in response to my log analysis stories from last month. Here is an excerpt from the e-mail we received:
---------------------------
I've recently been dealing with a harvesting incident and needed to identify IP addresses which were running scripts against a web site. If you just look at the top talkers, then big customers and gateways can look as big as the bad guys. After some work I found it was useful to look at the ratio of hits to URLs. Normal users hit a wide variety of pages, but the scripts just churn round and round on the same URLs.
Using Perl, it's easy to pull the source IP address and the URL as you loop through the web server logs. To analyse this data, it needs to be loaded into a hash of hashes that keeps a count of URLs per IP address:
$hash{$ip}{$url}++;
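For illustration, a minimal sketch of that parsing loop might look like the following. The file name and the assumption that the logs are in common/combined format (client IP in the first field, request line in the first quoted string) are illustrative only, so adjust the matching to your own logs.

#!/usr/bin/perl
use strict;
use warnings;

my $logfile = 'access.log';   # illustrative file name, not from the original tip

my %hash;   # per-IP, per-URL hit counts
my %hits;   # per-IP total hit counts

open my $fh, '<', $logfile or die "Cannot open $logfile: $!";
while (my $line = <$fh>) {
    # Assumes common/combined log format: the client IP is the first field
    # and the request line ("GET /path HTTP/1.1") is the first quoted string.
    my ($ip)  = $line =~ /^(\S+)/;
    my ($url) = $line =~ /"(?:GET|POST|HEAD)\s+(\S+)/;
    next unless defined $ip and defined $url;

    $hash{$ip}{$url}++;   # count of URLs per IP address, as described above
    $hits{$ip}++;         # total hits per IP, used for the ratio later
}
close $fh;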
When you've finished the log file loop, start another loop through the hash; you can get the URL count as follows:
my $url_count = keys %{ $hash{$ip} };
Then it is just a matter of dividing the number of hits by the URL count. The bad guys have a higher ratio than normal users. Each site will have slightly different characteristics, so some degree of local tuning will be required. It also helps to strip out any in-URL tokens, either in the Perl or externally via a 'grep -v' (or sed/awk, JAC).
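Continuing the sketch above (and reusing its %hash and %hits counters), the second loop and the ratio calculation might look like this. The threshold is only a placeholder and will need the local tuning just mentioned.

# Placeholder threshold; tune to the site's own traffic profile.
my $threshold = 20;

for my $ip (sort { $hits{$b} <=> $hits{$a} } keys %hash) {
    my $url_count = keys %{ $hash{$ip} };      # distinct URLs seen from this IP
    my $ratio     = $hits{$ip} / $url_count;   # hits divided by URL count

    printf "%-15s hits=%-6d urls=%-5d ratio=%.1f%s\n",
        $ip, $hits{$ip}, $url_count, $ratio,
        $ratio > $threshold ? '  <-- worth a closer look' : '';
}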
I think this technique has other applications, for example looking at sign-on successes and failures. It's also possible to produce summary data from IDS output.
---------------------------
I thought it was an excellent observation that the ratio, rather than the raw number of hits, might provide some very useful data. Dai has provided the script and you can see it here. Dai's explanation and usage docs are here and here, respectively. The explanation doc goes into a lot of detail on what the Perl is actually doing, which is quite educational if you aren't a Perl guru. Dai, thanks for sharing the tip and script with our readers.
Jim Clausing, jclausing --at-- isc dot sans dot org