Are you losing system logging information (and don't know it)?
(This is a guest diary submitted by Bill Parker)
How many administrators review log files in /var/log/*, but don't realize they may be losing possibly important (or even critical) information?
In working with a commonly used IDS (Snort 2.9.x) on one of my test platforms (CentOS 6.3 64-bit inside of VirtualBox 4.2.6), I happened to notice a unusual line in /var/log/messages when snort initialized via startup script in /etc/init.d:
Feb 5 13:07:52 plugh snort[12105]: Non-RFC Compliant Characters: 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
Feb 5 13:07:52 plugh snort[12105]: Whitespace Characters: 0x09 0x0b 0x0c 0x0d
Feb 5 13:07:52 plugh snort[12105]: rpc_decode arguments:
Feb 5 13:07:52 plugh snort[12105]: Ports to decode RPC on: 111 32770 32771 32772 32773 32774 32775 32776 32777 32778 32779
Feb 5 13:07:52 plugh snort[12105]: alert_fragments: INACTIVE
Feb 5 13:07:52 plugh snort[12105]: alert_large_fragments: INACTIVE
Feb 5 13:07:52 plugh snort[12105]: alert_incomplete: INACTIVE
Feb 5 13:07:52 plugh snort[12105]: alert_multiple_requests: INACTIVE
Feb 5 13:07:52 plugh snort[12105]: FTPTelnet Config:
Feb 5 13:07:52 plugh snort[12105]: GLOBAL CONFIG
Feb 5 13:07:52 plugh rsyslogd-2177: imuxsock begins to drop messages from pid 12105 due to rate-limiting <- LOOK HERE
Feb 5 13:07:57 plugh snort[12106]: Daemon initialized, signaled parent pid: 12105
Feb 5 13:07:57 plugh snort[12106]: Reload thread starting...
Feb 5 13:07:57 plugh snort[12106]: Reload thread started, thread 0x7f4039a37700 (12106)
Feb 5 13:07:57 plugh kernel: device eth0 entered promiscuous mode
Feb 5 13:07:57 plugh snort[12106]: Decoding Ethernet
Feb 5 13:07:57 plugh snort[12106]: Checking PID path...
Feb 5 13:07:57 plugh snort[12106]: PID path stat checked out ok, PID path set to /var/run/
Feb 5 13:07:57 plugh snort[12106]: Writing PID "12106" to file "/var/run//snort_eth0.pid"
Feb 5 13:07:57 plugh snort[12106]: Set gid to 40000
Feb 5 13:07:57 plugh snort[12106]: Set uid to 40000
Feb 5 13:07:57 plugh snort[12106]:
Feb 5 13:07:57 plugh snort[12106]: --== Initialization Complete ==--
Feb 5 13:07:57 plugh snort[12106]: Commencing packet processing (pid=12106)
It turns out that many modern Linux distributions come with 'rsyslog', which is a replacement for 'syslogd' or 'sysklogd', but starting with version 5.7.1 of rsyslog, a feature known as rate-limiting was added to the utility, and if a given process ID (PID) were to send more than 200 messages to /var/log/messages in a 5 second interval (the default setting in rsyslog), it will start to drop messages and place the following warning inside of /var/log/messages:
Feb 5 13:07:52 plugh rsyslogd-2177: imuxsock begins to drop messages
from pid 12105 due to rate-limiting
In the case of daemons or processes logging to /var/log/messages (or any other directory/file which rsyslog happens to be handling logging for), a great deal of important and/or critical logging data could be lost to security or system administrators.
While rate-limiting on routers/firewalls/web servers is a useful method of containing certain types of network based attacks, in the case of system and/or application logging, this may create a logistical nightmare for SIEM's or applications which collect and analyze large amounts of system and/or application logs for event information/messages/warnings.
In doing some research on rsyslog, I found two solutions which can be used to solve this condition on systems where rsyslog is the default system logging method.
Note - Back up any file(s) listed below before proceeding!
The first solution is to simply increase the messages allowed and the time interval before rate-limiting occurs in rsyslog. To do this, locate the rsyslog.conf and/or rsyslog.early.conf (usually in /etc) and add the following lines:
$SystemLogRateLimitInterval 10
$SystemLogRateLimitBurst 500
after any ModLoad commands in rsyslog.conf and/or rsyslog.early.conf
This will tell rsyslog to start rate-limiting (discarding messages) when more than 500 messages from a single PID are received within a 10 second interval (these numbers are not absolutes, they can be tailored to any given system, btw).
The second solution is to simply turn off rate-limiting for rsyslog, and to do this, add the following line to rsyslog.early.conf and/or rsyslog.conf using your favorite editor (Iím a vi/vim/gvim hound):
$SystemLogRateLimitInterval 0
after any ModLoad commands in rsyslog.conf and/or rsyslog.early.conf
This will disable any rate-limiting in effect for the rsyslog process running on this system. Note that by doing this, an out of control process ID on your system can fill up /var/log/messages with a lot of useless messages (which is why rate-limiting is enabled by default in rsyslog).
Remember to stop/start or restart the rsyslog daemon in order to make the changes to rsyslog.conf and/or rsyslog.early.conf take effect.
The following Linux systems use 'rsyslog' as the default system logger (these are distributions which I am actively using, btw):
CentOS 6.x
Debian 5.0 or greater
Fedora 13 or greater
OpenSuSE 11.x/12.x
Ubuntu 10.0 or greater
BSD based systems (FreeBSD 8.x/9.0, OpenBSD 5.x, and NetBSD 5.x/6.0) use traditional syslogd as the default system logging utility.
If you need more information about rsyslog, you can visit the following URL:
http://www.rsyslog.com/doc
Questions/Comments/Suggestions?
Bill Parker (wp02855 at gmail dot com)
Comments
The text below is what Mr. Rollinson sent me:
Part of your recommendation to reduce the risk of losing logging information uses incorrect logic. As you stated, if the "Burst Limit" is 500 and the "Interval Limit" is 10, then more than 200 messages in a 5 second period will trigger rate limiting. This is equivalent to an average rate of 40 messages/second. If we followed you first recommendation and increased the "Burst Limit" AND the "Interval Limit," we might do something like set the "Burst Limit" to 250 and the "Burst Interval" to 10 which would cause the rate limit to be triggered if we received more than 250 messages in a 10 second period. This is equivalent to an average of 25 messages/second.
This means that if something triggered 100 messages/second for a 10 second time interval, the default would discard approximately 600 messages (depending on the exact implementation) but the "improved" method could discard approximately 750 messages. The proper way to reduce the risk of discarding messages would be to INCREASE the "Burst Limit" and/or DECREASE the "Interval Limit." This would guarantee that the message rate that triggers the limit has increased.
I didn't consider the issue of the values when examining this issue, though the second solution in the article (turning off rate-limiting completely) would solve the problem faced in the first solution.
Bill
Feb 6th 2013
1 decade ago