Dealing with images in your spam

Published: 2007-01-15
Last Updated: 2007-01-16 13:40:49 UTC
by Jim Clausing (Version: 2)
A few years ago, I finally purchased my own domain and set up a small mail server running on a Linux box.  This server hosts a couple of small mailing lists I run and handles e-mail for my immediate family.  I have a total of 8 or 9 e-mail addresses that are forwarded to my server and end up in my inbox there.  I also have fetchmail running to pop e-mail for the entire family and deliver it to their mailboxes, which I then serve up on my home network via imap (actually imaps, but that doesn't make any difference for this discussion).

I had administered rather large sendmail installations in the past and actually got to be quite good at it, but for my server at home I decided I wanted to dig into postfix a little more.  At that point, I had not done anything overly complex in postfix, though I had built and packaged it for some Solaris servers/workstations that were under my administrative control at the time.

Anyway, once I got postfix set up initially, I knew I'd need to add anti-virus and anti-spam.  Fortunately, my Linux distro came with amavisd-new.  I'm not going to go through all of the details and settings I played with to reach something I was initially comfortable with, but there are a number of 'how-to' type documents at [1] that tell how to set up Postfix with amavisd-new, spamassassin, and your choice of anti-virus.  Since my server is relatively lightly loaded and only has a few users who receive e-mail (my various addresses --used for subscribing to mailing lists, etc.-- account for well over 95% of all the e-mail received by this machine), I decided to cover all my bases.  I now run 5 different anti-virus packages (several free, the rest cheap for a simple home setup).  Eventually, I decided I needed to play with some of the more advanced features/options of Postfix, so I got a couple of books.  My favorite (though by no means the only good one) was [2], and I eagerly await the second edition, which they are apparently working on.  As a result of reading the book, I found Ralf's blog on Amazon, too [3].  I'll come back to that in a minute.

As I tried to tune spamassassin, I found the SARE (SpamAssassin Rules Emporium) at [4], which has a bunch of useful rules, some of which I've added to my collection.  I also wrote a few of my own and spent a lot of time playing with the scores of various rules.  At one point, I had virtually eliminated spam in my inbox (I have a procmail rule that sorts messages marked as spam into a separate folder).  Unfortunately, I had too many false positives; that is, I was marking too much legit e-mail as spam.  After a little more tuning, I finally reached the state where I had very few false positives (maybe 1 every other day or so), and I realized that almost all of the false negatives (spam getting past the filters) had images in them.  Now, I read my e-mail as plain text (see and Spaf's blog for other discussions of that subject), so I had no idea what was in these images.  I just saw that there were images as attachments on these messages.
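For readers who haven't set up that kind of sorting, here is a minimal Python sketch of the decision a "spam goes to a separate folder" rule makes.  It is not the procmail rule itself.  It assumes the filter has tagged spam with an `X-Spam-Flag: YES` header, as SpamAssassin does in its default configuration, and the folder names are made up for illustration.

```python
from email import message_from_string

# Hypothetical folder names, chosen only for this example.
SPAM_FOLDER = "spam"
INBOX_FOLDER = "inbox"


def choose_folder(raw_message: str) -> str:
    """Pick a delivery folder based on the X-Spam-Flag header, if present."""
    msg = message_from_string(raw_message)
    # A missing header is treated the same as "not spam".
    flag = msg.get("X-Spam-Flag", "")
    if flag.strip().upper().startswith("YES"):
        return SPAM_FOLDER
    return INBOX_FOLDER


if __name__ == "__main__":
    tagged = "X-Spam-Flag: YES\nSubject: cheap meds\n\nbody text\n"
    clean = "Subject: list digest\n\nbody text\n"
    print(choose_folder(tagged))
    print(choose_folder(clean))
```

Only the header is examined, so the filter has to do its tagging before delivery for something like this to be useful.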

That brings me back to [3].  In reading back through old entries, I noticed that Ralf had mentioned (back in Sep 2006) using FuzzyOCR [6] to reduce some of this spam.  It turns out (and maybe all of you already knew this) that most of those images contain the same kind of fuzzed references to cheap pharmaceuticals or stocks that could bring quick profits that I was filtering successfully from the body or headers of the messages.  As a result, I looked at [7] and then [5] and found a couple of new (to me) tools to help deal with some of this remaining spam in my inbox.  I've only been running this setup for a little over a week, but I have had no more than 1 or 2 false negatives and 1 or 2 false positives a day since I started running FuzzyOCR and ImageInfo (well under 1% of my e-mail), so I'm pretty happy with them.  I realize that running OCR software against all inbound e-mail is going to be too heavy a load for the mail servers in most large organizations.  As I mentioned above, though, my server is relatively lightly loaded, and I tweaked the FuzzyOCR config to basically only run it on messages that hadn't already been determined to be spam by some other means.  Of course, this means that the spammers will soon change tactics again to evade these tools, too, but for the moment, it works for me.
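The idea of only running OCR when the cheaper checks haven't already flagged a message can be sketched roughly as follows.  This is an illustration in Python, not FuzzyOCR's actual configuration or code; the function names and the scores used in the example are assumptions made up for it.

```python
from typing import Callable


def total_score(cheap_score: float,
                spam_threshold: float,
                expensive_check: Callable[[], float]) -> float:
    """Add the expensive check's score only when the cheap checks
    have not already pushed the message over the spam threshold."""
    if cheap_score >= spam_threshold:
        # Already spam by other means, so skip the costly OCR pass.
        return cheap_score
    return cheap_score + expensive_check()


if __name__ == "__main__":
    calls = []

    def fake_ocr() -> float:
        # Stand-in for an OCR scan; records that it was invoked.
        calls.append(1)
        return 4.0

    print(total_score(7.5, 5.0, fake_ocr))  # cheap checks suffice, OCR skipped
    print(total_score(2.0, 5.0, fake_ocr))  # OCR runs and adds its score
    print(len(calls))                       # number of OCR invocations
```

The expensive check is passed in as a callable so that it is never even started for messages the cheaper rules have already caught.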

[1] (especially the how-to sections)
[2] Ralf Hildebrandt's and Patrick Koetter's "Book of Postfix"
[3] Ralf's Amazon blog

Update:  About 10 minutes after I made this story live, I noticed this from McAfee on the image spam situation.  I guess I wasn't imagining things.  Also, James mentioned this from Sophos in e-mail.

Jim Clausing, jclausing ++ at ++ isc dot sans dot org