Filtering spam into a folder on Slackware/Sendmail using SpamAssassin

Posted by: Andrew Smith
Poster contact info: asmith16 at littlesvr ca
Author: same as 'Posted by'
Software: Slackware 12.2

For years I've been manually filtering spam, and for years I wanted to set up some kind of spam filter on my mail server (I run my own Sendmail). Unfortunately, if you've never done it, the setup is really complicated, and until now I never had the time to figure it out.

Now that I know how to do it I can probably do it again in less than 3 hours, most of which would be spent pressing enter in response to useless prompts.

My setup is:

- Slackware 12.2 (it has a huge uptime and I don't have a need to upgrade it)
- Sendmail from the distro, upgraded at some point via a Slackware security update
- Sendmail set up properly not to forward spam around, outgoing mail through my ISP, optionally accessed via SSL
- imapd for reading the email

In other words - nothing special really.

The short of it is this: you need to filter all the incoming mail through SpamAssassin, which will tag every message with a spam likelihood score. Then you configure procmail (the default MDA used by Sendmail on Slack) to move the likely spam messages to a different folder.

Note: almost all the commands below need to be run as root. The one possible exception is creating the .procmailrc files.

Installing SpamAssassin

Unless you're a hardcore Perl dude - you have to use CPAN to install it. Even so it will take several steps.

Step 1: figure out what the dependencies are. To do this, run cpan Mail::SpamAssassin (as root, of course). It will fail to build because dependencies are missing. When it does, make a note of all the required and optional packages SpamAssassin depends on (they are printed near the end of the cpan output).

Step 2: install all the dependencies. For me the command was:
cpan HTML::Parser Net::DNS NetAddr::IP Digest::SHA1 \
Mail::SPF IP::Country Razor2 Net::Ident IO::Socket::INET6 IO::Socket::SSL \
Mail::DKIM LWP::UserAgent HTTP::Date Encode::Detect
But it might be different for you if you don't have the same version of Slackware as me.

This is a long process, and unfortunately it's not unattended. Even after the initial CPAN setup is done (which happens the first time you ever run cpan), it will keep coming up with prompts about extra dependencies and some other things. Basically you just have to press enter for any prompt that comes up.

You can expect this to take a couple of hours, though it depends on the speed of your network and your machine.
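
One thing that may cut down on the enter-pressing, though it is not part of the procedure described here, is the PERL_MM_USE_DEFAULT environment variable, which makes prompts issued through ExtUtils::MakeMaker take their default answer. It will not silence everything (the first-time CPAN setup and CPAN's own questions may still ask). The wrapper name install_deps and the CPAN_CMD override below are made up for this sketch:

```shell
#!/bin/sh
# Sketch only: run cpan with PERL_MM_USE_DEFAULT=1 so that prompts issued
# through ExtUtils::MakeMaker take their default answer.
# CPAN_CMD is a hypothetical override (not a cpan feature) that lets you
# preview the call, e.g. with CPAN_CMD=echo, before running it for real.
install_deps() {
    PERL_MM_USE_DEFAULT=1 ${CPAN_CMD:-cpan} "$@"
}
```

You would call it with the same module list as above, for example install_deps HTML::Parser Net::DNS NetAddr::IP and so on.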

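Step 3: install SpamAssassin. Run the same command as you did in step 1: cpan Mail::SpamAssassin. On my system it compiled, but some of the unit tests failed, so I had to force the install despite the failures, using cpan -f Mail::SpamAssassin instead. If -f on its own doesn't install anything with your version of cpan, try cpan -f -i Mail::SpamAssassin, which is the form the cpan documentation gives for forced installs.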

Plenty of warnings are printed during the cpan runs above. You can ignore all of them.

Configuring SpamAssassin

The defaults (which I've tested now for a couple of days) are very good, so you don't actually need to configure much at all. In the default /etc/mail/spamassassin/local.cf I just added one line. This is the entire file with most of the comments stripped out for brevity. The line I added is the last one (bayes_path):
# rewrite_header Subject *****SPAM*****
# report_safe 1
# trusted_networks 212.17.35.
# lock_method flock
#required_score 2.0
# use_bayes 1
# bayes_auto_learn 1
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status
bayes_path /etc/mail/spamassassin/bayes
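I changed required_score to 2.0 only for testing, so I could send an email to myself with lots of "viagra" in it and have it score high enough to be classified as spam. The line is commented out again in the file above, so SpamAssassin's default threshold of 5.0 applies.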

If I make more useful tweaks to the config in the future, I will post them in a reply here.

Get the rules

Just run sa-update.
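
If you later want to run this from cron, here is a minimal sketch of a wrapper. It assumes the exit codes documented for sa-update (0 means new rules were installed, 1 means nothing new, anything else is an error) and that spamd is running with the pid file used in the Performance section below, so it can be told to reload with SIGHUP. The function name and the SA_UPDATE and PIDFILE variables are made up for the example:

```shell
#!/bin/sh
# Sketch: refresh the rules and reload spamd only when new rules arrived.
# SA_UPDATE and PIDFILE are hypothetical knobs for this example; the
# defaults match the commands used elsewhere in this post.
SA_UPDATE=${SA_UPDATE:-sa-update}
PIDFILE=${PIDFILE:-/var/run/spamd.pid}

update_rules() {
    status=0
    "$SA_UPDATE" || status=$?
    case "$status" in
        0)  # new rules were downloaded and installed
            if [ -r "$PIDFILE" ]; then
                # spamd reloads its configuration and rules on SIGHUP
                kill -HUP "$(cat "$PIDFILE")"
            fi
            echo "rules updated" ;;
        1)  echo "no new rules" ;;
        *)  echo "sa-update failed with status $status" >&2
            return "$status" ;;
    esac
}
```

Calling update_rules once a day is plenty for a low-volume server.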

Performance

I figured this is a low-volume mail server and I don't care much about performance, but I underestimated just how long a bunch of Perl modules can take to load. If you're curious, you can run this and see how long it takes:
time spamassassin --test-mode ~/.cpan/build/Mail-SpamAssassin-3.3.2-XDoMpI/sample-spam.txt
The exact directory name will be slightly different on your system. You can use your own sample email too of course.

On my system this took over 22 seconds (no kidding), which is completely unacceptable. So I tried using spamd/spamc instead. The improvement was massive.

Setup is a piece of cake: simply run /usr/bin/spamd --daemonize --pidfile /var/run/spamd.pid and you can add it to your rc.local too:
echo -n "Starting spamassassin daemon: "
/usr/bin/spamd --daemonize --pidfile /var/run/spamd.pid
echo "done."
After the daemon is running try time spamc -c < ~/.cpan/build/Mail-SpamAssassin-3.3.2-XDoMpI/sample-spam.txt

The difference was shocking on my system. Instead of 22+ seconds this ran in 3+ seconds! All the loading time was spent just once, when starting the daemon.
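
Since rc.local sometimes gets re-run by hand, a small guard can avoid starting a second daemon. This is only a sketch: it assumes the pid file path used above, and spamd_running is a made-up helper name, not something that ships with SpamAssassin:

```shell
#!/bin/sh
# Sketch: succeed only if the pid file exists and names a live process.
# kill -0 sends no signal; it just checks that the process can be signalled.
spamd_running() {
    pidfile=${1:-/var/run/spamd.pid}
    [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}
```

In rc.local the start line would then become: spamd_running || /usr/bin/spamd --daemonize --pidfile /var/run/spamd.pid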

Plugging it into Sendmail

By default Sendmail (the MTA) will use procmail as an MDA. But you might want to confirm that for your existing setup. If you still remember which .mc file you used to configure Sendmail, there should be a line like this in it:
MAILER(procmail)dnl
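If there isn't one, you'll have to figure some things out yourself, but it should be there. (Strictly speaking, it's the FEATURE(`local_procmail') line that makes procmail the local delivery agent. The stock Slackware .mc files should have both.)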

Now we will tell procmail to send all mail less than 300k in size to SpamAssassin. Create the file /etc/procmailrc and paste this into it:
DROPPRIVS=yes
PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/sh

# Spamassassin
:0fw
* <300000
|/usr/bin/spamc
No need to restart anything after changing this file, it gets reread for every message.

Now go send yourself an email and have a look at the headers. There should be several X-Spam-* headers in it. If there are - you're almost done! If not - stop now and figure out what step you missed.
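
To check the spam path as well without lowering required_score, you can use GTUBE, the standard test string that SpamAssassin's stock rules score far above any threshold. A minimal sketch that prints such a message follows. The function name and the addresses are placeholders, not real accounts:

```shell
#!/bin/sh
# Sketch: print a minimal test message whose body is the GTUBE string.
# The addresses are placeholders (assumptions); pass your own recipient
# as the first argument.
make_gtube_message() {
    to=${1:-you@example.com}
    printf 'From: tester@example.com\n'
    printf 'To: %s\n' "$to"
    printf 'Subject: GTUBE test\n'
    printf '\n'
    printf '%s\n' 'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'
}
```

With spamd running, make_gtube_message | spamc | grep '^X-Spam-Flag' should print a YES line. Piping the same message into sendmail -t instead should exercise the whole delivery path.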

Move spam automatically to a folder

I use a folder named "Spam" for all the garbage I receive. Unfortunately it looks like you need a per-user .procmailrc file to instruct procmail to do the move automatically. Basically, for each user you want to enable this for, create a ~/.procmailrc file with these contents:
:0
* ^X-Spam-Flag: YES
Spam
And if your spam folder is called something else - just use that instead. This folder wasn't there automatically, I created it manually years ago after I started receiving spam.
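
If you have more than a couple of users, a small helper can write the same recipe into each home directory. This is a sketch under a few assumptions: install_spam_recipe is a made-up name, it refuses to touch an existing .procmailrc, and when run as root the file ends up owned by root, so you may want to chown it to the user afterwards:

```shell
#!/bin/sh
# Sketch: write the spam-filing recipe shown above into a home directory,
# leaving any existing .procmailrc alone.
# Usage: install_spam_recipe HOME_DIR [FOLDER]   (FOLDER defaults to Spam)
install_spam_recipe() {
    home=$1
    folder=${2:-Spam}
    rc="$home/.procmailrc"
    if [ -e "$rc" ]; then
        echo "skipping $rc (already exists)" >&2
        return 1
    fi
    printf ':0\n* ^X-Spam-Flag: YES\n%s\n' "$folder" > "$rc"
}
```

For example: for u in andrew someotheruser; do install_spam_recipe "/home/$u"; done (the second user name is just an illustration).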

Training SpamAssassin

I'm not sure this part is required but I did it anyway, and it certainly didn't hurt. The idea is to show SpamAssassin lots of examples of messages you know to be spam and lots you know are not spam. This will only work of course if you kept all the spam you received to date in a separate folder:
sa-learn --showdots --mbox --spam ~andrew/Spam
sa-learn --showdots --mbox --ham ~andrew/Trash
This took me about 30 minutes per 3500 messages, so be ready to wait after running it.
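
As for whether training is required: by default SpamAssassin only applies its Bayes rules once it has learned at least 200 spam and 200 ham messages, so a quick count tells you whether the training has taken effect. The sketch below reads the output of sa-learn --dump magic on stdin and assumes the usual column layout, in which the count is the third field of the nspam and nham lines. The function name bayes_ready is made up:

```shell
#!/bin/sh
# Sketch: report the learned spam/ham counts and succeed only if both
# reach the minimum (200 is SpamAssassin's default for each class).
# Assumes `sa-learn --dump magic` lines end in "nspam" / "nham" and carry
# the count in the third column.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            exit !(spam + 0 >= min + 0 && ham + 0 >= min + 0)
        }'
}
```

Usage: sa-learn --dump magic | bayes_ready && echo "Bayes is active"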

Now if you wait until you receive some spam and some real email, everything should work. I was quite impressed with how good SpamAssassin is at identifying spam.

Good luck!