The recommended storage interface for dspam is MySQL, and after dealing with RedHat-specific problems revolving around SleepyCat’s db4, I am not at all surprised. (More on that later, in case you encounter the same issue.) The simplest part, by contrast, is actually compiling the software and configuring Sendmail to call procmail such that dspam obtains the information it needs to run happily.
If you aren’t familiar with dspam, visit the project page to read more. Some of its cooler features include extremely light overhead and console-free retraining: you simply forward misclassified mails to a particular email address, and the necessary corrections are made so the mistake doesn’t happen again. This is possible because dspam hangs on to messages for a definable period of time before discarding them, so if you hit it with a message containing a dspam signature, it knows what to do. Training of this nature is especially crucial during the first few days. If all your mail clients support a complete message bounce, you can relocate the magic signature into the message header. However, not all mail clients support this. Fortunately, a small signature is only a minor nuisance. (What’s more, some ISPs’ SMTP servers silently drop emails that contain URIs they have decided to blacklist, so being able to forward only the message signature by itself gets you off the hook.)
The specific scenario this guide covers is installing dspam on a stock RedHat Fedora Core 1 box running Sendmail and using procmail as the local delivery agent. To ensure the correct permissions are available to the dspam binary, I call it from a global procmail recipe which starts dspam, then finishes up by executing each user’s .procmailrc, if such a file exists. This makes an excellent configuration for SpamAssassin refugees, as SpamAssassin is often deployed in a similar way.
First, download the latest dspam, which is version 3 as of this writing. If you have used version 2 before, version 3 is a whole new ballgame with some differing options.
Unpack the source and run configure. dspam will default to the db4 driver, so you need not specify any magic options. The default directory for databases and other option files is /var/dspam. Other files will be installed into the /usr/local tree by default.
[jasonb@trekweb ~/src]$ tar -zxvf dspam-3.0.0.tar.gz
[jasonb@trekweb ~/src]$ cd dspam-3.0.0
[jasonb@trekweb dspam-3.0.0]$ ./configure --with-debug
[jasonb@trekweb dspam-3.0.0]$ make
[jasonb@trekweb dspam-3.0.0]$ su
[root@trekweb dspam-3.0.0]# make install
For security, only authorized users can execute dspam with arguments. By default dspam is owned by root, in the mail group, and setgid, which is sufficient for a standard Fedora Core 1 Sendmail installation. Create the file /var/dspam/trusted.users with the user root listed, and add further usernames as necessary. Anyone not listed cannot pass arguments to dspam, which guards against potentially malicious activity.
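As a minimal sketch, the trusted user list can be maintained from the shell. The helper name add_trusted_user is my own invention, and the one-username-per-line layout is an assumption on my part; the directory defaults to the /var/dspam location mentioned above.

```shell
# Hypothetical helper: append a username to dspam's trusted.users file
# unless it is already listed. Assumes one username per line.
add_trusted_user() {
    user=$1
    dir=${2:-/var/dspam}
    file=$dir/trusted.users
    touch "$file" || return 1
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF -- "$user" "$file" || printf '%s\n' "$user" >> "$file"
}

# Usage (as root): add_trusted_user root
```

Checking for an existing entry first keeps the file free of duplicates if you run the helper more than once.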
Next, Sendmail must be reconfigured to include an additional argument to procmail, the username the delivery is being performed for. Assuming you’re configuring Sendmail using the standard sendmail.mc file, your local_procmail FEATURE should be modified to look like the following snippet:
FEATURE(local_procmail,`',`procmail -t -Y -a $h -a $u -d $u')dnl
dspam needs to know the username the mail being delivered is for, so it can perform internal housekeeping on the statistical databases and temporarily cache a copy of the message should retraining be necessary. This is achieved by using the -a option to procmail, which passes the specified variable off as an argument which we will fetch later. By default this is not done, since procmail is already called with the -d argument, which tells it explicitly all it needs to know to deliver the mail. For those unfamiliar with sendmail.mc, parameters are quoted starting with a backtick character and terminated with the standard single quote character. Once you have completed your changes, rebuild the Sendmail configuration and restart the daemon:
[root@trekweb root]# nano /etc/mail/sendmail.mc
[root@trekweb root]# cd /etc/mail
[root@trekweb mail]# make
[root@trekweb mail]# service sendmail restart
Shutting down sendmail: [ OK ]
Shutting down sm-client: [ OK ]
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
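Before moving on, it can be worth confirming that the edit really made it into sendmail.mc. The following is only a sketch: the function name mc_passes_user is my own, and all it does is grep for the -a $u argument from the FEATURE line above.

```shell
# Hypothetical check: does the given sendmail.mc pass the recipient
# ($u) to procmail as an extra -a argument? Lines commented out with
# a leading dnl are ignored because the match is anchored at ^FEATURE.
mc_passes_user() {
    mc=${1:-/etc/mail/sendmail.mc}
    grep '^FEATURE(local_procmail' "$mc" | grep -q -- '-a \$u'
}

# Usage: mc_passes_user && echo "procmail will receive the username"
```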
Next, you can configure procmail to pass mails to dspam before proceeding with regular delivery. The following simple procmail recipe, placed in /etc/procmailrc, is executed by procmail at time of delivery. The process is executed with root permissions before performing a setuid to the user the mail is being delivered to.
LOGFILE=/var/log/procmail.log
# Produce tons of debugging output
#
VERBOSE=1
LOGABSTRACT=1
#
DSPAM=/usr/local/bin/dspam
# Extension
# Sendmail lets you send mail addressed to user+foo@example.com
# This is 'foo'
#
EXTENSION=$1
#
# Catch our user from sendmail.mc to pass to dspam
#
USER=$2
#
# Let's retrain as necessary
#
:0
* EXTENSION ?? spam
{
# No notification on 'delivery'
# (set before the recipe so it is not taken as the action line)
COMSAT=0
:0 w
| $DSPAM --mode=teft --class=spam --source=error --user $USER
}
#
:0 fw
| $DSPAM --stdout --deliver=spam,innocent \
--mode=teft --feature=chained,noise,whitelist --user $USER
#
# default action: drop privs and deliver to user
The preceding procmail recipe will execute dspam with the arguments of your choice (it will run as root). dspam will evaluate the message and return it to procmail for further processing. Once /etc/procmailrc has finished, the user’s .procmailrc will execute, if one exists, with the permissions of that user. If any mails come in with a specific suffix, spam, they are fed into dspam with the options necessary to reclassify the mail as spam, correcting dspam’s mistake. This is crucial to get the classifier up and running. A similar mechanism could be used for dealing with misclassified innocent messages, commonly called ‘false positives’. A full discussion of procmail is beyond this short document’s scope. If necessary, review this excellent guide.
If you are already using SpamAssassin in your personal .procmailrc, you can easily integrate it into your new dspam setup until the classifier is fully trained, if you want. I already use procmail to filter spam into a YYYY-MM-DD directory structure for easy pruning. I recycled that concept for use with mail flagged as spam by dspam and handle that before invoking spamc on any remaining mail. Once the classifier is fully trained, (practically) all of this mail will be innocent and I will entirely turn off SpamAssassin.
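To make the idea of a matching false-positive address concrete, here is a small shell sketch of the decision such a recipe would have to make. It is illustrative only: the function name retrain_args and the +innocent extension are hypothetical, and I am assuming dspam accepts --class=innocent alongside the --class=spam used above.

```shell
# Hypothetical helper: map an address extension to the dspam arguments
# used for retraining. The "innocent" extension is an assumption; only
# "spam" appears in the global recipe.
retrain_args() {
    ext=$1
    user=$2
    case $ext in
        spam|innocent)
            printf '%s\n' "--mode=teft --class=$ext --source=error --user $user" ;;
        *)
            # Unknown extension: not a retraining request
            return 1 ;;
    esac
}

# Usage: retrain_args innocent jasonb
```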
[jasonb@trekweb jasonb]$ cat .procmailrc
LOGFILE=procmail.log
VERBOSE=1
LOGABSTRACT=1
MAILDIR=$HOME/mail
#
# Assign dates to some variables
#
YEARFOLDER=`date +"%Y"`
MONTHFOLDER=`date +"%Y/%m"`
DAYFILE=`date +"%Y/%m/%d"`
#
:0
* ^X-DSPAM-Result: Spam.*
{
:0 Wic : year.lock
* ? test ! -d dspam/$YEARFOLDER
| mkdir dspam/$YEARFOLDER
:0 Wic : month.lock
* ? test ! -d dspam/$MONTHFOLDER
| mkdir dspam/$MONTHFOLDER
:0 Wic : day.lock
* ? test ! -f dspam/$DAYFILE
| touch dspam/$DAYFILE
:0 :
dspam/$DAYFILE
}
#
# Now hit it with SA if necessary.
# Good mails and missed spams will be treated alike in
# this fashion, until the classifier is up to speed.
#
SPAMC="/usr/bin/spamc"
#
:0 fw
| $SPAMC
#
:0
* ^X-Spam-Status: Yes.*
{
:0 Wic : year.lock
* ? test ! -d spam/$YEARFOLDER
| mkdir spam/$YEARFOLDER
:0 Wic : month.lock
* ? test ! -d spam/$MONTHFOLDER
| mkdir spam/$MONTHFOLDER
:0 Wic : day.lock
* ? test ! -f spam/$DAYFILE
| touch spam/$DAYFILE
:0 :
spam/$DAYFILE
}
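The payoff of the year/month/day layout is that old spam can be pruned by age. What follows is a minimal sketch rather than what I actually run: the function name prune_spam and the 30-day default are my own choices, and it assumes a GNU find that supports -mindepth and -empty.

```shell
# Hypothetical helper: delete daily spam mboxes last modified more
# than DAYS days ago, then remove month and year directories that
# were left empty as a result.
prune_spam() {
    root=$1
    days=${2:-30}
    find "$root" -type f -mtime +"$days" -exec rm -f {} \;
    # -depth visits children first, so an emptied month directory is
    # removed before its year directory is tested.
    find "$root" -depth -mindepth 1 -type d -empty -exec rmdir {} \;
}

# Usage: prune_spam "$HOME/mail/dspam" 30
```

Run from cron against both the dspam and spam folders, something like this keeps the mail directory from growing without bound.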
Finally, fire off a few test messages to see if dspam is indeed feeding off your messages. You should have all sorts of little critters in your /var/dspam directory.
[root@trekweb mail]# ls -A1 /var/dspam/
data
.debug
dspam.debug
dspam.messages
system.log
trusted.users
You might also run dspam_stats, which is your friend:
[root@trekweb mail]# dspam_stats
jasonb TS: 0 TI: 26 SM: 1 IM: 0 SC: 0 IC: 0
[root@trekweb mail]# dspam_stats -H
jasonb:
TS Total Spam: 0
TI Total Innocent: 20
SM Spam Misclassified: 7
IM Innocent Misclassified: 0
SC Spam Corpusfed: 0
IC Innocent Corpusfed: 0
TL Training Left: 2480
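If you want to watch a single counter over time, the one-line form is easy to pick apart. This is a rough sketch that assumes the output keeps the "KEY: value" layout shown above; the helper name stat_field is hypothetical.

```shell
# Hypothetical helper: pull one counter (e.g. SM) out of the one-line
# dspam_stats records read from stdin, one value per input line.
stat_field() {
    awk -v key="$1:" '{ for (i = 1; i < NF; i++) if ($i == key) print $(i + 1) }'
}

# Usage: dspam_stats | stat_field SM
```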
Initially the false negatives will slip through since you won’t have any classification information built up. To train, simply forward the misclassified message, or at the very least the !DSPAM… id, to your email address with +spam suffixed to the username (user+spam@example.com). If you configured the global procmail recipe above, it should be reclassified as spam.
If you’re like me, you don’t want !DSPAM… ids all over your messages. The solution is rather simple: pass --enable-signature-headers to configure when you build dspam. The only caveat with this approach is that most mail clients do not include the full headers when you forward an email, so you must do one of two things. You can either use your mail client’s bounce or redirect feature to redirect the message intact to your reclassification address, or you can copy the dspam id from the message header, put that in a new mail, and email that to your reclassification address. Either method works fine. If you’re using a recent version of pine, you can enable bouncing on Fedora Core 1 by following this guide.
With signature headers enabled, your mails will have headers much like the following. The only difference from a regular mail is the addition of X-DSPAM-Signature, which carries the signature that is normally appended to the end of the message.
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.9997
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: !DSPAM:50e21e2c201385721121391!
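If you go the copy-the-id route, the signature can be lifted out of a saved message with a line of sed instead of by hand. This is a sketch under the assumption that the header looks exactly like the sample above; the function name dspam_signature is my own.

```shell
# Hypothetical helper: print the value of the first X-DSPAM-Signature
# header in the message on stdin. Reading stops at the first blank
# line, so anything in the message body is ignored.
dspam_signature() {
    sed -n -e '/^$/q' -e 's/^X-DSPAM-Signature:[[:space:]]*//p' | head -n 1
}

# Usage: dspam_signature < saved-message.txt
```

The printed id is what you would paste into a new mail to your reclassification address.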
If you want to use something like Michael Thompson’s PHP-based Web interface for dspam, which expects a standard message quarantine to exist, this procmail recipe will create a fake quarantine for the default user scale. It needs to be modified for domain or large user scales. It’s perfect for checking up on those false positives without having to log in to a shell. You need to set up Apache authentication for this to work, with your login to the secure realm being the same as the userid dspam thinks you have. If the file created is not readable and writeable by the mail group, it will appear as though no messages are ever being dropped in the file. Ensure the permissions are correct when the file is created. You can probably modify the procmail recipe to do this automatically. You’ll have to modify the PHP source to rearrange the definitions of your false-positive and spam reporting addresses, since it is hardcoded to use a spam-foo / ham-bar combination, not user+spam.
:0
* ^X-DSPAM-Result: Spam.*
{
:0 Wic : $USER.lock
* ? test ! -f /var/dspam/data/$USER/$USER.mbox
| touch /var/dspam/data/$USER/$USER.mbox
:0 :
/var/dspam/data/$USER/$USER.mbox
}
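Since wrong permissions on the quarantine file fail silently, I find it easier to create the file by hand up front. A minimal sketch, assuming mode 660 and the mail group are what your setup needs; the helper name make_quarantine is hypothetical.

```shell
# Hypothetical helper: create a quarantine mbox that is readable and
# writable by its owner and by the given group (mode 660).
make_quarantine() {
    mbox=$1
    group=${2:-mail}
    touch "$mbox" && chgrp "$group" "$mbox" && chmod 660 "$mbox"
}

# Usage (as root): make_quarantine /var/dspam/data/jasonb/jasonb.mbox
```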
For debugging purposes, you can touch the dot file /var/dspam/.debug, which makes dspam produce some additional information to help track down any problems you may encounter. You’ll want to turn it off when you don’t need it, though.
One problem you may encounter is with db4. There is apparently an outstanding issue that affects some systems, and it may be necessary for you to install a modified build of db4. If you are in this situation, you will know rather rapidly. You will encounter the following error in your /var/dspam/dspam.debug. (You need to touch the debugging dot file first, though.)
29157: [-] DB_ENV->open failed: /var/dspam/data/jasonb: Invalid argument
29157: [-] unable to initialize dspam context
29157: [-] process_message returned error -2. delivering message.
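A quick way to tell whether you have been bitten is to grep the debug log for that first line. This is just a sketch: the function name has_db4_error is my own, and it matches only the exact message text shown above.

```shell
# Hypothetical check: succeed if the dspam debug log contains the
# DB_ENV->open failure shown above.
has_db4_error() {
    log=${1:-/var/dspam/dspam.debug}
    grep -q 'DB_ENV->open failed' "$log"
}

# Usage: has_db4_error && echo "looks like the db4 problem"
```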
There is an extensive bugzilla thread devoted to this issue at RedHat’s bugzilla. The short of it is, you should install the packages made available by Tomas Janousek which work around the issue. I installed db42, db42-devel, and db42-utils. Work through the dependency conflicts as necessary by removing db4-devel and friends if you have them installed. You still need db4, however, as many existing packages depend on it. Fortunately, both revisions of the library play nice together. You will need to recompile the dspam package to ensure it links against the newer libdb42 library.