Semi-Automatic Whitelist/Greenlist System



An email "whitelist" exempts known senders from spam filtering, which decreases the likelihood of a "false positive" and allows more aggressive filtering of mail from strangers. It's possible that a spammer might forge the From address of someone in your whitelist, but it's statistically unlikely (viruses are more likely to do this, but they're easier to filter).

[Some people call this a "greenlist" to distinguish it from a "whitelist" that simply discards mail from anyone who is not on the list. Others use "whitelist" and "greenlist" interchangeably. The term "whitelist" predates "challenge-response" systems, but some people confuse the two.]

Some email clients can do whitelisting, but IMHO that's too late-- I want the whitelisting and filtering to happen when the mail is delivered, even if I'm not online at the time. In most cases, that means doing it with procmail.

A whitelist can be assembled by hand, but it's easier to let the computer do it. The simplest way is to write a shell script that extracts email addresses from saved emails or from your address book files. The drawback to this approach is that you might forget to do it often enough (although it could be automated). Dallman Ross described a system he uses that automatically generates whitelist entries for anyone he sends email to, by sending a Bcc to a secret address used only for that purpose. I thought that was a good idea, but my implementation is different from his.
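The "shell script" approach can be sketched roughly like this. It is only an illustration: the function name extract_addrs is made up, it assumes the saved mail is in plain text files (mbox or one message per file), and the address pattern is deliberately simple.

```shell
# extract_addrs FILE...
# Print the unique, lowercased addresses found on lines that start
# with "From:". Assumption: a simple pattern is good enough; it does
# not parse every address form the mail standards allow, and a body
# line that happens to start with "From:" is picked up as well.
extract_addrs() {
    grep -h -i '^From:' "$@" |
        grep -o -E '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+' |
        tr 'A-Z' 'a-z' |
        sort -u
}
```

Something like "extract_addrs saved-mail >whitelist" would then build a first version of the list, which should be looked over by hand before it is trusted.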

System Requirements:

This system depends on three special features of the mail system: "subaddressing" (sometimes called "+addressing"), a way to read a message's "envelope address", and mutt's "folder-hook" and "my_hdr" functions. The details will vary depending on your ISP:

1: Subaddressing means that some ISPs will deliver mail to "username+anything@ispname.com", where "anything" can be any string. This gives you an (almost) infinite number of email addresses, which can be used for various purposes (some ISPs use "-" instead of "+", because some programs don't know that the "+" character is allowed in an email address).

Some ISPs also give you a "personal domain" that looks like "anything@username[.something].ispname.com", where [.something] will vary depending on your ISP.

Nancy McGough discusses subaddressing here. She also mentions that it's best not to use the "base" address for anything, because it tends to become a "spam magnet".

Some people use subaddressing as a substitute for filtering, by changing their "public" email address frequently so the spammers never have the current one. I combine it with filtering because I prefer not to change my addresses any more often than I have to (on rare occasions, I've received legitimate replies to usenet postings that were several years old).

2: I use the Bcc address as a coded signal, but the Bcc: header line is not delivered, so I need access to the "envelope address". Some ISPs add this information in a special header line (Panix uses "X-Original-To:", other ISPs use various other things). On some systems the subaddress is passed to procmail in the pseudo-variable "$1", which can be assigned to a normal variable.

I've run across one ISP that supports these features without advertising it, so it might be worth sending yourself a Bcc to see what shows up in the headers. Note that if you're not using procmail, the mail system may interpret a subaddress as a mailbox name, so mail sent to "username+test@ispname.com" may be delivered to a mailbox named "test" (procmail bypasses this, and will deliver wherever you tell it to). In any case, there is no substitute for testing.

At Panix, X-Original-To: always seems to be in the form of "username+subaddress@..." even if the message was sent to "subaddress@username...". I'm not sure if this is a bug or a feature.
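As a rough sketch of reading the envelope address, the function below pulls the subaddress out of the X-Original-To: header. The function name subaddr_of is hypothetical, and both the header name and the "+" separator are assumptions taken from the Panix case; other ISPs will need a different pattern.

```shell
# subaddr_of reads one message on stdin and prints the subaddress found
# in its X-Original-To: header (the text between "+" and "@").
# Assumptions: the header name is the one Panix uses, and the separator
# is "+"; change both to match what your ISP actually delivers.
# Only the headers are examined; sed stops at the first blank line.
subaddr_of() {
    sed -n '
        /^$/q
        s/^[Xx]-[Oo]riginal-[Tt]o:[[:space:]]*[^+@]*+\([^@]*\)@.*$/\1/p
    '
}
```

If the header has no subaddress, nothing is printed, so the caller can treat an empty result as "delivered to the base address".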

3: Mutt's "folder-hook" and "my_hdr" functions allow different headers to be added to messages sent from different folders. This means that I can use different From:, Reply-To: and Bcc: headers for different mail folders, which allows me to change the subaddresses I use for some folders without changing the ones I use for others.

I don't know if any other mail readers can do this on a per-folder basis (other than emacs, which is where mutt's author got the idea). Pine has "roles" that can do something similar, but they work on a per-message rather than a per-folder basis, so they're triggered by looking for patterns in the headers (such as the To: or Subject: lines). If the pattern isn't there, the message won't get the right headers (although you can still edit them by hand).

Nancy McGough discusses pine vs mutt here. Interestingly, she's tried mutt but keeps going back to pine; I've tried pine but I keep going back to mutt. Obviously, Your Mileage Will Vary. But note that, in either case, you will need to learn to use the configuration options; neither program will do much for you with its default settings.

More on Pine's "roles" here.

Goals:

1: Automatically add (most) addresses to the whitelist, with as little manual intervention as possible.

2: Manage subaddresses in a way that makes it easy to change them.

3: Keep separate whitelists of people I've corresponded with about certain subjects. This is not really necessary, but it has some potential advantages (for example, I might decide to "purge" one whitelist while leaving the others intact).

4: Separate the public "initial contact address" from the addresses used by people I know. This means that when a stranger sends email to one of my "public" addresses (for example, the one posted on my website), I'll respond with a different Reply-To address. The From address might still be the public one, in case they have it whitelisted (this assumes that most people will actually use the "Reply-To" address when replying).

5: Include enough comments so that beginners can figure out how it works by reading the rc files.

Methods:

The basic idea is that whenever I send email to someone, my email program adds a Bcc header that points to a (secret) coded subaddress, which causes the "To:" address to be added to the appropriate whitelist (the Bcc'd copy of the message is not actually delivered). If there is more than one recipient, only the first one is added-- I could try to add all addresses, but I don't think it's necessary, and it's easier to just extract the first one.

It does not add addresses from incoming messages that merely get past the spam filter, unless I reply to them. If it's not worth a reply, then they don't need to be in my whitelist. If it is worth a reply, then they probably should be in the whitelist, so the act of sending a reply should add their address to the whitelist without further manual intervention.

If the address is already in the whitelist, it is not added again (this avoids repeated "add and remove" activity for duplicates). There is also an extra check to avoid accidentally adding one of my own addresses (this should never happen unless I send mail to myself, but if it did, it would create a "back door" for spam with my address forged in the "From:" line).
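The checks described here can be sketched in shell as follows. This is not the actual recipe from my rc files, just an outline of the logic; the function names and the one-address-per-line file layout are assumptions.

```shell
# first_recipient reads a message on stdin and prints the first address
# on its To: line (only the first line of a folded header is examined).
first_recipient() {
    sed -n '
        /^$/q
        /^[Tt][Oo]:/{
            p
            q
        }
    ' |
        grep -o -E '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' |
        head -n 1
}

# add_to_whitelist ADDRESS WHITELIST MYADDRS
# Append ADDRESS (lowercased) to WHITELIST unless it is already listed
# there or appears in MYADDRS (the file of my own addresses).
# Assumption: both files hold one address per line.
add_to_whitelist() {
    addr=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z')
    [ -n "$addr" ] || return 1
    # Never whitelist one of my own addresses (the forged-From back door).
    if [ -f "$3" ] && grep -q -i -x -F -e "$addr" "$3"; then
        return 0
    fi
    # Skip duplicates so the list is only written when something is new.
    if [ -f "$2" ] && grep -q -i -x -F -e "$addr" "$2"; then
        return 0
    fi
    printf '%s\n' "$addr" >> "$2"
}
```

The own-address check comes first on purpose: even if the whitelist file is empty or missing, one of my own addresses can never get into it.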

If a message comes in that doesn't match the whitelist (meaning I've never sent mail to that address), then I send it through the usual spam checking, and deliver as appropriate.

There are some cases this doesn't cover:

1: I send email to someone and they reply from a different address (common for people with several addresses). This is exactly what I plan to do for my "public" addresses (eg the ones posted on my website).

The whitelist checks for a Reply-To header, so anyone who uses Reply-To consistently will get whitelisted the next time I reply to one of their messages, regardless of where the mail actually came from. If they use several addresses without using Reply-To, then it may take several exchanges before all of their addresses get into the whitelist.

2: I send someone my address through a web form and they send confirmation by email (common for businesses). In this case the address may have to be added by hand, since these are often "read-only" addresses that are never replied to (but check for a Reply-To header anyway, since the whitelist will favor that one if it exists). These addresses are sometimes forged by spammers (and "phishers"), but it may be worth it if you need to make sure you see all the real mail.

3: Someone gets my address from a website (very common). Their address will be added if I reply to their message, otherwise not. I also whitelist certain words or phrases in the Subject line that are related to the content of the website (and are unlikely to be used by a spammer).

4: Someone sends me email and I reply in a different medium (eg, by telephone, or by talking to them in person). They'll be whitelisted the first time I reply to one of their messages.

5: As always, mailing lists may have to be handled manually. Unless you constantly subscribe to new lists, this shouldn't be a problem.
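Two of the cases above (preferring Reply-To, and whitelisting Subject phrases) can be sketched in shell. Both function names are made up for illustration, and the phrase file (one phrase per line, no blank lines) is an assumed layout rather than the one my rc files use.

```shell
# sender_of reads a message on stdin and prints the address to check
# against the whitelist: the Reply-To: address if that header exists,
# otherwise the From: address. Only the headers are examined.
sender_of() {
    awk '
        /^$/ { exit }                       # end of headers; END still runs
        tolower($0) ~ /^reply-to:/ { reply = $0 }
        tolower($0) ~ /^from:/     { from = $0 }
        END {
            line = (reply != "") ? reply : from
            if (match(line, /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/))
                print tolower(substr(line, RSTART, RLENGTH))
        }
    '
}

# subject_is_whitelisted PHRASEFILE
# Read a message on stdin; succeed if its Subject: line contains any
# phrase from PHRASEFILE (fixed strings, matched case-insensitively).
# A blank line in PHRASEFILE would match every subject, so avoid them.
subject_is_whitelisted() {
    sed -n '
        /^$/q
        /^[Ss]ubject:/p
    ' | grep -q -i -F -f "$1"
}
```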

Generating new subaddresses:

These should be random strings-- if they're dictionary words, spammers might start trying to guess them. I assign them with a random-password generator program. If it generates truly random strings, you'll want to avoid characters that have a special meaning in the mail system or in regular expressions, specifically ^*+?@|\()[]<>$. If you generate "pronounceable" passwords they won't have any junk characters, but they may only have one digit, so you might want to add one or two more.

I use the pwgen program because it's included in the Debian Linux distribution (although it may not be installed by default).
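If pwgen isn't available, something similar can be improvised. The sketch below is a substitute technique, not what pwgen does: it reads /dev/urandom and keeps only lowercase letters and digits, so none of the special characters listed above can appear. The function name and the default length of 12 are arbitrary choices.

```shell
# new_subaddr [LENGTH]
# Print a random subaddress made only of lowercase letters and digits,
# which avoids every character with a special meaning in mail addresses
# or regular expressions. LENGTH defaults to 12 (an arbitrary choice).
# Assumes /dev/urandom exists and that head supports the -c option.
new_subaddr() {
    LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c "${1:-12}"
    echo
}
```

The results are not pronounceable, but they usually contain several digits, so there is no need to add any by hand.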

See also: Java password generator (gpw="Generate Pronounceable Words"). Also available as Java source or C++ source (near the bottom of the page). This is by Tom Van Vleck, who worked on the Multics project. He notes that the first such program was written in 1965.

A web search will turn up others (free and otherwise), including some for Windows.

[One other hint: use human-readable subaddresses for testing, then switch to the random ones after you know it's working.]

When new addresses are created, mutt and procmail need a way to recognize them. I store the information in a data file and then use shell scripts to generate rc files from that data for procmail and mutt. The data file has five fields per line:

foldername whitelistname public-subaddr private-subaddr add-code

"foldername" is the name of the mail folder or mailbox file
"whitelistname" is the name of the whitelist file
"public-subaddr" is the subaddress that I give out publicly
"private-subaddr" is the subaddress that I use but don't publicise
"add-code" is the subaddress that adds an address to the whitelist.

When editing this file, don't let your editor wrap long lines-- each line must have exactly five fields (see link to sample data file below). Also be sure it doesn't contain any blank lines, especially if you've removed a line at the end of the file. I always look at the generated files after running the scripts, to make sure the output is sensible.
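A quick check for wrapped or blank lines can be done with awk before the rc files are generated. This is a minimal sketch; the function name check_subaddr_file is hypothetical and is not one of the scripts listed at the end of this page.

```shell
# check_subaddr_file DATAFILE
# Report (on stderr) every line that does not have exactly five
# whitespace-separated fields, blank lines included, and return
# non-zero if any such line was found.
check_subaddr_file() {
    awk '
        NF != 5 {
            printf "%s:%d: expected 5 fields, found %d\n", FILENAME, NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1" >&2
}
```

Running it as "check_subaddr_file subaddr && ./update" would stop the update when the data file is malformed.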

If foldername is a maildir-style subfolder, it should include the path from the parent folder (that's "folder.subfolder" rather than "folder/subfolder"). Some systems use a leading dot, which should also be included (eg ".folder1" rather than "folder1").

More than one folder can use the same whitelist. In some cases the "public-subaddr" and "private-subaddr" can also be the same (eg for mailing lists that may be archived on the web).
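To give an idea of how the data file can drive mutt, here is a rough sketch that prints folder-hook lines for each folder. It is not the makesubrc script itself. The function name, the USER and DOMAIN arguments, and the header mapping (From: uses the public subaddress, Reply-To: the private one, Bcc: the add-code) are assumptions based on the goals described earlier.

```shell
# make_mutt_hooks DATAFILE USER DOMAIN
# Print one block of mutt folder-hooks per line of the data file.
# Assumed mapping (check it against your own setup):
#   From: public-subaddr, Reply-To: private-subaddr, Bcc: add-code.
# The whitelistname field ($2) is not needed on the mutt side.
make_mutt_hooks() {
    awk -v user="$2" -v domain="$3" '
        BEGIN {
            # Clear headers left over from the previously visited folder.
            print "folder-hook . \"unmy_hdr *\""
        }
        NF == 5 {
            hook = "folder-hook =" $1 " \"my_hdr "
            print hook "From: "     user "+" $3 "@" domain "\""
            print hook "Reply-To: " user "+" $4 "@" domain "\""
            print hook "Bcc: "      user "+" $5 "@" domain "\""
        }
    ' "$1"
}
```

The output can be redirected to a file and pulled into .muttrc with the "source" command. The catch-all "unmy_hdr" hook is printed first so that headers set for one folder don't carry over into the next.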

When you add a new folder to the subaddress file, it won't actually exist until someone sends mail to it, and you won't be able to send mail from it until it's been created. I create the new folder by sending email to myself (this is why I made sure the whitelist won't add my own addresses). It will show up the next time I start mutt.

If a subaddress has started to attract too much spam but you're still getting some legitimate mail using it, you can use the subaddress data file to move that subaddress to a temporary folder until you can get all of your correspondents converted. Mail sent to an invalid subaddress is discarded.

I also generate a file called ".myemail" that lists all possible addresses where I can receive mail. This file is used by Spambouncer, but I also use it to avoid adding one of my own addresses to the whitelist.
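A list like this can be generated from the same data file. The sketch below is only a guess at a workable layout (one address per line, base address included); the function name and the USER and DOMAIN arguments are hypothetical, and "personal domain" style addresses are not covered.

```shell
# list_my_addresses DATAFILE USER DOMAIN
# Print the base address plus every subaddress named in the data file
# (public, private and add-code), one per line, without duplicates.
list_my_addresses() {
    awk -v user="$2" -v domain="$3" '
        BEGIN { print user "@" domain }
        NF == 5 {
            for (i = 3; i <= 5; i++)
                print user "+" $i "@" domain
        }
    ' "$1" | sort -u
}
```

For example, "list_my_addresses subaddr username ispname.com >.myemail" would write the file.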

Notes on mutt:

If you define a shell alias so mutt always runs as "mutt -y", it will start up with a display of all mail folders.

To change folders while reading messages, press "c", then "?", then the TAB key. TAB toggles between mailbox view and directory view. Note that "../" in directory view is not the same as "=/" in folder view (the latter is the "INBOX" in a maildir folder).

I'm not sure how to set this up for IMAP, because I'm reading mail from local folders. I think you need to use:
set spoolfile=imap://imapserver/
set folder=imap://imapserver/
(If someone who actually uses IMAP can tell me the right way to do this, I'll post it here.)

For testing purposes, you can run mutt with a different .muttrc:

mutt -F muttrc

There may be some errors and omissions in the mutt manual. If in doubt, do some testing, and read comp.mail.mutt (note that they do expect you to read the documentation before posting questions).

Notes on procmail:

(These are described in the procmail FAQ, but they're important for beginners to know about.)

Condition lines that use variable expansion need a "$" character at the start of the condition, right after the "*". It's very easy to leave this out if you're not careful. Without it, procmail reads a "$" in the condition as "match the end of a line" rather than "expand this variable".

Mbox folders require lockfiles; maildir folders don't. Normally, specifying a lockfile on a maildir folder is harmless, but if the folder doesn't exist yet, procmail may complain about not being able to create the lockfile before it creates a directory to put it in. Since I use maildir folders, I'm not specifying a lockfile on delivery recipes. It's up to you to put them back if you need them (but consider converting to maildir instead).

Be sure to read the "procmailsc" manpage, which explains the scoring system. Some procmail experts seem to use scoring for almost everything these days, so if you ask a question you may get an answer that assumes you know how it works. One reason for this is that it's a handy way to "OR" several conditions together (normally all conditions are "AND"ed).

It helps to learn a bit about writing shell scripts, because procmail supports most of the same syntax (see references below). Note in particular that backquotes and single forward quotes look similar in some fonts, and they have very different meanings. The scripts below use the 'awk' utility, which is relatively easy to learn because it only works on formatted data.

It's handy to be able to test an rc file without having it affect your real mail. If testrc is the experimental version of your .procmailrc, and testfile contains a test email message (including headers), you can type:

procmail -m testrc <testfile

testfile can be very simple-- it only needs the header lines that your .procmailrc actually uses. For most uses that's just To, From, Subject, Reply-To, and X-Envelope-To (spam filters will expect to find more complete headers).

It also helps if testrc specifies a different $LOGFILE so you can use VERBOSE=yes without mucking up your normal logfile. Meanwhile, any mail that comes in will continue to use your existing .procmailrc.
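A test message can be written by hand, or generated with a small function like the one below. This is only a sketch: every name and address in it is a placeholder, and the envelope header should be replaced with whichever one your ISP really adds.

```shell
# make_test_message FILE SUBADDR
# Write a minimal test message with only the headers a whitelist rc
# file is likely to read. All names and addresses are placeholders,
# and X-Original-To: is an assumption; use the envelope header that
# your ISP actually provides.
make_test_message() {
    cat > "$1" <<EOF
From: Test Sender <sender@example.com>
To: username+$2@ispname.com
Reply-To: sender@example.com
X-Original-To: username+$2@ispname.com
Subject: whitelist test

This is a test message.
EOF
}

# Typical use, as described above (not run here):
#   make_test_message testfile test
#   procmail -m testrc <testfile
```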

Notes on shell scripts:

If you're not used to writing shell scripts, be sure to make the script executable with "chmod u+x scriptname". If it's not in your $PATH, you can run it as "./scriptname".

Sample files

These contain many comments, so be sure to read through them. Note that they are not "plug and play": you need to understand how they work and adapt them to your local system. (Also, the subaddresses used here are just examples; they're not the ones I actually use.)

.procmailrc
.muttrc
subaddr (Subaddress data file)
mutthooks.rc (include file for .muttrc)
subdata.rc (include file for .procmailrc)

makesubrc (shell script)
muttfolders (shell script)
addresses (shell script)
update (wrapper script to run the other three scripts)

Manpages:

procmail
procmailrc
procmailex
procmailsc
mutt
muttrc

The Mutt Manual

Websites:

Procmail tips page

Nancy McGough's Procmail Quick Start

Procmail FAQ
If this doesn't work, try these mirror sites, or do a web search for "procmail faq":

North America
mirror1
mirror2
mirror3
mirror4

Europe
mirror1

If you list your contact address on a web page, you can use this to disguise it (but don't expect it to work forever...)
http://www.spamassassin.org
Spamassassin Configuration Generator
Spamassassin Wiki
http://www.spambouncer.org

At Panix, Spamassassin is maintained by the sysadmins so I just have to keep my user_prefs file up to date. I've been using Spambouncer on another ISP where the admins aren't as responsive, because it's a procmail script and I can easily maintain it myself.

Books:

Kochan & Wood: Unix Shell Programming (SAMS)

Dougherty & Robbins: Sed & Awk (O'Reilly)

Daniel Gilly: Unix in a Nutshell (O'Reilly)
(or any comprehensive unix command reference)
