Tricks and technologies for dealing with spam

The first rule of spam-stoppers is that no solution is perfect, but some are more imperfect than others. With that in mind, here are some thoughts on the state of the art.



Current spam rate

My server's "spam rate" as of April 2009 is nearly 6000 messages per day, of which some 350 are held by TMDA (see below), and the rest are for nonworking addresses that bounce. The rate seems to hold steady for three to six months, and then suddenly doubles; in December 2004 it was only around 100 messages per day with 15 blocked by TMDA. At most, only one or two spam messages per month actually reach my inbox; usually, these use a return address that points to a badly configured autoresponder that responds to bounces, which TMDA accepts as a confirmation. These I immediately blacklist.

This "spam-rate" computation is based on recording all double-bounces, which happen when the destination address was invalid so a bounce was generated, but the bounce itself was also undeliverable. This can happen directly, as when a spammer sends to a non-existent recipient address with a non-existent return address, or indirectly via TMDA. When TMDA sends a confirmation request, it looks like a bounce to the mail system, so if the original sender address is bogus, the confirmation gets double-bounced straightaway. Note that this also includes messages destined for old throwaway addresses that get spammed frequently (that's why I threw them away); I send these straight into the double-bounce logger, since that reduces the load on my server.
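As a rough illustration of that computation (not the actual logger used on this server), the sketch below counts double-bounce records per day. The log format is an assumption made for the example: one record per line, beginning with an ISO-8601 timestamp. The function names are likewise hypothetical.

```python
from collections import Counter
from datetime import datetime


def daily_double_bounces(log_lines):
    """Count double-bounce records per calendar day.

    Assumed (hypothetical) log format: one record per line, starting
    with an ISO-8601 timestamp, followed by whitespace and free text.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        stamp = line.split(None, 1)[0]
        try:
            day = datetime.fromisoformat(stamp).date()
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        counts[day] += 1
    return counts


def average_per_day(counts):
    """Mean number of double bounces over the days that appear in the log."""
    return sum(counts.values()) / len(counts) if counts else 0.0
```

Feeding the function a few months of log lines and comparing the averages would show the "steady, then doubling" pattern described above.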

This also may include a few random probes from viruses, though not from one in particular that sends huge volumes of probes. This virus (possibly W32.Klez.E@mm) sends emails that purport to come from inet@microsoft.com, which at least makes it easy to eliminate. Astonishingly, this virus guesses destination email addresses at random, and may send hundreds of random emails over the course of a day before it gives up (or somebody wonders why their computer is so slow). So far, the virus' success rate at my site has been zero.

What I do now

In order to cut the flow of spam to an acceptable trickle, I use a combination of methods:

Disposable addresses
I use special-purpose email addresses to subscribe to mailing lists, e.g. "rogers-foo1", "rogers-foo2", etc. When one of these temporary addresses starts getting spammed, I resubscribe with a new address.
Whitelisting with TMDA
The Tagged Message Delivery Agent (TMDA) is usually configured to accept email only from specified "whitelist" addresses, never from "blacklist" addresses, and only after confirmation from all others. See my Why I use TMDA to reduce spam page for why this is so effective.
Tagging with SpamAssassin
SpamAssassin is the best-known example in the "content filtering" category. It looks for telltale features of spam in the message headers and body, and produces a "spam score"; if over a certain threshold, the message is considered spam. As mentioned below, I only use SpamAssassin to suppress challenges for probable spam to a valid address, so correspondents who are whitelisted need not worry about mentioning anything (like spam) that might get their message discarded. Not only does this reduce the outgoing load on my server, it reduces the chance of adding to someone's backscatter misery. And TMDA still holds on to the message for two weeks, so I have a chance to look for it if I suspect something is missing.
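The way the whitelist, the blacklist, and the spam score combine can be sketched as a small decision function. This is only an illustration of the policy described above, not TMDA's or SpamAssassin's actual code or configuration: the names `Action` and `decide`, the order of the checks, and the threshold of 5.0 are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"          # whitelisted sender: straight to the inbox
    REJECT = "reject"            # blacklisted sender: never accepted
    CHALLENGE = "challenge"      # hold the message, send a confirmation request
    HOLD_SILENTLY = "hold"       # hold the message, send no challenge


def decide(sender, whitelist, blacklist, spam_score, threshold=5.0):
    """Pick an action for one incoming message (hypothetical policy sketch)."""
    sender = sender.strip().lower()
    if sender in whitelist:
        # Whitelisted correspondents bypass content scoring entirely.
        return Action.DELIVER
    if sender in blacklist:
        return Action.REJECT
    if spam_score >= threshold:
        # Probable spam: keep it in the pending queue, but do not
        # generate a challenge that might land on a forged address.
        return Action.HOLD_SILENTLY
    return Action.CHALLENGE
```

Checking the whitelist before looking at the score is what lets known correspondents write about spam without risk; the score only decides whether an unknown sender gets a challenge.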

Discarded solutions

Here are some things I've tried and discarded as ineffective or impractical:

MIME-type filtering.
I used to filter out 95% of all spam by disallowing HTML email and attachments, and provided a separate address for people who wanted to send me attachments, which I kept hidden. (This was even more effective for viruses, which always have attachments these days.) But this had several flaws:
  1. The world is full of people who won't turn off HTML when sending a message, either because they can't be bothered, or they haven't a clue. (I suspect most of the latter don't even know that their mail program sends HTML messages in the first place. If you're one of those, you should read about the ASCII Ribbon Campaign (among other pages) for why HTML email is a poor idea, and then the How to Turn Off HTML in Your Outgoing Mail Messages page.)
  2. A few such people had used my "hidden" address as my primary email address in their address book, increasing the chance that it would be disclosed to spammers.
  3. By December 2003, the total volume of spam had increased such that stopping 95% of it wasn't good enough.
That's when I decided I had to switch to TMDA.
Traditional content-based filtering.
The way SpamAssassin is traditionally used is to redirect messages based on the "spam score"; if over a certain threshold, the message is considered spam, and is sent to the recipient's quarantine folder. I tried this at work, using SpamAssassin to tag messages with "**** SPAM" in the "Subject:" line, but it wasn't very effective, though that may be at least in part because the database of spam signatures was not as up-to-date as it could have been. I set it up for myself to test it without actually blocking anything; for the office administrator, we configured her mail reader to quarantine any messages with this tag. Then one day, the office administrator sent me an email with "IMPORTANT!!!!" in the subject and the body in all-caps. Not surprisingly, SpamAssassin tagged this as spam, which was bad enough; what was worse was when the admin didn't see my reply, because the subject still had "**** SPAM" in it and the message got quarantined!
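For reference, the MIME-type filtering described above amounts to a check along the following lines. This is a minimal sketch using Python's standard `email` package, not the filter I actually ran; the function name `violates_plain_text_policy` is made up for the example.

```python
from email import message_from_string


def violates_plain_text_policy(raw_message):
    """Return True if the message has any HTML part or any attachment.

    Hypothetical helper: a message passes only if every leaf part is
    inline text/plain.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers are judged by their leaf parts
        if part.get_content_type() != "text/plain":
            return True  # HTML, images, executables, and so on
        if part.get_content_disposition() == "attachment":
            return True  # plain text, but sent as an attachment
        if part.get_filename():
            return True  # a named part is treated as an attachment
    return False
```

A message that fails this check would be refused at the public address and accepted only at the separate address reserved for attachments.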

I have not tried any of the commercial spam-filtering solutions, because they all seem to consist of some kind of content-based filtering, usually in combination with proprietary technologies. If the proprietary technologies were good enough, they wouldn't need the content filtering, so I'm not inclined to spend the money for what must be an incremental improvement. (True, it would be equivalent to hiring somebody else to maintain the signature database, so it would be more current and therefore more effective, but I'm leery of false positives.)

The future of antispam

There are several other proposals for stemming the flood of spam, but they tend to be longer term. They consist of changes to the infrastructure that make it easier to detect forgeries, but they won't have much impact until at least one of them is widely implemented.

DomainKeys
This is a Yahoo proposal for authenticating the sending domain (but not the user). If an email purports to come from a sending domain whose advertised policy is that it always signs outgoing email, and the signature is missing or invalid, then the email is clearly a forgery, and can be dropped or quarantined. Once this is widely deployed, it could be quite effective; there are ways to defeat it, but they would be prohibitively expensive for large mass mailings.
???
[There was another similar proposal I read earlier, but I can't remember where now. -- rgr, 10-Dec-04.]
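The DomainKeys reasoning above boils down to a small decision table. The sketch below shows only that logic; it does no DNS lookup and no cryptographic verification, and the names `Verdict` and `classify` are hypothetical rather than part of any DomainKeys implementation.

```python
from enum import Enum


class Verdict(Enum):
    AUTHENTICATED = "authenticated"  # valid signature from the claimed domain
    FORGERY = "forgery"              # domain signs everything, yet no valid signature
    UNKNOWN = "unknown"              # nothing can be concluded


def classify(domain_signs_all_mail, signature_present, signature_valid):
    """Classify a message from the claimed domain's policy and signature state.

    All three inputs are assumed to have been determined elsewhere
    (policy lookup and signature check are outside this sketch).
    """
    if signature_present and signature_valid:
        return Verdict.AUTHENTICATED
    if domain_signs_all_mail:
        # The domain promises to sign all outgoing mail, so a missing
        # or invalid signature means the message did not come from it.
        return Verdict.FORGERY
    return Verdict.UNKNOWN
```

The `UNKNOWN` case is why such a scheme has little effect until it is widely deployed: mail claiming to come from a domain with no signing policy cannot be judged either way.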


Bob Rogers <rogers@rgrjr.dyndns.org>