[Last modified: 16-MAR-2003]

The Weekly Spam Report



Week 4 - February 9 to 15, 2003

Number of spam messages for this report: 827
Average number of spam messages a day: 118.14

Well, after a looong rest, I'm back. I stopped writing these reports for a while for two reasons. One, a lack of time: I did have some free time, but I really wanted to spend it on something more enjoyable than looking at spam. And two, after three reports I found out that there aren't many interesting and witty things to say about spam. It is annoying, there is way too much of it, and it always seems to be the same thing: make money fast, get a mortgage, buy Viagra, increase the size of some part of your body, look at this amazing porn for free, get cash from corrupt third-world government officials and, lately, get rid of spam. I simply didn't know what to write about.

So, I decided to stick to the facts. I will report here how much spam I get, how much of it is filtered, and which filtering methods and rules work best for me. Let's see how that goes...

I'll write here about my two main mailboxes: one is used at work, for (obviously) work-related e-mail, and the other is used at home, for personal stuff. I subscribe to several mailing lists with my home e-mail address, and to none with my work address (except internal, employee-only lists hosted by my company). However, my work address was also my personal address for several years, so it gets a good amount of spam, too.

For my work account, I use a server-based filtering system. It lets me specify simple rules: block a certain domain or a certain e-mail address, block anything with some text in a given header field or in the message body, or block anything addressed to me using one of our old domain names (like many Internet companies, mine has changed names and owners a few times over the last few years). It requires constant supervision, as most simple filtering systems do, and it causes a few false positives every now and then, as well as letting some spam through. To deal with the spam that gets through, I use Bayesian filters on my desktop.
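
Just to make some of these rule types concrete, here is a minimal Python sketch of them as tests on a parsed message, using the standard `email` package. It is not how the server actually does it (I don't know its internals); the old domain names and the sample message are made-up placeholders, and plain domain or address blocking is left out.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses

# Hypothetical former company domains; placeholders, not the real ones.
OLD_DOMAINS = {"old-name.example", "older-name.example"}


def addressed_to_old_domain(msg: Message) -> bool:
    """True if any To: or Cc: recipient uses one of the old domain names."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(fields):
        if addr.rpartition("@")[2].lower() in OLD_DOMAINS:
            return True
    return False


def header_contains(msg: Message, header: str, keyword: str) -> bool:
    """Case-insensitive substring test on every copy of one header field."""
    return any(keyword.lower() in str(v).lower() for v in msg.get_all(header, []))


def body_contains(msg: Message, keyword: str) -> bool:
    """Case-insensitive substring test on the text/plain parts of the body."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        try:
            text = payload.decode(part.get_content_charset() or "latin-1", errors="replace")
        except LookupError:  # the message declares a charset Python doesn't know
            text = payload.decode("latin-1")
        if keyword.lower() in text.lower():
            return True
    return False


# Made-up sample message for illustration.
raw = (
    "From: Deals <deals@sender.example>\n"
    "To: me@old-name.example\n"
    "Subject: Cheap printer cartridge\n"
    "\n"
    "To oppt out, reply REMOVE.\n"
)
msg = message_from_string(raw)
print(addressed_to_old_domain(msg))                          # True
print(header_contains(msg, "Subject", "printer cartridge"))  # True
print(body_contains(msg, "oppt out"))                        # True
```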

At home, I mainly use a Bayesian filter (PopFile, in case you are wondering), coupled with a few rules in my e-mail client. Fortunately, the Bayesian filter catches over 99% of the spam.
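
For the curious, here is the general idea behind Bayesian filtering as a tiny naive Bayes classifier in Python. This is a generic textbook sketch, not PopFile's actual code: the tokenizer, the add-one smoothing and the four training messages are my own assumptions for illustration. It counts words in spam and non-spam ("ham") messages that were classified by hand, then labels a new message with whichever class gives it the higher log-probability.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Two-class (spam/ham) multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Add one hand-labelled message ("spam" or "ham") to the counts."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        if self.doc_counts[label] == 0:
            return float("-inf")  # never trained on this class
        # Log prior: share of training messages that carry this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        spam_score = self._log_score(tokens, "spam", vocab_size)
        ham_score = self._log_score(tokens, "ham", vocab_size)
        return "spam" if spam_score > ham_score else "ham"


f = NaiveBayesFilter()
f.train("make money fast, buy viagra now", "spam")
f.train("get a mortgage, make money fast", "spam")
f.train("meeting notes for the project review", "ham")
f.train("lunch on friday? project review moved", "ham")
print(f.classify("make money fast"))           # spam
print(f.classify("project review on friday"))  # ham
```

Real filters use much larger training sets and smarter tokenizers, which is also why a freshly installed one needs a few days of training before its numbers mean anything.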

Below, and in the following weeks, you will see a table listing the blocking rules and the number of messages blocked by each one during the week. These are the rules I use in my work mailbox, so the table shows how effective each rule is at blocking spam. Some caveats: the rules are applied in a fixed order, defined not by me but by the system, and a message that matches more than one rule is counted only for the rule that caught it first. The order is approximately: messages addressed to old domains, then domains and addresses blocked globally, then domains and addresses blocked by the user (me), then keyword filters for message headers, then keyword filters for the message body.
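
The first-match counting can be sketched like this in Python: the rules are tried in a fixed order, and a blocked message is credited only to the first rule it matches. Messages are simplified to plain dicts, and the domains in the three lists are hypothetical stand-ins; the real system's matching details are not documented here.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical simplified message: a dict with "from", "to", "headers", "body".
Message = dict
Rule = tuple[str, Callable[[Message], bool]]

OLD_DOMAINS = {"old-name.example"}       # hypothetical former domain
GLOBAL_BLOCKED = {"hotmail.com"}         # hypothetical global block list
USER_BLOCKED = {"free-gift-offers.com"}  # hypothetical personal block list


def domain_of(address: str) -> str:
    return address.rpartition("@")[2].lower()


# Rules in the evaluation order described above: old domains, global list,
# user list, header keywords, body keywords.
RULES: list[Rule] = [
    ("Old domains", lambda m: domain_of(m["to"]) in OLD_DOMAINS),
    ("Global block list", lambda m: domain_of(m["from"]) in GLOBAL_BLOCKED),
    ("User block list", lambda m: domain_of(m["from"]) in USER_BLOCKED),
    ('From: contains "offers"', lambda m: "offers" in m["headers"].get("From", "").lower()),
    ('Message body contains "oppt out"', lambda m: "oppt out" in m["body"].lower()),
]


def first_match(msg: Message, rules: list[Rule]) -> Optional[str]:
    """Name of the first rule that matches, or None if the message passes."""
    for name, predicate in rules:
        if predicate(msg):
            return name
    return None


def tally(messages: list[Message], rules: list[Rule]) -> Counter:
    """Blocked messages per rule; each message is credited to one rule only."""
    counts: Counter = Counter()
    for msg in messages:
        name = first_match(msg, rules)
        if name is not None:
            counts[name] += 1
    return counts


def make(sender: str, recipient: str, body: str = "") -> Message:
    return {"from": sender, "to": recipient, "headers": {"From": sender}, "body": body}


week = [
    make("deals@free-gift-offers.com", "me@old-name.example"),  # -> Old domains
    make("deals@free-gift-offers.com", "me@new-name.example", "To oppt out..."),  # -> User block list
    make("friend@hotmail.com", "me@new-name.example"),  # -> Global block list
    make("boss@new-name.example", "me@new-name.example"),  # not blocked
]
print(tally(week, RULES))
```

The second message also matches both keyword rules, but it only shows up under the user block list, which is exactly why the keyword rules can look less useful in the table than they really are.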

I don't have a similar list for the results of my filters at home, since Bayesian filters don't work with this type of rule. I am working on some scripts to generate statistics on these messages, but it will take a while. In the meantime, I'll just count these messages towards the grand total. This week, I started using PopFile on Friday, so it is still in training and its numbers of blocked and missed messages aren't meaningful yet. Starting next week, I'll list those, too.
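
As a rough idea of the kind of statistics script I have in mind, here is a hypothetical Python sketch. It assumes every message has been labelled by hand as spam or not, next to what the filter decided; the `Decision` record is made up for illustration and is not PopFile's own output format.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One message: what it really was and what the filter said (hypothetical format)."""
    is_spam: bool          # decided by hand when reviewing the week's mail
    flagged_as_spam: bool  # what the filter decided


def weekly_stats(decisions: list[Decision]) -> dict:
    """Caught, missed and false-positive counts plus the catch rate in percent."""
    caught = sum(d.is_spam and d.flagged_as_spam for d in decisions)
    missed = sum(d.is_spam and not d.flagged_as_spam for d in decisions)
    false_positives = sum(not d.is_spam and d.flagged_as_spam for d in decisions)
    total_spam = caught + missed
    catch_rate = 100.0 * caught / total_spam if total_spam else 0.0
    return {
        "caught": caught,
        "missed": missed,
        "false_positives": false_positives,
        "catch_rate_percent": round(catch_rate, 2),
    }


# Made-up toy week: 199 spams caught, 1 missed, 50 legitimate messages left alone.
week = [Decision(True, True)] * 199 + [Decision(True, False)] + [Decision(False, False)] * 50
print(weekly_stats(week))
# {'caught': 199, 'missed': 1, 'false_positives': 0, 'catch_rate_percent': 99.5}
```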

So, here's the list for this week (532 messages were blocked at work):

Messages  Percent  Rule
     209   39.29%  Old domains
      33    6.20%  From: contains "offers"
      30    5.64%  hotmail.com
      24    4.51%  free-gift-offers.com
      22    4.14%  Received: contains "dsl.telesp.net.br"
      22    4.14%  lists.mailogen.com
      20    3.76%  no @ in sender address
      16    3.01%  recessionspecials.com
      15    2.82%  bol.com.br
      11    2.07%  uberabalogistica.com.br
      10    1.88%  emailsupersavers.com
       8    1.50%  afsmail.com
       8    1.50%  jsuati.com
       7    1.32%  dealpatrol.com
       7    1.32%  From: contains "rewards"
       7    1.32%  zipmail.com.br
       7    1.32%  False positives
       6    1.13%  msn.com
       6    1.13%  mail.com
       5    0.94%  ig.com.br
       5    0.94%  sendgreatoffers.com
       4    0.75%  progdelphi@progdelphi.cjb.net
       4    0.75%  Message body contains "oppt out"
       4    0.75%  mailmanfirm.com
       4    0.75%  uol.com.br
       4    0.75%  Message body contains "You have received this notice"
       3    0.56%  valley2003.com
       3    0.56%  midia.rubystock.com
       2    0.38%  yoppit.com
       2    0.38%  mdb.adm.br
       2    0.38%  brfree.com.br
       2    0.38%  list.emailbucks.com
       2    0.38%  webmarketingmail.zzn.com
       1    0.19%  From: contains "xxx"
       1    0.19%  aol.com
       1    0.19%  Message body contains "S.1618"
       1    0.19%  From: contains "oportunidad"
       1    0.19%  pop.com.br
       1    0.19%  latinmail.com
       1    0.19%  globo.com
       1    0.19%  suportt@suportt.com.br
       1    0.19%  From: contains "dealz"
       1    0.19%  From: contains "execplan"
       1    0.19%  Subject: contains "printer cartridge"
       1    0.19%  From: contains "free"
       1    0.19%  From: contains "bargain"
       1    0.19%  ibest.com.br
       1    0.19%  To: contains "brasilia"
       1    0.19%  Subject: contains "inkjet"
       1    0.19%  Message body contains "emailbucks.com"
       1    0.19%  sf.com.br
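
The Percent column is simply each rule's count divided by the 532 messages blocked at work, times 100, rounded to two decimals. A small Python check (the counts are copied from the table; the `percent` helper is just for illustration):

```python
BLOCKED_AT_WORK = 532

# Message counts copied from the table, top to bottom
# (including the "False positives" row).
ALL_COUNTS = [209, 33, 30, 24, 22, 22, 20, 16, 15, 11, 10, 8, 8, 7, 7, 7, 7,
              6, 6, 5, 5, 4, 4, 4, 4, 4, 3, 3, 2, 2, 2, 2, 2] + [1] * 18


def percent(count: int, total: int = BLOCKED_AT_WORK) -> str:
    """Share of the week's blocked messages, formatted like the table."""
    return f"{100 * count / total:.2f}%"


# The rows add up to the weekly total of 532.
assert sum(ALL_COUNTS) == BLOCKED_AT_WORK

for rule, count in [("Old domains", 209), ('From: contains "offers"', 33), ("sf.com.br", 1)]:
    print(f"{count:>8}  {percent(count):>7}  {rule}")
```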

You will notice that the "old domains" rule is by far the most frequent cause of blocked messages (209 of 532, about 39%). It is also responsible for most of the false positives. I will turn it off in the near future and see how the other rules hold up. My guess is that they will do just fine.

By the way, in the list above, a rule that is just a domain name or an e-mail address means that the sender's address matched it. More specific rules include a full description of why the message matched.
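
In other words, a bare rule with an "@" in it names a single sender address, and anything else names a domain. A minimal Python sketch of that matching, assuming (I'm not sure how the server handles this) that a domain rule also covers its subdomains; `sender_matches` and the `rubystock.com` rule are hypothetical:

```python
from email.utils import parseaddr


def sender_matches(rule: str, from_header: str) -> bool:
    """Match a bare rule (address or domain) against a From: header value.

    Assumption: a domain rule also matches subdomains of that domain.
    """
    _name, address = parseaddr(from_header)
    address = address.lower()
    rule = rule.lower()
    if "@" in rule:
        # Address rule: the whole sender address must match.
        return address == rule
    if "@" not in address:
        # Malformed sender; the table has a separate "no @" rule for these.
        return False
    domain = address.rpartition("@")[2]
    # The "." prefix keeps "mail.com" from matching "hotmail.com".
    return domain == rule or domain.endswith("." + rule)


print(sender_matches("bol.com.br", "Promo <vendas@bol.com.br>"))    # True
print(sender_matches("rubystock.com", "news@midia.rubystock.com"))  # True (subdomain assumption)
print(sender_matches("mail.com", "someone@hotmail.com"))            # False
```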

That's it for now. See you next week.


Last updated March 16th, 2003. Send feedback, comments and criticism to Wilson Afonso.
Developed with jEdit