David A Gift wrote:
> I'm not the expert on this and someone from the mail.msu team who is
> needs to jump in, but it is my understanding that (a) we have been
> using SpamAssassin on mail.msu since day-1,
I haven't been on the mail.msu.edu team since day one... however, I have
worked here since '98 in some capacity and I recall there always being
the option to filter spam since pilot.msu.edu became mail.msu.edu (if
I'm wrong I'd ask that some of the original admins step in and correct
me because most of them still work here).
> (b) spam filtering is turned off by default and most users have NOT
> turned it on,
I don't think we've run the numbers on this in quite a while, but it is
true that spam filtering is turned off by default (this had to do with
resources and policy questions). As best I can recall, the
percentage of people using our default SpamAssassin rules (not including
Mail Filter Rules) is much lower than *I'd* like to see. I wouldn't
even want to make up a number at this point though.
> (c) our spam-threshold settings are somewhat conservative -- i.e.,
> we tend toward more false-negatives (letting spam through) to avoid
> too many false-positives (trapping content that people actually want
> to receive). In fact, SpamAssassin is only one of several layers of
> spam filtering deployed in mail.msu.
We have our threshold set to a score of 5.0, and numerous factors
contribute to that score. For example, mail from MLUI, the Michigan Land
Use Institute (which consistently gets marked as spam), shows this as
its scoring:
X-Spam-Report:
* 2.7 DEAR_FRIEND BODY: Dear Friend? That's not very dear!
* 1.5 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of words
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
Every single newsletter they send starts off with "Dear Friend". Well,
right away, boom! That's 2.7 points toward the spam score. The
message was sent as HTML only, so there's another 1.7 points. The
embedded image in the HTML adds another 1.5 points, for a total of 5.9,
which is over our 5.0 threshold.
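As a rough sketch of that arithmetic (an illustration only, not anything mail.msu actually runs), the report lines above can be parsed and summed in a few lines of Python. The names `parse_report` and `total_score` are made up for this example, and treating a total at or above the threshold as spam is an assumption about how the comparison is done.

```python
import re

# Report lines copied from the X-Spam-Report header quoted above.
REPORT = """\
* 2.7 DEAR_FRIEND BODY: Dear Friend? That's not very dear!
* 1.5 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of
words
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
"""

THRESHOLD = 5.0  # the threshold mentioned in this message

# A rule line looks like "* <score> <RULE_NAME> <description>".
RULE_LINE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+(\S+)")


def parse_report(report):
    """Return (rule_name, score) pairs; wrapped continuation lines are skipped."""
    hits = []
    for line in report.splitlines():
        match = RULE_LINE.match(line.strip())
        if match:
            hits.append((match.group(2), float(match.group(1))))
    return hits


def total_score(hits):
    """Sum the rule scores, rounded to one decimal like the report itself."""
    return round(sum(score for _, score in hits), 1)


hits = parse_report(REPORT)
total = total_score(hits)
# Assumption: a message is flagged when its total reaches the threshold.
is_spam = total >= THRESHOLD
print(total, is_spam)
```

The three non-zero rules add up to 5.9, so the newsletter lands just over the 5.0 line even though no single rule comes close on its own.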
Now unfortunately this isn't spam; this is a legit newsletter that
someone on campus here was having trouble receiving. The way around
this was to set up a list of "Trusted Senders" in
your MSU Webmail preferences. The Trusted Senders list had some small
issues of its own, but they were not very common and not worth
mentioning here.
So I'd say that, yes, a score of 5.0 is fairly conservative considering
how low a point value other attributes can be. Let's say, for example,
that we changed the threshold to 7.0. I'd guess that you'd easily see a
10% increase in spam (for those who have filtering turned on). You could
lower the threshold to something like 2.5, but then we'd have to assign
lower point values ourselves to things like "HTML ONLY BODY" in order to
avoid so many false positives.
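To make the trade-off concrete, here is a small sketch that runs the same newsletter scores against the three thresholds mentioned above and then against one locally lowered rule score. The `classify` helper and the 0.5 override value are hypothetical, chosen only for illustration; they are not actual mail.msu settings.

```python
# Rule scores copied from the X-Spam-Report quoted earlier in this message.
NEWSLETTER_HITS = {
    "DEAR_FRIEND": 2.7,
    "HTML_IMAGE_ONLY_28": 1.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 1.7,
}


def classify(hits, threshold, overrides=None):
    """Return (total, is_spam) after applying optional per-rule overrides."""
    scores = dict(hits)
    scores.update(overrides or {})
    total = round(sum(scores.values()), 1)
    # Assumption: a total at or above the threshold counts as spam.
    return total, total >= threshold


# Same message, three thresholds discussed above.
by_threshold = {t: classify(NEWSLETTER_HITS, t) for t in (2.5, 5.0, 7.0)}

# Hypothetical local override: score HTML-only mail at 0.5 instead of 1.7.
lowered = classify(NEWSLETTER_HITS, 5.0, overrides={"MIME_HTML_ONLY": 0.5})

for threshold, (total, is_spam) in by_threshold.items():
    print(threshold, total, "spam" if is_spam else "not spam")
print("with override:", lowered)
```

At 7.0 the newsletter gets through, and so would any real spam scoring between 5.0 and 7.0. Lowering a single rule's points is the other lever, at the cost of maintaining those values ourselves.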
We do have other methods in place for fighting spam. We have a
greylisting server in front of all inbound @msu.edu email, which has cut
spam in half across the @msu.edu domain. We have some extra definitions
included in our antivirus software that help in fighting known phishing
attacks or other scams (such as 419 scams). We have mail filters, which
can be difficult to create and manage but do provide some additional
filtering based on rules you provide. And of course we have a Blocked
Senders List for those annoying people who are sending threatening
emails to you, and its opposite, the Trusted Senders List (do not be
confused: Blocked Senders is typically not a method to fight spam).
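For anyone unfamiliar with greylisting, the general idea is to temporarily turn away mail from a (client IP, sender, recipient) combination that has not been seen before and accept it once the sending server retries after a short wait. Legitimate mail servers retry; much spam software does not. The sketch below shows that general technique only. The `Greylist` class, the 300-second delay, and the in-memory dictionary are assumptions for illustration and say nothing about how the MSU greylisting server is actually built or configured.

```python
class Greylist:
    """Minimal in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300):
        # Seconds a new triplet must wait before a retry is accepted
        # (300 is an illustrative value, not a known MSU setting).
        self.min_delay = min_delay
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return "defer" (an SMTP 4xx temporary failure) or "accept"."""
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember when this triplet first showed up.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"


greylist = Greylist(min_delay=300)
triplet = ("192.0.2.1", "news@example.org", "someone@msu.edu")

print(greylist.check(*triplet, now=0))    # first attempt
print(greylist.check(*triplet, now=60))   # retry that comes too soon
print(greylist.check(*triplet, now=600))  # retry after the delay
```

A real deployment would also expire old entries and whitelist known senders; those parts are left out here to keep the sketch short.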
> Point C is important: the broader the user base the more conservative
> spam threshold settings need to be to make sure that people get the
> mail they want to get. The more local and narrowly-defined the user
> base, the more specific the spam settings can become and the more
> effective the filtering. If one sets filtering just for oneself, it
> could be made to work almost perfectly, but few other people would
> accept the same definition of perfection. I think this
> spam-management trade-off issue is often missed when people compare
> the relative effectiveness of spam filtering at different levels of
> user scale and scope.
I don't have too much to add here other than to say I couldn't agree
with you more on this point. :-)
> - Dave
>
Hope all this information was useful; I will gladly try to answer
any further questions.
./brm