Open Bug 151622 Opened 22 years ago Updated 2 years ago

[RFE] filters that score messages (anti-spam)

Categories

(MailNews Core :: Filters, enhancement)

Tracking

(Not tracked)

People

(Reporter: ducky, Unassigned)

References

Details

It would be useful for filters to be able to add and subtract points from a spam score. Ideally, I'd like to be able to take actions based on the score (e.g. "delete all messages whose spam score is less than -100"), but that depends upon multiple actions (bug 13145). However, even just being able to sort the inbox by score would be useful. Or perhaps have a View that hides all messages with a score lower than -100.

There are *LOTS* of things that *usually* indicate that something is spam, but unfortunately, not always.

+ Messages that don't have me in the To or CC line are usually spam.
+ Messages with embedded images are usually spam.
+ HTML messages are usually spam.
+ Messages with more than five people in the To or CC line are usually spam (see bug 120606).
+ Messages that contain "enhance" in the subject line are usually spam.

However, none of these are failsafe:

+ I *want* to be in the BCC line for some things, e.g. party announcements. I don't want to get all the "Yeah, I'll be there!" (erroneous) reply-to-alls.
+ I have a friend who embeds a gif of his signature in all his messages.
+ See above.
+ See bug 120606.
+ The VP could send a message about needing to enhance revenue.

If you can add/subtract points, then you can get much more precise spam filtering. If you can easily whitelist at the same time (see bug 34340 and bug 120160), then you can get deadly. I wrote a Visual Basic macro for OL2002 to do this, and it was extremely effective (much more effective than Outlook's built-in junk filters). SpamAssassin scores messages in much the same way, and it too is lethal.

I hear someone off in the distance saying, "but people would never take the time to tweak the scoring like that." True, *most* people wouldn't, but some would -- and if there is per-filter import/export (see bug 151612), then it gets easy for people to pass the spam filters around. Heck, post it on a Web page and you get close to resolving bug #126688. Or just use SpamAssassin's published penalties.

Now, this idea was discussed at length in npm.mail in Jan 99 (see message ID <369A4CDC.5DE16593@netscape.com>). It was discarded partially on the grounds of inadequate resources for 1.0 but partially on the grounds that it would be hard to explain to people. I understand that resources are probably still an issue (though I may be able to help). I think, however, that if you explain it in terms of an integer score, it will be much easier to understand than if you talk about it in terms of a percentage.
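A minimal sketch of the additive scoring described in this report, written in Python rather than in Mozilla's filter code. The `Message` type, the `ME` address, and every point value are hypothetical; only the heuristics themselves come from the report. Lower scores mean more spam-like, matching the -100 threshold mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical, simplified message model for illustration only.
    subject: str
    to: list = field(default_factory=list)
    cc: list = field(default_factory=list)
    is_html: bool = False
    has_embedded_image: bool = False


ME = "me@example.com"  # assumed address of the mailbox owner

# (description, predicate, points); all point values are made up.
RULES = [
    ("not addressed to me", lambda m: ME not in m.to + m.cc, -40),
    ("embedded image", lambda m: m.has_embedded_image, -30),
    ("HTML message", lambda m: m.is_html, -20),
    ("more than five recipients", lambda m: len(m.to + m.cc) > 5, -30),
    ("'enhance' in subject", lambda m: "enhance" in m.subject.lower(), -50),
]


def spam_score(message):
    """Sum the points of every rule that matches; no single rule decides."""
    return sum(points for _, matches, points in RULES if matches(message))
```

With these assumed values, a plain-text note from the VP about enhancing revenue only loses the points for the subject rule and stays above -100, while an HTML message with an embedded image that also matches the other rules falls below it.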
Status: UNCONFIRMED → NEW
Ever confirmed: true
What about a mix of:
1. Spam scoring (http://spamassassin.taint.org) for tagging mail as spam
2. A reporting community (http://www.cloudmark.com)
3. Integration of reporting via right mouse click to http://spamcop.net
That would result in the most impressive spam-fighting solution ever seen.
Nico, those are good ideas, but really should be in separate RFEs. This RFE requests the filter action
    add/subtract ___ points from the spam score
What you requested is really one new filter condition:
    when {external third party} says this is spam
and a UI feature that composes and sends a message.
Added dependency on multiple actions -- for scoring to work properly, you need to be able to accumulate penalties. For example, you might dock a message 20 points for being from an unknown address, 50 for having an embedded image, 1000 for having Viagra in the subject line, etc. This means multiple actions, bug 13145. There are really two sub-RFEs that I want:
+ Filter action: [add/subtract] ____ points from spam score
+ Filter condition: if spam score is [greater than/less than] ____
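The two sub-RFEs can be sketched as follows. This is an illustration in Python, not Mozilla's filter implementation: `ScoreFilter`, `run_filters`, `score_is`, and the dictionary keys are hypothetical names. The penalties of 20, 50, and 1000 points are the ones given in the comment above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoreFilter:
    # Hypothetical filter: a condition plus an "add/subtract points" action.
    name: str
    condition: Callable[[dict], bool]
    points: int  # negative values subtract from the score


def run_filters(message, filters):
    """Apply every matching filter, accumulating penalties into one score."""
    score = 0
    for f in filters:
        if f.condition(message):
            score += f.points
    return score


def score_is(score, comparison, threshold):
    """Sketch of the condition 'spam score is [greater than/less than] N'."""
    if comparison == "less than":
        return score < threshold
    if comparison == "greater than":
        return score > threshold
    raise ValueError(f"unknown comparison: {comparison}")


FILTERS = [
    ScoreFilter("unknown sender", lambda m: not m["sender_known"], -20),
    ScoreFilter("embedded image", lambda m: m["has_image"], -50),
    ScoreFilter("Viagra in subject", lambda m: "viagra" in m["subject"].lower(), -1000),
]
```

Because every matching filter contributes to the total, this only works if more than one filter can act on the same message, which is why the request depends on bug 13145.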
Depends on: 13145
mass re-assign.
Assignee: naving → sspitzer
Now having the bayesian spam recognition, is this still needed?
> Now having the bayesian spam recognition, is this still needed?
Yes. It's not only about having mails divided into spam/not spam, but also into spam/maybe spam/maybe spam if I don't have time/maybe spam if sent before x-mas/no spam and the like.
Anti-spam is the biggest reason for scoring, so yes, the high-order bit has been cleared. ;-) There *is* a lesser advantage to having scoring: prioritization (assuming you can sort by score). For example, you might add 10 points if the sender is in your address book, 10 points if you are on the TO line, 10 points if you are the only person on the TO line, 10 points if it is from your boss, 200 points if it is from your spouse, etc.
I'm not sure adding/subtracting is the point here, is it? Admittedly it's likely part of the solution. The point is that there should always be at least 3 categories of message:
- assumed spam
- assumed not spam
- possibly spam, but the user needs to pass judgement as part of the training process
Otherwise I don't see myself trusting Mozilla to trash my spam. I'd like to be able to. The obvious way of achieving and controlling this is to give messages a score and allow the above categories to be configured on that basis.
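One way to picture the three categories is as two configurable thresholds on the score. The sketch below is in Python; the function name and the default thresholds are assumptions, and lower scores are taken to mean more spam-like, as in the original report.

```python
def classify(score, spam_below=-100, not_spam_above=0):
    """Map a numeric score onto the three categories from the comment above.

    The thresholds are hypothetical defaults that a user would configure.
    """
    if spam_below > not_spam_above:
        raise ValueError("spam_below must not exceed not_spam_above")
    if score < spam_below:
        return "assumed spam"
    if score > not_spam_above:
        return "assumed not spam"
    # Everything between the thresholds is left for the user to judge.
    return "possibly spam"
```

Widening the gap between the two thresholds sends more messages to the user for review; narrowing it trusts the score more.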
Jim: this is about custom message filters. It's not related to the Bayesian junk filter.
(In reply to comment #5)
> Now having the bayesian spam recognition, is this still needed?

Yes, I can think of cases where e-mail messages need to be basically immune from being classified as junk mail. In particular, e-mail that is generated by a ticket tracking system (or other process flow system). These messages typically have a very specific subject line format, "[ticket: XXXX-YYYY-ZZZ]", and come from a specific address, "ticket-request@example.com". So I should be able to create a rule that simultaneously notifies me via a pop-up window and also tags those messages as being immune from being tagged as spam/junk. Because until you get the bayesian filter properly trained, there's a high probability that it will tag at least some of those messages as junk.

Other cases would be filters that move mailing list messages into a sub-folder (a moderated list is highly unlikely to have spam, yet the junk message filter will tag some messages as spam until it is properly trained).

The problem with having messages improperly tagged as spam by the bayesian filter is that they forget where they came from, and untagging them and moving them back where they belong is a manual process. (There's a bug 208197 which addresses this.) So when the filter screws up, it's the user who has to clean up the mess.

Other thoughts: there are really three possible actions:
- "don't junk-filter this message" (which is an "ignore" action); it acts only on messages that match the filter, but doesn't retrain the bayesian filter
- "always flag this as junk" (and train the bayesian filter to see this message as junk). Maybe call it "junk this and junk other messages like it".
- "always flag this as not-junk" (which differs from "ignore" in that it should auto-train the bayesian filter on this message). Or "never junk this and never junk other messages like it"
AIUI, you can actually do that right now, without needing a point-threshold system. User-defined filters always run before the Bayesian filter. So you can e.g. create a filter for "Sender: ticket-request@example.com" with a "Set junk status to: not junk" action, and it should work.
Product: MailNews → Core
The other thing that would be nice along with this would be to be able to do some recognition of numbers in a header. For instance, CRM114 returns headers like this:
    X-CRM114-Status: Good ( pR: 2.1939 )
If I could write a filter that would say "if the X-CRM114-Status header has a number less than -500, delete it summarily" (the value ranges from -999 to +999) then that would be very, very nice. Also the ability to sort things by spamminess would be awfully nice. (Perhaps the ability to "label" with a gradated colour?)
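A minimal sketch of the numeric header test described above, using Python's standard `re` module. The function names are hypothetical, and the pattern assumes the number always follows "pR:" as in the sample header; other CRM114 output formats are not handled.

```python
import re

# Matches the "pR: <number>" part of a value such as "Good ( pR: 2.1939 )".
_PR_PATTERN = re.compile(r"pR:\s*([-+]?\d+(?:\.\d+)?)")


def crm114_score(header_value):
    """Return the pR number from an X-CRM114-Status value, or None if absent."""
    match = _PR_PATTERN.search(header_value)
    return float(match.group(1)) if match else None


def should_delete(header_value, threshold=-500.0):
    """True when the header carries a number below the threshold."""
    score = crm114_score(header_value)
    return score is not None and score < threshold
```

A header without a recognizable number is never deleted here, which errs on the side of keeping mail.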
sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
Severity: normal → S3