Open Bug 181953 Opened 22 years ago Updated 2 years ago

marking postmaster mail as junk should only train on words in the enclosed message

Categories

(MailNews Core :: Filters, enhancement)

Tracking

(Not tracked)

People

(Reporter: endico, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

A lot of spam is addressed to nonexistent addresses at mozilla.org such as
anonymous@cvs-mirror.mozilla.org. The mail then bounces, generating a
'Returned mail' message informing the sender of the bounce. With spam, however,
mail to the sender generally bounces too, so the bounce itself bounces and the
whole mess goes to the postmaster. Embedded somewhere in that message is the
original spam.

I'd like to mark these bounce messages as junk but I am reluctant to do this
because I don't want to train mozilla to treat phrases such as 'The following
addresses had permanent fatal errors' as junk. Normally the number of legitimate
bounced mails that I should pay attention to is pretty tiny, but I get hundreds
of bounce messages containing spam. I'd like all the bounced spam to be
marked as junk, but I'm afraid that if I train mozilla on bounce messages then
normal text in bounce mail will also be treated as junk, and some day something
will really break and I'll miss the legitimate bounce messages because mozilla
marked them all as junk.

When I mark one of these bounce mails as junk, I'd like mozilla to use only
the text in the original enclosed message in its junk filters. Finding the
original message shouldn't be hard. Sendmail mailer-daemon messages contain the
original mail as 'Content-Type: message/rfc822' MIME attachments.

This kind of logic may apply to other forwarded mail as well.
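
To illustrate the idea of training only on the enclosed message, here is a minimal sketch using Python's standard `email` package. It is not how Mozilla's junk filter works; the function names (`enclosed_messages`, `training_tokens`), the sample bounce, and the simple word tokenizer are all assumptions made for illustration.

```python
import email
import re
from email import policy

# Hypothetical bounce, shaped like a mailer-daemon report that carries the
# original mail as a message/rfc822 part. Addresses are made up.
SAMPLE_BOUNCE = """\
From: MAILER-DAEMON@example.org
To: postmaster@example.org
Subject: Returned mail: see transcript for details
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

The following addresses had permanent fatal errors

--BOUND
Content-Type: message/rfc822

From: spammer@example.com
To: anonymous@example.org
Subject: Buy cheap pills
Content-Type: text/plain

Cheap pills available now

--BOUND--
"""


def enclosed_messages(raw_message):
    """Return the messages attached as message/rfc822 parts."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the parsed enclosed message.
            found.append(part.get_payload(0))
    return found


def training_tokens(raw_message):
    """Collect lowercase word tokens from the enclosed messages only."""
    tokens = []
    for inner in enclosed_messages(raw_message):
        text = str(inner.get("Subject", "")) + "\n"
        body = inner.get_body(preferencelist=("plain",))
        if body is not None:
            text += body.get_content()
        # Deliberately naive tokenizer; a real filter would do more.
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    return tokens
```

Calling `training_tokens(SAMPLE_BOUNCE)` yields words from the enclosed spam only, so the bounce wrapper's own wording never reaches the training step.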
reassigning to dmose
Assignee: naving → dmose
> I'd like to mark these bounce messages as junk but I am reluctant to do this
> because I don't want to train mozilla to treat phrases such as 'The following
> addresses had permanent fatal errors' as junk

This should be easy to compensate for: in addition to training it on a bunch of
postmaster bounces of junk, train it on a bunch of postmaster bounces of non-junk.

Your proposed solution really seems wrong to me, and I'm inclined to WONTFIX it.
Any folks on the CC list have comments?
Blocks: 11035
I don't think i have any postmaster bounces of non-junk.
I agree. By forcing mozilla to parse the email to determine whether it is a
postmaster bounce, you simply open a hole for spammers to exploit. The
better method is to train it on both spam and non-spam postmaster bounces. The
Bayesian filter will learn that the words 'The following addresses had
permanent fatal errors' are INSIGNIFICANT and do not bias the mail towards
junk or non-junk.
I don't yet believe your assertion that the filter will learn that the words
'The following addresses had permanent fatal errors' are insignificant. I'm
sure that would be true if I had roughly equal numbers of bounces with good
and junk mail.

This morning I had 159 postmaster mails containing spam and zero without.
I think about 20 of these weren't recognized as junk. If I mark these as
junk and go for weeks marking postmaster mails at the rate of 20 to one (or
less), then isn't the filter going to strongly associate 'postmaster words' with
junk?
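
The disagreement above can be made concrete with a simplified per-token estimate in the style of Bayesian spam filters. This is only a sketch: the thread does not describe Mozilla's actual calculation, and the function name, the clamping to 0.01..0.99, and the corpus sizes of 1000 messages are assumptions. Only the 159-to-zero and 20-to-one figures come from the comment.

```python
def token_junk_probability(junk_hits, good_hits, junk_total, good_total):
    """Estimate how strongly one token points to junk (0.01 .. 0.99).

    junk_hits / good_hits: trained junk / good messages containing the token.
    junk_total / good_total: sizes of the two training corpora (must be > 0).
    """
    junk_rate = junk_hits / junk_total
    good_rate = good_hits / good_total
    if junk_rate + good_rate == 0:
        return 0.5  # token never seen: treat as neutral
    p = junk_rate / (junk_rate + good_rate)
    # Clamp so a single token can never be absolutely decisive.
    return min(0.99, max(0.01, p))


# Bounce wording seen in 159 junk mails and in no good mail.
only_junk = token_junk_probability(159, 0, 1000, 1000)    # 0.99

# Same wording trained equally often as junk and as good.
balanced = token_junk_probability(159, 159, 1000, 1000)   # 0.5

# Training skewed 20 to one toward junk.
skewed = token_junk_probability(160, 8, 1000, 1000)       # about 0.95
```

Under these assumptions both commenters are right about their own case: balanced training makes the bounce wording neutral, while one-sided or heavily skewed training pushes it firmly toward junk.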


I don't really need mozilla to determine whether a mail is a postmaster bounce
message. What I really need is the ability to be specific about which message
I'm marking as junk. Right now I can only mark an entire IMAP message as junk.

Some messages contain attachments which themselves are email messages.
I'd like the ability to mark just these embedded messages as junk. Maybe we
could do this with a context menu when clicking on the attachment name
in the attachments list.

Or, make a way to extract the embedded message to a folder where I could mark
the copy as junk.
how about "only filter the message selection" so that you can select
the parts you want to filter and have junk filter handle the parts
only?
I realize that this is a request for enhancement for Mozilla, but isn't the
original problem much more easily solved by making the MX for
cvs-mirror.mozilla.org reject invalid recipients at RCPT time? Some MTA
software, like Exim (which I run on protocam.com), has a configuration option
for this. I see Gila is running Sendmail, and Lounge doesn't identify its
server in the banner. It looks like sendmail has a ruleset, local_check_rcpt,
for this purpose.

Unless there are other situations where the feature to mark attached messages
as spam might be necessary, it seems there's already a viable technical fix.
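
As a rough sketch of what rejecting at RCPT time means, the decision can be pictured as a lookup made while the SMTP session is still open. This is not Exim or Sendmail configuration; the function name and the recipient list are hypothetical, and only the reply codes (250 for accepted, 550 with enhanced status 5.1.1 for an unknown mailbox) are standard SMTP.

```python
# Hypothetical set of mailboxes that actually exist on the host.
VALID_RECIPIENTS = {"postmaster@cvs-mirror.mozilla.org"}


def rcpt_reply(address, valid_recipients=VALID_RECIPIENTS):
    """Return the SMTP reply line for a RCPT TO command."""
    if address.strip().lower() in valid_recipients:
        return "250 2.1.5 Recipient OK"
    # Refusing here leaves the failure with the sending server, so the
    # receiving host never accepts the mail and never generates a bounce.
    return "550 5.1.1 User unknown"
```

With this check in place, mail to anonymous@cvs-mirror.mozilla.org is refused during the session, and no 'Returned mail' message ever reaches the postmaster.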
Today I had a similar situation which could be solved in the same way. It was
not a postmaster bounce. I am the moderator for a few mailing lists, and
sometimes people send spam to the moderated list. The mailing list manager
(ezmlm in this case) sends the spam to me for moderation. The moderation
request ends up in my spam folder, however, because it contains the spam. If I
untrain the message, I untrain all kinds of spam words.

So this is another case where marking part of the mail as spam/nonspam could be
used to train the filter.
Product: MailNews → Core
Assigning bugs that I'm not actively working on back to nobody; use
SearchForThis as a search term if you want to delete all related bugmail at
once.
Assignee: dmose → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
(In reply to Dawn Endico from comment #4)
> I don't think i have any postmaster bounces of non-junk.

In this case then, would it not make more sense to simply filter bounce messages?


(In reply to Daniel Wang from comment #7)
> how about "only filter the message selection" so that you can select
> the parts you want to filter and have junk filter handle the parts
> only?

Does this serve most people's needs?
Severity: normal → S3