136055 - Filter/Search on Body erroneously applied to encoded binary attachments

Reporter

Description

•

23 years ago

If I have a filter rule Body contains "porn" then a legitimate message with a binary attachment gets filtered if the attachment happens to contain ... 1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI ...
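A minimal sketch of how this false positive can arise, assuming the filter does a case-insensitive substring search over the raw, still-encoded message source. The function name `naive_body_contains` and the sample message are hypothetical illustrations, not Mozilla code; only the base64 line is taken from the report.

```python
# Hypothetical reproduction: search the raw message text with no MIME awareness.
RAW_MESSAGE = "\r\n".join([
    "From: alice@example.com",
    "Subject: holiday photo",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain; charset=us-ascii",
    "",
    "Here is the picture from last week.",
    "--XYZ",
    "Content-Type: image/gif",
    "Content-Transfer-Encoding: base64",
    "",
    # Base64 line quoted in the report above.
    "1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI",
    "--XYZ--",
    "",
])


def naive_body_contains(raw: str, needle: str) -> bool:
    """Search everything after the top-level headers, encoded parts included."""
    _headers, _sep, body = raw.partition("\r\n\r\n")
    return needle.lower() in body.lower()


# The human-readable text never mentions the word, yet the filter fires
# because the encoded attachment happens to contain "PorN".
false_positive = naive_body_contains(RAW_MESSAGE, "porn")
```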

R.K.Aa.

Comment 1

•

23 years ago

related: bug 67421

laurel

Updated

•

23 years ago

Severity: critical → major

R.K.Aa.

Comment 2

•

23 years ago

also related: bug 98141

laurel

Comment 3

•

22 years ago

*** Bug 153973 has been marked as a duplicate of this bug. ***

dmitry

Reporter

Comment 4

•

22 years ago

On the other hand, if the message is multipart/mixed and its html part is encoded as base64, filters of the form "body contains string" do not apply to it. I think the html part of a message should be considered its "body" for filtering purposes, to be decoded if necessary and fed through the "body" filters.
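The behaviour requested here (decode the text part, then apply the body filter) can be sketched with Python's standard `email` package. This is only an illustration under that assumption; `decoded_text_contains` and the sample message are hypothetical and do not reflect how the mail client is implemented.

```python
import base64
from email import message_from_string

html = "<html><body>Cheap pills, click here</body></html>"
encoded_html = base64.b64encode(html.encode("utf-8")).decode("ascii")

RAW = "\n".join([
    "From: spammer@example.com",
    "Subject: hello",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/html; charset=utf-8",
    "Content-Transfer-Encoding: base64",
    "",
    encoded_html,
    "--XYZ--",
    "",
])


def decoded_text_contains(raw: str, needle: str) -> bool:
    """Undo the transfer encoding of each text/* part, then search it."""
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip containers and non-text parts
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("utf-8", errors="replace")
        if needle.lower() in text.lower():
            return True
    return False
```

A raw substring search misses "cheap pills" in this message because the phrase exists only in base64 form; the decoded search finds it.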

R.K.Aa.

Comment 5

•

22 years ago

*** Bug 166573 has been marked as a duplicate of this bug. ***

Sander

Comment 6

•

22 years ago

*** Bug 159645 has been marked as a duplicate of this bug. ***

Sander

Comment 7

•

22 years ago

*** Bug 181418 has been marked as a duplicate of this bug. ***

NorthMan

Comment 8

•

22 years ago

Confirmed. Voted for. Suggest changing OS to all. This is quite annoying to me also. I have a filter called "porn mail" that checks whether the body contains "sex", "porn", "farm", etc. I use match "any of these words" because "match all words" doesn't work well for this application. There is therefore no way to set up filters like these without making each filter only 1 word + "and only if message doesn't have attachment" (because I would have to use match all). Filters would be greatly improved if this bug were fixed.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Gabriel Barros

Comment 9

•

22 years ago

Wouldn't a rule like "body doesn't contain 'Content-Transfer-Encoding: base64'" solve this? (Or any other way of detecting attachments; this was the best I could think of just now.)

NorthMan

Comment 10

•

22 years ago

Well, try it yourself. I don't think that will work though, because you would have to use "match all" instead of "match any of the below rules". Using match all would work, but it doesn't solve the problem. It's just a very awkward workaround. You cannot stop the filters from being applied to a message's attachments. Using match all severely limits the usefulness of filters, unless you want to make one filter per word to match. Correct me if I'm wrong.

Justin Kerk

Comment 11

•

22 years ago

Well, I have a rule set up that moves Klez to a Viruses folder using a "Body contains <first line of encoded Klez>" filter, so if this bug were fixed that would stop working.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 12

•

21 years ago

mass re-assign.

Assignee: naving → sspitzer

Gabriel Barros

Comment 13

•

21 years ago

Maybe offer both a "Body" criterion and a "text-only body" criterion, where the latter looks only at parts with "Content-Type: text/anything". For example, if the reporter of this bug set the rule "text-only body" matches any of porn or xxx, it would not trigger the filter on a message that has a GIF containing ... 1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI ... because it would look only at text/plain, text/html, and other text/* parts. Justin Kerk from comment #11 would still be able to set "body" matches <first line of encoded Klez>.

Jason Logan

Comment 14

•

21 years ago

A workaround that worked for me and a few others: I get probably 30 or 40 junk emails a day, and I have maybe 30 or 40 people in my address book whom I confer with daily. Since most of them are from the same domain mailer, I set up something like this: if sender does not contain <specified domain>, then move to trash; if sender does not contain <specified email> or <specified email>, etc. I find that filtering for the emails I do want, instead of those I don't want, works faster and is much less work. I would still like an attachment filter, though. I am currently using 3 rules explicitly stating different parts of the header info; using the AND feature I can get 90% of them, but a custom one named "attachment" would be nice.

Dominik Scherer

Comment 15

•

20 years ago

This bug is still present in current builds, e.g. Thunderbird 0.7.*. I would really appreciate it if it were fixed before 1.0, because it has been bugging me since Netscape 6.0. I have a filter set up to move all emails that contain the word "yps" to a separate folder, but it works so badly that almost any mail with an attachment is moved.

Mike Cowperthwaite

Comment 16

•

20 years ago

*** Bug 267230 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Comment 17

•

20 years ago

That duplicate notes that the problem exists for UUencoded attachments, as well as MIME ones.
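UUencoded attachments are not separate MIME parts; they sit inline in the text body between a `begin <mode> <name>` line and an `end` line. One possible way to keep them out of a body search is to strip such blocks first. This is a rough sketch only; the regular expression and the helper names are assumptions for illustration.

```python
import binascii
import re

# Matches an inline uuencoded block: "begin <mode> <name>" through "end".
UU_BLOCK = re.compile(
    r"^begin [0-7]{3,4} \S.*?^end[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def strip_uuencoded(body: str) -> str:
    """Remove inline uuencoded blocks so only human-written text remains."""
    return UU_BLOCK.sub("", body)


def text_contains(body: str, needle: str) -> bool:
    """Case-insensitive search that ignores uuencoded attachment data."""
    return needle.lower() in strip_uuencoded(body).lower()


# Sample body with one uuencoded line produced by the standard library.
encoded_line = binascii.b2a_uu(b"GIF89a binary payload").decode("ascii")
body = (
    "See attached.\n"
    "begin 644 photo.gif\n"
    + encoded_line
    + "`\n"
    "end\n"
    "Regards\n"
)
stripped = strip_uuencoded(body)
```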

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Mike Cowperthwaite

Comment 18

•

20 years ago

*** Bug 272042 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Updated

•

20 years ago

Summary: Filters erroneously apply to encoded binary attachments → Filter/Search on Body erroneously applied to encoded binary attachments

Niek

Comment 19

•

20 years ago

This is a big problem, and while one result of the bug is talked about frequently (the fact that the wrong messages are matched by a search/filter), another problem is mostly overlooked, but arguably even more important: a full body search takes extremely long to complete in folders with many attachments, since those attachments can easily be megabytes in size, while the text in the message (that should be searched) is perhaps only a few percent of that. So, the search could easily be made 95% quicker or so in most situations, where people send a few pictures or other documents every now and then. I really don't understand why it takes years to fix this bug: I thought open-source actually meant that things get fixed quickly, but this really makes me lose confidence in this process and this product.
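To put a rough number on the size argument, the share of a message that is actual text can be measured. The sketch below uses Python's standard `email` package with a made-up message (a short note plus a 200,000-byte attachment); `text_fraction` is a hypothetical helper, and the result says nothing about real-world mailboxes.

```python
from email import message_from_bytes
from email.message import EmailMessage


def text_fraction(raw: bytes) -> float:
    """Fraction of the raw message size taken up by text/* part payloads."""
    msg = message_from_bytes(raw)
    text_bytes = 0
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # Encoded payload length, i.e. what a raw scan would have to read.
            text_bytes += len(part.get_payload() or "")
    return text_bytes / len(raw)


# Illustrative message: one short text part and one large binary attachment.
sample = EmailMessage()
sample["Subject"] = "photos"
sample.set_content("Short note about the attached file.")
sample.add_attachment(
    bytes(200_000),
    maintype="application",
    subtype="octet-stream",
    filename="blob.bin",
)
raw = sample.as_bytes()
fraction = text_fraction(raw)
```

For a message like this one, nearly all of the bytes scanned by a raw body search belong to the attachment.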

OstGote!

Comment 20

•

20 years ago

*** Bug 282682 has been marked as a duplicate of this bug. ***

Howard Chu

Comment 21

•

19 years ago

I think this is a dup of bug #132340

Mike Cowperthwaite

Comment 22

•

19 years ago

(In reply to comment #21)
> I think this is a dup of bug #132340

Not exactly -- there we *do* want to search the body after decoding; here, we not only don't want to search within (binary) attachments, we don't even want to decode them in the first place (during search).

Howard Chu

Comment 23

•

19 years ago

(In reply to comment #22)
> (In reply to comment #21)
> > I think this is a dup of bug #132340
>
> Not exactly -- there we *do* want to search the body after decoding; here, we
> not only don't want to search within (binary) attachments, we don't even want to
> decode them in the first place (during search).

Ah, ok, that makes sense. But in light of my efforts to unify the junk filter and regular filters, I think we need to make this difference explicit in the UI. E.g., there should be separate criteria:

Body Text <contains/etc>   <-- that is, only search the plaintext
Attachment <contains/etc>  <-- only search the attachments, decode as necessary
Body <contains/etc>        <-- everything

and some other bug reports have also requested All Headers and Entire Message as criteria scopes. That would probably cover all the bases.

Howard Chu

Comment 24

•

19 years ago

(In reply to comment #23)
> E.g., there should be separate criteria:
> Body Text <contains/etc>   <-- that is, only search the plaintext
> Attachment <contains/etc>  <-- only search the attachments, decode as necessary
> Body <contains/etc>        <-- everything

After re-reading some more, that doesn't seem to really cover it completely. It helps to be able to decide which portions of the message to filter. If the portion you choose is encoded, it should always be decoded. (And the spam filter will always operate on the whole message, I didn't need to mention that here.)
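The three scopes proposed in comment 23, with the rule from this comment that the chosen portion is always decoded, might look roughly like this. It is a sketch using Python's standard `email` package; the `Scope` enum, the helper names, and the choice to treat "attachment" as any part with `Content-Disposition: attachment` are all assumptions made for illustration.

```python
from email.message import EmailMessage, Message
from enum import Enum


class Scope(Enum):
    BODY_TEXT = "body text"    # only inline text parts
    ATTACHMENT = "attachment"  # only attachments, decoded as necessary
    BODY = "body"              # everything


def _decoded(part: Message) -> str:
    """Undo the transfer encoding and charset of a leaf part."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode("utf-8", errors="replace")


def contains(msg: Message, needle: str, scope: Scope) -> bool:
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no searchable payload themselves
        is_attachment = part.get_content_disposition() == "attachment"
        is_text = part.get_content_maintype() == "text"
        if scope is Scope.BODY_TEXT and (is_attachment or not is_text):
            continue
        if scope is Scope.ATTACHMENT and not is_attachment:
            continue
        if needle.lower() in _decoded(part).lower():
            return True
    return False


# Illustrative message: inline text plus a text attachment.
demo = EmailMessage()
demo.set_content("Meeting at noon")
demo.add_attachment("quarterly budget figures", filename="notes.txt")
```

With this message, "noon" matches only under `BODY_TEXT` and `BODY`, while "budget" matches only under `ATTACHMENT` and `BODY`.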

Mike Cowperthwaite

Comment 25

•

19 years ago

(In reply to comment #24)
> If the portion you choose is encoded, it should always be decoded. (And the
> spam filter will always operate on the whole message, I didn't need to
> mention that here.)

I think that's all fine; but I wonder if it's ever necessary to perform a text search within a binary attachment.

As an example, suppose you regularly get messages with text/html attachments, and others with (large) image/jpeg. If you're searching "attachment body" for a string that you expect in some html files, you don't need to spend the time decoding the JPEG and then performing a probably-fruitless search in that file's data; even if you *did* get a match (on a short string, presumably, like that in the original bug report here), it would probably be a false positive.

Generally, I would prefer that "attachment body" searching was limited to text attachments (including message/rfc822 attachments). On the other hand, there are Word and PDF and PostScript docs which are in fact mostly text that might well be a good target for filtering. But there are many, many JPEGs, MP3s and the like being mailed around these days which do not seem like good targets for text filtering.

Niek

Comment 26

•

19 years ago

Can anyone explain why it takes > 3 years to fix something simple like this? I'm still getting the wrong search results, and my searches take way too long because it's searching through attachments unnecessarily, as I explained earlier. It shouldn't take one programmer more than a day or so to make sure that attachments aren't searched. Or should it?

David :Bienvenu

Comment 27

•

19 years ago

LOL, if it was that easy, yes, it would be done. Body search has no idea about the mime structure of a message, and teaching it about mime would be non-trivial.

Niek

Comment 28

•

19 years ago

But surely it shouldn't have to take over 3 years?! I mean, MIME isn't rocket science: just follow the specs from the corresponding RFC document. You don't even have to decode anything, just ignore the binary parts. Have a look at 'view source' in an email message: all that basically needs to be done is search through everything below "Content-Type: text/plain" and "Content-Type: text/html", and ignore the rest. It's rather trivial.

If the reason that this isn't done is that Thunderbird basically isn't being supported anymore by the developer community, then so be it, but in that case it would be nice to put some sort of note on the main page, like: "this product is no longer being supported or improved", so potential users don't make the mistake of downloading this program expecting broken things to be fixed within a reasonable time span.

Or am I basically just expecting too much? How do other users / developers feel about this? Do most people think it's normal to wait 3 years for simple bugs to be fixed? Also, there doesn't really seem to be any (visible) progress: it would be nice if the people managing this part of the program would post something on this bug page, like "yeah.. we're working on it: it'll be fixed in release 1.xx". That's the way it's done, for instance, by Sun on the Java bug parade. Right now, I wouldn't be surprised if it's still not fixed in another 3 years.

(In reply to comment #27)
> LOL, if it was that easy, yes, it would be done. Body search has no idea about
> the mime structure of a message, and teaching it about mime would be non-trivial.

David :Bienvenu

Comment 29

•

19 years ago

Just because we're not working on your favorite bug doesn't mean we've done nothing over the last three years.

> "Content-Type: text/plain", and "Content-Type:
> text/html", and ignore the rest. It's rather trivial.

What about messages with nested attachments? Anyway, we're just going to have to agree to disagree about the relative importance and difficulty of fixing this bug. Any help you want to offer in coding up a fix for this would be greatly appreciated...
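The nested-attachment case is where scanning for "Content-Type:" lines in the raw source falls short: a forwarded message (message/rfc822) can itself contain text parts and further attachments. A MIME-aware traversal has to recurse. The sketch below uses Python's standard `email` package, whose `walk()` descends into both multipart/* containers and attached messages; `iter_text_parts` and `search_nested` are hypothetical names, and this is not a description of the mail client's own MIME code.

```python
from email.message import EmailMessage, Message
from typing import Iterator


def iter_text_parts(msg: Message) -> Iterator[str]:
    """Yield decoded text from every text/* leaf, at any nesting depth."""
    for part in msg.walk():  # depth-first over all subparts
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            yield payload.decode("utf-8", errors="replace")


def search_nested(msg: Message, needle: str) -> bool:
    return any(needle.lower() in text.lower() for text in iter_text_parts(msg))


# Illustrative message: a forward whose attached original has text and an image.
inner = EmailMessage()
inner["Subject"] = "original"
inner.set_content("the secret word is xylophone")
inner.add_attachment(b"\x89PNG fake bytes", maintype="image",
                     subtype="png", filename="a.png")

outer = EmailMessage()
outer["Subject"] = "Fwd: original"
outer.set_content("see forwarded message")
outer.add_attachment(inner)  # becomes a message/rfc822 part

texts = list(iter_text_parts(outer))
```

Here the traversal yields two text parts, one from the outer message and one from inside the forwarded message, and never touches the image data.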

brewthatistrue

Comment 30

•

18 years ago

*** This bug has been marked as a duplicate of 37031 ***

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → DUPLICATE

Mike Cowperthwaite

Updated

•

18 years ago

Status: RESOLVED → VERIFIED

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → MailNews Core