Closed
Bug 136055
Opened 23 years ago
Closed 18 years ago
Filter/Search on Body erroneously applied to encoded binary attachments
Categories
(MailNews Core :: Filters, defect)
Tracking
(Not tracked)
People
(Reporter: dmitry, Assigned: sspitzer)
If I have a filter rule
Body contains "porn"
then a legitimate message with a binary attachment gets filtered if the
attachment happens to contain
...
1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI
...
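The false positive described above can be reproduced with a few lines of Python. This is only a sketch: `naive_body_contains` is a hypothetical stand-in for a filter that scans the raw, still-encoded message text case-insensitively. It is not the actual MailNews code.

```python
# Hypothetical stand-in for a "Body contains" filter that ignores MIME
# structure and searches the raw message text case-insensitively.
ENCODED_LINE = (
    "1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI"
)


def naive_body_contains(raw_body: str, needle: str) -> bool:
    """Return True if needle occurs anywhere in the raw body text."""
    return needle.lower() in raw_body.lower()


# The base64 line from the report contains "PorN", so the filter fires
# even though the attachment is binary data.
print(naive_body_contains(ENCODED_LINE, "porn"))  # True
```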
*** Bug 153973 has been marked as a duplicate of this bug. ***
On the other hand, if the message is multipart/mixed and its html part is
encoded as base64, filters of the form "body contains string" do not apply to
it. I think the html part of a message should be considered its "body" for
filtering purposes, to be decoded if necessary and fed through the "body" filters.
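A rough illustration of the behavior asked for here, using Python's standard `email` package: decode each text part according to its Content-Transfer-Encoding, then search the decoded text. The sample message and the helper name `decoded_text_contains` are made up for this example.

```python
import base64
from email import message_from_string

# Made-up sample: a multipart/mixed message whose only part is base64 HTML.
html = "<html><body>Cheap pills here</body></html>"
encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded}\n"
    "--b1--\n"
)


def decoded_text_contains(raw: str, needle: str) -> bool:
    """Search text/* parts after undoing their transfer encoding."""
    for part in message_from_string(raw).walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary parts
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if needle.lower() in text.lower():
            return True
    return False


raw_hit = "pills" in RAW.lower()                  # search on the encoded text
decoded_hit = decoded_text_contains(RAW, "pills")  # search after decoding
print(raw_hit, decoded_hit)
```

A search of the raw text misses the word because it only appears in base64 form. The decoded search finds it.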
*** Bug 166573 has been marked as a duplicate of this bug. ***
*** Bug 159645 has been marked as a duplicate of this bug. ***
*** Bug 181418 has been marked as a duplicate of this bug. ***
Confirmed. Voted for.
Suggest changing OS to all.
This is quite annoying to me also. I have a filter called "porn mail" that
checks whether the body contains "sex", "porn", "farm", etc. I use match "any
of these words" because "match all words" doesn't work for this
application. There is therefore no way to set up filters like these without
reducing each filter to one word plus "and only if message doesn't have an
attachment" (because I would have to use match all).
Filters would be greatly improved if this bug were fixed.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 9•22 years ago
Wouldn't a rule such as "body doesn't contain 'Content-Transfer-Encoding:
base64'" (or some other way of detecting attachments; this is the best I could
think of just now) solve this?
Comment 10•22 years ago
Well, try it yourself. I don't think that will work though, because you would
have to use "match all", instead of "match any of the below rules". Using match
all would work, but it doesn't solve the problem. It's just a very awkward
workaround.
You cannot stop the filters from being applied to a message's attachments.
Using match all severely limits the usefulness of filters, unless you
want to make one filter per word to match. Correct me if I'm wrong.
Comment 11•22 years ago
Well, I have a rule set up that moves Klez to a Viruses folder using a "Body
contains <first line of encoded Klez>" filter, so if this bug were fixed that
would stop working.
Comment 13•21 years ago
Maybe offer both a "Body" criterion and a "text-only body" criterion, where the
latter looks only at the parts with "Content-type: text/anything". For example,
if the reporter of this bug set the rule:
"text-only body" match any porn or xxx
it would not trigger the filter on a message that has a GIF containing
...
1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI
...
because it would look only at text/plain, text/html, and other text/* parts,
while Justin Kerk from comment #11 would still be able to set
"body" match <first line of encoded Klez>
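The two criteria proposed in this comment could behave roughly as in the following Python sketch. The function names `body_matches` and `text_only_body_matches` and the sample message are invented for illustration. The first keeps the current raw-text behavior, which the Klez rule from comment #11 relies on. The second looks only at text/* parts.

```python
from email import message_from_string

# Invented sample: a plain-text note plus a GIF whose base64 data happens
# to contain "PorN" (the line quoted in this bug).
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Here is the picture you asked for.\n"
    "--b1\n"
    "Content-Type: image/gif\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "1jQNPorNpERB6fvelBUi1+XqGieb7gKwd8asCQNRAO7Uf6AINv/A/+DgH3DW/gJXyejoLLjVRSpI\n"
    "--b1--\n"
)


def body_matches(raw: str, words: list[str]) -> bool:
    """'Body': match any word in the raw text after the top-level headers."""
    _, _, body = raw.partition("\n\n")
    lowered = body.lower()
    return any(word.lower() in lowered for word in words)


def text_only_body_matches(raw: str, words: list[str]) -> bool:
    """'text-only body': match any word, but only inside text/* parts."""
    for part in message_from_string(raw).walk():
        if part.get_content_maintype() != "text":
            continue  # image/gif and other binary parts are never examined
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace").lower()
        if any(word.lower() in text for word in words):
            return True
    return False


print(body_matches(RAW, ["porn", "xxx"]))            # True: hits the GIF data
print(text_only_body_matches(RAW, ["porn", "xxx"]))  # False: text part is clean
```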
Comment 14•21 years ago
Here is a workaround that worked for me and a few others. I get probably 30 or
40 junk emails a day, and I have maybe 30 or 40 people in my address book,
including those I confer with daily. Since most of them are on the same mail
domain, I set up something like this:
if sender does not contain <specified domain> then move to trash
if sender does not contain <specified email> or <specified email>, etc.
I find that filtering for the emails I do want, instead of those I don't want,
is faster and much less work.
I would still like an attachment filter, though. I currently use 3 rules that
explicitly match different parts of the header info, combined with the AND
feature, and I can catch 90% of them, but a custom criterion named "attachment"
would be nice.
Comment 15•20 years ago
This bug is still present in current builds, e.g. Thunderbird 0.7.*. I would
really appreciate it if it were fixed before 1.0, because it has been bugging
me since Netscape 6.0.
I have a filter that moves all emails containing the word "yps" to a
separate folder, but it works so badly that almost any mail with an attachment
is moved.
Comment 16•20 years ago
*** Bug 267230 has been marked as a duplicate of this bug. ***
Comment 17•20 years ago
That duplicate notes that the problem exists for UUencoded attachments, as well
as MIME ones.
Updated•20 years ago
Product: MailNews → Core
Comment 18•20 years ago
*** Bug 272042 has been marked as a duplicate of this bug. ***
Updated•20 years ago
Summary: Filters erroneously apply to encoded binary attachments → Filter/Search on Body erroneously applied to encoded binary attachments
Comment 19•20 years ago
This is a big problem, and while one result of the bug is talked about
frequently (the fact that the wrong messages are matched with a
search/filter), another problem is mostly overlooked, but arguably even more
important: a full body search takes extremely long to complete in folders with
many attachments, since those attachments can easily be megabytes in size,
while the text in the message (that should be searched) is perhaps only a few
percent of that. So, the search could easily be made 95% quicker or so in most
situations, where people send a few pictures or other documents every now and
then. I really don't understand why it takes years to fix this bug: I thought
open source actually meant that things get fixed quickly, but this really
makes me lose confidence in this process and this product.
Comment 20•20 years ago
*** Bug 282682 has been marked as a duplicate of this bug. ***
Comment 21•19 years ago
I think this is a dup of bug #132340
Comment 22•19 years ago
(In reply to comment #21)
> I think this is a dup of bug #132340
Not exactly -- there we *do* want to search the body after decoding; here, we
not only don't want to search within (binary) attachments, we don't even want to
decode them in the first place (during search).
Comment 23•19 years ago
(In reply to comment #22)
> (In reply to comment #21)
> > I think this is a dup of bug #132340
>
> Not exactly -- there we *do* want to search the body after decoding; here, we
> not only don't want to search within (binary) attachments, we don't even want to
> decode them in the first place (during search).
Ah, ok, that makes sense. But in light of my efforts to unify the junk filter
and regular filters, I think we need to make this difference explicit in the UI.
E.g., there should be separate criteria:
Body Text <contains/etc> <-- that is, only search the plaintext
Attachment <contains/etc> <-- only search the attachments, decode as necessary
Body <contains/etc> <-- everything
and some other bug reports have also requested
All Headers
and
Entire Message
as criteria scopes. That would probably cover all the bases.
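To make the proposed scopes concrete, here is a rough Python sketch that returns the text each criterion would search. Everything in it is an assumption made for illustration: the `Scope` enum, the `searchable_text` helper, the sample message, and in particular the rule that an "attachment" is any part with `Content-Disposition: attachment`.

```python
import base64
from email import message_from_string
from enum import Enum


class Scope(Enum):
    BODY_TEXT = "Body Text"            # text parts only, decoded
    ATTACHMENT = "Attachment"          # attachments only, decoded
    BODY = "Body"                      # everything after the headers, raw
    ALL_HEADERS = "All Headers"
    ENTIRE_MESSAGE = "Entire Message"


def _decoded(part) -> str:
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")


def searchable_text(raw: str, scope: Scope) -> str:
    """Return the text a criterion with the given scope would search."""
    if scope is Scope.ENTIRE_MESSAGE:
        return raw
    headers, _, body = raw.partition("\n\n")
    if scope is Scope.ALL_HEADERS:
        return headers
    if scope is Scope.BODY:
        return body
    leaves = [p for p in message_from_string(raw).walk() if not p.is_multipart()]
    if scope is Scope.ATTACHMENT:
        # Assumption: "attachment" means Content-Disposition: attachment.
        chosen = [p for p in leaves if p.get_content_disposition() == "attachment"]
    else:
        chosen = [
            p for p in leaves
            if p.get_content_maintype() == "text"
            and p.get_content_disposition() != "attachment"
        ]
    return "\n".join(_decoded(p) for p in chosen)


attachment_b64 = base64.b64encode(b"secret notes").decode("ascii")
RAW = (
    "Subject: Quarterly report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "The report is attached.\n"
    "--b1\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{attachment_b64}\n"
    "--b1--\n"
)

for scope in Scope:
    print(scope.value, "->", "secret" in searchable_text(RAW, scope))
```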
Comment 24•19 years ago
(In reply to comment #23)
> E.g., there should be separate criteria:
> Body Text <contains/etc> <-- that is, only search the plaintext
> Attachment <contains/etc> <-- only search the attachments, decode as necessary
> Body <contains/etc> <-- everything
After re-reading some more, that doesn't seem to really cover it completely. It
helps to be able to decide which portions of the message to filter. If the
portion you choose is encoded, it should always be decoded. (And the spam filter
will always operate on the whole message, I didn't need to mention that here.)
Comment 25•19 years ago
(In reply to comment #24)
> If the portion you choose is encoded, it should always be decoded. (And the
> spam filter will always operate on the whole message, I didn't need to
> mention that here.)
I think that's all fine; but I wonder if it's ever necessary to perform a text
search within a binary attachment.
As an example, suppose you regularly get messages with text/html attachments,
and others with (large) image/jpeg. If you're searching "attachment body" for a
string that you expect in some html files, you don't need to spend the time
decoding the JPEG and then performing a probably-fruitless search in that file's
data; even if you *did* get a match (on a short string, presumably, like that in
the original bug report here), it would probably be a false positive.
Generally, I would prefer that "attachment body" searching was limited to text
attachments (including message/rfc822 attachments). On the other hand, there
are Word and PDF and Postscript docs which are in fact mostly text that might
well be a good target for filtering. But there are many, many JPEGs, MP3s and
the like being mailed around these days which do not seem like good targets for
text filtering.
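The preference stated above (search text attachments and message/rfc822, skip images and audio without decoding them) amounts to a small predicate on the declared content type. The sketch below uses a hypothetical function name. It leaves Word, PDF, and PostScript out, since the comment treats them as an open question.

```python
def is_text_search_target(content_type: str) -> bool:
    """Decide from the Content-Type value alone whether to search a part.

    Hypothetical policy: text/* and message/rfc822 are searched; everything
    else (image/jpeg, audio/mpeg, application/*) is skipped without decoding.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type == "message/rfc822"


for value in ("text/html; charset=utf-8", "message/rfc822", "image/jpeg",
              "audio/mpeg", "application/pdf"):
    print(value, "->", is_text_search_target(value))
```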
Comment 26•19 years ago
Can anyone explain why it takes > 3 years to fix something as simple as this?
I'm still getting the wrong search results, and my searches take far too long
because attachments are searched unnecessarily, as I explained
earlier. It shouldn't take one programmer more than a day or so to make sure
that attachments aren't searched. Or should it?
Comment 27•19 years ago
LOL, if it was that easy, yes, it would be done. Body search has no idea about
the mime structure of a message, and teaching it about mime would be non-trivial.
Comment 28•19 years ago
But surely it shouldn't have to take over 3 years?! I mean, MIME isn't rocket
science: just follow the specs in the corresponding RFC documents. You don't
even have to decode anything, just ignore the binary parts. Have a look
at 'view source' in an email message: all that basically needs to be done is
search through everything below "Content-Type: text/plain" and "Content-Type:
text/html", and ignore the rest. It's rather trivial.
If the reason that this isn't done is that Thunderbird basically isn't being
supported anymore by the developer community, then so be it, but in that case
it would be nice to put some sort of note on the main page - like: "this
product is no longer being supported or improved", so potential users don't
make the mistake to download this program expecting broken things to be fixed
within a reasonable time span.
Or am I basically just expecting too much? How do other users / developers
feel about this? Do most people think it's normal to wait 3 years for simple
bugs to be fixed? Also, there doesn't really seem to be any (visible)
progress: it would be nice if the people managing this part of the program
would post something on this bug page, like "yeah, we're working on it: it'll
be fixed in release 1.xx". That's the way it's done, for instance, by Sun on the
Java bug parade. Right now, I wouldn't be surprised if it's still not fixed
in another 3 years.
(In reply to comment #27)
> LOL, if it was that easy, yes, it would be done. Body search has no idea about
> the mime structure of a message, and teaching it about mime would be non-trivial.
Comment 29•19 years ago
Just because we're not working on your favorite bug doesn't mean we've done
nothing over the last three years.
>"Content-Type: text/plain", and "Content-Type:
>text/html", and ignore the rest. It's rather trivial.
What about messages with nested attachments?
Anyway, we're just going to have to agree to disagree about the relative
importance and difficulty of fixing this bug. Any help you want to offer in
coding up a fix for this would be greatly appreciated...
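The nested-attachment case raised above is where scanning for "Content-Type: text/plain" lines gets hard: multiparts can contain other multiparts, and a forwarded message/rfc822 part carries its own headers and body. As an illustration only, and not a proposal for the MailNews C++ code, Python's standard `email` package handles this by walking the parsed MIME tree recursively. The sample message and the `text_parts` helper are invented for this sketch.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Invented sample: multipart/mixed containing a nested multipart/alternative,
# a forwarded message (message/rfc822), and a binary attachment.
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("inner plain text", "plain"))
inner.attach(MIMEText("<p>inner html text</p>", "html"))

forwarded = MIMEText("forwarded note", "plain")
forwarded["Subject"] = "old message"

outer = MIMEMultipart("mixed")
outer.attach(inner)
outer.attach(MIMEMessage(forwarded))
outer.attach(MIMEApplication(b"\x00\x01binary\xff", _subtype="octet-stream"))

raw = outer.as_string()


def text_parts(raw_message: str) -> list[tuple[str, str]]:
    """Return (content type, decoded text) for every text/* part, at any depth."""
    found = []
    for part in message_from_string(raw_message).walk():
        if part.get_content_maintype() != "text":
            continue  # containers and binary leaves are skipped
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        found.append((part.get_content_type(), payload.decode(charset, errors="replace")))
    return found


for content_type, text in text_parts(raw):
    print(content_type, repr(text))
```

The walk reaches the two alternatives inside the nested multipart and the text of the forwarded message. The octet-stream part is never decoded.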
Comment 30•18 years ago
*** This bug has been marked as a duplicate of 37031 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE
Updated•18 years ago
Status: RESOLVED → VERIFIED
Updated•16 years ago
Product: Core → MailNews Core