Open Bug 379988 Opened 17 years ago Updated 2 years ago

search body should not match words in MIME headers

Categories

(MailNews Core :: Search, defect)

Tracking

(Not tracked)

People

(Reporter: bkausbk, Unassigned)

References

(Blocks 1 open bug)

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Build Identifier: 2.0.0.0 (20070326)
Searching message body text with Ctrl+Shift+F (STRG+SHIFT+F on a German keyboard; Search Messages) also finds matches in the email headers. If this is intended, there should be an option to ignore message (MIME) headers.
Reproducible: Always
Steps to Reproduce:
1. Press Ctrl+Shift+F
2. Search Message Body
Actual Results:
Searching for "atta" finds every email with an attachment.
Updating summary and confirming. Words like "attachment" will certainly match a lot because of "Content-Disposition: attachment;".
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Summary: Search (STRG+SHIFT+F) also searches in email header → search body should not match words in MIME headers
Version: unspecified → 2.0
I had what I think is the same problem searching for "encoding". Ctrl-Shift-F Search Message Body for "encoding" retrieves tons of messages without that word, because they have the hidden mail header "Content-Transfer-Encoding:".
Also, you get the same buggy behavior using the search box in the Thunderbird toolbar to search "Entire message".
Also note that even if you turn on View > Headers > All, the search of Body or Entire message will match strings that don't appear in message headers, e.g. MIME part information, the PGP key in application/pgp-signature, etc.
IMO, Thunderbird should have separate options to search "Source code" and "Body", and when searching "Body" Thunderbird should skip stuff that you don't see in a normal message view. Skipping message and part headers is a good start; I don't have a strong opinion about skipping HTML tags, etc.
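To illustrate the "Source code" versus "Body" distinction proposed above, here is a minimal sketch in Python using the standard `email` package. It is not Thunderbird's search code; the sample message and the helper names `visible_text`, `body_matches` and `source_matches` are made up for this illustration. A "Body" search in this sense looks only at the decoded text parts a reader would see, while a "Source code" search looks at the raw message.

```python
from email import message_from_string
from email.message import Message

# Hypothetical sample message: one visible text part plus one binary attachment.
RAW = """\
From: a@example.com
To: b@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

See the report.
--XX
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="r.bin"
Content-Transfer-Encoding: base64

AAECAwQF
--XX--
"""


def visible_text(msg: Message) -> str:
    """Concatenate the decoded text/* parts that are not attachments."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary parts
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def body_matches(raw: str, needle: str) -> bool:
    """'Body' search: only text a reader would see in the message view."""
    return needle.lower() in visible_text(message_from_string(raw)).lower()


def source_matches(raw: str, needle: str) -> bool:
    """'Source code' search: the raw message, headers included."""
    return needle.lower() in raw.lower()
```

With this sample, `source_matches(RAW, "encoding")` is true because of the "Content-Transfer-Encoding:" part headers, while `body_matches(RAW, "encoding")` is false; the same holds for "atta".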
Assignee: mscott → nobody
Component: General → Search
Product: Thunderbird → MailNews Core
QA Contact: general → search
Version: 2.0 → unspecified
Bug 576994 is not a duplicate of this, Aureliano. Or if it is, this one is described incorrectly. That bug claims that the search function always searches headers, when it does not. It's entirely inconsistent, searching some headers in some messages and not in others.
oops, copy-paste error - I meant "This bug claims that..." instead of "That bug claims..."
A similar problem which I may or may not have reported in the past is that when searching message bodies for text, the search will grind its way through attachments including binary image attachments looking for the search key. For some text strings this will produce many bogus matches but worse than that, if many of the messages contain large image attachments it makes the search process horribly slow because the image files are so large relative to the text content. Excluding messages that contain attachments from the search is not a solution since the message being sought might well be one of the ones that HAS an attachment.
Blocks: 576994
Hardware: x86 → All
(In reply to Benjamin Kalytta from comment #0)
> Actual Results:
> Search for "atta" will find each email with attachment
(In reply to Magnus Melin from comment #1)
> Words like attachment sure will match a lot bc of "Content-Disposition: attachment;"
(In reply to skierpage from comment #2)
> I had what I think is the same problem searching for "encoding".
> Ctrl-Shift-F Search Message Body for "encoding" retrieves tons of messages
> without that word, because they have the hidden mail header
> "Content-Transfer-Encoding:"
(i) I could see bug 697021 on multipart mail (Tb searches the message headers of the next mail).
(ii) I couldn't see a wrong search on such message headers of each sub-part of multipart mail.
(iii) I could see "unwanted search on such header data" only on the attached mail's header lines in a message/rfc822 part.
Because actual mails have many Received: headers, bug 697021 usually occurs only on the Unix mbox separator and a limited set of message headers. For POP3 mail, they are the "From " line, X-UIDL:, X-Mozilla-Status/Status2:, X-Mozilla-Keys:, X-Account-Key:, Received:, Delivered-To: ....
Even with IMAP, bug 697021 usually won't occur on "Content-Disposition: attachment;" of the next mail, because that header is not usually used in the primary message header (what I call primary headers: the headers from the start of the mail up to the first empty line). Even with IMAP, Content-Transfer-Encoding: usually doesn't appear within the "limited number of header lines of the next mail" on which bug 697021 occurs, because actual mail usually has many Received: headers. If it does occur, it's usually only on a sent mail copy in the Sent folder, a draft in Drafts, a template in Templates, or a manually crafted mail for testing.
To all problem reporters in this bug: you are seeing the following, aren't you? If there is a message/rfc822 part under multipart/xxx, the message body data of the message/rfc822 part is a mail data stream, and because it is a "mail data stream", message header lines appear as message body data in the message/rfc822 part. And Tb searches the message body data of the message/rfc822 part under multipart/xxx.
(In reply to S. Kerman from comment #10)
> the search will grind its way through attachments including binary image attachments(snip)
"Body search searches base64 encoded data in attachment part for requested-string" is an already known problem, and a bug for it is already open. I believe "unwanted or wrong search of base64 encoded data" and "unwanted search of message/rfc822 part" should be processed separately, even though "unwanted search of attachment part in multipart/xxx" is common to both. Needless to say, "wrong search of next mail's header" should be clearly distinguished from both of them.
FYI: the bug for "unwanted or wrong search of base64 encoded attachment" is Bug 37031 (and the many duplicates of that bug).
> Bug 37031 searching message body yields false positives because base64 encoded binary attachments are treated as plaintext
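The base64 false positive can be reproduced outside Thunderbird. The following is a minimal sketch in Python's standard `email` package, not the MailNews search code; the needle "kiwi", the attachment bytes and the helper name `text_parts_match` are chosen only for this illustration. The attachment is six arbitrary bytes whose base64 form happens to spell "kiwikiwi", so a plaintext scan of the raw message finds "kiwi" although no visible text contains it.

```python
import base64
from email.message import EmailMessage

# Six arbitrary binary bytes, picked so that their base64 encoding is "kiwikiwi".
binary = base64.b64decode("kiwikiwi")

msg = EmailMessage()
msg["Subject"] = "photo"
msg.set_content("Here is the picture.")
msg.add_attachment(binary, maintype="application",
                   subtype="octet-stream", filename="p.bin")

# The raw message, as it would be stored in a mail folder.
raw = msg.as_string()


def text_parts_match(message: EmailMessage, needle: str) -> bool:
    """Search only decoded text parts; binary attachments are never scanned."""
    for part in message.walk():
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        if needle.lower() in part.get_content().lower():
            return True
    return False
```

Here `"kiwi" in raw` is true, while `text_parts_match(msg, "kiwi")` is false. Skipping non-text parts would also avoid scanning large image attachments at all, which is the slowness described in comment #10.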
There are two known problems which surely produce the phenomenon of "false positive because body search actually searches message header": (1) bug 697021 (for multipart mail, the message header of the next mail), (2) bug 700541 (message header data in a message/rfc822 part). To all problem reporters in this bug: which is your case? (a) Same as bug 697021, (b) same as bug 700541, (c) different from these two bugs. If (c), can you find a bug for the same problem as yours among the bugs listed in the dependency tree for Bug 519202 (which is in the Blocks: field of this bug)?
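For case (b), a minimal sketch in Python's standard `email` package shows why header lines of an attached mail look like body data, and one possible way to avoid matching them. This is an illustration only, not how Tb's search is implemented; the `X-Marker` header, the sample texts and the helper name `body_texts` are hypothetical. The sketch descends into the message/rfc822 part and reads only the text bodies of the attached mail.

```python
from email.message import EmailMessage

# Attached mail with a header value that appears nowhere in any body text.
inner = EmailMessage()
inner["Subject"] = "inner"
inner["X-Marker"] = "zebra"
inner.set_content("Inner body text.")

outer = EmailMessage()
outer["Subject"] = "outer"
outer.set_content("Outer body.")
outer.add_attachment(inner)  # stored as a message/rfc822 part

# In the raw stream, the attached mail's headers sit inside the rfc822 part's body.
raw = outer.as_string()


def body_texts(message):
    """Yield decoded text bodies, descending into attached mails but skipping their headers."""
    if message.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is the attached message object.
        yield from body_texts(message.get_payload(0))
    elif message.is_multipart():
        for part in message.iter_parts():
            yield from body_texts(part)
    elif message.get_content_maintype() == "text":
        yield message.get_content()
```

Here `"zebra" in raw` is true because of the attached mail's header line, but no string yielded by `body_texts(outer)` contains "zebra", while "Inner body text." is still found. Whether a body search should look inside attached mails at all is a separate design choice.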
Severity: normal → S3