Open
Bug 521649
Opened 15 years ago
Updated 2 years ago
Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)
Categories
(Thunderbird :: Search, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: thomas8, Unassigned)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5pre) Gecko/20091010 Shredder/3.0pre
STR (test mail attached)
1) compose mail containing words with umlauts (ä,ö,ü) in body text
2) save as draft
3) in draft folder, select "Message body filter" quicksearch and
4) filter for word with German umlauts (ä,ö,ü, e.g. Münster) that is in the body
5) do the same on msg after having received it (see second attachment)
expected
4) and 5): msg body filter should find the msg
actual
4) msg body filter does not find the draft msg
5) msg body filter finds the same msg after it was received
Updated•15 years ago (Reporter)
Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?)
Comment 1•15 years ago (Reporter)
This is basically the same message as testmail1, but after receiving it in inbox.
Comment 2•12 years ago
Thomas, does (ä,ö,ü) still fail?
Summary: Quick Search "Message body filter" does not find message text with umlauts in saved drafts messages (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (Character-encoding?)
Comment 3•12 years ago
it's WFM with current nightly
Comment 4•11 years ago
(In reply to Wayne Mery (:wsmwk) from comment #2)
> Thomas, does (ä,ö,ü) still fail?
Flags: needinfo?(bugzilla2007)
Comment 5•11 years ago (Reporter)
Yes, this still fails, both TB24 and Trunk (32.0a1 (2014-05-01))
This obviously depends on how the umlauts are saved in the draft, e.g. the word Münster:
In TB 24, composing new msg:
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
M&uuml;nster<br>
Quick filtering for ü fails, regardless of containing folder (after copying the draft into other folders).
Quick filtering for &uuml; (sic) succeeds.
Fwiw, that's on a German Version of TB 24 sharing Profile with English Version of TB 24.
Not sure if that can cause confusion in language settings?
In Trunk, composing new msg:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Münster
Or at least that's what Ctrl+U msg source viewer shows, which is probably a bug in the source viewer.
If saved as .eml, then opened with Notepad++ advanced text editor, it shows as UTF correctly having the word "Münster". But search still fails, see below.
That's on English Daily, profile should be reasonably clean.
Quick filtering for ü fails, regardless of folder (after copying the draft around)
Quick filtering for ü fails.
Quick filtering for &uuml; succeeds when they are in source (not applicable here).
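As background to the two charset declarations quoted above, here is a minimal Python sketch of why a byte-level comparison is charset-sensitive. It is an illustration only, not Thunderbird's search code; `raw_contains` and `decoded_contains` are hypothetical helper names.

```python
# Illustration only: the same word has different raw bytes per charset.
word = "Münster"

latin1_bytes = word.encode("iso-8859-1")  # one byte for "ü"
utf8_bytes = word.encode("utf-8")         # two bytes for "ü"


def raw_contains(raw_body: bytes, term: str, term_charset: str) -> bool:
    """Hypothetical naive search: compare bytes without decoding the body."""
    return term.encode(term_charset) in raw_body


def decoded_contains(raw_body: bytes, body_charset: str, term: str) -> bool:
    """Decode the body with its declared charset, then compare as text."""
    return term in raw_body.decode(body_charset)


# Searching a UTF-8 body with an ISO-8859-1 encoded term misses the match.
mismatch = raw_contains(utf8_bytes, "ü", "iso-8859-1")

# Decoding first makes the comparison independent of the stored charset.
match_latin1 = decoded_contains(latin1_bytes, "iso-8859-1", "ü")
match_utf8 = decoded_contains(utf8_bytes, "utf-8", "ü")
```

Whether the quick filter actually compares bytes this way is an assumption; the sketch only shows that decoding with the declared charset removes the dependency.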
Flags: needinfo?(bugzilla2007)
Comment 6•10 years ago (Reporter)
For later duping
Updated•9 years ago
Comment 7•7 years ago
As bug 1427124 shows, this doesn't have anything to do with drafts, but with messages which have a plaintext and an HTML part at the same time, like all drafts.
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (Character-encoding?) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (since they are multipart)
Comment 8•7 years ago (Reporter)
(In reply to Jorg K (GMT+1) from comment #7)
> As bug 1427124 shows, this doesn't have anything to do with drafts, but with
> messages which have plaintext and HTML part as the same time, like all
> drafts.
???
I described correctly what I saw at the time, and the evidence is still attached. This had everything to do with drafts at the time of reporting, because the same msg failed when searching the saved draft, but succeeded when searching the received message. And I don't see any multipart in either test message, both have only one part, and both are MIME messages.
In a way this is another variation/symptom of the downgrading HTML to plain text saga, only this time plain text won for successful quick filtering, and HTML failed at the time of reporting.
My Comment 5 (almost 5 years later, so might not exactly apply to test cases from 8 years ago) correctly points to the most likely cause of this at the time, which is encoding (so I don't see why you removed that from the summary):
Draft = HTML -> &uuml; in source -> search for "ü" fails, but search for "&uuml;" succeeds -> searching raw text/HTML
Received = plaintext -> some other encoding (charset=ISO-8859-1) -> search succeeds for that text/plain encoding/charset.
That's essentially the same as what you're saying in bug 1427124, comment 5:
> Most likely the search is done on the raw UTF-8, so only ASCII text is found.
I don't see the link of that with multipart messages, can you enlighten me?
Pls don't just make me look wrong without reading the bug and testcases.
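The draft behaviour described in this comment can be reproduced in a few lines of Python. This is a minimal sketch under the assumption that the filter does a plain substring match on the stored HTML source; `naive_find` and `unescaped_find` are hypothetical names, and `html.unescape` is the standard-library function for decoding character references.

```python
import html

# Body text as stored in the draft's text/html source (entity form).
raw_html_body = "M&uuml;nster<br>"


def naive_find(term: str, raw: str) -> bool:
    """Substring match on the raw source, entities left untouched."""
    return term in raw


def unescaped_find(term: str, raw: str) -> bool:
    """Decode character references first, then match."""
    return term in html.unescape(raw)


fails_on_raw = naive_find("ü", raw_html_body)             # literal umlaut is absent
entity_matches = naive_find("&uuml;", raw_html_body)      # the "(sic)" case
works_after_unescape = unescaped_find("ü", raw_html_body)
```

The three results line up with the observations in this bug: the literal umlaut is missed, the entity text is found, and unescaping before matching finds the umlaut.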
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (since they are multipart) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)
Comment 9•7 years ago (Reporter)
So from here we need to revisit the testcases and see what we're doing today under the same circumstances.
Updated•7 years ago
Attachment #405765 - Attachment mime type: message/rfc822 → text/plain
Updated•7 years ago
Attachment #405766 - Attachment mime type: message/rfc822 → text/plain
Comment 10•7 years ago
Wow, you're right, I didn't look at the test cases from back then. And drafts aren't even multipart :-(
So I was all wrong. That said, I have no idea how you managed to get
M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
into the draft. But yes, that wouldn't be found.
Sorry about the confusion and my mistake.
The basic problem is another facet of bug 1259534: We search some raw data instead of converting it into un-escaped and decoded text first.
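A rough sketch of what "converting it into un-escaped and decoded text first" could look like, written with Python's standard `email` and `html.parser` modules. This is not Thunderbird's implementation; `TextExtractor` and `searchable_text` are hypothetical names, and the two sample messages are reduced stand-ins for the attached test cases, not copies of them.

```python
from email import message_from_bytes
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes; character references are decoded by default."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True is the default
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def searchable_text(raw_message: bytes) -> str:
    """Return decoded, un-escaped text of all text/* parts of a message."""
    msg = message_from_bytes(raw_message)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        text = payload.decode(charset, errors="replace")
        if part.get_content_subtype() == "html":
            parser = TextExtractor()
            parser.feed(text)
            parser.close()
            text = "".join(parser.chunks)  # tags dropped, entities decoded
        texts.append(text)
    return "\n".join(texts)


# Reduced stand-in for the saved draft: text/html with an entity.
draft = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
    b"<html><body>M&uuml;nster<br></body></html>\r\n"
)

# Reduced stand-in for the received copy: text/plain, ISO-8859-1 bytes.
received = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"M\xfcnster\r\n"
)

found_in_draft = "Münster" in searchable_text(draft)
found_in_received = "Münster" in searchable_text(received)
```

With this normalisation both variants yield the same searchable word. The sketch skips details a real mail client would need, such as unknown charset labels and excluding script or style content.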
Updated•7 years ago
Blocks: qfasfailtracker
Comment 11•7 years ago (Reporter)
(In reply to Jorg K (GMT+1) from comment #10)
> Wow, you're right, I didn't look at the test cases from back then. And
> drafts aren't even multipart :-(
> So I was all wrong. That said, I have no idea how you managed to get
> M&uuml;nster ist eine der sch&ouml;nsten St&auml;dte der Welt.
> into the draft.
Wasn't me, it was Thunderbird (at the time, long back, but I was already there...)
> But yes, that wouldn't be found.
Even today, in 2017. Just tested. And then, it's not all that hard to get &uuml; in source when importing .eml messages not created by TB...
> Sorry about the confusion and my mistake.
No problem, thanks.
> The basic problem is another facet of bug 1259534: We search some raw data
> instead of converting it into un-escaped and decoded text first.
Yes. That's an ugly bug that should be terminated. I know it's a multipart (pun intended) hydra, but cutting off a head here and there might one day kill the beast. Alternatively, blast the whole thing away and start reassembling phoenix from the ashes... Ah well, just dreaming... :|
Updated•6 years ago (Reporter)
Summary: Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed entities in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1) → Quick Search "Message body filter" does not find message text with umlauts (ä,ö,ü) in saved drafts messages (searching unparsed HTML entities &uuml; etc. in text/html), but succeeds for same msg when received (as text/plain, charset=ISO-8859-1)
Updated•2 years ago
Severity: normal → S3