Open Bug 404255 Opened 17 years ago Updated 2 years ago

Consider using UTF-8 when searching inside message body (IMAP online search) to avoid search failure

Categories

(MailNews Core :: Networking: IMAP, defect)

Tracking

(Not tracked)

People

(Reporter: shopik, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl, Whiteboard: [delight][datalossy])

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Build Identifier: version 2.0.0.9 (20071031)

I'm using the English version of TB. If I try to search inside the entire message using Cyrillic characters, it doesn't return any messages, because it incorrectly selects the default search encoding. Here is a log captured from Wireshark. If you change this parameter to something other than ISO-8859-1, it starts working: Tools->Options->Display->Formatting->Fonts->Incoming Mail->

5 uid SEARCH CHARSET ISO-8859-1 UNDELETED BODY "B5AB"
* SEARCH
5 OK Search completed. 

Reproducible: Always

Steps to Reproduce:
1. Create a message with non-ASCII characters
2. Tools->Options->Display->Formatting->Fonts->Incoming Mail->ISO-8859-1 must be selected (default in the English version of TB)
3. Run a search
4. Nothing is returned
Actual Results:  
It probably should be aware of the input characters, or always use UTF-7 encoding in search regardless of this setting.
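The suggestion above (look at the characters actually typed instead of trusting the display setting) could be sketched roughly as follows. This is only an illustration in Python, not Thunderbird code; `pick_search_charset` and its `folder_default` parameter are hypothetical names, and falling back to UTF-8 is an assumption about what the server accepts.

```python
def pick_search_charset(term: str, folder_default: str = "ISO-8859-1") -> str:
    """Choose a CHARSET for IMAP SEARCH based on the search term itself.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    # Pure ASCII terms need no special charset at all.
    if term.isascii():
        return "US-ASCII"
    try:
        # Keep the folder's default only if it can represent every character.
        term.encode(folder_default)
        return folder_default
    except UnicodeEncodeError:
        # Otherwise fall back to UTF-8 (assumed to be supported by the server).
        return "UTF-8"
```

With the default setting, a Cyrillic or Japanese term would select UTF-8 here instead of being forced into ISO-8859-1.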
Version: unspecified → 2.0
Reproducible on trunk 3.0a1pre (2008040204).
Windows XP SP2
Consider using UTF-8 or UTF-7 by default in the English version of TB too.
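For reference, a minimal sketch of what a UTF-8 search request might look like on the wire, modeled on the captured command in the report. It is written in Python for illustration only; `build_body_search` is a hypothetical helper, and it assumes the server accepts `CHARSET UTF-8` and that the term is sent as an IMAP literal (`{n}` gives the byte count, with the payload following the server's continuation response).

```python
from typing import Tuple


def build_body_search(tag: str, term: str, charset: str = "UTF-8") -> Tuple[bytes, bytes]:
    """Return (command line, literal payload) for a UID SEARCH BODY query.

    Hypothetical helper; the command layout mirrors the captured log.
    """
    # Encode the search term in the same charset that is announced.
    payload = term.encode(charset)
    # The literal marker {n} must carry the byte length, not the character count.
    line = f"{tag} uid SEARCH CHARSET {charset} UNDELETED BODY {{{len(payload)}}}\r\n"
    # The payload is sent only after the server's continuation request.
    return line.encode("ascii"), payload + b"\r\n"


command, literal = build_body_search("5", "日本語")
print(command)  # b'5 uid SEARCH CHARSET UTF-8 UNDELETED BODY {9}\r\n'
```

The point is that the announced charset and the bytes actually sent have to agree, which is what fails when ISO-8859-1 is announced for a term it cannot represent.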
Which kind of search, from the search dialog or using quick search? IMAP/POP?
Quick search, IMAP. If the folder you are searching is available offline, there is no such problem, because no request is sent to the server; it searches locally.
Assignee: nobody → bienvenu
Component: Mail Window Front End → Networking: IMAP
Product: Thunderbird → Core
QA Contact: front-end → networking.imap
Version: 2.0 → Trunk
WORKSFORME, Tb trunk(2008041103 build on Japanese MS Win-XP) with mail of Content-Type:text/plain;charset=iso-2022-jp on Gmail IMAP server.
"CHARSET ISO-2022-JP" was specified on search command.

Is the character encoding specified explicitly in the mail header?
Are you searching a Content-Type:text/html; part with the charset for Cyrillic in <meta> or a lang attribute, in a multipart/alternative mail?
Oh, sorry. I seem to have misunderstood the problem; iso-2022-jp was set in the folder properties of the Gmail IMAP Inbox.
 (a) Tb trunk(en-US) on Japanese MS Win 
 (b) folder/properties/General/Default Character Encoding == iso-8859-1
 (c) Search in context menu of IMAP folder, "BODY contains"
 (d) "日本語" (Japanese kanji for "Japanese") in search word
The following was sent to the Gmail IMAP server.
> uid SEARCH CHARSET ISO-8859-1 UNDELETED BODY {3}

Confirmed.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Note:
After the command, Tb sent the data 0xE52C9E in response to the server's +go ahead in my environment.
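One possible reading of those three bytes, offered as an observation rather than something confirmed in this bug: they match the low byte of each character's Unicode code point in "日本語", which would mean the term was squeezed into a one-byte-per-character charset and the high bytes were lost. A short Python check (illustration only, not Thunderbird code):

```python
term = "日本語"

# Keep only the low byte of each code point (U+65E5, U+672C, U+8A9E).
low_bytes = bytes(ord(ch) & 0xFF for ch in term)
print(low_bytes.hex().upper())  # E52C9E

# A lossless UTF-8 encoding of the same term needs 9 bytes, not 3.
utf8_bytes = term.encode("utf-8")
print(len(utf8_bytes))  # 9

# ISO-8859-1 cannot represent these characters at all.
try:
    term.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False
```

If that reading is right, the server is asked to match bytes that never occur in the message, so an empty result is expected.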
Workaround:
If a fallback charset for mail display is not needed (not many mails with no charset and non-ASCII mail data), set UTF-8 in Folder Property/Default Character Encoding, or the charset that is usually used for text editing under your OS.
If a fallback charset for mail display is required (many mails with no charset and non-ASCII mail data, or many mails with an incorrect charset), and a narrower charset such as iso-8859-1 is required as the fallback charset, splitting the mail folder is a possible circumvention, but I can say nothing for this case.

Adding "IMAP SEARCH CHARSET" to the bug summary for ease of search.
Nikolay Shopik (bug opener), please change my lengthy summary to an appropriate one.
Summary: Search inside entire message incorrectly select encoding if using English version of TB → Search inside entire message incorrectly select encoding if using English version of TB (CHARSET of IMAP SEARCH is set to 'Default Character Encoding' of folder property)
you mean to this one - CHARSET of IMAP SEARCH is set to 'Default Character Encoding' of folder property.
Right?
> Right?
No. It's simply a short explanation of my test result. I only want the summary to contain the combination of IMAP, SEARCH, and CHARSET, which suggests the IMAP command "uid SEARCH CHARSET", for ease of searching and understanding the problem. I think "incorrectly select encoding" or your request of "Consider using UTF-8" are also important words.
Summary: Search inside entire message incorrectly select encoding if using English version of TB (CHARSET of IMAP SEARCH is set to 'Default Character Encoding' of folder property) → Consider using UTF-8, when searching inside message body
Nikolay, is this really "minor" to you?  wanted-TB3?
Keywords: intl
I don't mind escalating it to normal; I set minor only because there is an easy workaround. UTF8 aware app is very important for localized version.
Flags: blocking-thunderbird3?
Martin, Any calendar implications? 

(In reply to comment #11)
> UTF8 aware app is very important for localized version.

=> major would seem appropriate then
Severity: minor → major
(In reply to comment #12)
> Martin, Any calendar implications?

Wayne, none that I know of.
Summary: Consider using UTF-8, when searching inside message body → Consider using UTF-8, when searching inside message body (IMAP online search)
would this be delight for intl users?
Whiteboard: [delight]
we'd love to see a patch for this, but it's not blocking 3.0
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Priority: -- → P2
No longer blocks: 464899
OS: Windows XP → All
Hardware: x86 → All
Product: Core → MailNews Core
Summary: Consider using UTF-8, when searching inside message body (IMAP online search) → Consider using UTF-8 when searching inside message body (IMAP online search) to avoid search failure
Whiteboard: [delight] → [delight][datalossy]
Assignee: dbienvenu → nobody

Jorg, does this ring a bell? It sounds similar to a bug that may have had activity in the past year.

Severity: major → normal
Flags: needinfo?(jorgk)
Priority: P2 → --

Quick search/regular search in bodies in - EDIT: local/offline folders - was fixed in bug 1427124.

EDIT: Let's pronounce this fixed by bug 1427124. Oops, the bug is about "online search".

Flags: needinfo?(jorgk)

That was only local search, this is the online case.

Right, I missed "online". Does that "search on server" work at all for bodies? Oops, I don't actually close the bug.

Of course (depending on the server). Many servers have the texts indexed so then body search is very fast server side.

(In reply to Nikolay Shopik from comment #8)
> you mean to this one - CHARSET of IMAP SEARCH is set to 'Default Character
> Encoding' of folder property.
> Right?

Exactly. See (bug 1669775 comment #12)

78.4.0
US English
IMAP and Local Folders
Searching 'Body' in Search Folders... returns nothing.

Severity: normal → S3