Closed Bug 686519 Opened 13 years ago Closed 13 years ago

Non english characters not shown properly on subject field

Categories

(Thunderbird :: General, defect)

6 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 513472

People

(Reporter: o2627091, Unassigned)

Details

Attachments

(5 files, 1 obsolete file)

Attached image error.png (deleted) —
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Build ID: 20110902133214

Steps to reproduce:
See attachment. Thunderbird 6.0.2.

Actual results:
In one field the 'ó' character is shown; in the other, a question-mark symbol is shown instead.

Expected results:
Show the characters properly everywhere.
Can you attach the email here, in .eml format? You can remove private texts from it (in a text editor), just do not touch the subject field.
Attached file Problem with special characters. (deleted) —
In this latest one, the symbol '¿' is sent as 'ż'
Attached file other different problem. (obsolete) (deleted) —
(In reply to Bastard from comment #4) Read this thread, it contains some suggestions: http://groups.google.com/group/mozilla.support.thunderbird/browse_thread/thread/e200c3250862e091
(In reply to Hashem Masoud from comment #5)
> (In reply to Bastard from comment #4)
> Read this thread, it contains some suggestions:
> http://groups.google.com/group/mozilla.support.thunderbird/browse_thread/thread/e200c3250862e091

Thank you. I don't know which default encoding was set, but setting it to any of the following shows the character properly, along with others such as 'ñ':
- Western (ISO 8859-1)
- Western (ISO 8859-15)
- Western (Windows-1252)
Using UTF-8 makes things worse. Anyway, the problem from comment 2 is still present. I am using Portable Thunderbird from PortableApps, and I guess it ships with a different default character encoding.
Sorry, I mean the error in error.png (Problem with special characters.); the other one is solved by changing the character encoding.
Attachment #563144 - Attachment mime type: application/octet-stream → text/plain
Attachment #563146 - Attachment mime type: application/octet-stream → text/plain
I think the Subject (in comment 2) should also carry a charset definition when it contains characters outside the default charset (probably ISO-8859-1?). I don't know whether 'ó' is outside it. Something like this:
=?UTF-8?Q?=20Mails=20re=C3=A7us=20en=20doublon=20suite=20derni=C3=A8re=20mise=20=C3=A0=20jour?=
This may be a problem on the sender's side.
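The encoded-word form quoted in this comment can be checked with a minimal Python sketch. It uses only the standard `email.header` module; it is an illustration of the header encoding, not anything Thunderbird or the sender actually runs, and the variable names are made up for the example.

```python
from email.header import Header, decode_header, make_header

# The encoded-word quoted in the comment above, copied verbatim.
encoded = "=?UTF-8?Q?=20Mails=20re=C3=A7us=20en=20doublon=20suite=20derni=C3=A8re=20mise=20=C3=A0=20jour?="

# decode_header() splits the value into (bytes, charset) chunks;
# make_header() turns those chunks back into readable text.
decoded = str(make_header(decode_header(encoded)))
print(repr(decoded))

# Going the other way: the Subject from this bug, encoded so that the
# header names its own charset instead of carrying a bare 0xF3 byte.
subject = "Su ticket se recibió"
wire = Header(subject, "iso-8859-1").encode()
print(wire)
```

A header encoded this way is pure ASCII on the wire and states its charset, so a client does not have to fall back to a folder default to display it.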
(In reply to Bastard from comment #3)
> In this latest one, the symbol '¿' is sent as 'ż'

(In reply to Bastard from comment #4)
> Created attachment 563146 [details]
> other different problem.

Message header of the mail:
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
The message body text is displayed as follows under each charset:
> ISO-8859-1   : Hola Alex, ¿al final qué te dijeron?
> windows-1250 : Hola Alex, żal final qué te dijeron?

What exactly did you do, and what did you observe, when you say "the symbol '¿' is sent as 'ż'"?

If you expect 'ż' but '¿' is shown, Tb is displaying it *correctly*, because the sender specifies charset=ISO-8859-1. With charset=ISO-8859-1 specified, that byte is never rendered with the glyph 'ż' unless you intentionally force windows-1250 or a similar charset that has 'ż' at the code point where ISO-8859-1 has '¿'.

Which Tb setting do you call the "default encoding"?
(In reply to WADA from comment #9)
> Which Tb setting do you call the "default encoding"?

By "default encoding" I mean the encoding Tb guesses to be appropriate, maybe as a result of parsing all downloaded emails. If there are only a few emails, or they are not varied, it guesses a wrong character encoding. To show this, I include a snapshot of Tb Portable just after it starts syncing with my account. As you can see, no encoding is selected. Later, when more emails have been downloaded, it successfully sets Windows-1252.
(In reply to Bastard from comment #10)
> > What setting of Tb do you call by "default encoding"?
> I call default encoding the encoding Tb guesses it is the appropiate, maybe
> as the result of the parse of all downloaded emails.

You have not enabled Auto-Detect, so Tb does not guess even when the charset is unknown. (Needless to say, Tb does not guess when the charset is explicitly specified in the mail.)

(In reply to Bastard from comment #11)
> No encoding selected at start of email retrieval.

Tb does not mark the current charset with a black circle if that charset is not listed in the View/Character Encoding menu. Add the required charsets via "Customize List..." in View/Character Encoding. In your environment, ISO-8859-15 and windows-1252 are probably worth adding.

If the mail's charset is unknown and you have enabled Auto-Detect, Tb tries to guess it. The fallback charset appears to be the folder's default charset, which you choose in Folder Properties.
If the mail's charset is unknown and you have disabled Auto-Detect, Tb does not try to guess it, and the fallback charset is the folder's default charset, which you choose in Folder Properties.

For the mail with the following Subject, the glyphs shown under several character encodings are:
> ISO-8859-1:   Subject: Su ticket se recibió
> TIS-620:      Subject: Su ticket se recibi๓
> KOI8-R:       Subject: Su ticket se recibiС
> ISO-8859-8-I: Subject: Su ticket se recibiף
> UTF-8:        Subject: Su ticket se recibi�

Because no charset is specified in the Content-Type: header, and because you have not enabled Auto-Detect, the charset you chose in Folder Properties/General is applied. For a newly created folder, the default of this per-folder charset is the charset you chose at Tools/Options/Display, Fonts, Advanced, Character encodings, Incoming mail. It is also used during an internal rebuild-index, for example if you delete the .msf file.

Can you still see your problem with this mail even with the per-folder charset set to ISO-8859-1?
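The table of glyphs in this comment can be reproduced with a short Python sketch, assuming the raw Subject ends in the single byte 0xF3 (the ISO-8859-1 code for 'ó'). Python has no separate ISO-8859-8-I codec, so `iso-8859-8` stands in for it here; the two share the same byte-to-character mapping and differ only in how bidirectional text is ordered. The variable names are illustrative.

```python
# Raw Subject bytes as the sender wrote them: no encoded-word, no charset,
# just a bare 0xF3 where 'ó' was intended (assumption for this sketch).
raw_subject = b"Su ticket se recibi\xf3"

charsets = ["iso-8859-1", "tis-620", "koi8-r", "iso-8859-8", "utf-8"]

# Decode the same bytes under each candidate charset. errors="replace"
# substitutes U+FFFD for byte sequences that are invalid in that charset,
# which is what happens to a lone 0xF3 under UTF-8.
rendered = {cs: raw_subject.decode(cs, errors="replace") for cs in charsets}

for cs, text in rendered.items():
    print(f"{cs:>12}: {text}")
```

Only the last character changes from row to row, which is why the choice of fallback charset decides what the reader sees.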
I still have the problem with Tb 7.0.1 even when ISO-8859-1 is already set by default.
Attached image Problem persists with Tb 7.0.1 (deleted) —
(In reply to Bastard from comment #14)
> Created attachment 565457 [details]
> Problem persists with Tb 7.0.1

I've deleted all .msf files as well, and the problem persists.
(In reply to Bastard from comment #13)
> I still have the problem with Tb 7.0.1 even when ISO-8859-1 is already set
> by default.

I'll ask again: which setting, and where in Tb, do you mean by "ISO-8859-1 is already set by default"? As I said in my previous comment, Tools/Options/Display, Fonts, Advanced, Character encodings, Incoming mail is merely the default for newly created mail folders. Did you check the mail folder's own character encoding setting?

Which mail do you still have the problem with? Both?
- Content-Type: text/html with no charset (the mail you attached first)
- Content-Type: ...; charset=iso-8859-1 (the mail you attached second)
  Which do you expect in this case: "¿al final qué" or "żal final qué"?

Please run the following test first with your mail "Subject: Su ticket se recibió" (as shown under ISO-8859-1; the test mail you attached to this bug first), to see how the folder's charset affects display of a mail with no charset in Content-Type:.

(1) Create a local mail folder. Copy the mail into it (call it mail-1), and copy another mail (call it mail-2). Place only these two mails there, for ease of testing. Set View/Character Encoding/Auto-Detect=Off, for ease of testing.
(2) Folder Properties, change to charset=ISO-8859-1, "Repair Folders", OK
    2-1. observe thread pane, observe message header pane
    2-2. click mail-2
    2-3. click mail-1, observe message header pane
(3) Folder Properties, change to charset=TIS-620, "Repair Folders", OK
    3-1. observe thread pane, observe message header pane
    3-2. click mail-2
    3-3. click mail-1, observe message header pane
(4) Folder Properties, change to charset=KOI8-R, "Repair Folders", OK
    4-1. observe thread pane, observe message header pane
    4-2. click mail-2
    4-3. click mail-1, observe message header pane
(5) Folder Properties, change to charset=ISO-8859-8-I, "Repair Folders", OK
    5-1. observe thread pane, observe message header pane
    5-2. click mail-2
    5-3. click mail-1, observe message header pane
(6) Folder Properties, change to charset=UTF-8, "Repair Folders", OK
    6-1. observe thread pane, observe message header pane
    6-2. click mail-2
    6-3. click mail-1, observe message header pane

Compare steps x-1 and x-3 of each test with the different charsets. Compare the thread pane across steps (1) to (6) after "Repair Folders". Do you see "recibi�" in the thread pane in any case?

I could see a mismatch between the thread pane and the message header pane at step x-1 only (e.g. recibiС in the thread pane, but still recibi๓ in the message header pane). At every step x-3 of all test cases, the same glyph was shown in the thread pane and the message header pane. That mismatch is not a charset issue. It is simply a message pane refresh issue after Repair Folder (and/or after a change of the folder's charset).

When ISO-8859-1 is selected as the folder's charset, what is shown at View/Character Encoding? Is ISO-8859-1 used? Or windows-1252? Or something else?
What about with View/Character Encoding/Auto-Detect=Universal?
By "charset set by default" I mean the one that appears as selected in View -> Character Encoding after adding the first mail account. I could not find a character encoding setting specific to a mail folder.

As for my expectation, it is "¿al final qué".

The problem I still have is with this attachment: https://bug686519.bugzilla.mozilla.org/attachment.cgi?id=563144

I have tried the steps; this is what happens at each one:
(1) Each time I copy the email from attachment 563144 [details] to the created folder, it shows the � glyph. The charset in this case is set to ISO-8859-1 and Auto-Detect is off, and the same happens even if the option "Apply default to all messages in the folder (individual message character encoding settings and auto-detection will be ignored)" is checked.
(2) When I hit Repair Folders, the glyph changes automatically to the 'ó' symbol, even before I hit OK. Both the message header pane and the thread pane show the symbol correctly, and it stays that way after switching from one email to another.
(3) 3-1: The message header pane changes the subject from "recibió" to "recibi๓", but the subject in the thread pane still looks as it did under the old charset.
    3-2, 3-3: The message header pane stays the same, and now the thread pane changes the subject word "recibió" to "recibi๓".
(4) The same as (3), but with "recibiС".
(5) The same as (3), but with "recibiף".
(6) The same as (3), but with "recibi".

After the first time I clicked "Repair Folder", I never saw the � glyph again for the messages already present in that folder.

I completely agree with this:
> I could see a mismatch between the thread pane and the message header pane at
> step x-1 only (e.g. recibiС in the thread pane, but still recibi๓ in the message
> header pane). At every step x-3 of all test cases, the same glyph was shown in
> the thread pane and the message header pane. That mismatch is not a charset
> issue. It is simply a message pane refresh issue after Repair Folder (and/or
> after a change of the folder's charset).

After doing the last step and changing from UTF-8 to ISO-8859-1, View -> Character Encoding still shows UTF-8; I have to click on mail-2 and then mail-1, and only then does it show ISO-8859-1 in View -> Character Encoding.

I also noticed that changing the charset from View -> Character Encoding to a new one, for example HZ, changes the relevant symbols, but only in the lower pane; after clicking mail-2 and then mail-1, it uses the old charset again, in my case ISO-8859-1. So my impression is that the View menu does not actually change the folder's charset, but only shows a preview.
(In reply to Bastard from comment #17)
> (1) Each time I copy the email from attachment 563144 [details] to the
> created folder is shows the � glyph. The charset in this case is set to
> ISO-8859-1 and autodetect=off, and happens the same even if the option
> "Apply default to all messages in the folder (individual message character
> encoding settings and auto-detection will be ignored)" is checked.

In my environment, the folder charset that consistently shows the � glyph in the thread pane (and in the message header pane and message pane too) was ISO-2022-JP. What charset is set in Folder Properties/General of your Inbox?

Regarding "� in the thread pane but ó in the message header pane" in the attached screenshot: I could see the following with the folder's charset set to UTF-8 in my environment, using a crafted test mail.
- Save the mail as .eml, edit the .eml (using ISO-8859-1 or windows-1252), add a "Subject: Su ticket se recibió" line just before the <table> tag of the HTML, and save the file.
- Drag and drop the .eml onto the thread pane of a local mail folder in Tb. Tb imports the .eml into the folder.
- Change the folder's charset to UTF-8, then click the mail again.
- Message display:
  (a) Thread pane:         Su ticket se recibi
  (b) Message header pane: Subject: Su ticket se recibi
  (c) Message pane:        Subject: Su ticket se recibi�

This mismatch among the panes is caused by the different fonts in use. Some fonts return a space glyph for the ISO-8859-1 byte of ó, while others return "not defined" for it, in which case Tb uses the font's U+FFFD. And some fonts may return a space glyph for U+FFFD instead of the � glyph. Such differences mainly depend on whether the font is a Unicode font or not (Shift_JIS fonts are widely used on Japanese MS Windows).

One possible cause of "� in the thread pane but ó in the message header pane" in the attached screenshot is that the folder's charset was changed but the message pane was not refreshed by "click another mail, then click the mail again". However, the font difference described above is also a possible cause of this phenomenon in your environment.

Confusingly, the fonts for UTF-8 data are defined at Tools/Options/Display, Formatting, Fonts, Advanced, Fonts for: Other Languages. Which fonts have you chosen at "Fonts for: Other Languages"?
(In reply to Bastard from comment #17)
> For the expectation case, it is "¿al final qué".

Even when Content-Type: text/...; charset=ISO-8859-1 is properly set in the mail, if you check the following folder option,
> [x] Apply default to all messages in the folder
>     (individual message character encoding settings and auto-detection will be ignored)
the charset correctly specified in the Content-Type: header is ignored, as the option text clearly states. This option exists for tolerance of "incorrect charset in Content-Type: header by bad mailers".

Have you enabled this option for your Inbox?
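The precedence described across these comments (folder override first, then the charset declared in the mail, then Auto-Detect, then the folder default) can be summarized in a small Python sketch. This is a reading of the thread, not Thunderbird's implementation; `FolderSettings`, `effective_charset`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FolderSettings:
    """Hypothetical stand-in for the options in Folder Properties/General."""
    default_charset: str
    # Models "[x] Apply default to all messages in the folder".
    override_message_charset: bool = False


def effective_charset(declared: Optional[str],
                      folder: FolderSettings,
                      auto_detect: bool = False,
                      detected: Optional[str] = None) -> str:
    """Pick the charset used for display, in the order the thread describes."""
    if folder.override_message_charset:
        return folder.default_charset   # declared charset is ignored
    if declared:
        return declared                 # charset from Content-Type: wins
    if auto_detect and detected:
        return detected                 # guess only when Auto-Detect is on
    return folder.default_charset       # final fallback


body = b"\xbfal final qu\xe9"  # the bytes behind '¿al final qué'
inbox = FolderSettings("windows-1250", override_message_charset=True)
print(body.decode(effective_charset("iso-8859-1", inbox)))
```

With the override checked and a windows-1250 folder default, the sketch prints "żal final qué" even though the mail declares ISO-8859-1, which matches the behavior this comment warns about.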
A third possible cause of "� in the thread pane but ó in the message header pane" in the attached screenshot:
- The folder's charset was one that produces � in the thread pane.
- You selected a charset that shows ó, such as ISO-8859-1 or windows-1252, in the View/Character Encoding menu.
If so, this is normal behavior because, as you already know, selecting a charset at View/Character Encoding is an ad-hoc choice for the currently displayed mail and applies only to the message header pane, message pane, and attachment pane.
(In reply to WADA from comment #18)
> In my environment, the folder charset that consistently shows the � glyph in
> the thread pane (and in the message header pane and message pane too) was
> ISO-2022-JP. What charset is set in Folder Properties/General of your Inbox?

Folder Properties/General of all Inbox folders is set to Western (ISO-8859-1), but I repeat: I only see the glyph in the header pane, not in the lower pane (message).

> Confusingly, the fonts for UTF-8 data are defined at Tools/Options/Display,
> Formatting, Fonts, Advanced, Fonts for: Other Languages.
> Which fonts have you chosen at "Fonts for: Other Languages"?

The font selected by default for me is:
- On Thunderbird Portable (7.0.1) on Windows 7: Calibri, size 17
- On the installed version (7.0.1) on Mac OS X Lion: Lucida Grande, size 15
The glyph problem happens in exactly the same way on both versions, and it is always fixed by doing Folder -> Properties -> Repair Folder. There is no "Other Languages" setting for me.
(In reply to WADA from comment #19)
> > [x] Apply default to all messages in the folder
> >     (individual message character encoding settings and auto-detection will be ignored)
> Have you enabled this option for your Inbox?

No, I have it disabled; it is disabled by default in all Tb versions.
This bug could be fixed by applying "Repair Folder" automatically whenever the charset is changed, and to newly arriving mail, instead of doing it manually each time.
(In reply to Bastard from comment #24)
> This bug could be fixed applying the "Repair folder" anytime the charset is
> changed and apply it to new mail coming, instead of doing it manually each time.

The explicit "Repair Folder" in the previous test was there to show the effect of rebuilding the index (.msf) after a change of the folder's charset. Tb currently runs rebuild-index (the same as Repair Folder in the dialog) internally and automatically after the folder's charset is changed. Please observe the thread pane and message header pane in the following test (without Repair Folder).
(1) Folder's charset=ISO-8859-1, click mail-1.
    Thread pane:         recibió
    Message header pane: recibió
(2) Change to KOI8-R.
    (2-1) Folder Properties/General, change charset to KOI8-R, click OK, without Repair Folder.
    (2-2) Thread pane:         recibiС <= changed by the internal rebuild-index; the thread pane is refreshed.
          Message header pane: recibió <= not refreshed after rebuild-index.
    (2-3) Click mail-2, then click mail-1 in the thread pane.
          Thread pane:         recibiС
          Message header pane: recibiС <= refreshed by re-display.
Sorry for the non-minimal and confusing testing.
(In reply to WADA from comment #25)
> Tb currently runs rebuild-index (the same as Repair Folder in the dialog)
> internally and automatically after the folder's charset is changed.

OK, this is exactly what happens for me. But where is the bug that makes the � appear in the first place? In the message as first copied, the glyph remains no matter how many times I change the charset, as long as I don't click Repair Folder.
(In reply to Bastard from comment #22)
> All Inbox folders Folder Properties/General are set to Western (ISO-8859-1),
> but I repeat I only see the glyph at the header pane, not at the lower pane (message).

Is this for the mail "Subject: Su ticket se recibió" (as shown under ISO-8859-1), i.e. the mail you attached to this bug first? Or for a different mail?

As seen in the test in comment #25, the message header pane and message pane are not refreshed by a change of the folder's charset (which automatically invokes an internal rebuild-index) or by an explicit "Repair Folders". Is that the case here?
(In reply to Bastard from comment #26)
> but, where is the bug that makes the � appear in the first place?
> Because in the message as first copied the glyph remains no matter how many times
> I change the charset, given that I don't click Repair folder.

Which of the following mails?
a) The mail you attached to this bug first (no charset in Content-Type:, Subject is not encoded)
b) The mail you attached to this bug second (Content-Type: text/...; charset=ISO-8859-1 is correctly specified) ("żal final qué te dijeron?" is shown instead of "¿al final qué te dijeron?")
c) Another mail
Attachment #563146 - Attachment is obsolete: true
(In reply to WADA from comment #27)
> As seen in test of comment #25, message header pane/message pane is not
> refreshed by folder's charset change(invokes internal rebuild-index
> automatically) or explicit "Repair Folders". This case?

If Tb does rebuild-index internally, then something is wrong, because I insist that before I hit Repair Folder for the first time, the glyph does not go away even when I change the charset. Are you unable to reproduce the bug, i.e. that the � glyph does not change unless you manually hit Repair Folder?

The attachment I am talking about is actually the only one available: https://bugzilla.mozilla.org/attachment.cgi?id=563144
(In reply to Bastard from comment #29)
> The attachment I am talking to is the only one available actually:
> https://bugzilla.mozilla.org/attachment.cgi?id=563144
> If Tb does rebuild-index internally then there is something wrong, because I
> insist that before hitting the Repair folder for the first time the glyph
> doesn't go even changing charset.

What charset was set in Folder Properties/General before you changed the charset there? What glyph was shown in the message header pane before the folder's charset change? From which charset to which charset did you change in Folder Properties/General without Repair Folder? Did you click OK in the Folder Properties/General dialog and close it after requesting the charset change? (Forcing a click of "Repair Folder" in the first test was meant to avoid this kind of problem in the test procedure. See the test procedure without "Repair Folder" that I wrote.)

> Aren't you able to reproduce the bug, that the � glyph doesn't change if you don't hit manually Repair folder?

I could not reproduce it, except when the folder's charset is changed from one charset that shows � to another charset that also shows �.
I understand. When you load the .eml with ISO-8859-1 set, you don't see the � glyph; maybe it only happens when I receive the mail directly from the sender's server. Is there any way to attach my Tb Portable with personal data removed, so you can see for yourself?
(In reply to Bastard from comment #31)
> When you load the eml with ISO-8859-1 set you don't see the � glyph,
> maybe it only happens when I receive the mail directly from the sender's server.

IMAP? If yes, the server may return U+FFFD for non-encoded non-ASCII data in Subject: in response to "fetch.headers subject". This is a known server problem. If rebuild-index is invoked on a local folder, Tb uses the Subject: header data directly, then applies the folder's charset to the message body data and the subject header data, because there is no charset in Content-Type.
The IMAP server problem is Bug 513472. The command used by Tb was "fetch BODY.PEEK[HEADER.FIELDS (... subject ...".
Yes, I am using IMAP with Gmail. I can wait until bug 513472 is solved and test again, to see if this bug is a duplicate. For now, I think it is premature to mark it as such.
Do you recognize this string? It is from Custom_folder.msf:
<(86=21)(87=NOREPLY@pccomponentes.com)(88=example@gmail.com)(8F
=Su ticket se recibi$EF$BF$BD)
It is clearly defined in Java: http://www.fileformat.info/info/unicode/char/fffd/index.htm If needed, I could attach the .msf file of the custom folder.
(In reply to Bastard from comment #35)
> Do you recognize this string, is Custom_folder.msf?
> <(86=21)(87=NOREPLY@pccomponentes.com)(88=example@gmail.com)(8F
> =Su ticket se recibi$EF$BF$BD)

Yes. By uploading the mail to Gmail IMAP, I could finally see the same display as in your screenshot. The UTF-8 byte sequence for U+FFFD is 0xEFBFBD. Tb stores the bytes the server returns for BODY.PEEK[HEADER.FIELDS in the .msf, and Tb seems to interpret them as UTF-8 and show � in the thread pane. Because the Subject: header in the message source is as-is, the folder's charset is ISO-8859-1, and there is no charset in Content-Type:, ó is shown in the message header pane. Because "Repair Folder" for IMAP is a re-fetch of headers followed by a re-fetch of all headers and mail data, the problem persists even after Repair Folder and a change of the folder's charset.
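The reading of the .msf cell given here can be checked with a few lines of Python. The sketch assumes that `$XX` in the quoted cell is a hexadecimal escape for one byte, which is how the string in the comment is being interpreted; `unescape_msf_value` is a hypothetical helper written for this illustration, not part of any Thunderbird tooling.

```python
import re


def unescape_msf_value(cell: str) -> bytes:
    """Turn `$XX` hex escapes (assumed .msf convention) into raw bytes."""
    raw = cell.encode("ascii")
    return re.sub(rb"\$([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  raw)


# Subject cell as quoted from Custom_folder.msf.
stored = unescape_msf_value("Su ticket se recibi$EF$BF$BD")

# The last three bytes are exactly the UTF-8 encoding of U+FFFD.
print(stored[-3:] == "\ufffd".encode("utf-8"))

# Read as UTF-8, the cell therefore ends in the replacement character.
print(stored.decode("utf-8"))

# What the sender originally wrote ended in the single byte 0xF3 instead.
original = "Su ticket se recibió".encode("iso-8859-1")
print(original[-1:])
```

Since the stored bytes already contain U+FFFD rather than 0xF3, no choice of folder charset can bring the 'ó' back in the thread pane; the original byte was lost before Tb received it.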
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Note that the bug is also present in Mozilla Firefox 7.0.1: go to your Gmail inbox -> select the problematic email -> drop-down menu -> Show original. Please check it with your Gmail account as well.
FYI: Gmail IMAP did not return U+FFFD but returned the bytes as-is when charset=ISO-8859-1 was properly specified in the Content-Type: header, so bug 513472 did not occur for your mail in that case.
Note: Gmail does not keep two versions, because the only difference is "charset=ISO-8859-1". So you need to move the old version without charset to [Gmail]/Trash, Shift+Delete it in [Gmail]/Trash, compact [Gmail]/Trash, and then upload the new version with charset=ISO-8859-1.
Gmail appears tolerant of a non-encoded (malformed) Subject: header if the charset is properly specified in the Content-Type: header and is the same charset as the one used in the non-encoded Subject: header.
Please note that the cause is a bug in the PHP application of pccomponentes.com.
- The PHP application of pccomponentes.com should encode Subject: if non-7-bit-ASCII is used in the Subject: header.
- The PHP application of pccomponentes.com should specify the charset in Content-Type: if non-7-bit-ASCII is used in the message body.
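For comparison, a message that follows both recommendations can be sketched in Python with the standard `email` package. The sender's application is PHP and its code is not shown in this bug, so this is only an illustration of the resulting wire format; the body text is borrowed from the second attachment, and the quoted-printable transfer encoding is a choice made for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()

# Non-ASCII Subject: the library emits it as an RFC 2047 encoded-word
# when the message is serialized, so no raw 8-bit byte reaches the header.
msg["Subject"] = "Su ticket se recibió"

# Body with an explicit charset in Content-Type:. Quoted-printable keeps
# the serialized body 7-bit clean as well (a choice for this sketch).
msg.set_content("Hola Alex, ¿al final qué te dijeron?",
                charset="iso-8859-1", cte="quoted-printable")

raw = msg.as_bytes()
print(raw.decode("ascii"))

# Parsing the bytes back recovers the text without consulting any
# client-side default charset.
parsed = message_from_bytes(raw, policy=policy.default)
print(parsed["Subject"])
print(parsed.get_content_charset())
```

A message built this way declares its charsets itself, so neither the IMAP server nor the mail client has to guess how to read the Subject or the body.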
I've done some research; here is how it is shown in other mail clients:
- Mail on OS X Lion: Su ticket se recibi�
- Microsoft Outlook 2010: Su ticket se recibi�
- The Bat! 5.0.6 shows it properly: Su ticket se recibió
So I think it is up to the mail client to work around it.