Open Bug 689942 Opened 13 years ago Updated 2 years ago

Mails under folder named with Polish characters ÓŻ can not be accessed

Categories

(Thunderbird :: Folder and Message Lists, defect)

7 Branch
x86
Windows Vista
defect

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: info, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl)

Attachments

(1 file)

Attached image TB_bug.JPG (deleted)
User Agent: Mozilla/5.0 (Windows NT 6.0; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Build ID: 20110902133214

Steps to reproduce:
If I name a folder with any word that contains ÓŻ (a client of mine named a folder RÓŻNE, which means "miscellaneous" in Polish), all emails in that folder are listed, but none of them can be opened (the error message says: can not find file c:\users\blablalbla\folder_path\file_path). It happens only with these two Polish characters together (I tried other Polish characters and everything works fine), and only if both are uppercase. The bug was present in version 6 and remains in version 7.
If you look in your user profile directory, what file name is used for this folder? Is there an actual folder with that name, or one whose name is made of numbers? I tried creating a folder named ROZNE here; we recognized that the characters are illegal in file names, created a folder with a numeric name on disk, and associated it with the ROZNE name in the UI.
The folder name looks just as I entered it (RÓŻNE). If you look at the error message in the screenshot I attached to the first post, the path is garbled: instead of ÓŻ there is a "y with a dash above it", which is not a Polish character at all. I don't know whether you can type Polish characters, but if you switch to the Polish keyboard, just press RightAlt+O or RightAlt+Z to get them. Hope that helps.
If the bytes of "ÓŻ" in "RÓŻNE" in one charset (perhaps a Polish charset, or Unicode in your case) are somehow reinterpreted by some component as "y with a dash above it" in another charset (perhaps a common or standard one, such as windows-1252), the phenomenon you saw can happen. The character set of file names on Windows depends on the OS charset, the volume type, and the API used by Tb. Is it an NTFS volume, or an older volume type such as FAT32? What is your OS's system charset? (It defaults to Shift_JIS on Japanese Windows XP.) By the way, can you copy and paste the file name containing "y with a dash above it"? Because bugzilla.mozilla.org uses UTF-8 as the page charset, the Unicode form of "y with a dash above it" will probably survive in your comment regardless of your system charset.
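To illustrate how the same bytes turn into different characters depending on the charset, here is a minimal Python sketch (not Thunderbird code). It decodes the two bytes 0xD3 0xAF, which is how ÓŻ is stored in windows-1250, with several of Python's standard codecs; the codec names are Python's, and the helper `decode_as` is hypothetical.

```python
# Same two bytes (ÓŻ in windows-1250), read under different charsets.
RAW = bytes([0xD3, 0xAF])


def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes with the given Python codec name."""
    return raw.decode(charset)


if __name__ == "__main__":
    for charset in ("cp1250", "cp1252", "shift_jis", "utf-8"):
        text = decode_as(RAW, charset)
        # Print code points only, so the output is plain ASCII on any console.
        points = " ".join(f"U+{ord(c):04X}" for c in text)
        print(f"{charset:10} {points}")
```

Under cp1250 the bytes are Ó Ż (U+00D3 U+017B), under cp1252 they are Ó ¯ (U+00D3 U+00AF), under Shift_JIS they are two half-width katakana, and under UTF-8 they form the single character ӯ (U+04EF).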
I found this bug on a client's computer and reproduced it on my own. The first runs Polish Windows 7 Home Premium 64-bit, the second runs Polish Windows Vista Business 32-bit, both on NTFS partitions. Both use the "Polish (programmers)" keyboard layout and the windows-1250 (Central European) charset. Here is the copied message (translated from Polish): "Could not find the file mailbox:///C|/Users/c2/AppData/Roaming/Thunderbird/Profiles/lhzqivtc.default/Mail/Local Folders/Inbox.sbd/RӯNE?number=44172022. Check the file path and then try again." One more thing: I can change the folder name inside TB and everything works fine. The strange thing is that the folder's message list loads OK and I can rename the folder; the only thing I cannot do is access the message content.
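The folder name in that URL can be traced back mechanically. This Python sketch (illustrative, not Thunderbird code) takes the last path segment of the quoted mailbox:// URL and reverses the suspected mix-up by re-encoding it as UTF-8 and decoding the bytes as windows-1250 (Python's `cp1250` codec). All helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the error message above (the folder segment is "RӯNE").
ERROR_URL = ("mailbox:///C|/Users/c2/AppData/Roaming/Thunderbird/Profiles/"
             "lhzqivtc.default/Mail/Local Folders/Inbox.sbd/RӯNE?number=44172022")


def folder_leaf(url: str) -> str:
    """Return the last path segment of the URL, i.e. the folder file name."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def query_number(url: str) -> str:
    """Return the value of the number= query parameter."""
    return parse_qs(urlsplit(url).query)["number"][0]


def undo_utf8_misread(text: str, native: str = "cp1250") -> str:
    """Assumed inverse of the suspected bug: get the UTF-8 bytes back,
    then decode them in the native ANSI code page."""
    return text.encode("utf-8").decode(native)


if __name__ == "__main__":
    leaf = folder_leaf(ERROR_URL)
    print([f"U+{ord(c):04X}" for c in leaf])
    print([f"U+{ord(c):04X}" for c in undo_utf8_misread(leaf)])
    print(query_number(ERROR_URL))
```

"RӯNE" maps back to exactly "RÓŻNE", which is consistent with the name in the error message being the windows-1250 bytes of RÓŻNE read as UTF-8.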
In windows-1250 ( http://en.wikipedia.org/wiki/Windows-1250 ):
0xD3 = Ó (LATIN CAPITAL LETTER O WITH ACUTE)
0xAF = Ż (LATIN CAPITAL LETTER Z WITH DOT ABOVE)

In Unicode ( http://www.fileformat.info/info/unicode/char/4ef/index.htm ):
ӯ (CYRILLIC SMALL LETTER U WITH MACRON) = U+04EF
UTF-8 bytes for U+04EF (ӯ) = 0xD3AF

A possible explanation: as the OS's system charset is windows-1250, ÓŻ = 0xD3AF is a valid folder-name sequence in windows-1250, so Tb doesn't hash the name. When Tb requests file creation, it passes the valid windows-1250 bytes 0xD3AF. However, the OS interprets them as UTF-8 and creates a file named ӯ (0xD3AF in UTF-8).

FYI: under Japanese MS Windows XP (Shift_JIS), Tb 6 created the files 79357a77.msf and 79357a77 for a folder named RÓŻNE. As the system charset is Shift_JIS, Tb hashed it. In 79357a77.msf, the following data is seen:
> (84=79357a77)(85=R$C3$93$C5$BBNE)
$C3$93$C5$BB is Tb's hexadecimal representation in the .msf of the UTF-8 bytes of ÓŻ.
Ó = U+00D3 : http://www.fileformat.info/info/unicode/char/d3/index.htm
Ż = U+017B : http://www.fileformat.info/info/unicode/char/17b/index.htm
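The byte-level claims above can be checked with a short Python sketch (illustrative only, not Thunderbird code), assuming Python's `cp1250` codec matches windows-1250 for these letters. `misread_as_utf8` mimics the suspected bug; `dollar_hex_escape` is a hypothetical helper that reproduces the `$C3$93$C5$BB` form seen in the .msf file and is not Mork's actual escaping routine.

```python
def misread_as_utf8(name: str, native: str = "cp1250") -> str:
    """Encode `name` in the native ANSI code page, then decode those bytes as
    UTF-8, mimicking the suspected bug. Raises UnicodeDecodeError when the
    native bytes are not valid UTF-8."""
    return name.encode(native).decode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def dollar_hex_escape(name: str) -> str:
    """Hypothetical helper: UTF-8 encode `name` and write each non-ASCII byte
    as $XX, which matches the .msf value quoted above."""
    return "".join(chr(b) if b < 0x80 else f"${b:02X}"
                   for b in name.encode("utf-8"))


if __name__ == "__main__":
    for name in ("RÓŻNE", "różne"):
        raw = name.encode("cp1250")
        print(raw.hex(), "valid UTF-8:", is_valid_utf8(raw))
    print([f"U+{ord(c):04X}" for c in misread_as_utf8("RÓŻNE")])
    print(dollar_hex_escape("RÓŻNE"))
```

The all-lowercase spelling różne encodes to bytes that are not valid UTF-8 at all, so this particular misreading could not produce a substitute name for it. That is consistent with the report that the uppercase pair is affected, though it has not been checked against Thunderbird's code.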
http://en.wikipedia.org/wiki/NTFS#Internals says:
> Internals
> (snip)
> NTFS allows any sequence of 16-bit values for name encoding (file names, stream names, index names, etc.).
> This means UTF-16 codepoints are supported, but the file system does not check
> whether a sequence is valid UTF-16
> (it allows any sequence of short values, not restricted to those in the Unicode standard).

I guess Tb requests file creation using 0xD3AF without windows-1250 to Unicode conversion, because 0xD3AF is a valid windows-1250 file name sequence, and Tb interprets the 0xD3AF returned in the file name as UTF-8 and then shows it as ӯ. Tb perhaps uses ӯ (Unicode U+04EF) in the mail folder file access to show mail data, and fails to find the file. On rename, either the bytes 0xD3AF are used by Tb or an already obtained file handle is used, so the rename succeeds.
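If the folder on disk really is named RÓŻNE (as the reporter observed) while the lookup uses the name from the error URL, the two names are different sequences of 16-bit values, so a lookup by name cannot match. A minimal Python sketch, assuming the names are compared as the UTF-16 code units NTFS stores; `utf16_units` is a hypothetical helper.

```python
from typing import List


def utf16_units(name: str) -> List[int]:
    """Return the 16-bit UTF-16 code units that represent `name`."""
    raw = name.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


ON_DISK = "RÓŻNE"    # folder name as the reporter sees it on disk
REQUESTED = "RӯNE"   # folder name in the mailbox:// URL from the error

if __name__ == "__main__":
    print([hex(u) for u in utf16_units(ON_DISK)])
    print([hex(u) for u in utf16_units(REQUESTED)])
    print("same name:", utf16_units(ON_DISK) == utf16_units(REQUESTED))
```

The on-disk name is five units (0x52 0xD3 0x17B 0x4E 0x45) and the requested one is four (0x52 0x4EF 0x4E 0x45), so they differ even before case-insensitive comparison comes into play.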
aceman, does this align with your folder work?
Flags: needinfo?(acelists)
I'm not aware of any folder name work. But WADA's explanation seems very reasonable.
Flags: needinfo?(acelists)

Kacper wrote 9/2016 "yes it still occurs, but... only on pop3 accounts". (I haven't tested whether that is still true.)

Keywords: intl
Severity: normal → S3