Open
Bug 1047223
Opened 10 years ago
Updated 2 years ago
When locale encoding LANG is not UTF-8, a file saved with a multibyte filename is not saved validly (two files saved, one empty)
Categories
(Toolkit :: Async Tooling, defect)
Tracking
()
NEW
Tracking | Status | |
---|---|---|
firefox48 | --- | wontfix |
firefox49 | --- | wontfix |
firefox-esr45 | --- | wontfix |
firefox50 | --- | fix-optional |
firefox51 | --- | fix-optional |
People
(Reporter: karma, Unassigned)
References
Details
(Keywords: regression)
Attachments
(1 file)
(deleted),
text/plain
|
Details |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140723165929
Steps to reproduce:
* Set the LANG environment variable to a non-UTF-8 locale (I used "ja_JP.eucJP", the traditional Japanese environment, and also "C" for the test).
* Save a file ("Save page as...", "Save link as...", or clicking a download link) that has a multibyte filename.
Actual results:
Two files are generated (on Firefox 25 or later).
One file has valid contents, but its filename is invalidly encoded in UTF-8.
The other file has size 0, but its filename is validly encoded in EUC-JP.
Expected results:
The file is saved once, with valid contents and a validly encoded filename.
Summary: When LANG is not UTF-8, a file is not saved correctly in a multibyte filenames → When LANG is not UTF-8, a file is not saved validly in a multibyte filenames
Comment 1•10 years ago
Can you provide a testcase, please?
QA Whiteboard: [bugday-20140804]
Flags: needinfo?(karma)
Comment 3•10 years ago
Point at a file with which it happens.
Set LANG=C and save the file using "Save link as..."; the problem then reproduces.
Comment 6•10 years ago
Thanks!
Confirmed that two files appeared, one of them empty.
I've added UTF-8 text to the file, saved it from file://, and got:
* a file with UTF-8 contents, whose name looks like
in UTF-8 "ã??ã?¹ã??.txt" or (ls -b) "ã\302\203\302\206ã\302\202¹ã\302\203\302\210.txt"
in C "\303\243\302\203\302\206\303\243\302\202\302\271\303\243\302\203\302\210.txt"
* an empty file, whose name is
in UTF-8 "テスト.txt"
in C "\343\203\206\343\202\271\343\203\210.txt"
Firefox wouldn't show the name of the file in file:// properly with the C locale.
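The garbled name of the non-empty file is consistent with the UTF-8 bytes of the name being read one byte per character (Latin-1) and then encoded to UTF-8 a second time. The following is a minimal Python sketch of that idea, not code from Firefox: the Latin-1 step is an assumption about the mechanism, and `octal_escape` is a hypothetical helper that only approximates how `ls -b` prints names in the C locale.

```python
def octal_escape(raw: bytes) -> str:
    """Approximate `ls -b` in the C locale: keep printable ASCII, escape the rest as \\ooo."""
    return "".join(
        chr(b) if 0x21 <= b <= 0x7E and b != 0x5C else f"\\{b:03o}"
        for b in raw
    )

name = "テスト.txt"  # test name taken from the listing above

# Encoded once: the bytes of the empty file's name.
utf8_once = name.encode("utf-8")

# Assumed mechanism: each byte is treated as one Latin-1 character,
# and the resulting text is encoded to UTF-8 again.
utf8_twice = utf8_once.decode("latin-1").encode("utf-8")

print(octal_escape(utf8_once))
# \343\203\206\343\202\271\343\203\210.txt
print(octal_escape(utf8_twice))
# \303\243\302\203\302\206\303\243\302\202\302\271\303\243\302\203\302\210.txt
```

Both printed strings match the C-locale names listed above, which supports (but does not prove) the double-encoding explanation.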
Status: UNCONFIRMED → NEW
Component: Untriaged → Internationalization
Ever confirmed: true
Product: Firefox → Core
Summary: When LANG is not UTF-8, a file is not saved validly in a multibyte filenames → When locale encoding LANG is not UTF-8, a file saved with a multibyte filename is not saved validly (two files saved, one empty)
Comment 7•10 years ago
When saved from this page, the file on the left contains the original contents:
$ LANG=en_US.UTF-8 ls -b
ƹÈ.txt ƹ\310.txt
$ LANG=C ls -b
\303\206\302\271\303\210.txt \306\271\310.txt
Comment 9•8 years ago
Here's the regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b2486721572e&tochange=2ab07dec6404
This should be a regression from bug 847863, as the issue disappears on nightly 2013-08-22 after setting browser.download.useJSTransfer to false and restarting.
Blocks: 847863
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox51:
--- → affected
status-firefox-esr45:
--- → affected
Keywords: regression
Comment 10•8 years ago
At least, this bug has existed since bug 899107, which added browser.download.useJSTransfer.
Updated•8 years ago
Flags: needinfo?(paolo.mozmail)
Comment 11•8 years ago
So, after a bit of debugging with arai, it seems
a) something creates a placeholder file name with Latin-1 encoding: that file will remain untouched (empty)
b) the worker started by osfile_async_front.jsm (osfile_sync_worker.jsm) creates a temporary .part file, and then at the end of the download renames it to a file name with UTF-8 encoding
Thus, two files remain: an empty file with a Latin-1 encoded name and a full file with a UTF-8 encoded name.
It looks to me that for user friendliness, the worker should rename to a Latin-1 file name, which would fix the problem.
However, the problem could also be solved by not creating a Latin-1 encoding placeholder (or creating a UTF-8 placeholder): in that case the original file name would be kept, albeit not with user-friendly encoding.
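The sequence in (a) and (b) can be imitated outside Firefox. The sketch below is only a simulation under stated assumptions: `simulate_download` is a hypothetical function, the placeholder is assumed to be created under the raw byte name, and the worker is assumed to see that name as one character per byte before encoding it to UTF-8. None of this is taken from the OS.File or nsIFile sources.

```python
import os
import tempfile

def simulate_download(directory: str, name_bytes: bytes, data: bytes) -> list:
    """Imitate steps (a) and (b) and return the sorted directory listing (as bytes)."""
    dir_b = os.fsencode(directory)

    # (a) Placeholder created under the byte-per-character ("Latin-1") name.
    #     Nothing ever writes to it, so it stays empty.
    placeholder = os.path.join(dir_b, name_bytes)
    open(placeholder, "wb").close()

    # (b) Assumption: the worker receives the same name as text, one character
    #     per byte, and encodes that text to UTF-8 for its own file names.
    worker_name = name_bytes.decode("latin-1").encode("utf-8")
    part = os.path.join(dir_b, worker_name + b".part")
    with open(part, "wb") as fh:
        fh.write(data)

    # End of download: the .part file is renamed to the UTF-8 encoded name,
    # which differs from the placeholder name, so the placeholder survives.
    os.rename(part, os.path.join(dir_b, worker_name))
    return sorted(os.listdir(dir_b))

with tempfile.TemporaryDirectory() as tmp:
    raw = "テスト.txt".encode("utf-8")
    for entry in simulate_download(tmp, raw, b"payload"):
        size = os.path.getsize(os.path.join(os.fsencode(tmp), entry))
        print(entry, size)
```

Under these assumptions the directory ends up with two entries, one empty and one holding the data, which is the symptom reported in this bug. If both steps used the same byte name, the rename would overwrite the placeholder and only one file would remain.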
Comment 12•8 years ago
> However, the problem could also be solved by not creating a Latin-1 encoding
> placeholder (or creating a UTF-8 placeholder): in that case the original
> file name would be kept, albeit not with user-friendly encoding.
The perfect solution: if the "worker" can take into account the LC_CTYPE env variable...
Updated•8 years ago
Comment 13•8 years ago
(In reply to szaszg from comment #12)
> The perfect solution: if the "worker" can take into account the LC_CTYPE env
> variable...
It definitely seems an inconsistency between OS.File and nsIFile. I don't know enough of the Linux architecture to understand if we should honor LC_CTYPE everywhere, or ignore it everywhere.
What I'm concerned about is only whether honoring LC_CTYPE would create cases where we cannot write the file at all, if the target name contains characters not supported by the encoding.
Anyways, since this is actually an OS.File question, moving the bug to Toolkit :: Async Tooling.
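The concern about target names that the locale encoding cannot represent can be checked mechanically. This is a small Python sketch, with `encodable` as a hypothetical helper; the codec names stand in for the encodings that locales such as en_US.UTF-8, ja_JP.eucJP and C would typically imply, which is an assumption rather than something read from the running locale.

```python
def encodable(name: str, codeset: str) -> bool:
    """Return True if every character of `name` can be represented in `codeset`."""
    try:
        name.encode(codeset)
    except UnicodeEncodeError:
        return False
    return True

for codeset in ("utf-8", "euc_jp", "ascii"):
    print(codeset, encodable("テスト.txt", codeset))
# utf-8 True
# euc_jp True
# ascii False
```

So a strict conversion to the locale encoding would fail for this name under LANG=C, which is the case where some fallback would be needed.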
Component: Internationalization → Async Tooling
Flags: needinfo?(paolo.mozmail)
Product: Core → Toolkit
Comment 14•8 years ago
(In reply to :Paolo Amadini from comment #13)
> (In reply to szaszg from comment #12)
> > The perfect solution: if the "worker" can take into account the LC_CTYPE env
> > variable...
>
> It definitely seems an inconsistency between OS.File and nsIFile. I don't
> know enough of the Linux architecture to understand if we should honor
> LC_CTYPE everywhere, or ignore it everywhere.
The problem is that Firefox honors LC_CTYPE in some places, but the "real" writing process insists on UTF-8.
If Firefox uses UTF-8 everywhere, it will be ugly but working.
If Firefox uses LC_CTYPE everywhere, it will be "nice" and working.
But as it stands, it is unusable if we set a locale other than UTF-8...
BTW: http://man7.org/linux/man-pages/man7/locale.7.html
>
> What I'm concerned about is only whether honoring LC_CTYPE would create
> cases where we cannot write the file at all, if the target name contains
> characters not supported by the encoding.
Hmm... We do not really need to be afraid of this... On Linux (Unix) only two characters are forbidden: / (slash, 0x2f) and NUL (0x00). The first is the directory separator, the second is the string terminator. This is independent of the locale. (The file name is entirely independent of the locale: file names are plain C strings, i.e. byte sequences terminated by 0x00.)
In our case Firefox writes a file whose name is full of "not supported" characters (UTF-8 "garbage" in pure ASCII, the C locale). The file is written correctly, but the file name encoding is wrong...
If the target name contains "not supported" characters, Firefox should transliterate them ((lib)iconv//TRANSLIT - https://linux.die.net/man/1/iconv, https://www.gnu.org/software/libiconv/).
Another side effect: if we open a file from the "network" whose file name has non-ASCII (accented) letters, the file is saved with a UTF-8 encoded name (and of course a correctly encoded empty file is created), and Firefox feeds the "application" the "locally encoded" filename... But this file is empty...
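The two points in this comment, that a file name is just bytes without '/' or NUL, and that unrepresentable characters could be substituted rather than rejected, can be sketched in Python. Both functions are hypothetical illustrations. In particular, `errors="replace"` is a crude stand-in for iconv's //TRANSLIT: it substitutes a fixed "?" instead of picking a similar character.

```python
def is_valid_posix_name(raw: bytes) -> bool:
    """A single path component may hold any byte except '/' (0x2f) and NUL (0x00).

    Only the byte-level rule is checked; length limits and the special
    names "." and ".." are ignored here.
    """
    return len(raw) > 0 and b"/" not in raw and b"\x00" not in raw

def to_locale_name(name: str, codeset: str) -> bytes:
    """Encode `name` for `codeset`, replacing unrepresentable characters with '?'."""
    return name.encode(codeset, errors="replace")

for codeset in ("euc_jp", "ascii"):
    raw = to_locale_name("テスト.txt", codeset)
    print(codeset, raw, is_valid_posix_name(raw))
```

With this fallback the name can always be written, at the cost of losing characters in locales such as C; whether that trade-off is acceptable is exactly the open question in this bug.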
Updated•2 years ago
Severity: normal → S3