Open
Bug 56908
Opened 24 years ago
Updated 2 years ago
Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format
Categories
(MailNews Core :: Internationalization, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: marina, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intl, Whiteboard: [patchlove][needs updated patch?])
Attachments
(2 files, 1 obsolete file)
(deleted), image/jpeg
(deleted), patch (jcranmer: review-)
Steps to reproduce:
- invoke a new mail composition;
- attach a file with non-ascii name;
- send and get message;
- now select message and save it in eml format;
- open the saved file in Browser by going File|Open (select file)
// Note: in the body, each non-ASCII character of the attached file's name is
displayed as two separate characters
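The symptom can be reproduced outside the browser. This is a minimal sketch, assuming the saved file holds UTF-8 bytes and the viewer decodes them with a single-byte charset such as ISO-8859-1; the file name is made up for illustration.

```python
# Assumption: the file is written as UTF-8 but read back as ISO-8859-1.
name = "müller.txt"                          # hypothetical attachment name
utf8_bytes = name.encode("utf-8")            # "ü" becomes two bytes: 0xC3 0xBC
misread = utf8_bytes.decode("iso-8859-1")    # every byte becomes one character

print(misread)                   # the single "ü" now appears as two characters
print(len(name), len(misread))   # the misread name is one character longer
```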
Comment 3•24 years ago
Sorry, this isn't happening for rtm.
- rhp
Status: NEW → ASSIGNED
Target Milestone: --- → Future
Comment 4•24 years ago
Comment 5•24 years ago
I don't think that that is a good way to solve this problem. META charset forces
the HTML engine to restart with the new charset. It would be better to start
with the right charset in the first place. In the HTTP world, this is done by
an HTTP Content-Type header. In this case, the mail engine is generating some
HTML, but HTTP might not be involved, so we probably have to pass something to
the HTML engine to make it believe that the HTTP charset has been set. I don't
know the details, but I believe the architecture would be better that way.
Comment 6•24 years ago
I understand what you are saying. But if Mozilla's auto-detect engine were smart
enough, this issue wouldn't occur. Is there a reliable way to tell whether the
encoding is UTF-8?
Comment 7•24 years ago
No no no. Auto-detect is even worse than META charset, architecturally. Take a
look at the APIs for the HTML engine, and see if there is some way to make it
believe that the HTTP charset has been set. We are generating the HTML, so we
don't want to rely on any auto-detect heuristics when consuming that HTML.
Comment 8•24 years ago
If it helps any, the html engine should be getting the content type via the channel
(nsIChannel::GetContentType). Our mime engine sets the content type on the channel
it presents to the html parser. The parser should be using this information
(the same way it gets the content type from the http channel).
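To illustrate the idea of taking the charset from the declared content type instead of guessing, here is a rough sketch in Python. It does not use the nsIChannel API; `charset_from_content_type` is a hypothetical helper built on the standard `email.message.Message` class, and the `us-ascii` fallback is an assumption.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "us-ascii") -> str:
    """Return the charset parameter of a Content-Type value, lower-cased."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns failobj when no charset parameter exists.
    return msg.get_content_charset(failobj=default)


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

A consumer that trusts this value never has to fall back to a META restart or to auto-detection.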
Comment 9•24 years ago
Mass change to bugs filed by marina --> QA contact to marina.
thanks!
QA Contact: momoi → marina
Updated•23 years ago
Status: NEW → ASSIGNED
Comment 11•21 years ago
I understand erik's concern, but I have a different opinion. It's not that
expensive to reset the charset, and it happens every day on the web. There are
a lot of people who believe that the http header should have been given a lower
(not higher) priority than 'meta charset'. Besides, in case eml files are moved
around, including 'charset' information in them is a good thing (TM).
mscott, can we assume that eml files have always been in UTF-8? What
if somebody transcodes them outside Mozilla? Well, that's her responsibility.
So, my question would be whether Mozilla has always used UTF-8 for eml files. If
yes, we may do something in nsIChannel.
OS: Windows NT → All
Hardware: PC → All
Comment 12•20 years ago
All my eml messages with umlauts (char encoding ISO-8859-1 etc) are wrongly
displayed in the browser window. If I manually switch the encoding to UTF-8 the
display is correct. See for example bug 206421.
Comment 13•20 years ago
related is Bug 263850
Updated•20 years ago
Product: MailNews → Core
Comment 14•20 years ago
Change 'eml' in summary to '.eml' for ease of search ('extremly' will hit).
Summary: Non-ascii file name is displayed incorrectly in the browser after being saved in eml format → Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format
Comment 15•20 years ago
The patch here can still be applied. If it's still a problem, we should check
that in. Given that a '.eml' file can be moved around and viewed by a program
other than Mozilla, it should carry 'in-band' information about the character
encoding.
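As a rough sketch of what 'in-band' information could look like in the generated HTML (the patch itself is not shown on this page, so this is not its code), the hypothetical helper below adds a charset declaration when none is present. It assumes the markup really is encoded in the declared charset.

```python
import re


def add_meta_charset(html: str, charset: str = "utf-8") -> str:
    """Insert a <meta charset> declaration unless one is already present."""
    # Matches both <meta charset=...> and the older http-equiv form.
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html
    meta = f'<meta charset="{charset}">'
    # Put the declaration right after <head> when the document has one ...
    head = re.search(r"<head(?:\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        return html[:head.end()] + meta + html[head.end():]
    # ... otherwise at the very start of the document.
    return meta + html


print(add_meta_charset("<html><head><title>t</title></head><body></body></html>"))
```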
Updated•16 years ago
Product: Core → MailNews Core
Updated•16 years ago
QA Contact: marina → i18n
Updated•12 years ago
Assignee: bugzilla → nobody
Status: ASSIGNED → NEW
Comment 16•12 years ago
Makoto Kato, is your patch still needed and good?
Flags: needinfo?(m_kato)
Priority: P3 → --
Whiteboard: [patchlove][needs updated patch?]
Target Milestone: Future → ---
Comment 17•11 years ago
(In reply to Wayne Mery (:wsmwk) from comment #16)
> Makoto Kato, is your patch still needed and good?
This depends on the HTML rendering engine. Gecko detects the file as UTF-8 even when no charset is declared, but IE cannot.
Flags: needinfo?(m_kato)
Comment 18•11 years ago
Tested on Firefox 29, Chrome 34 and IE11. Chrome 34 and IE11 cannot detect the exported HTML as UTF-8, so characters are corrupted in these browsers.
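For reference, one simple way a consumer could guess UTF-8 when no charset is declared is a strict decode. This is only an illustrative heuristic, not a description of how Gecko, Chrome or IE actually detect encodings; `looks_like_utf8` is a hypothetical name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True when the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("ü".encode("utf-8")))        # valid two-byte sequence
print(looks_like_utf8("ü".encode("iso-8859-1")))   # lone 0xFC byte is invalid
```

Note that pure 7-bit data, including ISO-2022-JP text, also passes this check, so a positive answer does not prove the content was meant as UTF-8.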
Comment 19•11 years ago
Attachment #19335 -
Attachment is obsolete: true
Updated•11 years ago
Attachment #8356940 -
Flags: review?(Pidgeot18)
Comment 20•11 years ago
Comment on attachment 8356940 [details] [diff] [review]
rebased patch
Review of attachment 8356940 [details] [diff] [review]:
-----------------------------------------------------------------
First off, this needs a test.
Second off, this patch is wrong. I created a message with an ISO-2022-JP body text (and a non-ASCII filename), and found that the resulting HTML saved the part as ISO-2022-JP instead of UTF-8, so declaring a charset would be liable to break messages that currently work. (That's how bad our charset logic is).
Finally, I'd prefer <meta charset=""> over the http-equiv form.
Attachment #8356940 -
Flags: review?(Pidgeot18) → review-
Comment 21•11 years ago
(In reply to Joshua Cranmer [:jcranmer] from comment #20)
> Comment on attachment 8356940 [details] [diff] [review]
> rebased patch
>
> Review of attachment 8356940 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> First off, this needs a test.
> Second off, this patch is wrong. I created a message with an ISO-2022-JP
> body text (and a non-ASCII filename), and found that the resulting HTML
> saved the part as ISO-2022-JP instead of UTF-8,
When I test this, the HTML is always encoded as UTF-8, not ISO-2022-JP. How did you get it saved as ISO-2022-JP?
Steps
=====
1. Compose a message as ISO-2022-JP.
2. Send and receive it. The message is as follows.
--------------010309080004020102040601
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
--------------010309080004020102040601
Content-Type: image/png;
 name="=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=ISO-2022-JP''%1B%24%42%45%44%1B%28%42%2E%70%6E%67%2E%70%6E%67
3. Save as HTML
Result
======
HTML is encoded as UTF-8.
> be liable to break messages that currently work. (That's how bad our charset
> logic is).
>
> Finally, I'd prefer <meta charset=""> over the http-equiv form.
Should we add DOCTYPE for HTML5, too?
Flags: needinfo?(Pidgeot18)
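The two attachment name headers in the message from comment 21 can be decoded with the Python standard library to check that they carry the same name. This is a sketch for inspection only; the variable names are made up, and the RFC 2231 value is split by hand rather than through a mail library.

```python
from email.header import decode_header
from urllib.parse import unquote_to_bytes

# Values copied from the message headers in comment 21.
rfc2047_name = "=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
rfc2231_filename = "ISO-2022-JP''%1B%24%42%45%44%1B%28%42%2E%70%6E%67%2E%70%6E%67"

# RFC 2047 encoded-word (Content-Type name=): base64 payload plus charset.
raw, charset = decode_header(rfc2047_name)[0]
name_from_2047 = raw.decode(charset)

# RFC 2231 extended value (filename*=): charset'language'percent-encoded bytes.
charset2, _language, encoded = rfc2231_filename.split("'", 2)
name_from_2231 = unquote_to_bytes(encoded).decode(charset2)

print(name_from_2047)
print(name_from_2047 == name_from_2231)
```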
Comment 22•11 years ago
(In reply to Makoto Kato (:m_kato) from comment #21)
> When I test this, HTML always is encoded as UTF-8, not ISO-2022-JP. How do
> you save to ISO-2022-JP?
I had actual Japanese text in the body. The HTML attachment name is saved as UTF-8, while the body text itself was ISO-2022-JP.
> > Finally, I'd prefer <meta charset=""> over the http-equiv form.
>
> Should we add DOCTYPE for HTML5, too?
Probably not. The email-to-HTML conversion predates HTML 4.01 and probably depends on quirks mode in a few places.
Flags: needinfo?(Pidgeot18)
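The mixed-encoding case from comment 22 can be sketched as follows: if the body bytes are ISO-2022-JP and the document is declared as UTF-8, decoding succeeds but the text is wrong. The body string here is a made-up sample, not the one used in the review.

```python
body = "日本語"                           # hypothetical Japanese body text
body_bytes = body.encode("iso2022_jp")   # 7-bit bytes with escape sequences

# ISO-2022-JP is pure 7-bit, so a UTF-8 decode raises no error; it just
# yields the raw escape sequences and ASCII bytes instead of the kanji.
as_utf8 = body_bytes.decode("utf-8")

print(repr(as_utf8))
print(as_utf8 == body)   # the declared charset would break this body
```

This is why a charset declaration is only safe once every part of the saved HTML is written in the same encoding.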
Updated•2 years ago
Severity: normal → S3