Closed Bug 8899 Opened 25 years ago Closed 25 years ago

Yahoo Japan (EUC) Page as attachment cannot be viewed inline

Categories

(MailNews Core :: Internationalization, defect, P1)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: nhottanscp)

References

()

Details

Attachments

(4 files)

** Observed with 6/25/99 Win32 M8 build **

When we send HTML pages using 4.6 or 4.7, some of these pages arrive without the "Content-Disposition: inline" header. From the mail discussion we had today, and from actually using the pref item mail.inline_attachments, the current default seems to be "true". But attachments like the above URL only show up as a link and are not displayed inline. Here's what the headers look like:

--------------F3E7E5FE4178CFD21E1EBEBE
Content-Type: text/html
Content-Transfer-Encoding: base64
Content-Base: "http://home.netscape.com/ja/"
Content-Location: "http://home.netscape.com/ja/"

Note the absence of the Content-Disposition line. Messages which have the CTE line are displayed inline. In 4.x, we actually did not listen to the CTE, but if the "View | View attachment inline" menu is chosen, it shows any "displayable" msg inline even without the CTE header.

1. This is Bug #1 in 5.0, i.e. the pref default is not working when the CTE is absent.
2. Issue #2: In 5.0, we probably should consider listening to the CTE and rely on the menu setting only if the CTE line is absent. Even if we don't enable this menu item till later, we might want to turn on this CTE-honoring in the backend now. Or is there some reason against trusting the CTE? I don't know enough about this issue to know if we should change the 4.x behavior, which ignored the CTE.
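The inline decision described in this report can be sketched as follows. This is only an illustration using Python's standard `email` package, not libmime's code: the function name `should_display_inline` and the `inline_pref` parameter (standing in for mail.inline_attachments) are hypothetical, and the rule "honor the header if present, otherwise use the pref" is the behavior proposed above, not confirmed behavior of any build.

```python
from email import message_from_string

# A part shaped like the one in the report: no Content-Disposition header.
PART_WITHOUT_DISPOSITION = """\
Content-Type: text/html
Content-Transfer-Encoding: base64
Content-Base: "http://home.netscape.com/ja/"
Content-Location: "http://home.netscape.com/ja/"

PGh0bWw+PC9odG1sPg==
"""

# The same part with an explicit disposition asking for attachment handling.
PART_AS_ATTACHMENT = (
    "Content-Disposition: attachment; filename=page.html\n"
    + PART_WITHOUT_DISPOSITION
)


def should_display_inline(part, inline_pref=True):
    """Hypothetical rule: trust the header when present, else use the pref."""
    disposition = part.get_content_disposition()  # None when the header is absent
    if disposition is not None:
        return disposition == "inline"
    # No header: fall back to the pref, and only for displayable (text) parts.
    return inline_pref and part.get_content_maintype() == "text"


no_header = message_from_string(PART_WITHOUT_DISPOSITION)
as_attachment = message_from_string(PART_AS_ATTACHMENT)

print(should_display_inline(no_header))                     # pref decides
print(should_display_inline(no_header, inline_pref=False))  # pref decides
print(should_display_inline(as_attachment))                 # header decides
```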
QA Contact: lchiang → pmock
<update QA contact>
Status: NEW → ASSIGNED
Target Milestone: M8
Actually, I have a guess at what's going on here. Naoki, tell me if this makes sense. This isn't an issue with the "mail.inline_attachments" pref or the Content-Disposition header. They are working as they should, but Gecko is not displaying the output from libmime for the following reason. First, to see that we are outputting the page inline, do the following:

1 - bring up messenger 5.0 and display the problem message
2 - now bring up a 5.0 browser window and load the URL: file:///c:/temp/tempMessage.eml?header=none (note: you probably won't see anything past the URL)
3 - Do a "View Source" - notice how there is source output for the entire web page.

Now, this is what I think is happening. We start decoding everything to UTF-8, and the message and the body part are encoded with charset = "iso-2022-jp". When we hit the web page, we start decoding that part to UTF-8, but there is no charset= on the part, so we fall back to the body's charset, which is iso-2022-jp. Now, we do this conversion and output to Gecko, but the web page has a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=x-sjis"> line. I assume that Gecko is listening to this and trying to display UTF-8 data (which is probably wrong to begin with) as x-sjis. So, the bug about the content-disposition is invalid, but this is a problem we need to figure out. Naoki, do you have any ideas?
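The suspected mismatch (bytes already converted to UTF-8, but a META tag still announcing x-sjis) can be reproduced in miniature. This is a rough sketch in Python rather than anything from Gecko or libmime; the sample string and the `render` helper are made up for illustration, and Python's "shift_jis" codec stands in for x-sjis.

```python
def render(data: bytes, declared_charset: str) -> str:
    """Decode bytes the way a viewer trusting the declared label would."""
    return data.decode(declared_charset, errors="replace")


original = "日本語のページ"            # hypothetical page text
utf8_output = original.encode("utf-8")  # what the converter emits

as_utf8 = render(utf8_output, "utf-8")      # label matches the bytes
as_sjis = render(utf8_output, "shift_jis")  # label left over from the page

print(as_utf8 == original)  # the matching label round-trips
print(as_sjis == original)  # the stale label does not
```

If this is what happens, either the META charset has to be rewritten to match the converted bytes, or the part has to be passed through unconverted so the page's own label stays true.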
Here is the output from libmime for the message body:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
ã..ã..ã.¯SJISã.®ã..ã.¼ã.¸ã.§ã..ã..<br>&nbsp;
<p><A HREF="http://home.netscape.com/ja/">http://home.netscape.com/ja/</A>
</html>
<BASE HREF="http://home.netscape.com/ja/">
<HTML><HEAD><TITLE>Netcenter .Ö.æ.¤.±.»</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=x-sjis">
<META http-equiv=PICS-Label CONTENT='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0)'>
<META http-equiv=PICS-Label CONTENT='(PICS-1.1 "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))'.......
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Actually, view inline is working. The real problem mentioned in this bug is Bug #8903. Since I have that one logged to me, I am going to close this one. - rhp
Summary: View inline default does not work on msgs without Content-Disposition line → Netscape Japanese Home Page as attachment cannot be viewed inline
You're right. The Content-Disposition issue seems to be bogus. I manually removed the inline disposition line from one of the attachment msgs that were displaying OK, and it still displayed OK. Rich, there is a problem with your theory, though: I have many other msgs in which JPN attachments all show OK, and all of them have meta tags like "EUC-JP", "Shift_JIS", and "x-sjis" in addition to the one matching the Japanese mail charset, "ISO-2022-JP". There is no problem showing them inline. I looked at the Content-Base and Content-Location headers for a clue but that did not help. So there is something strange about attaching the Netscape Japanese Home Page -- which is the only page so far showing this problem. Accordingly I modified the summary line. I'm sending you and Naoki a mailbox file which contains a number of messages -- the only one with this problem is the "NetCenter..." one from the Netscape Japanese Home Page. If there is no objection to this, I'll re-open this bug later.
I still think there will be a problem with the fact that we are going to try to convert the attachment to UTF-8: if we don't have a valid charset on that part's Content-Type header, we will drop back to the one for the message itself, and if that isn't specified, we go back to us-ascii. All in all, I think we are creating bogus UTF-8 for this page. - rhp
Naoki recently put in a Japanese auto-detection hack for attachments in case the content-type charset parameter indicates that the main body is in Japanese. My understanding is that this is why all the JPN attachments are showing OK. There will of course be a problem showing any other charset.
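The fallback chain rhp describes (charset on the part, then the charset of the message body, then us-ascii) looks roughly like this. It is a sketch using Python's `email` package to stand in for libmime; `charset_for_part` and the sample message are hypothetical.

```python
from email import message_from_string

RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="B"

--B
Content-Type: text/plain; charset=iso-2022-jp

body text
--B
Content-Type: text/html
Content-Transfer-Encoding: 7bit

<html></html>
--B--
"""


def charset_for_part(part, body_part, default="us-ascii"):
    """Part's own charset, else the main body's, else the default."""
    return (
        part.get_content_charset()
        or body_part.get_content_charset()
        or default
    )


message = message_from_string(RAW_MESSAGE)
body, attachment = message.get_payload()

# The HTML attachment has no charset= parameter, so it inherits the body's.
print(charset_for_part(attachment, body))
```

The inherited value is only a guess about the attachment: a Shift_JIS page attached to an ISO-2022-JP message gets converted with the wrong source charset, which is the "bogus UTF-8" concern.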
Status: RESOLVED → REOPENED
QA Contact: pmock → momoi
Assignee: rhp → nhotta
Status: REOPENED → NEW
Resolution: INVALID → ---
I was able to re-create another msg which shows this problem. Attach this page under HTML mail:

1. http://kaze:20020/xsjis2.html
   This page was made with 2 modifications to the original:
2. http://kaze:20020/xsjis.html
   This latter page is shown inline when sent as an attachment. The former is not.

The differences between these pages are as follows:

A. I changed the Japanese <TITLE> ... </TITLE> in page 2 to the same one as the Netscape Japanese Home Page.
B. I inserted a number of ascii lines in page 1 before we get to the Japanese body part (<PRE> ... </PRE>) -- you see them displayed.

This problem then seems to depend on the type of data in the attached page. It could be that Japanese auto-detection is failing with this kind of page and gets into the condition rhp describes. I'm re-opening the bug, re-assigning it to nhotta, changing the Component to international, and assigning myself as QA Contact.
Component: MIME → Internationalization
Status: NEW → ASSIGNED
Accepting.
I verified that the problem is in the auto-detect implementation. That part is going to be replaced by the new XPCOM interface (nsIStringCharsetDetector) for M8. I will test this bug when I finish that migration. One change for M8 is that the choice of auto-detection will be made by a pref instead of using the main body charset as a hint (you can send comments on this issue to me or to the mozilla i18n newsgroup).
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Target Milestone: M8 → M10
I checked in mime/src/comi18n.cpp, which now uses nsIStringCharsetDetector. Note that the new charset detector also makes incorrect detections (see the attachment). Of the two examples by momoi, the first is not shown inline because of the wrong charset detection. In the second, some lines are detected incorrectly (this one is shown inline, but some lines display incorrectly). So the remaining problem is the accuracy of the charset detectors. Assigning to Frank and setting to M10 (I don't think this problem blocks other testing). Also, there is the issue of whether we should show the attachment as non-inline or show garbage when detection fails, but that should probably be discussed separately.
*** Bug 8903 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
I have no idea what these three examples in the attachment mean. Are they EUC-JP data or Shift_JIS data?
They are all Shift_JIS, from two URLs: http://kaze:20020/xsjis2.html and http://kaze:20020/xsjis.html. The second result was correct. Both the first and the third got wrong detection results.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Target Milestone: M10 → M9
Checked in the fix. Please verify.
Status: RESOLVED → REOPENED
** Checked with 7/15/99 Win32 M9 Build ** I was expecting to see the attached page displayed inline, but that did not turn out to be the case. Also, when I clicked on the link, the page did not display as Japanese either. Whatever change you made is not having the desired effect. Re-opening it.
Resolution: FIXED → ---
Clearing Fixed resolution due to ReOpen of this bug.
Status: REOPENED → ASSIGNED
I need to change the cp1252 verifier to a better one.
Put in a temp fix by removing the UCS2BE, UCS2LE, and CP1252 verifiers from the string-based version. Need to create a better CP1252 verifier for this case. Naoki, if the temp fix works, then please DO NOT CLOSE THE BUG, but move it to M10. I want to put in a better CP1252 verifier for this. Thanks
Severity: normal → blocker
Priority: P3 → P1
The new module seems to be working better with the Browser, though it fails on an un-labeled ISO-2022-JP page. On the Mail side, this causes a crash with all the attachment test cases (ISO-2022-JP, EUC-JP, and Shift_JIS). Basically, as Messenger begins to load an attachment, it crashes. Here's part of what I sent to Talkback.

Trigger Type: Program Crash
Trigger Reason: Access violation
Call Stack: (Signature = nsXPCOMStringDetector::Report f121549c)
nsXPCOMStringDetector::Report [d:\builds\seamonkey\mozilla\intl\chardet\src\nsPSMDetectors.cpp, line 421]
MimeCharsetConverterClass::Convert [d:\builds\seamonkey\mozilla\mailnews\mime\src\comi18n.cpp, line 1410]
MIME_ConvertCharset [d:\builds\seamonkey\mozilla\mailnews\mime\src\comi18n.cpp, line 1549]
mime_convert_charset [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemoz2.cpp, line 160]
MimeInlineText_rotate_convert_and_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimetext.cpp, line 292]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeInlineText_parse_decoded_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimetext.cpp, line 237]
mime_decode_base64_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeenc.cpp, line 300]
MimeDecoderWrite [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeenc.cpp, line 603]
MimeLeaf_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeleaf.cpp, line 149]
MimeMultipart_parse_child_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemult.cpp, line 538]
MimeMultipart_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemult.cpp, line 207]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeObject_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeobj.cpp, line 220]
MimeMessage_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemsg.cpp, line 172]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeObject_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeobj.cpp, line 220]
MimeMessage_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemsg.cpp, line 110]
MimePluginInstance::Write [d:\builds\seamonkey\mozilla\mailnews\mime\src\plugin_inst.cpp, line 371]
plugin_stream_write [d:\builds\seamonkey\mozilla\network\cnvts\cvplugin.cpp, line 69]
net_read_file_chunk [d:\builds\seamonkey\mozilla\network\protocol\file\mkfile.c, line 964]
net_ProcessFile [d:\builds\seamonkey\mozilla\network\protocol\file\mkfile.c, line 1328]
NET_ProcessNet [d:\builds\seamonkey\mozilla\network\main\mkgeturl.c, line 3363]
ntdll.dll + 0x74fd (0x77f674fd)
0x0010c200
If you want to see the full reports, you can view them here by clicking on the Bug number on this page. http://cyclone/reports/reporttemplate.cfm?style=1&reportID=1099
In my local build, I don't see the crash. Instead libmime got result code 0x02be1fb0 after the DoIt() call. Libmime uses the main body's charset in this case (ISO-2022-JP) and the attachment is displayed incorrectly or as a link.
The crash does not occur if I don't explicitly have the following line in prefs50.js: user_pref("intl.charset.detector", "japsm"); But then I only get the ISO-2022-JP attachment showing correctly. I thought we defaulted to the Japanese detection module if no detector is defined in prefs50.js. Or has that been changed?
>I thought we defaulted to the Japanese detection module if >no prefs50.js is defined for detector. Or has that been >changed? That's a bug in my code. It didn't fall back to 'japsm' from the beginning. Please file a separate bug for that. I can implement fall back to 'japsm' or do no charset detection (maybe this is better).
Depends on: 10605
Per ftang's request, the crash part of the bug was split into Bug 10605. The current bug cannot be verified until this new bug is fixed. The dependency is also marked.
I have fixed 10605. Please verify again. Thanks
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
Target Milestone: M9 → M10
With the 7/29/99 Win32 (Necko) build, JIS and Shift_JIS attachments can now be viewed inline. EUC pages don't load inline, however. EUC detection is working on the Browser side with the JPN detector turned on. So this seems to be a mail-specific issue. Sending it to Naoki with Target Milestone M10.
Severity: blocker → critical
Downgraded severity to critical.
Status: NEW → ASSIGNED
> EUC pages don't load inline, however. EUC detection > is working on the Browser side with the JPN detector turned on. In messenger, charset detection is done for each line. Momoi san, which EUC page is failing? I could try to break it to htmls with only 1 line then see if they works in the browser.
Summary: Netscape Japanese Home Page as attachment cannot be viewed inline → Yahoo Japan (EUC) Page as attachment cannot be viewed inline
Changed the title since the Netscape Japan page is now viewable. For the EUC page, I found that it works if I save the Yahoo Japan page as text and attach it to a mail. This means some combination of EUC characters with HTML tags may cause the problem. I will break the page into individual lines and then investigate.
Attached file One line EUC text file. (deleted) —
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
I created a text file by retrieving one line from the Yahoo Japan page. It is viewable with 4.5 auto-detect on but not viewable with my local build (pulled 8/10). I think this is the reason we cannot view EUC attachments, because the detection is applied line by line. Reassigning to Frank since this is a generic (not mail-specific) charset detection problem.
BTW, I fixed a problem in the Windows native detector so it works with the browser. user_pref("intl.charset.detector", "jams"); That detects the EUC file (attached) correctly.
Assignee: ftang → nhotta
No, this is not a generic problem, since only mail sends data to the detector line by line. Please fix it either by changing libmime to send more data (say, the whole file) to the detector, or by keeping the last detected value somewhere. The time spent on improving the detection algorithm for < 80 bytes would be much longer than the time spent fixing libmime.
But I think the new detector should be at least as accurate as 4.X. We may need separate bugs for the detector accuracy and the libmime issue. Frank, what do you think?
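The options being weighed here (detect each line on its own, remember the last detected value, or hand the detector the whole attachment) can be compared with a toy detector. Everything below is a hypothetical sketch: `detect` is a simple trial-decode heuristic written for this example, not the japsm, jams, or 4.x detector, and because it tries EUC-JP before Shift_JIS it will misjudge Shift_JIS data that happens to be valid EUC-JP.

```python
def detect(data: bytes):
    """Toy detector: returns a charset name, or None if there is no evidence."""
    if b"\x1b$B" in data or b"\x1b$@" in data:
        return "iso-2022-jp"          # JIS escape sequences are unambiguous
    if all(byte < 0x80 for byte in data):
        return None                   # pure ASCII says nothing about the charset
    for name in ("euc_jp", "shift_jis"):
        try:
            data.decode(name)         # strict decode as a validity check
        except UnicodeDecodeError:
            continue
        return name
    return None


def detect_per_line(lines):
    """Each line judged alone, as when data is fed line by line."""
    return [detect(line) for line in lines]


def detect_sticky(lines):
    """Per-line detection that keeps the last detected value."""
    last, results = None, []
    for line in lines:
        last = detect(line) or last
        results.append(last)
    return results


def detect_whole(lines):
    """One decision for the whole attachment."""
    return detect(b"\n".join(lines))


SJIS_LINES = [s.encode("shift_jis")
              for s in ("<html>", "<title>日本語</title>", "<p>text</p>")]

print(detect_per_line(SJIS_LINES))
print(detect_sticky(SJIS_LINES))
print(detect_whole(SJIS_LINES))
```

With this toy input, the per-line and sticky variants have no answer for the leading ASCII-only line, while the whole-buffer variant makes a single decision that covers every line.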
> or keep the last detected vaule somewhere I don't think this is good because the user will see garbage lines until the detector succeeds. > improving the detecting algorithm for < 80 bytes will be much longer How about using the old 4.X detector for those cases. The old code may be ugly but it was tuned for Japanese HTML (both web and attachments)? We may port it to COM and make it a separate DLL then japsm may call it if <80 bytes.
Status: NEW → ASSIGNED
I plan to make the following changes for M10. Port the 4.x ja detector to XPCOM and check it in to intl/chardet/src/classic. Add a new pref, "mail.charset.detector"; libmime will use it when it is specified, otherwise it will use "intl.charset.detector".
>Port 4.x ja detector to XPCOM and check in to intl/chardet/src/classic. This is done. Can be specified by user_pref("intl.charset.detector", "jaclassic"); Name of the DLL is "chardetc" and this is windows only (for now). >A new pref "mail.charset.detector", libmime will use it when it specified >otherwise it will use "intl.charset.detector". Not done yet.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I checked in mailnews/mime/src/comi18n.cpp rev 1.36. user_pref("mail.charset.detector", "jaclassic"); I can view the Yahoo Japan attachment with the above pref setting. If that's not specified, then "intl.charset.detector" is used. If neither of those prefs is specified (the default), then no charset detection will happen. I filed a separate bug for the libmime data passing issue (feeding data line by line) as #12481. Any charset-detector-specific bug (i.e. one that can be reproduced in the browser) should be filed separately. Marking as FIXED.
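The pref lookup order described in this check-in can be written down as a small sketch. The pref names come from the comment; the `resolve_detector` function and the plain-dict pref store are hypothetical stand-ins, not the code in comi18n.cpp.

```python
def resolve_detector(prefs):
    """Return the detector name libmime would use, or None for no detection.

    Order: mail.charset.detector, then intl.charset.detector, then nothing.
    """
    return (
        prefs.get("mail.charset.detector")
        or prefs.get("intl.charset.detector")
        or None
    )


# Mail-specific pref wins over the general one.
print(resolve_detector({"mail.charset.detector": "jaclassic",
                        "intl.charset.detector": "japsm"}))
# Only the general pref is set.
print(resolve_detector({"intl.charset.detector": "japsm"}))
# Default profile: neither pref set, so no detection happens.
print(resolve_detector({}))
```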
Status: RESOLVED → VERIFIED
** Checked with 9/16/99 Win32 M11 build ** I looked at "japsm", "jaclassic", and "jams". They were all able to show JIS, EUC_JP and SJIS attachments inline when specified in mail.charset.detector. However, as expected, only "jaclassic" displayed the EUC attachment correctly. We need to either release-note this or make it the default for M11 (if that is possible). Marking it verified/fixed.
Product: MailNews → Core
Product: Core → MailNews Core