Closed Bug 57164 Opened 24 years ago Closed 23 years ago

[charset]loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]

Categories

(Core :: XML, defect, P3)

x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: ftang, Assigned: bzbarsky)

References

()

Details

(Keywords: css2, relnote, Whiteboard: relnote-devel (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset))

Attachments

(7 files)

According to "Jun '99: W3C Recommendation: Associating stylesheets with XML documents" (http://www.w3.org/TR/xml-stylesheet/):

.... The following pseudo attributes are defined .... charset CDATA #IMPLIED ....

This means we should honor the charset pseudo-attribute of the <?xml-stylesheet ?> processing instruction when loading the stylesheet. Currently we don't.

Reproduce procedure:
1. Visit http://ftang/ftang/css2/kanji/bug.xml
2. Visit http://ftnag/ftang/css2/kanji/correct.xml

They should look the same.

bug.xml uses bug.css and gives the charset information in the <?xml-stylesheet ?> tag (as charset="Shift_JIS").
correct.xml uses correct.css and gives the charset information on the first line of the CSS file, using @charset "Shift_JIS";

correct.xml currently works, since @charset is already implemented. <?xml-stylesheet charset="Shift_JIS" ?> does not work yet.

I think this is not important for Netscape6 RTM, but it would be nice if we could fix it right after.
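To make the report concrete, here is a minimal Python sketch of how the pseudo-attributes of an <?xml-stylesheet ?> processing instruction could be read, including charset. It is an illustration only, not Mozilla's parser: the function name stylesheet_pseudo_attrs and the regular expressions are assumptions made for this example, and entity references inside values are not handled.

```python
import re

# Matches the data part of each <?xml-stylesheet ... ?> processing instruction.
PI_RE = re.compile(r"<\?xml-stylesheet\s+(.*?)\?>", re.DOTALL)
# Pseudo-attributes use attribute-like syntax: name="value" or name='value'.
PSEUDO_ATTR_RE = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def stylesheet_pseudo_attrs(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet PI (hypothetical helper)."""
    sheets = []
    for pi in PI_RE.finditer(xml_text):
        attrs = {}
        for name, double_quoted, single_quoted in PSEUDO_ATTR_RE.findall(pi.group(1)):
            attrs[name] = double_quoted or single_quoted
        sheets.append(attrs)
    return sheets


bug_xml = '<?xml-stylesheet href="bug.css" type="text/css" charset="Shift_JIS"?>\n<doc/>'
print(stylesheet_pseudo_attrs(bug_xml)[0].get("charset"))
```

In the bug.xml case the charset pseudo-attribute is present and should be used to decode bug.css; in the correct.xml case it is absent and the @charset rule inside the stylesheet carries the information instead.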
Future.
Target Milestone: --- → Future
Frank: Nice catch! I'm assuming your server does not return authoritative character set information, e.g. in the Content-Type field? If it does, that would override the charset pseudo-attribute of the stylesheet PI.

RELEASE NOTE ITEM: Mozilla currently does not support the 'charset' pseudo-attribute of the XML Stylesheet Linking PI. Workaround: Use the CSS2 @charset rule to specify the encoding as the first rule in your stylesheet.
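The override described above can be sketched in Python as follows. This is a hedged illustration rather than the browser's code: http_charset and effective_charset are hypothetical helper names, and the standard library's email.message.Message is used only as a convenient Content-Type parser.

```python
from email.message import Message


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    return msg.get_content_charset()


def effective_charset(content_type, pi_charset):
    """Authoritative HTTP information wins over the PI's charset pseudo-attribute."""
    return http_charset(content_type) or pi_charset


print(effective_charset("text/css", "Shift_JIS"))
print(effective_charset("text/css; charset=UTF-8", "Shift_JIS"))
```

If the server sends only "text/css", the pseudo-attribute value is used; if it sends a charset parameter, that value takes precedence.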
Component: Style System → XML
Whiteboard: (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset)
I don't think we need to fix this at this time. It would be nice if we could fix it as soon as possible after RTM.
Whiteboard: (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset) → relnote-devel (py8ieh: fodder for XML importtest) (py8ieh: attach testcases) (py8ieh: also check text/css charset parameter) (py8ieh: check HTML <link> and HTTP "Link:" also support charset)
QA Contact: chrisd → petersen
Nom. nsbeta1 on grounds of standards compliance correctness and enablement of international content.
Keywords: nsbeta1
Frank, could you attach the two testcases to the bug report? Thanks.
Reassigned to ftang. Frank, please attach the two testcases to the bug report and reassign the bug to me. Related bugs are bug 66190 and bug 63502.
Assignee: pierre → ftang
Target Milestone: Future → ---
Assignee: ftang → pierre
Summary: loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter → loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]
Frank, please attach the pages, as there are people outside of Netscape who might be interested in this bug.
Attached file ftang's bug.xml (deleted)
Attached file ftang's correct.xml (deleted)
Attached file ftang's bug.css (deleted)
Attached file ftang's correct.css (deleted)
Boris: another charset bug... Do you want to take it?
Target Milestone: --- → mozilla1.0
um.... Let me wrap up my other ones first.. I have no idea where to even start on this one. But I'll keep it in mind. :)
Using build 2001100903 (win32), both testcases do not work. Is @charset also broken?
Blocks: 104166
I think files were converted when uploaded with non-Japanese browser encoding or something like that. I zipped up the original 4 files and attached it above. Unarchive the file with WinZip and you should see the @charset working with correct.xml & correct.css files.
Keywords: nsbeta1
ok, with the zip attachment it works.
Summary: loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT] → [charset]loading stylesheet in xml by using <?xml-stylesheet do not listen to charset parameter [IMPORT]
Frank, I have the fix for bug 72658 in my tree, so the included testcase worksforme (since the document charset and the stylesheet charset are the same). I can disable that code while I work on this, but could you possibly create a stylesheet in a _different_ charset from the document for testing and verification purposes?
Explanation of test cases from the "attachcomments.txt" file:

The following tests should be conducted with the default browser encoding set to Western (ISO-8859-1) under Edit | Prefs | Navigator | Languages. Auto-detection must be OFF.

** For test cases 1-4: element names are in Japanese; all XML & CSS files are in Shift_JIS Japanese.

1. shiftjisA.xml/shiftjisa.css -- stylesheet charset; no @charset in .css file. (The patch for 72658 or the patch for this bug should work.)
2. shiftjisB.xml/shiftjisb.css -- no stylesheet charset; @charset in .css file. (Should work now with no patches.)
3. shiftjisC.xml/shiftjisc.css -- stylesheet charset; @charset in .css file. (Should work now with no patches.)
4. shiftjisD.xml/shiftjisd.css -- no stylesheet charset; no @charset in .css file. (Only the patch for 72658 can fix this problem.)

** All CSS files in the following tests are encoded in UTF-8. XML files are either in Shift_JIS Japanese or UTF-8.

5. utf8a.xml/utf8a.css -- XML in Shift_JIS; stylesheet charset=UTF-8; no @charset in .css file. (Color style works because the element names are in ASCII. Character display is incorrect. Only the patch for this bug can fix the latter problem.)
6. utf8b.xml/utf8b.css -- XML in Shift_JIS; stylesheet charset=UTF-8; no @charset in .css file. (NO styling applied. Unlike 5, element names are UTF-8 Japanese in the .css file. Only the patch for this bug can fix it.)
7. utf8c.xml/utf8c.css -- XML in Shift_JIS; no stylesheet charset; @charset exists in .css file. Element names in UTF-8 Japanese in the .css file. (This should work now without any patches.)
8. utf8d.xml/utf8d.css -- XML doc in UTF-8 but no encoding declaration; no stylesheet charset; no @charset in .css file. Element names in UTF-8 Japanese in the .css file. (Only the patch for 72658 can fix this problem.)
9. utf8e.xml/utf8e.css -- XML doc in UTF-8 but no encoding declaration; no stylesheet charset; @charset=UTF-8 in .css file. Element names in UTF-8 Japanese in the .css file. (This should work now without any patches.)

Test cases 5 & 6 can be viewed correctly only with the fix for this bug. Test cases 4 & 8 can be viewed correctly only with the fix for Bug 72658. Test case 1 can be fixed with the patch for this bug or for Bug 72658.

** These test cases also show that Mozilla can handle non-ASCII element names in CSS definitions. (IE6 currently cannot.) Mozilla can also handle non-ASCII attribute names, values, and IDs in CSS definitions, but these are not in the current test cases.
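The difference between test cases 5 and 6 can be illustrated with a small Python sketch: an ASCII selector survives decoding with the wrong charset, while a Japanese selector stored as UTF-8 no longer matches the element name once the bytes are decoded as Shift_JIS. The element names "title" and "題名" and the helper selector_seen_by_parser are made up for this illustration; they are not taken from the attached files.

```python
def selector_seen_by_parser(css_bytes, assumed_charset):
    """Decode a stylesheet with the charset the loader assumed; return the first selector."""
    text = css_bytes.decode(assumed_charset, errors="replace")
    return text.split("{", 1)[0].strip()


# Both hypothetical sheets are stored as UTF-8, as in test cases 5-9.
ascii_sheet = "title { color: red }".encode("utf-8")
kanji_sheet = "題名 { color: red }".encode("utf-8")

# Case 5 analogue: ASCII element names still match under the wrong charset.
print(selector_seen_by_parser(ascii_sheet, "shift_jis") == "title")
# Case 6 analogue: the Japanese selector only matches when decoded as UTF-8.
print(selector_seen_by_parser(kanji_sheet, "utf-8") == "題名")
print(selector_seen_by_parser(kanji_sheet, "shift_jis") == "題名")
```

This is why a stylesheet with ASCII selectors can appear partly styled (case 5), while one with non-ASCII selectors gets no styling at all (case 6) until the charset from the stylesheet PI is honored.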
Thanks for the testcases! My build currently passes all of them except utf8d.xml/utf8d.css. At a guess, this is because the stylesheet is loaded _before_ we've done charset sniffing on the XML document (I assume that's how we get the XML doc's charset). In particular, we ask the document for its charset in that case and the document tells us that it's in ISO-8859-1....
I'll take this one after all... :) Patch fixes this and also bug 72658 and bug 83207
Assignee: pierre → bzbarsky
Keywords: patch, review
The comment below is not correct, since the default charset of an XML document is UTF-8. I would advise deleting ", falling back to ISO-8859-1".

+ // NOTE: the SetCharset method will always get the preferred
+ // charset from the charset passed in unless it is the
+ // emptystring, which causes the default charset (that of the
+ // document, falling back to ISO-8859-1) to be set
Hmm.. Perhaps I should clarify that to: "that of the document, falling back to ISO-8859-1 if no document is present" But that being said, would UTF-8 be a more reasonable fallback for the default charset if we have absolutely no other way of getting it?
Blocks: 83207
The default charset for XML is UTF-8; I have no idea what the default charset for CSS would be. See if the spec has anything to say. If not, I think ISO-8859-1 is good for CSS.
According to http://www.w3.org/TR/REC-CSS2/syndata.html#q23

<quote>
When a style sheet resides in a separate file, user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.
2. The @charset at-rule.
3. Mechanisms of the language of the referencing document (e.g., in HTML, the "charset" attribute of the LINK element).
</quote>
Yes. And

4. Use the document's character encoding.

What's the fallback in case all of 1-4 fail, though? (Yes, we _do_ have a case in which this is necessary due to other issues that are sort of outside the scope of this bug, imo.)
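The priority order discussed here (the three CSS2 sources plus the referencing document's encoding as a fourth) can be written as a short Python sketch. It is not the nsCSSLoader implementation: the function and parameter names are hypothetical, and the ISO-8859-1 default is an assumption taken from the patch comment quoted earlier in this thread.

```python
def resolve_stylesheet_charset(http_charset=None, at_charset=None,
                               link_charset=None, document_charset=None,
                               fallback="ISO-8859-1"):
    """Pick the first available charset source, from highest priority to lowest."""
    candidates = (
        http_charset,      # 1. HTTP Content-Type charset parameter
        at_charset,        # 2. @charset at-rule in the stylesheet
        link_charset,      # 3. linking mechanism, e.g. the PI's charset pseudo-attribute
        document_charset,  # 4. encoding of the referencing document
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # 5. Nothing known at all (assumed default, see lead-in).
    return fallback


# utf8a/utf8b style case: only the PI names a charset, the document is Shift_JIS.
print(resolve_stylesheet_charset(link_charset="UTF-8", document_charset="Shift_JIS"))
# utf8d style case: nothing is labeled, so the document encoding propagates.
print(resolve_stylesheet_charset(document_charset="UTF-8"))
```

An empty string is treated the same as a missing value, so an unset source simply falls through to the next one.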
Oops! Quoted the wrong part. Wanted to quote this part:

<quote>
For transmission and storage, these characters must be encoded by a character encoding that supports the set of characters available in US-ASCII (e.g., ISO 8859-x, SHIFT JIS, etc.).
</quote>

It doesn't say what the default should be, though. I'm wondering if it should be the same as however Mozilla treats HTML pages?
Ok.... tracing through the code, the _only_ time that we actually need that #5 fallback is when we are loading the agent sheets. There are ways to restructure the code that would make this fallback unnecessary, as I said. Not going to do it as part of this patch. But our internal sheets are fine in ISO-8859-1. So we can just leave it at that. So, with my proposed change to that comment, reviews?
> What's the fallback in case all of 1-4 fail, though?
> (yes, we _do_ have a case in which this is necessary
> due to other issues that are sort of outside the
> scope of this bug, imo).

I meant testcase #8 to prove that whatever document encoding the browser has determined should propagate into CSS files that are unlabeled (in terms of charset/encoding). You should just check what the final document encoding is and then use that for CSS, too. My intent was that that encoding should be UTF-8, as required by XML 1.0, NOT ISO-8859-1.
Yep. That's what my patch does. The XML document was actually reporting its own encoding incorrectly. That's what my change to nsXMLDocument.cpp fixes.
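As a rough illustration of the behavior testcase #8 expects, the following Python sketch reports the encoding named in an XML declaration and otherwise defaults to UTF-8. It is a simplification and not the nsXMLDocument.cpp change: the helper name is hypothetical, and byte-order marks and UTF-16 detection are deliberately ignored.

```python
import re

# Encoding declaration inside the XML declaration at the very start of the file.
XML_DECL_ENCODING_RE = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_document_charset(raw):
    """Return the declared encoding of an XML byte stream, defaulting to UTF-8."""
    match = XML_DECL_ENCODING_RE.match(raw)
    if match:
        return match.group(1).decode("ascii")
    # No encoding declaration: assume UTF-8 (BOM/UTF-16 sniffing omitted here).
    return "UTF-8"


print(xml_document_charset(b'<?xml version="1.0" encoding="Shift_JIS"?><doc/>'))
print(xml_document_charset(b'<?xml version="1.0"?><doc/>'))
```

With a document that reports UTF-8 in the undeclared case, an unlabeled stylesheet such as utf8d.css inherits UTF-8 rather than ISO-8859-1.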
Replace fprintf(stderr) with a debug macro. We have a couple of other instances of 'stderr' in nsCSSLoader that need to be removed.

Rename parameters to OnStreamComplete() as "a-Uppercase" (i.e. aContext, aString...)

Why in CSSLoaderImpl::SetCharset() do you look for "@charset" in strStyleDataUndecoded instead of in aStyleSheetData directly?
Comment on attachment 53578: Proposed patch (works correctly on all of the attached testcases)
r=pierre with minor changes above
Attachment #53578 - Flags: review+
Oops. the stderr was not meant to be in there at all. removed. :) Parameters renamed. aStyleSheetData is a char*. It used to be a nsString, but I've changed that... basically, the creation of the nsString moved from OnStreamComplete to SetCharset(). bug 80106 will address further improvements to how we parse @charset; that's what I plan to work on once this is done... Or did I misunderstand the comment?
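Looking for @charset in undecoded data can be sketched in Python as below. This is only an illustration of the general idea, not the SetCharset() code from the patch: the helper name and regular expression are assumptions, and the approach only works for ASCII-compatible encodings, since the rule itself has to be readable before the charset is known.

```python
import re

# CSS2 requires @charset to be the very first thing in the stylesheet.
AT_CHARSET_RE = re.compile(rb'^@charset\s+"([^"]*)"\s*;')


def sniff_at_charset(raw):
    """Return the charset named by a leading @charset rule in raw bytes, or None."""
    match = AT_CHARSET_RE.match(raw)
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace")


print(sniff_at_charset(b'@charset "Shift_JIS";\nbody { color: red }'))
print(sniff_at_charset(b'body { color: red }'))
```

A rule that appears anywhere other than the start of the data is ignored, which matches the workaround in the release note ("as the first rule in your stylesheet").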
Comment on attachment 53578: Proposed patch (works correctly on all of the attached testcases)
sr=attinasi
Attachment #53578 - Flags: superreview+
Checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
QA Contact: petersen → rakeshmishra