Closed Bug 255820 Opened 20 years ago Closed 18 years ago

[FIX]Reloading document.written document screws up its charset

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9alpha5

People

(Reporter: arielladog, Assigned: bzbarsky)

References

Details

Attachments

(5 files, 1 obsolete file)

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040816 Create an iframe and update its contents with some HTML, that includes the following code: <META http-equiv=Content-Type content="text/html; charset=us-ascii"> Then any external style-sheets on the iframe document don't get applied. Reproducible: Always Steps to Reproduce: 1. Create an external style sheet (in example below demoPrint.css) 2. Run the following code: x = document.createElement("iframe"); document.body.appendChild(x); var res = '<HTML><HEAD>'; res += '<link rel="stylesheet" type="text/css" href="demoPrint.css"/>'; res += '<META http-equiv=Content-Type content="text/html; charset=us- ascii">'; res += '</HEAD><BODY>asdfjljadf</BODY></HTML>'; var d = x.contentDocument; d.open('text/html'); d.write(res); d.close(); Actual Results: Your external style sheet doesn't get applied Expected Results: You'd expect your external style-sheet to get applied If you don't use an external style-sheet the CSS comes out fine. Also, if you get rid of the charset=us-ascii, the external style-sheet is applied.
BTW, if the content isn't dynamic (an iframe loading a document), then the external style-sheet seems to get applied. This bug probably doesn't affect many people, but it took me a long time to figure out exactly what was going wrong and is very annoying.
Assignee: nobody → dbaron
Component: Layout: HTML Frames → Style System (CSS)
QA Contact: core.layout.html-frames → ian
Do you have a testcase that demonstrates the problem? Does the stylesheet have non-ASCII characters in it?
I don't know how to do a testcase with an external stylesheet and put it up here. ';ve just been using something real simple in demoPrint.css: body{ background:blue; } Again, as an inline style it works fine, but as an external one, it doesn't.
(In reply to comment #3) > I don't know how to do a testcase with an external stylesheet and put it up > here. ';ve just been using something real simple in demoPrint.css: two steps: upload your CSS as attachment: http://bugzilla.mozilla.org/attachment.cgi?bugid=255820&action=enter then get the address of the attachment and use it in your HTML, the upload HTML as attachment. The box for creating attachments is just above the box where you write your comments.
Attached file External CSS File (deleted) —
Attached file Test Case (obsolete) (deleted) —
This is the testcase discussed.
Attached file Better Test Case (deleted) —
This is a better test-case because it includes three test cases: 1. With charset=us-ascii and external style-sheet (doesn't work) 2. With external style-sheet and without charset=us-ascii (works) 3. With charset=us-ascii and internal CSS (works)
Attachment #156318 - Attachment is obsolete: true
Can anyone confirm that they are seeing the same results and that there's nothing wrong with the test case?
anyone?
Attached file Minimal-ish testcase (deleted) —
In this testcase, the document should be reported as "ISO-8859-1", and instead it's being reported as UTF-16. With the sheet testcases, that causes the sheet to be parsed as UTF-16, which naturally fails.
More data: 1) Using "ISO-8859-1" in the charset meta tag gives me ISO-8859-1. Using "UTF-8" gives me UTF-16. Using a bogus encoding (eg "us-asciii") gives me ISO-8859-1. 2) This happens whenever the <meta> tag is preceded by a parser-blocking tag (eg <script> or <style>). Note that it's considered good practice to have the charset meta tag as the first thing in the head, since otherwise behavior is not well-defined. 3) Doing this "normally" (not via document.write) doesn't have this charset problem. I'm betting the UTF-16 comes from a wyciwyg channel somehow. I'm still looking into why we end up with a wyciwyg channel (the meta charset observer has code to prevent reloading when we're in document.write, after all).
There are three "issues" (the third one is not an issue) going on here: 1) If the parser blocks (like in these cases), then by the time we get to the meta tag in the content sink the document is no longer writing. So we don't set the DOCUMENT_WRITE flag in HTMLContentSink::NotifyTagObservers and the meta tag ends up triggering a document reload. We should probably fix this. 2) When we reload, we end up using the charset of the wyciwyg channel instead of the meta charset. This is sorta correct, since we're parsing wyciwyg data at that point... but screws up this case. Perhaps we should save the wyciwyg data in the document-native encoding (and save the encoding name in the cache entry)? 3) The reason ISO-8859-1 works (while us-ascii does not) is that the meta charset observer doesn't do a reload if the charset is the same as the charset that's already set, and ISO-8859-1 is what we default to, with my intl settings. Note that none of this is an issue for documents which follow the HTML spec's recommendation of having the <meta> charset tag before any other content. Over to DOM for issues #1 and #2. We need to fix them both, I think...
Assignee: dbaron → general
Status: UNCONFIRMED → NEW
Component: Style System (CSS) → DOM: Level 0
Ever confirmed: true
2) does not work, because you can document.write characters outside of the document encoding... like document.write("\u1234");
Then we somehow need to set the DocumentCharacterSet() to the original charset, I think (while parsing as UTF-16).
Blocks: 376760
Attached patch Proposed fix (deleted) — Splinter Review
As advertised. This basically stores the charset and source as needed in the wyciwyg cache entry... We need more tests for all this charset stuff in nsHTMLDocument. :(
Assignee: general → bzbarsky
Status: NEW → ASSIGNED
Attachment #264079 - Flags: superreview?(jst)
Attachment #264079 - Flags: review?(jst)
OS: Windows XP → All
Priority: -- → P2
Hardware: PC → All
Summary: Inserting dynamic content into <iframe> with charset=us-ascii causes external style sheets to not be applied → [FIX]Reloading document.written document screws up its charset
Target Milestone: --- → mozilla1.9alpha5
Comment on attachment 264079 [details] [diff] [review] Proposed fix r+sr=jst
Attachment #264079 - Flags: superreview?(jst)
Attachment #264079 - Flags: superreview+
Attachment #264079 - Flags: review?(jst)
Attachment #264079 - Flags: review+
Checked in, with these two lines added to the IsXHTML() branch after TryChannelCharset: parserCharsetSource = charsetSource; parserCharset = charset; I should write a test for that...
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Checked in the test for that as well.
How does the document.written() charset end up on the wyciwyg channel? By setting a breakpoint in nsWyciwygChannel::SetCharsetAndSource, I don't see a stack document.write() on it using the old parser. (I must be missing something obvious but what?)
> I don't see a stack document.write() on it using the old parser. I'm not sure what you mean... The SetCharsetAndSource call happens, with the old parser, when the <meta> tag is parsed.
Attached file Stacks from debugger (deleted) —
(In reply to comment #23) > > I don't see a stack document.write() on it using the old parser. > > I'm not sure what you mean... > > The SetCharsetAndSource call happens, with the old parser, when the <meta> tag > is parsed. I tried this again. I feel I'm still missing something obvious but I fail to see what. I set a breakpoint in nsWyciwygChannel::SetCharsetAndSource and ran the test case with the old parser. Then I copied all the stacks that hit the breakpoint (see attached). I don't see nsParser on any of the stacks.
Hmm. Indeed. It looks like in this case the charset (and "meta tag" source) comes from nsHTMLDocument::TryHintCharset via aMarkupDV->GetHintCharacterSet and aMarkupDV->GetHintCharacterSetSource respectively. Those hints are set in nsDocShell::ReloadDocument, called from the nsMetaCharsetObserver::Notify stuff.
I debugged this some more. The script at http://mxr.mozilla.org/mozilla-central/source/content/html/document/test/test_bug255820.html?force=1#86 runs *thrice* (i.e. f2 triggers *two* reloads: one charset reload and one location.reload()). That surprised me quite a bit. Is it intentional and desirable to avoid a charset reload when a document.written meta charset is parsed synchronously but to let it cause a charset reload when parsed from a continue event after unblocking the parser?
Furthermore, document.close(); makes the document finish and drop its wyciwyg channel. Would it be appropriate to make the document hold onto its wyciwyg channel until DOMContentLoaded? That way, the HTML5 parser would have a chance to get the charset info written on the cache entry without renavigating a document.open()ed stream without an explicit author request to renavigate.
I'm not sure I'm quite following the question in comment 26. As far as comment 27, what happens if an open() happens between close() and DOMContentLoaded?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: