Closed Bug 675499 Opened 13 years ago Closed 13 years ago

in a utf-8 page, a utf-8 encoded NBSP (c2a0) from a linked css with unspecified encoding is decoded and rendered as latin

Categories

(Core :: DOM: HTML Parser, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: al_9x, Assigned: hsivonen)

References

()

Details

(Whiteboard: [inbound])

Attachments

(2 files)

Fx 5.0.1, new profile load http://es5.github.com/ "NOTEÂ Â Â " will be seen at the beginning and throughout the three utf-8 encoded NBSP following NOTE are coming from a linked css whose encoding is not specified (http, bom, @charset) .note::after { content: "   "; } according to http://www.w3.org/TR/CSS2/syndata.html#charset the css should inherit the encoding (if not specified) of the doc, which is utf-8
Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0 Mozilla/5.0 (Windows NT 5.1; rv:8.0a1) Gecko/20110729 Firefox/8.0a1 > "NOTEÂ Â Â " will be seen at the beginning cannot reproduce
Component: Layout: Text → General
QA Contact: layout.fonts-and-text → general
Attached image new profile (deleted) —
8.0a1 (2011-07-31) as well
wfm with Mozilla/5.0 (Windows NT 6.1; rv:8.0a1) Gecko/20110731 Firefox/8.0a1 SeaMonkey/2.5a1
If I copy and load the page (and css) locally (file:) I can still repro, but if I remove all but the first DIV from the body, the problem goes away. The page is large, perhaps it's a question of timing: the determination of the page's charset is not being made quickly enough (due to its size) so that when the presumably async load of the css tries to determine its own charset, the page still thinks it's latin (which is the default)
the <meta charset="utf-8"> is at the top jut before the css link.
Component: General → Style System (CSS)
QA Contact: general → style-system
(In reply to comment #6) > the <meta charset="utf-8"> is at the top jut before the css link. Nevertheless, the problem is present when the page is large and slower to load and absent when small.
Henri, it looks like the style preload (off of nsHtml5LoadFlusher::Run calling FlushSpeculativeLoads) happens before the document charset is set (that comes via nsHtml5ExecutorFlusher::Run calling nsHtml5TreeOpExecutor::RunFlushLoop). Is that sort of expected? I can add the current document charset charset to the hashtable key, I suppose, but it's a little weird to have the preload start (and presumably get as far as OnDetermineCharset, so the data is already partially available!) before the charset is set on the document.
Status: UNCONFIRMED → NEW
Component: Style System (CSS) → HTML: Parser
Ever confirmed: true
QA Contact: style-system → parser
(In reply to comment #8) > Henri, it looks like the style preload (off of nsHtml5LoadFlusher::Run > calling FlushSpeculativeLoads) happens before the document charset is set > (that comes via nsHtml5ExecutorFlusher::Run calling > nsHtml5TreeOpExecutor::RunFlushLoop). > > Is that sort of expected? Given the code, one can expect this to happen, but this is an implementation error on my part. eTreeOpSetDocumentCharset shouldn't be a tree op in the first place. It should be in the preload op queue instead.
Assignee: nobody → hsivonen
I don't see the problem in action on my computer, but by code inspection, the bug is clearly there. Due to the race condition nature of the bug, there isn't a way to write a deterministic test case for this. Anyway, let's use this fix if the tryserver is OK with it.
I can in fact reproduce the problem, and the attached patch fixes it.
Comment on attachment 550040 [details] [diff] [review] Move charset setting to the other op queue >+++ b/parser/html/nsHtml5SpeculativeLoad.cpp >+ aExecutor->SetDocumentCharsetAndSource(narrowName, >+ intSource); Fix indent, please? >+++ b/parser/html/nsHtml5SpeculativeLoad.h >+ * queue as opposed to tree operation queue is that the manifest must get "manifest"? You mean "charset change"? >+ * processed before any actually speculative loads such as style sheets. s/actually/actual/ ? >+ * If mOpCode is eSpeculativeLoadImage, this is the value of the >+ * "crossorigin" attribute. If mOpCode is eSpeculativeLoadStyle >+ * or eSpeculativeLoadScript then this is the value of the >+ * "charset" attribute. Otherwise it's empty. This needs fixing for the new eSpeculativeLoadSetDocumentCharset opcode. r=me with those nits.
Attachment #550040 - Flags: review?(bzbarsky) → review+
Status: NEW → ASSIGNED
Whiteboard: [inbound]
Landed followup to fix maemo build bustage: https://bugzilla.mozilla.org/show_bug.cgi?id=675499 e.g. http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312310251.1312310578.31642.gz > /builds/slave/m-in-lnx-maemo5-gtk/build/parser/html/nsHtml5SpeculativeLoad.h:54: error: comma at end of enumerator list
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: