Closed Bug 493425 Opened 16 years ago Closed 16 years ago

HTML Entities are parsed even if they miss the closing semicolon

Categories

(Firefox :: General, defect)

3.5 Branch
x86
Linux
defect
Not set
normal

RESOLVED DUPLICATE of bug 474670

People

(Reporter: etrapani, Unassigned)

Details

If a page contains a sequence such as "this is a pa&ge in html", Firefox renders the &ge as if it were the entity ≥, but it should not, since there is no semicolon at the end. In the file http://hg.mozilla.org/releases/mozilla-1.9.1/file/afac8b5958bc/parser/htmlparser/src/nsHTMLEntities.cpp (line 188) you can read:
//this little piece of code exists because entities may or may not have the terminating ';'.
But the standard does not seem to allow that.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
(In reply to comment #0)
> But the standard does not seem to allow that.
An entity name is terminated implicitly by a special character (a character that cannot be used in an entity name, e.g. #, =, or a space) or by end of data, and explicitly by ";". This is the SGML rule that HTML 4 is based on.
Is this HTML 4, or the upcoming HTML 5? Is "pa&ge" in a <textarea> value, or somewhere else, such as <a href="...pa&ge...">pa&ge...</a>?
If HTML 4, this is a DUP of Bug 155047 for any HTML source case. Bug 474670 (the <textarea> value case) is also a DUP of Bug 155047, but that bug is still kept open because the following question from the opener of Bug 474670 has not been answered:
> For me the question is if the content of a textarea is part of the DOM or not.
> If not the content should not be interpreted at all.
If you are talking about the new HTML 5, I can say nothing. HTML 5 has departed from the SGML spec, but I don't know how HTML 5 defines "entity name".
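The implicit-termination rule described above can be illustrated with a minimal Python sketch. This is not the Gecko parser code; the function name `sgml_style_decode`, the two-entry `ENTITIES` table, and the simplified definition of a name (a letter followed by letters or digits) are assumptions made only for illustration.

```python
import re

# Hypothetical two-entry table for illustration; real entity tables are much larger.
ENTITIES = {"ge": "\u2265", "amp": "&"}

# Simplified assumption: a name is a letter followed by letters or digits.
# The trailing ";" is optional, so any non-name character (or end of data)
# also ends the name.
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def sgml_style_decode(text):
    """Replace known entity references, with or without the closing ';'."""
    def replace(match):
        name = match.group(1)
        # Unknown names are left exactly as they appeared in the input.
        return ENTITIES.get(name, match.group(0))
    return _REF.sub(replace, text)


print(sgml_style_decode("this is a pa&ge in html"))   # this is a pa≥ in html
print(sgml_style_decode("this is a pa&ge; in html"))  # this is a pa≥ in html
```

Under this rule the space after "&ge" ends the name, which is why the reporter's string is rendered with "≥".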
But it wasn't a text area. It was the rendering of a .po file in HTML, in a table cell. The tool I use to translate Firefox showed the "greater than or equal to" character and, after searching for a bug, I discovered that it was a rendering problem. I checked with browsershots.org, and IE-based browsers and Opera show it right (as I see it), that is, "pa&ge in new tab" instead of "pa≥ in new tab".
(In reply to comment #3)
> I checked with browsershots.org and IE based browsers and Opera shows it right (as I see it)
As for HTML 4, Firefox is "right" (it respects SGML), and the others are "not right" (they don't respect SGML). Please see Bug 155047.
The W3C stops basing HTML on SGML as of HTML 5.
> http://en.wikipedia.org/wiki/XHTML_5#New_markup
> The HTML5 syntax is no longer based on SGML despite its markup being very close.
This is because no other browser respects SGML, at least for "character entity name" (your case) and "comment" (the "--" between "<!--" and "-->" issue; see bug 214476). I guess HTML 5 will make ";" a mandatory terminator. I think it would be better to reopen your bug and keep it as the "';' of character entity name" version of bug 214476, after checking the HTML 5 spec.
HTML 5 defines "Character references" as follows.
> http://www.w3.org/TR/html5/syntax.html#character-references
> 8.1.4 Character references
> (snip)
> Named character references
> (snip)
> The ampersand must be followed by one of the names given in the named character references section, using the same case.
> The name must be one that is terminated by a U+003B SEMICOLON (;) character.
> Decimal numeric character reference
> (snip)
> The digits must then be followed by a U+003B SEMICOLON character (;).
> Hexadecimal numeric character reference
> (snip)
> The digits must then be followed by a U+003B SEMICOLON character (;).
Bug 253034 covers "HTML5 compliant entity parsing" for "Hexadecimal numeric character reference".
> Bug 253034 : Entity parsing not HTML5 compliant
Bug 373864 is a request for early landing of an "HTML5 compliant parser".
> Bug 373864 : (html5-parsing) Replace HTML parser with an HTML5 parser
Bug 385776 has a summary similar to yours, but it appears to focus on a different issue (a problem with a specific "Named character reference").
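For comparison, here is a small sketch using Python's standard-library `html.unescape`, which is documented as following the HTML 5 rules for character references. It is offered only as an illustration of the HTML 5 behaviour and says nothing about how Firefox implements it; the wrapper name `html5_decode` is an assumption. Note that the text quoted above states what authors must write, while HTML 5 decoders still accept a limited set of older names (such as "amp") without the semicolon; "ge" is not in that set.

```python
import html


def html5_decode(text):
    """Decode character references using the HTML 5 rules in the standard library."""
    return html.unescape(text)


# "&ge" without ";" is not a recognised reference, so the text is left alone.
print(html5_decode("pa&ge in new tab"))   # pa&ge in new tab

# With the terminating ";" the reference is decoded.
print(html5_decode("pa&ge; in new tab"))  # pa≥ in new tab

# A few legacy names are still decoded without ";".
print(html5_decode("a &amp b"))           # a & b
```

This matches the rendering the reporter expected: "pa&ge in new tab" stays as typed unless the semicolon is present.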