Closed
Bug 373864
(html5-parsing)
Opened 18 years ago
Closed 15 years ago
Replace HTML parser with an HTML5 parser
Categories
(Core :: DOM: HTML Parser, defect)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a5
People
(Reporter: sayrer, Assigned: hsivonen)
References
(Depends on 2 open bugs, Blocks 4 open bugs, )
Details
Attachments
(4 files, 6 obsolete files)
No idea on a target for this, but we have many failing tests from the html5lib suite.
Reporter | ||
Updated•18 years ago
|
Summary: convert the html parser web applications 1.0 algorithm → convert the html parser to the web applications 1.0 algorithm
Comment 1•18 years ago
|
||
mrbkap had looked into this last summer.
/be
Reporter | ||
Comment 2•18 years ago
|
||
(In reply to comment #1)
>
> mrbkap had looked into this last summer.
I also pinged him on bug 328930, but after I signed off irc, I realized I wasn't clear if he wanted to do that himself, or just wanted it done. This will make a fine tracking bug if working on the parser becomes a priority.
Comment 3•18 years ago
|
||
I certainly want to work on this. The problem is that I simply don't have time to do so between school and my other bugs (the new residual style stuff ended up as P3 for Firefox 3).
Updated•16 years ago
|
Depends on: 385776
Summary: convert the html parser to the web applications 1.0 algorithm → Convert the HTML parser to the Web Applications 1.0 (HTML 5) algorithm
Assignee | ||
Comment 4•16 years ago
|
||
Mozilla Corporation has contracted me to translate the Validator.nu HTML parser into Gecko-compatible C++.
Assignee | ||
Comment 5•16 years ago
|
||
Assigning to me and adjusting summary to plan.
Alias: html5-parsing
Assignee: sayrer → hsivonen
Summary: Convert the HTML parser to the Web Applications 1.0 (HTML 5) algorithm → Replace HTML parser with an HTML5 parser
Assignee | ||
Comment 6•16 years ago
|
||
Instead of making this a tracking bug, bugs pertaining to the HTML5 parser repo have "[HTML5] " at the start of their summary.
Comment 7•16 years ago
|
||
Henri, if you don't mind, I'm going to make HTML parser bugs that will be resolved (either as FIXED or WONTFIX) as dependencies of this one so they don't get lost.
Assignee | ||
Comment 8•16 years ago
|
||
(In reply to comment #7)
> Henri, if you don't mind, I'm going to make HTML parser bugs that will be
> resolved (either as FIXED or WONTFIX) as dependencies of this one so they don't
> get lost.
OK. Makes sense.
Assignee | ||
Updated•16 years ago
|
Depends on: html5-keygen
Assignee | ||
Updated•16 years ago
|
Depends on: html5-parsing-land
Updated•15 years ago
|
Comment 9•15 years ago
|
||
These are 2,722 html5 tokenizer tests from the html5lib suite, converted to mochitest using the parser harness from bug 366936. There are 18 failures, attached in separate result file. Please let me know if you'd like these submitted as bugs and/or added to the known failures list.
I've included most of the tests from the tokenizer suite, but omitted the following:
- xmlViolation.test
- contentModelFlags.test
- tests involving invalid unicode chars
- tests involving multi-byte unicode chars
The latter two classes were excluded because they cause exceptions in the parser sub-harness, in the call to btoa().
I've made a few changes to the parser sub-harness to enable it to work with these files, and have verified that the changes don't interfere with the original treebuilder tests.
Attachment #389048 -
Flags: review?(hsivonen)
Comment 10•15 years ago
|
||
Assignee | ||
Comment 11•15 years ago
|
||
(In reply to comment #9)
> Created an attachment (id=389048) [details]
> mochitest test cases for html5lib tokenizer tests
>
> These are 2,722 html5 tokenizer tests from the html5lib suite, converted to
> mochitest using the parser harness from bug 366936.
Great! Thank you!
> There are 18 failures,
> attached in separate result file. Please let me know if you'd like these
> submitted as bugs and/or added to the known failures list.
The tests that fail and whose description mentions [R]CDATA (or just CDATA) haven't been converted to tree builder tests correctly. The original tokenizer tests have additional JSON items for initializing the tokenizer to the RCDATA or the CDATA state and for setting up the element name that the tokenizer should look for to end the [R]CDATA run.
To convert these tests (from escapeFlag.test, it seems) to tree builder tests correctly, you need to add an <xmp> start tag. This tests the CDATA case. To test the RCDATA case, you need to replace "xmp" with "textarea" everywhere in the tests.
I don't see why "U+0080 in lookahead region" failed. Odd.
The other failures are genuine bugs that definitely need addressing. I'm particularly surprised at the failures related to CR and and the failures related to bogus entities.
Updated•15 years ago
|
Attachment #389049 -
Attachment is obsolete: true
Comment 12•15 years ago
|
||
This is a much better implementation of the html5lib tokenizer tests. It correctly handles unicode characters and other escaped characters. I omitted only tests for 0xd800 and 0xdfff, as the server-side js that produces the test markup contains an algorithm for generating utf-8 that doesn't correctly handle these invalid characters.
I've made some modifications to a few of the tests (e.g., the escapeFlag and contentModelFlags tests) to account for the fact that the tokenizer output is run through the tree builder, as Henri notes above.
There are 3,375 tests in total with only 4 failures; see next attachment.
Attachment #389048 -
Attachment is obsolete: true
Attachment #390501 -
Flags: review?(hsivonen)
Attachment #389048 -
Flags: review?(hsivonen)
Comment 13•15 years ago
|
||
Comment 15•15 years ago
|
||
I've added to this patch a refresh of the tree builder tests which already existed in the repo. I've refreshed the existing tests from html5lib, and added new tests which weren't being run. I've also updated the parser sub-harness to better handle MathML and SVG elements, and to know to skip #document-fragment tests. There doesn't seem to be any clean way of running fragment tests without direct access to the parser; if anyone can think of a way, let me know and I'll add them. Test results attached separately.
Attachment #390501 -
Attachment is obsolete: true
Attachment #391186 -
Flags: review?(hsivonen)
Attachment #390501 -
Flags: review?(hsivonen)
Comment 16•15 years ago
|
||
Attached are the tree builder test results obtained from running the updated tree builder tests against the HTML5 parser. The results can be summarized as follows:
- only 1 known failure still fails (an <isindex> case)
- 7 tests fail from regressions.txt, but they all appear to be correct according to the HTML5 parsing algorithm. Henri, can you review these? If they are correct I will update them in regressions.txt
- 1 failure involving <keygen>; should this element appear in the DOM? Maybe not, see bug 101019.
- 8 failures in MathML and SVG tests involving xlink:href and xml:lang attributes. Henri, can you review and let me know if the tests need updating or if a bug should be filed?
Comment 17•15 years ago
|
||
Here's another update of the html5lib tests. This update allows Java->C++ tests to be run; test_html5_tree_construction_js_compare.html runs all of the tree construction tests by comparing their output in Gecko vs the JS validator from Henri. The js_compare mode ignores comments and doctypes, since the JS version of the parser does not emit those. Results of this are attached separately; there are currently 13 failures. Most of the failures are due to the fact that the JS parser doesn't seem to be able to handle xml:lang or xlink:href correctly, but the first two failures are potentially interesting.
Attachment #391186 -
Attachment is obsolete: true
Attachment #391458 -
Flags: review?(hsivonen)
Attachment #391186 -
Flags: review?(hsivonen)
Comment 18•15 years ago
|
||
Assignee | ||
Comment 19•15 years ago
|
||
Comment on attachment 391458 [details] [diff] [review]
mochitest test cases for html5lib parser tests (v4)
I feel I'm too inexperienced with Mochitest coding patterns to review this properly. sayrer wrote the code originally, so I think he'd be a better reviewer than me.
However, here are some observations:
* The dumpTree() method looks a bit odd. Why the global state variable? Is the purpose to defer the action somehow?
* Is reorderToMatchExpected() for the old parser only? I'd expect it to be unnecessary with the new parser.
* In principtle, (walker.currentNode.namespaceURI.toLowerCase().indexOf("math") != -1) should be (walker.currentNode.namespaceURI == "http://www.w3.org/1998/Math/MathML") and likewise (walker.currentNode.namespaceURI == "http://www.w3.org/2000/svg") for SVG. Probably doesn't matter in practice now, but if code happened to somehow break in a way that makes it matter, it would be good to have the right namespace tests in the harness.
* html5lib_license.txt should probably be updated to contain a fresh copy of the upstream copyright year and contributor list.
Attachment #391458 -
Flags: review?(hsivonen) → review?(sayrer)
Reporter | ||
Updated•15 years ago
|
Attachment #391458 -
Flags: review?(sayrer) → review-
Reporter | ||
Comment 20•15 years ago
|
||
Comment on attachment 391458 [details] [diff] [review]
mochitest test cases for html5lib parser tests (v4)
we shouldn't have this stuff conditioned on a bunch of global variables. please refactor to avoid them. looks good otherwise.
iirc, reorderToMatchExpected was required because the original test suite from html5lib depended on python's dictionary ordering details.
Updated•15 years ago
|
Updated•15 years ago
|
Comment 21•15 years ago
|
||
Completely refactored code in parser*.js to avoid global variables. Also, I verified that reorderToMatchExpected() has no effect on current tests so I removed it.
Attachment #391458 -
Attachment is obsolete: true
Attachment #393791 -
Flags: review?
Updated•15 years ago
|
Attachment #393791 -
Flags: review? → review?(sayrer)
Reporter | ||
Updated•15 years ago
|
Attachment #393791 -
Flags: review?(sayrer) → review+
Comment 22•15 years ago
|
||
How should we handle the current test failures? Shall I flag them as 'todo' so that they won't show as failures on tinderbox, and check them in, so that the tests can start running and catch regressions? Or would you prefer to leave them out of the repo for now?
Assignee | ||
Comment 23•15 years ago
|
||
I'd prefer checking in with todo flags.
Comment 24•15 years ago
|
||
I've updated the tests to mark current failures as 'todo' and pushed as http://hg.mozilla.org/mozilla-central/rev/fc7d931fd75b.
Comment 25•15 years ago
|
||
(In reply to comment #24)
> I've updated the tests to mark current failures as 'todo' and pushed as
> http://hg.mozilla.org/mozilla-central/rev/fc7d931fd75b.
Backed out because of persistent leaks. E.g. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250318073.1250321057.30450.gz
Assignee | ||
Updated•15 years ago
|
Updated•15 years ago
|
Comment 26•15 years ago
|
||
Is there a bug on file about turning this on by default? I can't find one and just wanted to see what the plan was for turning this on by default. A few users (including myself) agree that we all haven't seen any major issues with the HTML5 parser is a long time and perhaps this could be enabled by default to get wider testing.
Assignee | ||
Comment 27•15 years ago
|
||
(In reply to comment #26)
> Is there a bug on file about turning this on by default?
None other than this one, though one could argue that this bug as currently worded is about more than flipping the pref by default.
> I can't find one and
> just wanted to see what the plan was for turning this on by default.
I have a shared bugzilla query named "HTML5 Parsing Blocker". I'm working on the assumption that the length of that list needs to go to zero before the pref can be flipped by default.
In addition to items on that list, there needs to be a new "slow tp" test suite that tests tp4 over a simulated slow network. I don't know how that item is progressing. Then the parser core needs to pass code review (blocking on me to write more documentation).
> A few
> users (including myself) agree that we all haven't seen any major issues with
> the HTML5 parser is a long time and perhaps this could be enabled by default to
> get wider testing.
I think the HTML5 parser on the trunk is in good enough shape to be turned on by trunk testers. The problem is that it isn't in good enough shape for Mochitest.
Assignee | ||
Updated•15 years ago
|
No longer depends on: html5-keygen
Comment 28•15 years ago
|
||
(In reply to comment #27)
> (In reply to comment #26)
> > Is there a bug on file about turning this on by default?
>
> None other than this one, though one could argue that this bug as currently
> worded is about more than flipping the pref by default.
>
> I think the HTML5 parser on the trunk is in good enough shape to be turned on
> by trunk testers. The problem is that it isn't in good enough shape for
> Mochitest.
So, I sonder if what we need here is for a bug to be opened on flipping the preference with all the mochitest issues being in bugs that block it.
It seems without that, there is no real direction to developers on what needs ot be done to get this enabled by default.
Assignee | ||
Comment 29•15 years ago
|
||
Enabled by default now:
http://hg.mozilla.org/mozilla-central/rev/129e19d979f0
I'm leaving this open as a tracking bug.
Assignee | ||
Comment 30•15 years ago
|
||
Had to revert this yesterday:
http://hg.mozilla.org/mozilla-central/rev/ccb50d524490
Assignee | ||
Comment 31•15 years ago
|
||
Enabled by default again:
http://hg.mozilla.org/mozilla-central/rev/358113b3642e
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•15 years ago
|
Updated•15 years ago
|
Target Milestone: --- → mozilla1.9.3a5
You need to log in
before you can comment on or make changes to this bug.
Description
•