Closed
Bug 70828
Opened 24 years ago
Closed 20 years ago
Tags that are not closed by the end of the document do strange things.
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
()
RESOLVED
FIXED
Future
People
(Reporter: ericmao, Assigned: mrbkap)
References
Details
(Keywords: testcase)
Attachments
(5 files, 1 obsolete file)
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
patch
|
rbs
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rbs
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
I created a simple HTML file:
<html> <body bgcolor="white" "bogus extra stuff""""> </body> </html>
I opened the file up in Mozilla 0.8 under Win2K, and went to View Page Source.
I expected to see the same HTML code that I typed in, despite the fact that the
HTML code contained errors. Instead, some of the double-quotes got cleaned up:
<html> <body bgcolor="white" bogus extra stuff> </body> </html>
Comment 1•24 years ago
|
||
Same in here, under Linux. The question is whether it's a bug: a html file
should look like that: <TAG OPT1="value1"> and not <TAG "value1"> ...
Comment 2•24 years ago
|
||
sending to parser for triage
Assignee: ben → harishd
Component: XP Apps: GUI Features → Parser
QA Contact: sairuh → janc
Comment 3•24 years ago
|
||
Confirming as per user comment, setting OS=All.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 2000 → All
Comment 4•24 years ago
|
||
View source should be showing the source as it really is -- otherwise its
usefulness for debugging web pages is not very great (and why does anyone ever
look at view source otherwise?).
because we want to see what content mozilla decided was worth parsing ;-b
otherwise, we'd use telnet to get the real content :) *sigh* this is at least
normal severity.
Severity: minor → normal
Keywords: dataloss
Updated•24 years ago
|
QA Contact: janc → bsharma
I appreciate that this is truly annoying, but can it actually alter the
semantics of content so that developers are misled rather than just
inconvenienced? Futuring for now, will reconsider given a compelling test case.
Target Milestone: --- → Future
Comment 7•24 years ago
|
||
Comment 8•24 years ago
|
||
The attached testcase sucks as HTML goes. Nevertheless...
Moving the closing quote to before </html> will make things work OK. But the
fact remains that view source just silently lost half the source of that page,
and more importantly has lost all indications that an error of any kind has occured.
I'm not completely sure that it's the same bug, but it seems related...
Comment 9•23 years ago
|
||
*** This bug has been marked as a duplicate of 57724 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Comment 10•23 years ago
|
||
Reopening 57724 dependencies for independent resolution.
Comment 11•23 years ago
|
||
Outer single quotes are converted to double quotes both in "View->Source" and
via the "Save Page As..." dialogue. In the latter case, the saved page will
not work because this change breaks the nested quotes.
Comment 12•23 years ago
|
||
Roland, I cannot reproduce the error you describe when viewing the source of
that attachment (linux trunk build 2002-05-01-21). I see single quotes...
The "save as" issue is a separate issue (with the "web page, complete" mode
only) and should be filed as a separate bug.
Comment 13•22 years ago
|
||
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity. Only changing open bugs to
minimize unnecessary spam. Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Comment 14•22 years ago
|
||
Please actually read bugs before changing the severity, ok?
Comment 15•21 years ago
|
||
See also comment #3 of this bug: 223753
Comment 16•21 years ago
|
||
*** Bug 233609 has been marked as a duplicate of this bug. ***
Comment 17•20 years ago
|
||
This "cleaning up" of the source before interpretation can also cause security
issues.
Eg my website allows users to add certain simple HTML tags to their posts, but
not others. If, however, a user enters this:
<B ONCLICK="window.open('http://www.badsite.com')"
then mozilla will automatically append the closing > and render the next part of
the website's own content with this onclick link.
Admittedly, it's my fault in this case for not having a good enough regexp for
filtering out bad tags (which I've now fixed), but I do wonder whether Mozilla
should be displaying "what an attacker means" rather than "what the designer said".
The following was the vulnerable html cleanup code I had used. I've simplified
$allowed for clarity.
$allowed='\s*\/?\s*(b|i|u|s|pre|tt|ul|ol|li|p|)\s*';
$memo=preg_replace("/<((?!($allowed>))[^<>]*)>/is", "<\\1>", $memo);
Assignee | ||
Comment 18•20 years ago
|
||
--> me since I'm working on this.
Assignee: harishd → mrbkap
Status: REOPENED → NEW
Assignee | ||
Comment 19•20 years ago
|
||
Fixing summary to reflect the bug that this is covering (our awful handling of
unclosed tags and the end of the document).
For the record: start tags disappear altogether, end tags get duplicated.
Summary: view source makes double-quotes disappear → Tags that are not closed by the end of the document do strange things.
Assignee | ||
Comment 20•20 years ago
|
||
bz, rbs, do you have any more information on the stuff here? If you know anyone
who may, please add them to the CC list.
My explanation for this bug turned into a 300 word mini-essay on end-of-file
handling at the tokenizer level, so instead of filling up this bug with it, I'll
attach it (it's also available at:
http://www.owlnet.rice.edu/~mrbkap/bug70828explain.html).
My explanation is pretty long and comprehensive, but the short of it is that I
have a partial patch with a couple of problems left to fix. I'm not sure about
some parts of the patch (see the explanation). Any suggestions on my explanation
or solution for this are welcome!
Also, I can generalize my explanation and put it up as parser documentation (on
EOF handling in the tokenizer) somewhere (since htmlparser documentation is
terribly lacking), I just need to know what format/where to put it.
Assignee | ||
Comment 21•20 years ago
|
||
I'm attaching this to the bug so that if/when I take the explanation down from
the internet, people looking into parser code can still find it here (and don't
receive a 404).
Assignee | ||
Comment 22•20 years ago
|
||
This is a partial fix for this bug. I'm still unsure as to the
nsScanner::IsIncremental()/mIsFinalChunk decision and there are a couple of
(easier) bugs that I'd like to fix in this code before I spend more time on
this bug.
Assignee | ||
Comment 23•20 years ago
|
||
Comment on attachment 161187 [details] [diff] [review]
work in progress
I've decided to go with the existing code's idea and fix this (for the most
part) with nsScanner::IsIncremental(). I'll post a new patch later this week.
Attachment #161187 -
Attachment is obsolete: true
Assignee | ||
Comment 24•20 years ago
|
||
This patch fixes this bug. Basically, because of document.write(),
mIsFinalChunk is useless to us (this is the reason it isn't really used
anywhere in the parser). Basically, I went through and each place we can
receive EOF, and added a check to see if the document was out of data.
This also fixes the bug where the document: "<script>foo;</script " would show
up as "<script>foo;</script</script" in view-source.
I'm not sure if I was just unobservant, but my debug build seems to be viewing
source very slowly (1/2 to 1/4 of a second lag on slashdot.org). I'm unsure if
it's my patch, because when I locally backed it out, I still saw the lag. If
someone could test and see if this does adversely view-source performance, it
would be great.
Assignee | ||
Comment 25•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
rbs, could you take a look at this?
An additional note: the changes to nsScanner::ReadTagIdentifier are because if
we ran out of data before reaching an invalid character, we wouldn't end up
appending what we had already.
Attachment #162547 -
Flags: review?(rbs)
Comment 26•20 years ago
|
||
Note that testing performance in a debug build is very pointless....
Comment 27•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
I think you are better off at this point to find out how to run the parser
regression tests. They will add another degree of confidence. This patch is
invasive and hard to assess. I am witholding my r= pending what you see from
the regression tests.
Attachment #162547 -
Flags: review?(rbs)
Assignee | ||
Comment 28•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
Rerequesting r= after running the parser regression tests.
The only two testcases to change were quote002.html and quote003.html, both of
which exhibit the problem that this bug is trying to fix (unclosed start tag at
the end of the document gets dropped).
Attachment #162547 -
Flags: review?(rbs)
Comment 29•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
r=rbs
Attachment #162547 -
Flags: review?(rbs) → review+
Assignee | ||
Comment 30•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
bz, asking you for sr since I'm touching nsScanner::ReadTagIdentifier. again.
Attachment #162547 -
Flags: superreview?(bzbarsky)
Assignee | ||
Comment 31•20 years ago
|
||
Comment on attachment 162547 [details] [diff] [review]
patch v2
This patch misses a case: the document |<a b| will lose the 'b'. I have a fix
in my tree, but don't have time to attach it now. The patch is pretty much the
same, except for an extra check in CAttributeToken::Consume().
Assignee | ||
Comment 32•20 years ago
|
||
This makes us not lose the last attribute in an unclosed tag.
Assignee | ||
Comment 33•20 years ago
|
||
Comment on attachment 163144 [details] [diff] [review]
patch v2
This should be the final patch. The differences between this patch are all in
CAttributeToken::Consume(). I also fixed some typos and cleaned up some
comments.
Attachment #163144 -
Flags: superreview?(bzbarsky)
Assignee | ||
Updated•20 years ago
|
Attachment #162547 -
Flags: superreview?(bzbarsky)
Assignee | ||
Comment 34•20 years ago
|
||
Comment on attachment 163144 [details] [diff] [review]
patch v2
In an attempt to get this in faster. I'm playing with sr's. Trying roc first.
rbs, I'm asking for r= on this one (for the attribute changes).
Attachment #163144 -
Flags: superreview?(roc)
Attachment #163144 -
Flags: superreview?(bzbarsky)
Attachment #163144 -
Flags: review?(rbs)
Comment 35•20 years ago
|
||
Comment on attachment 163144 [details] [diff] [review]
patch v2
r=rbs
Attachment #163144 -
Flags: review?(rbs) → review+
Attachment #163144 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 36•20 years ago
|
||
Fix checked in (albeit with the wrong bug number in the comment).
Status: NEW → RESOLVED
Closed: 23 years ago → 20 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•