Closed
Bug 18480
Opened 25 years ago
Closed 24 years ago
bad html crashes mozilla
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
M16
People
(Reporter: jdaly, Assigned: harishd)
References
()
Details
(Keywords: crash, testcase, Whiteboard: [nsbeta2+] Hack in hand; Omitting ending tags should be avoided)
Attachments
(2 files)
(deleted),
text/html
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
i'm running mozilla pulled from cvs on 9 nov 1999 on linux. loading the page
above causes mozilla to crash. granted, it's bad html (no font closing tags),
but that shouldn't crash the browser, right? the page loads fine in netscape
4.7.
My pull from this morning isn't crashing. I'm on NT, though. Marking WORKSFORME,
but if you still see it in todays build then a stack trace would be helpful
I do see a stack overflow when I try and load a different URL from that window.
Reassigning to Parser, to see if they can do something about making the content
model more sensible. There are no closing <FONT> tags and so it looks like
everything just ends up nested and we have a ridiculously nested content model
Per spec. start and end tags are mandatory for FONT but FONT nesting is not
illegal. Gecko parser does support this behavior. So, the problem seems to be
in the layout land....Troy!! :)
Back to troy@netscape.com
The problem is not really in layout. It's in the content model...
Rick, Kipp was the content model guy. Who owns it now?
Here's one for you, since you've cheerfully accepted ownership of content model
land.
Updated•25 years ago
|
Severity: normal → major
Comment 7•25 years ago
|
||
I monitored the memory usage while loading documents with "open" tags - it
seems mozilla (and viewer) are not releasing memory properly after loading. So,
even if a bad HTML does not crash mozilla, it will definitely crash after the
HTML gets loaded a few times.
Updated•25 years ago
|
Severity: major → critical
Component: Layout → Parser
OS: Linux → All
Hardware: PC → All
Comment 9•25 years ago
|
||
UPdating severity to critical since a lot of sites are crashing mozilla just
because a page does not close a tag. Also, updaing platform and OS to all, and
component to parser.
Comment 10•25 years ago
|
||
In bug 20087 I reported Mozilla crashing on ODP edit screens. ( Example:
http://www.math.berkeley.edu/~zach/edit.html ) That page has a missing </small>
tag; however, inserting the </small> tag does not help (
http://www.math.berkeley.edu/~zach/editd.html ). A <td></td> pair are also
missing (the subject of 20087), which may or may not be relevant here.
Comment 11•25 years ago
|
||
Another pointer: Go to http://linuxtoday.com/story.php3?
sn=12842&showcomments=flat ; boom - crash. This page has a missing </TABLE> tag.
In fact, so do all pages generated by story.php3 with showcomments=flat. So
effectively you cannot read any user comment in linuxtoday.
Comment 12•25 years ago
|
||
In an attempt to get my bug list in order again, marking all the bugs I have
currently as ASSIGNED.
Comment 13•25 years ago
|
||
Comment 14•25 years ago
|
||
This bug shows up when enough open <font> tags are used. On my pc
(PII/400Mhz/64Mb RAM), this is at +/- 2500 open tags. The closing <BODY> and
<HTML> tags do not matter in this bug.
Whiteboard: [MAKINGTEST] cheezycrust@atmosphere.be → [TESTCASE] Omitting ending tags should be avoided
Comment 15•25 years ago
|
||
The crash is a result of us blowing the stack during frame construction (Troy
has seen a similar crash on page unloading in nsGenericElement::SetDocument)
because the tree is too deep. It seems like we have the following options:
1) Identify all places where recursive processes need to track the depth of
recursion and either bale out or switch to an iterative equivalent if we go too
deep. Considering the many places we use recursion, this may not be practical.
2) Restrict the depth of the tree in the content sink. This may result in
incorrect layout in pathalogical cases. It turns out that we may actually have
reasonably correct layout in the penguin case if we just flatten in the failure
case.
3) Special-case certain tags (such as FONT) that we believe may be used
incorrectly (as it is in this case) and only allow a few levels of nesting.
For simplicitly, I favor the second option. Troy, Harish - comments?
Comment 16•25 years ago
|
||
Bulk moving [testcase] code to new testcase keyword. Sorry for the spam!
Keywords: testcase
Assignee | ||
Comment 18•25 years ago
|
||
*** Bug 26034 has been marked as a duplicate of this bug. ***
Comment 19•25 years ago
|
||
*** Bug 1070 has been marked as a duplicate of this bug. ***
Comment 20•25 years ago
|
||
*** Bug 1070 has been marked as a duplicate of this bug. ***
Updated•25 years ago
|
Target Milestone: M16
Comment 21•25 years ago
|
||
Here's another real world test link:
<A HREF =
"http://www.iso.port.ac.uk/docs/downloaded/pike-tutorial/tutorial_onepage.html">
http://www.iso.port.ac.uk/docs/downloaded/pike-tutorial/tutorial_onepage.html
</a>. In this one, the problem seems to be the nested A NAME and A TYPE tags.
Comment 22•25 years ago
|
||
*** Bug 31008 has been marked as a duplicate of this bug. ***
Comment 23•25 years ago
|
||
Another example: http://gostanford.com/ crashes. but the same paged with /FONT
tags added does not: http://deathstar.stanford.edu/~adastra/gostanford.html
Comment 24•25 years ago
|
||
*** Bug 31550 has been marked as a duplicate of this bug. ***
Comment 25•25 years ago
|
||
*** Bug 18168 has been marked as a duplicate of this bug. ***
Comment 26•25 years ago
|
||
*** Bug 31008 has been marked as a duplicate of this bug. ***
Comment 27•25 years ago
|
||
comments carried over from bug 31008
------- Additional Comments From Roland.Mainz@informatik.med.uni-giessen.de
2000-03-25 08:27 -------
Loading the URL http://www.autsch.de/bitsteller/linux_penguin.html is very slow
on my Solaris 2.7/MU4 SPARC system.
And trying to select some text at the bottom takes 20secs or more (Ultra
5/333MHz/2MB cache).
Verified it on my Linux system - compared to Netscape 4.x it is VERY slow ;-(
Comment 28•25 years ago
|
||
bug 37618 shows a side-effect of one unclosed font-tag in combination with
<select> <option>
Comment 29•24 years ago
|
||
Nominating nsbeta2 because we've seen this bug on linuxtoday.com, and penguin
graphics all over the place such as http://rio.dhs.org/penguin.html (which seems
to be a popular page as it's a frequent DUP), *especially* because it crashes on
http://gostanford.com/ (my alma mater--Berkeley site bugs can be marked WONTFIX
;-> ), and above all because missing end tags are common on the web.
Keywords: nsbeta2
Comment 30•24 years ago
|
||
*** Bug 39163 has been marked as a duplicate of this bug. ***
Comment 31•24 years ago
|
||
Adding mostfreq as keyword: Per today there are 11 "nested duplicates" of this
bug, and it's a crasher, originally reported in December 1998. (bug 1070)
I think it deserves the attention it can get.
Keywords: mostfreq
Comment 32•24 years ago
|
||
Putting on [nsbeta2+] radar.
Whiteboard: [TESTCASE] Omitting ending tags should be avoided → [nsbeta2+][TESTCASE] Omitting ending tags should be avoided
Comment 33•24 years ago
|
||
Taking this bug off Vidurs list.
Assignee: vidur → jst
Status: ASSIGNED → NEW
Comment 34•24 years ago
|
||
I have a fix for this in my tree, the fix is to not let the content model get
too deep, if it does we stop opening containers. The structure of resulting page
will be incorrect, but if we get into a situation like this the input data is
invalid anyway. Does this sound like a reasonable safeguard against crappy HTML?
Status: NEW → ASSIGNED
Comment 35•24 years ago
|
||
That sounds fine... Do you decide if it is "too deep" based on a hardcoded
depth or based on free memory? I would very very strongly argue that we should
go for a memory-based threshold rather than a fixed number. If we limit it to
a fixed depth like 1024 then we just _know_ that someone will legitimately need
us to support 1025 nested containers and have plenty of free memory for it...
Whiteboard: [nsbeta2+][TESTCASE] Omitting ending tags should be avoided → [nsbeta2+] Omitting ending tags should be avoided
Comment 36•24 years ago
|
||
The amount of free memory is actually irrelevant (on some platforms at least)
here, we don't crash because we run out of memory, we crash due to a stack
overrun.
IMO setting a hardcoded limit is ok here since we can set the limit quite high,
like 1024 like you suggested, if someone writes valid HTML that nests that deep,
they have bigger problems than Mozilla not displaying it correctly ;-)
My guess is that an average HTML document doesn't nest deeper than 10-20 levels
deep so setting the limit at something like 1024 would leave plenty of room
for missing end tags that occasionally don't get closed by the parser.
I know one commersial SGML parser that has a hardcoded nesting depth limit as
low as 48, and that's plenty deep in most cases, I'm not saying we should set
our limit that low, but normally that should be enough.
Comment 37•24 years ago
|
||
I would vote for a limit which is calculated from the file size. This would
catch both cases: Valid and deeply nested HTML in large files and crappy HTML in
small files...
Comment 38•24 years ago
|
||
Johnny: Ok, point taken. Is there no way we can detect when we are hitting the
top of the stack? I'm just a little worried that whatever limit we pick may be
too high for some architectures to which we are ported (or equally, much
smaller than a particular architecture could easily cope with).
Roland: We don't (always) no the size of the page before we start parsing it.
Think of a page generated by a script which just outputs "<FONT>" continuously.
We need to be able to cope with this worse case scenario without crashing.
Comment 39•24 years ago
|
||
Ian, no, there's no reliable XP way to detect when we're about to run out of
stack space AFAIK. There are hacks that do a pretty good job at this, but
there's no way to tell for sure.
Besides, the stack overflow may occure in different places as you see in Vidurs
comment above so we'd haveto add this stack overflow detection hack in many
places. It's not in the content construction code, it's later on, and it can IMO
happen in quite a few (possible still unidentified) places within layout.
IMO we should simply hardcode the limit as high as we can without causing an
stack overrun, that's the highest we can go, period.
It doesn't matter how big or small the file is, just because the file is
big/small doesn't mean we can nest deeper, we would still fill the stack.
We might want to make the limit platform specific at some point since the stack
size warries from platform to platform.
Comment 40•24 years ago
|
||
> I have a fix for this in my tree, the fix is to not let the content model get
> too deep, if it does we stop opening containers. The structure of resulting
> page will be incorrect, but if we get into a situation like this the input
> data is invalid anyway.
I think if you hit the limit than instead of not opening the new container, you
could maybe close the last contaniner and open the new one.
This would have slightly better result, when the limit is hit.
Also from the outside it looks like the bottleneck is not laying out a deep
tree, the "residual style" handler fails long before that.
I think the residual style handling should be stopped at a lower limit.
This way, opening block level elements above this limit could kill unclosed
inline elements, so you get a more shallower tree and possibly won't hit the
"hard" limit on most of the faulty pages.
---
I do think that the limits should be hardcoded XP,
because layout should behave XP no matter what...
Having some (crappy I admit) pages breaking on Windows but non on Linux because
a different (or worse autodetected) limit, will just drive QA people nuts.
Comment 41•24 years ago
|
||
Why not try to avoid stack problems by using CPS (continuations-passing style)
instead of recursive calls?
Comment 42•24 years ago
|
||
Bajusz, what you're suggesting probably result in a better content model, and it
might be worth doing. Howevr, this fix is really only a last effort to avoid a
crash, and thus I would like to keep it as simple and maintainable as possible.
Making sense of whatever HTML we're fed is really up to the parser (ie
normalizing the HTML), this fix is here for one reason only, to prevent the
crash.
Aleksey, that's one possibility and I think that's been discussed, but it's not
really an option in this stage of the development cycle, the change is not
trivial, to say the least. The point here is that we shouldn't be fed a content
model deep enough to overflow the stack any case, and if we are we simply stop
opening the containers until we start closing them again...
Updated•24 years ago
|
Whiteboard: [nsbeta2+] Omitting ending tags should be avoided → [nsbeta2+][jst] Omitting ending tags should be avoided
Comment 43•24 years ago
|
||
Has this been fixed? I cannot make Linux build 2000060908 (M17) crash with any
of the testcases listed here. Some of them are slow, but no crashy crashy. If
this is fixed, it would make jst a little less doomed for M16.
Comment 44•24 years ago
|
||
Nope, this is not fixed ASAP, you probably need a lot more than 2500 unclosed
font tags today, try 50000 and you'll most likely crash.
I'm handing this over to harishd (per agreement), I'll attach my fix, the
nesting depth limit is still just hardcoded in the patch...
is still
Assignee: jst → harishd
Status: ASSIGNED → NEW
Comment 45•24 years ago
|
||
Updated•24 years ago
|
Whiteboard: [nsbeta2+][jst] Omitting ending tags should be avoided → [nsbeta2+] Omitting ending tags should be avoided
Whiteboard: [nsbeta2+] Omitting ending tags should be avoided → [nsbeta2+] Hack in hand; Omitting ending tags should be avoided
Comment 46•24 years ago
|
||
If we do add such a hack, then make sure it stays on the contentsink side, and
not in the parser. The parser is agnostic about these things (you can imagine
using the parser outside of gecko -- where the stack depth is not an issue).
Comment 47•24 years ago
|
||
Yes, the hack is completely in the content sink, completely independent of the
parser.
Comment 48•24 years ago
|
||
M16 has been out for a while now, these bugs target milestones need to be
updated.
Assignee | ||
Comment 49•24 years ago
|
||
Ya..ya. I just checked the *hack* in. Milestone wouldn't matter anymore.
Thanx for bringing it up though :)
Status: NEW → ASSIGNED
Assignee | ||
Comment 50•24 years ago
|
||
FIXED.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
Comment 51•24 years ago
|
||
With the June 26th build (2000062608), I'm not reproduce the crash as described.
Marking verified fixed.
Status: RESOLVED → VERIFIED
Comment 52•23 years ago
|
||
*** Bug 15918 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•