Closed Bug 326645 Opened 19 years ago Closed 18 years ago

Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks]

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.9alpha1

People

(Reporter: martijn.martijn, Assigned: bzbarsky)

References

Details

(4 keywords, Whiteboard: may have exposed bug 340796)

Crash Data

Attachments

(3 files, 3 obsolete files)

See upcoming testcase, which crashes Mozilla after a second or so.
It also crashes Mozilla1.7.12.
Unfortunately, it's not a minimised testcase (which I haven't been able to make yet).

Talkback ID: TB14952503H

Somehow, it doesn't crash in my debug build, it just stops after the 'failed to load URL' assertion.
Attached file testcase (deleted) —
Summary: Crash when changing enumerated properties of iframe → Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks]
The root cause of the crash is:

(gdb) frame 3
#3  0xb733e769 in nsContentList::PopulateSelf (this=0x884f420, aNeededLength=4294967295)
    at ../../../../mozilla/content/base/src/nsContentList.cpp:869
869           PopulateWith(root, PR_TRUE, elementsToAppend);
(gdb) p mDocument
$11 = (class nsIDocument *) 0x8746660
(gdb) p mDocument->GetRootContent()
$12 = (class nsIContent *) 0x8709228
(gdb) p mDocument->GetRootContent()->GetOwnerDoc()
$13 = (class nsIDocument *) 0x8746660
(gdb) p mDocument->GetRootContent()->GetDocument()
$14 = (class nsIDocument *) 0x0

That should not happen -- $14 should be the same as $13 and $11.  I'll see whether I can add some asserts that would let me catch this when the mismatch first appears.
Attached patch Proposed fix (obsolete) (deleted) — Splinter Review
So nsDocument::Destroy was putting us into this bogus state...  This patch fixes that; the rest is window-dressing.
Assignee: general → bzbarsky
Status: NEW → ASSIGNED
Attachment #211470 - Flags: superreview?(peterv)
Attachment #211470 - Flags: review?(bugmail)
OS: Windows XP → All
Priority: -- → P2
Hardware: PC → All
Summary: Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks] → [FIX]Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks]
Target Milestone: --- → mozilla1.9alpha
Comment on attachment 211470 [details] [diff] [review]
Proposed fix

>Index: content/base/src/nsDocument.h
...
> #ifdef DEBUG
>-  void VerifyRootContentState();
>+  // aAllowBogusChild is kinda needed by nsDocument::Destroy and ~nsDocument.
>+  void VerifyRootContentState(PRBool aCompareToChildArray = PR_FALSE);
> #endif

You call it both "aAllowBogusChild" and "aCompareToChildArray" here. And neither is IMHO very descriptive :)

How about "aIgnoreMismatchedRoot"?


>Index: content/base/src/nsDocument.cpp
>@@ -743,26 +743,39 @@ nsDocument::~nsDocument()
...
>+      // Note: we want neither DOM mutations nor notifications happening here
>+      // (the latter wouldn't matter anyway, since we've cleared our
>+      // observers).
>       PRUint32 count = mChildren.ChildCount();
>       for (indx = PRInt32(count) - 1; indx >= 0; --indx) {
>+        nsIContent* content = mChildren.ChildAt(indx);
>+        if (content == mRootContent) {
>+          // Null out mRootContent first; this is similar to what
>+          // RemoveChildAt() does.
>+          mRootContent = nsnull;
>+        }
>         mChildren.ChildAt(indx)->UnbindFromTree();
>         mChildren.RemoveChildAt(indx);
>       }
>     }
>   }
> 
>   mRootContent = nsnull;

This shouldn't be needed any more, no? Especially since we should be able to remove the GetCurrentDoc() check just ouside the context.

And it really looks like we should remove the mRootContent check too. We always want to unbind all root content even if there's no documentElement.

>+#ifdef DEBUG
>+  VerifyRootContentState(PR_TRUE);
>+#endif

Why does this need to be PR_TRUE?

r=me with that fixed or explained.
Attachment #211470 - Flags: review?(bugmail) → review+
(In reply to comment #4)
> This shouldn't be needed any more, no? Especially since we should be able to
> remove the GetCurrentDoc() check just ouside the context.
> 
> And it really looks like we should remove the mRootContent check too. We always
> want to unbind all root content even if there's no documentElement.

We do unbind all content, the nsAttrAndChildArray destructor takes care of that. We can't rely on it for elements because it passes PR_FALSE for aDeep to UnbindFromTree. Maybe we should make nsAttrAndChildArray::Clear take an aDeep argument and call it with PR_TRUE here? Calling RemoveChildAt is more expensive because it memmove's every time.

> >+#ifdef DEBUG
> >+  VerifyRootContentState(PR_TRUE);
> >+#endif
> 
> Why does this need to be PR_TRUE?

I don't understand why we need aCompareToChildArray/aAllowBogusChild anyway, since we now always seem to null out mRootContent?
> I don't understand why we need aCompareToChildArray/aAllowBogusChild anyway,

Because it's there to assert that either mRootContent is null or it has the right current document.... I guess I could just assert that manually both in the relevant spots and in VerifyRootContentState and remove the boolean arg.

Passing a boolean to nsAttrAndChildArray::Clear sounds reasonable to me.  Sicking, what do you think?
I was actually just referring to removing the |mRootContent = nsnull|. As well as removing the |if|s around the loop. But passing a bool to nsAttrAndChildArray::Clear seems ok to me.
Assignee: bzbarsky → general
Status: ASSIGNED → NEW
Assignee: general → bzbarsky
So the problem with just doing Clear() is that we want to unset mRootContent _before_ we do this stuff.  Think mutation event listeners that trigger during this code.  :(
(In reply to comment #8)
> So the problem with just doing Clear() is that we want to unset mRootContent
> _before_ we do this stuff.  Think mutation event listeners that trigger during
> this code.  :(

And setting mRootContent to nsnull before calling Clear wouldn't work?
I'm not sure it would; it would produce very different ordering from what normally happens (eg when PIs are being unbound mRootContent would already be null, etc).

If you think it should work, please take the bug?  I'm not going to be able to work on this in the foreseeable future (till summer?) in any case.  :(
Attached patch Alternative patch (obsolete) (deleted) — Splinter Review
This fixes the inconsistent state by removing mRootContent. I removed mBodyContent while I was at it.
Attachment #214276 - Flags: superreview?(bzbarsky)
Attachment #214276 - Flags: review?(bzbarsky)
Comment on attachment 214276 [details] [diff] [review]
Alternative patch

>Index: content/base/src/nsDocument.cpp
> nsDocument::InsertChildAt(nsIContent* aKid, PRUint32 

>   if (aKid->IsContentOfType(nsIContent::eELEMENT)) {
>+    if (GetRootContent()) {

Why not combine with && into a single if?

>Index: content/base/src/nsDocument.h

>+  nsIContent* GetRootContent() const;

  virtual nsIContent* GetRootContent() const;

>Index: content/html/document/src/nsHTMLDocument.cpp
>@@ -2434,41 +2419,42 @@ nsHTMLDocument::GetElementById(const nsA

>+    nsIContent* root = GetRootContent();
>     if (!IdTableIsLive()) {

I don't like that.  This will make us GetRootContent even when the table is live (in which case we _know_ there's nothing for us here, and the root content is only needed in a debug block.  So I'd put this GetRootContent() call inside the !IdTableIsLive() block.  And in the debug block further down, just reget it -- what do we care about an extra debug-only call?  ;)

> nsHTMLDocument::GetBodyContent()
>+  // Loop backwards because any non-elements, such as doctypes and PIs
>+  // are likly to appear before the root element.

s/likly/likely/

>+  for (i = mChildren.ChildCount(); i > 0; --i) {
>+    nsIContent* html = mChildren.ChildAt(i - 1);
>+    if (html->Tag() == nsHTMLAtoms::html &&
>+        html->IsContentOfType(nsIContent::eHTML)) {

Why not just GetRootContent() and then see whether it's the right thing?  For that matter, why are we adding the check that this is actually <html>?  We didn't use to do that...

>+      // Look for body inside html
>+      for (i = html->GetChildCount(); i > 0; --i) {

This reversed the search, so if there are two bodies there the answer changes from what it is now.  I realize this is not likely, but still, I'd prefer we didn't change that.

>Index: content/html/document/src/nsImageDocument.cpp
>@@ -656,17 +656,17 @@ nsImageDocument::CheckOverflowing(PRBool
>+  nsIContent* content = GetBodyContent();
>   nsRefPtr<nsStyleContext> styleContext =
>     context->StyleSet()->ResolveStyleFor(content, nsnull);

Add a null-check while you're here?  There's no guarantee that GetBodyContent() returns non-null.

>Index: content/svg/content/src/nsSVGSVGElement.cpp
>+  PRBool IsRoot() {
>+    return IsInDoc() && !GetParent();

Add an assert that this is the same as being equal to GetRootContent()?

r=bzbarsky, but I'd like jst to ok the removal of mBodyContent...
Attachment #214276 - Flags: superreview?(jst)
Attachment #214276 - Flags: superreview?(bzbarsky)
Attachment #214276 - Flags: review?(bzbarsky)
Attachment #214276 - Flags: review+
Comment on attachment 214276 [details] [diff] [review]
Alternative patch

Fine with me. sr=jst
Attachment #214276 - Flags: superreview?(jst) → superreview+
Sicking, this is all yours.  ;)
Assignee: bzbarsky → bugmail
Status: ASSIGNED → NEW
Priority: P2 → --
Summary: [FIX]Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks] → Crash when changing enumerated properties of iframe [@ nsHTMLDocument::MatchLinks]
Target Milestone: mozilla1.9alpha → ---
Attachment #211470 - Flags: superreview?(peterv) → superreview-
Flags: blocking1.9a1+
(In reply to comment #12)
> >+  for (i = mChildren.ChildCount(); i > 0; --i) {
> >+    nsIContent* html = mChildren.ChildAt(i - 1);
> >+    if (html->Tag() == nsHTMLAtoms::html &&
> >+        html->IsContentOfType(nsIContent::eHTML)) {
> 
> Why not just GetRootContent() and then see whether it's the right thing?  For
> that matter, why are we adding the check that this is actually <html>?  We
> didn't use to do that...

I added the check for html on purpose because IMHO that's the behaviour we want. There's another change in behaviour in this patch. We now honor non-xhtml bodies in xhtml docs (as well as the other way around). I talked with jst and he agreed that's what we want.

> >+      // Look for body inside html
> >+      for (i = html->GetChildCount(); i > 0; --i) {
> 
> This reversed the search, so if there are two bodies there the answer changes
> from what it is now.  I realize this is not likely, but still, I'd prefer we
> didn't change that.

Yeah, I agree, i changed it to loop forward.

> >Index: content/html/document/src/nsImageDocument.cpp
> >@@ -656,17 +656,17 @@ nsImageDocument::CheckOverflowing(PRBool
> >+  nsIContent* content = GetBodyContent();
> >   nsRefPtr<nsStyleContext> styleContext =
> >     context->StyleSet()->ResolveStyleFor(content, nsnull);
> 
> Add a null-check while you're here?  There's no guarantee that 
> GetBodyContent() returns non-null.

Not really sure what the right behaviour is if there's no body. I simply returned NS_OK
Status: NEW → ASSIGNED
Checked in, thanks for reviews
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
The decl of VerifyRootContentState in nsDocument.h should go away too, no?
This caused regression bug 330302 (probably by changing the behavior of nsHTMLDocument::GetBody somehow).
Depends on: 330302
So this patched caused bug 330302 as well as a slight performance regression and so the patch is currently backed out.

The performance issue might be slightly helped by bug 331530.

A bigger problem is bug 330302 though. I'll talk with bryner about why we even have a nsDocument::Destroy function, it was added for the bfcache.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Slightly better (obsolete) (deleted) — Splinter Review
This contains some slight modifications to save a few cycles, though probably not enough to make a difference.
Flags: blocking1.8.1?
So we should really fix this, one way or another.   See also bug 335896 (which is basically caused by the same problems)...
Blocks: 335896
Flags: blocking1.8.0.5+
bz: So I think we should go with your patch here. Please check in whenever you can so that maybe we can get it on the 1.8.0 branch.
Attached patch Minimal patch, updated to tip (deleted) — Splinter Review
Let's do the minimal patch that will fix this, then, and file a followup on making the world better...
Assignee: bugmail → bzbarsky
Attachment #211470 - Attachment is obsolete: true
Attachment #214276 - Attachment is obsolete: true
Attachment #216201 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Attachment #224627 - Flags: superreview?(bugmail)
Attachment #224627 - Flags: review?(bugmail)
Attachment #224627 - Flags: approval1.8.0.5?
Attachment #224627 - Flags: approval-branch-1.8.1?(bugmail)
Comment on attachment 224627 [details] [diff] [review]
Minimal patch, updated to tip

Please land this as soon as possible so that we get some bake time before 1.5.0.5
Attachment #224627 - Flags: superreview?(bugmail)
Attachment #224627 - Flags: superreview+
Attachment #224627 - Flags: review?(bugmail)
Attachment #224627 - Flags: review+
Attachment #224627 - Flags: approval-branch-1.8.1?(bugmail)
Attachment #224627 - Flags: approval-branch-1.8.1+
Fixed on trunk and 1.8 branch.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9alpha
Verified FIXED using the testcase https://bugzilla.mozilla.org/attachment.cgi?id=211374 on a SeaMonkey trunk 2006-06-08-09 build under Windows XP.
Status: RESOLVED → VERIFIED
Comment on attachment 224627 [details] [diff] [review]
Minimal patch, updated to tip

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #224627 - Flags: approval1.8.0.5? → approval1.8.0.5+
Blocks: 341315
This probably exposed bug 340796.  I'm holding off on landing this on the 1.8.0 branch until I get an approved fix for that...
Depends on: 340796
Whiteboard: may have exposed bug 340796
*** Bug 341315 has been marked as a duplicate of this bug. ***
No longer blocks: 341315
*** Bug 341050 has been marked as a duplicate of this bug. ***
Depends on: 341730
This looks like it also exposed bug 341730...

I'm going to try to come up with a completely safe null-check-only patch for the 1.8.0.x branch here.
OK, the null-check thing does not help that much on the 1.8.0 branch.  It does fix this crash, but then we crash messing with what looks like a deleted nsAttributeChildList from JS(???).  I get that result even with the patch in this bug (which _does_ fix the crash on trunk).

Is it still worth getting the null-check in on 1.8.0.5?
So it might be worth taking it on the 1.5.0.5 tree since we could maybe fix the later crash in a further release. I'm not sure if it's worth it though given that we're transitioning to FF2 in a not too distant future.
Again, this does not fully fix things on branch; if fixes the crash that used to happen, but now another one happens...
Attachment #226222 - Flags: superreview?(bugmail)
Attachment #226222 - Flags: review?(bugmail)
Attachment #226222 - Flags: approval1.8.0.5?
Comment on attachment 226222 [details] [diff] [review]
Null-check-only patch -- for 1.8.0.x branch only

We decided not to take this since the current crash is an unexploitable null-pointer deref, whereas the new crash is messing with a deleted object which could be worse.
Attachment #226222 - Flags: approval1.8.0.5? → approval1.8.0.5-
Attachment #224627 - Flags: approval1.8.0.5+ → approval1.8.0.5-
If we can work out the regressions maybe this can make 1.8.0.6
Flags: blocking1.8.0.6?
Flags: blocking1.8.0.5-
Flags: blocking1.8.0.5+
Pushing this out to 1.8.0.6 in case we get a fix for this and the later crashes. Though I'm not sure how vital this is since it seems like this is "just" a nullpointer deref caused by fuzzing.
OK, makes sense.  For what it's worth, I see the same crash on 1.8.x as on 1.8.0.x...  Bug 342085 filed on that; I'll try to come up with a patch there.
Blocks: 342085
Comment on attachment 226222 [details] [diff] [review]
Null-check-only patch -- for 1.8.0.x branch only

OK, with the "null-check-only patch" and the patch for bug 342085 (which should be really really safe) I no longer crash on this testcase on both 1.8 and 1.8.0 branches.  Given that, I think I'd like to rerequest 1.8.0.5 approval on the null-check.
Attachment #226222 - Flags: approval1.8.0.5- → approval1.8.0.5?
Comment on attachment 226222 [details] [diff] [review]
Null-check-only patch -- for 1.8.0.x branch only

great! r/sr=sicking
a=sicking for drivers
Attachment #226222 - Flags: superreview?(bugmail)
Attachment #226222 - Flags: superreview+
Attachment #226222 - Flags: review?(bugmail)
Attachment #226222 - Flags: review+
Attachment #226222 - Flags: approval1.8.0.5?
Attachment #226222 - Flags: approval1.8.0.5+
Fixed on the 1.8.0 branch.  Thanks for the speedy review!
Keywords: fixed1.8.0.5
verified with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060620 Firefox/1.5.0.5
Flags: blocking1.8.0.7?
Flags: blocking1.8.1?
Crash Signature: [@ nsHTMLDocument::MatchLinks]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: