Closed Bug 80345 Opened 24 years ago Closed 24 years ago

frequent crash in nsWindow::UpdateIdle; Trunk & M09 crash [@ 0x00000000 - nsAppShell::Run]

Categories

(Core :: Layout, defect)

x86
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(5 files)

frequent crash in nsWindow::UpdateIdle

looks like tmp_list->data points to garbage.  this is because the nsWindow
object is not AddRef'd before being put on the update_queue.
Attached patch fixes the problem (deleted) β€” β€” Splinter Review
Keywords: crash
Severity: normal → critical
*** Bug 53704 has been marked as a duplicate of this bug. ***
*** Bug 56086 has been marked as a duplicate of this bug. ***
*** Bug 68589 has been marked as a duplicate of this bug. ***
Keywords: mostfreq
r=dr
Assignee: karnaze → darin
Doesn't UpdateIdle need to Release each window, too -- before it calls
g_slist_free on old_queue?

/be
Looks reasonable to me, so sr=waterson if you'll respond to brendan's comments.
brendan: you're correct.  i checked it out and with my fix, we leak windows. 
when i add a Release after window->Update(), i seem to get the reference count
balanced, but we crash in layout almost immediately after clicking on a link.
Darin, how about attaching that crash stack and some of the scene-of-the-crime
variables in registers and nearby memory?

/be
Oh yeah, blizzard's out for the weekend, I'm virtual blizzard, but I'm sure he
will want to be cc'd so he can catch up and maybe help us.

/be
Brendan: check out bug 56086 (dup of this) for a stack and some
"scene-of-the-crime" variables.
dr: I saw that, but I need to see where layout croaks with Darin's attempt to
Release in Update Idle.  I'm also curious whether his patch releases during the
tmp_list walk, or in a separate pass, just before the g_slist_free call.]

/be
sorry for not being very responsive... here's the patch i have most recently
tested.
Attached patch second attempt at fixing the problem. (deleted) β€” β€” Splinter Review
i confirmed that this patch gets the refcount properly balanced; however, it
seems that if the final Release happens from within UpdateIdle, then we crash
while destroying the window.  i'll attach a stack trace just as soon as i have
a linux build.
it seems to me that if the window cannot be destroyed from UpdateIdle (for
whatever reason), then my attempt at fixing this bug has been misguided.  we
should probably be looking to simply ensure that the window is removed from the
update_queue in its destructor.  in fact, i bet this would be the correct patch.

once i have a linux build, i'll go give that patch a try.
bug 53704 has the correct diagnosis of the problem.  the nsWindow instance
is being added to the update_queue multiple times, but only removed once.
i'm not sure yet how the window could be added to the list multiple times,
since it should be controlled by the mIsUpdating member variable.  hmm...
dr says this is the bug I've been seeing (crash on hitting back/forward) in the
past hour, ccing self.
OK.  Looking at this.
added topcrash keyword and Trunk & M09 crash [@ 0x00000000 - nsAppShell::Run] to 
summary for tracking (from dupped bug 56086)
Keywords: topcrash
Summary: frequent crash in nsWindow::UpdateIdle → frequent crash in nsWindow::UpdateIdle; Trunk & M09 crash [@ 0x00000000 - nsAppShell::Run]
Blocks: 79726
Since I don't see this crash I wonder if it's related to the version of
glib/gtk.  Maybe there's a bug in older version of g_slist_remove() or something.
yes exactly!  in fact, i'm only seeing this under gtk/glib-1.2.8.  with 1.2.9
i can't reproduce it.
turns out there are no differences b/w gslist.c in 1.2.8 and 1.2.9, so it is
unlikely that it is this code that is causing the problem.  i'm also seeing
crashes from g_idle_dispatch, making a callback to some garbage address.

Here's the differences between gmain.c in 1.2.8 and 1.2.9:


--- ../glib-1.2.8/gmain.c   Thu May 18 22:58:48 2000
+++ gmain.c Mon Feb 26 22:00:21 2001
@@ -1134,6 +1134,8 @@
   else
     newrec = g_chunk_new (GPollRec, poll_chunk);

+  /* This file descriptor may be checked before we ever poll */
+  fd->revents = 0;
   newrec->fd = fd;
   newrec->priority = priority;

@@ -1360,7 +1362,7 @@
         GTimeVal *dispatch_time,
         gpointer user_data)
 {
-  GSourceFunc func = source_data;
+  GSourceFunc func = (GSourceFunc) source_data;

   return func (user_data);
 }
@@ -1373,7 +1375,7 @@
 {
   g_return_val_if_fail (function != NULL, 0);

-  return g_source_add (priority, FALSE, &idle_funcs, function, data, notify);
+  return g_source_add (priority, FALSE, &idle_funcs, (gpointer) function, data,
notify);
 }

 guint
Darin, Blizzard: We had this discussion in 56086 -- I had the same thought, but
it turned out to be incorrect. The crash *is* reproducible using gtk/glib 1.2.9.

Here's how to reproduce the crash:
  1: mozilla -g (necessary for the obvious reason of getting debug info, but
     also to slow your computer down enough for the timing to work out just
     right).
  2: Load all the items in Debug->Verification (I find loading all the items
     from Mozilla to Mirabilis is good enough of a history list).
  3: Hit back and forward a lot. You don't need to hit them fast. In fact,
     when jrgm showed me how to repro this, he seemed to patiently wait for
     the pages to partially load.
  4: Boom.

Darin: want to accept this bug?...
dan: thanks for the info.  makes sense since there's hardly any difference b/w
glib 1.2.8 and 1.2.9, and also b/c people are reporting this with glib-1.2.10!
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
Attached patch fixes the problem for real (deleted) β€” β€” Splinter Review
turns out the problem was that calling Update on a nsWindow from the UpdateIdle
handle could cause other windows to be added to the update_queue.  this is
bad when a window is added to the queue that is already on the queue, but not
yet processed.  my patch fixes this by first iterating over the update_queue and
clearing the mIsUpdating flags.  then it makes a second pass to actually call
window->Update(), so that if windows get added to the update_queue, their 
mIsUpdating flag will not be incorrectly set.  (hope my explanation makes sense).
Looks good to me. I'll r= it after I put it through my tests :)

BTW, one thing you might want to do (and I'd like to see this in the patch,
actually) is add a comment to nsIWidget.idl, mentioning this bug number, saying
that calling |update| can cause other widgets to update as well.

Thanks!
Attached patch adding some comments to nsIWidget.idl (deleted) β€” β€” Splinter Review
r/sr=blizzard on both patches.

Thanks for finding this, Darin!
r=dr. check the sucka in! thanks again!
Keywords: patch
fix checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Marking verified in the May 21 build.
Status: RESOLVED → VERIFIED
Crash Signature: [@ 0x00000000 - nsAppShell::Run]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: