Closed Bug 143664 Opened 23 years ago Closed 22 years ago

(timers) Crash killing a popup window (in pt_PostNotifies?) [nsWebShellWindow::FirePersistenceTimer, PR_Unlock, pthread_cond_signal, __pthread_lock]

Categories

(Core :: XPCOM, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED WORKSFORME
mozilla1.5alpha

People

(Reporter: mozilla-bugs, Assigned: danm.moz)

References

()

Details

(Keywords: crash)

Attachments

(1 file)

Reproducible: not sure I've had Mozilla (BuildID 2002050205 trunk) crash on RedHat Linux 7.2 when I went to http://weather.yahoo.com/forecast/USNY0717_c.html (using a Personal Toolbar bookmark) and then killed (via a WM's "X") the pop-up ad window when it have just started loading. gdb says: #0 __pthread_lock (lock=0x5, self=0x0) at spinlock.c:84 #1 0x40221060 in pthread_cond_signal (cond=0x5) at condvar.c:241 #2 0x40209e8c in pt_PostNotifies () from /usr/lib/libnspr4.so #3 0x4020a04b in PR_Unlock () from /usr/lib/libnspr4.so #4 0x407e23df in nsWebShellWindow::FirePersistenceTimer () from /usr/lib/mozilla/components/libnsappshell.so #5 0x4018ef3b in nsTimerImpl::Fire () from /usr/lib/libxpcom.so #6 0x4018f06f in handleTimerEvent () from /usr/lib/libxpcom.so #7 0x4018aa82 in PL_HandleEvent () from /usr/lib/libxpcom.so #8 0x4018af26 in PL_ProcessEventsBeforeID () from /usr/lib/libxpcom.so #9 0x407fee1f in processQueue () from /usr/lib/mozilla/components/libwidget_gtk.so #10 0x40156111 in nsVoidArray::EnumerateForwards () from /usr/lib/libxpcom.so #11 0x407fee66 in nsAppShell::ProcessBeforeID () from /usr/lib/mozilla/components/libwidget_gtk.so #12 0x408062fa in handle_gdk_event () from /usr/lib/mozilla/components/libwidget_gtk.so #13 0x40379d7f in gdk_event_dispatch () from /usr/lib/libgdk-1.2.so.0 #14 0x403ad773 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0 #15 0x403add39 in g_main_iterate () from /usr/lib/libglib-1.2.so.0 #16 0x403adeec in g_main_run () from /usr/lib/libglib-1.2.so.0 #17 0x402c8333 in gtk_main () from /usr/lib/libgtk-1.2.so.0 #18 0x407feb69 in nsAppShell::Run () from /usr/lib/mozilla/components/libwidget_gtk.so #19 0x407ddba2 in nsAppShellService::Run () from /usr/lib/mozilla/components/libnsappshell.so #20 0x08053475 in getCountry () #21 0x08053dbb in main () #22 0x4050b647 in __libc_start_main (main=0x8053c70 <main>, argc=1, ubp_av=0xbffff7f4, init=0x804cf8c <_init>, fini=0x8055660 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffff7ec) at ../sysdeps/generic/libc-start.c:129
Keywords: crash
Changed the product to Browser for investigation.
Assignee: wtc → Matti
Component: NSPR → Browser-General
Product: NSPR → Browser
QA Contact: wtc → imajes-qa
Version: 4.2 → other
-> timer (?) = pavlov
Assignee: Matti → pavlov
Component: Browser-General → ImageLib
QA Contact: imajes-qa → tpreston
Component: ImageLib → XPCOM
Priority: -- → P2
Target Milestone: --- → mozilla1.1alpha
Had a similar crash again. Again, killed a pop-up wildow that have just popped up. This time it was on a different web site, with BuildID 2002051312 on RedHat Linux 7.2 gdb said: (gdb) bt #0 0x006f006f in ?? () #1 0x4018f95b in nsTimerImpl::Fire () from /usr/lib/libxpcom.so #2 0x4018fa8f in handleTimerEvent () from /usr/lib/libxpcom.so #3 0x4018b462 in PL_HandleEvent () from /usr/lib/libxpcom.so #4 0x4018b906 in PL_ProcessEventsBeforeID () from /usr/lib/libxpcom.so #5 0x40841e2f in processQueue () from /usr/lib/mozilla/components/libwidget_gtk.so #6 0x40156d61 in nsVoidArray::EnumerateForwards () from /usr/lib/libxpcom.so #7 0x40841e76 in nsAppShell::ProcessBeforeID () from /usr/lib/mozilla/components/libwidget_gtk.so #8 0x4084931a in handle_gdk_event () from /usr/lib/mozilla/components/libwidget_gtk.so #9 0x4037ad7f in gdk_event_dispatch () from /usr/lib/libgdk-1.2.so.0 #10 0x403ae773 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0 #11 0x403aed39 in g_main_iterate () from /usr/lib/libglib-1.2.so.0 #12 0x403aeeec in g_main_run () from /usr/lib/libglib-1.2.so.0 #13 0x402c9333 in gtk_main () from /usr/lib/libgtk-1.2.so.0 #14 0x40841b79 in nsAppShell::Run () from /usr/lib/mozilla/components/libwidget_gtk.so #15 0x40820a02 in nsAppShellService::Run () from /usr/lib/mozilla/components/libnsappshell.so #16 0x08053225 in getCountry () #17 0x08053b6b in main () #18 0x4050c647 in __libc_start_main (main=0x8053a20 <main>, argc=1, ubp_av=0xbffed9c4, init=0x804cdcc <_init>, fini=0x8055430 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffed9bc) at ../sysdeps/generic/libc-start.c:129
*** Bug 151633 has been marked as a duplicate of this bug. ***
Summary: Crash in pt_PostNotifies [nsWebShellWindow::FirePersistenceTimer, PR_Unlock, pthread_cond_signal, __pthread_lock] → Crash killing a popup window (in pt_PostNotifies?) [nsWebShellWindow::FirePersistenceTimer, PR_Unlock, pthread_cond_signal, __pthread_lock]
Summary: Crash killing a popup window (in pt_PostNotifies?) [nsWebShellWindow::FirePersistenceTimer, PR_Unlock, pthread_cond_signal, __pthread_lock] → (timers) Crash killing a popup window (in pt_PostNotifies?) [nsWebShellWindow::FirePersistenceTimer, PR_Unlock, pthread_cond_signal, __pthread_lock]
pavlov -> dougt
Assignee: pavlov → dougt
i can't reproduce this. I have tried it on a few different sites (espn, money, aol). I couldn't get a popup from yahoo (yeah yahoo!). However, there is clearly a life time problem in the nsWebShellWindow. The code inializes a timer with a pointer to |this|. However, there is no reference incremented for this -- so the timer does not own the nsWebShellWindow. During the ~() of nsWebShellWindow, it does call timer cancel, however this may be too late. Of course, it begs the question, why isn't cancel() cancelling the timer. Could it be that the timer is being fired at the same time the nsWebShellWindow is being destroyed? The simple fix here is to simply addref |this| before init'ing the timer and release the nsWebShellWindow in the callback. Dan, this blames to you. thoughts on it?
*** Bug 155846 has been marked as a duplicate of this bug. ***
I agree; since the timer object isn't solely owned by the window, it should (effectively) hold a reference to the window. It's probably the Right Thing. You should also release not only in the callback but also in the window's Destroy method, where it cancels the timer. It might be cleaner to use InitWithCallback instead of InitWithFuncCallback so the reference is handled automatically. Also unable to reproduce this bug, I've been eyeballing timer code trying to figure out how this could result in a callback to a dead window object. Haven't figured that out yet. Seems like the timer object may be outliving its usefulness somehow. I wonder if this new reference will cause a circular reference and leak. That's better than a crash of course but it'd still be nice to know what's happening. I plan to poke around a little more. But lacking any breakthroughs, yes, a nice addref seems appropriate.
over to danm for the nsWebShellWindow bandaide and more poking.
Assignee: dougt → danm
Re-setting target milestone for triage.
Target Milestone: mozilla1.1alpha → ---
Attached patch timer bumps window refcount (deleted) — Splinter Review
I've never been able to reproduce this bug but as we've already discussed, the timer object, which refers to its owner window and isn't solely owned by the window, should hold a reference to the window. Maybe that's the cause of the crash. This is the cleanest implementation (see comment 8), it doesn't cause nsWebShellWindow to leak, and it seems like a good thing to do.
Status: NEW → ASSIGNED
Flags: blocking1.4?
Target Milestone: --- → mozilla1.4final
Attachment #122800 - Flags: superreview?(jaggernaut)
Attachment #122800 - Flags: review?(dougt)
Comment on attachment 122800 [details] [diff] [review] timer bumps window refcount x
Attachment #122800 - Flags: review?(dougt) → review+
that comment should have been: It can't hurt.
Comment on attachment 122800 [details] [diff] [review] timer bumps window refcount sr=jag if you remove the call to cancel from the destructor.
Attachment #122800 - Flags: superreview?(jaggernaut) → superreview+
Attachment #122800 - Flags: approval1.4?
Attachment #122800 - Flags: approval1.4?
darin says the new(ish) timer architecture, which proxies back to the originating thread to execute the callback, will prevent a race condition from firing the timer callback after it's been cancelled in nsWebShellWindow::Destroy. The additional reference in this patch shouldn't really help anything. And indeed talkback reports, while they do show a smattering of this crash in the 1.0 branch, show nothing on the trunk. So this crash seems already fixed on the trunk, probably by the new timer code. Still it seems the COM Thing to Do, letting the timer (effectively) hold a reference to its callback object. We should probably do this if only for appearance' sake. I'm retargetting past the current stability (1.4) milestone. When the time comes we should probably also revisit whether the timerlock is necessary.
Flags: blocking1.4?
Target Milestone: mozilla1.4final → mozilla1.5alpha
addref/release part of patch is checked in. Again, the crash has already been fixed so this is no longer especially important, but it is more proper COM, bumping the window's refcount while another object holds a reference to it. Closing the bug WFM.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: