799142 - Crash because ~TimerEventAllocator() may be called before TimerEventAllocator::Free() is called [@ nsRunnable::Release() ]

Reporter

Description

12 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121005155445

Steps to reproduce:
I tried to close Firefox while playing Browser Hunt.
http://ie.microsoft.com/testdrive/performance/browserhunt/default.xhtml

Actual results:
Firefox sometimes crashed with an access violation. I debugged my builds of Firefox 15.0.1, 16.0 and trunk, and found that whenever Firefox crashed as described above, ~TimerEventAllocator() had been called before TimerEventAllocator::Free().

Expected results:
All TimerEventAllocator::Free() calls should complete before ~TimerEventAllocator() is called.

Tetsuro Kato (tete)

Reporter

Updated

12 years ago

URL: http://ie.microsoft.com/testdrive/per...

Tetsuro Kato (tete)

Reporter

Updated

12 years ago

Crash Signature: [@ nsTimerEvent::`vector deleting destructor''(unsigned int) ]

Tetsuro Kato (tete)

Reporter

Updated

12 years ago

Crash Signature: [@ nsTimerEvent::`vector deleting destructor''(unsigned int) ] → [@ nsTimerEvent::`vector deleting destructor''(unsigned int) ] [@ nsRunnable::Release() ]

Tetsuro Kato (tete)

Reporter

Comment 1

12 years ago

Sorry, ~TimerEventAllocator() isn't explicitly declared in the original source; I added it to my builds for debugging purposes. Firefox 16.0 rc build1 seems to show a different crash signature, [@ nsRunnable::Release() ], so I added it to "Crash Signature".

Matthias Versen [:Matti]

Comment 2

12 years ago

Can you please post a crash ID from about:crashes ?

Keywords: crash

Tetsuro Kato (tete)

Reporter

Comment 3

12 years ago

Crash IDs:
Firefox 18.0a1: 75443141-6066-4ab4-92a9-975ca2121008
Firefox 16.0 rc build1: 142a8001-7c35-4367-b67c-47a4e2121008

Matthias Versen [:Matti]

Updated

12 years ago

Component: Untriaged → XPCOM

Product: Firefox → Core

Makoto Kato [:m_kato]

Comment 4

12 years ago

This is a race condition. Thread 0 (the Gecko main thread) starts shutting down after another thread has started nsRunnable::run(). If run() takes a long time (i.e. it has a busy job), thread 0 will release the allocator before run() finishes. Although this is a possible issue, I don't think it happens frequently. Marking as NEW since there is a crash signature...
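The timeline above, together with the "Expected results" in the description, boils down to one invariant: every Free() must finish before the allocator is destroyed. One way to enforce that is for the owner to wait until all outstanding blocks have been returned before tearing the allocator down. The following is a minimal C++ sketch under that assumption; `DrainingAllocator`, `WaitUntilDrained()` and `ShutdownExample()` are hypothetical names, not Gecko APIs, and plain `operator new`/`operator delete` stand in for the real fixed-size pool.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <new>
#include <thread>

// Hypothetical allocator (not Gecko code) that counts outstanding blocks so
// its owner can wait for every Free() to finish before destroying it.
class DrainingAllocator {
 public:
  explicit DrainingAllocator(std::size_t blockSize) : blockSize_(blockSize) {}

  void* Alloc() {
    void* p = ::operator new(blockSize_);
    std::lock_guard<std::mutex> lock(mutex_);
    ++outstanding_;
    return p;
  }

  void Free(void* p) {
    ::operator delete(p);
    std::lock_guard<std::mutex> lock(mutex_);
    if (--outstanding_ == 0) {
      // Notify while still holding the lock: the waiter cannot return (and
      // destroy this object) until the lock is released at end of scope.
      drained_.notify_all();
    }
  }

  // Blocks the caller (e.g. the main thread during shutdown) until every
  // block handed out by Alloc() has been returned via Free().
  void WaitUntilDrained() {
    std::unique_lock<std::mutex> lock(mutex_);
    drained_.wait(lock, [this] { return outstanding_ == 0; });
  }

  std::size_t Outstanding() {
    std::lock_guard<std::mutex> lock(mutex_);
    return outstanding_;
  }

 private:
  std::size_t blockSize_;
  std::mutex mutex_;
  std::condition_variable drained_;
  std::size_t outstanding_ = 0;
};

// Usage sketch: a worker thread (standing in for the thread running
// nsRunnable::run()) frees its block while the "main thread" shuts down.
std::size_t ShutdownExample() {
  DrainingAllocator allocator(64);
  void* event = allocator.Alloc();
  std::thread worker([&allocator, event] { allocator.Free(event); });
  allocator.WaitUntilDrained();  // returns only after Free() has completed
  worker.join();                 // joined only so the example exits cleanly
  return allocator.Outstanding();
}
```

A design caveat: blocking the main thread at shutdown is only safe if no thread holding a block is itself waiting on the main thread; otherwise this approach trades a crash for a hang.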

Status: UNCONFIRMED → NEW

Ever confirmed: true

Adam Roach [:abr]

Comment 6

12 years ago

The scenario in Bug 825480 seems to reliably trigger this crash on certain hardware configurations, so we're going to mark this as [blocking-webrtc+] for the time being. I'm not familiar with the code involved, so this is nothing more than a casual armchair analysis: the TimerEventAllocator class is used in only one file, but it is built on nsFixedSizeAllocator, which is used in 29 other files. Based on that first-order examination, I'd estimate that converting to reference counting (which would solve the race described) is probably realistic, but not trivial.
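The reference-counting idea can be sketched as follows, assuming each event owns a strong reference to its allocator, so the main thread dropping its own reference cannot destroy the allocator while an event is still alive. `RefCountedAllocator` and `TimerEventSketch` are hypothetical stand-ins for TimerEventAllocator and nsTimerEvent, and `std::shared_ptr` replaces XPCOM's own refcounting; this illustrates the technique, not the actual patch.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical fixed-size allocator; plain operator new/delete stand in for a pool.
class RefCountedAllocator {
 public:
  explicit RefCountedAllocator(std::size_t blockSize) : blockSize_(blockSize) {}
  std::size_t BlockSize() const { return blockSize_; }
  void* Alloc() { return ::operator new(blockSize_); }
  void Free(void* p) { ::operator delete(p); }

 private:
  std::size_t blockSize_;
};

// Hypothetical event type: each live event holds a strong reference, so the
// allocator is destroyed only after the last event has been freed.
class TimerEventSketch {
 public:
  static TimerEventSketch* Create(std::shared_ptr<RefCountedAllocator> alloc) {
    if (alloc->BlockSize() < sizeof(TimerEventSketch)) throw std::bad_alloc();
    void* mem = alloc->Alloc();
    return new (mem) TimerEventSketch(std::move(alloc));
  }

  static void Destroy(TimerEventSketch* event) {
    // Take the reference out of the event first, so the allocator stays
    // alive across the destructor call and the Free() below.
    std::shared_ptr<RefCountedAllocator> alloc = std::move(event->allocator_);
    event->~TimerEventSketch();
    alloc->Free(event);
  }  // `alloc` dies here; if it was the last reference, the allocator goes too.

 private:
  explicit TimerEventSketch(std::shared_ptr<RefCountedAllocator> alloc)
      : allocator_(std::move(alloc)) {}

  std::shared_ptr<RefCountedAllocator> allocator_;
};
```

Because `std::shared_ptr` updates its count atomically, the final release may happen on whichever thread frees the last event. The caveat about touching nsFixedSizeAllocator's other users still applies to a real conversion.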

Whiteboard: [blocking-webrtc+]

Scoobidiver (away)

Updated

12 years ago

Whiteboard: [blocking-webrtc+] → [blocking-webrtc+][ietestdrive]

Randell Jesup [:jesup] (needinfo me)

Updated

12 years ago

Whiteboard: [blocking-webrtc+][ietestdrive] → [webrtc][blocking-webrtc+][ietestdrive]

Jason Smith [:jsmith]

Comment 7

12 years ago

I can also reproduce this eventually using the data channels test page, by sending files around.

tracking-firefox21: --- → ?

Jason Smith [:jsmith]

Comment 8

12 years ago

Seems to be easily reproducible with http://mozilla.github.com/webrtc-landing/data_test.html on Win 7.

Keywords: reproducible

bhavana bajaj [:bajaj]

Updated

12 years ago

status-firefox21: --- → affected

tracking-firefox21: ? → +

bhavana bajaj [:bajaj]

Comment 9

12 years ago

:mreavy,:jesup : Can you please help find an assignee for this bug ? Thanks !

Randell Jesup [:jesup] (needinfo me)

Comment 10

12 years ago

CCing people who've had their fingers in TimerImpl recently... There's a long-standing race condition in the timer code that we now apparently trigger. Could some of you take a look and see what might work as a solution?

Boris Zbarsky [:bzbarsky]

Comment 11

12 years ago

As far as I can tell, the TimerEventAllocator added in bug 733277 assumes nsITimer is used only from the main thread, right?
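The concern here is that a pool written for main-thread-only use has an unsynchronized free list, so Alloc() and Free() racing on two threads can corrupt it. A minimal sketch of a thread-safe alternative, assuming a plain mutex is acceptable on this path: `ThreadSafeFixedSizeAllocator` and `ConcurrentChurn()` are hypothetical names, and a `std::vector` free list stands in for whatever pool the real allocator uses. This addresses concurrent access only; the destruction-order problem from the description still needs draining or reference counting on top.

```cpp
#include <cstddef>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

// Hypothetical fixed-size allocator whose free list is guarded by a mutex,
// so Alloc() and Free() may be called from any thread.
class ThreadSafeFixedSizeAllocator {
 public:
  explicit ThreadSafeFixedSizeAllocator(std::size_t blockSize)
      : blockSize_(blockSize) {}
  ThreadSafeFixedSizeAllocator(const ThreadSafeFixedSizeAllocator&) = delete;
  ThreadSafeFixedSizeAllocator& operator=(const ThreadSafeFixedSizeAllocator&) = delete;

  // Releases recycled blocks; assumes all blocks were Free()d beforehand.
  ~ThreadSafeFixedSizeAllocator() {
    for (void* p : freeList_) ::operator delete(p);
  }

  void* Alloc() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!freeList_.empty()) {
        void* p = freeList_.back();
        freeList_.pop_back();
        return p;
      }
    }
    return ::operator new(blockSize_);  // free list empty: get a fresh block
  }

  void Free(void* p) {
    std::lock_guard<std::mutex> lock(mutex_);
    freeList_.push_back(p);  // recycle instead of returning it to the heap
  }

  std::size_t FreeCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return freeList_.size();
  }

 private:
  std::size_t blockSize_;
  std::mutex mutex_;
  std::vector<void*> freeList_;
};

// Usage sketch: two threads (e.g. the main thread and a timer thread) churn
// blocks concurrently; each holds at most one block at a time.
std::size_t ConcurrentChurn(ThreadSafeFixedSizeAllocator& a, int iterations) {
  auto work = [&a, iterations] {
    for (int i = 0; i < iterations; ++i) a.Free(a.Alloc());
  };
  std::thread t1(work);
  std::thread t2(work);
  t1.join();
  t2.join();
  return a.FreeCount();
}
```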

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 12

12 years ago

Oh boy. I use timers off the main thread all over the place.

Lukas Blakk [:lsblakk] use ?needinfo

Updated

12 years ago

Flags: needinfo?(mreavy)

Lukas Blakk [:lsblakk] use ?needinfo

Updated

12 years ago

Flags: needinfo?(rjesup)

Benjamin Smedberg

Updated

12 years ago

Assignee: nobody → ehsan

Priority: -- → P2

Lukas Blakk [:lsblakk] use ?needinfo

Comment 13

12 years ago

Thanks for the assignee, Benjamin; clearing the needinfo.

Flags: needinfo?(rjesup)

Flags: needinfo?(mreavy)

Benjamin Smedberg

Updated

12 years ago

Blocks: 733277

Lukas Blakk [:lsblakk] use ?needinfo

Comment 14

12 years ago

If bug 733277 assumes nsITimer is only used from the main thread, and there are instances of use outside the main thread, should bug 733277 be backed out and reworked while FF 21 is still on m-c?

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 15

12 years ago

(In reply to Lukas Blakk [:lsblakk] from comment #14)
> If bug 733277 assumes nsITimer is only used from the main thread and there
> are instances of use outside the main thread then should bug 733277 be
> backed out while FF 21 is still on m-c and re-worked?

We need ehsan's input here, but if bug 733277 assumes only main thread timers then I would say yes, it should be backed out. It probably caused subtle race bugs in anything involving LazyIdleThread (IndexedDB, JumplistBuilder from a quick MXR search) or Workers.

Josh Matthews [:jdm]

Comment 16

12 years ago

Ehsan is out until the 19th.

Randell Jesup [:jesup] (needinfo me)

Comment 17

12 years ago

How likely is it that those subtle races are causing impossible-to-reproduce/debug crashes, hangs, or malfunctions?

Benjamin Smedberg

Comment 18

12 years ago

Since the original landed in March, I'm going to wait for ehsan to return and comment before taking other actions on this bug.

Patch (v1), 12 years ago (no longer active) (deleted)
Patch (v2), 12 years ago (no longer active) (deleted): benjamin review+, bajaj approval-mozilla-aurora+