Closed Bug 921323 Opened 11 years ago Closed 11 years ago

crash in OpenGL@0x32af (MakeCurrent from OMTC)

Categories

(Core :: Graphics: Layers, defect)

All
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox25 + wontfix
firefox26 + wontfix
firefox27 + verified
firefox28 --- verified

People

(Reporter: BenWa, Assigned: mstange)

References

Details

(Keywords: crash, topcrash-mac)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-0b8faad2-fab1-48a2-bf30-aa22e2130926. =============================================================
I think this a dup of bug 899044. Lots of different stack traces, but always crashing in MakeCurrent.
Probably but it points back to the fullscreen crash on 10.6 for which the test is disabled on 10.7+. We're not handling some of the widget changes to the windows and it seems to be messing with our context. It's preventing us from shipping OMTC on 10.6, giving us random test failures on orange and is a mac top crasher.
Can you explain how it leads to the fullscreen crash for 10.6? That's probably an important link, but I don't see it :)
I'm going into 890997 but here it goes anyways: Looks for call to DestroyNativeWindow: http://mxr.mozilla.org/mozilla-central/search?string=DestroyNativeWindow For example we call that from here: http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsCocoaWindow.mm#539 I'm guessing destroying the window without syncing with the compositor that has a context attached to that window isn't good. I think we need to crack down on these problems cause they aren't only affecting 10.6. We also probably want to audit for more bad things else where. Perhaps we can find the internal symbol for when a context gets dettached from the widget and trace the points where it happens.
Right. I was more wondering if there was any evidence to suggest that is what is causing this crash bug.
I can't find any correlation between the crashes and windows closing, but if that is the case, then this should help.
Attachment #810944 - Flags: review?(bgirard)
Note that entering fullscreen or changing some the window properties recreate the window. I'm not sure I understand how this patch is trying to help. Can you explain how you're hoping it will work. In the case where we temporarily destroy the widget we should aim to just unwind and rebind our GLContext from the old to the new widget instead of destroying and recreating all our resources.
(In reply to Benoit Girard (:BenWa) from comment #7) > Note that entering fullscreen or changing some the window properties > recreate the window. > > I'm not sure I understand how this patch is trying to help. Can you explain > how you're hoping it will work. In the case where we temporarily destroy the > widget we should aim to just unwind and rebind our GLContext from the old to > the new widget instead of destroying and recreating all our resources. That function gets called on nsChildView children of the nsCocoaWindow when we call DestroyNativeWindow. My change makes us synchronously tear down the compositor/GLContext. I'm much more interested in not crashing than I am in optimal behaviour at this point.
At the risk of being unreasonable (in which case I'd be ok if you found someone else to review this change), I feel like this change is just a shot in the dark without studying the problem for which we have can reliably reproduce. Not only is it a shot in the dark but it's a fairly invasive change with poor justification. In the future it's going to be hard to maintain because it's a hack. Why is this compositor specific? Why isn't it going through the shutdown part like everything else? With this poor understanding we wont be able to remove/maintain this line because we don't even understand why it's there in the first place. (In reply to Matt Woodrow (:mattwoodrow) from comment #8) > I'm much more interested in not crashing than I am in optimal behaviour at > this point. Then how much is this hurting us? It's a non trivial change. OMTC itself is just an optimization so if we prefer stability over performance (which we do) then we should disable OMTC over a hack that make performance worse and for which we're not sure fixes the initial problem.
Well I'm just entirely confused at this point. (In reply to Benoit Girard (:BenWa) from comment #9) > At the risk of being unreasonable (in which case I'd be ok if you found > someone else to review this change), I feel like this change is just a shot > in the dark without studying the problem for which we have can reliably > reproduce. We can? I only see an intermittent orange bug, and some low volume crash reports, without useful comments. Being able to reproduce this would make life much easier. > Not only is it a shot in the dark but it's a fairly invasive > change with poor justification. > > In the future it's going to be hard to maintain because it's a hack. Why is > this compositor specific? Why isn't it going through the shutdown part like > everything else? With this poor understanding we wont be able to > remove/maintain this line because we don't even understand why it's there in > the first place. Isn't this what you asked me to do? You said that the crash reports pointed at it being the same issue as why we have it disabled on 10.6. I asked why you thought that, and you pointed to a particular set of calls that might cause these problems. This patch should effectively fix that path, but as I said, I can't see any correlation between these MakeCurrent crashes and the 10.6 ones (they appear to be happening during a transition to/from fullscreen and crash in glDrawArrays). I can't see anything that would suggest windows being created/destroyed or fullscreen happening from the TBPL logs, or the crash reports. So, if you *do* think this is the problem, then we should take the patch and find out if it changes anything. If you don't (which is fine, because I don't either), then we're back to the drawing board. > > (In reply to Matt Woodrow (:mattwoodrow) from comment #8) > > I'm much more interested in not crashing than I am in optimal behaviour at > > this point. > > Then how much is this hurting us? It's a non trivial change. OMTC itself is > just an optimization so if we prefer stability over performance (which we > do) then we should disable OMTC over a hack that make performance worse and > for which we're not sure fixes the initial problem. I don't think performance when switching to fullscreen or closing a window is in any way comparable to doubling scrolling throughput (which is what has been recorded for OMTC). One is an edge case, and the other is basically our biggest perf target.
Milan, do we have anyone that actually knows the cocoa widget code that could help out here? I think OMTC is a big enough performance win that we should do everything we can to prevent it being turned off (to the point of living with a low-volume crasher, potentially). As BenWa suggested, I'm just taking shots in the dark here with no STR and no knowledge of the coca widget code or API. I suspect he's right that we're doing something wrong with our asynchronous interaction between GL and cocoa, but I haven't been able to find out exactly what thus far. I can/will continue looking at this, any assistance would be appreciated though.
Flags: needinfo?(milan)
Markus, can you help?
Flags: needinfo?(mstange)
I'll try.
Flags: needinfo?(mstange)
Flags: needinfo?(milan)
Tracy, can you confirm this is a top crash?
Flags: needinfo?(twalker)
overall, not a topcrasher. but for Mac its: #2 on Fx25 #4 on Fx26 #12 on Fx27
Flags: needinfo?(twalker)
Ahh ok, I though that classified as a top crasher for the platform.
(In reply to Benoit Girard (:BenWa) from comment #16) > Ahh ok, I though that classified as a top crasher for the platform. Yes, it does. Sorry, I forgot to add the keyword when posting the stats.
Keywords: topcrash
Thanks!
Another bug that confirms we should consider having topcrash-$platform keywords :)
Comment on attachment 810944 [details] [diff] [review] Destroy compositor when we close the window Review of attachment 810944 [details] [diff] [review]: ----------------------------------------------------------------- Let's wait and see if Markus can help out here before taking this on.
Attachment #810944 - Flags: review?(bgirard)
Looks like Matt's on this, assigning to him.
Assignee: nobody → matt.woodrow
I'm pretty confident that the work in bug 886999, bug 914437, bug 923114 and bug 923133 will fix this, so assigning this to myself.
Assignee: matt.woodrow → mstange
Status: NEW → ASSIGNED
Current Ranking (Mac only): * Release: N/A * Beta: #2 (0.63%) * Aurora: #8 (1.27%) * Nightly: #22 (0.25%)
Me and mstange discussed this yesterday. We're going to let is slide for Beta since it's too late for the patches. This should be fixed on Nightly as of today or yesterday. We're going to uplift to Aurora after a few more days of baking.
Thanks Benoit. I'm going to call this wontfix for Fx25 given that info.
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrashtopcrash-mac
Adding the signature that is showing up on 10.9 as well, since it appears to be the same crash.
Crash Signature: [@ OpenGL@0x32af] → [@ OpenGL@0x32af] [@ OpenGL@0x3ac8]
(In reply to Benoit Girard (:BenWa) from comment #24) > Me and mstange discussed this yesterday. We're going to let is slide for > Beta since it's too late for the patches. This should be fixed on Nightly as > of today or yesterday. We're going to uplift to Aurora after a few more days > of baking. Benoit - did anything land on nightly that can be nominated for uplift on this bug?
Flags: needinfo?(bgirard)
Crash Signature: [@ OpenGL@0x32af] [@ OpenGL@0x3ac8] → [@ OpenGL@0x32af ] [@ OpenGL@0x3ac8 ] [@ OpenGL@0x4133 ]
Should we uplift your patches for this to aurora now?
Flags: needinfo?(bgirard) → needinfo?(mstange)
Too late now for speculative uplifts to Beta, marking wontfix for 26.
This was fixed for Firefox 27 by one or more patches in this push (probably those from bug 914437): https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=30c25c68bb3e
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Depends on: 914437
Flags: needinfo?(mstange)
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Flagging for verification based on crashstats.
Keywords: verifyme
No crashes were recorded on 27 beta based on Socorro in the last 3 weeks for: - [@ OpenGL@0x32af ] signature: http://goo.gl/1tZToV - [@ OpenGL@0x3ac8 ] signature: http://goo.gl/yqQsYm - [@ OpenGL@0x4133 ] signature: http://goo.gl/NiYaV2 No crashes were recorded on Aurora 28.0a2 based on Socorro in the last 3 weeks for: - first signature: http://goo.gl/kJRbAX - second signature: http://goo.gl/Uetq2r - third signature: http://goo.gl/Lpyiha
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: