Closed
Bug 1117027
Opened 10 years ago
Closed 10 years ago
Unity DT2 demo crashes on first click with e10s disabled on OS X
Categories
(Core :: DOM: UI Events & Focus Handling, defect)
Tracking
()
RESOLVED
FIXED
mozilla38
Tracking | Status | |
---|---|---|
e10s | - | --- |
firefox35 | --- | unaffected |
firefox36 | --- | unaffected |
firefox37 | + | fixed |
firefox38 | + | fixed |
People
(Reporter: brendan, Assigned: smichaud)
References
(Blocks 1 open bug, )
Details
(Keywords: regression)
Attachments
(3 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
mstange
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Four crash reports, last from very latest Nightly (December 31 tip), the other three from the one I had fired up to test tonight, which was built from a December 25 tip:
bp-e9d67f16-5170-484e-aba4-579e52150102 1/1/15 9:00 PM
bp-1f1e5a53-7ffd-4e83-aef0-2379a2150102 1/1/15 8:57 PM
bp-0fdee63f-4e57-4d30-beb8-caa4b2150102 1/1/15 8:47 PM
bp-8cdb5f26-f6ba-475a-a9f6-6da722150102 1/1/15 8:45 PM
In a (default) e10s window, DT2 is still unplayable, the trackpad is very hard to control (perhaps not as bad as last year, but unplayable).
Cc'ing a few folks. Recent-ish regression, I demo'ed DT2 successfully in London at NDC, early last month.
Event Handling as starting component but probably wrong. Please fix! Thanks.
/be
Reporter | ||
Comment 1•10 years ago
|
||
> In a (default) e10s window, DT2 is still unplayable, the trackpad is very hard to control (perhaps not as bad as last year, but unplayable).
This is me being unclear: the default e10s window loads DT2 and does not crash on first click. As noted, the non-e10s window is the only way to get a playable trackpad, and the click is required to start tracking (workaround for bug in older Unity build that won't be fixed in this hosted demo, which will be fixed for the DT2 release).
/be
Comment 2•10 years ago
|
||
Stack in those crash reports (0 or 1 frames) are perhaps suspicious; common 0x5f3ffff8 may be of interest.
Comment 3•10 years ago
|
||
er, common *crash address* 0x5f3ffff8 may be of interest.
Comment 4•10 years ago
|
||
I suspect all the crashes are stack overflows, despite EXC_BAD_ACCESS crash reason. RSP is 0x00007fff5f400000 in all cases; seems too round to be a coincidence. I don't see memory regions in the dump to verify, though.
That means it would be good to get a real stack.
Reporter | ||
Comment 6•10 years ago
|
||
As dbaron surmised...
Who can take this bug?
/be
Comment 7•10 years ago
|
||
Looks like http://mxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.cpp?rev=e7b88b459b2a#3975 should deal with this kind of recursion.
I don't have a OSX machine to reproduce. Perhaps cpearce can reproduce.
Flags: needinfo?(cpearce)
Reporter | ||
Comment 8•10 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #7)
> Looks like
> http://mxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.
> cpp?rev=e7b88b459b2a#3975 should deal with this kind of recursion.
Thanks -- any idea what regressed this? Something that landed in the last month.
/be
Reporter | ||
Updated•10 years ago
|
Version: unspecified → Trunk
Comment 9•10 years ago
|
||
The regression is possibly something OSX specific, the stuff happening under
widget->SynthesizeNativeMouseMove (which calls SynthesizeNativeMouseEvent)
http://hg.mozilla.org/mozilla-central/annotate/57e4e9c33bef/widget/cocoa/nsChildView.mm#l1128
Comment 10•10 years ago
|
||
Can't repro on Windows here. Someone who has a clue about MacOSX would be best suited to fixing this.
Flags: needinfo?(cpearce)
Updated•10 years ago
|
Flags: needinfo?(smichaud)
Reporter | ||
Comment 11•10 years ago
|
||
Is Steven really the only Mac-enabled, likely-to-diagnose-this-bug developer around?
/be
Reporter | ||
Comment 12•10 years ago
|
||
Sorry to "Dear LazyWeb" -- I meant maybe someone could bisect. I'll do it when I have time, unless someone beats me to it. Thanks,
/be
Comment 13•10 years ago
|
||
I can reproduce this crash, but only on my main HiDPI screen. If I open the game on my external monitor, it works just fine.
Keywords: regression,
regressionwindow-wanted
Comment 14•10 years ago
|
||
I'm on OSX 10.9.5.
Comment 15•10 years ago
|
||
I can also reproduce this with e10s disabled entirely, as you'd expect. There's no additional useful console spam in a debug build, though I did hit a corrupted actor state abort in PLayer in the non-e10s window in e10s mode scenario.
Updated•10 years ago
|
Summary: Unity DT2 demo in non-e10s window crashes on first click → Unity DT2 demo in crashes on first click with e10s disabled
Assignee | ||
Comment 16•10 years ago
|
||
Andrew, could you give detailed STR? (Unless just running the demo and clicking once on it is enough to crash.)
I'll also look for a regression range.
The most recent big Mac-specific plugin change is the patch for bug 1092630.
Flags: needinfo?(smichaud)
Comment 17•10 years ago
|
||
1. Load page, then wait for it to finish downloading.
2. Click on either "play" button.
3. Wait until it goes full-screen. Sometimes I clicked on the "ok" for full screen permission, but I think sometimes not.
4. Click using the trackpad for an immediate crash.
It is easy to trigger. The only caveat is, as I said, that I couldn't reproduce on an external monitor. I was able to reproduce on the main HiDPI screen, with or without an external monitor plugged in.
Assignee | ||
Comment 18•10 years ago
|
||
(In reply to comment #17)
Yup, that does it for me (I crash, with today's m-c nightly in non-e10s mode) on my Retina MacBook Pro (with no external monitor attached) with all of OS X 10.7.5, 10.9.5 and 10.10.1. (The only reason I didn't test OS X 10.8.5 is that I don't have it installed there.)
I'll get and post one or more all-thread stack gdb/lldb stack traces.
It's very likely the trigger is bug 1092630 ... but we don't yet know for sure.
Assignee | ||
Comment 19•10 years ago
|
||
One extremely puzzling thing is that I don't see a Firefox Unity plugin process while the demo is running. Do you know what's up with that, Brendan?
Flags: needinfo?(brendan)
Assignee | ||
Comment 20•10 years ago
|
||
And the demo works with the Unity plugin disabled. So apparently it doesn't use the Unity plugin at all.
Comment 21•10 years ago
|
||
Yes, I believe this is a demo of a Unity-engine game compiled to ASM.js. It may use Flash or something for sounds, I'm not sure what the state of that is, but otherwise no plugins.
Assignee | ||
Comment 22•10 years ago
|
||
I don't need to post a gdb/lldb stack trace, since Brendan has already done so in comment #6.
It looks like EventStateManager::GenerateMouseEnterExit() is re-entered in an infinite loop until we run out of stack space. So, though it may be easier to trigger this bug on OS X, it's probably ultimately a bug in cross-platform event handling code.
Assignee | ||
Comment 23•10 years ago
|
||
Thanks for the info, Andrew.
> It may use Flash or something for sounds
It doesn't. I don't see any Firefox plugin processes at all.
Flags: needinfo?(brendan)
Comment 24•10 years ago
|
||
(In reply to Steven Michaud from comment #23)
> Thanks for the info, Andrew.
>
> > It may use Flash or something for sounds
>
> It doesn't. I don't see any Firefox plugin processes at all.
Correct, it uses Web Audio for sound, and the engine itself is asm.js. No plugins here.
Assignee | ||
Comment 25•10 years ago
|
||
Here's the regression range:
firefox-2014-12-11-03-02-06-mozilla-central
firefox-2014-12-12-03-02-01-mozilla-central
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0cf461e62ce5&tochange=5288b15d22de
Keywords: regressionwindow-wanted
Assignee | ||
Comment 26•10 years ago
|
||
> It's very likely the trigger is bug 1092630
Bizarrely I turned out to be right -- even though we now know this bug has nothing to do with plugins.
I don't think it can be the cause, though. As I said above, that's probably a cross-platform bug.
Since this bug doesn't (probably) require any Mac-specific wizardry to fix, is it possible that someone else can take it? :-)
Depends on: 1092630
Assignee | ||
Comment 27•10 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/a6db8f54f5aa
Bug 1092630: Get rid of native widgets for OS X NPAPI plugins, make things work much better under e10s. Patch by Josh Aas, Markus Stange, Steven Michaud, David Parks. r=smichaud/jst/josh (more reviews pending)
author Josh Aas <joshmoz@gmail.com>
Thu Dec 11 08:44:07 2014 -0600 (at Thu Dec 11 08:44:07 2014 -0600)
Comment 28•10 years ago
|
||
That patch did change the coordinate handling in SynthesizeNativeMouseEvent
Comment 29•10 years ago
|
||
[Tracking Requested - why for this release]: Breaks an asm.js game demo
Comment 30•10 years ago
|
||
(As an aside, I'd guess the demo not working with e10s is bug 1044228, because with the test case in that bug I get the same weird behavior where the position skips all over the place as I get in this demo.)
Assignee | ||
Comment 31•10 years ago
|
||
> That patch did change the coordinate handling in SynthesizeNativeMouseEvent
So it did, here:
https://hg.mozilla.org/mozilla-central/annotate/912036eeb024/widget/cocoa/nsChildView.mm#l1134
And I now realize the comment is wrong.
I'll post a patch reversing this part of the patch for bug 1092630, and ask people to test it.
Updated•10 years ago
|
Summary: Unity DT2 demo in crashes on first click with e10s disabled → Unity DT2 demo in crashes on first click with e10s disabled on OS X
Assignee | ||
Comment 32•10 years ago
|
||
> I'll post a patch reversing this part of the patch for bug 1092630, and ask people to test it.
It doesn't work locally (it causes a different problem), so I won't do this.
But in any case it's clear that at least some parts of Josh's patch for bug 1092630 make wrong assumptions. And I've covered a lot of this ground before myself. So (sigh) I'm probably the best one to take this bug.
Assignee: nobody → smichaud
Status: NEW → ASSIGNED
tracking-e10s:
--- → -
Reporter | ||
Updated•10 years ago
|
Summary: Unity DT2 demo in crashes on first click with e10s disabled on OS X → Unity DT2 demo crashes on first click with e10s disabled on OS X
Assignee | ||
Comment 33•10 years ago
|
||
I've got a patch that seems to fix both this bug and bug 1112267. Before I post it I want to see the results of a tryserver run. But our build infrastructure seems to have crapped out for the time being, so that will probably have to wait for tomorrow.
In fact my patch is just what I described in comment #31. But I wasted many hours before discovering that this bug's testcase won't run in self builds. It only runs in "official" releases, nightlies and tryserver builds. In other builds the testcase's asm.js code seems to go into an infinite loop, and you eventually get the "unresponsive script" warning. I spent more time trying to figure out why -- to figure out what build options I'd need to add to my mozconfig files (or take away from them). I still haven't managed to do this.
Brendan and Alon (or anybody else), do you have any idea why the demo doesn't work in my self builds?
Flags: needinfo?(brendan)
Flags: needinfo?(azakai)
Comment 34•10 years ago
|
||
I'm able to run the test case in the URL in my own regular optimized debug build on OSX, with clang "Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)". It was a bit pokey starting up, but not more than ten or so seconds, and I didn't see a slow script dialogue.
Assignee | ||
Comment 35•10 years ago
|
||
Andrew, could you attach your mozconfig?
I hadn't thought to change what version of clang I use. I'll do that (by building on a different version of OS X) and see if it makes a difference. I've been building on OS X 10.7.5, with "Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)".
Comment 36•10 years ago
|
||
Assignee | ||
Comment 37•10 years ago
|
||
Yup, it seems to be the version of clang you use that makes the difference (or possibly the version of OS X you build on): A self build done on OS X 10.8.5, using "Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)", works just fine.
Weird.
Flags: needinfo?(brendan)
Flags: needinfo?(azakai)
Comment 38•10 years ago
|
||
I'd like to see the potential patch, no need to wait to verify that it's correct. Can you post it?
Assignee | ||
Comment 39•10 years ago
|
||
Here's my patch, Josh. I just reversed your change to nsChildView::SynthesizeNativeMouseEvent().
Your comment is wrong. Cocoa "points" (which elsewhere get called "display pixels") are *not* (in general) the same as Gecko "device pixels". "Device pixels" are actual pixels, which are not (in the general case) directly addressable. "Display pixels" *are* directly addressable, using methods that take parameters like NSPoint and NSRect.
In non-HiDPI mode "device pixels" and "display pixels" are physically the same. But not in HiDPI mode, where a "display pixel" equals four "device pixels".
Yes, it's horribly confusing. But then so is HiDPI mode itself.
Attachment #8546757 -
Flags: review?(joshmoz)
Assignee | ||
Comment 40•10 years ago
|
||
For the record, here's the results of my tryserver run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0d7659bc48e
There were no non-spurious test failures.
And here's the opt tryserver build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-f0d7659bc48e/try-macosx64/firefox-37.0a1.en-US.mac.dmg
Attachment #8546757 -
Flags: review?(joshmoz) → review?(mstange)
Comment 41•10 years ago
|
||
Comment on attachment 8546757 [details] [diff] [review]
Fix
Review of attachment 8546757 [details] [diff] [review]:
-----------------------------------------------------------------
Looks like David Parks introduced this change in bug 1092630 comment 12 in order to fix test failures. Do tests still pass with this patch?
Do you know how this bug caused the crash?
Attachment #8546757 -
Flags: review?(mstange) → review+
Comment 42•10 years ago
|
||
Oh, you already ran tests and they were green. Hmm. And our test machines don't run on HiDPI machines, so David wouldn't have seen the test failure on TBPL to begin with.
David, did you run the tests on a HiDPI machine when you made the change to nsChildView::SynthesizeNativeMouseEvent? Do you know which test failure you were fixing?
Flags: needinfo?(davidp99)
Comment 43•10 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #42)
> David, did you run the tests on a HiDPI machine when you made the change to
> nsChildView::SynthesizeNativeMouseEvent? Do you know which test failure you
> were fixing?
Both David and I were running tests locally and trying to fix bugs with HiDPI. We should definitely make sure this doesn't regress local HiDPI tests.
Comment 44•10 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #42)
> David, did you run the tests on a HiDPI machine when you made the change to
> nsChildView::SynthesizeNativeMouseEvent? Do you know which test failure you
> were fixing?
Yes, I'm on a retina MBP -- this work was partially about fixing existing test failures on HiDPI devices... just because thats the kind of machine I use and the tests were the best way to confirm the rest of the plugin work. Its entirely possible that I made the wrong call about the function's expected coordinate system.
I would have changed it to fix, on HiDPI mac, one of three tests:
test_plugin_mouse_coords.html
test_cocoa_focus.html
test_cocoa_window_focus.html
It was likely the first one... If reverting the change maintains green, then go for it.
Flags: needinfo?(davidp99)
Assignee | ||
Comment 45•10 years ago
|
||
David's change to nsChildView::SynthesizeNativeMouseEvent() is clearly wrong, and my reversion of it is clearly correct. But other changes may also be required, including to some of the tests. I'll look into this over the next few days, and only then land my patch.
Assignee | ||
Comment 46•10 years ago
|
||
(In reply to comment #41)
> Do you know how this bug caused the crash?
No, not in detail. David's change to nsChildView::SynthesizeNativeMouseEvent() is so clearly wrong that I think fixing it is enough (as long as it also fixes this bug, which it does). But, like I said above, some tests may also need to be fixed.
I may very well be right that the stack exhaustion is also a bug in cross-platform code. But I'll leave that for someone else to fix, if/when it comes up again.
Assignee | ||
Comment 47•10 years ago
|
||
(In reply to comment #44)
> test_plugin_mouse_coords.html
I get the following failures in this test when I run it locally (on my Retina MacBook Pro) in current trunk code (i.e. without my patch for this bug):
36 INFO TEST-UNEXPECTED-FAIL | layout/generic/test/test_plugin_mouse_coords.html | p1 mouse move Y - got 18, expected 38
37 INFO TEST-UNEXPECTED-FAIL | layout/generic/test/test_plugin_mouse_coords.html | p2 mouse move Y - got 18, expected 38
38 INFO TEST-UNEXPECTED-FAIL | layout/generic/test/test_plugin_mouse_coords.html | p3 mouse move Y - got 11, expected 31
David, do you also see these failures when you run this test locally (on a Retina MacBook Pro)? On which version of OS X do make the build and run the test? What kind of build is it, generally?
I built a non-opt non-debug build on OS X 10.7.5, and also tested there.
Flags: needinfo?(davidp99)
Assignee | ||
Comment 48•10 years ago
|
||
(Following up comment #47)
Interestingly, these failures only happen (in current trunk code) with a 64-bit build -- not with a 32-bit build.
Comment 49•10 years ago
|
||
(In reply to Steven Michaud from comment #47)
> (In reply to comment #44)
>
> David, do you also see these failures when you run this test locally (on a
> Retina MacBook Pro)? On which version of OS X do make the build and run the
> test? What kind of build is it, generally?
>
> I built a non-opt non-debug build on OS X 10.7.5, and also tested there.
The test passes fine for me (using a plain trunk build - i.e. not including the patch in this bug) on an OSX 10.9 retina MBP. about:buildconfig says x86_64.
Flags: needinfo?(davidp99)
Comment 50•10 years ago
|
||
PS: It was a debug build:
> ac_add_options --enable-debug
Assignee | ||
Comment 51•10 years ago
|
||
I found out which test was failing for you, David, and how to fix it:
> test_cocoa_focus.html
The test as it stood didn't properly support HiDPI mode. Now (with my changes) it does.
This test failed in local builds done on both OS X 10.7.5 and 10.9.5, with "--enable-debug" on or off. With my changes it always succeeds (using my local builds).
Attachment #8546757 -
Attachment is obsolete: true
Attachment #8552032 -
Flags: review?(mstange)
Comment 52•10 years ago
|
||
Comment on attachment 8552032 [details] [diff] [review]
Fix plus test change
Review of attachment 8552032 [details] [diff] [review]:
-----------------------------------------------------------------
Perfect!
Attachment #8552032 -
Flags: review?(mstange) → review+
Updated•10 years ago
|
Assignee | ||
Comment 53•10 years ago
|
||
Comment on attachment 8552032 [details] [diff] [review]
Fix plus test change
Landed on mozilla-inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/953db4d13beb
Comment 54•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Assignee | ||
Comment 55•10 years ago
|
||
Comment on attachment 8552032 [details] [diff] [review]
Fix plus test change
Approval Request Comment
[Feature/regressing bug #]: 1092630
[User impact if declined]: Pointer locking broken, among other things
[Describe test coverage new/current, TreeHerder]: Manual testing and brief baking on m-c
[Risks and why]: Low risk -- fixes obvious mistakes in our code
[String/UUID change made/needed]: None
Attachment #8552032 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8552032 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 56•10 years ago
|
||
Comment on attachment 8552032 [details] [diff] [review]
Fix plus test change
Landed on mozilla-aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/8a791f4daca
Assignee | ||
Updated•10 years ago
|
Updated•6 years ago
|
Component: Event Handling → User events and focus handling
You need to log in
before you can comment on or make changes to this bug.
Description
•