Closed
Bug 932678
Opened 11 years ago
Closed 11 years ago
[10.9] "Butterfly Demo" "hangs" then crashes the Unity plugin (after SIGSEGV at Mono:GC_mark_from + 1004 in plugin code)
Categories
(Core Graveyard :: Plug-ins, defect, P3)
Tracking
(firefox25 affected, firefox26 affected, firefox27 affected, firefox28 affected)
People
(Reporter: cpeterson, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: hang, reproducible, thirdparty, Whiteboard: [summary in comment #34])
Crash Data
Attachments
(3 files)
STR:
1. Load http://unity3d.com/gallery/demos/live-demos
2. Play any of the Unity demos *except* "Butterfly Demo"
3. See the demo play as expected
4. Play the Butterfly Demo
RESULT:
After the progress bar reaches 100%, Firefox hangs and stderr logs the following messages (without actually logging the supposed stack trace):
plugin-container[68266] <Error>: CGImageCreateWithImageProvider: invalid image size: 0 x 0.
plugin-container[68266] <Error>: CGImageCreateWithImageProvider: invalid image size: 0 x 0.
Stacktrace:
Both Chrome 30 and Safari 7 fail to load the Butterfly Demo, but those browsers do not hang. I am using Unity Player v4.2.2f1.
Comment 1•11 years ago
> <Error>: CGImageCreateWithImageProvider: invalid image size: 0 x 0.
These errors (or ones very much like them) go way back: See bug 409452 and bug 400865. Bug 409452 also mentions lots of CPU being eaten.
I suspect this is an Apple bug of some kind -- an inappropriate reaction to an error condition. But if this is a 100% reproducible testcase we may be able to find a workaround.
Comment 2•11 years ago
What version of OS X did you test on?
Reporter
Comment 3•11 years ago
I can reproduce this hang on two different MacBook Pros (Retina and non-Retina), both running OS X 10.9.0.
Comment 4•11 years ago
Just attaching to the hung Firefox and getting a stack would be good (plugin-container also). Unless this is a 100% CPU hang, in which case a profile via Instruments might be better. In any case, not high priority.
Priority: -- → P3
Comment 5•11 years ago
The Butterfly demo works just fine (for me) in FF 24 on OS X 10.8.5 with the same version of the Unity plugin (which is the current version). This is even though I see your error message three times. I'll keep testing.
> Just attaching to the hung Firefox and getting a stack would be good (plugin-container also).
Only do this with a mozilla-central nightly (they don't have their symbols stripped). Otherwise the FF-specific symbols in your stack will be all wrong. And it's probably best to get an all-thread stack trace (thread apply all bt).
Comment 6•11 years ago
Or send them SIGABRT (in Fx27+ builds) to trigger the crash reporter and get the stacks that way. https://developer.mozilla.org/en-US/docs/How_to_Report_a_Hung_Firefox
Comment 7•11 years ago
Note that Apple's latest Xcode command-line tools (for Mavericks) deliberately don't include gdb. So you may need to use lldb instead.
Comment 8•11 years ago
I see your hang on OS X 10.9 (though I don't see high CPU usage). Also, after about 30 seconds, I get an error page telling me that the Unity Player plugin has crashed. This is in FF 25.
So this is presumably 10.9-specific, and is very likely to be an Apple bug.
Blocks: mavericks-compat
Summary: Unity plugin "Butterfly Demo" hangs Firefox → [10.9] Unity plugin "Butterfly Demo" hangs Firefox
Updated•11 years ago
Summary: [10.9] Unity plugin "Butterfly Demo" hangs Firefox → [10.9] "Butterfly Demo" "hangs" then crashes the Unity plugin
Comment 9•11 years ago
(Following up comment #8)
bp-0ecc6db0-44e1-45c7-92d1-c3e0e2131030
Does this mean anything to you, Benjamin? :-)
Comment 10•11 years ago
Comment 11•11 years ago
(Following up comment #10)
These all happened on OS X 10.9, but I'm not sure they're all related.
Most of them concern the Unity plugin, and presumably *are* related. But there are also a few for the Silverlight and DivX plugins.
Comment 12•11 years ago
(Following up comment #8 and comment #10)
I get the same thing with today's mozilla-central nightly. But then, after I submit the report (and it appears in my about:crashes list), Socorro is unable to find it.
Note that none of the reports from comment #10 concern nightlies -- only releases (currently FF 24 and FF 25).
Reporter
Comment 13•11 years ago
I used SIGABRT to trigger this crash report during the hang:
b3db3461-f563-43ea-8c81-c3f8c2131030
Comment 14•11 years ago
> b3db3461-f563-43ea-8c81-c3f8c2131030
bp-b3db3461-f563-43ea-8c81-c3f8c2131030
This crash is in the main process. Is there another, associated crash report for the plugin process?
Comment 15•11 years ago
By the way, I'm trying to use lldb to find a stack trace of the code that displays the "invalid image size" error messages. Haven't yet managed it.
Reporter
Comment 16•11 years ago
Steven: I could crash the plugin-container with SIGABRT, but Firefox did not upload a crash report. I attached gdb to the plugin-container and dumped all the thread stack traces in the attached file.
Comment 17•11 years ago
> plugin-container-threads.txt
Unfortunately I don't see anything interesting there (or in the main process stack trace, for that matter). Both processes, though clearly in the middle of some kind of IPC communication, are (as best I can tell) doing "normal" waiting on all threads.
Benjamin may be able to glean more out of them. But at this point I think our best hope is to figure out what code is causing the error messages to be displayed. By fiddling with that we may be able to work around this bug (which like I said is probably an Apple bug).
Comment 18•11 years ago
Does this happen in FF26?
The child is sending a sync message (PPluginInstanceChild::SendShow).
The parent is sending an RPC message (PPluginInstanceParent::CallPBrowserStreamConstructor).
This is inherently racy, and that's ok because the IPC mechanism should resolve the race by having the sync message (SendShow) win.
Does this happen also in FF26?
Reporter
Comment 19•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #18)
> Does this happen in FF26?
Yes. I can reproduce the hang in Firefox 25, 26, 27, and 28.
Comment 20•11 years ago
Here's a trace (all threads) of all three "invalid image size: 0 x 0" errors, plus a call to "Mono`GC_mark_from" that I don't fully understand. (It's the undocumented CGPostError() method that displays the errors.)
All these calls are made from plugin code -- over which (of course) we don't have direct control.
But I'm going to devise an interpose library that hooks CGImageCreateWithImageProvider and stops it from being called with a zero-sized image. If this stops these crashes, we'll have proved that this is an Apple bug, and have shown how we and others can work around it.
Comment 21•11 years ago
I should have mentioned that this trace was made using today's mozilla-central nightly.
Reporter
Comment 22•11 years ago
(In reply to Steven Michaud from comment #20)
> Here's a trace (all threads) of all three "invalid image size: 0 x 0"
> errors, plus a call to "Mono`GC_mark_from" that I don't fully understand.
Unity uses the Mono .NET runtime for its scripting language.
Comment 23•11 years ago
> But at this point I think our best hope is to figure out what code
> is causing the error messages to be displayed. By fiddling with
> that we may be able to work around this bug (which like I said is
> probably an Apple bug).
This turned out to be a red herring.
The "invalid image size: 0 x 0" error messages also get logged by the
other demos (which don't crash), and also appear on other versions of
OS X. And avoiding them by hooking doesn't get rid of the crashes,
either. (I ended up hooking CGImageCreateWithImageInRect(), which in
Unity plugin code sometimes gets called with rect.size.width or
rect.size.height set to '0'. Avoiding these calls did stop the error
messages being displayed, but didn't stop the "crashes".)
Next I'm going to try to get a stack trace of whatever code logs the
"Stacktrace:" message.
Comment 24•11 years ago
> Next I'm going to try to get a stack trace of whatever code logs the
> "Stacktrace:" message.
I've found that this is logged from Mono code (the Mono bundle in the
Unity plugin) using a call to fwrite(). But the code from which this
happens (mono_handle_native_sigsegv) is a signal handler, which is the
only thing on the stack whenever it's called (always on the main
thread). After the "Stacktrace:" message is logged, the
mono_handle_native_sigsegv method calls mono_jit_walk_stack_from_ctx,
but apparently this fails.
So it's pretty clear a crash is happening somewhere in Mono code,
which it doesn't handle entirely properly. But there's not much more
we can say. We pretty much have to hand this off to the Unity
developers.
Chris, would you mind opening a bug with Unity? When/if you do, you
should probably refer to this bug.
Comment 25•11 years ago
As for the "hang", here's what I suspect is going on:
The mono_handle_native_sigsegv signal handler stops the Unity plugin from crashing, but it's effectively dead from this point -- so it stops handling IPC messages. Firefox (the main process) notices this and eventually kills the Unity plugin after a timeout.
Sometimes Firefox takes a long time to notice that the Unity plugin is dead -- apparently it can die when no messages are "expected" from it. But I find you can trigger Firefox's countdown by doing something that should trigger IPC messages -- for example changing the browser window's or app's focus.
Comment 26•11 years ago
When I run Firefox from Terminal and Firefox does manage to kill the Unity plugin, I see error messages like the following in Terminal:
###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
Comment 27•11 years ago
Mono uses sigaction() to set signal handlers, so in principle I should be able to use an interpose library to hook these calls and prevent Mono from installing a handler for SIGSEGV. But Mono somehow prevents this method (and also fwrite()) from being hookable using an interpose library.
But a Unity developer could do this directly in Mono code. The advantage would be that, without the signal handler, one could see exactly where the SIGSEGV crash is happening.
Comment 28•11 years ago
Oops, comment #27 is wrong -- I *can* hook Mono's calls to sigaction. Hopefully I'll be able to post a crash stack in a bit.
Comment 29•11 years ago
What does mono_handle_native_sigsegv actually *do* instead of crashing?
This is very low priority, I don't think you should spend much more time on it. It seems that the Firefox plugin hang detector works properly and kills the plugin after 60 seconds, so we have a backstop for users.
Reporter
Comment 30•11 years ago
I filed a Unity support request for this hang, Case #00138536: "Unity hang bug on OS X 10.9?"
Comment 31•11 years ago
> What does mono_handle_native_sigsegv actually *do* instead of crashing?
I have no idea.
> Hopefully I'll be able to post a crash stack in a bit.
Apparently not. Stopping "Mono" and "UnityPlayer" from using sigaction to set signal handlers for SIGSEGV does stop the "hang" from happening -- the plugin process does die immediately. But lldb doesn't give me a stack trace (for the crashing plugin-container process) -- it just tells me that the process has exited with status = 0.
Possibly the process died because of a SIGKILL or SIGSTOP, but I'll leave that to the Unity developers to figure out.
By the way, Benjamin, if I hadn't done my analysis, we wouldn't have a clue what the problem is here. At least now we have something we can reasonably pass to the Unity developers.
Comment 32•11 years ago
Actually I was doing the hooking wrong. If you do it right you *do* get a trace of the crash (the SIGSEGV access violation) in lldb. And in fact we've already seen it in attachment 824863 above:
Process 926 stopped
* thread #1: tid = 0x6e80, 0x144d6905 Mono`GC_mark_from + 1004, queue = 'com.apple.main-thread, stop reason = signal SIGSEGV
frame #0: 0x144d6905 Mono`GC_mark_from + 1004
Mono`GC_mark_from + 1004:
-> 0x144d6905: movl 4(%eax), %esi
0x144d6908: cmpl %esi, -172(%ebp)
0x144d690e: ja 0x144d691c ; GC_mark_from + 1027
0x144d6910: cmpl %esi, -176(%ebp)
(lldb) bt all
* thread #1: tid = 0x6e80, 0x144d6905 Mono`GC_mark_from + 1004, queue = 'com.apple.main-thread, stop reason = signal SIGSEGV
frame #0: 0x144d6905 Mono`GC_mark_from + 1004
frame #1: 0x144d796a Mono`GC_mark_some + 466
frame #2: 0x144cf6b6 Mono`GC_stopped_mark + 470
frame #3: 0x144cfa9a Mono`GC_try_to_collect_inner + 351
frame #4: 0x144cfe0b Mono`GC_try_to_collect + 136
frame #5: 0x144cfe62 Mono`GC_gcollect + 26
...
I didn't previously understand why lldb stopped at Mono`GC_mark_from. So now we know that's where the access violation is.
Comment 33•11 years ago
For those few intrepid Mozilla developers who realize the value of reverse engineering, and are interested in learning more about it, here's the interpose library I used to hook Mono's calls to sigaction().
interpose.mm contains instructions on how to build it, and various other explanatory comments.
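For readers without the attachment at hand, the idea of filtering sigaction() calls can be sketched in portable C. This is not the attached interpose.mm, and it leaves out the OS X-specific dyld interposing step that would actually route a plugin's sigaction() calls to the replacement. The names filtered_sigaction, try_install, handler_installed and noop_handler are hypothetical, chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical replacement for sigaction(): it refuses to install a
 * handler for SIGSEGV and forwards every other request unchanged. */
int filtered_sigaction(int signum, const struct sigaction *act,
                       struct sigaction *oldact)
{
    if (signum == SIGSEGV && act != NULL) {
        /* Report the current disposition if asked, but do not change it. */
        if (oldact != NULL && sigaction(signum, NULL, oldact) != 0)
            return -1;
        return 0; /* pretend success so the caller carries on */
    }
    return sigaction(signum, act, oldact);
}

/* Do-nothing handler used only for the demonstration below. */
static void noop_handler(int sig)
{
    (void)sig;
}

/* Tries to install noop_handler for signum through the filter and
 * returns the filter's result. */
int try_install(int signum)
{
    struct sigaction sa;
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return filtered_sigaction(signum, &sa, NULL);
}

/* Returns 1 if noop_handler is currently installed for signum,
 * 0 if it is not, and -1 if the query itself failed. */
int handler_installed(int signum)
{
    struct sigaction cur;
    if (sigaction(signum, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == noop_handler;
}
```

Calling try_install(SIGSEGV) leaves the SIGSEGV disposition as it was, while try_install(SIGUSR1) installs the handler as usual. With no SIGSEGV handler in place, a later access violation stops the process at the faulting instruction, which is where a debugger can catch it.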
Updated•11 years ago
Summary: [10.9] "Butterfly Demo" "hangs" then crashes the Unity plugin → [10.9] "Butterfly Demo" "hangs" then crashes the Unity plugin (after SIGSEGV at Mono:GC_mark_from + 1004 in plugin code)
Comment 34•11 years ago
So to sum up, here's what's happening:
1) An access violation takes place at Mono:GC_mark_from + 1004 in Unity plugin code.
2) A SIGSEGV handler, Mono:mono_handle_native_sigsegv (in plugin code) handles the signal without either properly working around the error or letting the plugin process die.
3) The plugin process is no longer able to handle IPC messages. If/when the main process finds itself "expecting" one of these messages, it assumes the plugin process is hung and starts a timer.
4) Once the timer has expired, the main process kills the plugin process.
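The four steps can be imitated with a small POSIX sketch. It is only an illustration under stated assumptions, not Mono or Firefox code: the "plugin" is a forked child, IPC is a plain pipe, the access violation is simulated with raise(SIGSEGV), and the handler is assumed to simply never return (what mono_handle_native_sigsegv really does afterwards is not established in this bug). All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a handler that swallows the fault (assumption): it
 * neither returns nor exits, so the process lives on but does no work. */
static void swallow_sigsegv(int sig)
{
    (void)sig;
    for (;;)
        pause(); /* pause() is async-signal-safe */
}

/* Child ("plugin"): installs the handler, faults, and therefore never
 * sends the reply the parent is waiting for. */
static void run_child(int reply_fd)
{
    struct sigaction sa;
    sa.sa_handler = swallow_sigsegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);

    raise(SIGSEGV); /* simulated access violation; control stays in the handler */

    /* Never reached while the handler keeps the process parked. */
    if (write(reply_fd, "ok", 2) < 0)
        _exit(1);
    _exit(0);
}

/* Parent ("browser"): waits up to timeout_ms for a reply, then kills the
 * child. Returns the signal that terminated the child, 0 if the child
 * exited on its own, or -1 on error. */
int run_watchdog(int timeout_ms)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        run_child(fds[1]); /* does not return */
    }
    close(fds[1]);

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        kill(pid, SIGKILL); /* timed out (or poll failed): give up on the child */
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With a short timeout such as run_watchdog(200), the child is still parked in its handler when the timer expires, so the parent ends it with SIGKILL. Without the handler, the child would have died from SIGSEGV immediately and the parent would have seen the pipe close instead of waiting.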
Comment 35•11 years ago
> it assumes the plugin process is hung and starts a timer.
It assumes the plugin process might be hung and starts a timer.
Updated•11 years ago
Crash Signature: [@ hang | libsystem_kernel.dylib@0x177ca ]
Whiteboard: [summary in comment #34]
Updated•11 years ago
Keywords: thirdparty
Comment 36•11 years ago
Hi folks, this report just crossed my desk from our Support department. I've filed this into our internal system as case 574149 and marked it as a high priority for my team to fix.
Thanks for your diligence, I'll report back with our progress or questions.
Regards,
-- Ian Dundore
Webplayer Team Lead, Unity Development
Comment 37•11 years ago
Hello,
The core issue appears to be an incompatibility between the version of Mono in the 2.x Unity runtime and OS X 10.9 Mavericks, which causes the plugin to crash ungracefully. Given the complexity of fixing 2.x's Mono runtime, we've elected to block 2.x content on Mavericks, which will prevent this crash/hang from occurring.
This fix is currently scheduled for release with Unity 4.5.
Thanks for the report.
Regards,
-- Ian Dundore
Webplayer Team Lead, Unity Development
Comment 38•11 years ago
Thanks, Ian, for the fix and the information.
Could you give a rough estimate when Unity 4.5 will be released?
Comment 39•11 years ago
Hi Steven,
Unfortunately, I really can't make a reliable estimate as Unity 4.5 is in alpha-testing right now. It will be early next year.
Regards,
-- Ian Dundore
Webplayer Team Lead, Unity Development
Reporter
Comment 40•11 years ago
I can no longer repro this crash with UnityPlayer version 4.3.5f1 (on Nightly 31).
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Updated•3 years ago
Product: Core → Core Graveyard