Closed
Bug 837539
Opened 12 years ago
Closed 12 years ago
WebRTC crash [@VDCVideoNewPlugIn]
Categories
(Core :: WebRTC: Audio/Video, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla22
Tracking | Status | |
---|---|---|
firefox20 | + | fixed |
firefox21 | + | fixed |
firefox22 | --- | fixed |
firefox-esr17 | --- | unaffected |
b2g18 | --- | unaffected |
People
(Reporter: posidron, Assigned: smichaud)
References
Details
(Keywords: crash, sec-high, testcase, Whiteboard: [getUserMedia][blocking-gum+][qa-][adv-main20-])
Crash Data
Attachments
(9 files, 3 obsolete files)
(deleted),
text/html
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
application/x-gzip
|
Details | |
(deleted),
application/x-gzip
|
Details | |
(deleted),
patch
|
smichaud
:
review+
lsblakk
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This seems to be a crash in the device driver of MacOS.
newton % sw_vers
ProductName: Mac OS X
ProductVersion: 10.8.2
BuildVersion: 12C3006
Before the crash happens we can see in the console:
Sun Feb 3 22:32:42 2013 CMIO_Unit_Output_Base.cpp:197:RenderBus something has gone wrong!
Sun Feb 3 22:32:42 2013 CMIO_Unit_Output_Base.cpp:199:RenderBus idx = 0, mNumInputs = 1, mNumInputsAllocatedFor = 0
Sun Feb 3 22:32:42 2013 CMIO_Unit_Output_Base.cpp:201:RenderBus theInput = 0x1211dec80, mPullInputOnNextRound = 0x0, mInputHasSeenEndOfData = 0x0
[...]
The testcase will not crash in a debug/non-opt build.
Tested with m-i changeset: 120689:eae4b34eb792 and -O2
Tested with m-c changeset: 120354:2cc710018b14 and -O2
Reporter | ||
Comment 1•12 years ago
|
||
Comment 2•12 years ago
|
||
Very fast start/stop of video here.... Perhaps we're triggering an OS/Mac Foundation bug or bug in the webrtc.org code by stopping capture before anything is captured, or some such.
It's possible 3.20 will fix it.
A workaround may be to enforce a minimum 'on' period (or number of frames captured) before stop() takes effect, especially as it's known that debug builds don't crash.
Assignee: nobody → rjesup
Priority: -- → P1
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift]
Updated•12 years ago
|
Flags: in-testsuite?
Comment 3•12 years ago
|
||
We're trying to ship this as part of FF 20. So noming for tracking.
tracking-firefox20:
--- → ?
tracking-firefox21:
--- → ?
Updated•12 years ago
|
status-firefox20:
--- → affected
status-firefox21:
--- → affected
Comment 4•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #2)
> It's possible 3.20 will fix it.
Christoph -- 3.20 just landed yesterday. Can you retest and see if this is still happening? Thanks.
Flags: needinfo?(cdiehl)
Comment 6•12 years ago
|
||
(In reply to Christoph Diehl [:cdiehl] from comment #5)
> Still works for me.
Okay. Does it work for you on Aurora as well?
Reporter | ||
Comment 7•12 years ago
|
||
With current Aurora we see the same error messages but no crash.
It does not look like a real ASan specific catch here, will test it later with an ASan build of Aurora though.
Reporter | ||
Comment 8•12 years ago
|
||
I have tested this now with an ASan debug build of Aurora but it passes the testcase as well. The build I used is: https://people.mozilla.com/~choller/firefox/asan/20130214-mozilla-aurora-macosx64-debug-d083267a188a+asan.html
I currently have not enough disk space left to fetch the Aurora tree and make a optimized build out of it. All my previous builds are optimized and the testcases crashes in those builds successfully.
Comment 9•12 years ago
|
||
Let's close this as a works for me then. If we end up reproducing later with some other configuration, then let's reopen.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Comment 10•12 years ago
|
||
My opt asan mac build just finished (inbound). Second click on "crash" crashed with the same signature
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 11•12 years ago
|
||
Jib is seeing
Thu Feb 14 17:44:37 2013 CMIO_Graph.cpp:8839:HandleRenderNotify CMIOGraph::HandleRenderNotify() called but graph is not initialized!
http://pastebin.mozilla.org/2139860
where I'm crashing with the RenderBus: something has gone wrong! error.
He's running OSX 10.8.2, I'm on 10.7. What are others running where the crash is seen or not seen in an opt asan build on Mac?
Could be:
a) OS bug fixed in 10.8 (or a missing OS trap of bad inputs added in 10.8)
b) compiler bug that causes bad inputs (see a)
c) code in mac drivers in this edge case (fast open/close) has a bug that simply passed bad args to the OS
???
Reporter | ||
Comment 12•12 years ago
|
||
(In reply to Christoph Diehl [:cdiehl] from comment #0)
> newton % sw_vers
> ProductName: Mac OS X
> ProductVersion: 10.8.2
> BuildVersion: 12C3006
It's not a)
Comment 13•12 years ago
|
||
cdiehl: I assume that means your crashes were under 10.8?
d) different Mac Camera drivers react differently
I have a MacBook Pro laptop, non-retina, Core i7. (how do I get driver/HW info from it?) I know jib has a newer mac
Reporter | ||
Comment 14•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #13)
> cdiehl: I assume that means your crashes were under 10.8?
Correct
> I have a MacBook Pro laptop, non-retina, Core i7. (how do I get driver/HW
> info from it?) I know jib has a newer mac
$ system_profiler | more
Comment 15•12 years ago
|
||
I got the same error as jib in bug 815231 when running the testcase (constant errors like that). May be a timing issue too
Comment 16•12 years ago
|
||
FaceTime HD Camera (Built-in):
Product ID: 0x8509
Vendor ID: 0x05ac (Apple Inc.)
Version: 5.16
Serial Number: CC2B7C06LLDGFKL0
Speed: Up to 480 Mb/sec
Manufacturer: Apple Inc.
Location ID: 0xfa200000 / 3
Current Available (mA): 500
Current Required (mA): 500
System Version: Mac OS X 10.7.5 (11G56)
Kernel Version: Darwin 11.4.2
Comment 17•12 years ago
|
||
Yes I have a Retina MacBook Pro laptop "Mid-2012" Core i7 and I'm NOT crashing.
FaceTime HD Camera (Built-in):
Product ID: 0x8510
Vendor ID: 0x05ac (Apple Inc.)
Version: 80.25
Serial Number: CC2C9Q0K4NDN9KE0
Speed: Up to 480 Mb/sec
Manufacturer: Apple Inc.
Location ID: 0x1a110000 / 3
Current Available (mA): 500
Current Required (mA): 500
System Version: OS X 10.8.2 (12C3006)
Kernel Version: Darwin 12.2.1
Comment 18•12 years ago
|
||
I'm failing on every second click of "Crash". Interestingly, that was the pattern I saw with bug 815231 - one would succeed, the second would fail with errors in the logs, and repeat for an hour.
I got a webrtc_trace:65535 log from it, and it closed the capture device as the last thing it did. Interestingly, this is the same point as where we were having issues with shutdown hangs due to sending events tot he mainthread in the QTKit code. I'm adding the "proxy-release-to-mainthread" patch to this build to see if it helps.
Comment 19•12 years ago
|
||
To the newly cc'd people (smichaud, joedrew, bjacob): this seems to be deep in Mac-land, and the QTKit docs suck at this level. We could use some assistance in attacking this. Google searches on these error messages isn't finding much.
With the shutdown hang patch that remotes the shutdown to mainthread the problem still happens, and perhaps even more easily or with more error messages before the crash. And oddly, others like jib can't reproduce it at all. (I suspect strongly an ASAN build is not needed to hit this, though an opt build may be.)
Any help or pointers would be greatly appreciated. The code talking to the mac stuff is deep in media/webrtc/trunk, in .mm files.
Thanks!
Comment 20•12 years ago
|
||
+Jeff, who has done a lot of very deep debugging in places like this before.
Before you do anything else, though, I would send this testcase to Apple through our developer contacts (Steven should know how to use our developer resources) in order to ensure they know they've got a bug in their driver.
Comment 21•12 years ago
|
||
Joe - thanks, that makes sense and matches what I told akeybl. It seems totally related to how fast you open and close the driver.
Comment 22•12 years ago
|
||
FYI, I also tried waiting until isRunning changed after startRunning/stopRunning, with no effect
Assignee | ||
Comment 23•12 years ago
|
||
> Steven should know how to use our developer resources
I don't. And I'm not sure we really have any "Apple developer resources", if that's what you're asking about. I'll CC Josh Aas to see if he might be able to help.
As for this bug, please tell me *exactly* what I need to do to be able to reproduce it. If I need to do a special build, suggest mozconfig options for it. If additional steps are required (beyond just starting the build and running the testcase), please let us know. Finally let us know if special hardware is required (sounds like you need a camera, as can be found on a MacBook Pro).
I'm good at reverse engineering. So once I know how to reproduce this bug, I may be able to find out what's causing it, and possibly even find a workaround. But reverse engineering is very time consuming, so I can't promise quick results.
Comment 24•12 years ago
|
||
Ok, here are the STR:
a) use an ASAN mac build (per the instructions in the wiki).
Full opt build, no debug symbols. I'll upload my mozconfig
b) load into gdb (you don't need to set the ASAN breakpoint, as it crashes)
c) go to the testcase here
d) click Crash; click to share the camera. Repeat if needed. Usually crashes on first or second try for me. Never crashes for jib with a slightly different camera (though he gets errors (different ones).
NOTE: ASAN may very well NOT be needed; you can try first with a mac opt build. I'll try spinning one myself. Also, it may be ok to have debug symbols in an opt build, though maybe that changes the timing enough to miss the bug.
Comment 25•12 years ago
|
||
Assignee | ||
Comment 26•12 years ago
|
||
Thanks for the info.
> a) use an ASAN mac build (per the instructions in the wiki).
I assume you mean https://developer.mozilla.org/en-US/docs/Building_Firefox_with_Address_Sanitizer
My life would be easier if I could build with the clang/llvm that come with recent versions of XCode (specifically with the XCode Commandline Tools). I'll try that and let you know my results.
Comment 27•12 years ago
|
||
I suspect that works. I'm spinning a plain Mac opt build now
Comment 28•12 years ago
|
||
A regular full-opt build also crashes, though with a null deref instead of a random address, and crashes in CMIOUnitDALInputEntry(). It gets the same "RenderBug something has gone wrong!" errors preceding the crash. Crashed on second select of "Crash".
This should make testing easier
clang -v says 4.1 (421.11.65) based on LLVM 3.1svn
Comment 29•12 years ago
|
||
BTW, I applied the fixes from
https://webrtc-codereview.appspot.com/1097014
to our tree, but got the same crash (this was with asan this morning). It does fix the "can't use Skype for video after getUserMedia" in FF (and in Chrome).
Assignee | ||
Comment 30•12 years ago
|
||
I can get today's mozilla-central nightly to crash on a "15-inch Early 2011" MacBook Pro running OS X 10.6.8 or 10.7.5 (though for some reason not 10.8.2). I've got gdb stack traces for both crashes.
Note that as of the addition of support for Benoit Girard's profiler, mozilla-central nightlies no longer have their symbols stripped. So, wonderfully, gdb stack traces of mozilla-central nightlies are meaningful, and have been for about a year (if I remember right).
Assignee | ||
Comment 31•12 years ago
|
||
Assignee | ||
Comment 32•12 years ago
|
||
Breakpad IDs for my crashes on OS X 10.6.8 and 10.7.5:
bp-b22c1d21-a9d8-4357-a37e-ab16c2130215
bp-b3df7be0-0a36-48aa-944e-495262130215
Assignee | ||
Comment 33•12 years ago
|
||
(Following up comment #32)
Note that there's serious corruption in the Breakpad stacks. The gdb traces are *much* higher quality.
Comment 34•12 years ago
|
||
Those (at least the 10.7 ones) match what I and cdiehl (on 10.8) have hit. jib gets
Thu Feb 14 17:44:37 2013 CMIO_Graph.cpp:8839:HandleRenderNotify CMIOGraph::HandleRenderNotify() called but graph is not initialized!
Thu Feb 14 17:44:37 2013 CMIO_Graph.cpp:8839:HandleRenderNotify CMIOGraph::HandleRenderNotify() called but graph is not initialized!
where I crashed. (My older test (bug 815231) for open/close would also get that error - it's probably a dup of this, though it may show even more strongly it's a timing issue.)
Comment 35•12 years ago
|
||
I tried to reproduce this crash but on my MBP (13", early 2011) it always crashes with the signature as filed on bug 815231.
Comment 36•12 years ago
|
||
I think they're different manifestations of the same general problem, and the way to produce them is extremely similar. Can you dump your camera type/driver/etc?
Comment 37•12 years ago
|
||
FaceTime HD Camera (Built-in):
Product ID: 0x8509
Vendor ID: 0x05ac (Apple Inc.)
Version: 5.16
Serial Number: CC2B2P01YVDG6LL0
Speed: Up to 480 Mb/sec
Manufacturer: Apple Inc.
Location ID: 0xfa200000 / 2
Current Available (mA): 500
Current Required (mA): 500
System Version: Mac OS X 10.7.5 (11G63)
Kernel Version: Darwin 11.4.2
Reporter | ||
Comment 38•12 years ago
|
||
The testcase of bug 815231 does not crash with m-c changeset: 122315:cc37417e2c28 for me.
Comment 39•12 years ago
|
||
Steven - any luck? I'm working on a possible wallpaper patch to land as a backup if we can't find a root cause or get action from Apple.
Assignee | ||
Comment 40•12 years ago
|
||
Sorry, I let this one slip through the cracks ... probably because I felt obliged to list it merely as "security related" in my list of bugs on the Mozilla Status Board, and forgot what it was about :-(
I'll see if I can at least find a regression range today.
Assignee | ||
Comment 41•12 years ago
|
||
This bug's testcase doesn't do anything before the firefox-2013-01-06-03-09-02-mozilla-central nightly -- presumably because that's the first mc nightly after the patch for bug 825594 landed.
But (at least in my tests) these crashes don't start with that build. Rather they start with the firefox-2013-01-08-03-34-57-mozilla-central nightly.
That gives us the following regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=605ae260b7c8&tochange=dccab70d8554
I can't begin to guess which patch in this range might have triggered the crashes. Furthermore I can only easily reproduce them on OS X 10.6.8, and the tree is no longer buildable there by ordinary mortals since the switch to clang (which happened with the patch for bug 733905 on 2012-07-19).
So it'll be very difficult for me to find the trigger using hg bisect.
Anyone have any ideas?
Assignee | ||
Comment 42•12 years ago
|
||
Randell: If your wallpaper patch is at all acceptable I'd go ahead with it.
Assignee | ||
Comment 43•12 years ago
|
||
Whoever can reproduce this bug, please test my regression range from comment #41.
Comment 44•12 years ago
|
||
If that's the regression range, then I see two possible bug candidates:
- bug 827007
- bug 826576
Comment 45•12 years ago
|
||
Steven - those bugs Jason mentions make it actually Stop() the stream before destroying it and deal with some timing holes in the MSG code with RemoveListener(). So bisecting to find that it's one of those patches doesn't tell us a lot, other than somehow stopping the capture before separately from destroying the capture object changes the timing or "something else" to trigger it.
I can crash it reliably on 10.7.5. cdiehl can crash it on 10.8.2. It *seems* to be HW dependent as well as (maybe) OS dependent.
Comment 46•12 years ago
|
||
I'll try to bisect, but I don't think it will tell me anything useful other than it's timing-related.
I don't have a wallpaper patch yet; going to work on it tonight more. Basically it will be a delay in shutting down the camera on Mac (min time between startup and shutdown). If that doesn't work, wait for a frame (and maybe max N seconds for safety). I do not know this will work; I merely presume/hope it will.
Assignee | ||
Comment 47•12 years ago
|
||
Randell and cdiehl: Please please please try testing with mozilla-central nightlies. And if you can repro please look for regression range(s).
Note that if this is an Apple bug (and perhaps more than one), we may not be able to "fix" it -- we may only have the choice between different kinds of workaround.
Assignee | ||
Comment 48•12 years ago
|
||
One issue to consider (since some of our code is off the main thread) is thread safety -- some Apple APIs are documented not to be thread safe.
Assignee | ||
Comment 49•12 years ago
|
||
> The code talking to the mac stuff is deep in media/webrtc/trunk, in
> .mm files.
I've just started looking through the .mm files at this location.
Note that I'm completely unfamiliar with this code or with the QTKit.
As best I can tell from a quick look-through, the low-level mac stuff
relevant to this bug is all in the following file:
media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/video_capture_qtkit_objc.mm
Is this correct?
(Since no rendering seems to occur in the testcase, I assume the
rendering code isn't relevant.)
For reference here's Apple's QTKit Framework Reference:
https://developer.apple.com/library/mac/#documentation/QuickTime/Reference/QTCocoaObjCKit/_index.html
Comment 50•12 years ago
|
||
(In reply to Steven Michaud from comment #49)
> As best I can tell from a quick look-through, the low-level mac stuff
> relevant to this bug is all in the following file:
>
> media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/
> video_capture_qtkit_objc.mm
>
> Is this correct?
That's correct. Note also that the locking around the buffer access is broken, but when I asked jesup to test my patch that fixes that <http://pastebin.mozilla.org/2141915>, it did not solve the problem:
12:55:44 < jesup> derf: for fun I tried your lock patch on bug 837539. No joy
Comment 51•12 years ago
|
||
Yes, that's the location of the code.
I've seen the docs; they don't really help. There's some level of thread safety here, as we had to proxy a shutdown release to mainthread because mainthread had stopped processing native events in shutdown. They apparently use messages to the main native event loop to implement thread safety. I believe I tried doing that always, and it didn't help (though it changed the number of duplicate errors I got before it crashed).
Assignee | ||
Comment 52•12 years ago
|
||
I did a search on 'QTKit "thread safe"' and found the following Apple doc:
http://developer.apple.com/library/mac/#technotes/tn2125/_index.html
It has a section on "QTKit Capture":
http://developer.apple.com/library/mac/#technotes/tn2125/_index.html#//apple_ref/doc/uid/DTS10003392-CH1-SUBSECTION17
This makes it sound like "QTKit Capture" is generally thread safe, though there are some precautions to take.
Assignee | ||
Comment 53•12 years ago
|
||
> They apparently use messages to the main native event loop to
> implement thread safety.
I don't understand this :-)
Is it implemented in current code? And if so where?
Assignee | ||
Comment 54•12 years ago
|
||
> QTCaptureSession* _captureSession;
> QTCaptureDeviceInput* _captureVideoDeviceInput;
> QTCaptureDecompressedVideoOutput* _captureDecompressedVideoOutput;
These three objects, from video_capture_qtkit_objc.h, appear to be the only QTKit objects used by our low-level code. In principle we should be serializing access to them with locks, but aren't currently doing so.
I'm not sure this matters (the same purpose may already be accomplished in higher-level code). But I thought I'd mention it.
That's all for today. I'm off to shovel some snow!
Assignee | ||
Comment 55•12 years ago
|
||
> In principle we should be serializing access to them with locks, but
> aren't currently doing so.
Oops, I missed _rLock.
And otherwise none of the unlocked accesses to these objects seem to
need protection.
Assignee | ||
Comment 56•12 years ago
|
||
>> They apparently use messages to the main native event loop to
>> implement thread safety.
>
> I don't understand this :-)
Now I understand it, I think. Here's an example from our source code:
[_captureDecompressedVideoOutput
performSelectorOnMainThread:@selector(setPixelBufferAttributes:)
withObject:captureDictionary waitUntilDone:NO];
// [_captureDecompressedVideoOutput setPixelBufferAttributes:captureDictionary];
What the substitute (not commented out) code does is to arrange for
this method to be called on the main thread -- which (of course)
guarantees that different calls to the method won't happen
simultaneously.
Assignee | ||
Comment 57•12 years ago
|
||
Tomorrow I'll do a test build with a bunch of debug logging (NSLog statements) added to video_capture_qtkit_objc.mm. I'll also knock out selected bits and pieces of code, to see what difference that makes.
In other words, I'll semi-randomly mess around with video_capture_qtkit_objc.mm, and see where it leads me.
Do please keep looking for those regression ranges, though. This will *not* be an easy bug to fix or work around.
Assignee | ||
Comment 58•12 years ago
|
||
As promised I've been messing around with video_capture_qtkit_objc.mm. Along the way I discovered that this weird patch "fixes" this bug's crashes for me, on OS X 10.6.8.
Whoever can reproduce these crashes, please let me know if it also "fixes" your crashes.
It's very unlikely that it will be an acceptable workaround. But I want to find out if I've hit an important nerve, and if it's worthwhile trying to figure out *why* this patch "works". Which may lead me to a more acceptable workaround.
I've started a tryserver mozilla-central build, for those who want one. It should be available in a few hours.
Assignee | ||
Comment 59•12 years ago
|
||
Here's the tryserver build made with my patch from comment #58:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-d32b2c2f0954/try-macosx64/firefox-22.0a1.en-US.mac.dmg
Assignee | ||
Comment 60•12 years ago
|
||
Besides fixing the crash on my 15" early 2011 MBP running OS X 10.6.8, I find that it gets rid of the following error message when tested on a Retina MBP running OS X 10.7.5 (where I haven't seen any crashes):
CMIO_Graph.cpp:8709:HandleRenderNotify CMIOGraph::HandleRenderNotify() called but graph is not initialized!
So I suppose a QTCaptureVideoPreviewOutput object gets automatically initialized in a way that a QTCaptureDecompressedVideoObject doesn't, and that I need to figure out how to explicitly perform that initialization on the latter object.
That's what I'll be trying to do tomorrow.
Assignee | ||
Comment 61•12 years ago
|
||
Note that I also see a bunch of "autoreleased with no pool in place" running this bug's testcase (even on hardware and OS versions where I don't crash). These messages are harmless and unrelated. They're also easily fixed.
I'll do that when/if I fix or work around this bug.
Comment 62•12 years ago
|
||
Still crashes on 10.7.5 :-(
Different errors though:
Wed Feb 27 20:33:13 2013 CMIO_Unit_Output_Base.cpp:197:RenderBus something has gone wrong!
Wed Feb 27 20:33:13 2013 CMIO_Unit_Output_Base.cpp:199:RenderBus idx = 0, mNumInputs = 1, mNumInputsAllocatedFor = 0
Wed Feb 27 20:33:13 2013 CMIO_Unit_Output_Base.cpp:201:RenderBus theInput = 0x11d568ef0, mPullInputOnNextRound = 0x0, mInputHasSeenEndOfData = 0x0
2013-02-27 20:33:14.193 firefox[27453:14203] *** QTCaptureVideoPreviewOutput warning: Output received the following error while decompressing video: Error Domain=NSOSStatusErrorDomain Code=-67446 "The operation couldn’t be completed. (OSStatus error -67446.)". Make sure that the pixel buffer attributes of all outputs are valid.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000007c00000001
[Switching to process 27453 thread 0x14203]
0x0000007c00000001 in ?? ()
Assignee | ||
Comment 63•12 years ago
|
||
Thanks for the info. And too bad my "fix" didn't work for you.
But your new errors could still be the result of lack of initialization, so I'll still keep trying to pursue my hunch.
By the way, I think I've ruled out the possibility that the crashes and errors are due to accessing deleted objects. Deliberately leaking all our native objects didn't make the crashes go away. Neither did testing with CFZombieLevel=17 (which among other things stops the "zombies" from ever being freed).
To use any kind of allocation debugging on OS X, you generally have to turn off jemalloc. Do this by setting a NO_MAC_JEMALLOC environment variable to some value.
Even with jemalloc off, though, I still haven't got libgmalloc working in current trunk code. Which is a shame, since this used to be one of my favorite tools.
Comment 64•12 years ago
|
||
That would fit with an early theory of mine: if we shut it down before it captures a frame, <something> isn't initialized.
Assignee | ||
Comment 65•12 years ago
|
||
Folks, can someone give me a testcase that captures the camera and does something with it? Even one that just displays the camera output in the box.
That should help me figure out how initialization normally takes place.
Also it'd be nice to have a capture-the-camera-and-displays-its-output testcase that works in other browsers, like Safari and Chrome. That way I'd be able to observe any tricks that those guys play to get things working. (I hope and assume that Safari, at least, would also use QTKit to do this.)
Assignee | ||
Comment 66•12 years ago
|
||
> Even one that just displays the camera output in the box.
Even one that just displays the camera output in *a* box.
Comment 67•12 years ago
|
||
http://mozilla.github.com/webrtc-landing/gum_test.html
Chrome also uses QTKit. See src/media/video/capture/mac/video_capture_device_qtkit_mac.mm
Assignee | ||
Comment 68•12 years ago
|
||
I keep digging around in the bowels of OS X, and I think I've found the key to this bug (at least on OS X 10.7 and 10.8) -- undocumented methods in an undocumented framework (the CoreMediaIO framework).
Here's the source for an interpose library that hooks both of these methods, and prevents one of them (CMIOUninitializeGraph()) from actually doing anything. I'd like the people who can reproduce this bug's crashes (on OS X 10.7 and 10.8) to try it and see if it stops the crashes from happening.
If my hunch is right, the root cause of this bug's crashes (on 10.7 and 10.8) is that CMIOUninitializeGraph() is sometimes getting called "too early".
To test with this interpose library, you first need to compile it (just run make). Then set the DYLD_INSERT_LIBRARIES environment variable as follows at a Terminal prompt:
export DYLD_INSERT_LIBRARIES=[full/path/to/]interpose.dylib
Then run your test build from the same Terminal prompt.
Comment 69•12 years ago
|
||
that dylib interpose took me from 100% crashing on the second "Crash" click to not being able to crash it!
I think we're on the right track!
Assignee | ||
Comment 70•12 years ago
|
||
Glad to hear it!
Tomorrow I'll see if I can turn my hunch into a real patch.
Comment 71•12 years ago
|
||
Hmm...I don't think a crashtest will help here. This is happening way low at the OS integration layer, which fake streams don't touch. So we probably can't a crashtest here.
Flags: in-testsuite? → in-testsuite-
Assignee | ||
Comment 72•12 years ago
|
||
For a bug that's been so much trouble to figure out, this has an absurdly easy fix :-)
Please try out this patch and let us know your results. It's worked for me in my tests (getting rid of the crashes on OS X 10.6.8 and the initialization errors on OS X 10.7.5).
I'll await your results before posting a patch that also fixes the autoreleased with no pool errors.
Assignee: rjesup → smichaud
Status: REOPENED → NEW
Assignee | ||
Comment 73•12 years ago
|
||
Here's another interpose library, this time just for logging. On OS X 10.7 and 10.8, it shows CMIOUninitializeGraph() being called "too early" (on a secondary thread) in builds without my fix, and being called "on time" in builds with my fix.
See the instructions from comment #68 on how to use it.
Assignee | ||
Comment 74•12 years ago
|
||
Given that this bug is caused by a lack of initialization, and not (apparently from my results in comment #63) from accessing deleted objects, I doubt very much that it has any security implications.
Assignee | ||
Comment 75•12 years ago
|
||
I decided not to wait for your tests ... but I'd still like to hear their results.
I'm not sure who to ask for a review. If you don't feel comfortable reviewing this, Randell, please let me know.
Attachment #720012 -
Attachment is obsolete: true
Attachment #720070 -
Flags: review?(rjesup)
Comment 76•12 years ago
|
||
Comment on attachment 720070 [details] [diff] [review]
Full fix (including for autorelease pool errors)
Review of attachment 720070 [details] [diff] [review]:
-----------------------------------------------------------------
Releasing on the main thread matches some things I found in my investigations on the web. (None of which were definitive unfortunately).
I don't feel I know enough about the qtkit/ObjC stuff to comment on the pool changes. Are they needed? If so, can you get a Mac person to look at them?
r+ for the proxy to main-thread. (We had to do this already for handling close of the top-level webrtc engine on Mac in shutdown, because it internally proxied to the native Main Event loop, but in shutdown we'd stopped processing it.)
::: media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/video_capture_qtkit.mm
@@ +54,5 @@
>
> VideoCaptureMacQTKit::~VideoCaptureMacQTKit()
> {
>
> + nsAutoreleasePool localPool;
I'm guessing these take the place of the commented-out NSAutoreleasePools in the *_objc.mm files?
::: media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/video_capture_qtkit_info_objc.h
@@ +23,5 @@
>
> @interface VideoCaptureMacQTKitInfoObjC : NSObject{
> bool _OSSupportedInfo;
> NSArray* _captureDevicesInfo;
> + //NSAutoreleasePool* _poolInfo;
What's the point of leaving these commented out?
::: media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/video_capture_qtkit_info_objc.mm
@@ +138,5 @@
> {
> return [NSNumber numberWithInt:0];
> }
>
> + //_poolInfo = [[NSAutoreleasePool alloc]init];
What's the point of leaving these commented out?
Attachment #720070 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 77•12 years ago
|
||
Comment on attachment 720070 [details] [diff] [review]
Full fix (including for autorelease pool errors)
> I don't feel I know enough about the qtkit/ObjC stuff to comment on
> the pool changes.
> Are they needed?
Yes. Though the problems they fix are unrelated to the crashes.
> If so, can you get a Mac person to look at them?
Will do (on Monday).
> + nsAutoreleasePool localPool;
>
> I'm guessing these take the place of the commented-out
> NSAutoreleasePools in the *_objc.mm files?
In a way. They do correctly what the commented out stuff did
incorrectly.
> What's the point of leaving these commented out?
No point, really. They're completely useless. I'll get rid of them.
Assignee | ||
Comment 78•12 years ago
|
||
Randell, I assume you did test my patch, and that it did get rid of your crashes :-)
Comment 79•12 years ago
|
||
Oh, you wanted a test as well as a review? ;-) Yes, it seems to reliably solve my crashes, and I crashed 100% of the time before.
Assignee | ||
Comment 80•12 years ago
|
||
Thanks for the info. Now I feel better.
Remember that my first test patch worked perfectly for me (on two different machines and OS versions), but not for you.
Assignee | ||
Comment 81•12 years ago
|
||
Attachment #720070 -
Attachment is obsolete: true
Attachment #720707 -
Flags: review?
Assignee | ||
Comment 82•12 years ago
|
||
Comment on attachment 720707 [details] [diff] [review]
Full fix (get rid of _poolInfo instead of commenting it out)
Carrying forward rjesup's r+.
Attachment #720707 -
Flags: review? → review+
Comment 83•12 years ago
|
||
We need this fixed ASAP so it can be uplifted for Beta 3 hopefully. Otherwise I need to drop stuff and try to finish a working wallpaper patch (probably "wait until a frame comes in or X seconds before stopping"). Can you find a reviewer for the rest? Ask on #developers? Thanks!
Comment 84•12 years ago
|
||
Alternatively, if that's tough or if you believe the pool bug fixed here is unrelated to the reported problems here and/or minor, we could split this into two bugs, and only uplift the close one.
Assignee | ||
Updated•12 years ago
|
Attachment #720707 -
Flags: review?(bgirard)
Assignee | ||
Comment 85•12 years ago
|
||
(I'd actually already asked Benoit Girard to review in comment #81, but for some reason it didn't stick.)
Comment 86•12 years ago
|
||
Comment on attachment 720707 [details] [diff] [review]
Full fix (get rid of _poolInfo instead of commenting it out)
Review of attachment 720707 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/webrtc/trunk/webrtc/modules/video_capture/mac/qtkit/video_capture_qtkit_info.mm
@@ +13,5 @@
> #import "video_capture_qtkit_info_objc.h"
>
> #include "video_capture.h"
>
> +class nsAutoreleasePool {
Why can't we reusage an existing copy such as http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsCocoaUtils.h#43 ? This pattern is everywhere. I don't think it impacts our release code but it bloats our symbols/debug info.
Attachment #720707 -
Flags: review?(bgirard) → review+
Assignee | ||
Comment 87•12 years ago
|
||
> Why can't we reusage an existing copy such as
> http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsCocoaUtils.h
The WebRTC stuff is in a separate executable, so I didn't think to try
to link in stuff from the rest of the tree. But I'd forgotten that
we'd implemented nsAutoreleasePool in a header file, which might work.
I'll try that, and land it if it works.
Otherwise we eventually want to create an equivalent of nsCocoaUtils.h
for the WebRTC stuff, which I suspect needs a *lot* of cleanup.
However I don't want to do that now.
Assignee | ||
Comment 88•12 years ago
|
||
Landed on mozilla-inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/9fa5887d41e3
The WebRTC stuff has a completely different build system than Firefox does (more like Chromium's). So I quickly gave up trying to figure out how to include a header file from elsewhere in the Mozilla tree.
I also discovered (and removed) one more useless NSAutoreleasePool* in video_capture_qtkit_objc.h and video_capture_qtkit_objc.mm.
The autorelease pool errors are harmless and unrelated, but they are also very visible and annoying. They fill up the system console with useless error messages, and make it a lot more difficult to use the system console for debugging.
Assignee | ||
Comment 89•12 years ago
|
||
Carrying forward a single r+ for rjesup's and bgirard's r+es.
Attachment #720707 -
Attachment is obsolete: true
Attachment #720758 -
Flags: review+
Comment 90•12 years ago
|
||
(In reply to Steven Michaud from comment #88)
> The WebRTC stuff has a completely different build system than Firefox does
> (more like Chromium's). So I quickly gave up trying to figure out how to
> include a header file from elsewhere in the Mozilla tree.
Alright not a problem
Comment 91•12 years ago
|
||
(In reply to Steven Michaud from comment #88)
> Landed on mozilla-inbound:
> http://hg.mozilla.org/integration/mozilla-inbound/rev/9fa5887d41e3
Randell, can you please file an issue/patch upstream?
Comment 92•12 years ago
|
||
Comment 93•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago → 12 years ago
status-firefox22:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Comment 94•12 years ago
|
||
Randell had a solid way to crash this and verified this crash with this patch. So I don't think this needs QA testing here.
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift] → [getUserMedia][blocking-gum+][webrtc-uplift][qa-]
Comment 95•12 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #94)
> Randell had a solid way to crash this and verified this crash with this
> patch. So I don't think this needs QA testing here.
verified this crash no longer reproduces*
Comment 96•12 years ago
|
||
Comment on attachment 720758 [details] [diff] [review]
Full fix (what I actually landed)
[Approval Request Comment]
Bug caused by (feature/regressing bug #): original import
User impact if declined: Crashes on some macs when you have fast opens/closes. It's possible there's a (very?) tough to exploit security vulnerability here given a few of the stacks had random addresses logged.
Testing completed (on m-c, etc.): On m-c. Verified by myself (was 100% reliable crash) and Jason. cdiehl may be able to verify as well.
Risk to taking this patch (and alternatives if risky): Low - a typical "proxy-to-native-mainthread" plus some incorrect pool cleanups. Very unlikely to be worse than the current state. Likely safer than a wallpaper patch, very likely that it's no more risky (wallpaper patch would touch a lot more code and would be harder to verify).
String or UUID changes made by this patch: none.
Attachment #720758 -
Flags: approval-mozilla-beta?
Attachment #720758 -
Flags: approval-mozilla-aurora?
Comment 97•12 years ago
|
||
Comment on attachment 720758 [details] [diff] [review]
Full fix (what I actually landed)
Given the sec rating, that this has already landed to central and that we're about to go to build on FF20 beta 3, approving for uplift. Please land today so this gets into beta 3.
Attachment #720758 -
Flags: approval-mozilla-beta?
Attachment #720758 -
Flags: approval-mozilla-beta+
Attachment #720758 -
Flags: approval-mozilla-aurora?
Attachment #720758 -
Flags: approval-mozilla-aurora+
Comment 98•12 years ago
|
||
Comment 99•12 years ago
|
||
Updated•12 years ago
|
status-b2g18:
--- → unaffected
status-firefox-esr17:
--- → unaffected
Updated•12 years ago
|
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift][qa-] → [getUserMedia][blocking-gum+][webrtc-uplift][qa-][adv-main20-]
Updated•12 years ago
|
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift][qa-][adv-main20-] → [getUserMedia][blocking-gum+][qa-][adv-main20-]
Updated•11 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•