Closed Bug 1135541 Opened 10 years ago Closed 10 years ago

[EME] Ensure crash reporting does something sensible when EME plugins crash

Categories

(Core :: Audio/Video, defect)

x86_64
Windows Vista
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: cpearce, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 5 obsolete files)

If an EME GMP crashes, we should ensure the crash reporter does something sensible. Hopefully the work for this is already done and we can resovlve this as WORKSFORME, but I thought it best to file to ensure I don't forget to check this.
I'm CCing bsmedberg as AFAIK he has most knowledge of GMP crash stuff. Still, if you need him to comment, I think he only reacts to ni? and not to CCing.
bsmedberg said in an email: We collect crash reports for all GMP plugins, but the code which associates that with a DOM window and allows the user to submit the crash is context-sensitive. So for OpenH264, the code flow looks like this: http://mxr.mozilla.org/mozilla-central/source/dom/media/PeerConnection.js#162 http://mxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#2303 Then the shared Firefox plugin frontend reacts to the PluginCrashed event and shows the UI. This may not need any changes: http://mxr.mozilla.org/mozilla-central/source/browser/modules/PluginContent.jsm#342 http://mxr.mozilla.org/mozilla-central/source/browser/modules/PluginContent.jsm#854 You'll need to implement similar logic for documents that are using a GMP.
Assignee: nobody → edwin
Flags: needinfo?(bobowen.code)
Depends on: 1145432
Attached patch WIP (obsolete) (deleted) — Splinter Review
Got this far; very much a WIP. Managed to get firefox UI showing for the plugin crash, but this way it would dispatch to _all_ tabs that have a CDM which isn't what we want. Recommend moving the Observer stuff to MediaKeys instead of having it in HTMLMediaElement. Couldn't figure out why the crashreporter dialog wasn't showing at the time; I've been told it might be related to bug 1145432. Not sure if it provides any actual insight into the solution. (probably more insight into the haphazard inner workings of Edwin's mind...)
Gerald bravely volunteered to help us out with this...
Assignee: edwin → gsquelart
The fix for bug 1145432 is on inbound. I still don't see the crash reporting dialog even with this fix, but I do see dumps, which I don't get without it.
Flags: needinfo?(bobowen.code)
Note that for plugin (and content) crashes we usually do not display a crash reporting dialog but instead display an in-content crash reporting UI in place of the plugin on the page. We do get a very low submission rate for those crashes though, so that might even be worth rethinking, but as the browser process is still up, we'll want something displayed by Firefox and not the native crash reporter window (which we only need because when Firefox crashed, it cannot display UI or run submission by itself). For EME, it may make sense to either display the crash UI in place of the video (as I guess that piece of the UI gets useless at that point) or it may make sense to actually create a doorhanger-style crash UI that we can use for all plugin stuff including GMP. This might need UX people to chime in.
I think we do want this dispatching to all tabs which use a CDM: it's likely that the user will need to refresh the tab in order to get their content working correctly. We shouldn't mess with the UI here: the existing plugin-crashed notification bar is what we want for now.
Attached patch 1135541-gmp-crash-reporting-WIP.patch (obsolete) (deleted) — Splinter Review
Moved crash handling to MediaKeys, as Edwin recommended in comment 3 -- We may move it back to HTMLMediaElement if that actually makes more sense. Implemented check that the crashed plugin is "ours" (the MediaKeys') before showing EME crash UI drop-down. Moved crashing code to separate patch (to follow). Still WIP!
Attachment #8580430 - Attachment is obsolete: true
Small patch to force a crash when using an EME plugin, for easier testing. Alternatively the plugin may be manually terminated from the Windows Task Manager or a kill command.
Attached patch 1135541-gmp-crash-reporting.patch (obsolete) (deleted) — Splinter Review
Handling of gmp-crash-plugin for EME plugins. Done in MediaKeys; Any HTMLMediaElement using an EME plugin has at most one MediaKeys, and each MediaKeys has at most one HTMLMediaElement, so the crash report will appear in all relevant pages. Review focus please: Making sure the chain of GetPluginId() is done correctly (correct thread and/or thread-safe). Leaving the crashing patch attached to this bug for now, will remove it after review.
Attachment #8581196 - Attachment is obsolete: true
Attachment #8581425 - Flags: review?(cpearce)
Comment on attachment 8581425 [details] [diff] [review] 1135541-gmp-crash-reporting.patch Review of attachment 8581425 [details] [diff] [review]: ----------------------------------------------------------------- Looks pretty good. Just minor changes, please address them before landing. ::: dom/media/eme/MediaKeys.cpp @@ +3,5 @@ > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this file, > * You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +#include "GMPService.h" Please move MediaKeys.h to the top of the file, so that we can ensure that MediaKeys.h includes everything it needs to compile on its own, in non-unified builds. @@ +390,5 @@ > MediaKeySystemStatus::Cdm_created); > + > + if (mProxy) { > + // Store plugin id, for use with Observe('gmp-plugin-crash'). > + mLastGMPParentPluginID.Truncate(); You don't need this extra truncate; GetPluginId does it for you. @@ +489,5 @@ > > + nsCOMPtr<nsIObserverService> observerService = > + mozilla::services::GetObserverService(); > + if (observerService) { > + observerService->AddObserver(this, "gmp-plugin-crash", false); Unless your nsIObserver implementation also implements nsIWeakReference, adding an observer holds a strong reference to your obsever object. Meaning that if you are observing a topic you'll be kept alive. So you need to remove the observer in MediaKeys::Shutdown(), else we'll leak (at least until the observer service shuts down and drops its references). You can detect leaks by running with XPCOM_MEM_LEAK_LOG=2 environment variable. @@ +562,5 @@ > + init.mPluginName = Substring(sData, space1 + 1, space2 - (space1 + 1)); > + init.mPluginDumpID = Substring(sData, space2 + 1, sData.Length() - (space2 + 1)); > + } > + > + // The following PluginCrashedEvent fields stay empty: I would prefer if you didn't checkin commented out code. @@ +565,5 @@ > + > + // The following PluginCrashedEvent fields stay empty: > + //init.mBrowserDumpID; > + //init.mPluginFilename; > + // TODO: Can/should we fill them? Should we fill them? I don't know... @@ +567,5 @@ > + //init.mBrowserDumpID; > + //init.mPluginFilename; > + // TODO: Can/should we fill them? > + > + NS_ENSURE_TRUE(!!mParent, NS_ERROR_FAILURE); We're not supposed to use NS_ENSURE_TRUE anymore, as per the style guide. I don't make the rules, I just enforce them. if (!mParent) { NS_WARNING("someAppropriateWarning"); return NS_ERROR_FAILURE; } or: if (NS_WARN_IF(!mParent)) { return NS_ERROR_FAILURE; } ::: dom/media/gmp/GMPDecryptorParent.h @@ +28,5 @@ > explicit GMPDecryptorParent(GMPParent *aPlugin); > > // GMPDecryptorProxy > + > + virtual void GetPluginId(nsString& aPluginId) const override; You could make this: virtual const nsAString& GetPluginId() const; Then you'd have nicer semantics.
Attachment #8581425 - Flags: review?(cpearce) → review+
Attached patch 1135541-gmp-crash-reporting.patch (obsolete) (deleted) — Splinter Review
Updated patch as per review in comment 12. Major rework to get rid of the potentially-leaking AddObserver, hence the r?. Now the GMPService keeps a list of (small) crash handlers, and regularly discards obsolete handlers. MediaKeys registers its crash handler that displays a pop-down if the EME GMP terminates; the handler can mark itself as obsolete if the related window or document evaporates. https://treeherder.mozilla.org/#/jobs?repo=try&revision=35824fe03f10
Attachment #8581197 - Attachment is obsolete: true
Attachment #8581425 - Attachment is obsolete: true
Attachment #8582921 - Flags: review?(cpearce)
Comment on attachment 8582921 [details] [diff] [review] 1135541-gmp-crash-reporting.patch Review of attachment 8582921 [details] [diff] [review]: ----------------------------------------------------------------- r+ with the comments addressed. ::: dom/media/eme/CDMProxy.cpp @@ +136,5 @@ > MOZ_ASSERT(!GetNodeId().IsEmpty()); > + if (mCDM) { > + mKeys->OnCDMCreated(aPromiseId, GetNodeId(), mCDM->GetPluginId()); > + } else { > + nsAutoCString blankPluginId; If mCDM is null, does it make sense to report that the CDM is created? It seems to me that you should instead be calling mKeys->RejectPromise(aPromiseId, NS_ERROR_DOM_INVALID_STATE_ERR); ::: dom/media/eme/MediaKeys.cpp @@ +371,5 @@ > +class CrashHandler : public gmp::GeckoMediaPluginService::PluginCrashCallback > +{ > +public: > + CrashHandler(const nsACString& aPluginId, > + nsCOMPtr<nsPIDOMWindow> aParentWindow, You can pass raw pointers through, i.e. nsPIDOMWindow*, you don't need to pass nsCOMPtr<T> through. Doing so causes an addref/release cycle, which if all the objects have threadsafe refcounting can add up to a perf hit when scaled up to a codebase the size of ours. Or so I was told. @@ +426,5 @@ > + if (!document) { > + return false; > + } > + nsCOMPtr<nsIDocument> parentWindowDocument = parentWindow->GetExtantDoc(); > + if (!parentWindowDocument || document.get() != parentWindowDocument.get()) { You've got duplicate code here and in Run() above... How about having a function that returns the target of the event, and IsStillValid() returns true if the helper returns non-null? ::: dom/media/gmp/GMPDecryptorParent.cpp @@ +34,5 @@ > + return mPlugin->GetPluginId(); > + } else { > + NS_WARNING("No plugin!"); > + static const nsCString blank = NS_LITERAL_CSTRING(""); > + return blank; I would have thought that having a static nsCString here would be reported as a leak by our leak detector (since the leak check runs before the statics are finalized; the memory would be freed after the leak check finished and the statics' finalizers run). So... If this can run after we've shutdown the plugin, then it'll have to not return a reference. It won't run often, so we can pay that perf cost. Or you could cache the pluginId in this class, and keep returning a reference. ::: dom/media/gmp/GMPService.cpp @@ +178,5 @@ > + auto& callback = mPluginCrashCallbacks[i - 1]; > + const nsACString& callbackPluginId = callback->PluginId(); > + if (!callback->IsStillValid()) { > + LOGD(("%s::%s - Removing obsolete callback for pluginId %s", > + __CLASS__, __FUNCTION__, callbackPluginId.Data())); You don't use callbackPluginId except in the LOG function, so why not just call it in the LOG function, i.e.: need to cache callback->PluginId() LOGD(("%s::%s - Removing obsolete callback for pluginId %s", __CLASS__, __FUNCTION__, PromiseFlatCString(callback->PluginId()).get())); Using PromiseFlatCString also guarantees that your string is null terminated at Data()+Length(), which nsACString doesn't guarantee, as the concrete class underneath could be a substring of another longer string. (Unlikely in this case, since you wrote the PluginId() function, but it's a good habit to get into). @@ +197,5 @@ > +GeckoMediaPluginService::RemovePluginCrashCallbacks(const nsACString& aPluginId) > +{ > + RemoveObsoletePluginCrashCallbacks(); > + for (size_t i = mPluginCrashCallbacks.Length(); i != 0; --i) { > + auto& callback = mPluginCrashCallbacks[i - 1]; I think since you're dealing with a refcounted object you should be explicit and fully declare the type rather than using auto, i.e. nsRefPtr<PluginCrashCallback>& callback; @@ +217,5 @@ > + const nsACString& callbackPluginId = callback->PluginId(); > + if (!callback->IsStillValid()) { > + LOGD(("%s::%s(%s) - Removing obsolete callback for pluginId %s", > + __CLASS__, __FUNCTION__, aPluginId.Data(), callbackPluginId.Data())); > + mPluginCrashCallbacks.RemoveElementsAt(i - 1, 1); You can use RemoveElementAt(i - 1) instead of specifying a range length of 1 to remove. Ditto elsewhere. @@ +218,5 @@ > + if (!callback->IsStillValid()) { > + LOGD(("%s::%s(%s) - Removing obsolete callback for pluginId %s", > + __CLASS__, __FUNCTION__, aPluginId.Data(), callbackPluginId.Data())); > + mPluginCrashCallbacks.RemoveElementsAt(i - 1, 1); > + } else { This could be: } else if (...) { to save you one indent below. ::: dom/media/gmp/GMPService.h @@ +57,5 @@ > + PluginCrashCallback(const nsACString& aPluginId) > + : mPluginId(aPluginId) > + { > + MOZ_ASSERT(NS_IsMainThread()); > + MOZ_COUNT_CTOR(PluginCrashCallback); You don't need the MOZ_COUNT_CTOR or DTOR, as NS_INLINE_DECL_REFCOUNTING does it for you: https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#513
Attachment #8582921 - Flags: review?(cpearce) → review+
Flags: needinfo?(cpearce)
Flags: needinfo?(cpearce)
Keywords: checkin-needed
Flags: needinfo?(cpearce)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment on attachment 8583550 [details] [diff] [review] 1135541-gmp-crash-reporting.patch Approval Request Comment [Feature/regressing bug #]: EME [User impact if declined]: Crashes in EME plugins will be unreported. We'd then be flying blind as to whether EME plugins are crashing a lot or not. [Describe test coverage new/current, TreeHerder]: Local testing. An automated test for this is a Q2 deliverable for the automation team. [Risks and why]: Low. We should only run this code when things have gone wrong already. [String/UUID change made/needed]: None.
Attachment #8583550 - Flags: approval-mozilla-aurora?
Comment on attachment 8583550 [details] [diff] [review] 1135541-gmp-crash-reporting.patch >--- a/dom/media/eme/MediaKeys.cpp >+class CrashHandler : public gmp::GeckoMediaPluginService::PluginCrashCallback >+{ [..] >+ virtual bool IsStillValid() >+ { nit: this function was missing an "override" annotation, which breaks my local build (with clang 3.6 and warnings-as-errors) -- see bug 1117034 for more background on the warning. I'm going to push a trivial followup to fix this in a few minutes.
Landed 'override' followup (with the rubber-stamp ehsan granted to me over in bug 1126447 comment 2): https://hg.mozilla.org/integration/mozilla-inbound/rev/a74c470d1a6a (If this gets backported to aurora per comment 18, ideally we should backport this followup as well, to keep m-c and m-a closer in sync. This followup doesn't affect runtime behavior at all; it's just a hint to help with error-checking that the compiler does while building our code.)
Attachment #8583550 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(cpearce)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: