Closed
Bug 1443329
Opened 7 years ago
Closed 7 years ago
BHR triggers for ForcePaint don't account for all scenarios
Categories
(Core :: DOM: Content Processes, enhancement, P1)
Core
DOM: Content Processes
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: dthayer, Assigned: dthayer)
References
Details
(Whiteboard: [bhr][fxperf:p1])
Attachments
(2 files)
When sending RenderLayers to the content process, we:
- Race the message with a ForcePaint message to the hang monitor thread
- Then, in that thread, we call NotifyActivity for the BHR monitor
- Then, we trigger a JS interrupt, which will call RecvRenderLayers if JS was running
- When exiting RecvRenderLayers, we call NotifyWait on the BHR monitor, ending the "hang" and reporting a sample stack
This flow works for most cases, but it doesn't seem to account for the case where the ForcePaint message loses its race by a large enough margin that RecvRenderLayers has already exited. In this scenario, the chain of events would be:
- RecvRenderLayers -> BHR NotifyWait()
- ForcePaint -> BHR NotifyActivity()
- Interrupt JS (no JS is running) -> BHR registers a permahang, because we don't NotifyWait until the next RecvRenderLayers
Assignee | ||
Comment 1•7 years ago
|
||
Confirmed by adding a sleep in between sending RenderLayers and ForcePaint to the content process. Doing a sleep of any duration >100ms yields a 1024ms "hang" (1024 ms is the cut-off point for the ForcePaint BHR monitor.)
Comment hidden (mozreview-request) |
Comment 3•7 years ago
|
||
mozreview-review |
Comment on attachment 8956274 [details]
Bug 1443329 - Don't notify BHR if ForcePaint loses its race
https://reviewboard.mozilla.org/r/225166/#review231326
Thanks for finding this, dthayer. Hopefully this will help clean up our BHR data a bit!
::: dom/ipc/ProcessHangMonitor.cpp:427
(Diff revision 1)
> + if (mForcePaintEpoch >= aLayerObserverEpoch) {
> + return IPC_OK();
> + }
> + mForcePaintMonitor->NotifyActivity();
I guess this means (or rather, I guess this has always meant) that we don't ever gather BHR stacks whenever ProcessHangMonitor loses the race.
I _assume_ that's a rare enough case that we should still be able to get some useful data about hangs here. From your testing, is my assumption grounded in reality?
In either case, I wonder if it'd add too much complication for us to trigger a NotifyActivity from the RecvRenderLayers on the main thread as well. Perhaps fodder for a follow-up.
::: dom/ipc/ProcessHangMonitor.cpp:427
(Diff revision 1)
>
> - mForcePaintMonitor->NotifyActivity();
> -
> {
> MonitorAutoLock lock(mMonitor);
> + if (mForcePaintEpoch >= aLayerObserverEpoch) {
Can you please add an inline comment describing the race-losing case that we're dealing with here? Same with the ClearForcePaint function where we check the epoch.
Attachment #8956274 -
Flags: review?(mconley) → review+
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #3)
> I _assume_ that's a rare enough case that we should still be able to get
> some useful data about hangs here. From your testing, is my assumption
> grounded in reality?
Hmm, now that you mention it I think we should probably just get it out of the way in this bug. Should be pretty simple.
Updated•7 years ago
|
Component: Tabbed Browser → DOM: Content Processes
Product: Firefox → Core
Assignee | ||
Comment 5•7 years ago
|
||
(In reply to Doug Thayer [:dthayer] from comment #4)
> Hmm, now that you mention it I think we should probably just get it out of
> the way in this bug. Should be pretty simple.
In the process of testing the change for this I'm noticing that BHR is grossly misreporting the duration of ForcePaint hangs. Not sure why though. Looking into it.
Assignee | ||
Comment 6•7 years ago
|
||
So essentially the ForcePaint BHR data is just garbage, since it should hit the case from Bug 1443329 100% of the time. Going to fix that and then apply the changes from this on top of it.
Whiteboard: [bhr][fxperf:p1]
Updated•7 years ago
|
Assignee: nobody → dothayer
Assignee | ||
Updated•7 years ago
|
Status: NEW → ASSIGNED
Updated•7 years ago
|
Priority: -- → P1
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 9•7 years ago
|
||
mozreview-review |
Comment on attachment 8960292 [details]
Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR
https://reviewboard.mozilla.org/r/229062/#review234844
Clearing review request until I hear more about the Atomics thing. Maybe I'm just misunderstanding - but if so, I'd like to know where my understanding is falling over.
::: dom/ipc/ProcessHangMonitor.cpp:453
(Diff revision 1)
>
> +
> +void
> +HangMonitorChild::MaybeStartForcePaint()
> +{
> + if (mBHRMonitorActive.exchange(true)) {
Wait a second... maybe I'm confused by how Atomic.exchange works, but aren't we getting back what _was_ inside mBHRMonitorActive as the return value?
So after the most recent ClearForcePaint, doesn't that mean that the first call to MaybeStartForcePaint will evaluate as false, but subsequent calls to MaybeStartForcePaint (until the next ClearForcePaint is called) will call NotifyActivity?
Or am I misunderstanding?
::: dom/ipc/ProcessHangMonitor.cpp:453
(Diff revision 1)
>
> +
> +void
> +HangMonitorChild::MaybeStartForcePaint()
> +{
> + if (mBHRMonitorActive.exchange(true)) {
Probably a good idea to `MOZ_RELEASE_ASSERT(IsOnThread())` here too.
Attachment #8960292 -
Flags: review?(mconley)
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #9)
> Comment on attachment 8960292 [details]
> Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR
>
> https://reviewboard.mozilla.org/r/229062/#review234844
>
> Clearing review request until I hear more about the Atomics thing. Maybe I'm
> just misunderstanding - but if so, I'd like to know where my understanding
> is falling over.
>
> ::: dom/ipc/ProcessHangMonitor.cpp:453
> (Diff revision 1)
> >
> > +
> > +void
> > +HangMonitorChild::MaybeStartForcePaint()
> > +{
> > + if (mBHRMonitorActive.exchange(true)) {
>
> Wait a second... maybe I'm confused by how Atomic.exchange works, but aren't
> we getting back what _was_ inside mBHRMonitorActive as the return value?
>
> So after the most recent ClearForcePaint, doesn't that mean that the first
> call to MaybeStartForcePaint will evaluate as false, but subsequent calls to
> MaybeStartForcePaint (until the next ClearForcePaint is called) will call
> NotifyActivity?
>
> Or am I misunderstanding?
Woops, no you're totally right.
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #9)
> ::: dom/ipc/ProcessHangMonitor.cpp:453
> (Diff revision 1)
> >
> > +
> > +void
> > +HangMonitorChild::MaybeStartForcePaint()
> > +{
> > + if (mBHRMonitorActive.exchange(true)) {
>
> Probably a good idea to `MOZ_RELEASE_ASSERT(IsOnThread())` here too.
I'm not sure I understand. The goal of this function is to be safe to call from either the main thread or the hang monitor thread.
Comment 12•7 years ago
|
||
(In reply to Doug Thayer [:dthayer] from comment #11)
> I'm not sure I understand. The goal of this function is to be safe to call
> from either the main thread or the hang monitor thread.
Ah - right, sorry. In that case, maybe best to document that the method is (and should remain) threadsafe.
Comment hidden (mozreview-request) |
Comment 14•7 years ago
|
||
mozreview-review |
Comment on attachment 8960292 [details]
Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR
https://reviewboard.mozilla.org/r/229062/#review234868
Looks good! Thanks, dthayer.
Attachment #8960292 -
Flags: review?(mconley) → review+
Comment 15•7 years ago
|
||
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d47df4e1481b
Don't notify BHR if ForcePaint loses its race r=mconley
https://hg.mozilla.org/integration/autoland/rev/ff98ae41c558
Ensure we always call NotifyActivity on ForcePaint BHR r=mconley
Comment 16•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d47df4e1481b
https://hg.mozilla.org/mozilla-central/rev/ff98ae41c558
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in
before you can comment on or make changes to this bug.
Description
•