Closed
Bug 1291815
Opened 8 years ago
Closed 8 years ago
DAMP (Talos): Wait for requests to settle before closing toolbox
Categories
(DevTools :: General, defect, P1)
DevTools
General
Tracking
(firefox53 fixed)
RESOLVED
FIXED
Firefox 53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: jryans, Assigned: jryans)
References
Details
Attachments
(1 file)
While investigating a Talos regression (bug 1289410), I noticed that the toolbox close portion does not wait for RDP requests to complete first. This means there may be many tasks in progress processing the reload that just took place, and those could make the close numbers much higher than they would be in real usage of a particular tool. It also leads various errors being logged when the toolbox closes with requests still in progress.
If we agree it's good to address this for more accurate close measurements, we can do so by adding a utility function to wait until all requests have settled.
Making this change will invalidate previous DAMP results.
Assignee | ||
Updated•8 years ago
|
Priority: -- → P3
Comment 1•8 years ago
|
||
I think we will need something like this to land the new console frontend (Bug 1304178). All of my damp runs that flip the pref show regressions in other tools, which doesn't match up with what should be happening, since the frontend isn't opened on those other tools probes.
I have a push where I just set a timeout before proceeding with reloading the page and closing the tab, which causes a slight improvement on current damp times alone:
Results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=f45881b2dbc0&framework=1&filter=damp&showOnlyImportant=0 and the changeset: https://hg.mozilla.org/try/rev/0d7e82b19167a33b794914aca11cb6dbe6039687.
But more importantly, it stabilizes the regression caused by preffing on the new frontend (note there are still console-specific improvements to do to get this regression down even further). Without the change is significantly worse than with the change, moreso than the small improvement we see above by just applying the change alone. That makes me think we are definitely letting one test bleed into others.
Without: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=65e07684396b&framework=1&filter=damp&showOnlyImportant=0
With: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=f6a48435889e&framework=1&filter=damp&showOnlyImportant=0
Ryan, did you have an implementation in mind for waiting for RDP requests to settle? I'm OK with invalidating previous DAMP results since we are obviously having some noise and inaccuracy in the measurement.
Blocks: 1304178
Flags: needinfo?(jryans)
Assignee | ||
Comment 2•8 years ago
|
||
I have a WIP approach in a branch from when I filed this, but I don't remember how close it was to being done... I'll take another look tomorrow.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → jryans
Status: NEW → ASSIGNED
Flags: needinfo?(jryans)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
mozreview-review |
Comment on attachment 8796332 [details]
Bug 1291815 - Wait for requests to settle between DAMP subtests.
https://reviewboard.mozilla.org/r/82220/#review80840
Just a quick inline note - haven't reviewed the actual settling code yet.
::: testing/talos/talos/tests/devtools/addon/content/damp.js:281
(Diff revision 1)
> let subtests = {
> webconsoleOpen: Task.async(function*() {
> yield this.testSetup(url);
> yield openToolboxAndLog(label + ".webconsole", "webconsole");
> yield reloadPageAndLog(label + ".webconsole");
> + yield this.waitForRequestsToSettle();
I'd prefer this logic is folded into the this.closeToolbox helper (before startRecordTimestamp is set) since I believe that will cover all individual usages here.
We could also get rid of the waitForRequestsToSettle function there since we already have a handle on `target`
Comment 6•8 years ago
|
||
Here's another comparison link between current fx-team tip and one with this patch applied:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd07a8debfae&newProject=try&newRevision=c1d8459f3d77&framework=1&filter=damp&showOnlyImportant=0
Then also with this patch applied + the new console code compared with just this patch applied:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c1d8459f3d77&newProject=try&newRevision=4913c3c317d7&framework=1&filter=damp&showOnlyImportant=0
Comment 7•8 years ago
|
||
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #4)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8eb10444620
> https://treeherder.mozilla.org/perf.html#/
> comparechooser?newProject=try&newRevision=d8eb10444620
I went ahead and retriggered from the base revision on this push - here's the compare page for that one: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2a3ee1d58982&newProject=try&newRevision=d8eb10444620&framework=1&showOnlyImportant=0
Comment 8•8 years ago
|
||
I'm seeing errors/timeouts on non-e10s runs with this applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8eb10444620&selectedJob=28277414. I don't see any halfway done jobs in the log (no match on 'simple.webconsole.open.DAMP' for instance): https://archive.mozilla.org/pub/firefox/try-builds/jryans@gmail.com-d8eb10444620a242dc6f583ac1fa4a3fa2a72c85/try-linux64/try_ubuntu64_hw_test-g2-bm104-tests1-linux-build2028.txt.gz.
Joel, does this log tell you anything? Maybe this is just a case where it didn't complete in time. And if so, can we bump it up just for a try run at least to see the results?
Flags: needinfo?(jmaher)
Updated•8 years ago
|
Comment 9•8 years ago
|
||
I honestly think this is related to bug 1306780, most likely we have an addon signing issue
Flags: needinfo?(jmaher)
Assignee | ||
Comment 10•8 years ago
|
||
Should we attempt to proceed with landing here (after resolving review issues)?
I guess it appears we can't know the e10s disabled impact ahead of time due to the try failure, which seems to be a new failure mode from last time I tried to change DAMP (for example in bug 1154874 I was able to successfully get e10s off and on results from DAMP in a try run from Aug. 23).
Flags: needinfo?(bgrinstead)
Comment 11•8 years ago
|
||
Hm, yes I guess so if the failure is just infrastructure related. Going to forward the review to Alex.
Flags: needinfo?(bgrinstead)
Updated•8 years ago
|
Attachment #8796332 -
Flags: review?(bgrinstead) → review?(poirot.alex)
Comment 12•8 years ago
|
||
mozreview-review |
Comment on attachment 8796332 [details]
Bug 1291815 - Wait for requests to settle between DAMP subtests.
https://reviewboard.mozilla.org/r/82220/#review82182
r+ for all but damp modifications. I don't know enough about it to have much opinion and don't really follow the issues you are facing on try.
::: devtools/shared/client/main.js:1185
(Diff revision 1)
> + this._pendingRequests.forEach(requestsForActor => {
> + requests = requests.concat(requestsForActor);
> + });
> + this._activeRequests.forEach(requestForActor => {
> + requests = requests.concat(requestForActor);
> + });
let requests = {...this._pendingRequests.keys(), ...this.activeRequest.keys()];
\o/ JS from 2016!
::: devtools/shared/client/main.js:1197
(Diff revision 1)
> + // With protocol.js, each front can potentially have it's own pools containing child
> + // fronts, forming a tree. Descend through all the pools to locate all child fronts.
> + while (poolsToVisit.length) {
> + let pool = poolsToVisit.shift();
> + fronts.add(pool);
> + for (let child of pool.poolChildren()) {
I imagine if we have a tree, that should happen, but I don't know protocol.js very well to trust it.
Shouldn't we check if pool is already the set and continue if that's the case?
::: devtools/shared/client/main.js:1204
(Diff revision 1)
> + }
> + }
> +
> + // For all fronts, extract the promise from any ongoing requests.
> + for (let front of fronts) {
> + let frontRequests = front._requests.map(({ deferred }) => deferred.promise);
If feel wrong to access to _requests from here. Would be better to expose an helper. May be not _requests itself, but a waitForRequestsToSettle on it?
Attachment #8796332 -
Flags: review?(poirot.alex) → review+
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #5)
> Comment on attachment 8796332 [details]
> Bug 1291815 - Wait for requests to settle between DAMP subtests.
>
> https://reviewboard.mozilla.org/r/82220/#review80840
>
> Just a quick inline note - haven't reviewed the actual settling code yet.
>
> ::: testing/talos/talos/tests/devtools/addon/content/damp.js:281
> (Diff revision 1)
> > let subtests = {
> > webconsoleOpen: Task.async(function*() {
> > yield this.testSetup(url);
> > yield openToolboxAndLog(label + ".webconsole", "webconsole");
> > yield reloadPageAndLog(label + ".webconsole");
> > + yield this.waitForRequestsToSettle();
>
> I'd prefer this logic is folded into the this.closeToolbox helper (before
> startRecordTimestamp is set) since I believe that will cover all individual
> usages here.
>
> We could also get rid of the waitForRequestsToSettle function there since we
> already have a handle on `target`
Okay, I've folded it into to closeToolbox as suggested.
Assignee | ||
Comment 14•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8796332 [details]
Bug 1291815 - Wait for requests to settle between DAMP subtests.
https://reviewboard.mozilla.org/r/82220/#review82182
> let requests = {...this._pendingRequests.keys(), ...this.activeRequest.keys()];
> \o/ JS from 2016!
This doesn't seem quite right, at least for the the `_pendingRequests` case. Each value is itself an entire array. I am currently depending on concat's behavior of extracting the elements when passed an array.
I'll add a note about the types for each of these.
> I imagine if we have a tree, that should happen, but I don't know protocol.js very well to trust it.
> Shouldn't we check if pool is already the set and continue if that's the case?
Are you concerned it's actually a graph with cycles and not a tree, so we could return to a pool we've already visited? I don't believe that's possible right now... if it were, I guess I would expect the algorithm to loop forever, but that doesn't happen in practice.
`poolChildren` does filter out the one case I am aware of, where something can be "self-owned" so thing A appears in a pool owned by A.
> If feel wrong to access to _requests from here. Would be better to expose an helper. May be not _requests itself, but a waitForRequestsToSettle on it?
Okay, I added methods to the front to avoid exposing internals.
Assignee | ||
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
Okay, this definitely does hang when e10s is disabled, need to investigate further.
Assignee | ||
Comment 17•8 years ago
|
||
Analysis is currently blocked by bug 1309294 in Talos log processing.
Depends on: 1309294
Comment 18•8 years ago
|
||
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #17)
> Analysis is currently blocked by bug 1309294 in Talos log processing.
Fixed now.
Updated•8 years ago
|
Blocks: enable-new-console
Updated•8 years ago
|
Comment 19•8 years ago
|
||
Comment 20•8 years ago
|
||
Still seeing timeouts on non-e10s:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c04a4f991dbb3441a6f82b30c1f3e83349f43d84
https://archive.mozilla.org/pub/firefox/try-builds/bgrinstead@mozilla.com-c04a4f991dbb3441a6f82b30c1f3e83349f43d84/try-linux64/try_ubuntu64_hw_test-g2-bm103-tests1-linux-build2657.txt.gz
Comment 21•8 years ago
|
||
Running locally with ./mach talos-test --disable-e10s -a damp
I'm seeing it not proceed past the inspector open on the simple page with an error in the browser console:
Protocol error (noSuchActor): No such actor for ID: server1.conn1.child1/domnode33 inspector.js:188:7
Inspector.prototype._handleRejectionIfNotDestroyed chrome://devtools/content/inspector/inspector.js:188:7
Handler.prototype.process resource://gre/modules/Promise-backend.js:935:21
this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7
this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7
Front<.onPacket/< resource://devtools/shared/protocol.js:1295:9
(Async: DevTools RDP)
Front<.request resource://devtools/shared/protocol.js:1239:14
generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14
Inspector.prototype.onNewSelection chrome://devtools/content/inspector/inspector.js:782:7
emit resource://devtools/shared/event-emitter.js:191:13
Selection.prototype.setNodeFront resource://devtools/client/framework/selection.js:127:5
Inspector.prototype._deferredOpen/< chrome://devtools/content/inspector/inspector.js:264:9
once/handler resource://devtools/shared/event-emitter.js:135:11
emit resource://devtools/shared/event-emitter.js:191:13
Inspector.prototype._onMarkupFrameLoad chrome://devtools/content/inspector/inspector.js:1338:5
Inspector.prototype._initMarkup chrome://devtools/content/inspector/inspector.js:1320:5
Inspector.prototype._deferredOpen chrome://devtools/content/inspector/inspector.js:256:5
Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:137:18
TaskImpl.prototype._run resource://devtools/shared/task.js:311:39
Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23
this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7
this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7
Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9
(Async: DevTools RDP)
Front<.request resource://devtools/shared/protocol.js:1239:14
generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14
WalkerFront<.querySelector< resource://devtools/shared/fronts/inspector.js:621:12
Inspector.prototype._getDefaultNodeForSelection/< chrome://devtools/content/inspector/inspector.js:330:14
Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23
this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7
this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7
Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9
(Async: DevTools RDP)
Front<.request resource://devtools/shared/protocol.js:1239:14
generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14
InspectorFront<.getPageStyle< resource://devtools/shared/fronts/inspector.js:976:12
Inspector.prototype._getPageStyle chrome://devtools/content/inspector/inspector.js:290:12
Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:130:11
TaskImpl.prototype._run resource://devtools/shared/task.js:311:39
Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23
this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7
this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7
Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9
(Async: DevTools RDP)
Front<.request resource://devtools/shared/protocol.js:1239:14
generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14
initCssProperties< resource://devtools/shared/fronts/css-properties.js:208:28
TaskImpl.prototype._run resource://devtools/shared/task.js:311:39
TaskImpl resource://devtools/shared/task.js:273:3
createAsyncFunction/asyncFunction resource://devtools/shared/task.js:247:14
Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:127:33
TaskImpl.prototype._run resource://devtools/shared/task.js:311:39
TaskImpl resource://devtools/shared/task.js:273:3
createAsyncFunction/asyncFunction resource://devtools/shared/task.js:247:14
open resource://devtools/client/inspector/panel.js:12:12
Toolbox.prototype.loadTool/onLoad resource://devtools/client/framework/toolbox.js:1338:19
Assignee | ||
Comment 22•8 years ago
|
||
Looks like I can reproduce the issue with e10s off in my local branch after rebasing.
I'll try to investigate it now.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 25•8 years ago
|
||
The interdiff is hard to read since it includes rebased code + other review comments. What part was needed to fix the non-e10s hang?
Assignee | ||
Comment 26•8 years ago
|
||
Okay, I think e10s off should be working now. Root cause was I needed to handle requests that are rejected during toolbox close as well in my settling path.
Just started a new run, let's see what the results look like as they come in...
Baseline:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ab3ed7d7b601666f76f8814aba8013f84aba538
New:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=13ffd2f593939c2ca6bc315b30a6ac35017ba781
Talos comparison:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2ab3ed7d7b601666f76f8814aba8013f84aba538&newProject=try&newRevision=13ffd2f593939c2ca6bc315b30a6ac35017ba781&framework=1&filter=damp&showOnlyImportant=0
Assignee | ||
Comment 27•8 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #25)
> The interdiff is hard to read since it includes rebased code + other review
> comments. What part was needed to fix the non-e10s hang?
Yeah, rebasing is tricky for MozReview! Here are the core changes as separate patches before I squashed them:
https://github.com/jryans/gecko-dev/commit/bf094537135a1659d9c9f60d59f348d19696bd04
https://github.com/jryans/gecko-dev/commit/c904e1f3949f4db8f9ab76231d703886f470b001
Comment 28•8 years ago
|
||
Looks like on average the suite speeds up around 5%. Let's accept that as the 'new normal' since it should be more accurate and will hopefully limit noise between subtests
Assignee | ||
Comment 29•8 years ago
|
||
Yep, the results seem as expected overall: the close numbers reduce in many cases, especially Net Monitor, since we're waiting for tool traffic to settle before timing close. A few numbers grew slightly, but that should more accurately reflect the work they are doing.
Assignee | ||
Updated•8 years ago
|
Priority: P3 → P1
Comment 30•8 years ago
|
||
Pushed by jryans@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/25cf55049850
Wait for requests to settle between DAMP subtests. r=ochameau
Comment 31•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
Updated•6 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•