Closed Bug 1291815 Opened 8 years ago Closed 8 years ago

DAMP (Talos): Wait for requests to settle before closing toolbox

Categories

(DevTools :: General, defect, P1)

Product:

Component:

Type:

defect

Priority:

P1

Severity:

normal

Tracking

(firefox53 fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 53

Tracking Flags:

Tracking

Status

firefox53

---

fixed

People

(Reporter: jryans, Assigned: jryans)

References

Details

Attachments

(1 file)

Bug 1291815 - Wait for requests to settle between DAMP subtests. 8 years ago J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow) (deleted), text/x-review-board-request	ochameau : review+	Details

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Description

•

8 years ago

While investigating a Talos regression (bug 1289410), I noticed that the toolbox close portion does not wait for RDP requests to complete first. This means there may be many tasks in progress processing the reload that just took place, and those could make the close numbers much higher than they would be in real usage of a particular tool. It also leads various errors being logged when the toolbox closes with requests still in progress. If we agree it's good to address this for more accurate close measurements, we can do so by adding a utility function to wait until all requests have settled. Making this change will invalidate previous DAMP results.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Updated

•

8 years ago

Priority: -- → P3

Brian Grinstead [:bgrins]

Comment 1

•

8 years ago

I think we will need something like this to land the new console frontend (Bug 1304178). All of my damp runs that flip the pref show regressions in other tools, which doesn't match up with what should be happening, since the frontend isn't opened on those other tools probes. I have a push where I just set a timeout before proceeding with reloading the page and closing the tab, which causes a slight improvement on current damp times alone: Results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=f45881b2dbc0&framework=1&filter=damp&showOnlyImportant=0 and the changeset: https://hg.mozilla.org/try/rev/0d7e82b19167a33b794914aca11cb6dbe6039687. But more importantly, it stabilizes the regression caused by preffing on the new frontend (note there are still console-specific improvements to do to get this regression down even further). Without the change is significantly worse than with the change, moreso than the small improvement we see above by just applying the change alone. That makes me think we are definitely letting one test bleed into others. Without: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=65e07684396b&framework=1&filter=damp&showOnlyImportant=0 With: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3889408efb2d&newProject=try&newRevision=f6a48435889e&framework=1&filter=damp&showOnlyImportant=0 Ryan, did you have an implementation in mind for waiting for RDP requests to settle? I'm OK with invalidating previous DAMP results since we are obviously having some noise and inaccuracy in the measurement.

Blocks: 1304178

Flags: needinfo?(jryans)

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 2

•

8 years ago

I have a WIP approach in a branch from when I filed this, but I don't remember how close it was to being done... I'll take another look tomorrow.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Updated

•

8 years ago

Assignee: nobody → jryans

Status: NEW → ASSIGNED

Flags: needinfo?(jryans)

Comment hidden (mozreview-request)

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 4

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8eb10444620 https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=d8eb10444620

Brian Grinstead [:bgrins]

Comment 5

•

8 years ago

mozreview-review

Comment on attachment 8796332 [details] Bug 1291815 - Wait for requests to settle between DAMP subtests. https://reviewboard.mozilla.org/r/82220/#review80840 Just a quick inline note - haven't reviewed the actual settling code yet. ::: testing/talos/talos/tests/devtools/addon/content/damp.js:281 (Diff revision 1) > let subtests = { > webconsoleOpen: Task.async(function*() { > yield this.testSetup(url); > yield openToolboxAndLog(label + ".webconsole", "webconsole"); > yield reloadPageAndLog(label + ".webconsole"); > + yield this.waitForRequestsToSettle(); I'd prefer this logic is folded into the this.closeToolbox helper (before startRecordTimestamp is set) since I believe that will cover all individual usages here. We could also get rid of the waitForRequestsToSettle function there since we already have a handle on `target`

Brian Grinstead [:bgrins]

Comment 6

•

8 years ago

Here's another comparison link between current fx-team tip and one with this patch applied: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=bd07a8debfae&newProject=try&newRevision=c1d8459f3d77&framework=1&filter=damp&showOnlyImportant=0 Then also with this patch applied + the new console code compared with just this patch applied: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c1d8459f3d77&newProject=try&newRevision=4913c3c317d7&framework=1&filter=damp&showOnlyImportant=0

Brian Grinstead [:bgrins]

Comment 7

•

8 years ago

(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #4) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8eb10444620 > https://treeherder.mozilla.org/perf.html#/ > comparechooser?newProject=try&newRevision=d8eb10444620 I went ahead and retriggered from the base revision on this push - here's the compare page for that one: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2a3ee1d58982&newProject=try&newRevision=d8eb10444620&framework=1&showOnlyImportant=0

Brian Grinstead [:bgrins]

Comment 8

•

8 years ago

I'm seeing errors/timeouts on non-e10s runs with this applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8eb10444620&selectedJob=28277414. I don't see any halfway done jobs in the log (no match on 'simple.webconsole.open.DAMP' for instance): https://archive.mozilla.org/pub/firefox/try-builds/jryans@gmail.com-d8eb10444620a242dc6f583ac1fa4a3fa2a72c85/try-linux64/try_ubuntu64_hw_test-g2-bm104-tests1-linux-build2028.txt.gz. Joel, does this log tell you anything? Maybe this is just a case where it didn't complete in time. And if so, can we bump it up just for a try run at least to see the results?

Flags: needinfo?(jmaher)

Brian Grinstead [:bgrins]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1306780

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

8 years ago

I honestly think this is related to bug 1306780, most likely we have an addon signing issue

Flags: needinfo?(jmaher)

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 10

•

8 years ago

Should we attempt to proceed with landing here (after resolving review issues)? I guess it appears we can't know the e10s disabled impact ahead of time due to the try failure, which seems to be a new failure mode from last time I tried to change DAMP (for example in bug 1154874 I was able to successfully get e10s off and on results from DAMP in a try run from Aug. 23).

Flags: needinfo?(bgrinstead)

Brian Grinstead [:bgrins]

Comment 11

•

8 years ago

Hm, yes I guess so if the failure is just infrastructure related. Going to forward the review to Alex.

Flags: needinfo?(bgrinstead)

Brian Grinstead [:bgrins]

Updated

•

8 years ago

Attachment #8796332 - Flags: review?(bgrinstead) → review?(poirot.alex)

Alexandre Poirot [:ochameau]

Comment 12

•

8 years ago

mozreview-review

Comment on attachment 8796332 [details] Bug 1291815 - Wait for requests to settle between DAMP subtests. https://reviewboard.mozilla.org/r/82220/#review82182 r+ for all but damp modifications. I don't know enough about it to have much opinion and don't really follow the issues you are facing on try. ::: devtools/shared/client/main.js:1185 (Diff revision 1) > + this._pendingRequests.forEach(requestsForActor => { > + requests = requests.concat(requestsForActor); > + }); > + this._activeRequests.forEach(requestForActor => { > + requests = requests.concat(requestForActor); > + }); let requests = {...this._pendingRequests.keys(), ...this.activeRequest.keys()]; \o/ JS from 2016! ::: devtools/shared/client/main.js:1197 (Diff revision 1) > + // With protocol.js, each front can potentially have it's own pools containing child > + // fronts, forming a tree. Descend through all the pools to locate all child fronts. > + while (poolsToVisit.length) { > + let pool = poolsToVisit.shift(); > + fronts.add(pool); > + for (let child of pool.poolChildren()) { I imagine if we have a tree, that should happen, but I don't know protocol.js very well to trust it. Shouldn't we check if pool is already the set and continue if that's the case? ::: devtools/shared/client/main.js:1204 (Diff revision 1) > + } > + } > + > + // For all fronts, extract the promise from any ongoing requests. > + for (let front of fronts) { > + let frontRequests = front._requests.map(({ deferred }) => deferred.promise); If feel wrong to access to _requests from here. Would be better to expose an helper. May be not _requests itself, but a waitForRequestsToSettle on it?

Attachment #8796332 - Flags: review?(poirot.alex) → review+

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 13

•

8 years ago

(In reply to Brian Grinstead [:bgrins] from comment #5) > Comment on attachment 8796332 [details] > Bug 1291815 - Wait for requests to settle between DAMP subtests. > > https://reviewboard.mozilla.org/r/82220/#review80840 > > Just a quick inline note - haven't reviewed the actual settling code yet. > > ::: testing/talos/talos/tests/devtools/addon/content/damp.js:281 > (Diff revision 1) > > let subtests = { > > webconsoleOpen: Task.async(function*() { > > yield this.testSetup(url); > > yield openToolboxAndLog(label + ".webconsole", "webconsole"); > > yield reloadPageAndLog(label + ".webconsole"); > > + yield this.waitForRequestsToSettle(); > > I'd prefer this logic is folded into the this.closeToolbox helper (before > startRecordTimestamp is set) since I believe that will cover all individual > usages here. > > We could also get rid of the waitForRequestsToSettle function there since we > already have a handle on `target` Okay, I've folded it into to closeToolbox as suggested.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 14

•

8 years ago

mozreview-review-reply

Comment on attachment 8796332 [details] Bug 1291815 - Wait for requests to settle between DAMP subtests. https://reviewboard.mozilla.org/r/82220/#review82182 > let requests = {...this._pendingRequests.keys(), ...this.activeRequest.keys()]; > \o/ JS from 2016! This doesn't seem quite right, at least for the the `_pendingRequests` case. Each value is itself an entire array. I am currently depending on concat's behavior of extracting the elements when passed an array. I'll add a note about the types for each of these. > I imagine if we have a tree, that should happen, but I don't know protocol.js very well to trust it. > Shouldn't we check if pool is already the set and continue if that's the case? Are you concerned it's actually a graph with cycles and not a tree, so we could return to a pool we've already visited? I don't believe that's possible right now... if it were, I guess I would expect the algorithm to loop forever, but that doesn't happen in practice. `poolChildren` does filter out the one case I am aware of, where something can be "self-owned" so thing A appears in a pool owned by A. > If feel wrong to access to _requests from here. Would be better to expose an helper. May be not _requests itself, but a waitForRequestsToSettle on it? Okay, I added methods to the front to avoid exposing internals.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 15

•

8 years ago

Here's a new Talos run after resolving review comments: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=741f076751d159e2d29dd75772b415a371de3644&newProject=try&newRevision=1d19a311d2fe6d293def36c4350bb43238607c29&framework=1&filter=damp&showOnlyImportant=0

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 16

•

8 years ago

Okay, this definitely does hang when e10s is disabled, need to investigate further.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 17

•

8 years ago

Analysis is currently blocked by bug 1309294 in Talos log processing.

Depends on: 1309294

William Lachance (:wlach)

Comment 18

•

8 years ago

(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #17) > Analysis is currently blocked by bug 1309294 in Talos log processing. Fixed now.

Brian Grinstead [:bgrins]

Updated

•

8 years ago

Blocks: enable-new-console

Brian Grinstead [:bgrins]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1317312

Brian Grinstead [:bgrins]

Comment 19

•

8 years ago

New talos push: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=c04a4f991dbb3441a6f82b30c1f3e83349f43d84

Brian Grinstead [:bgrins]

Comment 20

•

8 years ago

Still seeing timeouts on non-e10s: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c04a4f991dbb3441a6f82b30c1f3e83349f43d84 https://archive.mozilla.org/pub/firefox/try-builds/bgrinstead@mozilla.com-c04a4f991dbb3441a6f82b30c1f3e83349f43d84/try-linux64/try_ubuntu64_hw_test-g2-bm103-tests1-linux-build2657.txt.gz

Brian Grinstead [:bgrins]

Comment 21

•

8 years ago

Running locally with ./mach talos-test --disable-e10s -a damp I'm seeing it not proceed past the inspector open on the simple page with an error in the browser console: Protocol error (noSuchActor): No such actor for ID: server1.conn1.child1/domnode33 inspector.js:188:7 Inspector.prototype._handleRejectionIfNotDestroyed chrome://devtools/content/inspector/inspector.js:188:7 Handler.prototype.process resource://gre/modules/Promise-backend.js:935:21 this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7 this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11 this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7 this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7 Front<.onPacket/< resource://devtools/shared/protocol.js:1295:9 (Async: DevTools RDP) Front<.request resource://devtools/shared/protocol.js:1239:14 generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14 Inspector.prototype.onNewSelection chrome://devtools/content/inspector/inspector.js:782:7 emit resource://devtools/shared/event-emitter.js:191:13 Selection.prototype.setNodeFront resource://devtools/client/framework/selection.js:127:5 Inspector.prototype._deferredOpen/< chrome://devtools/content/inspector/inspector.js:264:9 once/handler resource://devtools/shared/event-emitter.js:135:11 emit resource://devtools/shared/event-emitter.js:191:13 Inspector.prototype._onMarkupFrameLoad chrome://devtools/content/inspector/inspector.js:1338:5 Inspector.prototype._initMarkup chrome://devtools/content/inspector/inspector.js:1320:5 Inspector.prototype._deferredOpen chrome://devtools/content/inspector/inspector.js:256:5 Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:137:18 TaskImpl.prototype._run resource://devtools/shared/task.js:311:39 Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23 this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7 this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11 this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7 this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7 Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9 (Async: DevTools RDP) Front<.request resource://devtools/shared/protocol.js:1239:14 generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14 WalkerFront<.querySelector< resource://devtools/shared/fronts/inspector.js:621:12 Inspector.prototype._getDefaultNodeForSelection/< chrome://devtools/content/inspector/inspector.js:330:14 Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23 this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7 this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11 this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7 this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7 Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9 (Async: DevTools RDP) Front<.request resource://devtools/shared/protocol.js:1239:14 generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14 InspectorFront<.getPageStyle< resource://devtools/shared/fronts/inspector.js:976:12 Inspector.prototype._getPageStyle chrome://devtools/content/inspector/inspector.js:290:12 Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:130:11 TaskImpl.prototype._run resource://devtools/shared/task.js:311:39 Handler.prototype.process resource://gre/modules/Promise-backend.js:932:23 this.PromiseWalker.walkerLoop resource://gre/modules/Promise-backend.js:813:7 this.PromiseWalker.scheduleWalkerLoop resource://gre/modules/Promise-backend.js:744:11 this.PromiseWalker.schedulePromise resource://gre/modules/Promise-backend.js:776:7 this.PromiseWalker.completePromise resource://gre/modules/Promise-backend.js:711:7 Front<.onPacket/< resource://devtools/shared/protocol.js:1297:9 (Async: DevTools RDP) Front<.request resource://devtools/shared/protocol.js:1239:14 generateRequestMethods/</frontProto[name] resource://devtools/shared/protocol.js:1377:14 initCssProperties< resource://devtools/shared/fronts/css-properties.js:208:28 TaskImpl.prototype._run resource://devtools/shared/task.js:311:39 TaskImpl resource://devtools/shared/task.js:273:3 createAsyncFunction/asyncFunction resource://devtools/shared/task.js:247:14 Inspector.prototype.init< chrome://devtools/content/inspector/inspector.js:127:33 TaskImpl.prototype._run resource://devtools/shared/task.js:311:39 TaskImpl resource://devtools/shared/task.js:273:3 createAsyncFunction/asyncFunction resource://devtools/shared/task.js:247:14 open resource://devtools/client/inspector/panel.js:12:12 Toolbox.prototype.loadTool/onLoad resource://devtools/client/framework/toolbox.js:1338:19

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 22

•

8 years ago

Looks like I can reproduce the issue with e10s off in my local branch after rebasing. I'll try to investigate it now.

Comment hidden (mozreview-request)

Comment hidden (mozreview-request)

Brian Grinstead [:bgrins]

Comment 25

•

8 years ago

The interdiff is hard to read since it includes rebased code + other review comments. What part was needed to fix the non-e10s hang?

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 26

•

8 years ago

Okay, I think e10s off should be working now. Root cause was I needed to handle requests that are rejected during toolbox close as well in my settling path. Just started a new run, let's see what the results look like as they come in... Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ab3ed7d7b601666f76f8814aba8013f84aba538 New: https://treeherder.mozilla.org/#/jobs?repo=try&revision=13ffd2f593939c2ca6bc315b30a6ac35017ba781 Talos comparison: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2ab3ed7d7b601666f76f8814aba8013f84aba538&newProject=try&newRevision=13ffd2f593939c2ca6bc315b30a6ac35017ba781&framework=1&filter=damp&showOnlyImportant=0

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 27

•

8 years ago

(In reply to Brian Grinstead [:bgrins] from comment #25) > The interdiff is hard to read since it includes rebased code + other review > comments. What part was needed to fix the non-e10s hang? Yeah, rebasing is tricky for MozReview! Here are the core changes as separate patches before I squashed them: https://github.com/jryans/gecko-dev/commit/bf094537135a1659d9c9f60d59f348d19696bd04 https://github.com/jryans/gecko-dev/commit/c904e1f3949f4db8f9ab76231d703886f470b001

Brian Grinstead [:bgrins]

Comment 28

•

8 years ago

Looks like on average the suite speeds up around 5%. Let's accept that as the 'new normal' since it should be more accurate and will hopefully limit noise between subtests

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Comment 29

•

8 years ago

Yep, the results seem as expected overall: the close numbers reduce in many cases, especially Net Monitor, since we're waiting for tool traffic to settle before timing close. A few numbers grew slightly, but that should more accurately reflect the work they are doing.

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Assignee

Updated

•

8 years ago

Priority: P3 → P1

Comment 30

•

8 years ago

Pushed by jryans@gmail.com: https://hg.mozilla.org/integration/autoland/rev/25cf55049850 Wait for requests to settle between DAMP subtests. r=ochameau

Brian Grinstead [:bgrins]

Updated

•

8 years ago

Blocks: 1308216

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 31

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/25cf55049850

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox53: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 53

Updated

•

6 years ago

Product: Firefox → DevTools

You need to log in before you can comment on or make changes to this bug.