Closed
Bug 1338522
Opened 8 years ago
Closed 8 years ago
Add telemetry for UI responsiveness during migration of profile data from another browser
Categories
(Firefox :: Migration, defect)
Firefox
Migration
Tracking
()
RESOLVED
FIXED
Firefox 54
People
(Reporter: dao, Assigned: dao)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 5 obsolete files)
(deleted),
text/x-review-board-request
|
Gijs
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
(deleted),
patch
|
jcristau
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•8 years ago
|
||
Some notes:
- I've tested ResponsivenessMonitor.jsm, but the code in AutoMigrate.jsm is entirely untested at the moment, so only asking for feedback.
- This doesn't break down by bookmark/history/password import since AutoMigrate.jsm doesn't seem to know enough to make that distinction. Could probably move this to MigrationUtils.jsm instead, although perhaps we can just correlate the data with that from the FX_MIGRATION_*_IMPORT_MS probes?
- performance.now doesn't seem to work in JSMs, so I just used Date.now. Shouldn't matter for our purpose.
Attachment #8836009 -
Flags: feedback?(gijskruitbosch+bugs)
Comment 2•8 years ago
|
||
Comment on attachment 8836009 [details] [diff] [review]
patch
Review of attachment 8836009 [details] [diff] [review]:
-----------------------------------------------------------------
I don't know that this makes sense. It's basically a busy loop every 16ms. It'll make performance worse. I don't know how the add-on responsiveness stuff works, but maybe that can help? Yoric would know more.
::: toolkit/modules/ResponsivenessMonitor.jsm
@@ +31,5 @@
> + return;
> + }
> +
> + let now = Date.now();
> + this._accumulatedDelay += (now - this._prevTimestamp) - this.intervalMS;
we should ensure we don't decrease the delay if the timer fires early by our measurement.
Attachment #8836009 -
Flags: feedback?(gijskruitbosch+bugs)
Assignee | ||
Comment 3•8 years ago
|
||
(In reply to :Gijs from comment #2)
> I don't know that this makes sense. It's basically a busy loop every 16ms.
> It'll make performance worse.
It's not a loop and not really busy since the callback doesn't do anything remotely expensive. I don't see how this would make performance worse in a meaningful way, but I can change the interval to 100ms to ease that concern.
> we should ensure we don't decrease the delay if the timer fires early by our
> measurement.
We're interested in multi-second hangs here, so a timer firing a few milliseconds early wouldn't matter in the grand scheme of things, but I can handle that case for sanity.
Assignee | ||
Comment 4•8 years ago
|
||
Attachment #8836009 -
Attachment is obsolete: true
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #1)
> - This doesn't break down by bookmark/history/password import since
> AutoMigrate.jsm doesn't seem to know enough to make that distinction. Could
> probably move this to MigrationUtils.jsm instead, although perhaps we can
> just correlate the data with that from the FX_MIGRATION_*_IMPORT_MS probes?
Do you have thoughts on this?
Flags: needinfo?(gijskruitbosch+bugs)
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #3)
> (In reply to :Gijs from comment #2)
> > I don't know that this makes sense. It's basically a busy loop every 16ms.
> > It'll make performance worse.
>
> [...] I can change the interval to 100ms to ease that concern.
In fact a one second interval should still get us the data we need, given the severity of the problem that we're trying to identify.
Attachment #8836343 -
Attachment is obsolete: true
Comment 7•8 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #6)
> Created attachment 8836371 [details] [diff] [review]
> patch v3
>
> (In reply to Dão Gottwald [:dao] from comment #3)
> > (In reply to :Gijs from comment #2)
> > > I don't know that this makes sense. It's basically a busy loop every 16ms.
> > > It'll make performance worse.
> >
> > [...] I can change the interval to 100ms to ease that concern.
>
> In fact a one second interval should still get us the data we need, given
> the severity of the problem that we're trying to identify.
I'm not sure that's true... we don't know anything about this hanging since we can't reproduce it, but the videos don't show the typical "not responding" stuff from Windows (with a native title-bar being drawn despite our asking Windows not to - or has Win10 stopped doing that?) so I'm not sure exactly what's happening. Really, to be sure whether whatever reporting we add is effective we'd need to be able to reproduce, or just land it and then know based on whether we get results back that show non-zero-hanginess.
(In reply to Dão Gottwald [:dao] from comment #5)
> (In reply to Dão Gottwald [:dao] from comment #1)
> > - This doesn't break down by bookmark/history/password import since
> > AutoMigrate.jsm doesn't seem to know enough to make that distinction. Could
> > probably move this to MigrationUtils.jsm instead, although perhaps we can
> > just correlate the data with that from the FX_MIGRATION_*_IMPORT_MS probes?
>
> Do you have thoughts on this?
Correlating telemetry after collection is almost impossible, IME. I would add 3 probes similar to the _IMPORT_MS ones and collect hangs for each item. This has the added bonus that we'll gather data for the non-automigration case, too.
Flags: needinfo?(gijskruitbosch+bugs)
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to :Gijs from comment #7)
> > In fact a one second interval should still get us the data we need, given
> > the severity of the problem that we're trying to identify.
>
> I'm not sure that's true... we don't know anything about this hanging since
> we can't reproduce it, but the videos don't show the typical "not
> responding" stuff from Windows (with a native title-bar being drawn despite
> our asking Windows not to - or has Win10 stopped doing that?) so I'm not
> sure exactly what's happening.
The videos seem to show Firefox being super slow but processing events every now and then rather than being completely hung, so this would explain why Windows doesn't consider Firefox hung, but a timer with a one second interval should still detect that case.
> Really, to be sure whether whatever reporting
> we add is effective we'd need to be able to reproduce, or just land it and
> then know based on whether we get results back that show non-zero-hanginess.
Right.
> Correlating telemetry after collection is almost impossible, IME. I would
> add 3 probes similar to the _IMPORT_MS ones and collect hangs for each item.
> This has the added bonus that we'll gather data for the non-automigration
> case, too.
Okay.
Assignee | ||
Comment 9•8 years ago
|
||
Attachment #8836371 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8836384 -
Attachment description: responsivenessMonitor.diff → patch v4
Assignee | ||
Comment 10•8 years ago
|
||
Somehow getKeyedHistogramById refuses to find my histograms. Maybe you know or see what I did wrong... Probably something trivial or terribly stupid.
Attachment #8836384 -
Attachment is obsolete: true
Attachment #8836449 -
Flags: feedback?(gijskruitbosch+bugs)
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #10)
> Created attachment 8836449 [details] [diff] [review]
> patch v5
This is on top of bug 1338812's patch which has landed but isn't on mozilla-central yet.
Comment 12•8 years ago
|
||
Comment on attachment 8836449 [details] [diff] [review]
patch v5
Review of attachment 8836449 [details] [diff] [review]:
-----------------------------------------------------------------
The histograms are compiled into a cpp file, so you need a non-artifact build (so also not ./mach build faster even on a non-artifact build) of toolkit/components in order to get the changes. Yes, it's annoying. I don't know why telemetry works that way and why it doesn't read the definitions out of an external file (presumably an optimization of sorts).
Note that this will also need data-review from e.g. bsmedberg.
::: browser/components/migration/MigrationUtils.jsm
@@ +265,5 @@
> + let maybeStartResponsivenessMonitor = resourceType => {
> + let responsivenessMonitor;
> + let histogramId = getHistogramIdForResourceType(resourceType, "FX_MIGRATION_*_JANK_MS");
> + if (histogramId) {
> + responsivenessMonitor = new ResponsivenessMonitor;
Nit: add () to the new ...() call.
@@ +268,5 @@
> + if (histogramId) {
> + responsivenessMonitor = new ResponsivenessMonitor;
> + responsivenessMonitor.start();
> + }
> + return [responsivenessMonitor, histogramId];
We always deconstruct this, so this could return object shorthand ({responsivenessMonitor, histogramId}) that we deconstruct that way, avoiding callsites potentially mis-assigning these by changing the order, which can happen in the array case (and meaning you might not have to update callsites if we add/remove return values).
::: toolkit/modules/ResponsivenessMonitor.jsm
@@ +26,5 @@
> + this._timer.initWithCallback(this, this.intervalMS, Ci.nsITimer.TYPE_REPEATING_SLACK);
> + },
> +
> + notify(timer) {
> + if (timer != this._timer) {
Can this actually happen? Looks to me like `this.abort()` would kill a pre-existing timer even if we call start() multiple times on the same object. I guess in theory the timer's cancel() call could race with the notification having already been queued on our thread that the timer has 'hit', or something?
Perhaps it makes more sense to move the start() code into the constructor instead? That would make it impossible to reuse the object, which might be easier to understand.
Attachment #8836449 -
Flags: feedback?(gijskruitbosch+bugs) → feedback+
Assignee | ||
Updated•8 years ago
|
Attachment #8836449 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 15•8 years ago
|
||
mozreview-review |
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
https://reviewboard.mozilla.org/r/112090/#review113366
Attachment #8836743 -
Flags: review?(gijskruitbosch+bugs) → review+
Comment 16•8 years ago
|
||
FWIW, I can confirm that I can see a 7-10-fold difference (~1.5s vs. 14-30s) in the values of this histogram if I remove all the frecency notifications (which get posted from the sql update queries via sql functions to the sql thread and then to the main thread and then to various JS consumers) as well as various other callbacks that all post their own notification to the main thread from the sql thread (as runnables). So it seems to measure something useful, at least.
Assignee | ||
Updated•8 years ago
|
Attachment #8836743 -
Flags: review?(benjamin)
Assignee | ||
Updated•8 years ago
|
Attachment #8836743 -
Flags: feedback?(benjamin)
Assignee | ||
Comment 17•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
Also trying data collection peers who might be closer to my time zone.
Attachment #8836743 -
Flags: feedback?(a.m.naaktgeboren)
Assignee | ||
Updated•8 years ago
|
Attachment #8836743 -
Flags: feedback?(rweiss)
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
https://reviewboard.mozilla.org/r/112090/#review113782
::: toolkit/components/telemetry/Histograms.json:5116
(Diff revision 2)
> + "kind": "exponential",
> + "n_buckets": 20,
> + "high": 60000,
> + "releaseChannelCollection": "opt-out",
> + "keyed": true,
> + "description": "Accumulated timer delay as an indicator for decreased responsiveness while auto-importing bookmarks from another browser, keyed by the name of the browser."
Reading both this description and the related code, I still don't quite understand what this measures. You have a 1s timer and then measure the variance between when you got the timer and when you expected to get it? Doesn't that scale by the number of seconds the whole process takes? How is that valuable to you?
In addition, this whole system is using JS date, which is non-monotonic and therefore very vulnerable to clock skew, which is especially a problem for short-duration measurements such as this.
We have much more fine-grained event loop telemetry available in general which measures runnable delays, and it's probably very easy to subset that for specific tasks like this.
That's not a data-review-, exactly, but I do think you should at least improve the description because it's very confusing.
Comment 19•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
I'd like to see the description again.
Attachment #8836743 -
Flags: feedback?(rweiss)
Attachment #8836743 -
Flags: feedback?(benjamin)
Attachment #8836743 -
Flags: feedback?(a.m.naaktgeboren)
Attachment #8836743 -
Flags: feedback-
Assignee | ||
Comment 20•8 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #18)
> Comment on attachment 8836743 [details]
> Bug 1338522 - Add telemetry for responsiveness during auto migration of
> profile data from another browser.
>
> https://reviewboard.mozilla.org/r/112090/#review113782
>
> ::: toolkit/components/telemetry/Histograms.json:5116
> (Diff revision 2)
> > + "kind": "exponential",
> > + "n_buckets": 20,
> > + "high": 60000,
> > + "releaseChannelCollection": "opt-out",
> > + "keyed": true,
> > + "description": "Accumulated timer delay as an indicator for decreased responsiveness while auto-importing bookmarks from another browser, keyed by the name of the browser."
>
> Reading both this description and the related code, I still don't quite
> understand what this measures. You have a 1s timer and then measure the
> variance between when you got the timer and when you expected to get it?
Yes.
> Doesn't that scale by the number of seconds the whole process takes?
Yes.
> How is that valuable to you?
It exposes a problem regardless of how long the whole process takes. Most of the import is happening in a background thread and not expected to block the UI.
> In addition, this whole system is using JS date, which is non-monotonic and
> therefore very vulnerable to clock skew, which is especially a problem for
> short-duration measurements such as this.
'performance' doesn't seem to be available in JSMs and Components.utils.importGlobalProperties doesn't seem to help. The measurement is not that short-duration -- people have seen Firefox become unresponsive for more than a minute.
> We have much more fine-grained event loop telemetry available in general
> which measures runnable delays, and it's probably very easy to subset that
> for specific tasks like this.
Unfortunately I know nothing at all about that and we're on a tight schedule.
> That's not a data-review-, exactly, but I do think you should at least
> improve the description because it's very confusing.
I'm not sure where it's lacking since you seem to have figured out what it means, but I'll see what I can do.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 22•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
reviewboard
Attachment #8836743 -
Flags: feedback?(benjamin)
Assignee | ||
Updated•8 years ago
|
Blocks: 1249008
Summary: Add telemetry for responsiveness during auto migration of profile data from another browser → Add telemetry for responsiveness during migration of profile data from another browser
Assignee | ||
Comment 23•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
Redirecting to Rebecca / François as suggested by bsmedberg since he's on PTO.
Attachment #8836743 -
Flags: feedback?(rweiss)
Attachment #8836743 -
Flags: feedback?(francois)
Attachment #8836743 -
Flags: feedback?(benjamin)
Comment 24•8 years ago
|
||
mozreview-review |
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
https://reviewboard.mozilla.org/r/112090/#review115136
The new descriptions look much clearer to me. I would suggest two small tweaks.
Linking to the list of possible keys in the description of each probe:
keyed by the name of the browser (see gAvailableMigratorKeys in browser/components/migration/MigrationUtils.jsm)
and including units in the description:
Accumulated timer delay in ms
datareview+
Attachment #8836743 -
Flags: review+
Comment 25•8 years ago
|
||
A few quick questions:
1) Will this be added to opt-out telemetry on release? (Yes/No)
2) Will this be left on forever, or is it only bound to a specific version of Firefox? (yes/version(s))
3) How will you be collecting the resulting data and analyzing it? (t.m.o, s.t.m.o, a.t.m.o, don't know).
If the answer to (3) is don't know, we should probably discuss further. If the answer to (2) is "forever," who is responsible for monitoring this histogram?
Flags: needinfo?(dao+bmo)
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Rebecca Weiss from comment #25)
> A few quick questions:
>
> 1) Will this be added to opt-out telemetry on release? (Yes/No)
Yes
> 2) Will this be left on forever, or is it only bound to a specific version
> of Firefox? (yes/version(s))
57 is the version auto-migration might ship with. So I marked this as expiring in version 58, but ultimately I expect it will depend on the data we find. If at some point we consider the problem under control, I suppose we can remove this, although an argument could be made for keeping it longer. I don't currently plan on keeping this forever though.
> 3) How will you be collecting the resulting data and analyzing it? (t.m.o,
> s.t.m.o, a.t.m.o, don't know).
telemetry.mozilla.org is what I would start with, I expect it will give us good round-up insight.
Flags: needinfo?(dao+bmo)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 28•8 years ago
|
||
(In reply to François Marier [:francois] from comment #24)
> The new descriptions look much clearer to me. I would suggest two small
> tweaks.
Done. Thanks for the feedback.
I also changed the timer interval back to 100ms after the recent discussion in bug 1332225.
Comment 29•8 years ago
|
||
Pushed by dgottwald@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/23391448faad
Add telemetry for UI responsiveness during import of profile data from another browser. r=francois,Gijs
Assignee | ||
Updated•8 years ago
|
Summary: Add telemetry for responsiveness during migration of profile data from another browser → Add telemetry for UI responsiveness during migration of profile data from another browser
Comment 30•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Assignee | ||
Comment 31•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
Approval Request Comment
[Feature/Bug causing the regression]: bug 1332225 / bug 1332318
[User impact if declined]: /
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: /
[Is the change risky?]: no
[Why is the change risky/not risky?]: only adding telemetry, no behavior change
[String changes made/needed]: /
Attachment #8836743 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 32•8 years ago
|
||
Attachment #8839170 -
Flags: approval-mozilla-beta?
Comment 33•8 years ago
|
||
Is this critical enough that it really needs to get into 52 less than a week from RC?
Flags: needinfo?(dao+bmo)
Assignee | ||
Comment 34•8 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #33)
> Is this critical enough that it really needs to get into 52 less than a week
> from RC?
Yes, it's crucial for bug 1332318.
Flags: needinfo?(dao+bmo)
Updated•8 years ago
|
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Comment 35•8 years ago
|
||
Comment on attachment 8836743 [details]
Bug 1338522 - Add telemetry for UI responsiveness during import of profile data from another browser.
Collect telemetry data to investigate the hang issue. Aurora53+.
Attachment #8836743 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 36•8 years ago
|
||
Comment on attachment 8839170 [details] [diff] [review]
patch for beta
telemetry for migration code, beta52+
OK, let's get this in b9.
Attachment #8839170 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 37•8 years ago
|
||
Comment 38•8 years ago
|
||
bugherder uplift |
status-firefox-esr52:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•