Closed
Bug 1348097
Opened 8 years ago
Closed 8 years ago
Intermittent browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource://app/modules/experiments/Experiments.jsm:840 - Error: experiments are disabled
Categories
(Toolkit :: Telemetry, defect)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: standard8)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
Updated•8 years ago
|
Priority: -- → P3
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 2•8 years ago
|
||
Although this is reporting a test failure against browser/components/search/test/browser_yahoo_behavior.js, I think this is actually an issue in the Experiments code:
20:34:12 INFO - TEST-UNEXPECTED-FAIL | browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource://app/modules/experiments/Experiments.jsm:840 - Error: experiments are disabled
20:34:12 INFO - Stack trace:
20:34:12 INFO - updateManifest@resource://app/modules/experiments/Experiments.jsm:840:29
20:34:12 INFO - notify@resource://app/components/ExperimentsService.js:66:7
20:34:12 INFO - TM_notify/<@resource://gre/components/nsUpdateTimerManager.js:218:11
20:34:12 INFO - TM_notify@resource://gre/components/nsUpdateTimerManager.js:263:7
20:34:12 INFO - GECKO(2802) | Completed ShutdownLeaks collections in process 2802
AFAICT we're not touching anything relating to enabling/disabling experiments in browser_yahoo_behavior.js. This implies another test has triggered the timer in ExperimentsService.js, but it hasn't been disabled properly.
There's several things that I think are wrong here:
1) `ExperimentsService:notify` calls `instance.updateManifest()`. However, the function is returning a promise. So this should possibly have a catch on it, or some other handling if required.
2) `gExperimentsEnabled` in ExperimentsService.js is defined as (some constants replaced for clarity):
```
const IS_UNIFIED_TELEMETRY = Preferences.get("toolkit.telemetry.unified", false);
return gPrefs.get("experiments.enabled", false) &&
IS_UNIFIED_TELEMETRY && gPrefs.get(PREF_TELEMETRY_ENABLED, false);
```
In Experiments.jsm it is:
```
gExperimentsEnabled = gPrefs.get("experiments.enabled", false) && TelemetryUtils.isTelemetryEnabled;
```
So `gExperimentsEnabled` is based on two different values.
3) Experiments.jsm watches the telemetry preference status, but ExperimentsService.jsm doesn't. This will mean that the Experiments.jsm version will get an update, whereas the ExperimentsService.jsm might not (it depends on exactly how the lazy getter works).
My theory then, is that the timer within ExperimentsService.js has been triggered, and when it activates it calls `instance.updateManifest()` in Experiments.jsm, but that one has been updated to think that the experiments service is not enabled, and hence the promise blows up.
Component: Search → Telemetry
Product: Firefox → Toolkit
Assignee | ||
Updated•8 years ago
|
Priority: P3 → --
Comment 3•8 years ago
|
||
in working on migrating the tests to run on Ubuntu 16.04 (instead of 12.04) we hit this root cause failure often:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4fb0bf102f76c3458b477cabc63a8820960e4f4&filter-searchStr=linux%20bc5%20e10s%20debug
is there a recommended fix for this or some general direction of what we should be doing to fix this?
Blocks: 1319863
Flags: needinfo?(standard8)
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #3)
> in working on migrating the tests to run on Ubuntu 16.04 (instead of 12.04)
> we hit this root cause failure often:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c4fb0bf102f76c3458b477cabc63a8820960e4f4&filter-
> searchStr=linux%20bc5%20e10s%20debug
That's interesting, especially as it fails with the same errors but in different tests.
> is there a recommended fix for this or some general direction of what we
> should be doing to fix this?
I've taken a bit more of a look. The more I get into this, the more I think that there is something to do with timing going on in Experiments/ExperimentsService.jsm. I'm not quite sure what yet though.
browser_oneOffContextMenu.js does stuff which triggers some Experiments activity, it then turns it off again later. On my opt setup, I'm not able to repeat this, but that's probably unsurprising, as this seems to be timing related.
There's a few timers about, and my suspicion is that one of them is somehow triggered at just the wrong time to upset the tests. My guess would be that the Ubuntu 16.04 builders are slightly slower and therefore triggering this more often.
With all the debug I've turned on and added, I've now pushed a try build. Hopefully that'll show up an explanation as to what is happening:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=db233cd760c9e2916901e35564acb358169d0034
I'll take another look in the morning.
Comment 5•8 years ago
|
||
I see a failure in bc3:
https://public-artifacts.taskcluster.net/SMemd-6KSXqza_Cl_7ZBvw/0/public/logs/live_backing.log
and I see:
[task 2017-04-18T22:57:06.022006Z] 22:57:06 INFO - GECKO(1531) | --DOMWINDOW == 13 (0xc9111400) [pid = 1531] [serial = 86] [outer = (nil)] [url = about:blank]
[task 2017-04-18T22:57:06.326319Z] 22:57:06 INFO - GECKO(1531) | 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest()
[task 2017-04-18T22:57:12.295604Z] 22:57:12 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1664
[task 2017-04-18T22:57:12.701353Z] 22:57:12 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1618
[task 2017-04-18T22:57:13.385353Z] 22:57:13 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1642
[task 2017-04-18T22:57:14.031587Z] 22:57:14 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1584
[task 2017-04-18T22:57:17.421945Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 12 (0xe842cc00) [pid = 1531] [serial = 76] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/opensearch.html]
[task 2017-04-18T22:57:17.424246Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 11 (0xc8d33400) [pid = 1531] [serial = 79] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/opensearch.html]
[task 2017-04-18T22:57:17.426257Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 10 (0xc928fc00) [pid = 1531] [serial = 82] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/tooManyEnginesOffered.html]
[task 2017-04-18T22:57:22.309864Z] 22:57:22 INFO - Console message: [JavaScript Error: "notify CALLLED123" {file: "jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js" line: 58}]
[task 2017-04-18T22:57:22.312343Z] 22:57:22 INFO - notify@jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js:58:5
[task 2017-04-18T22:57:22.313753Z] 22:57:22 INFO - TM_notify/<@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:218:11
[task 2017-04-18T22:57:22.315424Z] 22:57:22 INFO - TM_notify@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:263:7
[task 2017-04-18T22:57:22.315643Z] 22:57:22 INFO -
[task 2017-04-18T22:57:22.317015Z] 22:57:22 INFO - Console message: 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest()
[task 2017-04-18T22:57:22.374244Z] 22:57:22 INFO - TEST-INFO | started process screentopng
[task 2017-04-18T22:57:22.456099Z] 22:57:22 INFO - Failed to start /home/worker/workspace/build/tests/bin/screentopng for screenshot: No such file or directory
[task 2017-04-18T22:57:22.461320Z] 22:57:22 INFO - TEST-UNEXPECTED-FAIL | browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource:///modules/experiments/Experiments.jsm:843 - Error: experiments are disabled
[task 2017-04-18T22:57:22.461635Z] 22:57:22 INFO - Stack trace:
[task 2017-04-18T22:57:22.464118Z] 22:57:22 INFO - updateManifest@resource:///modules/experiments/Experiments.jsm:843:29
[task 2017-04-18T22:57:22.465671Z] 22:57:22 INFO - notify@jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js:67:7
[task 2017-04-18T22:57:22.467080Z] 22:57:22 INFO - TM_notify/<@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:218:11
[task 2017-04-18T22:57:22.468522Z] 22:57:22 INFO - TM_notify@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:263:7
[task 2017-04-18T22:57:31.839255Z] 22:57:31 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1531
[task 2017-04-18T22:57:31.839664Z] 22:57:31 INFO - TEST-START | Shutdown
what looks interesting is:
[task 2017-04-18T22:57:06.326319Z] 22:57:06 INFO - GECKO(1531) | 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest()
...
[task 2017-04-18T22:57:22.309864Z] 22:57:22 INFO - Console message: [JavaScript Error: "notify CALLLED123" {file: "jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js" line: 58}]
I assume that is related to this call:
https://dxr.mozilla.org/mozilla-central/source/browser/experiments/ExperimentsService.js?q=path%3AExperimentsService.js&redirect_type=single#58
I am not really sure if that helps at all, but thought I would take a look.
Assignee | ||
Comment 6•8 years ago
|
||
So, from the try builds, and today's investigations so far:
- ExperimentsService.notify() is being triggered because it is an update-timer defined in Experiments.manifest:
https://dxr.mozilla.org/mozilla-central/rev/722fdbff1efc308a22060e75b603311d23541bb5/browser/experiments/Experiments.manifest#3
The update-timers are triggered with one timer every 30 seconds, there are a few of them, so this could indeed be firing at any time (probably worse on slow builds, e.g. the debug ones).
Now comes the interesting bit:
- When the browser/components/search tests start, both ExperimentsService and Experiments see the various preferences as being true, i.e. they are enabled. In particular, "toolkit.telemetry.enabled" is true.
- browser_oneOffContextMenu.js changes "toolkit.telemetry.enabled" to false
=> Experiments picks this up, and changes its gExperimentsEnabled state to false. ExperimentsService stays as true, as it doesn't look for pref changes (see comment 2).
- browser_oneOffContextMenu.js then changes "toolkit.telemetry.enabled" to true
=> Experiments picks this up, but doesn't change its gExperimentsEnabled state - _telemetryStatusChanged calls _toggleExperimentsEnabled with the current value of gExperimentsEnabled (false), and then passes that enabled value to:
gExperimentsEnabled = enabled && TelemetryUtils.isTelemetryEnabled;
which is always going to return false...
- At some time later the update-timer kicks in, and triggers ExperimentsService.
=> ExperimentsService has its original recorded value of gExperimentsEnabled = true, so fires Experiments.updateManifest() which rejects a promise due to Experiments thinking that it is disabled, and bang: test fail.
I can reproduce this (5/5) on an opt build on my local mac by editing nsUpdateTimerManager.js and in TimerManager.observe forcing `this._timerMinimumDelay` and `firstInterval` to be `5000`.
These intervals are normally 30s, so I'd guess these tests are taking longer on the newer ubuntu machines.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → standard8
Flags: needinfo?(standard8)
Comment hidden (mozreview-request) |
Updated•8 years ago
|
Attachment #8859535 -
Flags: review?(alessio.placitelli) → review?(gfritzsche)
Assignee | ||
Comment 8•8 years ago
|
||
Comment on attachment 8859535 [details]
Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync.
I'm currently investigating the issues raised by try server.
Attachment #8859535 -
Flags: review?(gfritzsche)
Comment hidden (mozreview-request) |
Comment 10•8 years ago
|
||
mozreview-review |
Comment on attachment 8859535 [details]
Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync.
https://reviewboard.mozilla.org/r/131538/#review134444
Thanks for figuring this out!
::: browser/experiments/Experiments.jsm:124
(Diff revision 2)
> function configureLogging() {
> if (!gLogger) {
> gLogger = Log.repository.getLogger("Browser.Experiments");
> gLogger.addAppender(new Log.ConsoleAppender(new Log.BasicFormatter()));
> }
> - gLogger.level = gPrefs.get(PREF_LOGGING_LEVEL, Log.Level.Warn);
> + gLogger.level = gPrefs.get(PREF_LOGGING_LEVEL, Log.Level.Trace);
Left-over? Please undo this.
::: browser/experiments/ExperimentsService.js:53
(Diff revision 2)
>
> + get _experimentsEnabled() {
> + // We can enable experiments if either unified Telemetry or FHR is on, and the user
> + // has opted into Telemetry.
> + return gPrefs.get(PREF_EXPERIMENTS_ENABLED, false) &&
> + IS_UNIFIED_TELEMETRY && gPrefs.get(PREF_TELEMETRY_ENABLED, false);
Please use `TelemetryUtils.isTelemetryEnabled()`.
This handles an actual type issue we have seen before.
Attachment #8859535 -
Flags: review?(gfritzsche)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 12•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8859535 [details]
Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync.
https://reviewboard.mozilla.org/r/131538/#review134444
> Please use `TelemetryUtils.isTelemetryEnabled()`.
> This handles an actual type issue we have seen before.
Thanks, I've changed it. I also dropped the IS_UNIFIED_TELEMETRY check so that it was consistent with Experiments.jsm
Comment 13•8 years ago
|
||
mozreview-review |
Comment on attachment 8859535 [details]
Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync.
https://reviewboard.mozilla.org/r/131538/#review134950
Thanks for getting this fixed!
Attachment #8859535 -
Flags: review?(gfritzsche) → review+
Comment 14•8 years ago
|
||
Pushed by mbanner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e1c9c6e89f50
Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. r=gfritzsche
This caused failures like https://treeherder.mozilla.org/logviewer.html#?job_id=93077770&repo=autoland to start up on at least linux asan builds
Backed out in https://hg.mozilla.org/integration/autoland/rev/bc65b6aff361
Flags: needinfo?(standard8)
Assignee | ||
Comment 16•8 years ago
|
||
I knew I should have done the experiments.enabled setting for the whole of that test directory :-(
I'll get that fixed.
Flags: needinfo?(standard8)
Comment hidden (mozreview-request) |
Comment 18•8 years ago
|
||
Pushed by mbanner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b2c250009e4b
Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. r=gfritzsche
Comment 19•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in
before you can comment on or make changes to this bug.
Description
•