Closed Bug 929301 Opened 11 years ago Closed 7 years ago

Intermittent browser_CTP_crashreporting.js | Should have a click-to-play notification, | uncaught exception, | Test timed out,

Categories

(Core Graveyard :: Plug-ins, defect, P3)

x86_64
macOS
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Made it quite a bit longer before the first failure than it did the first time it landed, anyway.

https://tbpl.mozilla.org/php/getParsedLog.php?id=29465069&tree=Mozilla-Inbound
Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound opt test mochitest-browser-chrome on 2013-10-21 17:35:44 PDT for push f5dd1dc2ae18
slave: talos-r4-snow-037

17:37:53     INFO -  TEST-START | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js
17:37:53  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Should have a click-to-play notification
17:37:53     INFO -  Stack trace:
17:37:53     INFO -      JS frame :: chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js :: afterBindingAttached :: line 58
17:37:53     INFO -      JS frame :: chrome://mochikit/content/browser-test.js :: <TOP_LEVEL> :: line 654
17:37:53     INFO -      native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
17:37:53     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Plugin should not be activated
17:37:53  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | uncaught exception - TypeError: popupNotification is null at chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js:65
17:37:53     INFO -  Stack trace:
17:37:53     INFO -      JS frame :: chrome://mochikit/content/tests/SimpleTest/SimpleTest.js :: simpletestOnerror :: line 1159
17:37:53     INFO -      native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
17:37:53     INFO -  JavaScript error: chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js, line 65: popupNotification is null
17:37:53     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Console message: [JavaScript Error: "TypeError: popupNotification is null" {file: "chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js" line: 65}]
17:38:23     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Longer timeout required, waiting longer...  Remaining timeouts: 1
17:38:52     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Console message: 1382402332220	Services.HealthReport.HealthReporter	WARN	Saved state file does not exist.
17:38:52     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Console message: 1382402332220	Services.HealthReport.HealthReporter	WARN	No prefs data found.
(screenshot)
17:39:27  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | Test timed out
17:39:27     INFO -  INFO TEST-END | chrome://mochitests/content/browser/browser/base/content/test/general/browser_CTP_crashreporting.js | finished in 90049ms
This is a different issue, similar to other click-to-play tests, with the CTP notification sometimes not being present (yet?).
I'll have to figure something out about these when i'm back.
Can somebody help me to investigate this issue, as this is currently blocking an optimization in the JS engine.

I can reproduce this issue on Try (only) with a tiny patch from Bug 1242462, which should not change much of the JS engine behaviour.

So far it seems that we switch the state to "submitting", but never switch it to "success", as the log[2] of some try pushes[1] are highlighting:

To compare I made another try push[4] without Bug 1242462's patch, and the log[3] is *almost* the same apart from a few parent messages which appear in different orders.

I really don't know much about this part of the code, and lack ideas on where and how to instrument the code to get a better grasp of what is not going well.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=11fc7185f703
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=16348054&repo=try
[3] http://archive.mozilla.org/pub/firefox/try-builds/npierron@mozilla.com-1edecc69abbb3ce3b8ed4cf2f1126f1240fe0017/try-linux64-debug/try_ubuntu64_vm-debug_test-mochitest-e10s-browser-chrome-5-bm116-tests1-linux64-build204.txt.gz
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1edecc69abbb
Flags: needinfo?(gfritzsche)
Attached patch CTP_crashreport.log.diff (deleted) — Splinter Review
This is the diff (--- success +++ failure) of the logs where the pointers are anonymized with the id/index of the DOMWINDOW/DOCSHELL/pid.

One thing that we can notice is that the Warning messages from the parent process have different locations, but I do not know if these are relevant.
Mike, as you have changed the code base since Georg worked on it, would you have ideas on how to instrument the code base better to figure out the plausible race which is causing this issue? (see comment 189 and comment 190)
Flags: needinfo?(mconley)
Looks like the XHR either isn't getting sent properly, received properly by crashreport.sjs, or the response isn't going out properly.

More logging: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0984e073ddc5
I made a push without Bug 1242462's patch.
Apparently on x86, the instrumentation alone now seems to be enough to trigger this issue. ( The good kind of Heisenbugs :) )

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9147969c258c
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #193)
> Looks like the XHR either isn't getting sent properly, received properly by
> crashreport.sjs, or the response isn't going out properly.

Looking at the differences between the logs, it seems that the time difference between (1) and (2)

(1) @@ NPAPIPluginCrashReportSubmitted: status = submitting
    […]
    @@ crashreport.sjs: Handling POST request

(2) @@ crashreport.sjs: Received: […]
    @@ submitForm XHR state listener saw state: 2 -- status: 200

varies between 3s to ~14s, while we wait 5s on Step 10 (50 x 100ms) to let the status flag switch from "submitting" to "success".

It sounds that some other task might be scheduled between (1) and (2), which takes time, and depends on the way the scheduler schedule the threads.

Thus, the next questions, are:
 - What are the tasks which are being scheduled?
 - How much time does each task takes?
Worked with nbp over IRC with this. Looks like crashreport.sjs / httpd.js aren't working as quickly as they should with his patch, so the test is bailing out before they can do their work.
Flags: needinfo?(mconley)
Thanks Mike, concerning Bug 1242462, the problem seems to be related to a combination of recent regression in the execution of for-of loops, with this patch, which emphasizes the regression impact. (Bug 1242462 comment 19).

Still, I have no idea how this would explains what I observed in comment 194, where the fact of adding more dump / console.log fucntions *without* Bug 1242462's patch causes the same timeout to appear on x86 slaves.
Flags: needinfo?(gfritzsche)
Priority: -- → P3
no failures in the last year
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: