Closed Bug 689256 Opened 13 years ago Closed 12 years ago

Intermittent TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js | process created before main is run, and other errors

Categories

(Firefox :: Session Restore, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 10

People

(Reporter: mbrubeck, Assigned: gsvelto)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 4 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=6554847&tree=Firefox Rev3 Fedora 12x64 mozilla-central opt test mochitest-other on 2011-09-26 11:13:18 PDT for push 44ef245b8706 builder: mozilla-central_fedora64_test-mochitest-other slave: talos-r3-fed64-036 starttime: 1317060798.19 results: warnings (1) buildid: 20110926073617 builduid: f8ea518cda884d3981afe49eb8c18f0b revision: 44ef245b8706 INFO TEST-END | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_514751.js | finished in 173ms TEST-INFO | checking window state TEST-INFO | unknown test url | before wait for focus -- loaded: complete active window: ([object ChromeWindow]) chrome://browser/content/browser.xul focused window: ([object Window]) about:blank desired window: ([object ChromeWindow]) chrome://browser/content/browser.xul child window: ([object Window]) about:blank docshell visible: true TEST-INFO | unknown test url | already focused TEST-INFO | unknown test url | maybe run tests <load:true, focus:true> -- loaded: complete active window: ([object ChromeWindow]) chrome://browser/content/browser.xul focused window: ([object Window]) about:blank desired window: ([object ChromeWindow]) chrome://browser/content/browser.xul child window: ([object Window]) about:blank docshell visible: true TEST-START | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js | process created before main is run ({process:(new Date(1317061148373)), createTopLevelWindow:(new Date(1317061149410)), firstPaint:(new Date(1317061152628)), sessionRestored:(new Date(1317061470046))}) Stack trace: JS frame :: chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js :: test :: line 4 JS frame :: chrome://mochikit/content/browser-test.js :: Tester_execTest :: line 277 JS frame :: chrome://mochikit/content/browser-test.js :: <TOP_LEVEL> :: line 227 native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js | main ran before first paint ({process:(new Date(1317061148373)), createTopLevelWindow:(new Date(1317061149410)), firstPaint:(new Date(1317061152628)), sessionRestored:(new Date(1317061470046))}) Stack trace: JS frame :: chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js :: test :: line 8 JS frame :: chrome://mochikit/content/browser-test.js :: Tester_execTest :: line 277 JS frame :: chrome://mochikit/content/browser-test.js :: <TOP_LEVEL> :: line 227 native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js | Session restored after main ({process:(new Date(1317061148373)), createTopLevelWindow:(new Date(1317061149410)), firstPaint:(new Date(1317061152628)), sessionRestored:(new Date(1317061470046))}) Stack trace: JS frame :: chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js :: test :: line 10 JS frame :: chrome://mochikit/content/browser-test.js :: Tester_execTest :: line 277 JS frame :: chrome://mochikit/content/browser-test.js :: <TOP_LEVEL> :: line 227 native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0 INFO TEST-END | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_522375.js | finished in 28ms Skipping 18902 lines... INFO TEST-END | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug618502.js | finished in 585ms TEST-INFO | checking window state TEST-INFO | unknown test url | before wait for focus -- loaded: complete active window: ([object ChromeWindow]) chrome://browser/content/browser.xul focused window: ([object Window]) about:blank desired window: ([object ChromeWindow]) chrome://browser/content/browser.xul child window: ([object Window]) about:blank docshell visible: true TEST-INFO | unknown test url | already focused TEST-INFO | unknown test url | maybe run tests <load:true, focus:true> -- loaded: complete active window: ([object ChromeWindow]) chrome://browser/content/browser.xul focused window: ([object Window]) about:blank desired window: ([object ChromeWindow]) chrome://browser/content/browser.xul child window: ([object Window]) about:blank docshell visible: true TEST-START | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js *** LOG addons.repository: Requesting http://127.0.0.1:8888/extensions-dummy/metadata?appOS=Linux&appVersion=9.0a1&tMain=%TIME_MAIN%&tFirstPaint=4255&tSessionRestored=354498 TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | appOS=Linux&appVersion=9.0a1&tMain=%TIME_MAIN%&tFirstPaint=4255&tSessionRestored=354498 TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | OS should be correct TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Version should be correct TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Should be a sensible tMain Stack trace: JS frame :: chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js :: observe :: line 45 JS frame :: resource://gre/modules/AddonRepository.jsm :: <TOP_LEVEL> :: line 1164 JS frame :: resource://gre/modules/AddonRepository.jsm :: <TOP_LEVEL> :: line 734 JS frame :: chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js :: test :: line 63 JS frame :: chrome://mochikit/content/browser-test.js :: Tester_execTest :: line 277 JS frame :: chrome://mochikit/content/browser-test.js :: <TOP_LEVEL> :: line 227 native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0 TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Should be a sensible tFirstPaint TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Should be a sensible tSessionRestored TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Console message: [JavaScript Error: "formatURL: Couldn't find value for key: TIME_MAIN" {file: "resource:///components/nsURLFormatter.js" line: 131}] TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Console message: LOG addons.repository: Requesting http://127.0.0.1:8888/extensions-dummy/metadata?appOS=Linux&appVersion=9.0a1&tMain=%TIME_MAIN%&tFirstPaint=4255&tSessionRestored=354498 TEST-PASS | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | Should have seen metadata request INFO TEST-END | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_bug623950.js | finished in 22ms
Taras, this seems pretty odd to have fail. It actually seems like TIME_MAIN is just never set (seeing as the AOM test fails too).
Blocks: 522375
(In reply to Paul O'Shannessy [:zpao] from comment #2) > Taras, this seems pretty odd to have fail. It actually seems like TIME_MAIN > is just never set (seeing as the AOM test fails too). What's AOM? This is bizarre, we are looking into it. Are these failures all 64bit linux? Not clear from later entries.
This could be a regression from stricter checking introduced by bug 670008 http://hg.mozilla.org/mozilla-central/rev/caf2dc068c23 This combined with clock-skew and/or some issue with using jiffies could cause main > process-creation
We can confirm above by adding a check for STARTUP_MEASUREMENT_ERRORS telemetry histogram to the test.
Attached patch relying on time sucks (deleted) — Splinter Review
Attachment #565385 - Flags: review?(mh+mozilla)
Instead of checking main value and discarding it if doesn't make sense, the patch leaves main in. This is OK because .main and .firstPaint assignment should always result in correct values(ie PR_Now doesn't fail on unix). This will not fix the orange, but will help us diagnose if the error is actually to due to the process timestamp or something funky is going on with the main timestamp.
Attachment #565385 - Flags: review?(mh+mozilla) → review+
Keywords: checkin-needed
Assignee: nobody → tglek
In my queue with a few other bits that are being sent to try first and then onto inbound :-) https://tbpl.mozilla.org/?tree=Try&rev=c1528769b893
Status: NEW → ASSIGNED
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [orange]
(In reply to Taras Glek from comment #7) > This will not fix the orange, but will help us diagnose Oops.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The recent surge in this was caused by the landing of bug 841651.
I'm having some trouble reproducing this on my machine. I'm using a 64-bit debug build (/w optimization turned on) on a Fedora 18 machine. I can get browser_addonrepository_performance.js to fail but it will bail out with a "Should be a sensible tFirstPaint" error message instead of "Should be a sensible tMain". Printing out tMain and tFirstPaint yields me a >0 value for tMain (which in the logs linked from this bug must have been set to something else for the test to fail) and the '%TIME_FIRST_PAINT%' string for tFirstPaint. On the other hand I was unable to reproduce the issue in browser_addonrepository_performance.js. Since these bugs seem to be timing-related is there any known way to trigger them in an easier way? I'll try loading my machine a lot in order to slow down threads with lower priority but I'm not sure if this is going to be enough to reproduce them consistently.
(In reply to Gabriele Svelto [:gsvelto] from comment #100) > I'm having some trouble reproducing this on my machine. I'm using a 64-bit > debug build (/w optimization turned on) on a Fedora 18 machine. Note that this is happening almost exclusively on 32bit builds. Might help in catching what's going on.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #102) > Note that this is happening almost exclusively on 32bit builds. Might help > in catching what's going on. OK, I'll try to make 32-bit build but I think I've figured out what's wrong here: the process creation and main timings are obtained by accessing the mozilla::StartupTimeline. The 'main' value is recorded by invoking: StartupTimeline::Record(StartupTimeline::MAIN); Within XREMain::XRE_mainInit(), this will use PR_Now() to record the time of the call. The 'process start' event on the other hand is computed in nsAppStartup.cpp by the ThreadedCalculateProcessCreationTimestamp() call. This call will look at the /proc/*/stat file of the current process and derive the time when the kernel created the process by looking at the |starttime| field. The source for this field is the number of clock ticks as measured by the kernel since startup, the value returned from PR_Now() on the other hand is obtained via gettimeofday(). Since the two time sources are different it doesn't surprise me that they might be off-sync and comparing their results directly is probably a bad idea. A way to mitigate this issue would be to modify nsAppStartup::GetStartupInfo() to clamp obviously wrong times for process creation. We already have a check for that so I could easily add another one to fix the issue. The second issue is also caused by this: tMain is being computed as startupInfo.main - startupInfo.process in toolkit/mozapps/extensions/AddonRepository.jsm thus yielding a negative value.
This patch compares the timestamps for the MAIN and PROCESS_CREATION events and if MAIN appears to have occurred earlier than PROCESS_CREATION sets both to the value of MAIN. This will fix the issue we're currently seeing but will obviously skew the measurements when the two timestamps are inconsistent. Alternatively we could set PROCESS_CREATION to -1 if it's value is inconsistent with MAIN and bump Telemetry::STARTUP_MEASUREMENT_ERRORS like we do already. In this case we'd need to patch both tests to check for STARTUP_MEASUREMENT_ERRORS and skip the tests if those errors are encountered. Both approaches work for me so if we think this patch is not the right way to go I'll prepare another one against the tests ASAP.
Attachment #720738 - Flags: review?(mh+mozilla)
Comment on attachment 720738 [details] [diff] [review] [PATCH] Ensure that the startup MAIN timestamp is never older than the PROCESS_CREATION one Review of attachment 720738 [details] [diff] [review]: ----------------------------------------------------------------- I think it would be better to mark the measures as errors.
Attachment #720738 - Flags: review?(mh+mozilla) → review-
Revised patch, this uses telemetry to record invalid PROCESS_CREATION timestamps and adjusts the tests so that the ones for which we don't have reliable data are skipped.
Attachment #720738 - Attachment is obsolete: true
Attachment #720793 - Flags: review?(mh+mozilla)
Comment on attachment 720793 [details] [diff] [review] [PATCH v2] Record invalid PROCESS_CREATION timestamps and adjust tests to cope with them Review of attachment 720793 [details] [diff] [review]: ----------------------------------------------------------------- I'm ok with the C++ change, but I'm not sure silently skipping the test is the right thing to do. Is it possible to turn them into known-failures?
Attachment #720793 - Flags: review?(mh+mozilla) → review+
This is a revised patch which uses todo() to record known failures instead of skipping the tests altogether. Unfortunately I'm not familiar enough with mochitest to figure out if this is the right approach so before obsoleting the previous patch I'll try to find a mochitest expert to have a look at my changes.
I've investigated a little further how to handle the mochitests and apparently the common idiom we use for this situation is to record a known failure via a |todo(false, "explain why this is a known failure")| call so I've revised my patch accordingly. Mike, should I ask an additional review from a unit-test peer for this patch? If it's not needed I'll just obsolete the second version and request for checking-in this one if the try run is green. The try run (slimmed down to touch only the affected tests) is here: https://tbpl.mozilla.org/?tree=Try&rev=cc05a863544e
Attachment #721148 - Attachment is obsolete: true
Attachment #720793 - Attachment is obsolete: true
The try-run was green, the patch C++ bits are unmodified compared to the one that was r+'d by Mike and the tests now issue known failures when the problem is detected like other mochitests do. Since no other changes were added if nobody has anything against it I think the patch should be good to checkin with r=mh+mozilla, feel free to clear the checkin-needed keyword if you feel otherwise. To avoid confusion the patch that needs to be checked-in is attachment 721297 [details] [diff] [review].
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/f856bcb5097f *fingers crossed* - Thanks for working on this, Gabriele!
Keywords: checkin-needed
(In reply to Ryan VanderMeulen [:RyanVM] from comment #152) > Backed out for xpcshell failures. > https://hg.mozilla.org/integration/mozilla-inbound/rev/bc5920039204 OK, I've traced back the errors to the session recorder, which requires the process startup time to be set to a valid value: https://mxr.mozilla.org/mozilla-central/source/services/datareporting/sessions.jsm#219 In addition I've found other consumers relying on the process creation timestamp field being present and set to a valid value: https://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#118 https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonRepository.jsm#744 https://mxr.mozilla.org/mozilla-central/source/browser/metro/base/content/browser-ui.js#185 https://mxr.mozilla.org/mozilla-central/source/mobile/xul/chrome/content/browser-ui.js#499 https://mxr.mozilla.org/mozilla-central/source/browser/components/nsBrowserGlue.js#445 So we've got two options here: - Going back to my first patch where we would fake the process creation timestamp by setting it to be equal to the main timestamp when we detect the values are inverted. To mitigate the hackiness of this we might still record a telemetry error to indicate that something went wrong when the measurement happened. - Changing all consumers to use the main timestamp instead of the process creation timestamp as the startup time. This is more intrusive but would have the advantage of actually having all consumers use timestamps that were taken via the same time source (i.e. PR_Now()) and so are necessarily comparable. BTW I've spotted another corner case when we would get erroneous values in these fields: since PR_Now() uses gettimeofday() it's susceptible to time-jumps caused when the system clock is manually set and these could be reflected in the recorded timestamps. Considering this is a very unlikely event I don't think we should bother too much with it but it serves to show that having more checks to sanitize these values can't hurt.
Maybe bug 793735 would be part of the solution, although last time i looked, it was tricky...
(In reply to Mike Hommey [:glandium] from comment #155) > Maybe bug 793735 would be part of the solution, although last time i looked, > it was tricky... Yes, it would definitely fix any possible time-jump issue caused by things like NTP kicking in and adjusting for DST or so but I would still need to find a way to accommodate for the process creation time we compute from /proc/*/stat. After giving it more thought I think the best approach is to set process creation to a valid value (i.e. main) in case the value is inconsistent but still record the issue using telemetry. This way all consumers should be happy but we will still be able to issue known failures in the mochitests.
Agreed.
I've got a revised patch, it's not ready yet because I've been hitting a snag when the timestamps are reset when the browser is restarted. I'll proceed to fixing bug 793735 as soon as I'm done here. I've got a WIP for that one too but I'd like to finish here first in order not to drag the problem along.
Assignee: taras.mozilla → gsvelto
This is hopefully the last iteration of this patch. The changes with the previous one are a bit subtle so I'm requesting the review again. The changes are essentially the following: - Whenever an inconsistent process creation timestamp is detected telemetry is used to record that it was erroneous. - If there is a valid main function timestamp it is used to replace the erroneous process creation timestamp to prevent consumers expecting it to be there from breaking. If no valid main function timestamp is present the erroneous process creation timestamp is set to -1 just like before the patch. - Tests have been modified to issue known failures depending on the presence of telemetry errors instead of the value of the process creation timestamp. I've pushed the patch to try with all mochitests & xpcshell tests and it turned out green: https://tbpl.mozilla.org/?tree=Try&rev=65390f74a834
Attachment #721297 - Attachment is obsolete: true
Attachment #722780 - Flags: review?(mh+mozilla)
Attachment #722780 - Flags: review?(mh+mozilla) → review+
Thanks for the review! Hopefully this should be fixed for good this time. Again to avoid any confusion with the older patch the one to be checked in is attachment 722780 [details] [diff] [review].
Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: