osx Mojave (10.14) - gtests seem to fail on expirationtracker in debug mode
Categories
(Core :: XPCOM, defect)
Tracking
()
People
(Reporter: jmaher, Assigned: KrisWright)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr68+
|
Details |
in starting to look at upgrading our osx machines from 10.10 to 10.14, some tests are failing.
In this case, I see gtest failing:
11:06:13 INFO - TEST-START | ExpirationTracker.main
11:06:13 INFO - [842, Main Thread] WARNING: Expired objects were not removed or marked used: file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsExpirationTracker.h, line 257
11:06:14 INFO - [842, Main Thread] WARNING: Expired objects were not removed or marked used: file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsExpirationTracker.h, line 257
11:06:14 INFO - [842, Main Thread] WARNING: Expired objects were not removed or marked used: file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsExpirationTracker.h, line 257
11:06:14 WARNING - TEST-UNEXPECTED-FAIL | ExpirationTracker.main | Value of: timeDiffMS < periodMS && aObj->mExpired
11:06:14 INFO - Actual: false
11:06:14 INFO - Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestExpirationTracker.cpp:134
11:06:14 WARNING - TEST-UNEXPECTED-FAIL | ExpirationTracker.main | Value of: timeDiffMS < periodMS && aObj->mExpired
11:06:14 INFO - Actual: false
11:06:14 INFO - Expected: true @ /builds/worker/workspace/build/src/xpcom/tests/gtest/TestExpirationTracker.cpp:134
11:06:14 WARNING - TEST-UNEXPECTED-FAIL | ExpirationTracker.main | Value of: timeDiffMS < periodMS && aObj->mExpired
11:06:14 INFO - Actual: false
you can see the test runs and logs here:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher%40mozilla.com&fromchange=96a96f509d98f90ce3f8e35990df75153755b877&tochange=b9ef9517732230f87398e4b26d30ed5c2e34adce&searchStr=gtest&selectedJob=240125939
Reporter | ||
Comment 1•6 years ago
|
||
:njn, would you know much about this test (I saw you wrote patches for this test in the past) and have ideas for fixing this on osx mojave?
Comment 2•6 years ago
|
||
Apologies: I have no recollection or knowledge of this test and the Mercurial history says I have only ever made one trivial change to it.
I tried running it on my Mac laptop, which has 10.14, but it succeeded.
Reporter | ||
Comment 3•6 years ago
|
||
thanks :njn, hg history can be difficult to really figure out who knows about the test. Thanks for giving this a try locally.
Comment 4•6 years ago
|
||
This reproduces with:
MOZ_CHAOSMODE=4 ./mach gtest ExpirationTracker.*
MOZ_CHAOSMODE=4
makes our timers less predictable. I think we're just making bad assumptions that our timers will always fire precisely at the the expected time.
Comment 5•5 years ago
|
||
:erahm - this bug now appears on both shippable
and debug
variants of gtest: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=875b73dc18b721a2cb92286a61bb7684b068b09a
Would you be able to take a look at this test? It seems to be specific to macosx1014. There is a timeline for migration to 1014; would it be acceptable to have the test disabled for macosx1014?
Comment 6•5 years ago
|
||
Kris, can you take a look at this test failure? We're seeing increased tests failures on OSX 10.14 in the ExpirationTracker
test. Comment 4 has details on reproducing locally. We might just need to tweak the upper bounds used in the test due to the inevitable flakiness of timers. At this point we don't really have an owner for this code, but if you run into issues ping froydnj or myself.
Assignee | ||
Comment 7•5 years ago
|
||
The timer is falling above/below bounds during this check https://searchfox.org/mozilla-central/rev/8ed8474757695cdae047150a0eaf94a5f1c96dbe/xpcom/tests/gtest/TestExpirationTracker.cpp#133
Some of the values fall in predictably and suggest maybe increasing the slack to give them a bit more room to fail...but sometimes the test still fails randomly on a very large/small value (e.g. some 200+ ms above or ~70ms below the boundary). It doesn't seem to be a particular pattern of incriminating objects causing this, which leads me to believe that the timer is behaving strangely. The fact that outside of chaos mode here and on other platforms the tests pass (while still printing the |WARNING: Expired objects were not removed or marked used| notice) too indicates this may be a timer issue.
Filed bug 1560411 about the timers.
Edit: Bah, I entered the wrong bug number.
Comment 8•5 years ago
|
||
:kriswright - while bug 1560411 is being worked on, would it be acceptable to either disable the test for macosx, or insert a larger value that permits the test to pass on macosx1014? Timeline wise, I am hoping to migrate this suite to macosx1014 within the next couple of weeks.
Assignee | ||
Comment 9•5 years ago
|
||
(In reply to Edwin Gao (:egao) from comment #8)
:kriswright - while bug 1560411 is being worked on, would it be acceptable to either disable the test for macosx, or insert a larger value that permits the test to pass on macosx1014? Timeline wise, I am hoping to migrate this suite to macosx1014 within the next couple of weeks.
I think the best course of action here would be to adjust the bounds for the test, probably by inserting a new slack value (found here https://searchfox.org/mozilla-central/rev/f91bd38732d4a330eba4e780812274b98eb81274/xpcom/tests/gtest/TestExpirationTracker.cpp#46). It looks like the slack boundary may need to be significantly increased for the time being.
It appears that this is happening across several tests, so filed bug 1561959 about addressing the timer fails.
Comment 10•5 years ago
|
||
I tried changing the bounds for the slackMs value and tried several much larger values, but none of them seem to permit the test to pass on macosx1014 by itself. Several values from 200ms to 14000ms with no luck.
However, once I also modified the value of periodMS
from 100ms to 40000000ms. This seems to have done the trick.
I'm pushing several try runs without the ridiculous period
and slack
values; we will see which ones work.
2000/1000: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=ad95459a235675c6d1d5d41a5b9024a252533f0d
1000/500: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=97c04e97f58059316b7fff965feddb376aed53fe
600/260: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=0cb49b483817b8d0e2787b7de0dcb93a4ab28a7c
Assignee | ||
Comment 11•5 years ago
|
||
(In reply to Edwin Gao (:egao) from comment #10)
I tried changing the bounds for the slackMs value and tried several much larger values, but none of them seem to permit the test to pass on macosx1014 by itself. Several values from 200ms to 14000ms with no luck.
However, once I also modified the value of
periodMS
from 100ms to 40000000ms. This seems to have done the trick.I'm pushing several try runs without the ridiculous
period
andslack
values; we will see which ones work.2000/1000: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=ad95459a235675c6d1d5d41a5b9024a252533f0d
1000/500: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=97c04e97f58059316b7fff965feddb376aed53fe
600/260: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=macosx%2Cgtest&revision=0cb49b483817b8d0e2787b7de0dcb93a4ab28a7c
Oh! LowerBoundMS is probably going negative. This will trip up the objects if the negative value is large enough. I apologize for the oversight. I had clamped the uint to 100 or greater.
I enabled logging to see how the timers behave now. Let's see if this works. If it clears I'll have a patch up shortly with a better slack range.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b25a141d469f67656142fdab2b0d93da4d98ef0
Assignee | ||
Comment 12•5 years ago
|
||
Increased & clamped a slack boundary at 400ms. Lower bound vals should not drop below 100 ms (timer period).
Assignee | ||
Comment 13•5 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b25a141d469f67656142fdab2b0d93da4d98ef0
Looks like the test is greenlighting now. I've put up a patch increasing slack boundaries to 400ms (the test up here is set to 1400 so I'm running a second push now to verify 400 as the right number). For the short term this is a good workaround to get the tests greenlighting again. If the newest push goes out of bounds again I'll adjust the patch accordingly.
Reporter | ||
Comment 14•5 years ago
|
||
keep in mind that try run is on 10.10, not 10.14, to run on 10.14, you need to uncomment out gtest from the test-sets.yml file:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#318
Assignee | ||
Comment 15•5 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #14)
keep in mind that try run is on 10.10, not 10.14, to run on 10.14, you need to uncomment out gtest from the test-sets.yml file:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#318
Oh! I apologize. Thank you for the info. This is a bit of a learning experience for me. Reruning now with the right settings - if it fails for any reason I will examine the logs in the morning.
Assignee | ||
Comment 16•5 years ago
|
||
Edit: I thought ExpirationTracker.main was passing but more repetitions started to fail. Will have a patch up soon with a sufficient upper bound range.
Comment 17•5 years ago
|
||
:kriswright - do note that there are other failure on macosx1014 for gtest that appears once ExpirationTracker
tests pass. Notable failure are ThreadPool
and WebRtcIceConnectTest
failures; these are logged in bug 1561410 and bug bug 1558887.
Comment 18•5 years ago
|
||
Comment 19•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Reporter | ||
Comment 20•5 years ago
|
||
Comment on attachment 9074938 [details]
Bug 1544452 - Increase slack boundaries for ExpirationTracker gtest
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: testonly change to green up test on osx 10.14.
- User impact if declined:
- Fix Landed on Version: 69
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Comment 21•5 years ago
|
||
Comment on attachment 9074938 [details]
Bug 1544452 - Increase slack boundaries for ExpirationTracker gtest
Doesn't really need approval since it's a test-only change, but sure. Approved for 68.1esr.
Comment 22•5 years ago
|
||
bugherder uplift |
Assignee | ||
Updated•4 years ago
|
Description
•