Closed
Bug 652459
Opened 14 years ago
Closed 13 years ago
TM Windows nightlies (twice) and Linux64 opt builds (permanently) fail multiple mochitest-a11y tests with "getTextBeforeOffset" errors
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla6
People
(Reporter: philor, Unassigned)
References
Details
Attachments
(2 files)
(deleted),
patch
|
davidb
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
surkov
:
review+
|
Details | Diff | Splinter Review |
On both the Windows nightly built 2011/04/23 and the one built 2011/04/24, the mochitest-a11y (part of mochitest-oth, so the M(oth) on tbpl) tests have failed with multiple failures like "getTextBeforeOffset (word start): wrong text, offset: 0, id: ' 'input' '; - "" should equal """ (http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1303653153.1303654716.27583.gz) spread across multiple tests. There are a couple of interesting things about that:
* The regular depend build from the same cset doesn't fail, even though in the case of the 2011/04/23 one, the depend build says that it did a periodic clobber, so it was also a full build. The failure is repeatable on the same build: I retriggered both the failed and the passing tests on the 2011/04/23 builds (which is supposed to then download the same build, nightly or depend, and seems to have done so), and the passes stayed passing while the failures stayed failing. I retriggered the failed runs on Win7 and WinXP on the 2011/04/24 nightly seven times each, and all seven failed.
* On 2011/04/21, dvander landed http://hg.mozilla.org/tracemonkey/rev/d851d44ad77a which caused Linux64 opt mochitest-a11y to fail with the same sort of getTextBeforeOffset thing (http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1303488964.1303490147.21467.gz), every run rather than only nightlies or only clobbers, until it was backed out on 2011/04/22.
Despite how interesting it would be to see what would happen after a merge to mozilla-central, I sort of doubt we want to merge a baffling a11y-breaker just to find out.
Reporter | ||
Comment 1•14 years ago
|
||
Because I have odd ideas about what would be fun, for fun I clobbered the Windows slaves (which is supposed to be the moral equivalent of a nightly), and triggered another build on Windows on the same tip rev that was fine as a depend and failed as a nightly, and it's also fine as a clobbered depend.
Reporter | ||
Comment 2•14 years ago
|
||
And then because the world likes to be random, the next nightly, which built on the same cset as the last, does not fail.
Comment 3•14 years ago
|
||
This has gone perma-orange since the gcc-4.5 upgrade on linux64-opt. philor fired off a rebuild on the last good version before the upgrade: http://tree.mozilla.org/?tree=TraceMonkey&rev=268942ceb06f. So that should tell us whether the upgrade is responsible.
Either way, we probably have to deal with a closed tree until this is resolved.
Reporter | ||
Comment 4•14 years ago
|
||
Yeah, closing it now. That rebuild came up orange, so gcc-4.5 has brought us the same thing dvander's patch did, only possibly with nothing we can use as a scapegoat. I retriggered the build on the last merge from m-c, if that comes up green then I can start the hunt for a scapegoat in earnest, retriggering builds on everything that's landed since then until I find the first one that fails, and that lucky person will have to figure it out (unless someone else lands something which also breaks it, before they do figure it out).
Summary: TM Windows nightlies fail multiple mochitest-a11y tests with "getTextBeforeOffset" errors → TM Windows nightlies (twice) and Linux64 opt builds (permanently) fail multiple mochitest-a11y tests with "getTextBeforeOffset" errors
Reporter | ||
Comment 5•14 years ago
|
||
Unfortunately, the rebuild on Tuesday's m-c merge came up orange, so either something landed on m-c between Tuesday morning and Thursday morning which protected them from this, which we need to merge, or something inconceivable is happening.
Reporter | ||
Comment 6•14 years ago
|
||
Also seems pretty strongly associated with a "dom/src/threads/test/test_closeOnGC.html | Test timed out" also on Linux64, which mozilla-central also isn't seeing.
Comment 7•14 years ago
|
||
(In reply to comment #5)
> Unfortunately, the rebuild on Tuesday's m-c merge came up orange
Are you sure? Looking at tbpl, it looks like green on the rebuild of 28bc239d3d9d...
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1304111798.1304113030.7492.gz
Reporter | ||
Comment 8•14 years ago
|
||
That's the one where I retriggered the mochitest-other directly, which just reran it on the same original build, as a control to verify that we hadn't made any infra changes that caused the problem.
Then I retriggered the build, letting the build trigger a new round of tests. Thanks to tbpl foolishly thinking that builds and tests happen within a given time after a push (because Tinderbox only has one API, "things that happened in a given timespan"), you cannot see that on tbpl.m.o, but since mine's turned up to looking 96 hours after a push instead of just 12, http://dev.philringnalda.com/tbpl/?tree=TraceMonkey&rev=28bc239d3d9d will show you the retriggered builds (I did a second when the first looked like it wouldn't ever finish, and the second retriggering caused two more builds instead of just one), and the three orange mochitest-other runs from running the tests on old code compiled with a new mozconfig.
Reporter | ||
Comment 9•14 years ago
|
||
Something did land on mozilla-central last week which either fixes this, or just happens to protect them from this: http://dev.philringnalda.com/tbpl/?tree=Try&rev=9a9aaf90ba8d is a try push with a mozilla-central parent from April 25th which hit this.
Given that we have every reason to believe that a merge will make it go away (though no particular reason to believe that it will be fixing it), should we stay closed, or just open and pile pushes on orange, trusting that we won't affect the outcome of the merge with any of them?
Reporter | ||
Comment 10•14 years ago
|
||
Until next time.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE
Comment 11•14 years ago
|
||
I took the latest tinderbox build as of now and the corresponding tests.
If I run test_singleline.html repeatedly, the number of fails varies greatly, within the three values 2, 24 and 26.
If I remove all tests but the getTextBeforeOffset ones from that test, which are supposedly the ones failing, I get no failure.
Comment 12•14 years ago
|
||
... and, it's not perma-orange on tm where it is on m-c...
Comment 13•14 years ago
|
||
and it's perma-green when run under valgrind... but valgrind complains a bunch about uninitialized values on a stack allocation from nsHyperTextAccessible::GetTextHelper at nsHyperTextAccessible.cpp:892...
Comment 14•14 years ago
|
||
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Comment 15•14 years ago
|
||
(In reply to comment #13)
> and it's perma-green when run under valgrind... but valgrind complains a bunch
> about uninitialized values on a stack allocation from
> nsHyperTextAccessible::GetTextHelper at nsHyperTextAccessible.cpp:892...
Interesting. Can you please file a bug (core/disability) and paste the valgrind output there?
Comment 16•14 years ago
|
||
(In reply to comment #11)
> If I remove all tests but the getTextBeforeOffset ones from that test, which
> are supposedly the ones failing, I get no failure.
Please go ahead and disable those tests for now and open the tree.
Fernando told me we are calling GetRenderedText when we shouldn't be, and that this may all be solved when we use cached text for this.
Comment 17•14 years ago
|
||
(In reply to comment #16)
> (In reply to comment #11)
> > If I remove all tests but the getTextBeforeOffset ones from that test, which
> > are supposedly the ones failing, I get no failure.
>
> Please go ahead and disable those tests for now and open the tree.
>
> Fernando told me we are calling GetRenderedText when we shouldn't be, and that
> this may all be solved when we use cached text for this.
Oof I misread Mike's comment. I thought he was just removing the getTextBeforeOffset ones.
Comment 18•14 years ago
|
||
I have disabled these tests:
accessible/tests/mochitest/text/test_singleline.html
accessible/tests/mochitest/text/test_whitespaces.html
in http://hg.mozilla.org/mozilla-central/rev/2f2d4de42515
Comment 19•14 years ago
|
||
(In reply to comment #15)
> (In reply to comment #13)
> > and it's perma-green when run under valgrind... but valgrind complains a bunch
> > about uninitialized values on a stack allocation from
> > nsHyperTextAccessible::GetTextHelper at nsHyperTextAccessible.cpp:892...
>
> Interesting. Can you please file a bug (core/disability) and paste the valgrind
> output there?
Filed bug 654999. As it's not clear if it's related, I didn't put any dependency on this bug.
Updated•14 years ago
|
Comment 20•13 years ago
|
||
It sounds like we've got disabled tests entirely while some part of tests were failing. Other point is we don't have good tests for text interface anymore so we are in risk of serious regressions. While we don't have any good guess why they were failed then we are faced these tests are disabled for a long time.
Comment 21•13 years ago
|
||
Note that now PGO is back on again, we should be able to backout 2f2d4de42515.
Comment 22•13 years ago
|
||
Note that I disabled the tests on davidb's request. I agree that we don't want them to be disabled. Furthermore, we need to figure out _why_ they're failing without PGO on Linux...
Comment 23•13 years ago
|
||
So, it seems like we can seperate this into 2 problems
* reenable the tests on m-c we think this should just work with the current optamization level. Is tryserver currently using the same flags for optamized builds as m-c? if not can it be made to use the same flags? Either way I'll try to land this at night / in the evening to minimize the pain should it stay orange.
* make this all work with any optimization level, in other words find out what certain optimizations cause to break. I would gues that this part depends on fixing the valgrind errors mentioned above, or atleast it would make sense to see if this persists after they are fixed. To test this can we easily change what optamization is done on try builds?
Attachment #531434 -
Flags: review?
Comment 24•13 years ago
|
||
Comment on attachment 531434 [details] [diff] [review]
backout disabling the tests
r=me
Attachment #531434 -
Flags: review? → review+
Comment 25•13 years ago
|
||
(In reply to comment #23)
> To test this can
> we easily change what optamization is done on try builds?
You can change optimization level in configure.in (e.g backing out 3abe2f8c592e in your try push), and you can disable PGO with https://wiki.mozilla.org/ReleaseEngineering/TryServer#How_to_get_non-PGO_coverage
If I recall correctly, you need to do both to reproduce the perma-orange.
Comment 26•13 years ago
|
||
(In reply to comment #25)
> (In reply to comment #23)
> > To test this can
> > we easily change what optamization is done on try builds?
>
> You can change optimization level in configure.in (e.g backing out
> 3abe2f8c592e in your try push), and you can disable PGO with
> https://wiki.mozilla.org/ReleaseEngineering/TryServer#How_to_get_non-
> PGO_coverage
>
> If I recall correctly, you need to do both to reproduce the perma-orange.
Though I seem to recall that it also was perma-orange on tm when it had PGO and 3abe2f8c592e wasn't merged on it, so backing out 3abe2f8c592e alone should trigger it (but takes longer to build)
Comment 27•13 years ago
|
||
REENABLED THE TESTS IN http://hg.mozilla.org/mozilla-central/rev/745a8761a2d8
Reporter | ||
Comment 28•13 years ago
|
||
And in the current episode:
The http://hg.mozilla.org/tracemonkey/rev/0de1cf797699 merge from mozilla-central at 14:45 May 9th merged disabling the tests to TM.
Between then and 15:09 on May 13th one of the 28 pushes to TM caused this to happen on Windows opt, but we didn't know it because the tests were disabled.
The http://hg.mozilla.org/tracemonkey/rev/599d1c6cba63 merge from mozilla-central at 15:09 May 13th merged reenabling the tests to TM. Once Windows built and ran m-a11y, both Win7 and WinXP failed, but by then TM had already been merged back to mozilla-central, bringing that orange with it.
http://hg.mozilla.org/mozilla-central/rev/8404426ef391 disabled the tests again on mozilla-central.
I closed TM (again, yeah), until someone either figures out why random jseng patches cause this in random optimized versions on random OSes, and according to rumor, can then uncause it by moving around blocks of code, or until someone just backs out everything that landed on TM between Tuesday and Friday until it goes green.
Status: REOPENED → NEW
Comment 29•13 years ago
|
||
(In reply to comment #28)
> I closed TM (again, yeah), until someone either figures out why random jseng
> patches cause this in random optimized versions on random OSes, and
> according to rumor, can then uncause it by moving around blocks of code
I think the reason is as "simple" as bug 654999 comment 7.
Comment 30•13 years ago
|
||
(In reply to comment #29)
> I think the reason is as "simple" as bug 654999 comment 7.
Who is the current owner for nsHyperTextAccessible::GetTextHelper?
This is clearly not a JS issue. I'm happy to provide the appropriate person with the changeset where the WinOpt went permaorange, but given that Julian's attempt at an obvious fix didn't pan out, we need to pass this off to people with domain knowledge who can hack on it via try. I'm going to disable the test on the tracemonkey side and reopen the TM tree if there are no serious objections.
Comment 31•13 years ago
|
||
> Who is the current owner for nsHyperTextAccessible::GetTextHelper?
Some combination of davidb and surkov (cced on this bug).
Comment 32•13 years ago
|
||
Okay, http://hg.mozilla.org/tracemonkey/rev/599d1c6cba63 is the TM/MC merge changeset that gives the perma-orange. davidb/surkov, you guys have the con!
Comment 33•13 years ago
|
||
Could somebody take assignment on this? I just want to make sure it gets done -- we shouldn't just drop failing tests from our suite when they're reproducibly orange on our build machines and we have a known proximal cause.
Comment 34•13 years ago
|
||
Assigning to Fernando since he is working on the a11y side of this. The orange tests are currently disabled. If we go ahead with a TM merge we have until the 24th to fix on the a11y end and failing that would face backing out the TM merge. Obviously this will be a priority.
Assignee: general → fherrera
Comment 35•13 years ago
|
||
(In reply to comment #33)
> Could somebody take assignment on this? I just want to make sure it gets
> done -- we shouldn't just drop failing tests from our suite when they're
yeah, that was already being worked on in bug 654999 some, sorry that wasn't clerer
> reproducibly orange on our build machines and we have a known proximal cause.
Comment 36•13 years ago
|
||
I pushed this change:
if (aType == eGetBefore) {
- endOffset = aOffset;
+ finalEndOffset = aOffset;
}
To the TM tryserver:
http://tbpl.mozilla.org/?tree=Try&pusher=fherrera@mozilla.com&rev=b06c31f9b75d
Assignee: fherrera → general
Comment 37•13 years ago
|
||
(In reply to comment #36)
> I pushed this change:
>
> if (aType == eGetBefore) {
> - endOffset = aOffset;
> + finalEndOffset = aOffset;
> }
>
> To the TM tryserver:
>
> http://tbpl.mozilla.org/?tree=Try&pusher=fherrera@mozilla.
> com&rev=b06c31f9b75d
https://bugzilla.mozilla.org/show_bug.cgi?id=654999#c11
Comment 38•13 years ago
|
||
As the patch also includes fixes for some test results, I'm attaching it complete
Comment 39•13 years ago
|
||
Comment on attachment 533908 [details] [diff] [review]
Patch fixing the issue and test marked as TODO
Review of attachment 533908 [details] [diff] [review]:
-----------------------------------------------------------------
r=me, obviously that was a misspelling
Attachment #533908 -
Flags: review+
Comment 40•13 years ago
|
||
(In reply to comment #38)
> Created attachment 533908 [details] [diff] [review] [review]
> Patch fixing the issue and test marked as TODO
>
> As the patch also includes fixes for some test results, I'm attaching it
> complete
It should be landed as for bug 654999 since you're not sure it's a right fix for this one, right?
Comment 41•13 years ago
|
||
(In reply to comment #40)
> (In reply to comment #38)
> > Created attachment 533908 [details] [diff] [review] [review] [review]
> > Patch fixing the issue and test marked as TODO
> >
> > As the patch also includes fixes for some test results, I'm attaching it
> > complete
>
> It should be landed as for bug 654999 since you're not sure it's a right fix
> for this one, right?
It's easy to verify: backout 3abe2f8c592e, add a mozconfig-extra file containing mk_add_options MOZ_PGO=, and push to try.
Comment 42•13 years ago
|
||
I landed the patch - http://hg.mozilla.org/mozilla-central/rev/5ebbe0f78c74. Fernando, can you make sure this patch fixes this bug too?
Comment 43•13 years ago
|
||
Pushed to try. We'll see how it goes.
http://tbpl.mozilla.org/?tree=Try&rev=e4ea52ae7293
Comment 44•13 years ago
|
||
And this is all green.
Status: NEW → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6
You need to log in
before you can comment on or make changes to this bug.
Description
•