Closed
Bug 468023
Opened 16 years ago
Closed 16 years ago
Synchronize RefTest results to tinderbox waterfall from (new) log
Categories
(Testing :: Reftest, defect)
Testing
Reftest
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.9.1b3
People
(Reporter: sgautherie, Assigned: sgautherie)
References
()
Details
(Keywords: fixed1.9.1, Whiteboard: [fixed1.9.1b3: Bv2a])
Attachments
(3 files, 8 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
coop
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
coop
:
review+
|
Details | Diff | Splinter Review |
Bug 466372 comment 1:
{{
From Serge Gautherie 2008-11-23 08:19:53 PST
Fwiw, the figures are a little different from the (Windows SeaMonkey)
waterfall:
{
TinderboxPrint: reftest<br/>2326/0/93
}
I would guess
2326 = Pass + Fail (which don't on the tindrboxes) + LoadOnly
93 = +/- "Known Fail" (without the "Random only") + Skip
}}
***
Example of current outputs:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228438741.1228442157.13690.gz&fulltext=1
Linux mozilla-central moz2-linux-slave07 dep unit test on 2008/12/04 16:59:01
REFTEST INFO | Pass: 2351
REFTEST INFO | KnownFail: 98
REFTEST INFO | Random: 30
REFTEST INFO | LoadOnly: 4
REFTEST INFO | Skip: 15
TinderboxPrint: reftest<br/>2362/0/113
REFTEST INFO | LoadOnly: 795
REFTEST INFO | Skip: 2
TinderboxPrint: crashtest<br/>799/0/2
***
I don't know where/how the (3) TinderboxPrint numbers are calculated...
But I'd like to suggest to calculate them as the sum of the new items:
{
8 + // The errors...
9 + Exception: 0,
10 + FailedLoad: 0,
11 + UnexpectedFail: 0,
12 + UnexpectedPass: 0,
13 + // The successes...
14 + Pass: 0,
15 + KnownFail : 0,
16 + Random : 0,
17 + LoadOnly: 0,
18 + // The unknowns.
19 + Skip: 0,
}
Comment 1•16 years ago
|
||
We should not do it this way.
The first number (passed tests) should be the sum of the Pass and the LoadOnly. The second number (unexpected results that turn the tree orange) should be the sum of the Exception (maybe, I'm not even sure what this is), FailedLoad, UnexpectedFail, and UnexpectedPass. The third number (known failures) should be the some of the KnownFail, Skip, and Random.
However, I don't think there's any need to rewrite the code in question, nor do I see how such a rewrite would change the results (except possibly for the Exception category).
Assignee | ||
Comment 2•16 years ago
|
||
(In reply to comment #1)
> We should not do it this way.
Fine with me.
> However, I don't think there's any need to rewrite the code in question, nor do
> I see how such a rewrite would change the results
At least, I'm puzzled by the 4 extra crashtests reported on the waterfall :-/
Comment 3•16 years ago
|
||
Actually, it looks like the current code is broken, so maybe we should do this. In particular, the current code does this:
REFTEST TEST-FAIL | | EXCEPTION: Error in manifest file file:///builds/slave/trunk_linux-7/build/layout/reftests/text/reftest.list line 28
REFTEST FINISHED: Slowest test took 0ms (undefined)
REFTEST INFO | Result summary:
REFTEST INFO | Exception: 1
REFTEST INFO | Total canvas count = 0
program finished with exit code 0
TinderboxPrint: reftest<br/>5/0/0
Note that it's counting the info lines as passing tests. We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.
Assignee | ||
Comment 4•16 years ago
|
||
(In reply to comment #3)
> We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.
Bug 468476 fixed that.
URL: 468476
Assignee | ||
Updated•16 years ago
|
Assignee | ||
Comment 5•16 years ago
|
||
(In reply to comment #3)
> Actually, it looks like the current code is broken
What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
(compared to buildbotcustom/*/steps.py)
Assignee | ||
Comment 6•16 years ago
|
||
Per comment 1 and comment 3.
I did some manual unit-testing of the new code,
but I can't test the file as a whole.
Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Assignee: nobody → sgautherie.bz
Status: NEW → ASSIGNED
Attachment #352027 -
Flags: review?(ted.mielczarek)
Attachment #352027 -
Flags: review?(dbaron)
Comment 7•16 years ago
|
||
Why not do the summing in reftest.js? (Maybe call them successful, unexpected, and known problems?) That would also give people the ability to compare their results to the tinderbox results.
Comment 8•16 years ago
|
||
reftest.js could perhaps print a summary like this (instead of what it's printing now):
REFTEST INFO | 1500 successful (1400 pass, 100 load only)
REFTEST INFO | 0 unexpected (0 unexpected fail, 0 unexpected pass,
REFTEST INFO | 0 failed load, 0 exceptions)
REFTEST INFO | 160 known problems (100 known fail, 50 random, 10 skipped)
(maybe merging the 2nd and 3rd lines)
Assignee | ||
Comment 9•16 years ago
|
||
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko/20081209 Minefield/3.2a1pre] (home, optim default) (W2Ksp4)
(http://hg.mozilla.org/mozilla-central/rev/85507cfcdda8)
Per comment 8:
{
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 2410 (2406 pass, 4 load only)
REFTEST INFO | Unexpected: 1 (0 unexpected fail, 1 unexpected pass, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 111 (92 known fail, 13 random, 6 skipped)
}
Attachment #352103 -
Flags: review?(dbaron)
Assignee | ||
Comment 10•16 years ago
|
||
Per comment 7.
I did some manual unit-testing of the new code,
but I can't test the file as a whole.
Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Attachment #352027 -
Attachment is obsolete: true
Attachment #352106 -
Flags: review?(ted.mielczarek)
Attachment #352106 -
Flags: review?(dbaron)
Attachment #352027 -
Flags: review?(ted.mielczarek)
Attachment #352027 -
Flags: review?(dbaron)
Comment 11•16 years ago
|
||
Comment on attachment 352103 [details] [diff] [review]
(Bv1) New summarized log format
I don't think we want the count != 0 checks around the groups. We might want them around the items, but I'd say just take them out completely.
Attachment #352103 -
Flags: review?(dbaron) → review-
Comment 12•16 years ago
|
||
Comment on attachment 352106 [details] [diff] [review]
(Av2) Use the new summarized 'REFTEST INFO' counts
Might it be faster if the regular expression checked that REFTEST was at the beginning of the line?
Also, given the other changes, it seems like you should require all 3 INFO lines to be present, and give an error if they aren't.
Other than that, this looks fine to me, although I'm really not an official reviewer for this code.
Attachment #352106 -
Flags: review?(dbaron) → review-
Updated•16 years ago
|
Component: Testing → Reftest
Product: Core → Testing
Version: Trunk → unspecified
Assignee | ||
Comment 13•16 years ago
|
||
Bv1, with comment 11 suggestion(s).
Attachment #352103 -
Attachment is obsolete: true
Attachment #352465 -
Flags: review?(dbaron)
Updated•16 years ago
|
Attachment #352465 -
Flags: review?(dbaron) → review+
Comment 14•16 years ago
|
||
Comment on attachment 352465 [details] [diff] [review]
(Bv2) New summarized log format
r=dbaron
Could you also fix or remove the comments in the initialization of gTestResults, since they currently categorize them incorrectly?
Assignee | ||
Comment 15•16 years ago
|
||
Bv2, with comment 14 suggestion(s).
Attachment #352465 -
Attachment is obsolete: true
Assignee | ||
Comment 16•16 years ago
|
||
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]
http://hg.mozilla.org/mozilla-central/rev/dbe0fab46324
Attachment #352630 -
Attachment description: (Bv2a) New summarized log format → (Bv2a) New summarized log format
[Checkin: Comment 16]
Assignee | ||
Updated•16 years ago
|
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1]
Assignee | ||
Comment 17•16 years ago
|
||
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f7cb13b74007
Attachment #352630 -
Attachment description: (Bv2a) New summarized log format
[Checkin: Comment 16] → (Bv2a) New summarized log format
[Checkin: Comment 16 & 17]
Assignee | ||
Updated•16 years ago
|
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1] → [Bv2a is fixed1.9.1]
Assignee | ||
Comment 18•16 years ago
|
||
Av2, with comment 12 suggestion(s).
(In reply to comment #12)
> Might it be faster if the regular expression checked that REFTEST was at the
> beginning of the line?
This is what .match() does, != .search() ;-)
> Also, given the other changes, it seems like you should require all 3 INFO
> lines to be present, and give an error if they aren't.
I ported the code used elsewhere in this file.
(I'm planning a followup bug to sync all these...)
Asking r? from David for the "layout" p-o-v and Ben for the "BuildBot" p-o-v.
Attachment #352106 -
Attachment is obsolete: true
Attachment #352755 -
Flags: review?(dbaron)
Attachment #352755 -
Flags: review?(bhearsum)
Attachment #352106 -
Flags: review?(ted.mielczarek)
Comment 19•16 years ago
|
||
(In reply to comment #18)
> (In reply to comment #12)
> > Might it be faster if the regular expression checked that REFTEST was at the
> > beginning of the line?
>
> This is what .match() does, != .search() ;-)
How does that expression match, then, given that it doesn't have a .* at the end to match the end of the line?
Comment 20•16 years ago
|
||
Comment on attachment 352755 [details] [diff] [review]
(Av3) Use the new summarized 'REFTEST INFO' counts
Er, never mind my previous question... although I don't see why that makes sense as a primitive operation called "match".
This looks fine to me, although bhearsum should be the main reviewer (or designate someone else to be).
Attachment #352755 -
Flags: review?(dbaron) → review+
Comment 21•16 years ago
|
||
One other thought is that in addition to printing "FAIL" it might be good to make the test actually fail.
Assignee | ||
Comment 22•16 years ago
|
||
Av3, with comment 21 suggestion(s).
*Adds |emphasizeFailureText()| and uses it everywhere.
*Rewrites |summaryText()|:
*Removes superfluous |knownFail|.
*Separates |emphasizeFailureText()| calls for counts and leak.
*Fixes nits in Av3 patch.
***
(In reply to comment #21)
> One other thought is that in addition to printing "FAIL" it might be good to
> make the test actually fail.
Did I understood you correctly or were you thinking about something else/more ?
Attachment #352755 -
Attachment is obsolete: true
Attachment #352819 -
Flags: review?(bhearsum)
Attachment #352755 -
Flags: review?(bhearsum)
Comment 23•16 years ago
|
||
I was saying that if the reftest output can't be parsed, tinderbox ought to turn orange, not just print "FAIL" in the build's table cell.
Assignee | ||
Comment 24•16 years ago
|
||
(In reply to comment #23)
> I was saying that if the reftest output can't be parsed, tinderbox ought to
> turn orange, not just print "FAIL" in the build's table cell.
That's what I thought (and I concur), but I'm not sure how to achieve this:
I would guess |evaluateCommand()| needs to be modified ? (How ?)
Assignee | ||
Comment 25•16 years ago
|
||
Av3a, with nits fixed.
Attachment #352819 -
Attachment is obsolete: true
Attachment #352915 -
Flags: review?(bhearsum)
Attachment #352819 -
Flags: review?(bhearsum)
Updated•16 years ago
|
QA Contact: testing → reftest
Assignee | ||
Updated•16 years ago
|
Attachment #352915 -
Flags: review?(ccooper)
Updated•16 years ago
|
Attachment #352915 -
Flags: review?(bhearsum) → review+
Comment 27•16 years ago
|
||
Comment on attachment 352915 [details] [diff] [review]
(Av3b) Use the new summarized 'REFTEST INFO' counts, ++
Looks fine to me.
Assignee | ||
Updated•16 years ago
|
Attachment #352915 -
Flags: review?(ccooper)
Assignee | ||
Updated•16 years ago
|
Keywords: checkin-needed,
fixed1.9.1
Target Milestone: --- → mozilla1.9.1b3
Version: unspecified → Trunk
Assignee | ||
Updated•16 years ago
|
Whiteboard: [Bv2a is fixed1.9.1] → [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1]
Assignee | ||
Comment 28•16 years ago
|
||
Av3b, with a nit fix.
Ben, could you check this in ?
Attachment #352915 -
Attachment is obsolete: true
Assignee | ||
Updated•16 years ago
|
Whiteboard: [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1] → [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1]
Comment 29•16 years ago
|
||
Unfortunately due to a bug with our l10n builds I can't do enable this right away. I'll make sure this gets in the next downtime though.
Comment 30•16 years ago
|
||
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]
Carried forward review and fixed some bitrot.
Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v <-- steps.py
new revision: 1.20; previous revision: 1.19
done
Attachment #355929 -
Flags: review+
Comment 31•16 years ago
|
||
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]
These changes will be pushed to the master tomorrow during the downtime.
Assignee | ||
Updated•16 years ago
|
Attachment #355929 -
Attachment description: (Av3c) Use the new summarized 'REFTEST INFO' counts, ++ → (Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]
Assignee | ||
Updated•16 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1] → [Bv2a is fixed1.9.1]
Assignee | ||
Comment 32•16 years ago
|
||
(In reply to comment #5)
> What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> (compared to buildbotcustom/*/steps.py)
Ben, would you know the answer to this question ?
(In reply to comment #18)
> (I'm planning a followup bug to sync all these...)
I filed bug 473259.
(In reply to comment #24)
> (In reply to comment #23)
> > I was saying that if the reftest output can't be parsed, tinderbox ought to
> > turn orange, not just print "FAIL" in the build's table cell.
>
> That's what I thought (and I concur), but I'm not sure how to achieve this:
> I would guess |evaluateCommand()| needs to be modified ? (How ?)
David, I still don't know more than what I wrote (comment and patch), then feel
free to file a bug about this...
Assignee | ||
Comment 33•16 years ago
|
||
(syntax) Not yet tested, but asking for (early) review.
David, is that what you meant in comment 21 ?
Attachment #356650 -
Flags: review?(dbaron)
Attachment #356650 -
Flags: review?(ccooper)
Updated•16 years ago
|
Attachment #356650 -
Flags: review?(dbaron)
Comment 34•16 years ago
|
||
Comment on attachment 356650 [details] [diff] [review]
(Cv1) Update evaluateCommand() too
This seems like a good change to make, although it's not what I was talking about in comment 21, and I'm not an appropriate reviewer for this code.
(I was referring to the < 0 checks, I think.)
Comment 35•16 years ago
|
||
(In reply to comment #32)
> (In reply to comment #5)
> > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > (compared to buildbotcustom/*/steps.py)
>
> Ben, would you know the answer to this question ?
>
Are you asking about CVS or Mercurial?
I think you're asking about CVS, in which case...only mozilla/tools/buildbot-configs/testing/unittest and mozilla/tools/buildbot-configs/testing/unittest-stage are relevant. leaktest/ and moz2unit/ are not used anymore.
Assignee | ||
Comment 36•16 years ago
|
||
Cv1, fixed and tested.
match() doesn't care for re.MULTILINE mode,
per http://docs.python.org/library/re.html#matching-searching
Attachment #356650 -
Attachment is obsolete: true
Attachment #356710 -
Flags: review?(ccooper)
Attachment #356650 -
Flags: review?(ccooper)
Assignee | ||
Comment 37•16 years ago
|
||
(In reply to comment #34)
> (I was referring to the < 0 checks, I think.)
Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't it ?
(In reply to comment #35)
> (In reply to comment #32)
> > (In reply to comment #5)
> > > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > > (compared to buildbotcustom/*/steps.py)
> >
> > Ben, would you know the answer to this question ?
>
> Are you asking about CVS or Mercurial?
Cvs, yes; are these files in Hg ?
http://mxr.mozilla.org/mozilla/find?string=%2Funittest.*%2Fmozbuild.py$
These files seem to be "near" copies of steps.py,
so I wonder what they are used for and if they should be kept in sync ?
Updated•16 years ago
|
Attachment #356710 -
Flags: review?(ccooper) → review+
Comment 38•16 years ago
|
||
Comment on attachment 356710 [details] [diff] [review]
(Cv1a) Update evaluateCommand() too
[Checkin: Comment 38]
Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v <-- steps.py
new revision: 1.21; previous revision: 1.20
done
Assignee | ||
Updated•16 years ago
|
Attachment #356710 -
Attachment description: (Cv1a) Update evaluateCommand() too → (Cv1a) Update evaluateCommand() too
[Checkin: Comment 38]
Comment 39•16 years ago
|
||
(In reply to comment #37)
> (In reply to comment #34)
> > (I was referring to the < 0 checks, I think.)
>
> Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't
> it ?
No, I meant making the tinderbox turn orange.
Assignee | ||
Comment 40•16 years ago
|
||
(In reply to comment #39)
> No, I meant making the tinderbox turn orange.
David, *what* (would) makes the tinderbox turn orange ?
(if not comment 36 evaluateCommand() either)
Comment 41•16 years ago
|
||
The case where it prints "FAIL" because it can't parse the log.
Assignee | ||
Comment 42•16 years ago
|
||
David, I'm sorry but your answers don't seem to be enough for me to understand what should be done (exactly):
could you rather point me to some code (examples) of what/where ?
Comment 43•16 years ago
|
||
In that patch you had the code:
+ if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount < 0:
+ summary += "FAIL\n"
I'm saying that this should also have code to make the tinderbox turn orange.
Assignee | ||
Comment 44•16 years ago
|
||
(In reply to comment #43)
Oh, let's see:
> In that patch you had the code:
your comment 21 (correctly) referred to Av3 patch.
> + if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount <
> 0:
> + summary += "FAIL\n"
Then, in comment 22 Av3a patch, I enhanced it to
+ summary += '%s\n' % emphasizeFailureText('FAIL')
and that's what (in Av3c patch) got checked in.
> I'm saying that this should also have code to make the tinderbox turn orange.
Then, since your comment 23 answer, I'm under the impression that you think this didn't solve your request.
Did we simply misunderstood each other ?
(Or do you think there actually is something more to do yet ?)
Comment 45•16 years ago
|
||
(In reply to comment #44)
> Then, in comment 22 Av3a patch, I enhanced it to
> + summary += '%s\n' % emphasizeFailureText('FAIL')
> and that's what (in Av3c patch) got checked in.
emphasizeFailureText puts <em class="testfail">...</em> around the text.
> > I'm saying that this should also have code to make the tinderbox turn orange.
>
> Then, since your comment 23 answer, I'm under the impression that you think
> this didn't solve your request.
> Did we simply misunderstood each other ?
> (Or do you think there actually is something more to do yet ?)
Are you saying that there's code somewhere else in buildbot that magically turns the build orange if there's text with '<em class="testfail">' in it?
If not, there's more to do.
Assignee | ||
Updated•15 years ago
|
Whiteboard: [Bv2a is fixed1.9.1] → [fixed1.9.1b3: Bv2a]
You need to log in
before you can comment on or make changes to this bug.
Description
•