Closed Bug 1261199 Opened 8 years ago Closed 8 years ago

Use StructuredOutputParser for parsing reftest test results

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(firefox51 fixed)

RESOLVED FIXED
Tracking Status
firefox51 --- fixed

People

(Reporter: ahal, Assigned: pyang)

References

Details

Attachments

(1 file, 4 obsolete files)

Reftest emits structured logs, but mozharness currently doesn't consume them. It is still parsing the old formatted logs using a set of regexes. We should make sure to use the StructuredOutputParser class instead of DesktopUnittestParser:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#502

We'll need to be careful that we don't miss failures that are hidden in unstructured output. This might mean running the same set of regexes when we come across unstructured output.
Attached patch reftest-structuredlog (obsolete) (deleted) — Splinter Review
Adding reftest for using structured log
Attachment #8755752 - Flags: feedback?(ahalberstadt)
Attached patch reftest-structuredlog (obsolete) (deleted) — Splinter Review
Attachment #8755752 - Attachment is obsolete: true
Attachment #8755752 - Flags: feedback?(ahalberstadt)
Andrew, we don't see issues for enabling strcuturedlog for reftest.  any comments?
Flags: needinfo?(ahalberstadt)
Awesome! So at the very minimum, I'd like to see:

1) Force a test-failure, make sure job turns orange. You can change an == to a != in the reftest manifest to do this.
2) Force a leak, make sure job turns orange. You can remove an asserts directive in the manifest to do this.
3) Force a crash, make sure job turns red. You can add a proc.kill(sig=singal.SIGABRT) somewhere in the harness to do this (after sleeping a bit to give the harnes a chance to start running tests).

For all of the above we also need to make sure the appropriate failure messages appear in the log. I'd recommend doing a push both with and without your patch here, and compare the logs of the two. We really need better tests for all of this, but unfortunately we do not.
Flags: needinfo?(ahalberstadt)
(In reply to Andrew Halberstadt [:ahal] from comment #5)
> For all of the above we also need to make sure the appropriate failure
> messages appear in the log. I'd recommend doing a push both with and without
> your patch here, and compare the logs of the two. We really need better
> tests for all of this, but unfortunately we do not.

does it mean to push 3 x 2 (with & without) pushes? Or we need combinations of those 3 conditions.
No, you don't need to do combinations.. you could even do all 3 in the same push if you wanted (i.e in different chunks)
Hi ahal, here are force error changeset and result

Force crash changeset
https://hg.mozilla.org/try/rev/ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57

treeherder result
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff2dc3f806a6ffd1b699e86ac1c6366b07c39a57

for the red suites, main process was killed so no 'SUITE-END' log.
Flags: needinfo?(ahalberstadt)
Haha nice! I like your time bomb. I see what you were saying earlier.. it would be good to get a little module we could use to test certain conditions like that somewhere so we can re-use it in other places. So it looks like there are a couple things to investigate here:

1) It looks like anytime something goes wrong, no suite_end message is being emitted. We should always be emitting this even if there are failures or crashes

2) When there is a crash, we don't generate a stack trace. Usually there should be a long spew of addresses and a PROCESS-CRASH log line. I'm not sure why this is not happening :/

3) The output format is slightly different from before. I think this is because we aren't using the customized reftest formatter here:
https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output.py#13

We'll need to make sure that is the formatter being used to log to stdout. Let me know if you have anymore questions about these problems, or need a hand looking into them.
Flags: needinfo?(ahalberstadt)
(In reply to Andrew Halberstadt [:ahal] from comment #10)
> Haha nice! I like your time bomb. I see what you were saying earlier.. it
> would be good to get a little module we could use to test certain conditions
> like that somewhere so we can re-use it in other places. So it looks like
> there are a couple things to investigate here:
> 
> 1) It looks like anytime something goes wrong, no suite_end message is being
> emitted. We should always be emitting this even if there are failures or
> crashes
supposedly suite_end should be emitted by runner process. When it was forced killed, does it make sense to send suite_end by manage process?  or we need to add signal handler for those situations.

> 
> 2) When there is a crash, we don't generate a stack trace. Usually there
> should be a long spew of addresses and a PROCESS-CRASH log line. I'm not
> sure why this is not happening :/
> 

ya agree.  Should add more log/stack trace while process lost.

> 3) The output format is slightly different from before. I think this is
> because we aren't using the customized reftest formatter here:
> https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/output.
> py#13
> 
> We'll need to make sure that is the formatter being used to log to stdout.
Not really familiar with its format.  Will check if we need to add another patch for that.


> Let me know if you have anymore questions about these problems, or need a
> hand looking into them.
Thanks!  Those are really helpful :D
Depends on: 1286513
Attached patch enable-reftest (obsolete) (deleted) — Splinter Review
leave only config patch here.
Attachment #8755827 - Attachment is obsolete: true
Attached patch enable-reftest (obsolete) (deleted) — Splinter Review
Adding structured log enabling part
Attachment #8772705 - Attachment is obsolete: true
Comment on attachment 8779692 [details] [diff] [review]
enable-reftest

I re-triggered failed tests and looks green.
Try wasn't stable in these couple of days, maybe we can run them again.
Attachment #8779692 - Flags: review?(ahalberstadt)
Comment on attachment 8779692 [details] [diff] [review]
enable-reftest

Review of attachment 8779692 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks good! I still want to do a bit of testing before we land this though, I'll do a new try run. For future reference, you can use '-u reftest,crashtest,jsreftest' for your try push, as we only care about those suites with this patch.
Attachment #8779692 - Flags: review?(ahalberstadt) → review+
Nice that try run includes some known intermittents and a crash, both seem to get logged correctly! Paul could you upload a new patch with a proper commit message?
Flags: needinfo?(pyang)
I did find one minor "problem" that we need some sheriff input on.

When tests fail to start, or there is a crash before the tests finish, DesktopUnittestOutputParser prints "No tests run or summary not found":
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/unittest.py#191

However when this happens with the StructuredOutputParser, it prints either "No checks run" and/or "No suite_end message was emitted by this harness":
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/structuredlog.py#15

The StructuredOutputParser behaviour is strictly better than the old message, as it can differentiate between "tests didn't start" and "tests didn't finish". There are two problems with changing this message though:

1) Treeherder regexes need to be updated to flag these two new messages
2) Existing intermittent bugs filed with the old message will no longer be starrable

Problem 1) should be fixed either way, it's just a matter of do we need to block on it. Problem 2) will cause some pain for the sheriffs. A potential compromise would be to temporarily emit all three messages from the StructuredOutputParser, which will give sheriffs some time to migrate over to the new message from the old.

Philor, Wes, what would you prefer we do here?
Flags: needinfo?(wkocher)
Flags: needinfo?(philringnalda)
Depends on: 1294703
If you look at the existing bugs on the current message, you'll see that nobody is working on any of them, and that there's no understanding whatsoever of what the message means or why it is output or when it should or should not be starred or which bug it should be starred as, so it won't hurt anything if these classes of failure have no starrable message for a while.
Flags: needinfo?(philringnalda)
Thanks good to know.

Also I'm an idiot, treeherder already flags the two new message because they are logged with ERROR. I didn't pay close enough attention to the failure summary. This means problem 1) is already solved. And given that starring the old message is already terrible, I guess there's nothing extra to be done here.
Flags: needinfo?(wkocher)
Attached patch enable-reftest (deleted) — Splinter Review
modifying only commit message. carry on review.
Thanks!
Attachment #8779692 - Attachment is obsolete: true
Flags: needinfo?(pyang)
Attachment #8780967 - Flags: review+
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/95aef368faed
Use StructuredOutputParser for parsing reftest test results; r=ahal
Thought I should mention that I tested this by writing a patch that runs both the DesktopUnittestOutputParser and the StructuredOutputParser on the same log. It then compares the TBPL status of both, and passes the job if they're the same, and fails the job if they're different:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3abf6cb08b972ba70af6f0941f0d6e2d2bfda7d

All green means that both output parsers appear to be flagging the same sets of errors. It's not 100% fool proof, but much better than manually comparing logs.
It should have found a difference on 330 seconds without output hangs, since we have been failing to actually parse anything for them other than the utterly useless "crash" that we cause after them, since somewhere around February, a thing which this now fixed.
Yeah, I wasn't able to pipe logs that were printed directly to stdout (i.e not through the logger) to the StructuredOutputParser in my test patch. I presume the 330s timeout message was one of those.

But when the StructuredOutputParser encounters one of these in production, it passes it on to the base class. Which means unstructured output is now going through the base OutputParser instead of DesktopUnittestOutputParser. This is probably what inadvertently fixed it.
https://hg.mozilla.org/mozilla-central/rev/95aef368faed
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1313265
Depends on: 1321127
Depends on: 1325148
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: