Closed Bug 385248 Opened 17 years ago Closed 15 years ago

[client] tinderbox should (sometimes?) run tools/rb/fix-linux-stack.pl / fix-macosx-stack.pl over log

Categories

(Firefox Build System :: General, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.2b1

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

Attachments

(1 file, 1 obsolete file)

Tinderbox logs sometimes have crash stacks or assertion stacks in them.  However, on Linux, these stacks aren't very useful by default, unless tools/rb/fix-linux-stack.pl is run over the log.  We should do this so that the logs are more useful.  It needs to be run on the machine with the build, before the build is deleted, since it's getting debugging information out of the libraries.

(If we do this, we should also follow up by installing some *-debuginfo RPMs on the tinderboxes.)
Summary: [client] tinderbox should (sometimes?) run tools/rb/fix-linux-stack.pl over log → [client] tinderbox should (sometimes?) run tools/rb/fix-linux-stack.pl / fix-macosx-stack.pl over log
See also bug 432385 for doing this over the leak stats differences.
Severity: normal → enhancement
What can I do to help this happen?  I'm trying to sort out orange on tbox due to an assert that would take all of 2 minutes to pin down if I knew what the immediate caller of the function asserting is.  As it is, I've now spent the better part of an hour on it, and counting.
This sounds like we just need to set up some cron jobs?  What needs to happen?  I can work with IT to speed this up.
I think comment 0 sort of sums it up.  In case of a crash or fatal assert, we need to run this script on the log.  If it's easier to run in all cases, we should just do that.
Oh, and I don't see how a cron job would do the right thing.
Yeah, re-read comment 0 -- I was hoping we could run the script on logs in a batch but I guess that's not possible.  Some questions I had in an email to dbaron:

* who worked on this in the past?  who can work on it now?
* where is the code we'd be changing?
* is there a test env for us to test this in?
Regarding who works on it, I'm really just figuring out who could be a contact for info about how to setup/install/etc.
IT would have nothing to do with build machines.
Assignee: morgamic → nobody
Component: Tinderbox → Release Engineering
Product: Webtools → mozilla.org
QA Contact: tinderbox → release
Version: Trunk → other
Thanks, really helpful.
Setup/install what?  The script?  It's just at tools/rb/fix-linux-stack.pl (or fix-macosx-stack.pl) in the m-c tree.  If you have a source tree, you have it.  I guess on the test machines we don't have the source tree, so we might need to copy it over or something.....  It needs no setup; just needs to be run.
Back when this bug was filed (or at least pretty shortly before that point), there was no component for tinderbox client bugs other than Webtools/Tinderbox.  That's why it was in that component, marked "[client]", meaning that it was a bug in the code run on the build machines.

All this needs is a change to the buildbot scripts to run the scripts over the logs that come from debug builds.
Ok - thanks guys.  Will check with releng and see who can help.
So what type of builds would this need to run on in modern builds ? leak test (debug) and unit (for the stacks on crash) ? 

For the former, can we teach leaktest.py to pipe the crash stack through fix-*-stack.pl ?  Could you be more specific which "logs" need to be processed.
Just anything that has a stack.  If you can pipe just the stack through it, that's perfect.
This script should be run, per unittest job. Doing this as a cron job wont always work right, depending on how quickly the slave cleans up to start another job.

afaict, the best approach is to have these scripts called as part of the existing Makefile targets that we run per build. Probably just near the end, after the logs are produced, but before the Makefile returns - but I dont know details of how the scripts work.

I see from comment#10 that the scripts are already checked in. So, it seems like remaining steps are to change the Makefile targets?

Assigning over to BuildConfig - Ted, does that seem like a fair summary to you?
Component: Release Engineering → Build Config
Product: mozilla.org → Firefox
QA Contact: release → build.config
Version: other → unspecified
Basically "anything that has a stack" means "any stdout output from a debug build", since debug builds will print stacks for both assertions and crashes.

I think this means it has to happen in the buildbot scripts.
Component: Build Config → Release Engineering
Product: Firefox → mozilla.org
QA Contact: build.config → release
Version: unspecified → other
Er, I suppose if everything's in the tree you're right that it goes in Firefox::Build Config, although I'm not sure if that's the case for the alive test, which is often the first (and only, if its failure makes the rest of the tests not run) one to fail when something goes wrong.

So moving back to build config, and if we fix it there and alive test is still an issue, we'll move it back.
Component: Release Engineering → Build Config
Product: mozilla.org → Firefox
QA Contact: release → build.config
Version: other → Trunk
And from nthomas, the commands the leak test builders currently run are:

python leaktest.py -- -register
python leaktest.py -- -CreateProfile default
python leaktest.py -l bloat.log
python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks=sdleak.log
obj-firefox/dist/bin/leakstats ../malloc.log
obj-firefox/dist/bin/leakstats ../malloc.log.old
bash -c perl build/tools/trace-malloc/diffbloatdump.pl --depth=15 --use-address
  /dev/null sdleak.log > sdleak.tree
/bin/bash -c perl build/tools/rb/fix-linux-stack.pl sdleak.tree.raw > sdleak.tree
perl build/tools/trace-malloc/diffbloatdump.pl --depth=15 sdleak.tree.old
  sdleak.tree
(In reply to comment #18)
> And from nthomas, the commands the leak test builders currently run are:
> 
> python leaktest.py -- -register
> python leaktest.py -- -CreateProfile default
> python leaktest.py -l bloat.log
> python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks=sdleak.log
> obj-firefox/dist/bin/leakstats ../malloc.log
> obj-firefox/dist/bin/leakstats ../malloc.log.old
> bash -c perl build/tools/trace-malloc/diffbloatdump.pl --depth=15 --use-address
>   /dev/null sdleak.log > sdleak.tree
> /bin/bash -c perl build/tools/rb/fix-linux-stack.pl sdleak.tree.raw >
> sdleak.tree
> perl build/tools/trace-malloc/diffbloatdump.pl --depth=15 sdleak.tree.old
>   sdleak.tree

Ted: how about moving all that (and dbaron's additional script request) into a new Makefile target?

("make leaktest" was being discussed in irc, but I don't care much what its called.)
That's a somewhat abbreviated list, leaving out previous log downloads and new log uploads. For the full list I recommend searching for "BuildStep started" in a leak test log, or reading 
http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l444
http://hg.mozilla.org/build/buildbotcustom/file/tip/steps/test.py#l45
(In reply to comment #19)
> Ted: how about moving all that (and dbaron's additional script request) into a
> new Makefile target?

If we want to do that, can we do it in a different bug?

This bug has accumulated far too many unrelated comments already today.
(In reply to comment #21)
> (In reply to comment #19)
> > Ted: how about moving all that (and dbaron's additional script request) into a
> > new Makefile target?
> 
> If we want to do that, can we do it in a different bug?

Ted, let me know if you want a separate bug filed - I'd be happy to oblige! :-)


> This bug has accumulated far too many unrelated comments already today.
I have a work-in-progress patch that works for me at
http://hg.mozilla.org/users/dbaron_mozilla.com/patches/file/tip/run-output-through-stack-fixer
though I still need to do a lot more testing, and I suspect (for portability) that I also need to call fix-linux-stack/fix-macosx-stack by calling perl rather than by calling the script directly.
Assignee: nobody → dbaron
OS: Linux → All
Priority: -- → P3
Product: Firefox → Core
QA Contact: build.config → build-config
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2
Attached patch patch (obsolete) (deleted) — Splinter Review
I think this is at least the bulk of what's needed here.

I've tested that it fixes up stacks when running leaktest (both regular and trace-malloc commands), reftest, and mochitest on my Linux debug build.

I also sent it to try server with the "if IS_DEBUG_BUILD" changed to "if True" so that it would actually get run under the build automation; mostly green, though I got what I have to presume is a random-orange timeout in the middle of mochitest-plain on Mac.  (Unit tests passed fully on Windows and Linux, and everything other than mochitest-plain passed on Mac.)


The core of the patch is pretty simple; in the case where we're piping the output (i.e., no debugger attached), stick an extra pipe in the chain if appropriate.  A few things of note:

 * I don't have a whole lot of experience writing python, so you should review my python somewhat closely

 * I added two new global variables in automation.py (PERL and IS_LINUX), but I didn't put them in the __all__ list, which I presume is the list of stuff the module exports, because I tend to think other callers should avoid using them if possible.  (Especially so for IS_LINUX, since we want people to use UNIXISH in general, but in this case fix-linux-stack is actually only known to work on Linux; I'm not sure if any other Unixish platforms ship addr2line (part of GNU binutils).)

 * I install fix-linux-stack or fix-macosx-stack to dist/bin/ from the Makefile in build/ because I didn't want to deal with adding one for tools/rb and figuring out where it should fit in the build process.  They're currently installed ifdef ENABLE_TESTS (that ifdef starts just above the context of the diff), although we only actually use them in debug builds because I think debug builds are the only ones that print stacks to stdout in their default configuration.
Attachment #392979 - Flags: review?(ted.mielczarek)
Attachment #392979 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 392979 [details] [diff] [review]
patch

This looks good, with the caveat that it's not going to work right if we start running unittests on packaged debug builds. Since we aren't currently doing that, I'm ok with you checking this in, as long as you file a follow-up bug on fixing that. FWIW, what would need fixing is:
1) Include the fix-stack scripts in the test package, probably just by putting them in the list here: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/Makefile.in#92
2) Provide some way to override PERL from the command line.

Comments on the patch itself:
+
+ifeq ($(OS_ARCH),Darwin)
+libs:: $(topsrcdir)/tools/rb/fix-macosx-stack.pl
+	$(INSTALL) $< $(DIST)/bin
 endif
+
+ifeq ($(OS_ARCH),Linux)
+libs:: $(topsrcdir)/tools/rb/fix-linux-stack.pl
+	$(INSTALL) $< $(DIST)/bin
+endif
+endif

Do you have an extra endif there? (Did you mean to have an ifdef ENABLE_TESTS on top?)

+        stackFixerProcess = Process([PERL, os.path.join(utilityPath, stackFixerCommand)], stdin=logsource, stdout=subprocess.PIPE)

I wonder if you ought to be calling .wait() on this process later, to ensure that we don't wind up with zombies.
(In reply to comment #25)
> Do you have an extra endif there? (Did you mean to have an ifdef ENABLE_TESTS
> on top?)

It's in an existing ENABLE_TESTS; the "extra" is just because diff decided to put the existing one in the middle rather than at the end (note the one without a + that looks like it ought to have one).
Attached patch patch [Checkin: Comment 29] (deleted) — Splinter Review
Here's a revised patch that I think addresses all your comments other than the passing perl as an argument issue.  (Not quite sure how to approach that given how many ways automation.py is used; maybe an environment variable would be better?)

So the changes here are:
 * waiting for the stack fixer process, and reporting a failure if it returns nonzero status

 * adding fix-linux-stack.pl / fix-macosx-stack.pl to the test package
Attachment #392979 - Attachment is obsolete: true
Attachment #393560 - Flags: review?(ted.mielczarek)
Attachment #393560 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 393560 [details] [diff] [review]
patch
[Checkin: Comment 29]

Looks good. Yeah, an environment variable for PERL would probably work. For now, this will work fine if we get debug unittests on packaged builds, since all our build slaves will have perl in the same place.
http://hg.mozilla.org/mozilla-central/rev/219bd7354882
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.9.2 → mozilla1.9.2b1
And filed bug 509620 as a followup on the path-to-perl issue.
I should probably also file separate bugs for xpcshell-tests and maybe make check.
|make check| doesn't seem worth the effort, since it's a bunch of standalone C++ programs scattered about various makefiles.

xpcshell-tests should be easy, the patch will probably look very similar to this, in runxpcshelltests.py.
I filed bug 510550 for runxpcshelltests.py.
Attachment #393560 - Attachment description: patch → patch [Checkin: Comment 29]
Depends on: 531479
blocking-b2g: --- → koi?
blocking-b2g: koi? → ---
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: