Closed
Bug 679759
Opened 13 years ago
Closed 13 years ago
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
Categories
(Testing :: Mochitest, defect)
Testing
Mochitest
Tracking
(firefox-esr1012+ fixed)
RESOLVED
FIXED
mozilla11
People
(Reporter: ted, Assigned: wlach)
References
Details
(Whiteboard: [buildfaster:p1][qa-])
Attachments
(1 file, 5 obsolete files)
(deleted),
patch
|
lsblakk
:
approval-mozilla-esr10+
|
Details | Diff | Splinter Review |
In bug 563745 and bug 563678 I implemented a CGI script that would run minidump_stackwalk, with the goal of making it so unit test machines never had to download symbol files to process crashes. We tried running this in production, and the web server running the CGI got swamped and we had to turn it off.
catlee tossed out an alternate idea, which is to simply make the test harnesses download the symbols as needed, only if they need them to process a crash.
This sounds reasonably easy to do, and should get us most of the benefits of the other approach with fewer moving parts.
Comment 1•13 years ago
|
||
Do crashes that occur but are expected still require processing? That could negate the benefit of this.
Reporter | ||
Comment 2•13 years ago
|
||
They shouldn't, since we fixed bug 642175.
Updated•13 years ago
|
Whiteboard: [buildfaster:p1]
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → wlachance
Assignee | ||
Comment 3•13 years ago
|
||
For some reason I volunteered to take this bug on. :)
Comment 4•13 years ago
|
||
For the future, see https://bugzilla.mozilla.org/show_bug.cgi?id=675688
Comment 5•13 years ago
|
||
I'm not using automation.py, but I took the checkForCrashes method for my own harness (which uses mozbase) and stripped out the CGI script bits:
https://github.com/ahal/peptest/blob/4ed9090aaee9cb80dda0cc09d2d5fed061c80de7/runpeptests.py#L276
Assignee | ||
Comment 6•13 years ago
|
||
This patch does what it says, or tries to. :)
Attachment #568145 -
Flags: review?(ted.mielczarek)
Reporter | ||
Comment 7•13 years ago
|
||
Comment on attachment 568145 [details] [diff] [review]
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
Review of attachment 568145 [details] [diff] [review]:
-----------------------------------------------------------------
This mostly looks good, but we don't require Python 2.6 yet, so I have to r- it.
::: build/automationutils.py
@@ +109,5 @@
> except:
> testName = "unknown"
>
> foundCrash = False
> + eraseSymbolsPath = False
I would probably call this "removeSymbolsPath", but that's pretty nitpicky.
@@ +124,5 @@
> + symbolsFile = tempfile.TemporaryFile()
> + symbolsFile.write(data.read())
> + # extract symbols to a temporary directory (which we'll delete after
> + # processing all crashes)
> + # can't use extractall because of python 2.4 compat requirements for talos
Uh, you say this, but then you go ahead and use it below! extractall is a 2.6-ism, unfortunately. I'd be fine with just shelling out to "unzip" here for simplicity.
Attachment #568145 -
Flags: review?(ted.mielczarek) → review-
Assignee | ||
Comment 8•13 years ago
|
||
This should address both elements of the review (erase->remove, remove py2.6ism).
Sorry about the python 2.6ism, I thought we did depend on py2.6 for mozilla-central at the moment (I decided I'd simplify things a bit, without removing a comment that was specific to Talos).
Attachment #568145 -
Attachment is obsolete: true
Attachment #569508 -
Flags: review?(ted.mielczarek)
Reporter | ||
Comment 9•13 years ago
|
||
This looks like the exact same patch. Did you attach the wrong patch?
Assignee | ||
Comment 10•13 years ago
|
||
(this is the actual patch I intended to attach last time)
Attachment #569508 -
Attachment is obsolete: true
Attachment #569508 -
Flags: review?(ted.mielczarek)
Attachment #569654 -
Flags: review?(ted.mielczarek)
Reporter | ||
Comment 11•13 years ago
|
||
Comment on attachment 569654 [details] [diff] [review]
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
Review of attachment 569654 [details] [diff] [review]:
-----------------------------------------------------------------
I don't think this is quite going to work, but it's pretty close. Sorry to make you go through multiple revisions here!
::: build/automationutils.py
@@ +124,5 @@
> + symbolsFile = tempfile.TemporaryFile()
> + symbolsFile.write(data.read())
> + # extract symbols to a temporary directory (which we'll delete after
> + # processing all crashes)
> + # can't use extractall because of python 2.4 compat requirements
Might want to say 2.5 here, since extractall is 2.6. (That way when we do require 2.6 everywhere someone might notice and clean this up.)
@@ +129,5 @@
> + symbolsPath = tempfile.mkdtemp()
> + zfile = zipfile.ZipFile(symbolsFile)
> + for name in zfile.namelist():
> + (dirname, filename) = map(lambda p: os.path.join(symbolsPath, p),
> + os.path.split(name))
I always use list comprehensions instead of map, they feel more Pythonic.
Also, I'm not sure this is really doing what you want it to do. If one of the zip entries is "foo/bar.txt", and symbolsPath is "/tmp/xyz", then you're going to wind up with:
dirname = "/tmp/xyz/foo", filename = "/tmp/xyz/bar.txt"
You probably want to write this whole thing more like:
for name in zfile.namelist():
filename = os.path.join(symbolsPath, name)
dirname = os.path.dirname(filename)
...
@@ +132,5 @@
> + (dirname, filename) = map(lambda p: os.path.join(symbolsPath, p),
> + os.path.split(name))
> + if dirname and not os.path.exists(dirname):
> + os.makedirs(dirname)
> + f = open(filename, "w")
with open(filename, "w") as f: (although if you're going to copy-paste this code for Talos with Python 2.4 I guess you can leave it alone for now).
@@ +169,5 @@
> os.remove(extra)
> foundCrash = True
>
> + if removeSymbolsPath:
> + shutil.rmtree(symbolsPath)
I wonder if you shouldn't wrap this entire block in try, and put this in a finally block?
Attachment #569654 -
Flags: review?(ted.mielczarek) → review-
Assignee | ||
Comment 12•13 years ago
|
||
Ok, I hope this patch should address your issues. :)
1. I added the try..finally block.
2. Not in reaction to your review, I moved the logic of downloading the symbols from a URL outside the crash processing loop. Seems to make a lot more sense that way.
3. For the rest of your comments, I just realized that we already subclassed ZipFileReader for this exact reason inside automation.py.in. I thus moved this class over and started using it. No point in having two slightly-different copies of this kind of (complicated) code.
Attachment #569654 -
Attachment is obsolete: true
Attachment #571378 -
Flags: review?(ted.mielczarek)
Reporter | ||
Comment 13•13 years ago
|
||
Comment on attachment 571378 [details] [diff] [review]
MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
Review of attachment 571378 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, thanks! I should have remembered that ZipFile stuff in automation.py, since I reviewed it not that long ago...
Attachment #571378 -
Flags: review?(ted.mielczarek) → review+
Comment 15•13 years ago
|
||
Keywords: checkin-needed
Comment 16•13 years ago
|
||
This is causing orange on every single platform, with failures like the following:
{
python reftest/runreftest.py --appname=firefox/firefox-bin --utility-path=bin --extra-profile-file=bin/plugins --symbols-path=symbols reftest/tests/testing/crashtest/crashtests.list
in dir /home/cltbld/talos-slave/test/build (timeout 1200 secs) (maxTime 7200 secs)
watching logfiles {}
argv: ['python', 'reftest/runreftest.py', '--appname=firefox/firefox-bin', '--utility-path=bin', '--extra-profile-file=bin/plugins', '--symbols-path=symbols', 'reftest/tests/testing/crashtest/crashtests.list']
environment:
[...]
using PTY: False
/home/cltbld/talos-slave/test/build/reftest/automationutils.py:109: Warning: 'with' will become a reserved keyword in Python 2.6
Traceback (most recent call last):
File "reftest/runreftest.py", line 48, in <module>
from automation import Automation
File "/home/cltbld/talos-slave/test/build/reftest/automation.py", line 20, in <module>
import automationutils
File "/home/cltbld/talos-slave/test/build/reftest/automationutils.py", line 109
with open(filename, "wb") as dest:
^
SyntaxError: invalid syntax
program finished with exit code 1
elapsedTime=0.111890
TinderboxPrint: crashtest<br/><em class="testfail">T-FAIL</em>
Unknown Error: command finished with exit code: 1
}
Backing out. Please fix and re-land only after this has been TryServer-tested.
Comment 17•13 years ago
|
||
actually mbrubeck beat me to it. Backout was:
https://hg.mozilla.org/integration/mozilla-inbound/rev/72592670532d
Version: unspecified → Trunk
Comment 18•13 years ago
|
||
Try run for 97eb160935b0 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=97eb160935b0
Results (out of 169 total builds):
success: 149
warnings: 19
failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-97eb160935b0
Assignee | ||
Comment 19•13 years ago
|
||
Ok, breaking down this try server run:
Failures:
1. A timeout in jetpack. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-97eb160935b0/try-linux64/try_fedora64_test-jetpack-build112.txt.gz
Warnings:
Most of these are problems in Lion, which we don't support yet. Of the remainder, I looked at them to see what was going on:
1. Another warning in a build on Linux. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-97eb160935b0/try-linux/try-linux-build3283.txt.gz
2. Warnings during a mochitest run on mac snow leopard. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-97eb160935b0/try-macosx64-debug/try_snowleopard-debug_test-mochitest-other-build48.txt.gz
3. Warning during a reftest run on android. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/wlachance@mozilla.com-97eb160935b0/try-android/try_tegra_android_test-reftest-1-build3040.txt.gz
4. Warning during a reftest run on android. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/wlachance@mozilla.com-97eb160935b0/try-android/try_tegra_android_test-reftest-2-build3094.txt.gz
5. Warning during a mochitest run on Linux. Unrelated to this change. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-97eb160935b0/try-linux64-debug/try_fedora64-debug_test-mochitest-other-build204.txt.gz
Assignee | ||
Comment 20•13 years ago
|
||
Ok, here's a hopefully final take. It fixes the bug with the use of "with" that we noticed after pushing the first time (oops). I also noticed at the last minute that we're still shipping poster.zip as part of the tree, even though it's no longer needed. So I deleted that. Doing one last try server run, can we push assuming that no new oranges occur there?
Attachment #571378 -
Attachment is obsolete: true
Attachment #573004 -
Flags: review?(ted.mielczarek)
Comment 21•13 years ago
|
||
Try run for 1dbb678f75fc is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=1dbb678f75fc
Results (out of 18 total builds):
failure: 18
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-1dbb678f75fc
Comment 22•13 years ago
|
||
Try run for fc5ce15dbe5a is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=fc5ce15dbe5a
Results (out of 18 total builds):
exception: 18
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-fc5ce15dbe5a
Comment 23•13 years ago
|
||
Try run for a56ddb64ab16 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=a56ddb64ab16
Results (out of 214 total builds):
success: 194
warnings: 19
failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-a56ddb64ab16
Assignee | ||
Comment 24•13 years ago
|
||
(In reply to Mozilla RelEng Bot from comment #23)
> Try run for a56ddb64ab16 is complete.
> Detailed breakdown of the results available here:
> https://tbpl.mozilla.org/?tree=Try&rev=a56ddb64ab16
> Results (out of 214 total builds):
> success: 194
> warnings: 19
> failure: 1
> Builds available at
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.
> com-a56ddb64ab16
Ok, there don't seem to be any oranges in this build that could be caused by my change AFAICT.
Reporter | ||
Comment 25•13 years ago
|
||
Comment on attachment 573004 [details] [diff] [review]
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
Review of attachment 573004 [details] [diff] [review]:
-----------------------------------------------------------------
::: build/automationutils.py
@@ +105,5 @@
> + else:
> + path = os.path.split(filename)[0]
> + if not os.path.isdir(path):
> + os.makedirs(path)
> + dest = open(filename, "wb")
Don't go removing with blocks. You just need to import them from the future up top, like automation.py does:
http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#40
Attachment #573004 -
Flags: review?(ted.mielczarek) → review+
Comment 26•13 years ago
|
||
Try run for 6da8df028cc7 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=6da8df028cc7
Results (out of 209 total builds):
success: 184
warnings: 25
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-6da8df028cc7
Assignee | ||
Comment 27•13 years ago
|
||
This imports with from future, as desired.
Attachment #573004 -
Attachment is obsolete: true
Assignee | ||
Comment 28•13 years ago
|
||
The latest patch passes try and gets a positive review. Please commit when ready.
Keywords: checkin-needed
Comment 29•13 years ago
|
||
Comment 30•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 31•13 years ago
|
||
Comment on attachment 573874 [details] [diff] [review]
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
This is decidedly inconvenient.
This patch landed just before 10 merged from m-c, but was backed out, and then landed after, for 11. But then bug 561754 maybe sort of forgot that, and treated esr-10 as though it was able to download symbols on demand. That means that now buildbot doesn't download symbols while downloading a build for tests, and doesn't pass a local path, but automation doesn't know what to do with a remote symbols path, and so we get useless stacks from crashes like https://tbpl.mozilla.org/php/getParsedLog.php?id=10831561&tree=Mozilla-Esr10 which might be some known crash, or might be a new crash introduced by that push, and there's no way of telling.
We have two choices: land this patch on esr-10 (it applies cleanly, I have no idea whether it actually works or how to tell other than by landing it), or, give esr-10 the same "don't download on demand" buildbot treatment that bug 561754 gave 1.9.2 (which we have absolutely no reason to believe works, we haven't actually seen a crash on 1.9.2 since March 1st, and the other thing that wasn't downloading symbols on demand, Android, now is because when it wasn't, it wasn't getting stacks for crashes).
Attachment #573874 -
Flags: approval-mozilla-esr10?
Updated•13 years ago
|
status-firefox-esr10:
--- → affected
tracking-firefox-esr10:
--- → ?
Reporter | ||
Comment 32•13 years ago
|
||
I think that since we have this working on trunk, the simplest solution is to land this on esr. This is a test-automation-only change, and won't affect the bits we're actually shipping (except that we'll actually be able to get stacks for test failures).
Comment 33•13 years ago
|
||
Comment on attachment 573874 [details] [diff] [review]
Drop MINIDUMP_STACKWALK_CGI support, let harness download symbols as needed
[Triage Comment]
approving for landing, please go ahead as per https://wiki.mozilla.org/Release_Management/ESR_Landing_Process
Attachment #573874 -
Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Updated•13 years ago
|
Comment 34•13 years ago
|
||
Comment 35•12 years ago
|
||
I don't think this is anything QA needs to explicitly verify, flagging qa-. Please correct me if I am wrong.
Whiteboard: [buildfaster:p1] → [buildfaster:p1][qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•