Closed
Bug 799295
Opened 12 years ago
Closed 12 years ago
v8 crashing most runs on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox17 | --- | unaffected |
firefox18 | + | fixed |
firefox19 | --- | fixed |
firefox-esr17 | --- | unaffected |
People
(Reporter: philor, Assigned: glandium)
References
Details
(Keywords: crash, reproducible, Whiteboard: [qa-])
Crash Data
Attachments
(1 file)
(deleted),
patch
|
dvander
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Since today's merge to aurora, every Linux32 PGO dromaeo_css has crashed like https://tbpl.mozilla.org/php/getParsedLog.php?id=15934446&tree=Mozilla-Aurora Thread 0 (crashed) 0 libxul.so!js::gc::MarkKind [Heap.h:118a3b748323 : 1017 + 0x0] eip = 0x02936b85 esp = 0xbfff2f50 ebp = 0xbfff3080 ebx = 0x02e56f34 esi = 0xb3d3a100 edi = 0x82000000 eax = 0x00000000 ecx = 0x82000000 edx = 0xbfff2fbc efl = 0x00010286 Found by: given as instruction pointer in context 1 libxul.so!js::gc::MarkValueInternal [Marking.cpp:118a3b748323 : 387 + 0x10] eip = 0x02936edd esp = 0xbfff2f90 ebp = 0xbfff3080 ebx = 0x02e56f34 esi = 0x00000000 edi = 0xbfff3048 Found by: call frame info 2 libxul.so!js::ion::MarkIonActivations [IonFrames.cpp:118a3b748323 : 503 + 0xc] eip = 0x029b8a44 esp = 0xbfff2fd0 ebp = 0xbfff3080 ebx = 0x02e56f34 esi = 0xb3d3a100 edi = 0xbfff3048 Found by: call frame info 3 libxul.so!js::MarkRuntime [jsgc.cpp:118a3b748323 : 2610 + 0xb] eip = 0x02853f6f esp = 0xbfff30b0 ebp = 0xab4c6e50 ebx = 0x02e56f34 esi = 0xb3d3a100 edi = 0xb3d3a000 Found by: call frame info 4 libxul.so!IncrementalCollectSlice [jsgc.cpp:118a3b748323 : 3434 + 0x4] eip = 0x0285d9e3 esp = 0xbfff3150 ebp = 0x00000015 ebx = 0x02e56f34 esi = 0xbfff3274 edi = 0xbfff326c Found by: call frame info 5 libxul.so!GCCycle [jsgc.cpp:118a3b748323 : 4533 + 0xf] eip = 0x02860c64 esp = 0xbfff32b0 ebp = 0x00000000 ebx = 0x02e56f34 esi = 0xbfff32dc edi = 0x00000000 Found by: call frame info 6 libxul.so!Collect [jsgc.cpp:118a3b748323 : 4647 + 0x23] eip = 0x02858500 esp = 0xbfff3300 ebp = 0x00000001 ebx = 0x02e56f34 esi = 0xb3d3a1bc edi = 0xb3d3a000 Found by: call frame info 7 libxul.so!js_InvokeOperationCallback [jscntxt.cpp:118a3b748323 : 1132 + 0x29] eip = 0x0283c964 esp = 0xbfff3360 ebp = 0x00000031 ebx = 0x9b5d8dc0 esi = 0xac0bf790 edi = 0x00000067 Found by: call frame info 8 libxul.so!js_HandleExecutionInterrupt [jscntxt.cpp:118a3b748323 : 1156 + 0x7] eip = 0x0283c98c esp = 0xbfff3390 ebp = 0x00000031 ebx = 0x9b5d8dc0 esi = 0x00000000 edi = 0x00000067 Found by: call frame info 9 libxul.so!js::ion::InterruptCheck [VMFunctions.cpp:118a3b748323 : 387 + 0xb] eip = 0x01f39b4f esp = 0xbfff33b0 ebp = 0x00000031 ebx = 0x9b5d8dc0 That same code (since this is the merge) did not crash on mozilla-central, PGO or non-PGO. The one obvious difference is that on aurora we don't do --enable-profiling. In theory https://tbpl.mozilla.org/?tree=Profiling is supposed to let us get away with that, but it doesn't run Talos (or PGO). Questions that a try push can answer: if we push m-c to try with the --enable-profiling removed from the mozconfig, will that be enough to crash, or do we also need to add mk_add_options MOZ_PGO=1 to the mozconfig to get the crash? Aurora's also closed because of bug 799277, but if that gets fixed, this is still a tree-closer.
Reporter | ||
Comment 1•12 years ago
|
||
Forgot to cc the poor unfortunate who will have to decide whether we're willing to open aurora without Linux dromaeo, assuming bug 799277 gets fixed first.
This push disables profiling from trunk, but PGO is off. https://tbpl.mozilla.org/?tree=Try&rev=ad1fed437a48
This seems reasonably important, so here's the push with PGO enabled: https://tbpl.mozilla.org/?tree=Try&rev=3ebe093955e1
With profiling disabled and PGO off, there's no crash.
Reporter | ||
Comment 5•12 years ago
|
||
With profiling disabled and PGO on, there's a crash. Sorry, this is going to suck for you.
Comment 6•12 years ago
|
||
I guess this is an argument against not running both Linux64 and Linux32...
Updated•12 years ago
|
tracking-firefox17:
--- → +
Updated•12 years ago
|
tracking-firefox17:
+ → ---
tracking-firefox18:
--- → +
Updated•12 years ago
|
Status: NEW → ASSIGNED
QA Contact: dvander
After thinking about this more and talking with members of the JS team, I think we should stop testing Linux PGO builds. All evidence suggests that PGO is endlessly buggy. We've wasted plenty of developer man-hours tracking down PGO-only crashes, only to find that a specific function has been optimized wrong and must be explicitly deoptimized. There's really nothing good about this situation, littering the codebase with untestable compiler bug workaround hints that may or may not randomly become obsolete. (I'm also sceptical about how beneficial PGO is, but I won't argue that here.) PGO deeply and zealously modifies calling conventions, function bodies, and large callgraphs. We've seen it cause test failures and top crashes that are very difficult to debug. That's even on *Windows*, where Microsoft has thrown a huge amount of resources into their C++ compiler. I'm not convinced GCC's implementation will be better, and from what I've heard, PGO has been historically buggy in compilers in general. It is possible there is a real bug here. If so, we'll see it one way or another, likely in a context that is more easily reproducible and easier to debug. So, I say we stop testing Linux PGO, free up releng resources and developer time, and get rid of these spurious bugs :)
Assignee: general → dvander
QA Contact: dvander
Comment 8•12 years ago
|
||
(In reply to David Anderson [:dvander] from comment #7) > So, I say we stop testing Linux PGO, free up releng resources and developer > time, and get rid of these spurious bugs :) +1 from RelEng I think :) Can you propose this in the newsgroups?
(In reply to Chris AtLee [:catlee] from comment #8) > (In reply to David Anderson [:dvander] from comment #7) > > So, I say we stop testing Linux PGO, free up releng resources and developer > > time, and get rid of these spurious bugs :) > > +1 from RelEng I think :) Can you propose this in the newsgroups? Sure, which newsgroups?
Comment 10•12 years ago
|
||
(In reply to David Anderson [:dvander] from comment #9) > Sure, which newsgroups? mozilla.dev.platform
Comment 11•12 years ago
|
||
I've hidden this test on aurora now. There's no point in starring this on every run.
Reporter | ||
Comment 12•12 years ago
|
||
News from various fronts: It's not actually every run on aurora, only most: https://tbpl.mozilla.org/?tree=Mozilla-Aurora&noignore=1&rev=564500b98bac&jobname=Rev3%20Fedora%2012%20mozilla-aurora%20pgo%20talos%20dromaeojs caught a green that made me wonder if something accidentally "fixed" it, but it didn't. Something in https://hg.mozilla.org/projects/profiling/pushloghtml?startID=1382&endID=1383 (bug 804636? bug 800568? bug 800063? bug 778948? something less obvious?) on the trunk caused the dromaeo_css crash to become less frequent, but added a similar crash Thread 0 (crashed) 0 libxul.so!ReadAllocation [Registers.h:6ed9ca8c36bd : 112 + 0x4] eip = 0x01ebdcf4 esp = 0xbf94abd0 ebp = 0xbf94ace0 ebx = 0x02df5100 esi = 0xbf94aca8 edi = 0x00000008 eax = 0x00000000 ecx = 0x0000001f edx = 0x00000000 efl = 0x00010286 Found by: given as instruction pointer in context 1 libxul.so!js::ion::MarkIonActivations(JSRuntime*, JSTracer*) [IonFrames.cpp:6ed9ca8c36bd : 506 + 0xa] eip = 0x0295243b esp = 0xbf94ac30 ebp = 0xbf94ace0 ebx = 0x02df5100 esi = 0xb3e3a100 edi = 0xbf94aca8 Found by: call frame info 2 libxul.so!js::MarkRuntime [jsgc.cpp:6ed9ca8c36bd : 2515 + 0xb] eip = 0x027eebaf esp = 0xbf94ad10 ebp = 0xa5eef274 ebx = 0x02df5100 esi = 0xb3e3a100 edi = 0xb3e3a000 Found by: call frame info 3 libxul.so!IncrementalCollectSlice [jsgc.cpp:6ed9ca8c36bd : 3351 + 0x4] eip = 0x027f79f3 esp = 0xbf94adb0 ebp = 0x00000015 ebx = 0x02df5100 esi = 0xbf94aec0 edi = 0xbf94aeb8 Found by: call frame info 4 libxul.so!GCCycle [jsgc.cpp:6ed9ca8c36bd : 4447 + 0xf] eip = 0x027facd4 esp = 0xbf94af00 ebp = 0x00000000 ebx = 0x02df5100 esi = 0xbf94af28 edi = 0x00000000 Found by: call frame info 5 libxul.so!Collect [jsgc.cpp:6ed9ca8c36bd : 4561 + 0x23] eip = 0x027f2829 esp = 0xbf94af50 ebp = 0x00000001 ebx = 0x02df5100 esi = 0xb3e3a1c0 edi = 0xb3e3a000 Found by: call frame info 6 libxul.so!js_InvokeOperationCallback(JSContext*) [jscntxt.cpp:6ed9ca8c36bd : 1205 + 0x29] eip = 0x027d6994 esp = 0xbf94afb0 ebp = 0xffffff81 ebx = 0x00000003 esi = 0xa4409990 edi = 0x9ef64010 Found by: call frame info 7 libxul.so!js_HandleExecutionInterrupt(JSContext*) [jscntxt.cpp:6ed9ca8c36bd : 1229 + 0x7] eip = 0x027d69bc esp = 0xbf94afe0 ebp = 0xffffff81 ebx = 0x00000003 esi = 0xffffff87 edi = 0x9ef64010 Found by: call frame info 8 libxul.so!js::ion::InterruptCheck(JSContext*) [VMFunctions.cpp:6ed9ca8c36bd : 387 + 0xb] eip = 0x01ec40cb esp = 0xbf94b000 ebp = 0xffffff81 ebx = 0x00000003 esi = 0xffffff87 edi = 0x9ef64010 Found by: call frame info 9 0x305e773 eip = 0x0305e774 esp = 0xbf94b020 ebp = 0xffffff81 ebx = 0x00000003 in test_MochiKit-DOM-Safari.html. So far, https://tbpl.mozilla.org/?tree=Profiling&rev=6ed9ca8c36bd makes that look permaorange, though it could also only be 99 of 100 like the aurora dromaeo_css crash. That sets a more firm "do something" deadline of November 19th, since we'll be a little less sanguine about just hiding mochitest-1 on aurora-19.
Recommend untracking for Firefox 18 - this bug is self-inflicted by using experimental compiler features, and we don't seem to have any data on whether it affects Linux users.
Assignee: dvander → general
Status: ASSIGNED → NEW
Reporter | ||
Comment 14•12 years ago
|
||
Taras: you drove PGO on Linux in bug 559964, because it was "a huge(30%) startup improvement on Linux" - you okay with just dropping it rather than fixing this bustage?
We might consider using a newer GCC version, to see if this bug goes away. It looks like we're two behind, 4.5 versus 4.7.
Comment 16•12 years ago
|
||
Depending on tracking bug for 19-nov-2012 migration work. Please unblock if this is incorrect.
Blocks: 786550
Reporter | ||
Comment 17•12 years ago
|
||
I guess it's merge-day in the sense that when 18 migrates to mozilla-beta, this will break that test suite there, and we will just hide it there, too?
Summary: dromaeo_css crashing every run on Linux32 PGO, in js::gc::MarkKind, on mozilla-aurora → dromaeo_css crashing every run on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up
Comment 18•12 years ago
|
||
(In reply to David Anderson [:dvander] from comment #13) > Recommend untracking for Firefox 18 - this bug is self-inflicted by using > experimental compiler features, and we don't seem to have any data on > whether it affects Linux users. Not tracking for FF18 given this comment .
Updated•12 years ago
|
Reporter | ||
Comment 19•12 years ago
|
||
We hid the dromaeojs job on Aurora, back when we thought this was just something that was going to be fixed in the first couple of days after the merge; turns out that was probably tree mismanagement on our part. It's now unhidden.
Reporter | ||
Comment 20•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16913744&tree=Mozilla-Aurora
Reporter | ||
Comment 21•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16911770&tree=Mozilla-Aurora
Reporter | ||
Comment 22•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16908119&tree=Mozilla-Aurora
Reporter | ||
Comment 23•12 years ago
|
||
What does it actually mean to untrack this bug? Tracking+ for an unfiled bug to stop doing PGO on Linux on all trunk trees, and for another unfiled bug to stop doing PGO on Linux on Aurora, and for another unfiled bug to stop doing PGO on Linux on Beta at the next merge, and for another unfiled bug to stop doing PGO on Linux on Release at the merge after that, and for another unfiled bug to switch release automation to not do PGO on Linux at that same merge? Or tracking+ for an unfiled bug to just drop the dromaeojs talos suite on trunk and on each branch as 18 merges through it, assuming that we only crash in that talos suite, and not anywhere else?
Reporter | ||
Comment 24•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16918557&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16919019&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16898917&tree=Profiling
Summary: dromaeo_css crashing every run on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up → dromaeo_css crashing most runs on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up
Reporter | ||
Comment 25•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16920248&tree=Mozilla-Aurora
Reporter | ||
Comment 26•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16920295&tree=Profiling https://tbpl.mozilla.org/php/getParsedLog.php?id=16920292&tree=Profiling
Reporter | ||
Comment 27•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16920479&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16922912&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16923700&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16924157&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16924563&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16924876&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16925099&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16925098&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16925100&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16927802&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16928560&tree=Mozilla-Aurora
Reporter | ||
Comment 28•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16933457&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16933754&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16933752&tree=Mozilla-Aurora
Reporter | ||
Comment 29•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16930982&tree=Profiling
Reporter | ||
Comment 30•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16946463&tree=Profiling https://tbpl.mozilla.org/php/getParsedLog.php?id=16945826&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16946221&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16946217&tree=Mozilla-Aurora
Updated•12 years ago
|
status-firefox17:
--- → unaffected
status-firefox18:
--- → affected
status-firefox19:
--- → affected
status-firefox-esr17:
--- → unaffected
Reporter | ||
Comment 31•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16952722&tree=Profiling
Reporter | ||
Comment 32•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16957332&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=16957404&tree=Mozilla-Aurora
Comment 33•12 years ago
|
||
Actually, I don't see anything here that is actionable by releng. I'm removing the dependency on our migration work tracking bug.
No longer blocks: 786550
Reporter | ||
Comment 34•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16964136&tree=Mozilla-Aurora
Reporter | ||
Comment 35•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16967149&tree=Mozilla-Aurora
Reporter | ||
Comment 36•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16968592&tree=Mozilla-Aurora
Comment 37•12 years ago
|
||
m-c PGO no-profiling MOZ_PGO_OPTIMIZE_FLAGS=$MOZ_OPTIMIZE_FLAGS https://tbpl.mozilla.org/?tree=Try&rev=d1314a03a613
Keywords: regressionwindow-wanted
Reporter | ||
Comment 38•12 years ago
|
||
If the regressionwindow isn't http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d042ad078f43&tochange=fdfaef738a00 I'll be astonished.
Reporter | ||
Comment 39•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16990325&tree=Mozilla-Aurora
Reporter | ||
Comment 40•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=16994323&tree=Mozilla-Aurora
Reporter | ||
Comment 41•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17002970&tree=Profiling
Comment 42•12 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #23) > What does it actually mean to untrack this bug? We're trying to get to a resolution in the original thread https://groups.google.com/d/msg/mozilla.dev.platform/hud9aK0qZWI/lVvYys-U_nYJ.
Reporter | ||
Comment 43•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17013049&tree=Mozilla-Aurora
Reporter | ||
Comment 44•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17014965&tree=Mozilla-Aurora
Comment 45•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17018659&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=17028358&tree=Mozilla-Aurora
Reporter | ||
Comment 46•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17036981&tree=Mozilla-Aurora
Reporter | ||
Comment 47•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17039819&tree=Mozilla-Aurora
Comment 48•12 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #23) > Or tracking+ for an unfiled bug to just drop the dromaeojs talos suite on > trunk and on each branch as 18 merges through it, assuming that we only > crash in that talos suite, and not anywhere else? The discussion on dev-platform has played out, and we can just disable the failing tests on all branches for Linux PGO (no reason to wait for FF18 to merge to each branch).
Comment 49•12 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #37) > m-c PGO no-profiling MOZ_PGO_OPTIMIZE_FLAGS=$MOZ_OPTIMIZE_FLAGS > https://tbpl.mozilla.org/?tree=Try&rev=d1314a03a613 That seems to leave only PGO as the difference between pass and fail. (In reply to Phil Ringnalda (:philor) from comment #38) > If the regressionwindow isn't > http://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=d042ad078f43&tochange=fdfaef738a00 I'll be astonished. fdfaef738a00 and back to dd9554c236dc at least demonstrate the bug. The older revisions that I picked either didn't compile, or hit unknown run_tests.py options or PerfConfigurator.py errors.
Keywords: regressionwindow-wanted
Assignee | ||
Comment 50•12 years ago
|
||
So, this is not a crash in dromaeo_css, but a crash in v8. And if I download e.g. http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1352918270/firefox-18.0a2.en-US.linux-i686.tar.bz2 and go to http://v8.googlecode.com/svn/data/benchmarks/v7/run.html, I can reproduce.
Summary: dromaeo_css crashing most runs on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up → v8 crashing most runs on Linux32 PGO, in js::gc::MarkKind, on Gecko 18 and up
Assignee | ||
Comment 51•12 years ago
|
||
Even better with a nightly: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-11-14-04-20-11-mozilla-aurora/firefox-18.0a2.en-US.linux-i686.tar.bz2 Which got me a crash report: https://crash-stats.mozilla.com/report/index/bp-8667a1b0-3b38-43ba-9379-f4d0d2121121
Comment 52•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #51) > Even better with a nightly Well, that's an aurora build, actually. ;-) Thanks for reproducing, though, that's great! Can you track this down to a regression range in 18.0a1 nightly by bisecting nightly builds?
Keywords: regressionwindow-wanted,
reproducible
Assignee | ||
Comment 53•12 years ago
|
||
So, from what I can see in a debugger, this looks very much like a stack corruption bug. The interesting part is that it started happening the day aurora became 18, and wasn't happening on 18 central. The even more interesting part is that it stopped happening since we merged 19 to aurora, and, obviously, moved to beta. It never happened on central because on central, we build with --enable-profiling, which enables frame pointers, which changes the stack layout, since frame pointers are kept on stack. This may well have contributed to hide the bug. I'm inclined to think there's an ionmonkey bug hiding in there, and it was fixed on m-c during the 19 cycle.
Reporter | ||
Comment 54•12 years ago
|
||
Frequency dropped a whole lot, but it is still happening - https://tbpl.mozilla.org/?tree=Profiling&noignore=1&jobname=Rev3%20Fedora%2012%20profiling%20pgo%20talos%20dromaeojs
Assignee | ||
Comment 55•12 years ago
|
||
This is interesting: while i can't manage to get a crash out of a non pgo build, i *do* get a busy script popup during a v8 run, which i'd normally never get.
Assignee | ||
Comment 56•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #55) > This is interesting: while i can't manage to get a crash out of a non pgo > build, i *do* get a busy script popup during a v8 run, which i'd normally > never get. A non pgo build from the profiling branch, so without frame pointers.
Assignee | ||
Comment 57•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #56) > (In reply to Mike Hommey [:glandium] from comment #55) > > This is interesting: while i can't manage to get a crash out of a non pgo > > build, i *do* get a busy script popup during a v8 run, which i'd normally > > never get. > > A non pgo build from the profiling branch, so without frame pointers. In fact, that also happens on nightly, but it takes more tries. Each new run of v8 runs the regexp test slower and slower, until the busy script popup shows up. It takes about 5 attempts here to get from a score of about 1000 to about 250. The crashes on aurora seem to happen during the EarleyBoyer test, though, which is before regexp.
Comment 58•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #53) > It never happened on central because on central, we build with > --enable-profiling, which enables frame pointers, which changes the stack > layout, since frame pointers are kept on stack. This may well have > contributed to hide the bug. Hrm, is there some other way we can track it down to a checkin range (one that isn't full 6 weeks)? (In reply to Mike Hommey [:glandium] from comment #57) > In fact, that also happens on nightly, but it takes more tries. Each new run > of v8 runs the regexp test slower and slower, until the busy script popup > shows up. It takes about 5 attempts here to get from a score of about 1000 > to about 250. Even if that's a different bug, this sounds like something we *really* should have on file and should investigate.
Assignee | ||
Comment 59•12 years ago
|
||
When running under valgrind, the first warning i get when running v8 is: ==12930== Use of uninitialised value of size 4 ==12930== at 0x64F5661: ??? (in libxul.so) Sadly, I don't know how to tell valgrind to get the symbols from the files i downloaded from the symbol server, but resolving manually tells me this is http://hg.mozilla.org/releases/mozilla-aurora/file/149ad5c9f94d/js/src/ion/LIR.h#l94 This doesn't look very good.
Assignee | ||
Comment 60•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58) > Hrm, is there some other way we can track it down to a checkin range (one > that isn't full 6 weeks)? Apart from using try to do pgo builds without frame pointer on older checkins, I have no idea. Let's start by testing the ionmonkey merge changeset...
Reporter | ||
Comment 61•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17222933&tree=Profiling
Comment 64•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #60) > Let's start by testing the ionmonkey merge changeset... See comment 49. I expect its possible to patch the issues with older revisions there, but I'd guess a better first step would be to have someone who knows the code look at the valgrind warnings. Perhaps a try build with stripping disabled might make that easier. (In reply to Ehsan Akhgari [:ehsan] from comment #62) > Does this only happen on PGO builds? AFA we K. See comment 49.
Reporter | ||
Comment 65•12 years ago
|
||
Heck, you can see comment 5, for that matter. It's PGO-only, and that's the reason I added PGO builds to the Profiling branch, which we didn't have at the time of the IM merge.
Comment 66•12 years ago
|
||
So it seems like the best bet is for an IonMonkey developer to bisect the range...
Assignee | ||
Comment 67•12 years ago
|
||
Additional data point: https://hg.mozilla.org/try/rev/fdfaef738a00 , which is the fixup changeset after the ionmonkey merge *does* have the problem.
(In reply to Mike Hommey [:glandium] from comment #57) > In fact, that also happens on nightly, but it takes more tries. Each new run > of v8 runs the regexp test slower and slower, until the busy script popup > shows up. It takes about 5 attempts here to get from a score of about 1000 > to about 250. We're tracking regexp performance in bug 806646, which is on track to be fixed soon. Right now we're pretty sad on this test, but I don't think it's related here. > The crashes on aurora seem to happen during the EarleyBoyer test, though, > which is before regexp. Another thing you could try is seeing if this reproduces using gcc-4.7.
Assignee | ||
Comment 69•12 years ago
|
||
Some of the valgrind errors from a rebuild of current beta with pgo, --enable-valgrind, --disable-install-strip and --disable-elf-hack: ==22533== Use of uninitialised value of size 4 ==22533== at 0x62F60A1: CanEncodeInfoInHeader(js::ion::LAllocation const&, unsigned int*) (LIR.h:94) ==22533== by 0x2BB: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F6073: CanEncodeInfoInHeader(js::ion::LAllocation const&, unsigned int*) (Safepoints.cpp:192) ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844B8A: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:392) ==22533== by 0x7B44557: ??? ==22533== by 0x37F: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844B99: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:395) ==22533== by 0x7B44557: ??? ==22533== by 0x37F: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844BA3: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:398) ==22533== by 0x7B44557: ??? ==22533== by 0x37F: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844B8A: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:392) ==22533== by 0x7B44557: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844B99: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:395) ==22533== by 0x7B44557: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Conditional jump or move depends on uninitialised value(s) ==22533== at 0x5844BA3: PartFromStream(js::ion::CompactBufferReader&, NunboxPartKind, unsigned int) (Safepoints.cpp:398) ==22533== by 0x7B44557: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209) ==22533== ==22533== Use of uninitialised value of size 4 ==22533== at 0x583FEE0: ReadAllocation(js::ion::IonFrameIterator const&, js::ion::LAllocation const*) (IonFrames.cpp:434) ==22533== by 0x1CD: ??? ==22533== Uninitialised value was created by a stack allocation ==22533== at 0x62F61F5: js::ion::SafepointWriter::writeNunboxParts(js::ion::LSafepoint*) (Safepoints.cpp:209)
Reporter | ||
Comment 70•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17266502&tree=Profiling
Assignee | ||
Comment 71•12 years ago
|
||
Thanks to the valgrind output, it ended being easy to spot what is wrong: 01a0c070 <_ZL21CanEncodeInfoInHeaderRKN2js3ion11LAllocationEPj>: 1a0c070: 56 push %esi 1a0c071: 89 c6 mov %eax,%esi 1a0c073: 83 ec 08 sub $0x8,%esp 1a0c076: 8b 08 mov (%eax),%ecx 1a0c078: f6 c1 01 test $0x1,%cl 1a0c07b: 75 1b jne 1a0c098 <_ZL21CanEncodeInfoInHeaderRKN2js3ion11LAllocationEPj+0x28> (snip) 1a0c098: 8b 0e mov (%esi),%ecx 1a0c09a: 8b 54 24 04 mov 0x4(%esp),%edx 1a0c09e: c1 f9 04 sar $0x4,%ecx 1a0c0a1: 89 0a mov %ecx,(%edx) (snip) 0x1a0c0a1 above corresponds to 0x62F60A1 in valgrind's output. Essentially, what is happening is that a value is taken from the stack at 0x1a0c09a that was never stored... This appears to be a miscompilation from gcc 4.5 with PGO enabled, but I was able to get a similarly broken code out of it without PGO with the following patch: diff --git a/js/src/ion/Safepoints.cpp b/js/src/ion/Safepoints.cpp --- a/js/src/ion/Safepoints.cpp +++ b/js/src/ion/Safepoints.cpp @@ -8,6 +8,7 @@ #include "Safepoints.h" #include "IonSpewer.h" #include "LIR.h" +#include "mozilla/Likely.h" using namespace js; using namespace ion; @@ -190,7 +191,7 @@ AllocationToPartKind(const LAllocation & static inline bool CanEncodeInfoInHeader(const LAllocation &a, uint32 *out) { - if (a.isGeneralReg()) { + if (MOZ_LIKELY(a.isGeneralReg())) { *out = a.toGeneralReg()->reg().code(); return true; } If I manually patch a broken libxul.so with the function machine code from a working build, I get a non-crashy build, and valgrind never complains. It looks like, from succinct local testing, that gcc 4.7 would generate proper code. Although a notable difference between gcc 4.7 and gcc 4.5 is that gcc 4.7 inlines the function in most cases where gcc 4.5 doesn't. I'm testing a try build replacing inline with MOZ_ALWAYS_INLINE on that function, and see if the miscompilation is worked around this way.
Assignee | ||
Comment 72•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=c0b53bdc7f7a
Attachment #684474 -
Flags: review?(dvander)
Assignee | ||
Updated•12 years ago
|
Assignee: general → mh+mozilla
Comment on attachment 684474 [details] [diff] [review] Work around gcc 4.5 miscompilation of CanEncodeInfoInHeader by always inlining it Review of attachment 684474 [details] [diff] [review]: ----------------------------------------------------------------- Nice find! Small nit, comments in this file should use //
Attachment #684474 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 74•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d8e4f06198dc
Assignee | ||
Comment 75•12 years ago
|
||
Comment on attachment 684474 [details] [diff] [review] Work around gcc 4.5 miscompilation of CanEncodeInfoInHeader by always inlining it [Approval Request Comment] Bug caused by (feature/regressing bug #): gcc 4.5 miscompiling ionmonkey User impact if declined: Pretty much random crashes when js is involved Testing completed (on m-c, etc.): heavily tested on try, although only tested on linux x86. Risk to taking this patch (and alternatives if risky): the patch in itself doesn't change much. It changes an "inline" to an "MOZ_ALWAYS_INLINE", which only forces compilers to inline the function. There is a tiny possibility that other compilers start miscompiling as a result, but it's quite unlikely: clang is already inlining the function, so forcing an inline won't change that, and i suspect msvc does too. String or UUID changes made by this patch: none.
Attachment #684474 -
Flags: approval-mozilla-beta?
Attachment #684474 -
Flags: approval-mozilla-aurora?
Reporter | ||
Comment 76•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17296761&tree=Mozilla-Aurora
Updated•12 years ago
|
Updated•12 years ago
|
Attachment #684474 -
Flags: approval-mozilla-beta?
Attachment #684474 -
Flags: approval-mozilla-beta+
Attachment #684474 -
Flags: approval-mozilla-aurora?
Attachment #684474 -
Flags: approval-mozilla-aurora+
Assignee | ||
Updated•12 years ago
|
Updated•12 years ago
|
Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•