Closed Bug 543034 Opened 15 years ago Closed 15 years ago

Windows builder failing, with nsannotationservice.cpp(457) : "fatal error C1001: An internal error has occurred in the compiler" or "fatal error C1002: compiler is out of heap space in pass 2"

Categories

(Release Engineering :: General, defect, P1)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: nthomas)

Details

Attachments

(3 files)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264769305.1264783731.25012.gz WINNT 5.2 mozilla-central nightly on 2010/01/29 04:48:25 s: win32-slave12
> PGOMGR : warning PG0188: No .PGC files matching 'xul!*.pgc' were found.
> Creating library xul.lib and object xul.exp
> Generating code
> 3700 of 103836 ( 3.56%) profiled functions will be compiled for speed
> e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
> (compiler file 'F:\SP\vctools\compiler\utc\src\P2\main.c[0x10CBB356:0x339D0000]', line 182)
> To work around this problem, try simplifying or changing the program near the locations listed above.
> Please choose the Technical Support command on the Visual C++
> Help menu, or open the Technical Support help file for more information
>
> LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
>
> Version 8.00.50727.762
>
> ExceptionCode = C0000005
> ExceptionFlags = 00000000
> ExceptionAddress = 10CBB356 (10B00000) "d:\msvs8\VC\BIN\c2.dll"
> NumberParameters = 00000002
> ExceptionInformation[ 0] = 00000000
> ExceptionInformation[ 1] = 339D0000
>
> CONTEXT:
> Eax = 474F0028 Esp = 0012ED40
> Ebx = 000A0470 Ebp = 00000000
> Ecx = 339D0000 Esi = 339D0000
> Edx = 642E9998 Edi = 642E9944
> Eip = 10CBB356 EFlags = 00010206
> SegCs = 0000001B SegDs = 00000023
> SegSs = 00000023 SegEs = 00000023
> SegFs = 0000003B SegGs = 00000000
> Dr0 = 00000000 Dr3 = 00000000
> Dr1 = 00000000 Dr6 = 00000000
> Dr2 = 00000000 Dr7 = 00000000
> make[5]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox/toolkit/library'
> make[4]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[5]: *** [xul.dll] Error 232
> make[5]: *** Deleting file `xul.dll'
> make[4]: *** [libs_tier_toolkit] Error 2
> make[3]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[3]: *** [tier_toolkit] Error 2
> make[2]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build/obj-firefox'
> make[2]: *** [default] Error 2
> make[1]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32-nightly/build'
> make[1]: *** [build] Error 2
> make: *** [profiledbuild] Error 2
OS: Linux → Windows XP
Alice kindly started a replacement nightly build, but it failed with the same problem: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264802234.1264811517.24068.gz WINNT 5.2 mozilla-central nightly on 2010/01/29 13:57:14 s: win32-slave38 So, looks non-random... Maybe a checkin from yesterday broke something?
The file nsannotationservice.cpp hasn't changed in 13 days, and Mak doesn't think any checkins from yesterday look suspicious... Comparing one of the broken buildlogs vs a non-broken one, the contextual lines look identical... Perhaps we got an update to MSVC yesterday, and that broke something?
Summary: sporadic issue during |make profiledbuild|: nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler → Windows nightly builder failing, with nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler
For code changes, mak pointed out that bug 500328's changeset looks like the only one remotely related to this: http://hg.mozilla.org/mozilla-central/rev/dc7a04be6904
It makes some changes to the nsIVariant interface & implementation, and the line that the compiler flags with an internal error (nsannotationservice.cpp:457) has just done some work with an nsIVariant. I may try backing out that changeset later tonight and clobbering the nightly again...
(In reply to comment #2) > Perhaps we got an update to MSVC yesterday, and that broke something? There were no updates/changes to the MSVC compilers installed on the build machines.
dholbert ping'd me in irc; investigating.
Assignee: nobody → joduinn
(In reply to comment #3) > I may try backing out that changeset later tonight and clobbering the nightly > again... Backed out that changeset: http://hg.mozilla.org/mozilla-central/rev/6d50455cabaa http://hg.mozilla.org/mozilla-central/rev/b0b9d8dca9d6 Joduinn is respinning the nightly... we'll see if the backout fixes anything.
(In reply to comment #6) > (In reply to comment #3) > > I may try backing out that changeset later tonight and clobbering the nightly > > again... > > Backed out that changeset: > http://hg.mozilla.org/mozilla-central/rev/6d50455cabaa > http://hg.mozilla.org/mozilla-central/rev/b0b9d8dca9d6 > > Joduinn is respinning the nightly... we'll see if the backout fixes anything. nightly started on win32-slave15.
aaaand we got a green cycle: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264821121.1264834641.15890.gz So, looks like that changeset was indeed the 'guilty' one...
Blocks: 500328
No longer blocks: 438871
Whiteboard: [orange]
fwiw, these bugs should be filed against Microsoft, since 90% of the time it's a real compiler bug. btw, it's most likely that just changing some minor thing will avoid the problem... The best thing would be for someone from build engineering to work directly on the machine, gathering information for an MS bug and at the same time seeing what's different from the usual opt build boxes. It's even possible that pushing again after some other change will go green directly, since the optimization step optimizes all the code at once. Still, I'd be scared of seeing this again in the future, so either update MSVC or file the bug upstream.
All the windows builds (on any mercurial based branch) allocate the work against a pool of identical machines, so there is no difference between an opt and a nightly build in terms of compiler. The key difference is that "nightly" builds are always clobbers (so is try server), and "build" will be a mixture of depend and clobbers.
Marco is right in that this appears to be a Microsoft compiler bug. That being said, even if we report it I wouldn't hold my breath for a fix anytime soon, since VC 2010 is coming out soon, and this is VC 2008, which will then be 2 releases behind.
er, I meant VC 2005.
I think we have an intermittent problem and rev dc7a04be6904 was not at fault, or not wholly at fault. The data we have is collected at http://spreadsheets.google.com/pub?key=tVZQKFDccCvXx63C2OmVBNg&output=html based on the hypothesis that it's clobber builds which fail. Some comments:
* "WINNT 5.2 mozilla-central build" will mostly use an existing objdir, and sometimes clobber (either forced, 7 days since last clobber, effective clobber because the disk space was needed by another build)
* "WINNT 5.2 mozilla-central nightly" is always a clobber build
* the failures have been on three separate slaves (win32-slave[12,19,38]), so it's not a slave that's gone mad
* the original identification of dc7a04be6904 and subsequent backout at b0b9d8dca9d6 gave us three green nightlies
* it doesn't explain why the clobber at 5ad17deecfe0 succeeded, nor why the most recent build on 3048d03980e7 failed
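The clobber hypothesis and the "three separate slaves" observation can be checked mechanically against the tinderbox header lines quoted in this bug. A minimal sketch, assuming every header has the shape "WINNT 5.2 mozilla-central <build|nightly> on <date> <time> s: <slave>"; BuildHeader, parseHeader and distinctSlaves are hypothetical helpers, not part of any Mozilla or tinderbox tool.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one tinderbox log header line.
struct BuildHeader {
    std::string kind;   // "build" (usually reuses the objdir) or "nightly" (always a clobber)
    std::string slave;  // e.g. "win32-slave12"
};

// Parses a header such as
// "WINNT 5.2 mozilla-central nightly on 2010/01/29 04:48:25 s: win32-slave12".
// Returns false if the line does not contain such a header.
bool parseHeader(const std::string& line, BuildHeader& out) {
    static const std::regex re(
        R"re(WINNT 5\.2 mozilla-central (build|nightly) on \S+ \S+ s: (\S+))re");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return false;
    out.kind = m[1].str();
    out.slave = m[2].str();
    return true;
}

// Distinct slaves among the failure headers: several names means the failure
// is not tied to one misbehaving machine.
std::set<std::string> distinctSlaves(const std::vector<std::string>& failureLines) {
    std::set<std::string> slaves;
    BuildHeader h;
    for (const auto& line : failureLines) {
        if (parseHeader(line, h)) slaves.insert(h.slave);
    }
    return slaves;
}
```

Tallying the kind field over the same lines would likewise separate nightly (always clobber) failures from build failures, which is the split the spreadsheet hypothesis depends on.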
The cycle after comment 13 also failed, with a slightly different message: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264973998.1264982902.3272.gz
> e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
> LINK : fatal error LNK1257: code generation failed

Note that that's still the same line in nsAnnotationService.cpp -- line 457. I wonder if "compiler is out of heap space" has been the problem here all along -- but depending on *when* it runs out of space, the compiler just dies with a cryptic "internal error" message? (as it has done up until this particular log)
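Both the C1001 and the C1002 messages point at nsannotationservice.cpp(457), which is what makes the heap-exhaustion theory worth testing across many logs. A minimal sketch of extracting file, line and error code from such a log line so the two kinds of failure can be grouped by location; FatalError, parseFatalError and sameLocation are hypothetical names, and the pattern assumes the blamed path contains no spaces, as in every log quoted here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// One "fatal error" diagnostic pulled out of a tinderbox log line.
struct FatalError {
    std::string file;     // source file the compiler blamed
    int line = 0;         // line number within that file
    std::string code;     // MSVC error code, e.g. "C1001" or "C1002"
    std::string message;  // remaining diagnostic text
};

// Extracts "<file>(<line>) : fatal error <Cnnnn>: <message>" from a log line.
bool parseFatalError(const std::string& text, FatalError& out) {
    static const std::regex re(R"re((\S+)\((\d+)\) : fatal error (C\d{4}): (.*))re");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return false;
    out.file = m[1].str();
    out.line = std::stoi(m[2].str());
    out.code = m[3].str();
    out.message = m[4].str();
    return true;
}

// True when two diagnostics name the same file and line, whichever error code
// was reported. Only the file name is compared, because the build directory
// differs between the "build" and "nightly" builders.
bool sameLocation(const FatalError& a, const FatalError& b) {
    auto base = [](const std::string& p) { return p.substr(p.find_last_of("\\/") + 1); };
    return base(a.file) == base(b.file) && a.line == b.line;
}
```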
So we've had two failures in non-clobbering builds now and this looks like a more general problem doing PGO. Philor also mentioned seeing the same error on the 26th. A quick google search didn't turn up a way to control the size of the compiler's heap space. The memory usage data from the VM management isn't great but I don't think we're using more memory all of a sudden, nor taking significantly longer to complete a build. Has the build complexity increased recently? Or become more recursive?
And the reason I knew it was the 26th was because I taunted cjones into filing bug 542429 about it.
Nothing significant has changed in the build recently, AFAIK.
I just posted on bug 500328, clearing it of guilt for causing this bug, per comment 14 & beyond.
Summary: Windows nightly builder failing, with nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler → Windows nightly builder failing, with nsannotationservice.cpp(457) : "fatal error C1001: An internal error has occurred in the compiler" or "fatal error C1002: compiler is out of heap space in pass 2"
No longer blocks: 500328
Summary: Windows nightly builder failing, with nsannotationservice.cpp(457) : "fatal error C1001: An internal error has occurred in the compiler" or "fatal error C1002: compiler is out of heap space in pass 2" → Windows builder failing, with nsannotationservice.cpp(457) : "fatal error C1001: An internal error has occurred in the compiler" or "fatal error C1002: compiler is out of heap space in pass 2"
Reassigning to buildduty. I grabbed it only because dholbert ping'd me in irc Friday evening and I was able to trigger nightlies for him.
Assignee: joduinn → aki
(In reply to comment #8) > aaaand we got a green cycle: > http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1264821121.1264834641.15890.gz > > So, looks like that changeset was indeed the 'guilty' one... jruderman reports that this is still happening, intermittently, so while this changeset might have contributed to tickling a compiler bug, it seems to not be the only tickler. We already have vs9 installed on the same pool-o-slaves, and could get newer if asked. However, if we want to upgrade to it from vc2005, we'd have to be careful about binary compat issues with the firefox releases still supported on those same machines.
There isn't anything newer than VC 2008 yet (2010 is still in beta). We would need SP1 installed, for --enable-jemalloc (bug 529169). I don't think switching would have any negative effects, lots of developers build with VC 2008. Personally, I was holding out for VC 2010, since I've verified that that version does in fact fix a bug that impacts us (bug 520651).
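The crash dumps record the toolset version directly: Version 8.00.50727.762 on the mozilla-central builders (the d:\msvs8 path in comment 0 is the VC 2005 install), and Version 9.00.21022.08 in the VS2008 tracemonkey trial further down. A small sketch that maps such a string to a product name, assuming the usual numbering (major version 8 is VC 2005, 9 is VC 2008, 10 is VC 2010, and VC 2008 SP1 carries build 30729); productFromLinkerVersion is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Maps a "Version X.YY.BBBBB.RRR" string from an LNK1000 crash dump to a
// Visual C++ product name. Only the releases discussed in this bug are covered.
std::string productFromLinkerVersion(const std::string& version) {
    int major = 0, minor = 0, build = 0;
    if (std::sscanf(version.c_str(), "%d.%d.%d", &major, &minor, &build) != 3) {
        return "unknown";
    }
    switch (major) {
        case 8:  return "VC 2005";
        case 9:  return build >= 30729 ? "VC 2008 SP1" : "VC 2008";  // SP1 build number assumed
        case 10: return "VC 2010";
        default: return "unknown";
    }
}
```

Under those assumptions, the 9.00.21022.08 dump would come from a VC 2008 toolset without SP1, which matters given that SP1 is needed for --enable-jemalloc.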
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265079021.1265088295.14024.gz WINNT 5.2 mozilla-central build on 2010/02/01 18:50:21 s: win32-slave24 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler. LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265131346.1265141207.4084.gz WINNT 5.2 mozilla-central build on 2010/02/02 09:22:26 s: win32-slave16 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265133454.1265144280.6383.gz WINNT 5.2 mozilla-central build on 2010/02/02 09:57:34 s: win32-slave20 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
This has made the Win32 opt builder pretty much perma-red today. Looking into workarounds. Bug 281158 is about a similar-looking internal compiler error elsewhere. It was suggested in that bug to wrap the affected code with #pragma optimize( "", off ). We didn't end up actually doing that in that bug, but we *did* in a different bug: bug 501082 (changeset linked in initial comment there). For lack of any better ideas, I propose we do the same thing around the affected chunk of nsAnnotationService.cpp -- specifically, the implementation of nsAnnotationService::SetItemAnnotation. Of course we'd prefer not to mark a chunk of code as "don't optimize me", but if it gets us non-burning builds, it's worth it as an interim stopgap at least.
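For reference, the proposed workaround follows the pattern below. A minimal sketch, assuming MSVC semantics: #pragma optimize must appear outside any function and applies to the functions defined after it, and the empty-string form with off/on disables all optimizations and then restores the ones selected with /O. SetItemAnnotationStandIn is a hypothetical placeholder, not the real nsAnnotationService::SetItemAnnotation signature; the _MSC_VER guard keeps other compilers from warning about an unknown pragma.

```cpp
#include <cassert>
#include <string>

#if defined(_MSC_VER)
#pragma optimize("", off)  // MSVC: turn off all optimizations for the functions that follow
#endif

// Hypothetical stand-in for the function the patch wrapped; the body is a placeholder.
int SetItemAnnotationStandIn(long long itemId, const std::string& name) {
    return static_cast<int>(itemId % 1000) + static_cast<int>(name.size());
}

#if defined(_MSC_VER)
#pragma optimize("", on)   // MSVC: restore the optimizations chosen with /O
#endif
```

The attached patch using this approach was later backed out because the failures continued.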
Attached patch workaround? [no, doesn't help] (deleted)
This patch does what I suggest in previous comment. Requesting r=gavin, since he wrote the similar patch on bug 501082.
Attachment #424859 - Flags: review?
Attachment #424859 - Flags: review? → review?(gavin.sharp)
Comment on attachment 424859 workaround? [no, doesn't help] worth a shot!
Attachment #424859 - Flags: review?(gavin.sharp) → review+
FWIW, the cycle just *before* my workaround-push ended up being green, after a string of 4-5 consecutive cycles that had this failure. :) So, even if the cycle built from the workaround-patch ends up being green, that doesn't necessarily mean it worked... we'll have to see whether the greenness sticks.
do we know if relanding bug 500328 has increased the failure ratio or not? that could still give some useful information about what code we could try to change.
It's possible... however, note that on Friday, when we backed bug 500328 out, we'd really only seen this failure twice (on nightly builds) and we'd had tons of passing tinderbox opt builds since it had first landed. So, I don't think there's any strong correlation at this point.
(In reply to comment #35) > It's possible... however, note that on Friday, when we backed bug 500328 out, > we'd really only seen this failure twice (on nightly builds) and we'd had tons > of passing tinderbox opt builds since it had first landed. I'm not thinking of strong correlations with patches, just trying to find code correlations. But this also makes me wonder: why has this suddenly become so frequent? Is it just due to some specific code change in the last week? Could we be close to some VS2005 limit, in which case we should plan to upgrade to VS2008 asap, since otherwise this will return in another form?
It's certainly possible we're bumping up against some internal compiler limit. I kind of wanted to wait till VC 2010 was released to upgrade, though (as I said in comment 25).
Comment on attachment 424859 workaround? [no, doesn't help] The cycle built from comment 32's 'workaround' was green, but the next two cycles were both red with this same issue: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265156801.1265167524.16160.gz http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265157211.1265167401.14815.gz Backing workaround out, since it apparently didn't help.
Attachment #424859 - Attachment description: workaround? → workaround? [no, doesn't help]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265161691.1265175922.11593.gz WINNT 5.2 mozilla-central build on 2010/02/02 17:48:11 s: win32-slave12 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\xre\nsapprunner.cpp(3596) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265166315.1265178266.4715.gz WINNT 5.2 mozilla-central build on 2010/02/02 19:05:15 s: win32-slave08 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\xre\nsapprunner.cpp(3596) : fatal error C1001: An internal error has occurred in the compiler. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265168403.1265179952.22913.gz WINNT 5.2 mozilla-central build on 2010/02/02 19:40:03 s: win32-slave42 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265287341.1265299160.6835.gz WINNT 5.2 mozilla-central build on 2010/02/04 04:42:21 s: win32-slave11 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265292985.1265303911.31448.gz WINNT 5.2 mozilla-central build on 2010/02/04 06:16:25 s: win32-slave33 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265312255.1265321624.6881.gz WINNT 5.2 mozilla-central build on 2010/02/04 11:37:35 s: win32-slave34 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265313560.1265323849.31706.gz WINNT 5.2 mozilla-central build on 2010/02/04 11:59:20 s: win32-slave05 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265316791.1265327765.10740.gz WINNT 5.2 mozilla-central build on 2010/02/04 12:53:11 s: win32-slave13 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265330901.1265341661.3602.gz WINNT 5.2 mozilla-central build on 2010/02/04 16:48:21 s: win32-slave38 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Has anybody tried clobbering all the mozilla-central checkouts from the clobberer page? Does this happen on the try server with a no-op build?
This has appeared on lots of nightly builds, which are clobbers, so that seems unlikely to fix it. Last I knew, try server win32 builds were not PGO. Has that changed yet? I highly suspect this is a PGO-only issue.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265390986.1265403415.14210.gz WINNT 5.2 mozilla-central build on 2010/02/05 09:29:46 s: win32-slave09 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Try builds are not currently PGO.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265402027.1265411107.3247.gz WINNT 5.2 mozilla-central build on 2010/02/05 12:33:47 s: win32-slave16 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
(In reply to comment #49) > Last I knew, try server win32 builds were not PGO. Has that changed yet? I > highly suspect this is a PGO-only issue. Agreed that this is PGO-only. As shown in comment 0, the line right before the failure is always something like this: > 3700 of 103836 ( 3.56%) profiled functions will be compiled for speed
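The observation that the failure always follows the "profiled functions will be compiled for speed" banner can be turned into a simple log check. A minimal sketch, assuming the banner and the fatal-error line appear as separate lines in the order shown in comment 0; PgoBanner, parsePgoBanner and failedDuringPgoCodegen are hypothetical helpers.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Counts from the banner printed before PGO code generation, e.g.
// "3700 of 103836 ( 3.56%) profiled functions will be compiled for speed".
struct PgoBanner {
    long compiledForSpeed = 0;
    long profiledTotal = 0;
};

bool parsePgoBanner(const std::string& line, PgoBanner& out) {
    static const std::regex re(
        R"re((\d+) of (\d+) \( *[0-9.]+%\) profiled functions will be compiled for speed)re");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return false;
    out.compiledForSpeed = std::stol(m[1].str());
    out.profiledTotal = std::stol(m[2].str());
    return true;
}

// True if a "fatal error C1xxx" line appears after the banner, i.e. the crash
// happened during PGO code generation rather than in an ordinary compile step.
bool failedDuringPgoCodegen(const std::vector<std::string>& logLines) {
    static const std::regex fatal(R"re(fatal error C1\d{3})re");
    bool bannerSeen = false;
    PgoBanner banner;
    for (const auto& line : logLines) {
        if (parsePgoBanner(line, banner)) {
            bannerSeen = true;
        } else if (bannerSeen && std::regex_search(line, fatal)) {
            return true;
        }
    }
    return false;
}
```

Applied to the two banners quoted in this bug, the parser reports 103836 profiled functions for xul.dll and 5262 for mozjs.dll, roughly a 20:1 ratio, which is the size gap behind the doubt about a pure code-size explanation for the tracemonkey failure.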
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265398835.1265410835.32524.gz WINNT 5.2 mozilla-central build on 2010/02/05 11:40:35 s: win32-slave10
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265409314.1265423098.14815.gz WINNT 5.2 mozilla-central build on 2010/02/05 14:35:14 s: win32-slave11 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265456057.1265466168.15613.gz WINNT 5.2 mozilla-central build on 2010/02/06 03:34:17 s: win32-slave40 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler. LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265466029.1265473450.4352.gz s: win32-slave35 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler. LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265453038.1265462915.7475.gz WINNT 5.2 mozilla-central build on 2010/02/06 02:43:58 s: win32-slave35 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265469057.1265481477.19077.gz WINNT 5.2 mozilla-central build on 2010/02/06 07:10:57 s: win32-slave26 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265480759.1265491908.6385.gz WINNT 5.2 mozilla-central build on 2010/02/06 10:25:59 s: win32-slave11 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265492246.1265498536.15152.gz WINNT 5.2 mozilla-central build on 2010/02/06 13:37:26 s: win32-slave35 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265495977.1265507330.14132.gz WINNT 5.2 mozilla-central build on 2010/02/06 14:39:37 s: win32-slave18 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265504572.1265512923.9886.gz WINNT 5.2 mozilla-central build on 2010/02/06 17:02:52 s: win32-slave09 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265544806.1265556316.6236.gz WINNT 5.2 mozilla-central build on 2010/02/07 04:13:26 s: win32-slave08 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265542041.1265554118.7301.gz WINNT 5.2 mozilla-central build on 2010/02/07 03:27:21 s: win32-slave17 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265558179.1265567390.11977.gz WINNT 5.2 mozilla-central build on 2010/02/07 07:56:19 s: win32-slave32 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265540523.1265550443.27831.gz WINNT 5.2 mozilla-central nightly on 2010/02/07 03:02:03 s: win32-slave35 e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Related issue: http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1265542320.1265547707.27169.gz WINNT 5.2 tracemonkey nightly on 2010/02/07 03:32:00 s: win32-slave50
link -NOLOGO -DLL -OUT:mozjs.dll .... -LTCG:PGUPDATE ...
PGOMGR : warning PG0188: No .PGC files matching 'mozjs!*.pgc' were found.
Creating library mozjs.lib and object mozjs.exp
Generating code
1759 of 5262 ( 33.43%) profiled functions will be compiled for speed
e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jshashtable.h(297) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'F:\SP\vctools\compiler\utc\src\P2\main.c', line 216)
To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++ Help menu, or open the Technical Support help file for more information
LINK : fatal error LNK1257: code generation failed

LNK1257 is "failed to perform code generation" when using /GL.
mozilla-central nightly number two for the day: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265575600.1265584996.14792.gz WINNT 5.2 mozilla-central nightly on 2010/02/07 12:46:40 s: win32-slave26 e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265596494.1265603851.32325.gz WINNT 5.2 mozilla-central build on 2010/02/07 18:34:54 s: win32-slave25 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler. and nightly number three: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265587011.1265595645.4782.gz WINNT 5.2 mozilla-central nightly on 2010/02/07 15:56:51 s: win32-slave34 e:\builds\moz2_slave\mozilla-central-win32-nightly\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
(In reply to comment #68) > Related issue: Ouch. mozjs.dll is much smaller than xul.dll, if this was a code size issue I wouldn't expect to see it there.
Assignee: aki → nobody
Per irc with bhearsum, this caused the FF3.7a1 win32 builds to fail.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265648812.1265661852.1294.gz WINNT 5.2 mozilla-central build on 2010/02/08 09:06:52 s: win32-slave13 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265663168.1265671384.15320.gz WINNT 5.2 mozilla-central build on 2010/02/08 13:06:08 s: win32-slave43 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265664574.1265671219.13465.gz WINNT 5.2 mozilla-central build on 2010/02/08 13:29:34 s: win32-slave16 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
This (or something similar) is also happening on tracemonkey in the js library. I tried out vs2008, and got a similar crash:
make export
make[1]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config'
make[3]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/mkdepend'
make[3]: Nothing to be done for `export'.
make[3]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/mkdepend'
e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/nsinstall.exe nsinstall.exe ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/shell'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \
js ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/shell'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsapi-tests'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \
jsapi-tests ../../../dist/bin
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsapi-tests'
make[2]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/tests'
make[2]: Nothing to be done for `export'.
make[2]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/tests' d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/tracemonkey-win32/build/js/src/build/win32/pgomerge.py \ mozjs ./../../dist/bin e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/config/nsinstall.exe -m 644 js-config.h jsautocfg.h /e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src/jsautokw.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/js.msg /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsarray.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsarena.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsatom.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbool.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsclist.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscntxt.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscompat.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdate.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdbgapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdhash.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsdtoa.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsemit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsfun.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsgc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jshash.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsinterp.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsinttypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsiter.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jslock.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jslong.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsmath.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsnum.h 
/e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsobj.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsobjinlines.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/json.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsopcode.tbl /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsopcode.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsotypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsparse.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsprf.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsproto.tbl /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsprvtd.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jspubtd.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsregexp.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscan.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscope.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscript.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsscriptinlines.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsstaticcheck.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsstr.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstask.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstracer.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstypedarray.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstypes.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsutil.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsvector.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jstl.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jshashtable.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsversion.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsxdrapi.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsxml.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jsbuiltins.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Assembler.h 
/e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Allocator.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/CodeAlloc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Containers.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/LIR.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/avmplus.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Fragmento.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Native.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/Nativei386.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/RegAlloc.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/nanojit.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/nanojit/VMPI.h /e/builds/moz2_slave/tracemonkey-win32/build/js/src/jscpucfg.h ./../../dist/include mkdir -p nanojit make[1]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src' make libs make[1]: Entering directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src' link -NOLOGO -DLL -OUT:mozjs.dll -PDB:mozjs.pdb -SUBSYSTEM:WINDOWS jsapi.obj jsarena.obj jsarray.obj jsatom.obj jsbool.obj jscntxt.obj jsdate.obj jsdbgapi.obj jsdhash.obj jsdtoa.obj jsemit.obj jsexn.obj jsfun.obj jsgc.obj jshash.obj jsinterp.obj jsinvoke.obj jsiter.obj jslock.obj jslog2.obj jsmath.obj jsnum.obj jsobj.obj json.obj jsopcode.obj jsparse.obj jsprf.obj jsregexp.obj jsscan.obj jsscope.obj jsscript.obj jsstr.obj jstask.obj jstypedarray.obj jsutil.obj jsxdrapi.obj jsxml.obj prmjtime.obj jstracer.obj Assembler.obj Allocator.obj CodeAlloc.obj Containers.obj Fragmento.obj LIR.obj RegAlloc.obj avmplus.obj Nativei386.obj jsbuiltins.obj VMPI.obj -MANIFESTUAC:NO -NXCOMPAT -DYNAMICBASE -SAFESEH -DEBUG -OPT:REF -LTCG:PGUPDATE e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/nspr4.lib e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/plc4.lib 
e:/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/dist/lib/plds4.lib kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib
PGOMGR : warning PG0188: No .PGC files matching 'mozjs!*.pgc' were found.
Creating library mozjs.lib and object mozjs.exp
Generating code
1766 of 5262 ( 33.56%) profiled functions will be compiled for speed
e:\builds\moz2_slave\tracemonkey-win32\build\js\src\jshashtable.h(297) : fatal error C1001: An internal error has occurred in the compiler.
(compiler file 'f:\dd\vctools\compiler\utc\src\p2\main.c[0xE8575000:0xE8575000]', line 182)
To work around this problem, try simplifying or changing the program near the locations listed above.
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information

LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage

Version 9.00.21022.08

ExceptionCode = C0000005
ExceptionFlags = 00000000
ExceptionAddress = E8575000
NumberParameters = 00000002
ExceptionInformation[ 0] = 00000000
ExceptionInformation[ 1] = E8575000

CONTEXT:
Eax = FFFFFFFC Esp = 0012ECC8
Ebx = 00000008 Ebp = 0012ECD8
Ecx = 51037C01 Esi = 0012ED29
Edx = 0012ED2A Edi = 0852043C
Eip = E8575000 EFlags = 00010297
SegCs = 0000001B SegDs = 00000023
SegSs = 00000023 SegEs = 00000023
SegFs = 0000003B SegGs = 00000000
Dr0 = 00000000 Dr3 = 00000000
Dr1 = 00000000 Dr6 = 00000000
Dr2 = 00000000 Dr7 = 00000000
make[1]: Leaving directory `/e/builds/moz2_slave/tracemonkey-win32/build/obj-firefox-cl15/js/src'
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265675212.1265684889.10919.gz WINNT 5.2 mozilla-central build on 2010/02/08 16:26:52 s: win32-slave05 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265708141.1265713826.22130.gz WINNT 5.2 mozilla-central build on 2010/02/09 01:35:41 s: win32-slave05 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
(on the JS build)
Ted and I are going to try reproducing on revision http://hg.mozilla.org/mozilla-central/rev/43e818c28059, which we know failed to build the first time for 3.7a1. I'm going to try re-linking xul.dll continuously to see if it fails. Ted is going to try re-building the whole tree continuously.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265734335.1265746795.32657.gz WINNT 5.2 mozilla-central build on 2010/02/09 08:52:15 s: win32-slave19 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265763290.1265773452.10156.gz WINNT 5.2 mozilla-central build on 2010/02/09 16:54:50 s: win32-slave28 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265768094.1265777945.28286.gz WINNT 5.2 mozilla-central build on 2010/02/09 18:14:54 s: win32-slave27 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1002: compiler is out of heap space in pass 2
Unable to reproduce by continually re-linking xul.dll. Going to try full rebuilds now.
(In reply to comment #84)
> Unable to reproduce by continually re-linking xul.dll. Going to try full
> rebuilds now.

xul.dll was re-linked 12 times by doing:

  rm xul.dll
  make MOZ_PROFILE_USE=1

in objdir/toolkit/library
I've had almost non-stop PGO builds running on my local machine on the changeset from comment 80, with no failures yet. My local environment differs from tinderbox in that it's:
1) Windows 7 x64 with 4GB RAM
2) Visual C++ 2008
I think we'll see how catlee's testing goes. If he can reproduce it there, then maybe I'll try installing VC2005 and narrow down the differences.
FWIW, I haven't run into this error yet with any of my builds on my system. I'm also Win7 x64 w/ 4GB of RAM. I'm using VC2005SP1 still, though.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265825366.1265835278.9909.gz WINNT 5.2 mozilla-central build on 2010/02/10 10:09:26 s: win32-slave31 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265838075.1265850047.15954.gz WINNT 5.2 mozilla-central build on 2010/02/10 13:41:15 s: win32-slave02 e:\builds\moz2_slave\mozilla-central-win32\build\toolkit\components\places\src\nsannotationservice.cpp(457) : fatal error C1001: An internal error has occurred in the compiler.
Full rebuilds (by deleting the object directory) have failed to reproduce this crash on the VM after 7 attempts. I'll try putting disk / memory pressure on the VM to see if that causes it to fail.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265986899.1265994905.14820.gz This log is interesting because it fails in view\src\nsview.cpp, not nsannotationservice.cpp.
So, putting heavy memory pressure on the VM caused the machine to exhaust virtual memory and almost hang....but no crash! After killing everything, I restarted full builds and got a crash after the 2nd attempt. Re-linking xul.dll with MOZ_PROFILE_USE=1 also crashed a second time. I'm currently re-linking with LINK_REPRO set so we can try and get a reproducible test case out of it. I've also tarred up the entire directory (1.6 GB) if anybody wants to examine it.
The page file on this machine has grown to 1529 MB. The maximum is 1536 MB.
One of the new hardware windows machines also failed with this error. This machine has 4 GB of RAM, and a 2 GB page file, which makes me think we're hitting the 2 GB address space limit.
Severity: normal → blocker
Does Microsoft ship 64-bit versions of the compiler? ;)
If you're bumping into a 2GB process limit, might the /3GB switch help? http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx
The mozilla-central tree is now closed because we haven't had a successful Windows opt build in nearly 20 hours.
The linker can use >2GB of address space if it's available. I don't know that Microsoft has a 64-bit version of the 32-bit compiler. It's possible the 3GB thing might help, since that'd give the process 3GB of VM. Switching to a 64-bit OS would give it 4GB of VM.
Response from MSFT We have managed to reproduce the problem with the link repro you have provided. Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time constraints, but we will make sure it is fixed in subsequent releases. In the meantime, we suggest two workarounds for you. The code exposing the bug is around Line 286 in file jstracer.cpp, relating to the use of _BitScanReverse. The first workaround is to turn off optimization for the function altogether, using "#pragma optimize ("", off)". More details can be found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx . The 1st workaround may have unpleasant performance impact. Another workaround is to rewrite some of the code. The source operand of this particular _BitScanReverse call is actually a constant (0x3ff). You should be able to avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get rid of the call.
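MSFT's second workaround amounts to replacing an intrinsic call on a constant operand with its precomputed result. The jstracer.cpp code itself isn't quoted in this bug, so the following is only a sketch with hypothetical names: a portable constexpr stand-in for the "index of the highest set bit" computation checks at compile time that 0x3ff (bits 0 through 9 set) yields 9, which is the constant that would take the place of the call.

```cpp
#include <cassert>

// Hypothetical portable stand-in for the index _BitScanReverse stores:
// the position of the highest set bit of v (returns 0 for v <= 1).
constexpr unsigned long HighestSetBitIndex(unsigned long v) {
    return v <= 1 ? 0 : 1 + HighestSetBitIndex(v >> 1);
}

// Original shape (MSVC only, shown for comparison):
//   unsigned long idx;
//   _BitScanReverse(&idx, 0x3ff);
// Workaround shape: hard-code the result so no intrinsic call is emitted.
constexpr unsigned long kHighBitOf0x3ff = 9;

// Compile-time check that the hard-coded constant matches the computation.
static_assert(HighestSetBitIndex(0x3ff) == kHighBitOf0x3ff,
              "0x3ff == 0b1111111111, so bit 9 is the highest set bit");
```

The static_assert means a later edit to the constant or the operand fails to compile rather than silently diverging.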
(In reply to comment #102) > Response from MSFT > > We have managed to reproduce the problem with the link repro you have provided. > Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time > constraints, but we will make sure it is fixed in subsequent releases. > > In the meantime, we suggest two workarounds for you. The code exposing the bug > is around Line 286 in file jstracer.cpp, relating to the use of > _BitScanReverse. The first workaround is to turn off optimization for the > function altogether, using "#pragma optimize ("", off)". More details can be > found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx . Looks like gavin already tried that without success. > The 1st workaround may have unpleasant performance impact. Another workaround > is to rewrite some of the code. The source operand of this particular > _BitScanReverse call is actually a constant (0x3ff). You should be able to > avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get > rid of the call. Can someone who understands _BitScanReverse please give this a try?
(In reply to comment #103) > (In reply to comment #102) > > Response from MSFT > > > > We have managed to reproduce the problem with the link repro you have provided. > > Thanks! Unfortunately, we cannot fix the bug in Visual Studio 2010 due to time > > constraints, but we will make sure it is fixed in subsequent releases. > > > > In the meantime, we suggest two workarounds for you. The code exposing the bug > > is around Line 286 in file jstracer.cpp, relating to the use of > > _BitScanReverse. The first workaround is to turn off optimization for the > > function altogether, using "#pragma optimize ("", off)". More details can be > > found at http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx . > > Looks like gavin already tried that without success. Wait. dholbert/gavin tried this in nsAnnotationService.cpp, not jstracer.cpp. dholbert/gavin, can you also try this in jstracer.cpp? (Thanks to "tn" in irc for catching that!) > > The 1st workaround may have unpleasant performance impact. Another workaround > > is to rewrite some of the code. The source operand of this particular > > _BitScanReverse call is actually a constant (0x3ff). You should be able to > > avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get > > rid of the call. > > Can someone who understands _BitScanReverse please give this a try?
I already tried the #pragma trick for jstracer.cpp, but perhaps with inlining and optimization the #pragma got lost. This patch http://hg.mozilla.org/tracemonkey/rev/feac51b74044 #pragmas the call sites. If it works, perhaps the same would work for the nsAnnotationService.cpp bust as well.
(In reply to comment #103)
> > The 1st workaround may have unpleasant performance impact. Another workaround
> > is to rewrite some of the code. The source operand of this particular
> > _BitScanReverse call is actually a constant (0x3ff). You should be able to
> > avoid the bug by coding the result of _BitScanReverse(0x3ff) directly and get
> > rid of the call.
>
> Can someone who understands _BitScanReverse please give this a try?

0x3ff = 0b1111111111; 0 is the least significant bit, so the first operand will be set to 9, and the function will return 1.

_BitScanReverse itself is trivial - its result appears to be undefined for a mask of 0 (I've gotten it to return different values), but otherwise it can be defined like this (idea taken, slightly modified, from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup ):

#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
static unsigned long const LogTable256[256] =
{
  -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
  LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
  LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
#undef LT

inline unsigned char _BitScanReverse2(unsigned long * i, unsigned long v)
{
  static unsigned long t, tt;
  *i = (tt = v >> 16) ?
         (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt] :
         (t = v >> 8) ? 8 + LogTable256[t] : 0 + LogTable256[v];
  return (v ? 1 : 0);
}

Of course, you need _BitScanReverse64 for 64-bit values, but that's already the case anyway. The above matches the output from Microsoft's version for all 32-bit values.
Ah, of course the above should be _BitScanReverse, not _BitScanReverse2 - that was left over from testing.
Oh, and the table can be unsigned char (was playing around with the first value before I realized it was undefined), and maybe the temporaries shouldn't be static for a multi-threaded environment. Sorry about the comment spam.
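Folding the corrections from the last two comments into the lookup-table version gives something like the following sketch. The names BitScanReversePortable and kLog2Table are hypothetical; reusing _BitScanReverse outside MSVC is best avoided, since identifiers starting with an underscore and a capital letter are reserved for the implementation. The input is taken as std::uint32_t because unsigned long is 64 bits on some non-Windows platforms, where v >> 16 could then index past the end of the table. Entry 0 is 0 rather than -1, because the zero-input result is undefined anyway and -1 would be a narrowing error in a braced initializer of an unsigned type under C++11.

```cpp
#include <cassert>
#include <cstdint>

// kLog2Table[i] == floor(log2(i)) for 1 <= i <= 255. Entry 0 is a placeholder:
// the result for a zero input is undefined, as with the MSVC intrinsic.
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
static const unsigned char kLog2Table[256] = {
    0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)};
#undef LT

// Hypothetical portable equivalent of _BitScanReverse for 32-bit values:
// stores the index of the most significant set bit of v in *index and
// returns 1, or returns 0 when v == 0 (then *index is not meaningful).
inline unsigned char BitScanReversePortable(unsigned long* index, std::uint32_t v) {
    std::uint32_t t, tt;  // plain locals rather than statics, so this is reentrant
    if ((tt = v >> 16) != 0) {
        // Top half non-zero: look up the top byte if set, else bits 16..23.
        *index = (t = tt >> 8) != 0 ? 24 + kLog2Table[t] : 16 + kLog2Table[tt];
    } else {
        // Only the low 16 bits can be set: look up bits 8..15, else bits 0..7.
        *index = (t = v >> 8) != 0 ? 8 + kLog2Table[t] : kLog2Table[v];
    }
    return v != 0 ? 1 : 0;
}
```

The two nested tests narrow the value to a single non-zero byte, so one table lookup plus a fixed offset gives the answer, which is the same structure as the version above.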
The #pragmas in comment 105 didn't work, so, since _BitScanReverse is the culprit, I just replaced the JS_CEILING_LOG2 with a dumb loop and that did fix the permanent red. So, thanks Ted, John and others!
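The actual replacement for JS_CEILING_LOG2 isn't reproduced in this bug, so the following is only a hedged sketch of what a "dumb loop" ceiling log2 could look like. CeilingLog2Loop is a hypothetical name, and the convention that inputs of 0 and 1 both yield 0 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "dumb loop" ceiling log2: returns the smallest k such that
// 2^k >= n (0 for n <= 1). Shifting a 64-bit one keeps 2^32 representable,
// so n up to 0xFFFFFFFF yields 32 without overflow.
inline unsigned CeilingLog2Loop(std::uint32_t n) {
    unsigned k = 0;
    while ((std::uint64_t{1} << k) < n)
        ++k;
    return k;
}
```

At most 32 iterations, with no intrinsics for the optimizer to trip over; that is an acceptable trade when, as noted later in the thread, the function isn't performance critical.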
Is that function at all performance critical? The version of _BitScanReverse I gave above is about as fast in terms of operation count as such a function gets, and you could probably use the code directly in JS_CEILING_LOG2 with slight modifications. Regardless, I'm glad this is fixed.
Is this then good to go on the production side, so the tree can reopen?
(In reply to comment #110)
> Is that function at all performance critical?

Nope (http://hg.mozilla.org/tracemonkey/rev/3c7e7c13c311), but thanks for the suggestions.

(In reply to comment #111)
> Is then good to go on the production side so the Tree can reopen ?

I'm sorry I wasn't more specific, but this is for the ICE in jshashtable.h on TraceMonkey. The m-c issues seem to be, AFAICS, unrelated to the use of intrinsics.
> (In reply to comment #111)
> > Is then good to go on the production side so the Tree can reopen ?
>
> I'm sorry I wasn't more specific, but this is for the ICE in jshashtable.h on
> TraceMonkey. The m-c issues seem to be, AFAICS, unrelated to the use of
> intrinsics.

Oh, I was under the impression you were using TraceMonkey as a test guinea pig before committing to the production side. So, in other words, we are still stuck waiting for a fix for the compiler problem.
The tree is still closed, blocking on this bug, for which there appears to be no fix yet. Jesse closed the tree specifically until we got one green cycle on Windows Opt, which we did. The tree-closing rationale has since morphed (per Tinderbox) into waiting until this is fixed, but it doesn't look like a fix is anywhere near in hand. Should the tree still be blocking on this?
Has this been reproduced on a machine other than one of the build machines? If so, what are the steps to reproduce it? What OS version, compiler version, and mozconfig are needed? I tried to reproduce this with a VM based on Win XP SP2, VS2008 and a mozconfig similar to the nightly build, but couldn't reproduce the compiler bug. One thing that often helps in these cases is to track down the specific file where the problem occurs, then generate a preprocessed file that includes all the header contents. Then, using pragmas, narrow down the minimum range of code that triggers the problem.
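A hedged sketch of the pragma bisection step: MSVC's #pragma optimize("", off) applies to function definitions that follow it, and #pragma optimize("", on) restores the optimizations given on the command line, so moving the pair through the preprocessed file narrows down which function triggers the crash. The Suspect* functions are hypothetical placeholders for code from that file; the pragmas are guarded with _MSC_VER so other compilers skip them.

```cpp
#include <cassert>

// Hypothetical placeholder for code before the suspect range: still optimized.
inline int SuspectBefore(int x) { return x * 2; }

#ifdef _MSC_VER
#pragma optimize("", off)  // MSVC: compile the following definitions unoptimized...
#endif

// Hypothetical placeholder for the range under test.
int SuspectRegion(int x) { return x + 1; }

#ifdef _MSC_VER
#pragma optimize("", on)   // ...then restore the /O settings from the command line.
#endif

// Hypothetical placeholder for code after the suspect range: optimized again.
inline int SuspectAfter(int x) { return x - 3; }
```

If the crash disappears, the culprit lies between the pragmas; halving that range and repeating isolates it. As comment 105's experience suggests, inlining can carry code across these boundaries, so call sites may need the same treatment.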
This seems to be very difficult to reproduce on a non-build VM. catlee ran about 10 builds in a row on a build VM and failed to reproduce it; he finally reproduced it by running a script to exhaust virtual memory and then linking. John: it's harder than that, because it's the linker crashing while doing the final PGO link, so the current testcase is "all of xul.dll". We think we might be exhausting the 32-bit address space here. I think RelEng wanted to try taking a build VM, bumping the RAM to 3GB, and booting it with the Windows /3GB address space flag. Also, I ran multiple PGO builds on my Win7 x64 / VC2008 machine (probably at least 7 in a row) without reproducing it.
(In reply to comment #114) > Should the tree still be blocking on this? I think so, at least without an *active* sheriff to gate checkins. It looks like Windows builds were starting to fail most of the time, so given the long build-to-test times we want to avoid people rushing in on green. Plus, backouts get long and painful when builds don't reliably finish.
Also, with only one green build a day, noticing a perf regression could take a week.
[Probably shouldn't have an unowned tree blocker... Assigning to Ted since I heard him talking about it last week, but feel free to reassign to whomever is on the hook for this.]
Assignee: nobody → ted.mielczarek
Regarding the memory exhaustion hypothesis, I re-ran today's nightly on win32-slave39 at a time when the VM infrastructure was at minimal load (a few try builds were running, plus the usual background from old branches that build continuously). It failed with
  LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage
The VM had rebooted at 04:51 at the end of the previous run, then been idle until I started the nightly at 13:23.

I was monitoring the memory usage with the Task Manager during both link phases. Towards the end of the 2nd link (performing the PGO), link.exe has a
* "Mem Usage" of about 1170MB, which Wikipedia informs me is the working set, the actual RAM in use
* "VM Size" of 1111MB, which is a measure of the virtual address space of the process
In the "Physical Memory" box of the Performance tab of the Task Manager there was still more than 450MB "Available", and more than 600MB in "System Cache" (which I take to be file cache in memory).

Now, I had looked away at the point that link blew up, but the graph for the Page File Usage (which is really plotting the Total Commit Charge, which is really the total virtual memory in use) did not increase dramatically. It went up a bit, but not enough to exhaust the memory.

At catlee's suggestion I'm going to take the known-to-fail build dir from comment #93, put it on slave39 and see if I can reproduce the link failure. If I can, then I'll try using /3G in boot.ini.

Would it help any to see if Dr Watson catches the link crash, so we can submit it to MS?
(In reply to comment #120)
> At catlee's suggestion I'm going to take the known-to-fail build dir from
> comment #93, put it on slave39 and see if I can reproduce the link failure. If
> I can then I'll try using /3G in boot.ini.

Blew up at the first attempt. The sequence seems to be:
* link grabs a bunch of memory (>1GB), presumably by loading a bunch of files
* thinks for 45 minutes, with a slight increase in memory usage
* mspdbsrv uses CPU for a few seconds
* link gets the CPU back
* build craps out

Trying /3GB now.
Attached image process info (deleted) —
Got three successful links when booted with /3GB. The attachment shows link using up almost all of the 2GB of virtual memory space for applications (courtesy of VMMap from SysInternals), and lots of headroom left over when /3GB is set. This makes it clear that it's reserving a much larger hunk of the address space than Task Manager reports as actually in use.

We'll have to check if there are any downsides to setting /3GB, e.g. see http://blogs.technet.com/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx for some warnings, but relinking for PGO is likely to be the most stressful thing these machines are doing.

The question then is: why do some PGOs fail and others succeed? When we generate the optimization profile (by launching Firefox and running sunspider), do we sometimes exercise more or fewer code paths? In particular in the history code, given the frequency of nsAnnotationService.cpp in the error messages. The app is open for a little over 2 minutes.

There are some differences in what PGO says it's going to do right after starting, e.g. in a log where it fails:
  2535 of 104349 ( 2.43%) profiled functions will be compiled for speed
vs. a succeeding log (both without /3G):
  3674 of 103933 ( 3.53%) profiled functions will be compiled for speed
I only looked at these two logs so far, so this might not be typical. When it did succeed there's also
  103933 of 106728 functions (97.4%) were optimized using profile data
  1770914794 of 1844595302 instructions (96.0%) were optimized using profile data
  Finished generating code
afterward; it's not clear how that relates to the original 3.53%. It's late now, so I'll have to pick this up again tomorrow.
Just wanted to confirm that this crash is really caused by PGO. I went through every build log that was posted here so far, which gave the following statistics on where and why the compiler crashed:
  5x 0x10C9CFA1 => c2.dll!PogoReadSimpleProbes()+0xF1, which dereferenced a pointer by pgodb80.dll!PogoDbReadSimpleProbeEx()
  25x 0x10CBB2DA => c2.dll!PogoValReadValueCountsEx()+0x7A, which dereferenced a pointer by pgodb80.dll!PogoDbReadValueProbeEx()
  11x 0x10CBB356 => c2.dll!PogoValReadValueCountsEx()+0xF6, which dereferenced a pointer by pgodb80.dll!PogoDbReadValueProbeData()
  13x Out of heap space

Said pointer often points to memory that is obviously not committed at all, and sometimes is even impossible to allocate, such as NULL, -1 or an address in kernel space. I can't tell whether this is due to unchecked allocation failures / memory exhaustion or plain flawed program logic. However, other times the pointer seems to be plausible, so considering that the process's VM is very populated, I'm almost sure that the compiler often successfully reads some totally unrelated value from memory because it happened to be allocated. This would ultimately call into question the meaningfulness of the decisions PGO makes.

I hope raising the usermode VM to 3GB proves to be a reliable workaround, but if it doesn't, we probably should do non-PGO builds until MSFT can fix this.
Debugging data can be collected by setting the LINK_REPRO environment variable to an empty folder:
http://support.microsoft.com/kb/134650
or by getting a recent version of the Windows Debugging Tools:
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
and then forcing link.exe to be run under the NTSD debugger to catch the crash:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\link.exe]
"Debugger" = "ntsd.exe -g -G"

This essentially prepends every command line intending to execute link.exe with the given NTSD command line, with the "-g" and "-G" switches specifying that the program's startup and termination breakpoints should be ignored. Everything should be able to build without intervention until something crashes, which then breaks into NTSD. A dump of all writeable pages can be done there:

  .dump /mFhutpwd link_crash.dmp

and then quit the debugger with

  q

But either way, since the crashes caused by compiling Tracemonkey occurred at the same offsets, these are probably the same or related, and MSFT responded that they can't fix them even in the upcoming VS2010 release:
https://connect.microsoft.com/VisualStudio/feedback/details/532306

At least the 3GB VM should be able to fix the out-of-heap errors, so I'm curious how this all will turn out. Hopefully with a cured, healthy tree.
Priority: -- → P1
(In reply to comment #123)
> I'm almost sure that the compiler often
> successfully reads some totally unrelated value from memory because the it
> happended to be allocated.

If that's true, it's a bit... terrifying. I wonder if any of the intermittent test failures could be due to this causing bad PGO builds (and, if so, maybe we should investigate a way to monitor memory usage during the build, so we can hard-fail the build if it starts using enough to do bad things).

> But either way, since the crashes caused by compiling Tracemonkey occured at
> the same offsets, these are probably equal or related and MSFT responded that
> they can't fix them even in the upcoming VS2010 release:
> https://connect.microsoft.com/VisualStudio/feedback/details/532306

Shaver has MSFT contacts; if the /3GB flag doesn't fix the problem, escalating through him is an option.
Sounds like Nick has a plan. This isn't anything I can fix, for sure.
Assignee: ted.mielczarek → nrthomas
(In reply to comment #124)
> If that's true, it's a bit... terrifying. I wonder if any of the intermittent
> test failures could be due to this causing bad PGO builds (and, if so, maybe we
> should investigate a way to monitor memory usage during the build, so we can
> hard-fail the build if it starts using enough to do bad things).

Alternatively, can we sandbox the build process to keep track of exactly what memory it allocates, and return 0 (or some other predefined value) if it tries to access anything it never allocated? I'm not saying that would fix things, or be a realistic long-term solution (especially considering what it would no doubt do to compilation speed ... at least PGO should always be relative to everything else), but it would at least give us some consistency and an idea of where and when things are failing.
Confirmed today that /3GB in boot.ini also fixes PGO on an ix slave (using the broken objdir catlee created earlier).

This opsi package updates boot.ini with /3GB. I've tested it on win32-slave21 and mw32-ix-slave01 - it deploys, can be backed out, and deployed again as expected. They had exactly the same boot.ini, so that was helpful.

In order to get some full-build testing I've set up
* staging-master02 with win32-slave03, 21, and mw32-ix-slave01 (/3GB in use)
* triggered a bunch of builds
* WINNT 5.2 mozilla-central (build|nightly|leak test)
* WINNT 5.2 mozilla-1.9.2 (build|unit test|leak test)
* the first builds started at Tue Feb 16 00:56 PST.
Please check the results there, and if they are OK go ahead and deploy this to the production VMs. (win32-slave04 is also set to pick up the opsi package the next time it reboots; it's on sm01.)
Attachment #427083 - Flags: review?(catlee)
You may want to be careful with /3gb; in my experience (with mostly server workloads) it can cause hard-to-diagnose problems with the base OS. With /3gb enabled, if you have more than 4gb physical memory installed you can run out of PTEs, and you can run out of pool memory regardless of total memory. The article Nick Thomas (comment 122) mentioned is good, http://blogs.technet.com/markrussinovich/archive/2009/03/26/3211216.aspx also has a good description of pool limits and how you can monitor them.
Thanks for the heads up. These machines only have 2GB installed, AFAIK (although the new hardware slaves have 4GB), and their workload consists entirely of running the build, basically. Running the PGO link phase is the most demanding task any of them will do. I wouldn't rule out hitting weird problems due to this (I've learned that there's no problem so weird that we can't hit it), but I'm optimistic. I think the better long-term fix would be to switch to a 64-bit OS on the build machines, so the linker gets a full 4GB of address space.
Also note that we're rebooting after every build, which likely reduces the chance of hitting any weirdness.
(In reply to comment #127)
> Created an attachment (id=427083) [details]
> [opsi-package-sources] Add /3GB to boot.ini
>
> Confirmed today that /3GB in boot.ini also fixes PGO on an ix slave (using the
> broken objdir catlee created earlier).
>
> This opsi package updates boot.ini with /3GB. I've tested it on win32-slave21
> and mw32-ix-slave01 - it deploys, can be backed out, and deployed again as
> expected. They had the exactly the same boot.ini so that was helpful.
>
> In order to get some full-build testing I've set up
> * staging-master02 with win32-slave03, 21, and mw32-ix-slave01 (/3GB in use)
> * triggered a bunch of builds
> * WINNT 5.2 mozilla-central (build|nightly|leak test)
> * WINNT 5.2 mozilla-1.9.2 (build|unit test|leak test)
> * the first builds started at Tue Feb 16 00:56 PST.
> Please check the results there, and if they are OK go ahead and deploy this to
> the production VMs.

I just had a look and there are no issues that I can see. I'll land this and get it rolling out.
Comment on attachment 427083 [details] [diff] [review] [opsi-package-sources] Add /3GB to boot.ini changeset: 40:402576d3617e I've set this package to roll out across all of the build farm. They'll pick it up on the next reboot -- expect to see at least a few more failures of this type, though. Not all of the slaves will be rebooting right away.
Attachment #427083 - Flags: review?(catlee)
Attachment #427083 - Flags: review+
Attachment #427083 - Flags: checked-in+
(In reply to comment #132) > (From update of attachment 427083 [details] [diff] [review]) > changeset: 40:402576d3617e For posterity, I accidentally landed this as Lukas, rather than Nick. Sorry about that, both of you!
win32-slave01 through 59 all have the updated boot.ini. We'll have to update the try slaves too if we enable PGO there.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Nick, Ben, thanks for tackling this and getting it fixed!
(In reply to comment #134) > win32-slave01 thru 59 are all have updated boot.ini. We'll have to update the > try slaves too if we enable PGO there. Thanks for updating this bug, I forgot to. I actually went ahead and rebooted all the try slaves while deploying this, so we're set to go there, too.
In bug 565402 I was looking to see if we need this change for 64-bit build machines. From my reading of:
http://technet.microsoft.com/en-us/library/cc786709%28WS.10%29.aspx
it seems that we don't need to do the 4GT tuning (aka the /3GB switch) as done for the Win2k3 32-bit machines [1]. The only thing that I noticed that might be needed is to set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag [2].

[1]:
> 4-gigabyte tuning (4GT), also known as application memory tuning, or the /3GB
> switch, is a technology (only applicable to 32 bit systems) that alters the
> amount of virtual address space available to user mode applications. Enabling
> this technology reduces the overall size of the system virtual address space
> and therefore system resource maximums.

[2]:
> 2 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE cleared (default)
> 4 GB with IMAGE_FILE_LARGE_ADDRESS_AWARE set
IMAGE_FILE_LARGE_ADDRESS_AWARE is a flag that gets set on executable files. The compiler and linker already have this set.
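The address-space cases discussed in the last few comments can be summarized as a small model. This is a simplified sketch under stated assumptions: it covers only 32-bit (x86) processes such as link.exe, ignores /USERVA values other than the full /3GB, and the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the host configurations discussed in this bug.
enum class HostOs { Windows32, Windows32With3GB, Windows64 };

constexpr std::uint64_t kGiB = 1024ull * 1024 * 1024;

// User-mode virtual address space available to a 32-bit process.
inline std::uint64_t UserAddressSpace(HostOs os, bool largeAddressAware) {
    if (!largeAddressAware)
        return 2 * kGiB;                      // LAA cleared: 2 GB everywhere
    switch (os) {
        case HostOs::Windows32:        return 2 * kGiB;  // LAA alone needs /3GB to matter
        case HostOs::Windows32With3GB: return 3 * kGiB;  // the boot.ini change deployed here
        case HostOs::Windows64:        return 4 * kGiB;  // WOW64 gives LAA processes 4 GB
    }
    return 2 * kGiB;  // unreachable for valid enum values
}
```

Since link.exe is already marked large-address-aware, the model shows why /3GB helped on the 32-bit slaves and why a 64-bit OS would raise the ceiling to 4 GB without any boot-time tuning.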
Blocks: 709193
Blocks: 750661
Product: mozilla.org → Release Engineering