Closed
Bug 669384
Opened 13 years ago
Closed 10 years ago
Windows 64-bit leak builds fail to buildsymbols
Categories
(Toolkit :: Crash Reporting, defect)
RESOLVED
FIXED
People
(Reporter: armenzg, Unassigned)
References
Details
(Whiteboard: [debug win64] It seems to be a Microsoft bug. Reach them with a smaller test case)
Attachments
(2 files)
(deleted), application/x-ms-dos-executable
(deleted), patch
After compilation, the buildsymbols step runs but times out.
make buildsymbols stops making progress when it reaches xul.pdb [1].
If I manually run $srcdir/toolkit/crashreporter/tools/win32/dump_syms_vc1500.exe $objdir/toolkit/library/xul.pdb, I get a lot of output and then at some point it simply stops [2].
dump_syms_vc1500.exe climbs to ~294,000 K of memory usage and then stops producing output without finishing, while its CPU usage goes from 3-4% to a constant 25%.
I have no idea what is going on, but for now I will have to disable the step for Windows 64-bit leak builds.
[1]
Processing file: .\toolkit\library\xul.pdb
[2]
FUNC 2248b00 15 0 `getReallocHooker'::`2'::`dynamic atexit destructor for 'gReallocHooker''
FUNC 2248b20 15 0 `getNewHooker'::`2'::`dynamic atexit destructor for 'gNewHooker''
FUNC 2248b40 15 0 `getDeleteHooker'::`2'::`dynamic atexit destructor for 'gDeleteHooker''
FUNC 2248b60 15 0 `getVecNewHooker'::`2'::`dynamic atexit destructor for 'gVecNewHooker''
FUNC 2248b80 15 0 `getVecDeleteHooker'::`2'::`dynamic atexit destructor for 'gVecDeleteHooker''
FUNC 2248ba0 15 0 `dynamic atexit destructor for 'tracked_objects::ThreadData::list_lock_''
FUNC 2248bc0 15 0 `tracked_objects::Comparator::ParseKeyphrase'::`2'::`dynamic atexit destructor for 'key_map''
FUNC 2248be0 15 0 `anonymous namespace'::`dynamic atexit destructor for 'gProcessLog''
FUNC 2248c00 15 0 mozilla::gl::`dynamic atexit destructor for 'gGlobalContext''
FUNC 2248c20 15 0 mozilla::gl::`dynamic atexit destructor for 'gGlobalContext''
FUNC 2248c40 15 0 `dynamic atexit destructor for 'js::JSScriptedProxyHandler::singleton''
FUNC 2248c60 15 0 `dynamic atexit destructor for 'JSWrapper::singleton''
FUNC 2248c80 15 0 `dynamic atexit destructor for 'JSCrossCompartmentWrapper::singleton''
FUNC 2248ca0 15 0 js::`dynamic atexit destructor for 'LogController''
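For readers unfamiliar with the output in [2]: each FUNC line appears to follow the Breakpad symbol-file layout of an address, a size, a parameter size (all hexadecimal) and then the function name. The following is only a minimal sketch of reading such a line; the FuncRecord type and the parse_func helper are hypothetical names used for illustration and are not part of dump_syms or Breakpad.

```python
from typing import NamedTuple, Optional


class FuncRecord(NamedTuple):
    address: int     # start of the function, relative to the module base
    size: int        # length of the function in bytes
    param_size: int  # bytes of stack used by parameters
    name: str        # function name; may itself contain spaces


def parse_func(line: str) -> Optional[FuncRecord]:
    """Parse one FUNC record; return None for any other kind of line."""
    fields = line.rstrip("\r\n").split(" ")
    if fields[0] != "FUNC":
        return None
    # Newer Breakpad versions may emit an optional "m" marker after FUNC.
    if len(fields) > 1 and fields[1] == "m":
        del fields[1]
    if len(fields) < 5:
        return None
    # The three numeric fields are hexadecimal without a 0x prefix.
    address, size, param_size = (int(f, 16) for f in fields[1:4])
    # Everything after the third number is the name, spaces included.
    return FuncRecord(address, size, param_size, " ".join(fields[4:]))
```

Read this way, the records above are 0x15-byte functions spaced 0x20 bytes apart, all of them atexit destructors for static objects.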
Reporter | ||
Comment 1•13 years ago
|
||
Here is the zipped up xul.pdb file.
http://people.mozilla.com/~armenzg/win64/xul.zip
ted was able to reproduce this locally:
> ted: FWIW, I'm testing with dump_syms.exe built with VC2010 and I see the
> same thing, but it might be a property of the .PDB file being built by VC2008
Comment 2•13 years ago
|
||
I ran this in a debugger, the hang appears to be entirely within the MS DIA DLLs, so this might just be a bug in Microsoft's tools. :-/
Comment 3•13 years ago
|
||
msdia90!CDiaEnumSymbolsByAddr::Next never returns. Even when I use the win64 version of dump_syms.exe, it still hangs.
Comment 4•13 years ago
|
||
- If we use SHARED_JS=1 (which builds mozjs.dll), this doesn't occur. The current Win64 build has no mozjs.dll (it is integrated into xul.dll).
- This works with --enable-options, even when --enable-debug is set.
Comment 5•13 years ago
|
||
It seems like this is either a bug in msdia, or a bug in the PDB files that the compiler produces. Armen said he was going to try building a debug build with vc2010 to see if the issue happens there as well.
Reporter | ||
Comment 6•13 years ago
|
||
I have a mobile release that is not going well, so I will not have time to work on this today.
Reporter | ||
Comment 7•13 years ago
|
||
Very interesting. I just noticed that the optimized build symbols step takes 45 mins rather than the 5 mins that it takes on WINNT 5.2.
Here is a log in case it has some interesting information:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1310032964.1310048195.21709.gz&fulltext=1
Reporter | ||
Updated•13 years ago
|
Blocks: support-win64
Reporter | ||
Comment 8•13 years ago
|
||
I am now building a leak build with VS2010.
Reporter | ||
Comment 9•13 years ago
|
||
I have the same problem but with dump_syms_vc1600.exe; do we have a 64-bit version of that file?
It freezes at ~320,000 K of memory usage and 25% CPU usage.
If I kill the process through the task manager and then kill makecab, the buildsymbols output continues:
85929934 firefox-8.0a1.en-US.win64-x86_64.crashreporter-symbols-full.zip
17928151 firefox-8.0a1.en-US.win64-x86_64.crashreporter-symbols.zip
Any ideas on what I could try next?
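Since killing the stuck process lets the rest of buildsymbols continue, one stopgap would be a watchdog around the dump_syms call. This is only a sketch under assumptions: run_with_timeout is a hypothetical helper, not something that exists in the build system, and a suitable timeout value would have to be chosen from what a healthy run takes.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[bytes]:
    """Run cmd and return its stdout, or None if it runs longer than timeout_s."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            timeout=timeout_s,
            check=True,  # a non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None
    return done.stdout
```

For example, run_with_timeout(["dump_syms_vc1600.exe", r"toolkit\library\xul.pdb"], 600) would give up after ten minutes instead of hanging the whole step. It would not produce symbols for xul.pdb; it would only keep the rest of the step from timing out.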
Comment 10•13 years ago
|
||
(In reply to comment #9)
> I have the same problem but with dump_syms_vc1600.exe; do we have a 64-bit
> version of that file?
It still occurs with the 64-bit version.
Reporter | ||
Comment 11•13 years ago
|
||
I created this patch to skip creating symbols, but I found that make package fails as well (bug 670915).
Nevertheless, I wanted to attach the patch in case we decide to take it if "make package" gets fixed before this bug.
Comment 12•13 years ago
|
||
FWIW, I don't know if there's anything we can fix, this seems to be entirely a bug in Microsoft's library triggered by the PDB files that get produced in the debug build. The only thing I can suggest trying is a debug build with Visual C++ 2010 to see if the PDB files that version generates still trigger the problem.
Comment 13•13 years ago
|
||
Oh, sorry, I missed comment 9. I don't really have any other suggestions. :-/
Reporter | ||
Comment 14•13 years ago
|
||
Is there anyone I can approach to reach Microsoft?
Reporter | ||
Comment 15•13 years ago
|
||
Would you suggest we disable buildsymbols on leak builds for Windows 64-bit so that at least we get coverage until we figure out this issue? (pending bug 670915)
Depends on: 670915
Comment 16•13 years ago
|
||
That sounds like a reasonable approach for now. I don't know who we'd contact at Microsoft, we might have to create a smaller testcase to file a bug with them.
Reporter | ||
Updated•13 years ago
|
Whiteboard: It seems to be a Microsoft bug. Reach them with a smaller test case
Reporter | ||
Updated•13 years ago
|
Whiteboard: It seems to be a Microsoft bug. Reach them with a smaller test case → [debug win64] It seems to be a Microsoft bug. Reach them with a smaller test case
Reporter | ||
Comment 17•13 years ago
|
||
I have filed bug 685887 to disable symbols for debug builds for now.
Meanwhile, we will need to create a smaller testcase to bug Microsoft with.
Is there anyone on your side who could help create the testcase?
Comment 18•13 years ago
|
||
Filed at connect.microsoft.com:
https://connect.microsoft.com/VisualStudio/feedback/details/722366/idiaenumsymbolsbyaddr-next-doesnt-return-huge-pdb
Also, when I tested VS2010 with PGO on the try server yesterday, this problem occurred even though it was an optimized build.
Comment 19•12 years ago
|
||
At connect.microsoft.com the bug is closed as fixed. I guess that means it is no longer present in VS 2012.
Comment 20•11 years ago
|
||
A few months ago someone on the Breakpad mailing list posted some code he had written using various open source bits to dump PDB files:
https://groups.google.com/forum/#!topic/google-breakpad-discuss/F0jMWxmWk0M
This might be worth looking into if this isn't fixed in a toolchain we can use for our Win64 builds. I just replied to his list post; the only sticking point for us is probably the licensing of the code he used to build it.
Comment 22•11 years ago
|
||
Per Makoto in bug 893139 comment 15, this is fixed in VS2012.
Comment 23•11 years ago
|
||
Should we be adding VS2012 to our Windows build slaves?
Flags: needinfo?(catlee)
Comment 24•11 years ago
|
||
I spoke with catlee today and he said we should look into getting VS2012 on our build machines. markco of RelOps says it should be easy to do, so I've requested a staging install in bug 946859.
Flags: needinfo?(catlee)
Comment 25•11 years ago
|
||
ted: do you see any issue with going straight to VS2013? See bug 914523 for background.
Flags: needinfo?(ted)
Comment 26•11 years ago
|
||
No, I think that's the right move. If the issue was fixed in 2012 then 2013 should be fine. Ideally we'll be moving our x86 builds to 2013 in the near future as well, so it makes sense to try to get to the same version for both platforms.
Flags: needinfo?(ted)
Comment 27•11 years ago
|
||
We are going to solve this with VS2013 instead. Some context from #releng:
[2:34pm] jhopkins: markco: remember the work we did to get VS2012 installed via GPO? we are instead going to skip to VS2013 (see bug 914523). could you look into installing VS2013 alongside VS2010 via GPO?
[2:36pm] markco: jhopkins: yes, what is the time frame on it?
[2:36pm] jhopkins: dmajor: ^
[2:36pm] ted: "now"
[2:39pm] dmajor: markco: the compiler update that we want is still in preview status, so it's not a "we need it yesterday" thing, but the sooner we can start testing, the better shape we'll be in when it's released
[2:40pm] jhopkins: dmajor: what is the expected release date?
[2:40pm] dmajor: jhopkins: AFAIK no official word but there are hints that it may be a month or two
[2:41pm] jhopkins: dmajor: ok. i assume that will be an acceptable delay for win64 debug symbols in bug 669384. cc: vlad
[2:42pm] markco: dmajor: rgr could you open up a relops bug for it please?
[2:42pm] dmajor: markco: bug 914523 or something separate?
[2:43pm] markco: dmajor, that'll work. I will open up a blocking bug on that.
[2:43pm] dmajor: thanks!
[2:43pm] ted: jhopkins: yeah, the long-term benefits make sense here
[2:43pm] ted: standardizing on the same toolchain version
[2:43pm] jhopkins: ok, great, sounds like a plan. thanks all!
Depends on: 914523
Comment 29•11 years ago
|
||
So, upstream breakpad landed a patch that might work around this problem:
https://code.google.com/p/google-breakpad/source/detail?r=1316
We could update our in-tree copy of dump_syms and see if it helps.
Comment 30•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #29)
> So, upstream breakpad landed a patch that might work around this problem:
> https://code.google.com/p/google-breakpad/source/detail?r=1316
>
> We could update our in-tree copy of dump_syms and see if it helps.
Am I understanding correctly that the STR are to build dump_syms with VS2010 and run it on a 64-bit xul.pdb?
On a stale copy of the breakpad tree (maybe a few months old) I gave up after 10 minutes. With the revision above, it finished in 9 seconds. It did change the output somewhat; I can't say whether that would cause problems.
Comment 31•11 years ago
|
||
(In reply to David Major [:dmajor] (UTC+12) from comment #30)
> Am I understanding correctly that the STR are to build dump_syms with VS2010
> and run it on a 64-bit xul.pdb?
Yes.
> On a stale copy of the breakpad tree (maybe a few months old) I gave up
> after 10 minutes. With the revision above, it finished in 9 seconds. It did
> change the output somewhat; I can't say whether that would cause problems.
Sounds like the build speed issue is fixed then. I think what we need to do is compare the output on a 32-bit PGO build, since I know Chrome doesn't build PGO, so Google wouldn't have tested that thoroughly.
Comment 32•11 years ago
|
||
I tried on the 29.0 release, which I hope has PGO. The diff has some of the expected things: some new PUBLIC entries, some FUNC entries picked a different name for comdat-folded functions.
Interestingly, a large region of duplicates has been removed. There used to be a bunch of FUNC entries that appeared twice in the old dump, with identical contents. The new dump only has them once each.
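A comparison like this could be partly scripted. The sketch below is an assumption about how one might do it, not the method actually used here: summarize is a hypothetical helper that counts record types in a dump and lists FUNC lines that occur more than once. It only catches FUNC lines that are textually identical and does not compare the line records that follow them.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Return (count per record type, FUNC lines that appear more than once)."""
    kinds: Counter = Counter()
    funcs: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        kind = line.split(" ", 1)[0]
        # Record keywords are upper case; source-line records start with a
        # hex address instead, so group those under a "LINE" label.
        kinds[kind if kind.isupper() else "LINE"] += 1
        if kind == "FUNC":
            funcs[line] += 1
    duplicates = sorted(text for text, count in funcs.items() if count > 1)
    return kinds, duplicates
```

Running it over the old and the new dump and comparing the two results would show the added PUBLIC entries as a change in counts and the removed region as a duplicate list that becomes empty.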
Comment 33•11 years ago
|
||
Cool, sounds like it's just fiddly stuff then. I filed bug 1003085 to update the in-tree dump_syms.
Comment 34•10 years ago
|
||
Ehsan: can you merge m-c to date to pick up bug 1003085? That should fix this issue.
Flags: needinfo?(ehsan)
Comment 35•10 years ago
|
||
I can, but I don't own the date branch any more. :-) Can you or someone else who owns it take over please (not sure who that person would be)?
Flags: needinfo?(ehsan)
Comment 36•10 years ago
|
||
Sorry, I was going by:
https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE
Is there anyone driving Win64 build testing anymore?
Comment 37•10 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #36)
> Sorry, I was going by:
> https://wiki.mozilla.org/ReleaseEngineering/
> DisposableProjectBranches#BOOKING_SCHEDULE
Uh-oh, that page says Vlad now. ;-)
> Is there anyone driving Win64 build testing anymore?
Not that I know of. Vlad?
Flags: needinfo?(vladimir)
johnath & the firefox team/org owns it now and is actively driving it (or should be)
Flags: needinfo?(vladimir)
Comment 39•10 years ago
|
||
Rob Strong says we don't have anyone actively working on Win64 support, FWIW. That's okay, I was just under the impression that we did (we did at one point, certainly) and wanted to poke the right person here.
Comment 40•10 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #39)
> Rob Strong says we don't have anyone actively working on Win64 support, FWIW.
Well, I hear that the Firefox team/org (as vlad mentioned) is actively investigating picking up Win64 again. That said, right now we are creating Nightly builds so people can test them, but as far as I know we do not actively develop Win64 builds.
Comment 41•10 years ago
|
||
Fixed by bug 1003085.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED