Closed
Bug 549561
Opened 15 years ago
Closed 14 years ago
tracemalloc is very slow on win32
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: dbaron)
References
Details
Attachments
(6 files, 1 obsolete file)
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
(deleted),
text/plain; charset=UTF-8
|
Details | |
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
The time to run the AliveTest for TraceMalloc, aka
python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks=sdleak.log
in dir obj-firefox/_leaktest
is about 90 seconds for linux and mac, but is over 70 minutes on the new hardware slaves for win32. I've also seen looked at the Task Manager while the AliveTest is running and the machine is mostly idle, with Firefox occasionally using < 5% of CPU. Could it be blocking on something ? Could a developer take a debugger to one of the builds in
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/
and try to reproduce this ?
Complete guess at the right place to file this, please fix as required.
Assignee | ||
Comment 1•15 years ago
|
||
I think this is largely a function of the Windows stack walking code. I think it's possible that:
* the locking may be too fine-grained
* we may be doing some work in NS_StackWalk that we could defer until NS_DescribeCodeAddress
* other things
Profiling is really the way to find out.
Reporter | ||
Comment 2•15 years ago
|
||
Well, let me approach this from a different angle. Are the MH, A, and Lk values valuable enough to spend 70-90 minutes generating them ? Do you know if the leak regressions in
http://graphs.mozilla.org/graph.html#tests=[{%22test%22:%2228%22,%22branch%22:%221%22,%22machine%22:%22107%22}]&sel=1262036479,1268200095
have been noticed and are filed ?
Assignee | ||
Comment 3•15 years ago
|
||
I think they probably haven't been filed because catlee's scripts aren't pointing them out, and people have stopped looking at the graphs and are relying on the scripts.
Assignee | ||
Comment 4•15 years ago
|
||
Also, that graphs.m.o URL you give doesn't actually load (I've been waiting since before I wrote the previous comment), so I can't see what the regressions are.
Assignee | ||
Comment 5•15 years ago
|
||
To fix this, we should either:
* find somebody who knows how to use a Windows profiling tool, or
* add an option to make trace-malloc not generate stacks, and add that option to the windows build configs
The second is pretty easy, and should make the performance problem pretty much go away.
Assignee | ||
Comment 6•15 years ago
|
||
Filed bug 551530 on the largest of the regressions in the graph you pointed to, and bug 551531 on the lack of regression emails.
Assignee | ||
Comment 7•14 years ago
|
||
This isn't a huge win, but it's a small perf win I noticed. There's only one caller of calltree(), and that caller aquires the lock right after calling calltree, and calltree also acquires the lock at the beginning of the function and releases it at the end, so we may as well change calltree to require its caller to be holding the lock.
Attachment #466372 -
Flags: review?(benjamin)
Assignee | ||
Comment 8•14 years ago
|
||
This is the main performance win here; we'll need to change the relevant mozconfigs to benefit.
This adds a configure option to disable the stack-walking part of trace-malloc. I stubbed things out so that everything that would have a stack has a bogus single-frame stack, to avoid having to hack tmreader to deal with empty stacks.
Attachment #466374 -
Flags: review?(benjamin)
Updated•14 years ago
|
Attachment #466372 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 9•14 years ago
|
||
Attachment #466374 -
Attachment is obsolete: true
Attachment #466683 -
Flags: review?(benjamin)
Attachment #466374 -
Flags: review?(benjamin)
Assignee | ||
Comment 10•14 years ago
|
||
Updated•14 years ago
|
Attachment #466683 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 11•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/ea54a11bd987
http://hg.mozilla.org/mozilla-central/rev/60944ddadc17
Now I should investigated whether the scripts in which we should set
NS_TRACE_MALLOC_DISABLE_STACKS=1
are in the tree (I think they are...).
Reporter | ||
Comment 12•14 years ago
|
||
Great to see progress on this bug. Any news on comment #11 ?
Assignee | ||
Comment 13•14 years ago
|
||
OK, build/leaktest.py.in uses automation.py, so I think I just want to patch automation.py. That will affect unit tests too, which is good.
Assignee | ||
Comment 14•14 years ago
|
||
Attachment #474059 -
Flags: review?(ted.mielczarek)
Updated•14 years ago
|
Attachment #474059 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 15•14 years ago
|
||
So it turns out that while this works fine on all platforms other than windows, it causes a crash on shutdown on the Windows debug leak test machines. (And since we don't have Windows debug unit tests on try, I can't tell if it does anything to them; if we did, they might provide more useful information that would let me figure out what the problem is.)
So this is on hold for now.
Reporter | ||
Comment 16•14 years ago
|
||
Filed bug 595730 for the test coverage regression on try.
Depends on: 595730
Assignee | ||
Comment 17•14 years ago
|
||
I got a unit test run with all the unit tests orange as well; it requires --shutdown-leaks=<file> to cause the orange. None of the orange runs had a minidump (they seem to be broken on Windows debug these days; there's a bug somewhere), but perhaps that'll be enough information to figure out what's up.
Assignee | ||
Comment 18•14 years ago
|
||
I can reproduce the crash in a Windows VM, but I haven't found anything enlightening.
Assignee | ||
Comment 19•14 years ago
|
||
This fixes a single missing line from patch 2 that caused an uninitialized variable and led to the orange when I set the env var (presumably because we'd randomly miss both mallocs and frees, and thus have dangling pointers).
With this, attachment 474059 [details] [diff] [review] is ready to land, and should make Windows debug builds quite a bit faster.
Attachment #477246 -
Flags: review?(benjamin)
Updated•14 years ago
|
Attachment #477246 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 20•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/0778c7931695
http://hg.mozilla.org/mozilla-central/rev/d20b0bf1cf3a
Assignee: nobody → dbaron
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 21•14 years ago
|
||
I reverted the change for Mac only:
http://hg.mozilla.org/mozilla-central/rev/6a3efef10e69
since it caused the Mac OS X 10.5 32-bit leak test trace-malloc leak test to hang; the 10.6 64-bit run of the same test was fine.
(My guess is it might be related to bug 478195.)
You need to log in
before you can comment on or make changes to this bug.
Description
•