Closed
Bug 1329888
Opened 8 years ago
Closed 8 years ago
Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
WONTFIX
Performance Impact | low |
People
(Reporter: ehsan.akhgari, Assigned: mstange)
References
Details
Attachments
(1 file)
(deleted),
image/png
|
Details |
When profiling bug 1269695, I noticed that we spend around 318ms in arena_dalloc coming from LifoAlloc::freeAll(), and it seems most of this time is going to memsetting the memory to 0 (presumably through jemalloc's opt_zero configuration).
Is it possible to avoid this for LifoAlloc?
Flags: needinfo?(jdemooij)
Flags: needinfo?(emanuel.hoogeveen)
Comment 1•8 years ago
|
||
jemalloc doesn't fill memory with 0 on free. It fills with 0xe5.
Comment 2•8 years ago
|
||
Right - as far as I can tell opt_free is never set, but we always set opt_poison. So if we're zeroing it isn't coming from jemalloc. If it's the poisoning that's taking this long.. well, it's a compile time flag, so it can only be enabled or disabled globally. I've never really looked at the LifoAlloc implementation so I don't know the constraints, but could we make freeAll() asynchronous?
Flags: needinfo?(emanuel.hoogeveen)
Reporter | ||
Comment 3•8 years ago
|
||
Hmm, the profile here <https://clptr.io/2jGzv4R> suggests that arena_dalloc is calling _platform_bzero$VARIANT$Haswell and as far as I know that's what memsetting memory to 0 will translate to on OSX with Haswell CPUs, so there is _something_ setting the memory to 0...
Comment 4•8 years ago
|
||
There is no apparent code path from arena_dalloc to a memset(ptr, 0, size), only 0xe5, at jemalloc.c:4638 and jemalloc.c:4731. Even if opt_zero was set, memset(ptr, 0, size) would happen on *malloc*, not free. I can see how madvise would be called, but I don't expect that to call _platform_bzero. So really, I have no idea where this would come from.
Comment 5•8 years ago
|
||
<ehsan> glandium: I'm really out of ideas :(
<glandium> ehsan: dtrace?
<ehsan> glandium: yeah maybe
<glandium> ehsan: or simply a debugger with a breakpoint in _platform_bzero
<ehsan> glandium: can you ni? me on the bug for that please?
Flags: needinfo?(ehsan)
Reporter | ||
Comment 6•8 years ago
|
||
Examining this under the debugger, it is *arena_malloc* not arena_dalloc that is calling _platform_bzero$VARIANT$Haswell! So this is either us incorrectly walking the stack or incorrectly symbolicating.
Markus, do you mind taking a look please?
Component: JavaScript Engine → Gecko Profiler
Flags: needinfo?(ehsan)
Summary: Consider opting out of jemalloc zeroing in arena_dalloc → Gecko profiler mistakenly shows _platform_bzero$VARIANT$Haswell being called from arena_dalloc (instead of arena_malloc)
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(jdemooij) → needinfo?(mstange)
Comment 7•8 years ago
|
||
Why would that be attributed to lifoAlloc::freeAll then?
Comment 8•8 years ago
|
||
(other than that, memset(..., 0, ...) from arena_malloc means calloc is being called, and that it's the expected behavior)
Updated•8 years ago
|
Whiteboard: [qf:p5]
Assignee | ||
Comment 9•8 years ago
|
||
I think this might be caused by https://github.com/devtools-html/Gecko-Profiler-Addon/issues/29 , i.e. we're using symbols from the wrong architecture (x86_64 instead of x86_64h).
In bug 1329111 I'm going to add an arch field to the shared library information value that we expose to the profiler, so that it can pick the correct architecture.
Updated•8 years ago
|
Comment 10•8 years ago
|
||
Markus are you still looking at this? It's still giving me grief in the "make jemalloc faster for stylo" bugs I'm looking into.
Flags: needinfo?(mstange)
Assignee | ||
Comment 11•8 years ago
|
||
This should be fixed. If it's still happening I'd like to look into it again. Do you have STR?
Flags: needinfo?(mstange)
Comment 12•8 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #11)
> This should be fixed. If it's still happening I'd like to look into it
> again. Do you have STR?
The STR are the same, record a profile, bzero is blamed on dalloc. For something more concrete:
#1 ./mach run --diable-e10s
#2 Start profiling
#3 Load https://en.wikipedia.org/wiki/Barack_Obama
#4 Stop pofiling
#5 Invert the stack, note bzero shows up, note the step up is dalloc, note that's not possible
Assignee | ||
Comment 13•8 years ago
|
||
I can reproduce this. However, Instruments shows the same thing.
Assignee | ||
Comment 14•8 years ago
|
||
memset and bzero share a lot of code. memset has a different beginning but then jumps into the code that it shares with bzero. Frame addresses in the shared part are symbolicated as bzero because bzero is the closer symbol.
Assignee | ||
Comment 15•8 years ago
|
||
I don't think there's much we can or should do about this.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Assignee | ||
Comment 16•8 years ago
|
||
Oh, and memset is being called in order to fill the freed memory with 0xe5.
Comment 17•8 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #14)
> Created attachment 8864269 [details]
> screenshot of __platform_memset$VARIANT$Haswell disassembly
>
> memset and bzero share a lot of code. memset has a different beginning but
> then jumps into the code that it shares with bzero. Frame addresses in the
> shared part are symbolicated as bzero because bzero is the closer symbol.
Thanks Markus, that makes a lot more sense now.
Updated•3 years ago
|
Performance Impact: --- → P3
Whiteboard: [qf:p5]
You need to log in
before you can comment on or make changes to this bug.
Description
•