Closed
Bug 392009
Opened 17 years ago
Closed 17 years ago
reduce code duplication between Linux and Windows trace-malloc
Categories
(Core :: General, defect)
Core
General
Tracking
()
RESOLVED
FIXED
People
(Reporter: dbaron, Assigned: brendan)
References
Details
Attachments
(5 files, 6 obsolete files)
(deleted),
patch
|
dbaron
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
brendan
:
review+
brendan
:
approval1.9+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dbaron
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
brendan
:
review+
brendan
:
approval1.9+
|
Details | Diff | Splinter Review |
We should reduce code duplication between Linux and Windows trace-malloc by using the MallocCallback, etc., functions that are currently used only on Windows for all platforms. Note that there are a few things in the Linux ones that are not in the Windows ones, but should be (calls to NS_TraceStack), and we should probably also fix bug 392008 at the same time.
Assignee | ||
Updated•17 years ago
|
Blocks: aboutmemory
Assignee | ||
Updated•17 years ago
|
Assignee | ||
Updated•17 years ago
|
No longer blocks: aboutmemory
Assignee | ||
Comment 3•17 years ago
|
||
I reordered a big #if defined(XP_UNIX) && !defined(XP_MACOSX)...#endif #ifdef XP_MACOSX...#endif section to follow the standard-elsewhere XP_MACOSX first, then XP_UNIX, then XP_WIN32 test order. The first such #if/#elif chain ends with #else # error ... #endif. This makes the diff less readable, but only slightly. /be
Assignee | ||
Comment 4•17 years ago
|
||
Attachment #277828 -
Attachment is obsolete: true
Attachment #277921 -
Flags: review?(dbaron)
Attachment #277828 -
Flags: review?(dbaron)
Assignee | ||
Comment 5•17 years ago
|
||
I am trying to set up a Linux build, but if anyone with Linux could test this patch, I'd be grateful. /be
Attachment #277921 -
Attachment is obsolete: true
Attachment #277996 -
Flags: review?(dbaron)
Attachment #277921 -
Flags: review?(dbaron)
Assignee | ||
Comment 6•17 years ago
|
||
Now two threads may race through NS_TraceMallocDisable and/or NS_TraceMallocEnable and only one will unhook or hook as required. The other may race ahead and call malloc, etc., with tracing of that call enabled or disabled until the unhook or hook happens on the other thread. This seems good enough for MT scenarios, better than fighting to store 0 or 1 in tracing_enabled. /be
Attachment #277996 -
Attachment is obsolete: true
Attachment #277998 -
Flags: review?(dbaron)
Attachment #277996 -
Flags: review?(dbaron)
Assignee | ||
Comment 8•17 years ago
|
||
dbaron, did you test that Linux patch? I'll apply an updated patch if that works for you. /be
Status: NEW → ASSIGNED
Reporter | ||
Comment 9•17 years ago
|
||
Seems fine on Linux (except we might want to ignore one additional stack frame; not sure if that's the case on other platforms yet). I got a crash on shutdown on Windows, though...
Reporter | ||
Comment 10•17 years ago
|
||
Never mind the crash on Windows on shutdown. It was a typo in my review-comment-fix for bug 391141 that I've had in my tree for ages without testing it. But on Windows we don't want to ignore an additional stack frame, so it's probably ok as-is, at least for now.
Reporter | ||
Comment 11•17 years ago
|
||
Seems fine on Mac as well, although there I'm getting two frames of stack more than needed.
Reporter | ||
Comment 12•17 years ago
|
||
Comment on attachment 277998 [details] [diff] [review] weak MT consistency for tracing_enabled r=dbaron with the build fixes in the patch I provided. Sorry for the long delay here.
Attachment #277998 -
Flags: review?(dbaron) → review+
Assignee | ||
Comment 13•17 years ago
|
||
Fixed, thanks dbaron: tools/trace-malloc/lib/nsTraceMalloc.c 1.76 tools/trace-malloc/lib/nsTraceMallocCallbacks.h 1.5 If the skip parameters need tweaking, please feel free to do so and check in with rs=me. /be
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 14•17 years ago
|
||
Anyone have an idea why the Lk jumped up after this checkin ? http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_leaks&units=bytes&tbox=fxdbug-linux-tbox.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:09:24:18:32:51,7272033 Before Lk: 4.5 After Lk: 6.94 and higher at times Real leak - ??
Reporter | ||
Comment 15•17 years ago
|
||
So it looks like this drastically changed the amount of leaks reported by leakstats (which processes the log generated by --trace-malloc) but didn't change what shows up in the log generated by --shutdown-leaks.
Reporter | ||
Comment 16•17 years ago
|
||
And I confirmed that the two numbers matched without the patch, and don't match with the patch. (In my own build, which has the patches above, not what was checked in.)
Reporter | ||
Comment 17•17 years ago
|
||
Seems like the problem is that realloc(NULL, n) calls both the malloc callback and the realloc callback (and we're not suppressing one of them; not sure if we could suppress one).
Reporter | ||
Comment 18•17 years ago
|
||
This fixes the regression -- or at least almost all of it. I still see a tiny difference in the numbers (I may have seen that before, too, though).
Attachment #282291 -
Flags: review?(brendan)
Attachment #282291 -
Flags: approval1.9?
Reporter | ||
Comment 19•17 years ago
|
||
I didn't see the tiny difference (a few thousand bytes) before. Not sure what's up with that, yet, but I think the above patch is good.
Assignee | ||
Comment 20•17 years ago
|
||
Comment on attachment 282291 [details] [diff] [review] patch to fix leak stats regression > start = PR_IntervalNow(); >+ /* >+ * __libc_realloc(NULL, size) recurs into my_malloc_hook, so it's >+ * important that we've incremented t->suppress_tracing here >+ */ Blank line before major comment would be nice. r+a=me with that -- thanks for fixing this. /be
Attachment #282291 -
Flags: review?(brendan)
Attachment #282291 -
Flags: review+
Attachment #282291 -
Flags: approval1.9?
Attachment #282291 -
Flags: approval1.9+
Reporter | ||
Comment 24•17 years ago
|
||
I expect the remainder of the discrepancy is related to races saving and restoring the hooks on different threads. We need to acquire a lock (probably tmlock should be fine) before saving/restoring hooks -- and avoiding trying to re-enter the lock might be a little tricky.
Assignee | ||
Comment 25•17 years ago
|
||
dbaron, I didn't realize multiple threads were disabling and enabling (saving and restoring the hooks) -- how does that happen? Followup bug? /be
Reporter | ||
Comment 26•17 years ago
|
||
The hooks themselves save and restore the hooks when calling the original function. This can race with malloc on other threads. (And things can also be a bit confusing if allocation functions call each other, which they do sometimes.) I'd also note that the use of __libc_* isn't needed anymore since we're not overriding them, so __libc_malloc is now always the same as malloc.
Reporter | ||
Comment 27•17 years ago
|
||
I thought this would fix the remaining inconsistency, but it didn't. I suspect it's needed anyway, though.
Reporter | ||
Comment 28•17 years ago
|
||
I can't figure out what the remaining problem is. The leak numbers are still oscillating a good bit more randomly than they used to. We could try the above patch, although something is still wrong. I tend to think it's probably time to back out; I need to work on other things.
Reporter | ||
Comment 29•17 years ago
|
||
(What I do know is that the inconsistency comes from missing frees for pointers that are later reused. leakstats counts that as a leak, but it doesn't show up in the sdleak.log since it overwrites the entry for that address.)
Reporter | ||
Comment 30•17 years ago
|
||
Now that I think about this, there's no reason I should expect that patch to work, since while it's holding the lock other threads won't hit the hook to wait to acquire it.
Assignee | ||
Comment 31•17 years ago
|
||
This should restore the consistent numbers. I hope it compiles -- I don't have a Linux box handy. Thanks for help compiling and testing. /be
Attachment #282364 -
Attachment is obsolete: true
Attachment #282480 -
Flags: review?(dbaron)
Reporter | ||
Comment 32•17 years ago
|
||
Comment on attachment 282480 [details] [diff] [review] linux uses weak-symbol data override and always traces malloc etc. Didn't actually try it, but this isn't going to work. It'll overflow the stack, since the hook API requires that a malloc hook cannot call malloc (which is the same as __libc_malloc) without unsetting itself as malloc-hook. The problem has nothing to do with StartupHooker and ShutdownHooker; it's the race of the hook assignments in my_malloc_hook, etc., themselves.
Attachment #282480 -
Flags: review?(dbaron) → review-
Reporter | ||
Comment 33•17 years ago
|
||
We need to go back to overriding malloc rather than using the hook APIs. This gets back the correct numbers, and the overhead is pretty small (although not zero) when disabled, since it should hit the early tracing_enabled check (as it did before).
Attachment #282582 -
Flags: review?(brendan)
Assignee | ||
Comment 34•17 years ago
|
||
Comment on attachment 282582 [details] [diff] [review] fix hook/unhook race by going back to overriding malloc, etc. Too bad we have to call PR_Initialized() -- I remember asking about it a while ago but I forget the details about why it is necessary. /be
Attachment #282582 -
Flags: review?(brendan)
Attachment #282582 -
Flags: review+
Attachment #282582 -
Flags: approval1.9+
Reporter | ||
Comment 35•17 years ago
|
||
One tiny tweak relative to the previous patch: tracing_enabled should default to 0 so that we don't do all the logging work before NS_TraceMallocStartup. That used to be a good thing (we'd catch more mallocs), but if we're going to ship this by default we don't want to be doing that unless requested. (This makes the behavior more similar to other platforms as well.)
Attachment #282744 -
Flags: review?(brendan)
Reporter | ||
Comment 36•17 years ago
|
||
...er, and also fix the corresponding PR_ASSERT.
Attachment #282582 -
Attachment is obsolete: true
Attachment #282744 -
Attachment is obsolete: true
Attachment #282750 -
Flags: review?(brendan)
Attachment #282744 -
Flags: review?(brendan)
Reporter | ||
Updated•17 years ago
|
Attachment #282750 -
Attachment is patch: true
Attachment #282750 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 37•17 years ago
|
||
Comment on attachment 282750 [details] [diff] [review] fix hook/unhook race by going back to overriding malloc, etc. Would like to assert that NSPR is initialized; maybe a later rev (or perhaps because I'm not using Linux these days, I don't care enough to follow up!). /be
Attachment #282750 -
Flags: review?(brendan)
Attachment #282750 -
Flags: review+
Attachment #282750 -
Flags: approval1.9+
You need to log in
before you can comment on or make changes to this bug.
Description
•