Closed
Bug 99442
Opened 23 years ago
Closed 23 years ago
Huge "leak" occurs while loading a 'random' page during page-loader test
Categories
(SeaMonkey :: General, defect, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla0.9.5
People
(Reporter: jrgmorrison, Assigned: brendan)
Details
(4 keywords)
Attachments
(3 files, 2 obsolete files)
(deleted), image/jpeg
(deleted), patch
(deleted), patch (jband_mozilla: review+)
A _huge_ leak (or equivalent) is happening while running the
page-loader test. It appears to have entered the trunk between 8am
9/11/2001 and 8am 9/12/2001.
Define huge. Okay. Process size peaked at 313 Megabytes on Linux.
I've seen this happen on both Linux and win98.
It's a very strange "leak" though. The test will be running along
fine for some number of page loads, and then on one page, the browser
will freeze for about 160 seconds while the process size shoots up
from normal size to well over 200MB. It all happens during the load
of one page.
However, it is not the same page each time (observed with four
different pages), and it is not at the same point in the sequence each
time (sometimes it is on the first (uncached) cycle, and sometimes it
is on the 2nd or 3rd cycle). [It's even possible that it may not
happen every time I run the test through 5 cycles; e.g., the Mac
didn't show this leak, but maybe it was lucky and didn't hit the
"magic" set of ~random conditions].
I wish I could say more, or had a more precise way of reproducing
this, but I don't. If anyone has any suggestions as to how to narrow
this down, I'm all ears.
I'm cc:ing everyone on the hook for that period. (Sorry. Some are
likely innocuous checkins. Remove yourself from the cc: if you know
that it's not you).
Comment 1•23 years ago
|
||
Comment 2•23 years ago
|
||
I saw something VERY like this in the 9/11 daily build (win32) under win2k with
avsforum.com when hitting "Back". Repeated multiple times, but had to read
threads there for 5-10 minutes. Always was on Back, but I only did it 2-3
times, so might have been random. I'll attach a screendump of the task manager
graph of mem usage. The drop-off is when I killed mozilla. It used 100% of
available VM and hung.
Comment 3•23 years ago
|
||
Comment 4•23 years ago
|
||
The build I had the problem with was 2001091203, not 9/11
Comment 5•23 years ago
|
||
Checked the bonsai diffs below for every file in my checkin line by line; I
can't see anything that could be involved. I'll stay on the CC list, but I
think I'm clear.
Comment 6•23 years ago
|
||
I sent a mail to cpd-all and attached a stack and steps to repro/debug with msdev.
I paste them here again:
How to run the debug session with msdev:
- open a command-shell:
- set the mozilla environment variables
- run:
>set XPCOM_DEBUG_BREAK=warn
- run:
>subst S: DRIVE_LETTER:\...
(the directory where your trunk resides)
- run:
>msdev
- create a new makefile project (you can go with defaults)
- open Project/Settings
- click the Debug tab
- set the following values:
Executable for debug session =
S:\mozilla\dist\WIN32_D.OBJ\bin\viewer.exe
Working dir =
S:\mozilla\dist\WIN32_D.OBJ\bin\
Program arguments =
-o s:\mozilla\layout\html\tests\table\bugs\ -f
s:\mozilla\layout\html\tests\table\bugs\file_list1.txt
[$]
- once you have done this, make sure that ALL the *.RGD files
in the directory
s:\mozilla\layout\html\tests\table\bugs\
are DELETED (seems to be very important)
- press F5 (run debug session in msdev)
- if execution is interrupted by assertion warnings
OK/ignore them and continue (F5) the debug session
- if you do not get the mem leaks, repeat from [$]
(I recommend that you keep the task manager around
at all times)
CALL STACK:
memset() line 108
0012ff2c()
_nh_malloc_dbg(unsigned int 402653184, int 0, int 1, const char * 0x00000000,
int 0) line 248 + 21 bytes
malloc(unsigned int 402653184) line 130 + 21 bytes
JS_DHashAllocTable(JSDHashTable * 0x012758b8, unsigned long 402653184) line 58 +
10 bytes
ChangeTable(JSDHashTable * 0x012758b8, int 1) line 379 + 15 bytes
JS_DHashTableOperate(JSDHashTable * 0x012758b8, const void * 0x0012e024, int 1)
line 456 + 13 bytes
_js_LookupProperty(JSContext * 0x0124a650, JSObject * 0x0117a200, long 37258152,
JSObject * * 0x0012e0ec, JSProperty * * 0x0012e0e0, const char * 0x0052c7a4,
unsigned int 2339) line 2114 + 15 bytes
js_GetProperty(JSContext * 0x0124a650, JSObject * 0x0117a200, long 37258152,
long * 0x0012e7b0) line 2339 + 35 bytes
js_Interpret(JSContext * 0x0124a650, long * 0x0012e93c) line 2553 + 1751 bytes
js_Invoke(JSContext * 0x0124a650, unsigned int 1, unsigned int 2) line 824 + 13
bytes
js_InternalInvoke(JSContext * 0x0124a650, JSObject * 0x0117a200, long 18328048,
unsigned int 0, unsigned int 1, long * 0x0012eb24, long * 0x0012ea6c) line 899 +
20 bytes
JS_CallFunctionValue(JSContext * 0x0124a650, JSObject * 0x0117a200, long
18328048, unsigned int 1, long * 0x0012eb24, long * 0x0012ea6c) line 3360 + 31 bytes
nsJSContext::CallEventHandler(nsJSContext * const 0x01248488, void * 0x0117a200,
void * 0x0117a9f0, unsigned int 1, void * 0x0012eb24, int * 0x0012eb20, int 0)
line 957 + 33 bytes
nsJSEventListener::HandleEvent(nsJSEventListener * const 0x023b8048, nsIDOMEvent
* 0x0245784c) line 139 + 74 bytes
nsXBLPrototypeHandler::ExecuteHandler(nsXBLPrototypeHandler * const 0x02366a40,
nsIDOMEventReceiver * 0x023e7378, nsIDOMEvent * 0x0245784c) line 433
nsXBLPrototypeHandler::BindingAttached(nsXBLPrototypeHandler * const 0x02366a40,
nsIDOMEventReceiver * 0x023e7378) line 481
nsXBLPrototypeBinding::BindingAttached(nsXBLPrototypeBinding * const 0x02369ac8,
nsIDOMEventReceiver * 0x023e7378) line 564 + 30 bytes
nsXBLBinding::ExecuteAttachedHandler(nsXBLBinding * const 0x02392fc0) line 1134
nsBindingManager::ProcessAttachedQueue(nsBindingManager * const 0x023c3288) line
1283
nsCSSFrameConstructor::ContentInserted(nsCSSFrameConstructor * const 0x023d6ed0,
nsIPresContext * 0x02342828, nsIContent * 0x00000000, nsIContent * 0x024383b8,
int 0, nsILayoutHistoryState * 0x00000000) line 8328
StyleSetImpl::ContentInserted(StyleSetImpl * const 0x02348060, nsIPresContext *
0x02342828, nsIContent * 0x00000000, nsIContent * 0x024383b8, int 0) line 1187
PresShell::InitialReflow(PresShell * const 0x023b5e98, int 9180, int 4260) line 2625
HTMLContentSink::StartLayout() line 3861
HTMLContentSink::OpenBody(HTMLContentSink * const 0x023f6028, const
nsIParserNode & {...}) line 3147
CNavDTD::OpenBody(const nsCParserNode * 0x02338818) line 3102 + 31 bytes
CNavDTD::OpenContainer(const nsCParserNode * 0x02338818, nsHTMLTag
eHTMLTag_body, int 1, nsEntryStack * 0x00000000) line 3359 + 12 bytes
CNavDTD::HandleDefaultStartToken(CToken * 0x023c1cf8, nsHTMLTag eHTMLTag_body,
nsCParserNode * 0x02338818) line 1288 + 20 bytes
CNavDTD::HandleStartToken(CToken * 0x023c1cf8) line 1698 + 22 bytes
CNavDTD::HandleToken(CNavDTD * const 0x023c09d8, CToken * 0x023c1cf8, nsIParser
* 0x023e2670) line 867 + 12 bytes
CNavDTD::BuildModel(CNavDTD * const 0x023c09d8, nsIParser * 0x023e2670,
nsITokenizer * 0x02336ce8, nsITokenObserver * 0x00000000, nsIContentSink *
0x023f6028) line 503 + 20 bytes
nsParser::BuildModel() line 1989 + 34 bytes
nsParser::ResumeParse(int 1, int 0) line 1855 + 11 bytes
nsParser::OnDataAvailable(nsParser * const 0x023e2678, nsIRequest * 0x02423110,
nsISupports * 0x00000000, nsIInputStream * 0x023d6028, unsigned int 0, unsigned
int 379) line 2464 + 19 bytes
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x024150c0,
nsIRequest * 0x02423110, nsISupports * 0x00000000, nsIInputStream * 0x023d6028,
unsigned int 0, unsigned int 379) line 244 + 46 bytes
nsFileChannel::OnDataAvailable(nsFileChannel * const 0x02423118, nsIRequest *
0x0242cf3c, nsISupports * 0x00000000, nsIInputStream * 0x023d6028, unsigned int
0, unsigned int 379) line 492 + 49 bytes
nsOnDataAvailableEvent::HandleEvent() line 178 + 70 bytes
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x02283d94) line 65
PL_HandleEvent(PLEvent * 0x02283d94) line 590 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x011ac930) line 520 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00180592, unsigned int 49373, unsigned int 0,
long 18532656) line 1071 + 9 bytes
USER32! 77e148dc()
USER32! 77e14aa7()
USER32! 77e266fd()
main(int 5, char * * 0x00628c60) line 157 + 11 bytes
mainCRTStartup() line 338 + 17 bytes
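For reference, the malloc argument in the stack above can be decoded with a little arithmetic. The sketch below is only an illustration: the helper names are made up, and the entry sizes passed in are assumptions, since nothing in this stack states the real per-entry size of the table. It shows that 402653184 is 0x18000000, that dividing by 3 gives 0x8000000 (2^27), and that dividing by a hypothetical 12-byte entry (three 32-bit words) gives 0x2000000 (2^25).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of entry slots in an allocation of `bytes`
 * bytes for a fixed `entry_size`. Returns 0 when the size does not divide
 * evenly, which would suggest the assumed entry size is wrong. */
uint32_t slots_for_allocation(uint32_t bytes, uint32_t entry_size)
{
    if (entry_size == 0 || bytes % entry_size != 0)
        return 0;
    return bytes / entry_size;
}

/* Hypothetical helper: log2 of an exact power of two, or -1 otherwise. */
int log2_exact(uint32_t n)
{
    int log2 = 0;

    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    while ((n >>= 1) != 0)
        log2++;
    return log2;
}

/* The size from the stack, written both ways; the two must agree. */
int stack_size_matches_hex(void)
{
    return 402653184u == 0x18000000u;
}
```

Both quotients are powers of two, so the size alone does not settle which entry size applies; that has to come from the table's own definition.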
Comment 7•23 years ago
|
||
Looks like Brendan's hashtable changes (which have no bug # or r/sr in the
checkin comment... though I think they're in the bug).
Assignee: jrgm → brendan
Assignee | ||
Comment 8•23 years ago
|
||
I edited detailed comments based on cvs diff -wu output, passed the file
containing those comments to cvs ci -F, but forgot to cite the bug# and
reviewers as I usually do. Sorry about that.
I'm removing the nsbranch keyword, as I did not check into the branch.
Looking at this now.
/be
Status: NEW → ASSIGNED
Keywords: nsbranch → mozilla0.9.5
Priority: -- → P1
Target Milestone: --- → mozilla0.9.5
Assignee | ||
Comment 9•23 years ago
|
||
Cc'ing my reviewers from bug 81847, for re-review of forthcoming patches.
I missed something obvious (in hindsight) when I improved js_LookupProperty
performance in that bug's patch. That patch improved lookup performance by
conserving the cx->resolving hash table for reuse by all future lookups till the
context (cx) is destroyed, instead of destroying it each time the outermost
(non-recursive) lookup activation on cx unwinds (and then re-creating it on next
lookup where a DOM property id resolver, or similar -- XPConnect uses resolve
too, IIRC -- was invoked; rjesup and others have profiled too many cycles under
js_LookupProperty, due to this hash table destroy/recreate thrashing). But the
existing code counted on the hash table being destroyed soon, because it used
JS_DHashTableRawRemove to optimize each unwind from a recursive or the outer lookup.
JS_DHashTableRawRemove does not compress the table: it either frees the entry
forthwith if no other entry collided with it, or else replaces the entry with a
removed sentinel and lets the removed entry count climb toward infinity. A
subsequent JS_DHASH_ADD operation should compress the table if it finds too many
removed-entry sentinels, but this clearly did not happen, as the size passed to
malloc in alexandru's backtrace is 0x18000000, which indicates a table size of
0x8000000 or 128M entry slots.
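To make the sentinel behaviour described above concrete, here is a toy model in C. It is not jsdhash: the table, its linear probing, and all names (toy_add, toy_raw_remove, toy_compress) are assumptions made up for illustration. It only shows that a raw remove of an entry that something else collided with leaves a sentinel behind, so a run of nested add/remove cycles can end with zero live entries and a growing removed count, which a compress pass resets.

```c
#include <assert.h>
#include <string.h>

#define CAPACITY 16u  /* toy size; a power of two so masking works */

enum slot_state { SLOT_FREE = 0, SLOT_LIVE, SLOT_REMOVED };

struct slot {
    enum slot_state state;
    int collided;       /* set once another key has probed past this slot */
    unsigned key;
};

struct toy_table {
    struct slot slots[CAPACITY];
    unsigned live_count, removed_count;
};

void toy_init(struct toy_table *t) { memset(t, 0, sizeof *t); }

/* Add key with linear probing; returns the slot index, or -1 if full. */
int toy_add(struct toy_table *t, unsigned key)
{
    unsigned mask = CAPACITY - 1, i = key & mask, n;
    int target = -1;

    for (n = 0; n < CAPACITY; n++, i = (i + 1) & mask) {
        struct slot *s = &t->slots[i];
        if (s->state == SLOT_LIVE && s->key == key)
            return (int)i;                /* already present */
        if (s->state == SLOT_FREE) {
            if (target < 0)
                target = (int)i;
            break;                        /* end of the probe chain */
        }
        if (s->state == SLOT_REMOVED && target < 0)
            target = (int)i;              /* remember the first sentinel */
        if (s->state == SLOT_LIVE)
            s->collided = 1;              /* we are probing past it */
    }
    if (target < 0)
        return -1;
    if (t->slots[target].state == SLOT_REMOVED)
        t->removed_count--;               /* recycled; collided stays set */
    else
        t->slots[target].collided = 0;
    t->slots[target].state = SLOT_LIVE;
    t->slots[target].key = key;
    t->live_count++;
    return target;
}

/* Cheap removal: never rehashes or shrinks anything. */
void toy_raw_remove(struct toy_table *t, int idx)
{
    struct slot *s = &t->slots[idx];
    assert(s->state == SLOT_LIVE);
    if (s->collided) {
        s->state = SLOT_REMOVED;  /* sentinel keeps probe chains intact */
        t->removed_count++;
    } else {
        s->state = SLOT_FREE;
    }
    t->live_count--;
}

/* Rebuild the table from its live keys, dropping every sentinel. */
void toy_compress(struct toy_table *t)
{
    struct toy_table fresh;
    unsigned i;
    toy_init(&fresh);
    for (i = 0; i < CAPACITY; i++)
        if (t->slots[i].state == SLOT_LIVE)
            toy_add(&fresh, t->slots[i].key);
    *t = fresh;
}

/* Simulate `cycles` nested lookups (cycles <= CAPACITY / 2), then return
 * the sentinel count, optionally after a compress pass. */
unsigned toy_sentinels_after_churn(unsigned cycles, int compress)
{
    struct toy_table t;
    unsigned c;
    toy_init(&t);
    for (c = 0; c < cycles; c++) {
        int outer = toy_add(&t, 2 * c);             /* home slot 2c */
        int inner = toy_add(&t, 2 * c + CAPACITY);  /* same home, probes on */
        assert(outer >= 0 && inner >= 0);
        toy_raw_remove(&t, inner);  /* never probed past: freed outright */
        toy_raw_remove(&t, outer);  /* probed past: becomes a sentinel */
    }
    if (compress)
        toy_compress(&t);
    return t.removed_count;
}
```

In this model each cycle leaves exactly one sentinel, so six cycles leave six sentinels and no live entries. Whether the real table grows for the same reason is a separate question that the toy does not answer.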
I need to reproduce this bloat bug (I don't think it's a leak; memory for the
old table is freed, but an ever-larger piece from the malloc heap is requested,
no doubt requiring the OS to grow the process's data segment) to find out why
the compression code isn't helping. In the meantime, here's a patch that
should make things better. Please let me know what effect it has.
/be
Assignee | ||
Comment 10•23 years ago
|
||
Comment 11•23 years ago
|
||
Sounds like Alex has a good test case....
Alex can you run with the patch?
Does this need to go on the 0.9.4 branch?
Alexandru Savulov wrote:
Since yesterday I can observe a serious memory leak
consuming the whole amount of physical and virtual
memory while running regression testing for tables.
This occurs randomly and brings my machine to a
crawl (very low responsiveness).
(I use one of the latest trunk pulls.)
I ran the first of the table/bugs regression tests
in the debugger (msdev) and broke the execution after
the mem-leak occurred. The callstack is available at
the end of this mail.
This is a very serious issue and has to be checked
ASAP. I suggest that we close the tree until the
leak cause has been found.
Alex Savulov
Comment 12•23 years ago
|
||
Adding nsbranch keyword and nominating for nsbranch+. Brendan, pls + this one,
so it stays on the PDT radar.
Keywords: nsbranch
Whiteboard: PDT
Assignee | ||
Comment 13•23 years ago
|
||
Please re-read my comments dated 2001-09-13 12:55. I did not check the patch
for bug 81847 into the branch, nor will I without a fix for this bug. Why then
do we need the nsbranch keyword?
/be
Reporter | ||
Comment 14•23 years ago
|
||
Removing nsbranch/PDT. As noted, this is not on the branch. [I apologize
for too casually adding the keyword before; I was following a
better-safe-than-sorry policy since I didn't know the cause
at that time].
Keywords: nsbranch
Whiteboard: PDT
Comment 15•23 years ago
|
||
I applied the fix [attachment 49246 [details] [diff] [review]] and still see the leak.
Here's what the source of my build looks like:
...
cleanup:
if (table->generation == generation &&
table->removedCount < JS_BIT(table->sizeLog2) >> 2) {
JS_DHashTableRawRemove(table, entry);
} else {
JS_DHashTableOperate(table, &key, JS_DHASH_REMOVE);
}
if (!ok || *propp)
return ok;
...
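Read on its own, the condition in the quoted source allows the raw remove only when the table generation is unchanged since the entry was found and fewer than a quarter of the slots hold removed sentinels. A minimal sketch of that predicate, assuming JS_BIT(n) expands to 1 << n; the function name and the plain integer parameters are hypothetical stand-ins for the table fields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical restatement of the guard from the quoted cleanup code.
 *   generation_now   - the table's current generation
 *   generation_saved - the generation recorded when the entry was added
 *   removed_count    - number of removed-entry sentinels in the table
 *   size_log2        - log2 of the table capacity (must be below 32)
 * Returns 1 if the cheap raw remove would be chosen, 0 if the full
 * remove operation would be used instead. */
int can_raw_remove(uint32_t generation_now, uint32_t generation_saved,
                   uint32_t removed_count, unsigned size_log2)
{
    uint32_t capacity;

    assert(size_log2 < 32);
    capacity = (uint32_t)1 << size_log2;

    /* `>>` binds tighter than `<`, so the original compares the
     * removed count against capacity / 4. */
    return generation_now == generation_saved &&
           removed_count < (capacity >> 2);
}
```

With a 16-slot table (size_log2 of 4), three sentinels still permit the raw remove, while four sentinels or any change of generation send the code down the full remove path.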
I use the trunk, and I use the debug test case I described above (running the
first table/bugs regression test). I would recommend that everybody involved in
this try to reproduce it. The stack that I get might be misleading: it is just the
result of pausing the debugging session after the memory consumption grows
without bound. Because of that memset() call, I think the stack is
a good one, but I'm not 100% sure. So go ahead and run that test case. Get a
good machine to do that (mine has 512MB RAM and it has a hard time doing this).
Good hunting!
Keywords: mozilla0.9.5 → nsbranch
Comment 16•23 years ago
|
||
sorry I think we had a collision ... backing up
Keywords: nsbranch → mozilla0.9.5
Comment 18•23 years ago
|
||
Way to go, Fokker. Are you a pothead, Fokker?
Comment 19•23 years ago
|
||
What's the status on this bug? It's holding the tree closed.
Reporter | ||
Comment 20•23 years ago
|
||
brendan's debugging it and he's going to give me a patch to try in a couple of
minutes
Assignee | ||
Comment 21•23 years ago
|
||
Assignee | ||
Comment 22•23 years ago
|
||
Comment on attachment 49298 [details] [diff] [review]
proposed fix -- I am Gaylord Fokker today
Nice move, Fokker!
Attachment #49298 -
Attachment is obsolete: true
Assignee | ||
Comment 23•23 years ago
|
||
Comment 24•23 years ago
|
||
You're fixing the problems I'm going to point out before I point them out!
Looking at the latest patch.
Comment 25•23 years ago
|
||
Comment on attachment 49299 [details] [diff] [review]
I really hope this is it
[s]r=waterson
Attachment #49299 -
Flags: review+
Comment 26•23 years ago
|
||
I know this part is not new... InitFunctionAndObjectClasses can't end up in any
resolve hook? You're doing a raw remove and that would be bad if it could.
Otherwise I think you got it this time.
r/sr=jband
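One way to read the concern above: if code that holds a pointer to a table entry can be re-entered, nested adds may reallocate the table, and the held pointer then refers to freed storage. The sketch below is a hypothetical growable table, not jsdhash; all names are invented for illustration. It shows how a generation counter lets the holder detect that its entry pointer may be stale before touching it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gen_table {
    int *slots;
    unsigned capacity;
    unsigned used;
    unsigned generation;  /* bumped every time the storage moves */
};

/* Append a value, doubling the storage when full.
 * Returns a pointer to the new slot, or NULL if allocation fails. */
int *gen_table_add(struct gen_table *t, int value)
{
    if (t->used == t->capacity) {
        unsigned new_capacity = t->capacity ? t->capacity * 2 : 4;
        int *bigger = malloc(new_capacity * sizeof *bigger);
        if (bigger == NULL)
            return NULL;
        if (t->used != 0)
            memcpy(bigger, t->slots, t->used * sizeof *bigger);
        free(t->slots);          /* old storage is gone ... */
        t->slots = bigger;
        t->capacity = new_capacity;
        t->generation++;         /* ... so old entry pointers are stale */
    }
    t->slots[t->used] = value;
    return &t->slots[t->used++];
}

/* Hold an entry pointer across `nested_adds` re-entrant adds (standing in
 * for a hook that triggers more lookups). Returns 1 if the generation
 * changed, 0 if it did not, and -1 on allocation failure. */
int stale_after_nested_adds(unsigned nested_adds)
{
    struct gen_table t = { NULL, 0, 0, 0 };
    unsigned saved_generation, i;
    int stale;
    int *entry = gen_table_add(&t, 42);

    if (entry == NULL)
        return -1;
    saved_generation = t.generation;
    for (i = 0; i < nested_adds; i++) {
        if (gen_table_add(&t, (int)i) == NULL) {
            free(t.slots);
            return -1;
        }
    }
    stale = (t.generation != saved_generation);
    if (!stale)
        assert(*entry == 42);  /* only safe to touch when not stale */
    free(t.slots);
    return stale;
}
```

With the initial capacity of 4 used here, up to three nested adds leave the held pointer valid, and the fourth forces a reallocation that the generation comparison catches.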
Comment 27•23 years ago
|
||
Comment on attachment 49299 [details] [diff] [review]
I really hope this is it
sr=jband
Attachment #49299 -
Flags: superreview+
Assignee | ||
Updated•23 years ago
|
Attachment #49299 -
Attachment is obsolete: true
Assignee | ||
Comment 28•23 years ago
|
||
Assignee | ||
Comment 29•23 years ago
|
||
jband: you caught the problem addressed by the final patch. I'll check in after
jrgm verifies nine different ways. dbaron has had a look, too.
/be
Comment 30•23 years ago
|
||
Comment on attachment 49305 [details] [diff] [review]
sigh -- raw-remove is a sharp tool, and I've bloodied myself thoroughly on it
I'm laughing now.
You're not going to bother to do the generation/removedCount stuff
for the InitFunctionAndObjectClasses case?
r/sr=jband
Attachment #49305 -
Flags: review+
Assignee | ||
Comment 31•23 years ago
|
||
My apologies to the hook, and to everyone who waited on the tree today -- I
should have reduced this bug from blocker severity once it was isolated (mainly
by alex's work). Other than people running into it, there wouldn't have been
any need to keep the tree closed -- the problem was isolated. Anyway, sorry it
took so long to nail.
/be
Assignee | ||
Comment 32•23 years ago
|
||
Fix checked in (jband, I see no point in over-optimizing that part of
InitFunctionAndObjectClasses, plus, I don't trust myself today :-/).
/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: Browser → Seamonkey