Closed Bug 99442 Opened 23 years ago Closed 23 years ago

Huge "leak" occurs while loading a 'random' page during page-loader test

Categories

(SeaMonkey :: General, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: jrgmorrison, Assigned: brendan)

References

()

Details

(4 keywords)

Attachments

(3 files, 2 obsolete files)

A _huge_ leak (or equivalent) is happening while running the page-loader test. It appears to have entered the trunk between 8am 9/11/2001 and 8am 9/12/2001. Define huge. Okay. Process size peaked at 313 Megabytes on Linux. I've seen this happen on both Linux and win98. It's a very strange "leak" though. The test will be running along fine for some number of page loads, and then on one page, the browser will freeze for about 160 seconds while the process size shoots up from normal size to well over 200MB. It all happens during the load of one page. However, it is not the same page each time (observed with four different pages), and it is not in the same point in the sequence each time (sometimes it is on the first (uncached) cycle, and sometimes it is on the 2nd or 3rd cycle). [It's even possible that it may not happen every time I run the test through 5 cycles; e.g., the Mac didn't show this leak, but maybe it was lucky and didn't hit the "magic" set of ~random conditions]. I wish I could say more, or had a more precise way of reproducing this, but I don't. If anyone has any suggestions as to how to narrow this down, I'm all ears. I'm cc:ing everyone on the hook for that period. (Sorry. Some are likely innocuous checkins. Remove yourself from the cc: if you know that it's not you).
I saw something VERY like this in the 9/11 daily build (win32) under win2k with avsforum.com when hitting "Back". Repeated multiple times, but had to read threads there for 5-10 minutes. Always was on Back, but I only did it 2-3 times, so might have been random. I'll attach a screendump of the task manager graph of mem usage. The drop-off is when I killed mozilla. It used 100% of available VM and hung.
The build I had the problem with was 2001091203, not 9/11
Checked the bonsai diffs below for every file in my checkin line by line; I can't see anything that could be involved. I'll stay on the CC list, but I think I'm clear.
I sent a mail to cpd-all with a stack and steps to reproduce/debug with msdev attached. I paste them here again:

How to run the debug session with msdev:
- open a command shell
- set the mozilla environment variables
- run: >set XPCOM_DEBUG_BREAK=warn
- run: >subst S: DRIVE_LETTER:\... (the directory where your trunk resides)
- run: >msdev
- create a new makefile project (you can go with the defaults)
- open Project/Settings
- click the Debug tab
- set the following values:
  Executable for debug session = S:\mozilla\dist\WIN32_D.OBJ\bin\viewer.exe
  Working dir = S:\mozilla\dist\WIN32_D.OBJ\bin\
  Program arguments = -o s:\mozilla\layout\html\tests\table\bugs\ -f s:\mozilla\layout\html\tests\table\bugs\file_list1.txt
- [$] once this is set up, make sure that ALL the *.RGD files in the directory s:\mozilla\layout\html\tests\table\bugs\ are DELETED (this seems to be very important)
- press F5 (run debug session in msdev)
- if execution is interrupted by assertion warnings, OK/ignore them and continue (F5) the debug session
- if you do not get the mem leak, repeat from [$] (I recommend that you keep the task manager open the whole time)

CALL STACK:
memset() line 108
0012ff2c()
_nh_malloc_dbg(unsigned int 402653184, int 0, int 1, const char * 0x00000000, int 0) line 248 + 21 bytes
malloc(unsigned int 402653184) line 130 + 21 bytes
JS_DHashAllocTable(JSDHashTable * 0x012758b8, unsigned long 402653184) line 58 + 10 bytes
ChangeTable(JSDHashTable * 0x012758b8, int 1) line 379 + 15 bytes
JS_DHashTableOperate(JSDHashTable * 0x012758b8, const void * 0x0012e024, int 1) line 456 + 13 bytes
_js_LookupProperty(JSContext * 0x0124a650, JSObject * 0x0117a200, long 37258152, JSObject * * 0x0012e0ec, JSProperty * * 0x0012e0e0, const char * 0x0052c7a4, unsigned int 2339) line 2114 + 15 bytes
js_GetProperty(JSContext * 0x0124a650, JSObject * 0x0117a200, long 37258152, long * 0x0012e7b0) line 2339 + 35 bytes
js_Interpret(JSContext * 0x0124a650, long * 0x0012e93c) line 2553 + 1751 bytes
js_Invoke(JSContext * 0x0124a650, unsigned int 1, unsigned int 2) line 824 + 13 bytes
js_InternalInvoke(JSContext * 0x0124a650, JSObject * 0x0117a200, long 18328048, unsigned int 0, unsigned int 1, long * 0x0012eb24, long * 0x0012ea6c) line 899 + 20 bytes
JS_CallFunctionValue(JSContext * 0x0124a650, JSObject * 0x0117a200, long 18328048, unsigned int 1, long * 0x0012eb24, long * 0x0012ea6c) line 3360 + 31 bytes
nsJSContext::CallEventHandler(nsJSContext * const 0x01248488, void * 0x0117a200, void * 0x0117a9f0, unsigned int 1, void * 0x0012eb24, int * 0x0012eb20, int 0) line 957 + 33 bytes
nsJSEventListener::HandleEvent(nsJSEventListener * const 0x023b8048, nsIDOMEvent * 0x0245784c) line 139 + 74 bytes
nsXBLPrototypeHandler::ExecuteHandler(nsXBLPrototypeHandler * const 0x02366a40, nsIDOMEventReceiver * 0x023e7378, nsIDOMEvent * 0x0245784c) line 433
nsXBLPrototypeHandler::BindingAttached(nsXBLPrototypeHandler * const 0x02366a40, nsIDOMEventReceiver * 0x023e7378) line 481
nsXBLPrototypeBinding::BindingAttached(nsXBLPrototypeBinding * const 0x02369ac8, nsIDOMEventReceiver * 0x023e7378) line 564 + 30 bytes
nsXBLBinding::ExecuteAttachedHandler(nsXBLBinding * const 0x02392fc0) line 1134
nsBindingManager::ProcessAttachedQueue(nsBindingManager * const 0x023c3288) line 1283
nsCSSFrameConstructor::ContentInserted(nsCSSFrameConstructor * const 0x023d6ed0, nsIPresContext * 0x02342828, nsIContent * 0x00000000, nsIContent * 0x024383b8, int 0, nsILayoutHistoryState * 0x00000000) line 8328
StyleSetImpl::ContentInserted(StyleSetImpl * const 0x02348060, nsIPresContext * 0x02342828, nsIContent * 0x00000000, nsIContent * 0x024383b8, int 0) line 1187
PresShell::InitialReflow(PresShell * const 0x023b5e98, int 9180, int 4260) line 2625
HTMLContentSink::StartLayout() line 3861
HTMLContentSink::OpenBody(HTMLContentSink * const 0x023f6028, const nsIParserNode & {...}) line 3147
CNavDTD::OpenBody(const nsCParserNode * 0x02338818) line 3102 + 31 bytes
CNavDTD::OpenContainer(const nsCParserNode * 0x02338818, nsHTMLTag eHTMLTag_body, int 1, nsEntryStack * 0x00000000) line 3359 + 12 bytes
CNavDTD::HandleDefaultStartToken(CToken * 0x023c1cf8, nsHTMLTag eHTMLTag_body, nsCParserNode * 0x02338818) line 1288 + 20 bytes
CNavDTD::HandleStartToken(CToken * 0x023c1cf8) line 1698 + 22 bytes
CNavDTD::HandleToken(CNavDTD * const 0x023c09d8, CToken * 0x023c1cf8, nsIParser * 0x023e2670) line 867 + 12 bytes
CNavDTD::BuildModel(CNavDTD * const 0x023c09d8, nsIParser * 0x023e2670, nsITokenizer * 0x02336ce8, nsITokenObserver * 0x00000000, nsIContentSink * 0x023f6028) line 503 + 20 bytes
nsParser::BuildModel() line 1989 + 34 bytes
nsParser::ResumeParse(int 1, int 0) line 1855 + 11 bytes
nsParser::OnDataAvailable(nsParser * const 0x023e2678, nsIRequest * 0x02423110, nsISupports * 0x00000000, nsIInputStream * 0x023d6028, unsigned int 0, unsigned int 379) line 2464 + 19 bytes
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x024150c0, nsIRequest * 0x02423110, nsISupports * 0x00000000, nsIInputStream * 0x023d6028, unsigned int 0, unsigned int 379) line 244 + 46 bytes
nsFileChannel::OnDataAvailable(nsFileChannel * const 0x02423118, nsIRequest * 0x0242cf3c, nsISupports * 0x00000000, nsIInputStream * 0x023d6028, unsigned int 0, unsigned int 379) line 492 + 49 bytes
nsOnDataAvailableEvent::HandleEvent() line 178 + 70 bytes
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x02283d94) line 65
PL_HandleEvent(PLEvent * 0x02283d94) line 590 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x011ac930) line 520 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00180592, unsigned int 49373, unsigned int 0, long 18532656) line 1071 + 9 bytes
USER32! 77e148dc()
USER32! 77e14aa7()
USER32! 77e266fd()
main(int 5, char * * 0x00628c60) line 157 + 11 bytes
mainCRTStartup() line 338 + 17 bytes
Looks like Brendan's hashtable changes (which have no bug # or r/sr in the checkin comment... though I think they're in the bug).
Assignee: jrgm → brendan
I edited detailed comments based on cvs diff -wu output and passed the file containing those comments to cvs ci -F, but forgot to cite the bug# and reviewers as I usually do. Sorry about that. I'm removing the nsbranch keyword, as I did not check into the branch. Looking at this now. /be
Status: NEW → ASSIGNED
Keywords: nsbranch → mozilla0.9.5
Priority: -- → P1
Target Milestone: --- → mozilla0.9.5
Cc'ing my reviewers from bug 81847, for re-review of forthcoming patches. I missed something obvious (in hindsight) when I improved js_LookupProperty performance in that bug's patch.

That patch improved lookup performance by conserving the cx->resolving hash table for reuse by all future lookups till the context (cx) is destroyed, instead of destroying it each time the outermost (non-recursive) lookup activation on cx unwinds (and then re-creating it on the next lookup where a DOM property id resolver, or similar -- XPConnect uses resolve too, IIRC -- was invoked; rjesup and others have profiled too many cycles under js_LookupProperty, due to this hash table destroy/recreate thrashing).

But the existing code counted on the hash table being destroyed soon, because it used JS_DHashTableRawRemove to optimize each unwind from a recursive or the outer lookup. JS_DHashTableRawRemove does not compress the table: it either frees the entry forthwith if no other entry collided with it, or else replaces the entry with a removed sentinel and lets the removed entry count climb toward infinity. A subsequent JS_DHASH_ADD operation should compress the table if it finds too many removed-entry sentinels, but this clearly did not happen, as the size passed to malloc in alexandru's backtrace is 0x18000000, which indicates a table size of 0x8000000 or 128M entry slots.

I need to reproduce this bloat bug (I don't think it's a leak; memory for the old table is freed, but an ever-larger piece from the malloc heap is requested, no doubt requiring the OS to grow the process's data segment) to find out why the compression code isn't helping. In the meantime, here's a patch that should make things better. Please let me know what effect it has. /be
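The bloat mechanism described in the comment above can be illustrated with a small C sketch. This is not the jsdhash code: the names (struct table, table_add, table_raw_remove, churn) are hypothetical, it uses linear probing where jsdhash uses double hashing, and it assumes every raw remove leaves a sentinel. It only shows how a table that holds at most one live entry keeps doubling if sentinels count toward the load factor and are never compressed at the same size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature open-addressing table; not the real JSDHashTable. */
enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_REMOVED };
struct slot { enum slot_state state; unsigned key; };
struct table { struct slot *slots; unsigned size_log2, live, removed; };

static unsigned capacity(const struct table *t) { return 1u << t->size_log2; }

static int table_init(struct table *t, unsigned size_log2) {
    /* calloc zeroes the slots, and SLOT_FREE is 0 */
    t->slots = calloc((size_t)1 << size_log2, sizeof *t->slots);
    t->size_log2 = size_log2;
    t->live = t->removed = 0;
    return t->slots != NULL;
}

/* Store a key known to be absent; the caller guarantees a free slot exists. */
static void place(struct table *t, unsigned key) {
    unsigned mask = capacity(t) - 1, i = (key * 2654435761u) & mask;
    while (t->slots[i].state != SLOT_FREE)
        i = (i + 1) & mask;
    t->slots[i].state = SLOT_LIVE;
    t->slots[i].key = key;
    t->live++;
}

/* Rebuild at size_log2 + delta_log2, which drops every removed sentinel. */
static int table_change(struct table *t, unsigned delta_log2) {
    struct table fresh;
    unsigned i, old_cap = capacity(t);
    if (!table_init(&fresh, t->size_log2 + delta_log2))
        return 0;
    for (i = 0; i < old_cap; i++)
        if (t->slots[i].state == SLOT_LIVE)
            place(&fresh, t->slots[i].key);
    free(t->slots);
    *t = fresh;
    return 1;
}

/* Add a key. At 75% load (live + removed), either compress at the same
 * size (if allowed and a quarter of the slots are sentinels) or double. */
static int table_add(struct table *t, unsigned key, int compress) {
    unsigned cap = capacity(t);
    if (t->live + t->removed >= cap - (cap >> 2)) {
        unsigned delta = (compress && t->removed >= (cap >> 2)) ? 0u : 1u;
        if (!table_change(t, delta))
            return 0;
    }
    place(t, key);
    return 1;
}

/* "Raw" remove: mark the slot with a sentinel and never shrink or compress. */
static void table_raw_remove(struct table *t, unsigned key) {
    unsigned mask = capacity(t) - 1, i = (key * 2654435761u) & mask;
    while (t->slots[i].state != SLOT_FREE) {
        if (t->slots[i].state == SLOT_LIVE && t->slots[i].key == key) {
            t->slots[i].state = SLOT_REMOVED;
            t->live--;
            t->removed++;
            return;
        }
        i = (i + 1) & mask;
    }
}

/* Add then raw-remove `rounds` distinct keys, starting from 16 slots.
 * Returns the final size_log2, or 0 on allocation failure. */
unsigned churn(int compress, unsigned rounds) {
    struct table t;
    unsigned k, final_log2;
    if (!table_init(&t, 4))
        return 0;
    for (k = 0; k < rounds; k++) {
        if (!table_add(&t, k, compress)) {
            free(t.slots);
            return 0;
        }
        table_raw_remove(&t, k);
    }
    final_log2 = t.size_log2;
    free(t.slots);
    return final_log2;
}
```

In this sketch, churn(0, 1000) ends with size_log2 equal to 10 (1024 slots) although the table never holds more than one live entry, while churn(1, 1000) stays at size_log2 4. Whether the real table failed to compress for the same reason is exactly what the comment above says still needs to be found out.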
Sounds like Alex has a good test case.... Alex, can you run with the patch? Does this need to go on the 0.9.4 branch?

Alexandru Savulov wrote:

Since yesterday I can observe a serious memory leak consuming the whole amount of physical and virtual memory while running regression testing for tables. This occurs randomly and brings my machine to a standstill (very low responsiveness). (I use one of the latest trunk pulls.) I ran the first of the table/bugs regression tests in the debugger (msdev) and broke the execution after the mem-leak occurred. The call stack is available at the end of this mail. This is a very serious issue and has to be checked ASAP. I suggest that we close the tree until the leak cause has been found. Alex Savulov
Adding nsbranch keyword and nominating for nsbranch+. Brendan, pls + this one, so it stays on the PDT radar.
Keywords: nsbranch
Whiteboard: PDT
Please re-read my comments dated 2001-09-13 12:55. I did not check the patch for bug 81847 into the branch, nor will I without a fix for this bug. Why then do we need the nsbranch keyword? /be
Removing nsbranch/PDT. As noted, this is not on the branch. [I apologize for too casually adding the keyword before, although I was following a better-safe-than-sorry policy since I didn't know the cause at that time].
Keywords: nsbranch
Whiteboard: PDT
I applied the fix [attachment 49246] and still see the leak. Here's what the source of my build looks like:

  ...
  cleanup:
    if (table->generation == generation &&
        table->removedCount < JS_BIT(table->sizeLog2) >> 2) {
        JS_DHashTableRawRemove(table, entry);
    } else {
        JS_DHashTableOperate(table, &key, JS_DHASH_REMOVE);
    }
    if (!ok || *propp)
        return ok;
  ...

I use the trunk, and I use the debug test case I described above (running the first table/bugs regression test). I would recommend that everybody involved in this try to reproduce it. The stack that I get might be misleading. It is just the result of pausing the debugging session after the memory consumption grows without bound. Because of that memset() call, I think the stack is a good one, but I'm not 100% sure. So go ahead and run that test case. Get a good machine to do that (mine has 512MB RAM and it has a hard time doing this). Good hunt!
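The guard in the cleanup code quoted above can be read as a small predicate. The sketch below restates it in standalone C under assumptions: struct table_view and can_raw_remove are hypothetical names, the fields only mirror the generation, sizeLog2 and removedCount members that appear in the quoted snippet, and the interpretation of generation (that a changed value means the table was rebuilt since the entry was looked up) is my reading rather than something this thread states.

```c
#include <assert.h>

/* Hypothetical view of the three table fields the quoted guard reads. */
struct table_view {
    unsigned generation;     /* assumed to change whenever the table is rebuilt */
    unsigned size_log2;      /* capacity is 1 << size_log2 */
    unsigned removed_count;  /* removed sentinels currently in the table */
};

/* Returns 1 when the cheap raw remove is allowed by the guard, 0 when the
 * caller should fall back to the full remove operation. Raw remove is
 * allowed only if the table was not rebuilt since saved_generation was
 * sampled and fewer than a quarter of the slots hold sentinels. */
int can_raw_remove(const struct table_view *t, unsigned saved_generation) {
    unsigned capacity = 1u << t->size_log2;
    return t->generation == saved_generation &&
           t->removed_count < (capacity >> 2);
}
```

For a 16-slot table the threshold is 4 sentinels, so a table with 3 sentinels and an unchanged generation takes the raw path, while one with 4 sentinels, or with a generation that no longer matches the saved value, takes the full remove path.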
Keywords: mozilla0.9.5 → nsbranch
sorry I think we had a collision ... backing up
Keywords: nsbranch → mozilla0.9.5
Making this one a blocker.
Severity: critical → blocker
Blocks: 81847
Way to go, Fokker. Are you a pothead, Fokker?
What's the status on this bug? It's holding the tree closed.
brendan's debugging it and he's going to give me a patch to try in a couple of minutes
Attached patch: proposed fix -- I am Gaylord Fokker today (obsolete) (deleted)
Comment on attachment 49298 (proposed fix -- I am Gaylord Fokker today): Nice move, Fokker!
Attachment #49298 - Attachment is obsolete: true
Attached patch: I really hope this is it (obsolete) (deleted)
You're fixing the problems I'm going to point out before I point them out! Looking at the latest patch.
Comment on attachment 49299 (I really hope this is it): [s]r=waterson
Attachment #49299 - Flags: review+
I know this part is not new... InitFunctionAndObjectClasses can't end up in any resolve hook? You're doing a raw remove and that would be bad if it could. Otherwise I think you got it this time. r/sr=jband
Comment on attachment 49299 (I really hope this is it): sr=jband
Attachment #49299 - Flags: superreview+
Attachment #49299 - Attachment is obsolete: true
jband: you caught the problem addressed by the final patch. I'll check in after jrgm verifies nine different ways. dbaron has had a look, too. /be
Comment on attachment 49305 (sigh -- raw-remove is a sharp tool, and I've bloodied myself thoroughly on it): I'm laughing now. You're not going to bother to do the generation/removedCount stuff for the InitFunctionAndObjectClasses case? r/sr=jband
Attachment #49305 - Flags: review+
My apologies to the hook, and to everyone who waited on the tree today -- I should have reduced this bug from blocker severity once it was isolated (mainly by alex's work). Other than people running into it, there wouldn't have been any need to keep the tree closed -- the problem was isolated. Anyway, sorry it took so long to nail. /be
Fix checked in (jband, I see no point in over-optimizing that part of InitFunctionAndObjectClasses, plus, I don't trust myself today :-/). /be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified
Status: RESOLVED → VERIFIED
Product: Browser → Seamonkey