Closed Bug 911 Opened 26 years ago Closed 22 years ago

Fix(?) for crash when quickly launching multiple windows

Categories

(MozillaClassic Graveyard :: NetLib, defect, P1)

1998-09-04
x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: bryce, Assigned: nisheeth_mozilla)

Details

Not sure if I have the right component for this one... Pardon if not so. It might belong to the RDF or the Cookies folks.

I found a bug that occurs when one opens a number of windows, which may be related to a low memory condition, and somehow to cookies. I've located the bug exactly and have a patch that seems to stop the bug. However, whatever condition happens when this bug crops up is now causing a bug in a different area of the program (I think) so this isn't a total fix.

I'm running WinNT4.0 SR3, on a 166MHz Dell Pentium with 64MB RAM and 200MB swap. In addition to WinNT processes and mozilla, I am running cdplayer.exe, MSACCESS, TASKMGR, MSDEV, CRT, and a notepad-like editor.

The crash occurs when 4-8 windows are opened in rapid succession (within 30 sec or so). The faster one opens windows, the sooner the error will occur.

I was looking at nine web pages on CGI scripts, looking for the slashdot source code. I had run a search on "slashdot.cgi" and opened up a bunch of windows to view the results of the search. Seven of the pages loaded, the final two hadn't loaded up fully at the time of crash. Error was reported "Access violation".

Here is the stack trace:

    ProcessCookiesAndTrustLabels(_ActiveEntry * 0x025ea8a0) line 3739 + 21 bytes
    net_ProcessFile(_ActiveEntry * 0x025ea8a0) line 1300 + 9 bytes
    NET_ProcessNet(PRFileDesc * 0x00000000, int 1) line 3334 + 13 bytes
    net_process_slow_net_timer_callback(void * 0x00000000) line 216 + 9 bytes
    wfe_ProcessTimeouts(unsigned long 13087538) line 303 + 12 bytes
    FireTimeout(HWND__ * 0x001201b2, unsigned int 275, unsigned int 777, unsigned long 13087538) line 60 + 9 bytes
    USER32! 77e7128c()
    CNetscapeApp::Run() line 1675 + 8 bytes
    AfxWinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00142595, int 1) line 52 + 11 bytes
    WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00142595, int 1) line 33 + 21 bytes
    WinMainCRTStartup() line 330 + 57 bytes
    KERNEL32! 77f1b304()

Here's the code snippet where MSVC says the crash occurred:

    void
    ProcessCookiesAndTrustLabels( ActiveEntry *ce )
    {
    #define TEN_MINUTES (time_t)(10*60) /* 10 minutes in seconds */
        unsigned int i;
        TrustLabel *ALabel;
        XP_List *TempTrustList;

        if ( IsTrustLabelsEnabled() && ce && ce->URL_s) {
            /*
             * if the trust label parsing is enabled then look at each cookie
             * and try to match it to a trust label on the trust list to see
             * if one matches the cookie
             */
            for(i=0 ;i < ce->URL_s->all_headers.empty_index; i++) {
                /* look for a cookie field - allow Set-cookie: or Set-Cookie2: - CASE INSENSITIVE COMPARE */
    /* >> */    if(!PL_strncasecmp(ce->URL_s->all_headers.key[i],"Set-Cookie", 10)) {
                    NET_SetCookieStringFromHttp(CE_FORMAT_OUT, ce->URL_s,CE_WINDOW_ID,
                        ce->URL_s->address, ce->URL_s->all_headers.value[i]);
                }
            }
            /* Snip */

It died on the if statement line (marked ">>"). Here's what the variables were:

    i = 0
    ce->URL_s->all_headers.empty_index = 39256272
    ce->URL_s->all_headers.key = 0x001e71e3

.key should be an array, or at least, that's how it's being used in the above code snippet. But MSVC couldn't evaluate the pointer.

MSVC crashed. Reloaded MSVC, then launched Mozilla in debug mode, did a search for slashdot.cgi again and started launching off the windows. Got five open before it crashed. Error was: "Unhandled exception in mozilla.exe: 0xC0000005: Access Violation." Crashed on the same line of the same function. This time the variables are:

    i = 0
    ce->URL_s->all_headers.empty_index = 3722304989
    ce->URL_s->all_headers.key = 0xdddddddd

Here are a few lines of relevant assembly code:

    007dcd24 mov eax,dword ptr [eax+ecx*4]
    007dcd27 push eax
    007dcd28 call _PL_strncasecmp (00831644)
    007dcd2d add esp,0000000c
    007dcd30 test eax,eax
    007dcd32 jne ProcessCookiesAndTrustLabels+000000af (007dcd6f)
    3740: NET_SetCookieStringFromHttp(CE_FORMAT_OUT, ce->URL_s, CE_WINDOW_ID, ce->URL_s->address, ce->URL_s->all_headers.value[i]);

The ce structure pretty much looks like it's blank. There's a ton of fields, like "window_chrome", "referer", "username", "password", etc. etc. but nearly all of the fields are set to either 0xdddddddd "", -572662307, 3722304989, or 221. The few that are set to particular values:

    ce->status = 1
    ce->bytes_received = 16534
    ce->socket = 0x00000000
    ce->con_sock = 0x00000000
    ce->local_file = 1
    ce->memory_file = 0
    ce->protocol = 12
    ce->proto_impl = 0x0097d470
    ce->con_data = 0x00d45f30
    ce->exit_routine = 0x00792cc0 il_netgeturldone(URL_Struct_ *, int, MWContext_ *)
    ce->window_id = 0x00c5a120
    ce->format_out = 2
    ce->save_stream = 0x00000000
    ce->busy = 1
    ce->proxy_conf = 0x00000000
    ce->proxy_addr = 0x00000000
    ce->socks_host = 0
    ce->socks_port = 0

Some Debug output:

    Created rdf:ht4
    www.hax0r.org error=0 h_name=1 task=12
    www.kalifornia.com error=0 h_name=1 task=13
    sunsite.unc.edu error=0 h_name=1 task=14
    hax0r.org error=0 h_name=1 task=15
    First-chance exception in mozilla.exe: 0xC0000005: Access Violation.

My first impression of what's going on is:

- Mozilla, while idle, polls the sockets.
- Since net_calling_all_the_time_count != 0 (it's set to 5, in this case)
- NET_ProcessNet is called, which allows multiple connections to be processed simultaneously.
- ready_fd = NULL, so an attempt to find a socket is made
- a bunch of code is run to set up the socket (I think...)
- sockets ready for reading are processed one by one
- tmpEntry->busy is false, so processing proceeds
- ready_fd = 0, and since both tmpEntry->socket and tmpEntry->con_sock are NULL, the else if statement is executed.
- The line rv = (*tmpEntry->proto_impl->process)(tmpEntry); evaluates to rv = net_ProcessFile(tmpEntry); and so net_ProcessFile is called.
- net_ProcessFile is called for the file "M1AIR9QS.GIF", which is a picture of the USSR flag.
- con_data->next_state = 15, which is NET_FILE_DONE
- con_data->stream is non-zero, so the macro COMPLETE_STREAM is run.
- con_data->next_state is set to NET_FILE_FREE
- ProcessCookiesAndTrustLabels is called:
  - Checks are made: trust labels is enabled, ce is non-null, and ce->URL_s is non-null. All pass.
  - A loop is made through all the headers (I think?)
  - Loop runs from 0 to ce->URL_s->all_headers.empty_index, which equals 3722304989. Hmm. Here's the problem.

Okay, now to figure out a solution. Obviously this cookie code shouldn't be called in some cases.

I don't think it's the pages themselves that are causing the crash, but rather the strain of loading several heavy duty pages all at once. Here are the exact pages I loaded:

    http://harbor.ecn.purdue.edu/~jacoby/Slashdot_Mailer/
    http://www.krazi.org/
    http://www.stars.com/vlib/providers/cgi.html
    http://www.hax0r.org/
    http://www.icemall.com/free/free_perl_scripts.html

I'm going to load each one in turn and see if one page in particular is causing the problem...

Nope. I loaded each page up in a single browser, with no crash. Then I went nuts loading links into new browsers. I loaded up half a dozen links out of the list on http://www.icemall.com/free/free_perl_scripts.html and got the same error that I've been having. Tried again, this time only got four windows opened. I was a tad slower in launching the windows this time.

Here's a possible fix. Change the code to look like this:

    void
    ProcessCookiesAndTrustLabels( ActiveEntry *ce )
    {
    #define TEN_MINUTES (time_t)(10*60) /* 10 minutes in seconds */
        unsigned int i;
        TrustLabel *ALabel;
        XP_List *TempTrustList;

        if ( IsTrustLabelsEnabled() && ce && ce->URL_s &&
             ce->URL_s->all_headers.empty_index != 0xdddddddd ) {

The code under the if statement shouldn't really be executed when empty_index is set to such a large number.

The program still crashes, but in a different location, and it seems to allow more web pages to load. I think this new way of crashing is not related to the fix I just made, but I'm not certain. I'll submit it as a separate bug report once I have more info on it.
Assignee: gagan → morse
That wouldn't work for an optimized build since 0xdddddddd is inserted only by the debugging code to find uninitialized or freed memory or something like that.
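For readers following along: the odd values quoted in the original report (0xdddddddd, 3722304989, -572662307 and 221) are all the same bit pattern viewed through different types. As far as I know, the Microsoft debug C runtime overwrites freed heap blocks with the byte 0xDD, which would fit the "url struct freed too early" theory. The small C sketch below only illustrates that arithmetic; the helper names are made up for this note and are not part of the Mozilla source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte that the MSVC debug heap is understood to write over freed blocks. */
#define DEAD_FILL_BYTE 0xDD

/* Hypothetical helper: the value obtained when four fill bytes are read
 * back as one unsigned 32-bit field (e.g. empty_index in the report). */
uint32_t dead_fill_word(void)
{
    unsigned char bytes[4];
    uint32_t word;

    assert(sizeof bytes == sizeof word);
    memset(bytes, DEAD_FILL_BYTE, sizeof bytes);
    memcpy(&word, bytes, sizeof word);   /* same bytes in any byte order */
    return word;
}

/* Hypothetical helper: the same four bytes read back as a signed
 * 32-bit field, which is how the debugger shows int members. */
int32_t dead_fill_signed(void)
{
    uint32_t word = dead_fill_word();
    int32_t value;

    memcpy(&value, &word, sizeof value); /* reinterpret bits, no conversion */
    return value;
}
```

Because a release build does not fill freed memory this way, a comparison against 0xdddddddd can only ever catch the problem in a debug build, which is the objection raised in the comment above.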
Assignee: morse → nisheeth
Looks like this might be related to bugs 324513 and 324098. Both of those bugs are caused by the url struct being freed too early and it looks like that is what is happening here as well. Assigning this one to Nisheeth since he already has the other two.
I've been working on mkgeturl-related bugs for a while. Whoever accepts this bug, please contact me, as I have further information and details on it.
Status: NEW → ASSIGNED
Accepting bug. Bryce, thanks a lot for the detailed analysis you did for this bug. Please continue to discuss this on this bug report. I am going to try to fix this bug this week. Gagan Saksena (netlib), Steve Morse (privacy), Pam Nunn (imagelib) and I sat around and decided that the root cause of this problem was that the url struct was going away too early. We are going to add a ref count to the url struct and use it selectively for those cases that we know are causing crashes. We welcome any comments you might have.
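A minimal sketch of what a ref count on a url struct could look like, for discussion only. DemoURL and the demo_url_* functions are hypothetical stand-ins invented for this note; they are not the real URL_Struct or any netlib API, and the actual fix may look quite different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the url struct; only what the sketch needs. */
typedef struct DemoURL {
    int   ref_count;  /* number of owners still using this struct */
    char *address;
} DemoURL;

/* Create a struct owned by the caller (count starts at 1). */
DemoURL *demo_url_create(const char *address)
{
    DemoURL *url = calloc(1, sizeof *url);
    if (url == NULL)
        return NULL;
    url->address = malloc(strlen(address) + 1);
    if (url->address == NULL) {
        free(url);
        return NULL;
    }
    strcpy(url->address, address);
    url->ref_count = 1;
    return url;
}

/* A second user (say, the cookie code) takes its own reference. */
DemoURL *demo_url_hold(DemoURL *url)
{
    assert(url != NULL && url->ref_count > 0);
    url->ref_count++;
    return url;
}

/* Drop one reference; the memory is freed only when the last user
 * lets go. Returns the number of references that remain. */
int demo_url_release(DemoURL *url)
{
    int remaining;

    assert(url != NULL && url->ref_count > 0);
    remaining = --url->ref_count;
    if (remaining == 0) {
        free(url->address);
        free(url);
    }
    return remaining;
}
```

With this shape, code that runs late (such as the header loop in ProcessCookiesAndTrustLabels) would hold a reference while it works, so another owner releasing the struct first would not leave it reading freed memory.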
Because I've had Mozilla crash in several different modes, all seemingly caused by bad URL_s structures, I believe there are at least two, and perhaps three different places where the URL_s structure is being deallocated early. One of these places is the code called when a user closes a browser window when many windows are open. My hypothesis is that when the user has a lot of connections open and is keeping the browser very busy, and then the user closes a window, the URL_s belonging to that window is deallocated, but the notice of this lost window does not reach all of the data structures using the pointer to that URL_s. Finding where this occurs, and under what conditions, was too time consuming for me. Here is the process I would use if I had the time: Locate every area in the code where the URL_s or any component of it is deallocated, deleted, freed, NULLed, etc. and put a TRACE message there. Put TRACE messages into the places in the code that remove the URL_s pointers from various data structures, as well. Then recreate the crash by rapidly opening a number of separate windows to different web sites; once a dozen or so windows are opened, begin closing windows one by one until the crash occurs. Hopefully, the debug messages will help narrow down which part of the code introduced the bad URL_s. The reference counter approach would be a more comprehensive solution though. Please let me know how this goes.
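The TRACE idea can be sketched roughly as follows, assuming each deallocation site is routed through one logging wrapper. Everything here (traced_free, TRACED_FREE, find_free_record, the fixed-size log) is hypothetical and written for illustration; it is not the MFC TRACE macro or any existing netlib facility.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TRACE_LOG_MAX 64   /* keep only the most recent frees */

typedef struct FreeRecord {
    uintptr_t   addr;      /* address that was freed */
    const char *file;      /* source file of the free site */
    int         line;      /* source line of the free site */
} FreeRecord;

static FreeRecord g_free_log[TRACE_LOG_MAX];
static int g_free_count = 0;

/* Record where the block is being freed, print a trace line, then free. */
void traced_free(void *ptr, const char *file, int line)
{
    FreeRecord *rec = &g_free_log[g_free_count % TRACE_LOG_MAX];

    assert(file != NULL);
    rec->addr = (uintptr_t)ptr;
    rec->file = file;
    rec->line = line;
    g_free_count++;
    fprintf(stderr, "TRACE: free(%p) at %s:%d\n", ptr, file, line);
    free(ptr);
}

/* Use this at each deallocation site so file and line are captured. */
#define TRACED_FREE(p) traced_free((p), __FILE__, __LINE__)

/* Look up the most recent free of a given address, or NULL if it is
 * not in the log. Meant to be called from the debugger after a crash. */
const FreeRecord *find_free_record(uintptr_t addr)
{
    int n = g_free_count < TRACE_LOG_MAX ? g_free_count : TRACE_LOG_MAX;
    int i;

    for (i = 1; i <= n; i++) {
        const FreeRecord *rec = &g_free_log[(g_free_count - i) % TRACE_LOG_MAX];
        if (rec->addr == addr)
            return rec;
    }
    return NULL;
}
```

After a crash on a bad URL_s, looking up the stale pointer's address in the log would point at the file and line that released it, which is the narrowing-down step described above. Addresses get reused by the allocator, so only the most recent match is meaningful.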
I just had the same crash immediately on clicking "Related" folder, however, there's some interesting notes... I include the full stack backtrace :

    ProcessCookiesAndTrustLabels(_ActiveEntry * 0x00b7c320) line 3836 + 21 bytes
    net_ProcessFile(_ActiveEntry * 0x00b7c320) line 1366 + 9 bytes
    NET_ProcessNet(PRFileDesc * 0x00000000, int 0x00000001) line 3365 + 13 bytes
    net_process_slow_net_timer_callback(void * 0x00000000) line 240 + 9 bytes
    wfe_ProcessTimeouts(unsigned long 0x036cddc8) line 303 + 12 bytes
    FireTimeout(HWND__ * 0x0ec10368, unsigned int 0x00000113, unsigned int 0x00000309, unsigned long 0x036cddc8) line 60 + 9 bytes
    USER32! 77e71373()
    USER32! 77e9161f()
    USER32! 77e923dc()
    USER32! 77e9290a()
    USER32! 77e91bd7()
    USER32! 77e92679()
    USER32! 77e914ec()
    __crtMessageBoxA(char * 0x0012b14c, char * 0x1024d83c, unsigned int 0x00012012) line 65
    CrtMessageWindow(int 0x00000002, char * 0x008fa454, char * 0x0012c280, char * 0x00000000, char * 0x0012e2a4) line 520 + 22 bytes
    _CrtDbgReport(int 0x00000002, char * 0x008fa454, int 0x00000052, char * 0x00000000, char * 0x00000000) line 419 + 76 bytes
    AfxAssertFailedLine(char * 0x008fa454, int 0x00000052) line 39 + 20 bytes
    XP_AssertAtLine(char * 0x008fa454, int 0x00000052) line 2692 + 13 bytes
    makeNewAssertion(RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 82 + 72 bytes
    remoteStoreAdd(RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 208 + 30 bytes
    remoteAssert3(RDF_FileStruct * 0x00b34e90, RDF_TranslatorStruct * 0x00ad9600, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, int 0x00000001) line 112 + 30 bytes
    addSlotValue(RDF_FileStruct * 0x00b34e90, RDF_ResourceStruct * 0x00b7df40, RDF_ResourceStruct * 0x009de850, void * 0x00b7d160, unsigned short 0x0003, char * 0x008fac5c) line 604 + 44 bytes
    addElementProps(char * * 0x0012f448, char * 0x036c2399, RDF_FileStruct * 0x00b34e90, RDF_ResourceStruct * 0x00b7df40) line 230 + 49 bytes
    parseNextRDFToken(RDF_FileStruct * 0x00b34e90, char * 0x036c2398) line 379 + 27 bytes
    parseNextRDFXMLBlobInt(RDF_FileStruct * 0x00b34e90, char * 0x03695ee8, long 0x000004b0) line 128 + 22 bytes
    parseNextRDFXMLBlob(_NET_StreamClass * 0x00b7c5d0, char * 0x03695ee8, long 0x000004b0) line 146 + 17 bytes
    net_CacheWrite(_NET_StreamClass * 0x00b7d2f0, char * 0x03695ee8, long 0x000004b0) line 1459 + 24 bytes
    net_pull_http_data(_ActiveEntry * 0x00b4c7b0) line 3096 + 30 bytes
    net_ProcessHTTP(_ActiveEntry * 0x00b4c7b0) line 3488 + 9 bytes
    NET_ProcessNet(PRFileDesc * 0x00b4ca80, int 0x00000002) line 3365 + 13 bytes
    NET_PollSockets() line 203 + 18 bytes
    CNetscapeApp::OnIdle(long 0x0000007b) line 1831 + 5 bytes
    CNetscapeApp::Run() line 1663 + 30 bytes
    AfxWinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00141e37, int 0x0000000a) line 52 + 11 bytes
    WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00000000, char * 0x00141e37, int 0x0000000a) line 34
    WinMainCRTStartup() line 330 + 54 bytes

See the AfxAssert there..? Guess what, it seems to have forked on "Netscape plug-ins ÿ Downloads" or something the like, which I reported on another bug. This assert _also_ happens around the same place where I reported r->url ending up undefined (r freed) in the same bug. I hope the implications are pretty clear, but as usual, I need to get to sleep... ;)
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → LATER
Latering this bug for now because the old Mozilla codebase is dead. The Aurora pane and the cookies code will have to be re-implemented around NGLayout. We'll check that this bug doesn't exist when these features have been re-implemented.
Status: RESOLVED → VERIFIED
verified later
LATER is deprecated per bug 35839.
Status: VERIFIED → REOPENED
Resolution: LATER → ---
.
Status: REOPENED → RESOLVED
Closed: 26 years ago → 22 years ago
Resolution: --- → WONTFIX
VERIFIED, MozillaClassic is dead.
Status: RESOLVED → VERIFIED
for reference, comment 6 is a duplicate of bug 54792