Closed Bug 99515 Opened 23 years ago Closed 23 years ago

Autoconfig use during migration leads to hang

Categories

(Core :: Preferences: Backend, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: lrg, Assigned: mitesh)

Details

(Whiteboard: pdt+)

Attachments

(3 files)

This is potentially related to http://bugzilla.mozilla.org/show_bug.cgi?id=99221 If Autoconfig is being used while migration is taking place, a hang occurs (though the migration process itself seems to be successful). This forces the user to manually kill the netscape process. This bug occurs 100 percent of the time in the case that autoconfig is being used. Once the application is killed, it can be restarted, and the preferences that were migrated will be utilized, as well as those set by autoconfig (although i have observed some problems, as are noted in the previous bug).
QA Contact: sairuh → lrg
The event loop in AutoConfig is hanging after the pref-migration is done. Looks like we have to push the current Event Queue into the current list of eventQs so that the events posted by necko will be executed in the AutoConfig event loop. Patch is coming
Status: NEW → ASSIGNED
Looks like with the pref-migrator, the necko events are not posted to the queue we are getting from the GetThreadEventQueue so replacing it with PushThreadEventQueue. It creates a new queue and makes it a current queue. other code is to pop the event queue back in the error coditions. Also, moved the timer code after the eventloop code so it doesn't have to deal with event queues.
Adding Dan for event queue code review.
This patch's success implies the old queue (the one coming from GetThreadEventQueue) had already been suspended. This can happen fairly easily, so I guess on general principles it's probably always a good idea to push a new queue before falling into a captive event loop like that. r=danm PS you might want to consider adding a little helper class to take care of popping the event queue; you wouldn't have to be so careful to pop the queue at each possible exit point. You can crib the class at http://lxr.mozilla.org/ seamonkey/source/embedding/components/windowwatcher/src/nsWindowWatcher.cpp#282. For its use (just two lines) look for "queueGuard" elsewhere in that same file.
Comment on attachment 49361 [details] [diff] [review] Replacing GetThreadEventQueue with PushThreadEventQueue Putting an r= on the patch. PDT wants it r= there. see bug 97228
Attachment #49361 - Flags: review+
sr=alecf
Attachment #49361 - Flags: superreview+
Whiteboard: requesting PDT approval
Fix checked into the trunk. Waiting for PDT approval for branch checkin Checking in nsAutoConfig.cpp; /cvsroot/mozilla/modules/libpref/src/nsAutoConfig.cpp,v <-- nsAutoConfig.cpp new revision: 3.15; previous revision: 3.14 done
lets get it in there...marking pdt+
Whiteboard: requesting PDT approval → pdt+
Fix checked into the branch. Checking in nsAutoConfig.cpp; /cvsroot/mozilla/modules/libpref/src/nsAutoConfig.cpp,v <-- nsAutoConfig.cpp new revision: 3.12.2.2; previous revision: 3.12.2.1 done
Tinderbox seems fine
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Branch build on Linux seems to be hanging at the main event loop with these changes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The main event loop is hanging because even if it has stopped accepting events, GetThreadEventQueue return the disabled event queue when requested to return the youngest event queue. Now, the events posted to it get transfered to the elder event queue and the loop hangs waiting for the events to this queue. I think, GetThreadEventQueue should return the youngest active event queue. As discussed with Danm, It looks like the event queue is leaking and it should be deleted when it has stopped accepting events. At this point, changing anything in the event loop can cause regression. So I am changing AutoConfig event loop to use ProcessPendingEvents() instead waitforevent and handleevent. ProcessPendingEvents() processes the events in the elder event queues, too. Now, another problem is that there two event queue loops, one for the AutoConfig and one for pref_get_ldap_attributes(). We need to create atleast one event queue for pref_get_ldap_attributes() because recursive calls to ProcessPendingEvents() on the same event queue doesn't work. There are two patches, one is with reference to branch version of nsAutoConfig.cpp and the another one is with reference to trunk version.
Mitesh, you mentioned in your comment above that you needed to push a new event queue in pref_get_ldap_attributes, but I don't see that in the patches. Besides that concern, this is the patch we've been discussing. Cool. I'm happy to actually understand what's happening here. Long live ProcessPendingEvents. r=danm again. But this time it'll all be good.
The second event queue and the event loop is only in the patch with reference to the branch sources. The code is not checked into the trunk. The reason you don't see the PushThreadEventQueue and PopThreadEventQueue in the patch is that it's already in the source base as part of bug 75955 check in. pref_get_ldap_atribute() may be called as part of one of the events processed during the first event loop -> ProcessPendingEvents(). Now, this also needs to be finished before we can proceed further so I need another event loop to make it look like Sync connection. As we can see from the ProcessPendingEvents() code it can not be called recursivley on the same event queue so I needed to create a new event queue. I know it's not the best solution as we had problems with the main event loop hanging on the Linux earlier but with ProcessPendignEvents() it seems working fine so I have kept it this way. If you have any other suggestion, please let me know. Thanks for your help and the code review
That's just what event queue pushing is for, so it's possible to ProcessPending while in the middle of ProcessPending on some previous event. I imagine that pushed queue isn't leaking permanently or temporarily and causing problems like in the case you just fixed.
Yes, that's right the new event queue is not leaking. It's actually getting deleted when pref_get_ldap_attributes() finishes. Looks like LDAP connection is not adding extra references like the other necko connection (mainly, HTTP) or may be ProcessPendingEvents() does some magic and helps to remove any references to the event queue.
Comment on attachment 50643 [details] [diff] [review] latest patch with reference to branch sr=alecf
Attachment #50643 - Flags: superreview+
Attachment #50643 - Flags: review+
Comment on attachment 50644 [details] [diff] [review] latest patch with reference to trunk sr=alecf
Attachment #50644 - Flags: superreview+
Attachment #50644 - Flags: review+
Checked into the branch and the trunk
Marking it fixed
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
I have verified this fix in the branch now, in macos8 macosX linux and windows, marking as verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: