Closed Bug 84420 Opened 23 years ago Closed 23 years ago

BeOS and OS/2 should use Unix Async DNS Implementation

Categories

(Core :: Networking, defect, P3)

Other
Other
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: gordon, Assigned: mkaply)

Details

(Whiteboard: critical for 0.9.2)

Attachments

(3 files)

The DNS cache doesn't evict lookups on BEOS and OS/2. We need to call AddToEvictionQ() after processing the request on new/pending lookups for platforms that don't use the dns thread for async lookups.
Mike, do you test on both BEOS and OS/2, or just OS/2? Could you test a patch once I get it done?
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2
afaik mkaply is just a concerned citizen and doesn't have a BeOS box. I know cls builds on BeOS.
Summary: DNS cache doesn't evict on BEOS and OS/2 → DNS cache doesn't evict on BeOS and OS/2
Someone else is going to need to pick up testing this.
Keywords: qawanted
If mkaply and cls can review the patch, I'll get an sr= and an a= and check it in. Another possibility is that OS/2 and BeOS may be able to simply use the UNIX implementation. That would clean up the code a bit as well. I'll submit an optimistic patch so someone can try that as well.
How would I actually test the patch to see if it was working/failing?
The second patch is using PR_GetIPNodeByName() for OS/2. Wan-Teh, is that available on OS/2?
Yes, PR_GetIPNodeByName() is available on OS/2.
Setting target milestone to 0.9.3, but I'll check it in as soon as I get test feedback and reviews.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
OK, without any patches applied, DNS works fine. First patch wouldn't compile - PRCLIST_IS_EMPTY is not defined. Second patch compiled, but I got this assert coming up: ###!!! ASSERTION: bad DNS shutdown state: 'mState == DNS_SHUTTING_DOWN', file D: /builds/current/mozilla/netwerk/dns/src/nsDnsService.cpp, line 1219 And DNS did not work at all.
I don't understand the problem with the optimal patch. I looks like the dns thread is failing to wait on the condition variable in DequeueLookup() or getting wrongly notified when there are no lookups in the queue. It's not clear to me how this could be happening on OS/2, but not Unix. Wan-Teh, are there any known differences with PR_WaitCondVar() on OS/2?
You MUST ALWAYS call PR_WaitCondVar in a while loop. The following code is wrong. nsDNSLookup * nsDNSService::DequeuePendingQ() { #if defined(XP_UNIX) // Wait for notification of a queued request, unless we're shutting down. if (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) { PRStatus status = PR_WaitCondVar(mDNSCondVar, PR_INTERVAL_NO_TIMEOUT); NS_ASSERTION(status == PR_SUCCESS, "PR_WaitCondVar failed."); } #endif You should say while (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
Ah, great. I'll fix it on the trunk as soon as I can push through the review process. I wonder why we haven't seen this as a problem on Unix?
in the original "async dns" XP_UNIX code that i wrote for bug 10733, we had something similar to: DNS thread: mon.enter(); if (!pending_lookup && !shutting_down) mon.wait(); if (pending_lookup) do_lookup(); else do_shutdown(); mon.exit(); client thread: mon.enter(); pending_lookup = true; mon.notify(); mon.exit(); in other words, it seemed ok for there not to be a pending_lookup when the monitor was notified... it meant to shutdown (note: the shutting_down flag was just used to handle the shutdown case in which the DNS thread had not yet run). the new code seems to be basically identical. on OS/2 it seems to be the case that a pending_lookup is added before the monitor (convar in the new world) is notified, but after the DNS thread resumes pending_lookup is for some reason not set. wtc: why is the while loop needed? i always thought that a while loop was only need in the case of multiple threads waiting on a condition upon which only one thread may act. is this not the case? thanks!
The OS/2 problem may be not caused by not using a while loop here. I am merely pointing out that as a rule, you must call PR_WaitCondVar or PR_Wait in a while loop. That's required by the API. Two reasons the while loop is required. 1. PR_WaitCondVar or PR_Wait may return due to a spurious wakeup. Note that an implementation may not actually generate spurious wakeups, but they are allowed by the API specification. 2. If multiple threads may call PR_WaitCondVar, another thread may be able to enter the monitor sooner than the awakened thread can (a thread awakened from a PR_WaitCondVar call needs to re-enter the monitor) and make the boolean expression false again. This condition is called a stolen wakeup. Again, I am not saying that the OS/2 problem is caused by a spurious wakeup or a stolen wakeup. In fact, you may be able to analyze your code and the PR_WaitCondVar implementation on OS/2 and determine that these don't happen. I am just saying that it is a syntactical requirement (one that can't be enforced by a compiler) that all PR_WaitCondVar calls must be inside a while loop.
fair enough... thanks for the thorough explanation :-) in this case, it is not a stolen wakeup, because we only have one thread that may call PR_WaitCondVar. perhaps it is the spurious wakeup problem, since we are only seeing this on OS/2 and not any of the unix platforms. sounds to me like we need to do a scan of all usages of PR_Wait or PR_WaitCondVar to ensure that they are not subject to similar problems. I suspect there may be problems with the thread pool implementation.
The simple patch has been checked in. So I'm morphing this bug to state the desire to have BeOS and OS/2 use the Unix implementation of async dns lookups (unless those platforms have a native async gethostbyname API that someone wants to use).
Summary: DNS cache doesn't evict on BeOS and OS/2 → BeOS and OS/2 should use Unix Async DNS Implementation
Target Milestone: mozilla0.9.3 → mozilla1.0
OK, with the current level of nsDnsService.cpp, Os/2 can follow the UNIX path no problem (as per the optimistic patch) Do we want to assume BEOS can and check in the changes? Or just check in the pieces with XP_OS2?
Gordon, OS/2 can use the Unix implementation. Do you want to assume that BeOS can? Or should I create a patch that just fixes this for OS/2?
You know, I'm tempted to change it for both; if it works - great! If it breaks BeOS and somebody cares, then we should hear from them, which will let us know who we can work with to resolve the issues. I suppose we should make an announcement on netscape.public.mozilla.netlib first.
you could also post to netscape.public.mozilla.beos, ... I just updated my tree so i might try a beos build this week.
Sorry to be late, I tested the 'optimistic' patch on BeOS build and it is working for me so far. I'm posting this from patched Bezilla:-)
Why is this targeted for 1.0? I'd like to see this in 0.9.2. Don't we leak without this?
mkaply: gordon is out of town for the next couple of weeks... so, this probably won't go in for 0.9.2 unless some kind soul takes it... how 'bout you? ;-)
I'm taking this bug to push for checkin to 0.9.2. Gordon wrote the code, and I looked at it and said it was OK, so r=mkaply Makoto Hamanaka verified it on BeOS, so I am going to take r=VYA04230@nifty.com for the BEOS part. darin: Can I get an sr=darin so I can send this to drivers?
Assignee: gordon → mkaply
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → mozilla0.9.2
sr=darin
Fix checked in to trunk and branch (branch approval from Asa by mail)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: critical for 0.9.2
Keywords: verifyme
Verify fixed
Status: RESOLVED → VERIFIED
Keywords: verifyme
Keywords: qawanted
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: