Closed
Bug 84420
Opened 23 years ago
Closed 23 years ago
BeOS and OS/2 should use Unix Async DNS Implementation
Categories
(Core :: Networking, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla0.9.2
People
(Reporter: gordon, Assigned: mkaply)
Details
(Whiteboard: critical for 0.9.2)
Attachments
(3 files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
The DNS cache doesn't evict lookups on BEOS and OS/2. We need to call
AddToEvictionQ() after processing the request on new/pending lookups for
platforms that don't use the dns thread for async lookups.
Mike, do you test on both BEOS and OS/2, or just OS/2? Could you test a patch
once I get it done?
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2
afaik mkaply is just a concerned citizen and doesn't have a BeOS box. I
know cls builds on BeOS.
Summary: DNS cache doesn't evict on BEOS and OS/2 → DNS cache doesn't evict on BeOS and OS/2
Someone else is going to need to pick up testing this.
Keywords: qawanted
If mkaply and cls can review the patch, I'll get an sr= and an a= and check it
in.
Another possibility is that OS/2 and BeOS may be able to simply use the UNIX
implementation. That would clean up the code a bit as well. I'll submit an
optimistic patch so someone can try that as well.
Assignee | ||
Comment 6•23 years ago
|
||
How would I actually test the patch to see if it was working/failing?
The second patch is using PR_GetIPNodeByName() for OS/2. Wan-Teh, is that
available on OS/2?
Comment 9•23 years ago
|
||
Yes, PR_GetIPNodeByName() is available on OS/2.
Reporter | ||
Comment 10•23 years ago
|
||
Setting target milestone to 0.9.3, but I'll check it in as soon as I get test
feedback and reviews.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Assignee | ||
Comment 11•23 years ago
|
||
OK, without any patches applied, DNS works fine.
First patch wouldn't compile - PRCLIST_IS_EMPTY is not defined.
Second patch compiled, but I got this assert coming up:
###!!! ASSERTION: bad DNS shutdown state: 'mState ==
DNS_SHUTTING_DOWN', file D:
/builds/current/mozilla/netwerk/dns/src/nsDnsService.cpp, line 1219
And DNS did not work at all.
Reporter | ||
Comment 12•23 years ago
|
||
Reporter | ||
Comment 13•23 years ago
|
||
I don't understand the problem with the optimal patch. I looks like the dns
thread is failing to wait on the condition variable in DequeueLookup() or getting
wrongly notified when there are no lookups in the queue. It's not clear to me
how this could be happening on OS/2, but not Unix.
Wan-Teh, are there any known differences with PR_WaitCondVar() on OS/2?
Comment 14•23 years ago
|
||
You MUST ALWAYS call PR_WaitCondVar in a while loop.
The following code is wrong.
nsDNSLookup *
nsDNSService::DequeuePendingQ()
{
#if defined(XP_UNIX)
// Wait for notification of a queued request, unless we're shutting down.
if (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
PRStatus status = PR_WaitCondVar(mDNSCondVar, PR_INTERVAL_NO_TIMEOUT);
NS_ASSERTION(status == PR_SUCCESS, "PR_WaitCondVar failed.");
}
#endif
You should say
while (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
Reporter | ||
Comment 15•23 years ago
|
||
Ah, great. I'll fix it on the trunk as soon as I can push through the review
process. I wonder why we haven't seen this as a problem on Unix?
Comment 16•23 years ago
|
||
in the original "async dns" XP_UNIX code that i wrote for bug 10733, we had
something similar to:
DNS thread:
mon.enter();
if (!pending_lookup && !shutting_down)
mon.wait();
if (pending_lookup)
do_lookup();
else
do_shutdown();
mon.exit();
client thread:
mon.enter();
pending_lookup = true;
mon.notify();
mon.exit();
in other words, it seemed ok for there not to be a pending_lookup when the
monitor was notified... it meant to shutdown (note: the shutting_down flag was
just used to handle the shutdown case in which the DNS thread had not yet run).
the new code seems to be basically identical.
on OS/2 it seems to be the case that a pending_lookup is added before the
monitor (convar in the new world) is notified, but after the DNS thread resumes
pending_lookup is for some reason not set.
wtc: why is the while loop needed? i always thought that a while loop was only
need in the case of multiple threads waiting on a condition upon which only one
thread may act. is this not the case? thanks!
Comment 17•23 years ago
|
||
The OS/2 problem may be not caused by not using a while loop
here. I am merely pointing out that as a rule, you must call
PR_WaitCondVar or PR_Wait in a while loop. That's required by
the API.
Two reasons the while loop is required.
1. PR_WaitCondVar or PR_Wait may return due to a spurious wakeup.
Note that an implementation may not actually generate spurious
wakeups, but they are allowed by the API specification.
2. If multiple threads may call PR_WaitCondVar, another thread
may be able to enter the monitor sooner than the awakened thread
can (a thread awakened from a PR_WaitCondVar call needs to re-enter
the monitor) and make the boolean expression false again. This
condition is called a stolen wakeup.
Again, I am not saying that the OS/2 problem is caused by a
spurious wakeup or a stolen wakeup. In fact, you may be able
to analyze your code and the PR_WaitCondVar implementation on
OS/2 and determine that these don't happen. I am just saying
that it is a syntactical requirement (one that can't be enforced
by a compiler) that all PR_WaitCondVar calls must be inside a
while loop.
Comment 18•23 years ago
|
||
fair enough... thanks for the thorough explanation :-)
in this case, it is not a stolen wakeup, because we only have one thread
that may call PR_WaitCondVar. perhaps it is the spurious wakeup problem,
since we are only seeing this on OS/2 and not any of the unix platforms.
sounds to me like we need to do a scan of all usages of PR_Wait or
PR_WaitCondVar to ensure that they are not subject to similar problems.
I suspect there may be problems with the thread pool implementation.
Reporter | ||
Comment 19•23 years ago
|
||
The simple patch has been checked in. So I'm morphing this bug to state the
desire to have BeOS and OS/2 use the Unix implementation of async dns lookups
(unless those platforms have a native async gethostbyname API that someone wants
to use).
Summary: DNS cache doesn't evict on BeOS and OS/2 → BeOS and OS/2 should use Unix Async DNS Implementation
Target Milestone: mozilla0.9.3 → mozilla1.0
Assignee | ||
Comment 20•23 years ago
|
||
OK, with the current level of nsDnsService.cpp, Os/2 can follow the
UNIX path no problem (as per the optimistic patch)
Do we want to assume BEOS can and check in the changes? Or just check
in the pieces with XP_OS2?
Assignee | ||
Comment 21•23 years ago
|
||
Gordon,
OS/2 can use the Unix implementation. Do you want to assume that BeOS can? Or
should I create a patch that just fixes this for OS/2?
Reporter | ||
Comment 22•23 years ago
|
||
You know, I'm tempted to change it for both; if it works - great! If it breaks
BeOS and somebody cares, then we should hear from them, which will let us know
who we can work with to resolve the issues. I suppose we should make an
announcement on netscape.public.mozilla.netlib first.
Comment 23•23 years ago
|
||
you could also post to netscape.public.mozilla.beos, ... I just updated my tree
so i might try a beos build this week.
Comment 24•23 years ago
|
||
Sorry to be late, I tested the 'optimistic' patch on BeOS build and it is
working for me so far.
I'm posting this from patched Bezilla:-)
Assignee | ||
Comment 25•23 years ago
|
||
Why is this targeted for 1.0?
I'd like to see this in 0.9.2. Don't we leak without this?
Comment 26•23 years ago
|
||
mkaply: gordon is out of town for the next couple of weeks... so, this probably
won't go in for 0.9.2 unless some kind soul takes it... how 'bout you? ;-)
Assignee | ||
Comment 27•23 years ago
|
||
I'm taking this bug to push for checkin to 0.9.2.
Gordon wrote the code, and I looked at it and said it was OK, so
r=mkaply
Makoto Hamanaka verified it on BeOS, so I am going to take r=VYA04230@nifty.com
for the BEOS part.
darin: Can I get an sr=darin so I can send this to drivers?
Assignee: gordon → mkaply
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → mozilla0.9.2
Comment 28•23 years ago
|
||
sr=darin
Assignee | ||
Comment 29•23 years ago
|
||
Fix checked in to trunk and branch (branch approval from Asa by mail)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•23 years ago
|
Whiteboard: critical for 0.9.2
You need to log in
before you can comment on or make changes to this bug.
Description
•