Bug 102227: N620 Trunk Segfault in OnFound in nsLDAPConnection [@ nsLDAPConnection::OnFound]
Status: VERIFIED FIXED (opened 23 years ago, closed 23 years ago)
Categories: Directory :: LDAP XPCOM SDK, defect
Tracking: Not tracked
People: Reporter: leif, Assigned: leif
Keywords: crash, topcrash
Whiteboard: [PDT+]
Attachments: 3 files, 1 obsolete file (text/plain; text/plain; patch with review+ from dmosedale and superreview+ from Bienvenu)

We have a few Talkback reports indicating that we are crashing on line 852 in
nsLDAPConnection.cpp. The stack is
nsLDAPConnection::OnFound [d:\builds\seamonkey\mozilla\directory\xpcom\base\src\nsLDAPConnection.cpp, line 852]
XPTC_InvokeByIndex [d:\builds\seamonkey\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, line 139]
EventHandler [d:\builds\seamonkey\mozilla\xpcom\proxy\src\nsProxyEvent.cpp, line 515]
PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 591]
The relevant code is:
    NS_IMETHODIMP
    nsLDAPConnection::OnFound(nsISupports *aContext,
                              const char* aHostName,
                              nsHostEnt *aHostEnt)
    {
        PRUint32 index = 0;
        PRNetAddr netAddress;
        char addrbuf[64];

        // Do we have a proper host entry? If not, set the internal DNS
        // status to indicate that host lookup failed.
        //
        if (!aHostEnt->hostEnt.h_addr_list || !aHostEnt->hostEnt.h_addr_list[0]) {
            mDNSStatus = NS_ERROR_UNKNOWN_HOST;
            return NS_ERROR_UNKNOWN_HOST;
        }

        // Make sure our address structure is initialized properly
        //
        memset(&netAddress, 0, sizeof(netAddress));
        PR_SetNetAddr(PR_IpAddrAny, PR_AF_INET6, 0, &netAddress);
I can't think of any reason why we'd sometimes crash on this call to |memset()|,
and I've not been able to reproduce it either. I'm stumped as to how to debug
this problem: I don't understand how |netAddress| could fail to be correctly
allocated on the stack.
-- Leif
Updated • 23 years ago (Assignee)
Status: NEW → ASSIGNED
Comment 1 • 23 years ago (Assignee)
From a talkback report:
x86 Registers:
EAX: 00060003 EBX: 60e32b60 ECX: 02a9afcc EDX: 606864b4
ESI: 02b0a954 EDI: 00000000 ESP: 0012fc28 EBP: 0012fc90
EIP: 6068332e cf PF af zf sf of IF df nt RF vm IOPL: 0
CS: 001b DS: 0023 SS: 0023 ES: 0023 FS: 0038 GS: 0000
cmp [eax],edi
60683330 0f84d9000000 je 6068340f
60683336 6a20 push 0x20
60683338 8d45e0 lea eax,[ebp-0x20]
6068333b 57 push edi
6068333c 50 push eax
6068333d e89a200000 call 606853dc
60683342 8d45e0 lea eax,[ebp-0x20]
60683345 50 push eax
60683346 57 push edi
60683347 6a17 push 0x17
60683349 6a01 push 0x1
6068334b ff15dc29dccc call dword ptr [ccdc29dc]
Comment 2 • 23 years ago
*** Bug 102567 has been marked as a duplicate of this bug. ***
Comment 3 • 23 years ago
I just ran into this on my linux box running a branch build. Talkback ID is
36186399.
x86 Registers:
EAX: 09fec8cc EBX: 41337130 ECX: 0000266e EDX: 41336998
ESI: 00000003 EDI: 09fece90 ESP: bffff1bc EBP: bffff298
EIP: 4132fd02 cf pf af zf sf of IF df nt RF vm IOPL: 0
CS: 0023 DS: 002b SS: 002b ES: 002b FS: 0000 GS: 0007
Code Around the PC:
4132fd02 833900 cmp dword ptr [ecx],0x0
4132fd05 7519 jnz 4132fd20
4132fd07 8b4508 mov eax,[ebp+0x8]
4132fd0a c7404c1e004b80 mov dword ptr [eax+0x4c],0x804b001e
4132fd11 b81e004b80 mov eax,0x804b001e
4132fd16 e945010000 jmp 4132fe60
4132fd1b 90 nop
4132fd1c 8d742600 lea esi,[esi]
4132fd20 6a6c push 0x6c
Comment 4 • 23 years ago (Assignee)
Comment 5 • 23 years ago (Assignee)
After looking at this some more, both Mose and I are not convinced that the
Talkback report is pointing at the correct line. In fact, we suspect the crasher
might be at around line 845:
if (!aHostEnt->hostEnt.h_addr_list || !aHostEnt->hostEnt.h_addr_list[0]) {
We've been able to reproduce a crasher on this exact line, where
|aHostEnt->hostEnt.h_addr_list| is non-null but points into never-never land
(or Uranus as mose would say), and we crash on the second half of the |if()|
statement. This causes a segfault.
It's still unclear how this structure is getting corrupted, or why. Does anyone
have suggestions as to a) whether I'm not testing the |aHostEnt| structure
properly for "correctness", b) what could cause the DNS service (or possibly the
proxy code) to corrupt the host data, or c) whether this is a corruption on the
stack itself, making our |aHostEnt| point into the void somehow?
Thanks!
-- Leif
Comment 6 • 23 years ago
You might try adding assertions to nsDNSRequest::FireStop() to ascertain whether
or not the hostent is corrupt at that point.
I presume that aHostEnt is !nil, but I don't see a test for that.
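
The missing null test mentioned in this comment could look roughly like the sketch below. It is only an illustration: `HostEntSketch`, `NsHostEntSketch`, `HasUsableAddress`, and `DemoHasUsableAddress` are hypothetical names that mirror just the fields quoted in the description, not the real nsHostEnt definition. A check like this only catches null pointers; it cannot detect a non-null pointer into freed or corrupted memory, which is the case reproduced in Comment 5.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real types; only the fields used in the
// code quoted in this bug are modelled.
struct HostEntSketch {
    char **h_addr_list;  // null-terminated array of address pointers
};

struct NsHostEntSketch {
    HostEntSketch hostEnt;
};

// Returns true only when the entry itself, its address list, and the first
// address are all non-null. A dangling (non-null but invalid) pointer still
// passes this test, so it is not a defence against use-after-free.
bool HasUsableAddress(const NsHostEntSketch *aHostEnt)
{
    return aHostEnt != nullptr &&
           aHostEnt->hostEnt.h_addr_list != nullptr &&
           aHostEnt->hostEnt.h_addr_list[0] != nullptr;
}

// Small usage example: a one-address list passes, an empty list and a null
// entry do not.
void DemoHasUsableAddress()
{
    char addr[4] = {10, 0, 0, 1};
    char *one[] = {addr, nullptr};
    char *none[] = {nullptr};
    NsHostEntSketch good{{one}};
    NsHostEntSketch bad{{none}};
    assert(HasUsableAddress(&good));
    assert(!HasUsableAddress(&bad));
    assert(!HasUsableAddress(nullptr));
}
```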
Comment 7 • 23 years ago
Comment 8 • 23 years ago
OK, so I noticed that in my builds, the crash happens more often when
there is an error dialog, after I select the error item. Additionally, just for
grins, I tried recompiling nsLDAPConnection.cpp using PROXY_SYNC rather than
PROXY_ASYNC. Interestingly, once when I saw the core dump with this PROXY_SYNC
code, I saw an assertion from nsDNSRequest::Cancel:
NS_ASSERTION(!PR_CLIST_IS_EMPTY(this), "request is not queue on lookup");
This is making me wonder if ::Cancel is sometimes getting called after the
lookup has already finished. Is this allowable semantics?
Comment 9 • 23 years ago
gordon: correct, aHostEnt is not nil. I tried adding the assertions you
suggested, and the hostent is NOT corrupt just before the call to OnFound.
So this may be proxy or xptcall or other event queue lossage of some sort.
Comment 10 • 23 years ago
OK, so I see what's going on here. The DNS service is calling OnFound back with
a pointer to some private data. Then, it assumes that once OnFound returns,
there's no need for the private data any more, and sets the nsCOMPtr holding it
to nsnull.
However, in the case of an asynchronous proxy, the data may not have actually
been used yet.
So I think we can work around this in the short term by using a synchronous
proxy (maybe I was mistaken when I thought it still dumped core before with the
sync proxy, because it's not now).
Long term, I'd propose that nsIDNSListener should hand back refcounted data
directly, rather than just a pointer into a privately refcounted object.
I'm still seeing the assertion I mentioned before with PROXY_SYNC, anyone know
what's up with this?
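
The lifetime problem described in this comment can be illustrated with a minimal sketch. It is not the patch attached to this bug and does not use the real nsIDNSListener or proxy APIs: `LookupResult`, `EventQueue`, `Listener`, `PostOnFound`, and `Drain` are hypothetical names, and `std::shared_ptr` stands in for XPCOM reference counting. It shows the long-term idea proposed above, assuming the queued event can share ownership of the result, so the producer may drop its own reference as soon as the call is posted.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the DNS service's private lookup result.
struct LookupResult {
    std::string address;
};

// Hypothetical stand-in for an event queue that is drained later,
// as an asynchronous proxy's target thread would do.
using EventQueue = std::queue<std::function<void()>>;

// The listener records what it saw so the effect is observable.
struct Listener {
    std::string seen;
    void OnFound(const LookupResult &result) { seen = result.address; }
};

// Posts the callback without running it. The lambda copies the shared_ptr,
// so the result stays alive until the event has actually been handled.
// Passing a raw pointer here instead would reproduce the dangling-data bug
// once the producer released its reference.
void PostOnFound(EventQueue &queue, Listener &listener,
                 std::shared_ptr<const LookupResult> result)
{
    queue.push([&listener, result] { listener.OnFound(*result); });
}

// Runs every queued event in order.
void Drain(EventQueue &queue)
{
    while (!queue.empty()) {
        std::function<void()> event = std::move(queue.front());
        queue.pop();
        event();
    }
}
```

With this shape, the producer can call `PostOnFound` and immediately reset its own pointer; the listener still sees valid data when `Drain` runs. The short-term workaround mentioned above (a synchronous proxy) avoids the problem differently, by making sure the callback has finished before the producer releases the data.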
Comment 11 • 23 years ago
The assertion is happening when the nsLDAPConnection destructor calls
mDNSRequest->Cancel. It's not clear to me why this is happening, however: I
added some logging, and nsLDAPConnection::OnStopLookup is getting called, and
that function zeroes out mDNSRequest.
Comment 12 • 23 years ago (Assignee)
Comment 13 • 23 years ago (Assignee)
Comment on attachment 52290: Possible fix, v1
This patch is missing one part, posting a new one soon.
Attachment #52290: Attachment is obsolete: true
Comment 14 • 23 years ago (Assignee)
Comment 15 • 23 years ago (Assignee)
Requesting SR= and R= on the v2 patch. It's tested on all three platforms.
-- Leif
Comment 16 • 23 years ago
Attachment #52295: Flags: review+
Comment 17 • 23 years ago
Comment on attachment 52295: Potential fix, v2
sr=bienvenu
Attachment #52295: Flags: superreview+
Comment 18 • 23 years ago (Assignee)
Checked in on trunk. Richi P.: can you maybe try a "trunk" build on Monday or
so, and see if this fixes your problem?
Thanks,
-- Leif
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 19 • 23 years ago
I'm using build 2001100503 on win32 right now. Unfortunately, a lot has happened
since I sent that bug report. One of the major changes is that I deleted my User
profile and started from scratch (some changes a few weeks back caused Mozilla
installers to **** on me).
With this build, Mozilla doesn't seem to crash anymore when doing an LDAP
lookup. I'll bang on it some more and see what happens. I'll also download a
build on Monday and see if that makes any difference as well.
Comment 20 • 23 years ago
Sorry ... spoke too soon. It's still happening on 2001100503 win32 (I just
noticed that the Platform heading for this bug report says Linux only).
The behavior is erratic. As near as I can tell, one of three things happens:
1) I start Mozilla, compose a message, type in a few chars, and it SIGSEGVs (the
win32 equivalent, at least).
2) I start Mozilla, do some stuff, compose a message, type in a few chars, and
some entries in the personal dictionary show up, and at the bottom there is an
error entry reporting problems with the LDAP server. I try a different sequence
of letters and next thing I know, LDAP is working.
3) LDAP works fine.
Once LDAP lookup starts to work, though, I can't seem to make it break again
without restarting Mozilla.
Will check again on Monday.
Comment 21 • 23 years ago (Assignee)
What was the timestamp on the file you downloaded? The fix wasn't checked in
until around 7pm, so I suspect you won't see the fix in any builds until
earliest Saturday morning.
-- Leif
Comment 22 • 23 years ago
Finally!
On win32 mozilla 2001100610 (timestamp 06-Oct-2001 14:06), doing LDAP lookups
isn't crashing like before. Of course, there's very little traffic on the LAN so
the environment is unlike that when I experienced it before, but it looks good
so far.
Comment 23 • 23 years ago (Assignee)
Requesting PDT for checkin on 0.9.4 branch.
-- Leif
Whiteboard: PDT
Comment 24 • 23 years ago
Verified with 20011008 trunk build on Windows 2000.
LDAP auto complete works fine against the following servers:
Hostname: 208.12.37.50
Base DN: dc=mcom,dc=com
Hostname: 208.12.36.22
Base DN: o=Airius.com
Hostname: 208.12.37.103
Base DN: o=mcom.com
QA Contact: olgac → yulian
Updated • 23 years ago
Whiteboard: PDT → [PDT+]
Comment 25 • 23 years ago
pls check this into the branch - PDT+
Comment 26 • 23 years ago (Assignee)
Checked in on 0.9.4 branch
-- Leif
Comment 27 • 23 years ago
*** Bug 103868 has been marked as a duplicate of this bug. ***
Comment 28 • 23 years ago
Re-open to get into the 0.9.5 branch.
Comment 29 • 23 years ago (Assignee)
Checked in on 0.9.5 branch
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 30 • 23 years ago
We still show four incidents on the Trunk as recently as 10-04. Can we check it
in?
Adding info for talkback tracking. This was a topcrasher on the branch.
Changing platform to reflect that this was/is happening on Windows and Linux.
Keywords: topcrash
OS: Linux → All
Hardware: All → PC
Summary: Segfault in OnFound in nsLDAPConnection → N620 Trunk Segfault in OnFound in nsLDAPConnection [@ nsLDAPConnection::OnFound]
Comment 31 • 23 years ago
Tom, do you see this on the topcrash report for the 094 branch and 095 branch
after 10-9? Thanks.
Comment 32 • 23 years ago
greer: re-read the comments in the bug, and you'll see that the fix wasn't
checked in until late on 10/5, so it's not surprising that there are crashes on
10/4.
Comment 33 • 23 years ago
Talkback data shows no incidents with this signature after 10/9.
Marking VERIFIED fixed.
Status: RESOLVED → VERIFIED
Updated • 13 years ago
Crash Signature: [@ nsLDAPConnection::OnFound]