Closed
Bug 501446
Opened 15 years ago
Closed 10 years ago
Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]
Categories
(Firefox :: Extension Compatibility, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: justdave, Assigned: ma1)
References
()
Details
(Keywords: crash)
Crash Data
I've had this same crash 7 times so far in the last 6 or 8 hours, seems to happen at random when the machine is otherwise idle. I have had three or four pages up with an auto-refresh on them.
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090620 Firefox/3.5pre Glubble/2.0.4.6 ID:20090630031219
ChatZilla 0.9.85
DownloadHelper 4.5
DownThemAll! 1.1.4
Firebug 1.4.0b3
Firefox PDF Plugin for Mac OS X 1.1
Glubble 2.0.4.6
Greasemonkey 0.8.20090123.1
Japanese-English Dictionary for rikaichan 1.10
Live HTTP headers 0.15
Mass Password Reset 1.04
Modify Headers 0.6.6
Nagios Checker 0.14.4
Names Dictionary for rikaichan 1.10
Nightly Tester Tools 2.0.2
NoScript 1.9.5
Personas for Firefox 1.2.1
Rikaichan 1.06
Update Channel Selector 1.5
User Agent Switcher 0.7.1
Web Developer 1.1.7
Reporter | ||
Comment 1•15 years ago
|
||
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1pre) Gecko/20090630 Shiretoko/3.5.1pre Glubble/2.0.4.6
^^ My correct UserAgent... I had User-Agent Switcher active because of a Facebook sniffing bug, and NTT believed it when I told it to paste. :)
Comment 2•15 years ago
|
||
This is bug 337418 reincarnated.
Reporter | ||
Comment 3•15 years ago
|
||
OK, given the information in bug 337418 and the additional debugging info requested there, I'm suspecting it's a site with automatic-update/refresh usage since it happens at random without me actually touching the browser, and I have several tabs open with auto-refresh stuff (keeping an eye on release activity, etc).
Tab 1: http://www.facebook.com/home.php - does ajax requests to poll for new status updates and notifications
www.facebook.com has address 69.63.184.30
Tab 2: https://nagios.mozilla.org/sentry/ - has a Refresh: 150 on it.
nagios.mozilla.org is an alias for dm-nagios01.mozilla.org.
dm-nagios01.mozilla.org has address 63.245.208.170
Tab 3: https://nagios.mozilla.org/ftplag/ - has a Refresh: 60 on it.
nagios.mozilla.org is an alias for dm-nagios01.mozilla.org.
dm-nagios01.mozilla.org has address 63.245.208.170
Tab 4: http://people.mozilla.org/~nthomas/misc/uptake.html - has <meta http-equiv="refresh" content="180">
people.mozilla.org is an alias for people.mozilla.com.
people.mozilla.com has address 63.245.208.169
Tab 5: http://downloadstats.mozilla.com/ - I assume it's using ajax of some sort to poll for the stats data.
downloadstats.mozilla.com is an alias for star-moz-com-php5.nslb.sj.mozilla.com.
star-moz-com-php5.nslb.sj.mozilla.com has address 63.245.209.77
Of course, the ones doing ajaxy stuff could be using other host names for where to pull data from.
Comment 4•15 years ago
|
||
What product/component owns class nsDNSRecord ?
IMO, the problem probably lies there, not in NSS.
Perhaps we should add members of the cc list for bug 337418 to this bug's list.
PR_EnumerateAddrInfo nsprpub/pr/src/misc/prnetdb.c:2095
nsDNSRecord::GetNextAddr netwerk/dns/src/nsDNSService2.cpp:134
nsDNSRecord::GetNextAddrAsString netwerk/dns/src/nsDNSService2.cpp:166
NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:2454
XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1590
js_Invoke js/src/jsinterp.cpp:1386
js_Interpret js/src/jsinterp.cpp:5179
justdave: are you using PAC?
Assignee: wtc → nobody
Severity: major → critical
Component: NSPR → Networking
Product: NSPR → Core
QA Contact: nspr → networking
Summary: Crash [@ PR_EnumerateAddrInfo ] at random times → Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times
Version: other → Trunk
Reporter | ||
Comment 6•15 years ago
|
||
(In reply to comment #5)
> justdave: are you using PAC?
What's PAC?
Reporter | ||
Comment 8•15 years ago
|
||
Nope, not using that.
Comment 9•15 years ago
|
||
Looks like NoScript is to blame. While running NoScript (no changes to base install), I loaded about 12 tabs, most to support.mozilla.com, along with three other tabs: Facebook, Nagios (open source monitoring tool, nagios.org) and OTRS (open source trouble ticket system, otrs.org).
The common thread between OTRS and Nagios is meta refreshes every so often (90 seconds for Nagios, 2 minutes for OTRS.
Additionally, Facebook does some triggered JS, which checks back to FB servers to see if there are any new status updates. I think it also interacts with their chat servers.
I did not have this crash before 6/30/09, and I was either up to date or a day or two behind Shiretoko Nightlies. I'm fairly certain that whatever updates were made to Shiretoko and thus FF 3.5 between 6/27/09 and release is causing this issue.
CrashID for the crash with only NoScript running:
41724f2c-19f4-47da-ab54-785202090703 7/3/09 5:01 PM
Comment 10•15 years ago
|
||
NoScript 1.9.5 was released on or around June 29, 2009. Since NoScript was the only AddOn installed, and the same tabs hadn't crashed Firefox with NO AddOns installed, I'm wondering if the combination is suspect.
I likely updated NoScript on June 29 or 30. Maybe I'll install a previous version and see if that still causes the problems. I also wonder if it is NoScript to blame, since between the 26th and 30th a bunch of non-3.5 browsers had the same issue.
Other related crashes of mine that show up in PR_EnumerateAddrInfo crash search:
3073a03a-093a-4a83-a7ba-2ba8e2090703
e18a0d58-1c80-469e-8bda-3f0082090630 Shiretoko
eea60f39-2c9a-4953-86eb-c610a2090630 Shiretoko
fa650c74-553b-425a-a1a0-a66742090702
42fc54ef-72f1-4bea-8276-e72032090701
Assignee | ||
Comment 11•15 years ago
|
||
Latest NoScript does perform async DNS resolution from the UI thread as part of ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe ) and implements a 2nd level cache on top of the native nsIDNSService's one in order to minimize XPCOM and inter-thread dispatching and gain more control over unpinning attacks.
During development I found that if I cached nsIDNSRecord objects directly I caused random crashed like the one described here (I suppose for some pointer ownership issue), so I started storing the address strings returned by the query into an ad-hoc JavaScript DNSRecord object for caching purposes, and had no crash at all since then.
However this report seems to suggest there's still something broken in what I do. Should I deep-clone the strings (e.g. new String(record.getNextAddrAsString()) ) rather than pushing them directly into a JS array?
If you've got NoScript 1.9.5 (or even better 1.9.5.4 from http://noscript.net/getit#devel) installed, the relevant code is all in chrome://noscript/content/DNS.js
Assignee | ||
Comment 12•15 years ago
|
||
Sorry, I actually read the stack now and it doesn't seem an ownership issue but rather a concurrence one. Nothing I can fix on my side, apparently.
Also, I wonder why none of my testers (nor I) had a single crash of this kind in one month of beta testing (the ones I got when I tried to cache nsIDNSRecord objects directly were in private preliminary builds) and I still can't see any report on the NoScript forum.
Assignee | ||
Comment 13•15 years ago
|
||
Stupid question related to comment 12 (unexpectedly few reports): might it be Mac OS X specific? I can see Darwin-specific code in prnetdb.c...
Reporter | ||
Comment 14•15 years ago
|
||
If you look at the crash-stats report linked from the URL field, you can see that almost all of the crashes with this signature are indeed on Mac. There's a couple Windows NT 6.0 in there, but not very many.
Comment 15•15 years ago
|
||
It does seem that the prnetdb.c does have OSX and Darwin specific ifdefs, the only one starts at line 196, with a possible elif at line 333, but the else is at line 403, and the endif at 416. There's one more DARWIN ifdef on line 2186 which defines function pr_StringToNetAddrFB(), which is called in PR_StringToNetAddr, which I don't see called, but then again, this code is pretty foreign to me, so it is possible that the Darwin specific code related to bug 404399 is to blame.
Darwin 8.8.4 was OSX 10.4.4, and while OSX10.5 was released in October 2007 (after the bug was opened), it is possible that it is causing problems in 10.5. I'm guessing that since no popular code used this code recently, it had not reared its head before, as it does now. And given that the related crashes are almost all OSX, it's possible this code needs a new review.
Comment 17•15 years ago
|
||
It worked for me without a crash in Windows7, but crashed in OS X
Comment 18•15 years ago
|
||
TEMPORARY FIX FOR THIS ISSUE
If you want to avoid this bug temporarily, and this bug is due to your use of the NoScript Addon, open NoScript Preferences and Disable ABE.
Tools > AddOns > NoScript Preferences > Advanced Tab > ABE Sub-tab > Uncheck Enable ABE
According to NoScript developer Giorgio Maone confirmed in Bugzilla that ABE calls the code that seems to be broken in OSX version of Firefox, and that by disabling ABE, you still get the functionality of NoScript, without the crashes. However once this bug is fixed, you should re-enable ABE to have the utmost protection that NoScript can provide.
Comment 19•15 years ago
|
||
Is this going to get fixed? We've narrowed it down to a piece of OSX-specific code, probably a bad pointer. What do we do here to get this one fixed to we can use ABE again?
Comment 20•15 years ago
|
||
Out of curiosity, what is ABE?
Assignee | ||
Comment 21•15 years ago
|
||
Comment 22•15 years ago
|
||
From comment #11:
Latest NoScript does perform async DNS resolution from the UI thread as part of
ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe
) and implements a 2nd level cache on top of the native nsIDNSService's one in
order to minimize XPCOM and inter-thread dispatching and gain more control over
unpinning attacks.
Comment 23•15 years ago
|
||
I wonder if the changes to class nsDNSService for bug 453403 are in some way
causal of these crashes.
Comment 24•15 years ago
|
||
It'd be great to get this issue resolved.
Updated•15 years ago
|
Summary: Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr]
Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]
Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]
Comment 25•15 years ago
|
||
This has been driving my wife nuts (OSX Firefox user) for weeks now. I finally did some digging and found out about about:crashes, which finally pointed us to the NoScript ABE issue. Would really like to see this addressed, since it seems to be affecting a lot of people (as NoScript is one of the most popular Firefox addons).
She has just disabled the ABE feature and we're crossing our fingers that it stops the crashes.
Comment 26•15 years ago
|
||
ok, i claim this is noscript's fault:
timeless-mbp-2:ns timeless$ grep em:ver '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/install.rdf'
<em:version>1.9.9.35</em:version>
timeless-mbp-2:ns timeless$ unzip '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/chrome/noscript.jar'
timeless-mbp-2:ns timeless$ grep idn-ser content/noscript/DNS.js host = CC["@mozilla.org/network/idn-service;1"].createInstance(CI.nsIIDNService).convertUTF8toACE(host);
mao: the idn-service is a service. please use .getService().
Assignee: nobody → g.maone
Component: Networking → Extension Compatibility
Product: Core → Firefox
QA Contact: networking → extension.compatibility
Comment 27•15 years ago
|
||
Timeless, Please spell your suggestion out a little more explicitly.
You might even spell it out in a form that an adventurous user could try
out him/her self by editing his/her own noscript.jar file. If that fixes
the problem, then users get immediate relief, you get more fame & glory. :)
Comment 28•15 years ago
|
||
I think fixing NoScript to not cause the bug is not as good as fixing the underlying bug. One day, another extension or even some Firefox code might come along that triggers the same bug.
Assignee | ||
Comment 29•15 years ago
|
||
@timeless:
You're correct about nsIIDNService being, as the interface name says, a service.
However I happened to copy that code (which, BTW, gets called quite rarely) verbatim from MDC. Therefore I'm gonna fix my call, but someone should fix the docs:
https://developer.mozilla.org/en/nsIIDNService
That said, how is this supposed to fix this very bug?
Comment 30•15 years ago
|
||
i'm not sure i want to think about or explain it. i can guess. the code uses locks to protect objects, but if there are two locks each of which should be the *only* lock for something, then when things think they have "locked" the object, they might have just locked a lock which wasn't the same as the other guy's lock for the same object.
as for the wiki, wow, that sucks. in general, anyone who creates an account can edit any page on the wiki (including that one). I don't touch wikis. You're welcome to fix it, or i'll see about getting someone to fix it.
nelson: i'd just as soon giorgio fix it and send it to everyone.
jonathan louie: comment 29 has identified the underlying bug, it's bad documentation.
our arrangement with extension authors is this:
You have full access to the entire application, you can do anything, you can crash the browser. we expect you to follow certain rules and try not to crash the browser.
among the rules are this: a service should be used as a service.
we might someday enforce that codewise, however officially the design of xpcom is such that there might be a class for which it's legal to have both a service and multiple instances. Unfortunately, since xpcom is portable, it's possible some downstream vendor has taken advantage of that with their own objects, which means if we change how the service manager works, we're breaking our contract with them.
it's unfortunate that sheppy made this mistake in his documentation (or in copying it from elsewhere), but he was basically the only person creating any MDC content and he created a lot, getting everything right is a lot to ask for.
Assignee | ||
Comment 31•15 years ago
|
||
While I agree that fixing the nsIIDNService instantiation is due (and indeed I did it), I doubt it's the actual culprit here, especially since it gets called exclusively for hosts containing characters which are illegal in ASCII domains (i.e. quite rarely so far).
After re-digging Gecko's sources I found some more suspicious code:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/nsDNSService2.cpp#184
By "doing the right thing" with the nsIDNSRecord interface (i.e. calling hasMore() before getNextAddrAsString()), NoScript is apparently calling getNextAddr() *twice*, and looking at the stack traces it crashes on the second call.
Furthermore, the thread safety of hasMore() seems dubious, so it plausible that it's preparing our crash since NoScript calls the DNS service asynchronously from an "unusual" thread (the UI one).
Therefore I eliminated the hasMore() call completely, and released both "fixes" (nsIIDNService and hasMore() elision) in NoScript 1.9.9.36, now available on http://noscript.net/getit#direct
Let's cross our fingers and see if crashes go down in a week or so...
Comment 32•13 years ago
|
||
There are many startup crashes with this crash signature in 10.0b4.
It doesn't seem related to NoScript.
More reports at:
https://crash-stats.mozilla.com/report/list?signature=PR_EnumerateAddrInfo
Crash Signature: [@ PR_EnumerateAddrInfo]
[@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr]
Keywords: crash
Comment 33•13 years ago
|
||
For crashes in 10.0b4, I filed bug 718389.
Comment 34•10 years ago
|
||
(In reply to Scoobidiver (away) from comment #33)
> For crashes in 10.0b4, I filed bug 718389.
I've closed the above bug report.
Does anyone believe this crash still exists?
I'm not seeing any on crash-stats
Whiteboard: [closeme 2015-06-01]
Comment 35•10 years ago
|
||
Resolved per whiteboard
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2015-06-01]
You need to log in
before you can comment on or make changes to this bug.
Description
•