Closed Bug 501446 Opened 15 years ago Closed 10 years ago

Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]

Categories

(Firefox :: Extension Compatibility, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: justdave, Assigned: ma1)

References

()

Details

(Keywords: crash)

Crash Data

I've had this same crash 7 times so far in the last 6 or 8 hours, seems to happen at random when the machine is otherwise idle. I have had three or four pages up with an auto-refresh on them. Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090620 Firefox/3.5pre Glubble/2.0.4.6 ID:20090630031219 ChatZilla 0.9.85 DownloadHelper 4.5 DownThemAll! 1.1.4 Firebug 1.4.0b3 Firefox PDF Plugin for Mac OS X 1.1 Glubble 2.0.4.6 Greasemonkey 0.8.20090123.1 Japanese-English Dictionary for rikaichan 1.10 Live HTTP headers 0.15 Mass Password Reset 1.04 Modify Headers 0.6.6 Nagios Checker 0.14.4 Names Dictionary for rikaichan 1.10 Nightly Tester Tools 2.0.2 NoScript 1.9.5 Personas for Firefox 1.2.1 Rikaichan 1.06 Update Channel Selector 1.5 User Agent Switcher 0.7.1 Web Developer 1.1.7
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1pre) Gecko/20090630 Shiretoko/3.5.1pre Glubble/2.0.4.6 ^^ My correct UserAgent... I had User-Agent Switcher active because of a Facebook sniffing bug, and NTT believed it when I told it to paste. :)
This is bug 337418 reincarnated.
OK, given the information in bug 337418 and the additional debugging info requested there, I'm suspecting it's a site with automatic-update/refresh usage since it happens at random without me actually touching the browser, and I have several tabs open with auto-refresh stuff (keeping an eye on release activity, etc). Tab 1: http://www.facebook.com/home.php - does ajax requests to poll for new status updates and notifications www.facebook.com has address 69.63.184.30 Tab 2: https://nagios.mozilla.org/sentry/ - has a Refresh: 150 on it. nagios.mozilla.org is an alias for dm-nagios01.mozilla.org. dm-nagios01.mozilla.org has address 63.245.208.170 Tab 3: https://nagios.mozilla.org/ftplag/ - has a Refresh: 60 on it. nagios.mozilla.org is an alias for dm-nagios01.mozilla.org. dm-nagios01.mozilla.org has address 63.245.208.170 Tab 4: http://people.mozilla.org/~nthomas/misc/uptake.html - has <meta http-equiv="refresh" content="180"> people.mozilla.org is an alias for people.mozilla.com. people.mozilla.com has address 63.245.208.169 Tab 5: http://downloadstats.mozilla.com/ - I assume it's using ajax of some sort to poll for the stats data. downloadstats.mozilla.com is an alias for star-moz-com-php5.nslb.sj.mozilla.com. star-moz-com-php5.nslb.sj.mozilla.com has address 63.245.209.77 Of course, the ones doing ajaxy stuff could be using other host names for where to pull data from.
What product/component owns class nsDNSRecord ? IMO, the problem probably lies there, not in NSS. Perhaps we should add members of the cc list for bug 337418 to this bug's list.
PR_EnumerateAddrInfo nsprpub/pr/src/misc/prnetdb.c:2095 nsDNSRecord::GetNextAddr netwerk/dns/src/nsDNSService2.cpp:134 nsDNSRecord::GetNextAddrAsString netwerk/dns/src/nsDNSService2.cpp:166 NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179 XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:2454 XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1590 js_Invoke js/src/jsinterp.cpp:1386 js_Interpret js/src/jsinterp.cpp:5179 justdave: are you using PAC?
Assignee: wtc → nobody
Severity: major → critical
Component: NSPR → Networking
Product: NSPR → Core
QA Contact: nspr → networking
Summary: Crash [@ PR_EnumerateAddrInfo ] at random times → Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times
Version: other → Trunk
(In reply to comment #5) > justdave: are you using PAC? What's PAC?
Nope, not using that.
Looks like NoScript is to blame. While running NoScript (no changes to base install), I loaded about 12 tabs, most to support.mozilla.com, along with three other tabs: Facebook, Nagios (open source monitoring tool, nagios.org) and OTRS (open source trouble ticket system, otrs.org). The common thread between OTRS and Nagios is meta refreshes every so often (90 seconds for Nagios, 2 minutes for OTRS. Additionally, Facebook does some triggered JS, which checks back to FB servers to see if there are any new status updates. I think it also interacts with their chat servers. I did not have this crash before 6/30/09, and I was either up to date or a day or two behind Shiretoko Nightlies. I'm fairly certain that whatever updates were made to Shiretoko and thus FF 3.5 between 6/27/09 and release is causing this issue. CrashID for the crash with only NoScript running: 41724f2c-19f4-47da-ab54-785202090703 7/3/09 5:01 PM
NoScript 1.9.5 was released on or around June 29, 2009. Since NoScript was the only AddOn installed, and the same tabs hadn't crashed Firefox with NO AddOns installed, I'm wondering if the combination is suspect. I likely updated NoScript on June 29 or 30. Maybe I'll install a previous version and see if that still causes the problems. I also wonder if it is NoScript to blame, since between the 26th and 30th a bunch of non-3.5 browsers had the same issue. Other related crashes of mine that show up in PR_EnumerateAddrInfo crash search: 3073a03a-093a-4a83-a7ba-2ba8e2090703 e18a0d58-1c80-469e-8bda-3f0082090630 Shiretoko eea60f39-2c9a-4953-86eb-c610a2090630 Shiretoko fa650c74-553b-425a-a1a0-a66742090702 42fc54ef-72f1-4bea-8276-e72032090701
Latest NoScript does perform async DNS resolution from the UI thread as part of ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe ) and implements a 2nd level cache on top of the native nsIDNSService's one in order to minimize XPCOM and inter-thread dispatching and gain more control over unpinning attacks. During development I found that if I cached nsIDNSRecord objects directly I caused random crashed like the one described here (I suppose for some pointer ownership issue), so I started storing the address strings returned by the query into an ad-hoc JavaScript DNSRecord object for caching purposes, and had no crash at all since then. However this report seems to suggest there's still something broken in what I do. Should I deep-clone the strings (e.g. new String(record.getNextAddrAsString()) ) rather than pushing them directly into a JS array? If you've got NoScript 1.9.5 (or even better 1.9.5.4 from http://noscript.net/getit#devel) installed, the relevant code is all in chrome://noscript/content/DNS.js
Sorry, I actually read the stack now and it doesn't seem an ownership issue but rather a concurrence one. Nothing I can fix on my side, apparently. Also, I wonder why none of my testers (nor I) had a single crash of this kind in one month of beta testing (the ones I got when I tried to cache nsIDNSRecord objects directly were in private preliminary builds) and I still can't see any report on the NoScript forum.
Stupid question related to comment 12 (unexpectedly few reports): might it be Mac OS X specific? I can see Darwin-specific code in prnetdb.c...
If you look at the crash-stats report linked from the URL field, you can see that almost all of the crashes with this signature are indeed on Mac. There's a couple Windows NT 6.0 in there, but not very many.
It does seem that the prnetdb.c does have OSX and Darwin specific ifdefs, the only one starts at line 196, with a possible elif at line 333, but the else is at line 403, and the endif at 416. There's one more DARWIN ifdef on line 2186 which defines function pr_StringToNetAddrFB(), which is called in PR_StringToNetAddr, which I don't see called, but then again, this code is pretty foreign to me, so it is possible that the Darwin specific code related to bug 404399 is to blame. Darwin 8.8.4 was OSX 10.4.4, and while OSX10.5 was released in October 2007 (after the bug was opened), it is possible that it is causing problems in 10.5. I'm guessing that since no popular code used this code recently, it had not reared its head before, as it does now. And given that the related crashes are almost all OSX, it's possible this code needs a new review.
It worked for me without a crash in Windows7, but crashed in OS X
TEMPORARY FIX FOR THIS ISSUE If you want to avoid this bug temporarily, and this bug is due to your use of the NoScript Addon, open NoScript Preferences and Disable ABE. Tools > AddOns > NoScript Preferences > Advanced Tab > ABE Sub-tab > Uncheck Enable ABE According to NoScript developer Giorgio Maone confirmed in Bugzilla that ABE calls the code that seems to be broken in OSX version of Firefox, and that by disabling ABE, you still get the functionality of NoScript, without the crashes. However once this bug is fixed, you should re-enable ABE to have the utmost protection that NoScript can provide.
Is this going to get fixed? We've narrowed it down to a piece of OSX-specific code, probably a bad pointer. What do we do here to get this one fixed to we can use ABE again?
Out of curiosity, what is ABE?
(In reply to comment #20) > Out of curiosity, what is ABE? http://noscript.net/abe
From comment #11: Latest NoScript does perform async DNS resolution from the UI thread as part of ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe ) and implements a 2nd level cache on top of the native nsIDNSService's one in order to minimize XPCOM and inter-thread dispatching and gain more control over unpinning attacks.
I wonder if the changes to class nsDNSService for bug 453403 are in some way causal of these crashes.
It'd be great to get this issue resolved.
Summary: Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr]
Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]
Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]
This has been driving my wife nuts (OSX Firefox user) for weeks now. I finally did some digging and found out about about:crashes, which finally pointed us to the NoScript ABE issue. Would really like to see this addressed, since it seems to be affecting a lot of people (as NoScript is one of the most popular Firefox addons). She has just disabled the ABE feature and we're crossing our fingers that it stops the crashes.
ok, i claim this is noscript's fault: timeless-mbp-2:ns timeless$ grep em:ver '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/install.rdf' <em:version>1.9.9.35</em:version> timeless-mbp-2:ns timeless$ unzip '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/chrome/noscript.jar' timeless-mbp-2:ns timeless$ grep idn-ser content/noscript/DNS.js host = CC["@mozilla.org/network/idn-service;1"].createInstance(CI.nsIIDNService).convertUTF8toACE(host); mao: the idn-service is a service. please use .getService().
Assignee: nobody → g.maone
Component: Networking → Extension Compatibility
Product: Core → Firefox
QA Contact: networking → extension.compatibility
Timeless, Please spell your suggestion out a little more explicitly. You might even spell it out in a form that an adventurous user could try out him/her self by editing his/her own noscript.jar file. If that fixes the problem, then users get immediate relief, you get more fame & glory. :)
I think fixing NoScript to not cause the bug is not as good as fixing the underlying bug. One day, another extension or even some Firefox code might come along that triggers the same bug.
@timeless: You're correct about nsIIDNService being, as the interface name says, a service. However I happened to copy that code (which, BTW, gets called quite rarely) verbatim from MDC. Therefore I'm gonna fix my call, but someone should fix the docs: https://developer.mozilla.org/en/nsIIDNService That said, how is this supposed to fix this very bug?
i'm not sure i want to think about or explain it. i can guess. the code uses locks to protect objects, but if there are two locks each of which should be the *only* lock for something, then when things think they have "locked" the object, they might have just locked a lock which wasn't the same as the other guy's lock for the same object. as for the wiki, wow, that sucks. in general, anyone who creates an account can edit any page on the wiki (including that one). I don't touch wikis. You're welcome to fix it, or i'll see about getting someone to fix it. nelson: i'd just as soon giorgio fix it and send it to everyone. jonathan louie: comment 29 has identified the underlying bug, it's bad documentation. our arrangement with extension authors is this: You have full access to the entire application, you can do anything, you can crash the browser. we expect you to follow certain rules and try not to crash the browser. among the rules are this: a service should be used as a service. we might someday enforce that codewise, however officially the design of xpcom is such that there might be a class for which it's legal to have both a service and multiple instances. Unfortunately, since xpcom is portable, it's possible some downstream vendor has taken advantage of that with their own objects, which means if we change how the service manager works, we're breaking our contract with them. it's unfortunate that sheppy made this mistake in his documentation (or in copying it from elsewhere), but he was basically the only person creating any MDC content and he created a lot, getting everything right is a lot to ask for.
While I agree that fixing the nsIIDNService instantiation is due (and indeed I did it), I doubt it's the actual culprit here, especially since it gets called exclusively for hosts containing characters which are illegal in ASCII domains (i.e. quite rarely so far). After re-digging Gecko's sources I found some more suspicious code: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/nsDNSService2.cpp#184 By "doing the right thing" with the nsIDNSRecord interface (i.e. calling hasMore() before getNextAddrAsString()), NoScript is apparently calling getNextAddr() *twice*, and looking at the stack traces it crashes on the second call. Furthermore, the thread safety of hasMore() seems dubious, so it plausible that it's preparing our crash since NoScript calls the DNS service asynchronously from an "unusual" thread (the UI one). Therefore I eliminated the hasMore() call completely, and released both "fixes" (nsIIDNService and hasMore() elision) in NoScript 1.9.9.36, now available on http://noscript.net/getit#direct Let's cross our fingers and see if crashes go down in a week or so...
Depends on: 716345
There are many startup crashes with this crash signature in 10.0b4. It doesn't seem related to NoScript. More reports at: https://crash-stats.mozilla.com/report/list?signature=PR_EnumerateAddrInfo
Crash Signature: [@ PR_EnumerateAddrInfo] [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr]
Keywords: crash
For crashes in 10.0b4, I filed bug 718389.
(In reply to Scoobidiver (away) from comment #33) > For crashes in 10.0b4, I filed bug 718389. I've closed the above bug report. Does anyone believe this crash still exists? I'm not seeing any on crash-stats
Whiteboard: [closeme 2015-06-01]
Resolved per whiteboard
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2015-06-01]
You need to log in before you can comment on or make changes to this bug.