501446 - Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]

Reporter

Description

•

15 years ago

I've had this same crash 7 times so far in the last 6 or 8 hours, seems to happen at random when the machine is otherwise idle. I have had three or four pages up with an auto-refresh on them.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090620 Firefox/3.5pre Glubble/2.0.4.6 ID:20090630031219

ChatZilla 0.9.85
DownloadHelper 4.5
DownThemAll! 1.1.4
Firebug 1.4.0b3
Firefox PDF Plugin for Mac OS X 1.1
Glubble 2.0.4.6
Greasemonkey 0.8.20090123.1
Japanese-English Dictionary for rikaichan 1.10
Live HTTP headers 0.15
Mass Password Reset 1.04
Modify Headers 0.6.6
Nagios Checker 0.14.4
Names Dictionary for rikaichan 1.10
Nightly Tester Tools 2.0.2
NoScript 1.9.5
Personas for Firefox 1.2.1
Rikaichan 1.06
Update Channel Selector 1.5
User Agent Switcher 0.7.1
Web Developer 1.1.7

Dave Miller [:justdave]

Reporter

Comment 1

•

15 years ago

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1pre) Gecko/20090630 Shiretoko/3.5.1pre Glubble/2.0.4.6

^^ My correct UserAgent... I had User-Agent Switcher active because of a Facebook sniffing bug, and NTT believed it when I told it to paste. :)

Nelson Bolyard (seldom reads bugmail)

Comment 2

•

15 years ago

This is bug 337418 reincarnated.

Dave Miller [:justdave]

Reporter

Comment 3

•

15 years ago

OK, given the information in bug 337418 and the additional debugging info requested there, I'm suspecting it's a site with automatic-update/refresh usage since it happens at random without me actually touching the browser, and I have several tabs open with auto-refresh stuff (keeping an eye on release activity, etc).

Tab 1: http://www.facebook.com/home.php - does ajax requests to poll for new status updates and notifications
www.facebook.com has address 69.63.184.30

Tab 2: https://nagios.mozilla.org/sentry/ - has a Refresh: 150 on it.
nagios.mozilla.org is an alias for dm-nagios01.mozilla.org.
dm-nagios01.mozilla.org has address 63.245.208.170

Tab 3: https://nagios.mozilla.org/ftplag/ - has a Refresh: 60 on it.
nagios.mozilla.org is an alias for dm-nagios01.mozilla.org.
dm-nagios01.mozilla.org has address 63.245.208.170

Tab 4: http://people.mozilla.org/~nthomas/misc/uptake.html - has <meta http-equiv="refresh" content="180">
people.mozilla.org is an alias for people.mozilla.com.
people.mozilla.com has address 63.245.208.169

Tab 5: http://downloadstats.mozilla.com/ - I assume it's using ajax of some sort to poll for the stats data.
downloadstats.mozilla.com is an alias for star-moz-com-php5.nslb.sj.mozilla.com.
star-moz-com-php5.nslb.sj.mozilla.com has address 63.245.209.77

Of course, the ones doing ajaxy stuff could be using other host names for where to pull data from.

Nelson Bolyard (seldom reads bugmail)

Comment 4

•

15 years ago

What product/component owns class nsDNSRecord ? IMO, the problem probably lies there, not in NSS. Perhaps we should add members of the cc list for bug 337418 to this bug's list.

timeless

Comment 5

•

15 years ago

PR_EnumerateAddrInfo nsprpub/pr/src/misc/prnetdb.c:2095
nsDNSRecord::GetNextAddr netwerk/dns/src/nsDNSService2.cpp:134
nsDNSRecord::GetNextAddrAsString netwerk/dns/src/nsDNSService2.cpp:166
NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:2454
XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1590
js_Invoke js/src/jsinterp.cpp:1386
js_Interpret js/src/jsinterp.cpp:5179

justdave: are you using PAC?

Assignee: wtc → nobody

Severity: major → critical

Component: NSPR → Networking

Product: NSPR → Core

QA Contact: nspr → networking

Summary: Crash [@ PR_EnumerateAddrInfo ] at random times → Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times

Version: other → Trunk

Dave Miller [:justdave]

Reporter

Comment 6

•

15 years ago

(In reply to comment #5)
> justdave: are you using PAC?

What's PAC?

timeless

Comment 7

•

15 years ago

http://www-archive.mozilla.org/catalog/end-user/customizing/enduserPAC.html

Dave Miller [:justdave]

Reporter

Comment 8

•

15 years ago

Nope, not using that.

Peter Beckman

Comment 9

•

15 years ago

Looks like NoScript is to blame. While running NoScript (no changes to base install), I loaded about 12 tabs, most to support.mozilla.com, along with three other tabs: Facebook, Nagios (open source monitoring tool, nagios.org) and OTRS (open source trouble ticket system, otrs.org). The common thread between OTRS and Nagios is meta refreshes every so often (90 seconds for Nagios, 2 minutes for OTRS). Additionally, Facebook does some triggered JS, which checks back to FB servers to see if there are any new status updates. I think it also interacts with their chat servers.

I did not have this crash before 6/30/09, and I was either up to date or a day or two behind Shiretoko Nightlies. I'm fairly certain that whatever updates were made to Shiretoko and thus FF 3.5 between 6/27/09 and release are causing this issue.

CrashID for the crash with only NoScript running: 41724f2c-19f4-47da-ab54-785202090703 7/3/09 5:01 PM

Peter Beckman

Comment 10

•

15 years ago

NoScript 1.9.5 was released on or around June 29, 2009. Since NoScript was the only AddOn installed, and the same tabs hadn't crashed Firefox with NO AddOns installed, I'm wondering if the combination is suspect. I likely updated NoScript on June 29 or 30. Maybe I'll install a previous version and see if that still causes the problems. I also wonder whether NoScript really is to blame, since between the 26th and 30th a bunch of non-3.5 browsers had the same issue.

Other related crashes of mine that show up in PR_EnumerateAddrInfo crash search:
3073a03a-093a-4a83-a7ba-2ba8e2090703
e18a0d58-1c80-469e-8bda-3f0082090630 Shiretoko
eea60f39-2c9a-4953-86eb-c610a2090630 Shiretoko
fa650c74-553b-425a-a1a0-a66742090702
42fc54ef-72f1-4bea-8276-e72032090701

Giorgio Maone [:ma1]

Assignee

Comment 11

•

15 years ago

Latest NoScript does perform async DNS resolution from the UI thread as part of ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe ) and implements a 2nd level cache on top of the native nsIDNSService's one in order to minimize XPCOM and inter-thread dispatching and gain more control over unpinning attacks.

During development I found that if I cached nsIDNSRecord objects directly I caused random crashes like the one described here (I suppose because of some pointer ownership issue), so I started storing the address strings returned by the query into an ad-hoc JavaScript DNSRecord object for caching purposes, and had no crash at all since then.

However this report seems to suggest there's still something broken in what I do. Should I deep-clone the strings (e.g. new String(record.getNextAddrAsString()) ) rather than pushing them directly into a JS array?

If you've got NoScript 1.9.5 (or even better 1.9.5.4 from http://noscript.net/getit#devel) installed, the relevant code is all in chrome://noscript/content/DNS.js
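For readers following along, the caching approach described in this comment can be sketched roughly as below. This is not the actual DNS.js code: `makeMockRecord` and `snapshotRecord` are hypothetical names, and the mock only imitates the `hasMore()` / `getNextAddrAsString()` methods of nsIDNSRecord so the sketch runs outside Firefox.

```javascript
// Hypothetical stand-in for an nsIDNSRecord. Real records come from
// nsIDNSService; this mock only imitates the two methods used below.
function makeMockRecord(addresses) {
  let index = 0;
  return {
    hasMore() {
      return index < addresses.length;
    },
    getNextAddrAsString() {
      if (index >= addresses.length) {
        throw new Error("NS_ERROR_NOT_AVAILABLE");
      }
      return addresses[index++];
    },
  };
}

// Copy every address out of the record into a plain JavaScript object,
// so the cache never holds a reference to the native record itself.
function snapshotRecord(record, timestamp) {
  const entries = [];
  while (record.hasMore()) {
    // String(...) yields a primitive string (unlike new String(...),
    // which would create a wrapper object).
    entries.push(String(record.getNextAddrAsString()));
  }
  return { entries: entries, ts: timestamp };
}

const cachedEntry = snapshotRecord(
  makeMockRecord(["63.245.208.170", "63.245.208.169"]),
  0
);
console.log(cachedEntry.entries.join(", "));
```

Whether copying the strings makes any difference to the crash is exactly the open question in this comment; the sketch only shows the shape of the cache entry.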

Giorgio Maone [:ma1]

Assignee

Comment 12

•

15 years ago

Sorry, I actually read the stack now and it doesn't seem to be an ownership issue but rather a concurrency one. Nothing I can fix on my side, apparently. Also, I wonder why none of my testers (nor I) had a single crash of this kind in one month of beta testing (the ones I got when I tried to cache nsIDNSRecord objects directly were in private preliminary builds) and I still can't see any report on the NoScript forum.

Giorgio Maone [:ma1]

Assignee

Comment 13

•

15 years ago

Stupid question related to comment 12 (unexpectedly few reports): might it be Mac OS X specific? I can see Darwin-specific code in prnetdb.c...

Dave Miller [:justdave]

Reporter

Comment 14

•

15 years ago

If you look at the crash-stats report linked from the URL field, you can see that almost all of the crashes with this signature are indeed on Mac. There's a couple Windows NT 6.0 in there, but not very many.

Peter Beckman

Comment 15

•

15 years ago

It does seem that prnetdb.c has OSX- and Darwin-specific ifdefs. The first one starts at line 196, with a possible elif at line 333, but the else is at line 403, and the endif at 416. There's one more DARWIN ifdef on line 2186 which defines function pr_StringToNetAddrFB(), which is called in PR_StringToNetAddr, which I don't see called, but then again, this code is pretty foreign to me, so it is possible that the Darwin-specific code related to bug 404399 is to blame.

Darwin 8.8.4 was OSX 10.4.4, and while OSX 10.5 was released in October 2007 (after the bug was opened), it is possible that it is causing problems in 10.5. I'm guessing that since no popular code used this code recently, it had not reared its head before, as it does now. And given that the related crashes are almost all OSX, it's possible this code needs a new review.

James Rome

Comment 17

•

15 years ago

It worked for me without a crash in Windows7, but crashed in OS X

Peter Beckman

Comment 18

•

15 years ago

TEMPORARY FIX FOR THIS ISSUE

If you want to avoid this bug temporarily, and this bug is due to your use of the NoScript Addon, open NoScript Preferences and Disable ABE.

Tools > AddOns > NoScript Preferences > Advanced Tab > ABE Sub-tab > Uncheck Enable ABE

NoScript developer Giorgio Maone confirmed in Bugzilla that ABE calls the code that seems to be broken in the OSX version of Firefox, and that by disabling ABE, you still get the functionality of NoScript, without the crashes. However, once this bug is fixed, you should re-enable ABE to have the utmost protection that NoScript can provide.

Peter Beckman

Comment 19

•

15 years ago

Is this going to get fixed? We've narrowed it down to a piece of OSX-specific code, probably a bad pointer. What do we do here to get this one fixed so we can use ABE again?

Nelson Bolyard (seldom reads bugmail)

Comment 20

•

15 years ago

Out of curiosity, what is ABE?

Giorgio Maone [:ma1]

Assignee

Comment 21

•

15 years ago

(In reply to comment #20)
> Out of curiosity, what is ABE?

http://noscript.net/abe

Peter Beckman

Comment 22

•

15 years ago

From comment #11: Latest NoScript does perform async DNS resolution from the UI thread as part of ABE's protection against internet-to-intranet CSRF (see http://noscript.net/abe ) and implements a 2nd level cache on top of the native nsIDNSService's one in order to minimize XPCOM and inter-thread dispatching and gain more control over unpinning attacks.

Nelson Bolyard (seldom reads bugmail)

Comment 23

•

15 years ago

I wonder if the changes to class nsDNSService for bug 453403 are in some way causal of these crashes.

Peter Beckman

Comment 24

•

15 years ago

It'd be great to get this issue resolved.

Jesse Ruderman

Updated

•

15 years ago

Summary: Crash [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] at random times → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr]

timeless

Updated

•

15 years ago

Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]

timeless

Updated

•

15 years ago

Summary: Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo - nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ] → Crash with NoScript and ABE enabled [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr][@ PR_EnumerateAddrInfo ]

benshadwick

Comment 25

•

15 years ago

This has been driving my wife nuts (OSX Firefox user) for weeks now. I finally did some digging and found out about about:crashes, which finally pointed us to the NoScript ABE issue. Would really like to see this addressed, since it seems to be affecting a lot of people (as NoScript is one of the most popular Firefox addons). She has just disabled the ABE feature and we're crossing our fingers that it stops the crashes.

timeless

Comment 26

•

15 years ago

ok, i claim this is noscript's fault:

timeless-mbp-2:ns timeless$ grep em:ver '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/install.rdf'
<em:version>1.9.9.35</em:version>
timeless-mbp-2:ns timeless$ unzip '/Users/timeless/Library/Application Support/Firefox/Profiles/*.default/extensions/{73a6fe31-595d-460b-a920-fcc0f8843232}/chrome/noscript.jar'
timeless-mbp-2:ns timeless$ grep idn-ser content/noscript/DNS.js
host = CC["@mozilla.org/network/idn-service;1"].createInstance(CI.nsIIDNService).convertUTF8toACE(host);

mao: the idn-service is a service. please use .getService().
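To illustrate the distinction being drawn here: `createInstance()` builds a fresh object on every call, while `getService()` hands back one shared object. The sketch below is a toy model that runs outside Firefox; `makeClassEntry` is a hypothetical helper, not the real `Components.classes` machinery. In DNS.js itself the suggested change would presumably amount to replacing `.createInstance(CI.nsIIDNService)` with `.getService(CI.nsIIDNService)` on the line quoted above.

```javascript
// Hypothetical miniature of an XPCOM-style class entry. It is NOT the real
// Components.classes API, only a model of the two ways to obtain an object.
function makeClassEntry(factory) {
  let singleton = null;
  return {
    // A brand-new object on every call.
    createInstance() {
      return factory();
    },
    // The same shared object on every call, created lazily once.
    getService() {
      if (singleton === null) {
        singleton = factory();
      }
      return singleton;
    },
  };
}

let constructedCount = 0;
const idnEntry = makeClassEntry(function () {
  constructedCount += 1;
  return { serial: constructedCount };
});

const instanceA = idnEntry.createInstance();
const instanceB = idnEntry.createInstance();
const serviceA = idnEntry.getService();
const serviceB = idnEntry.getService();

console.log(instanceA === instanceB); // two distinct objects
console.log(serviceA === serviceB);   // one shared object

// In the extension, the corresponding call would look like this
// (real XPCOM call, shown as a comment because it only works inside Firefox):
// host = CC["@mozilla.org/network/idn-service;1"]
//          .getService(CI.nsIIDNService).convertUTF8toACE(host);
```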

Assignee: nobody → g.maone

Component: Networking → Extension Compatibility

Product: Core → Firefox

QA Contact: networking → extension.compatibility

Nelson Bolyard (seldom reads bugmail)

Comment 27

•

15 years ago

Timeless, Please spell your suggestion out a little more explicitly. You might even spell it out in a form that an adventurous user could try out him/her self by editing his/her own noscript.jar file. If that fixes the problem, then users get immediate relief, you get more fame & glory. :)

Jonathan Louie

Comment 28

•

15 years ago

I think fixing NoScript to not cause the bug is not as good as fixing the underlying bug. One day, another extension or even some Firefox code might come along that triggers the same bug.

Giorgio Maone [:ma1]

Assignee

Comment 29

•

15 years ago

@timeless: You're correct about nsIIDNService being, as the interface name says, a service. However I happened to copy that code (which, BTW, gets called quite rarely) verbatim from MDC. Therefore I'm gonna fix my call, but someone should fix the docs: https://developer.mozilla.org/en/nsIIDNService That said, how is this supposed to fix this very bug?

timeless

Comment 30

•

15 years ago

i'm not sure i want to think about or explain it. i can guess. the code uses locks to protect objects, but if there are two locks each of which should be the *only* lock for something, then when things think they have "locked" the object, they might have just locked a lock which wasn't the same as the other guy's lock for the same object.

as for the wiki, wow, that sucks. in general, anyone who creates an account can edit any page on the wiki (including that one). I don't touch wikis. You're welcome to fix it, or i'll see about getting someone to fix it.

nelson: i'd just as soon giorgio fix it and send it to everyone.

jonathan louie: comment 29 has identified the underlying bug, it's bad documentation. our arrangement with extension authors is this: You have full access to the entire application, you can do anything, you can crash the browser. we expect you to follow certain rules and try not to crash the browser. among the rules are this: a service should be used as a service. we might someday enforce that codewise, however officially the design of xpcom is such that there might be a class for which it's legal to have both a service and multiple instances. Unfortunately, since xpcom is portable, it's possible some downstream vendor has taken advantage of that with their own objects, which means if we change how the service manager works, we're breaking our contract with them.

it's unfortunate that sheppy made this mistake in his documentation (or in copying it from elsewhere), but he was basically the only person creating any MDC content and he created a lot, getting everything right is a lot to ask for.
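The two-locks guess in this comment can be modelled with a toy example. Everything here is hypothetical (`SimpleLock`, `GuardedService`, the shared state object); it does not reflect how the IDN service or the DNS service is actually implemented, and only shows why two instances that each carry "the" lock give no mutual exclusion.

```javascript
// Hypothetical non-reentrant lock: tryAcquire() fails if already held.
class SimpleLock {
  constructor() {
    this.held = false;
  }
  tryAcquire() {
    if (this.held) {
      return false;
    }
    this.held = true;
    return true;
  }
  release() {
    this.held = false;
  }
}

// Hypothetical object that is meant to exist once and to guard some shared
// state with its own private lock.
class GuardedService {
  constructor(sharedState) {
    this.lock = new SimpleLock();
    this.state = sharedState;
  }
}

const sharedState = { entries: [] };

// Two instances of something designed as a singleton: each brings its own lock.
const firstInstance = new GuardedService(sharedState);
const secondInstance = new GuardedService(sharedState);

const firstGotLock = firstInstance.lock.tryAcquire();
// Succeeds too, because it is a different lock guarding the same state.
const secondGotLock = secondInstance.lock.tryAcquire();
// Only a second attempt on the *same* lock is actually refused.
const retryOnSameLock = firstInstance.lock.tryAcquire();

console.log(firstGotLock, secondGotLock, retryOnSameLock);
```

With a single shared instance (one lock), the second caller would have been refused, which is the property a service is supposed to provide.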

Giorgio Maone [:ma1]

Assignee

Comment 31

•

15 years ago

While I agree that fixing the nsIIDNService instantiation is due (and indeed I did it), I doubt it's the actual culprit here, especially since it gets called exclusively for hosts containing characters which are illegal in ASCII domains (i.e. quite rarely so far).

After re-digging Gecko's sources I found some more suspicious code: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/nsDNSService2.cpp#184

By "doing the right thing" with the nsIDNSRecord interface (i.e. calling hasMore() before getNextAddrAsString()), NoScript is apparently calling getNextAddr() *twice*, and looking at the stack traces it crashes on the second call. Furthermore, the thread safety of hasMore() seems dubious, so it is plausible that it's preparing our crash, since NoScript calls the DNS service asynchronously from an "unusual" thread (the UI one).

Therefore I eliminated the hasMore() call completely, and released both "fixes" (nsIIDNService and hasMore() elision) in NoScript 1.9.9.36, now available on http://noscript.net/getit#direct

Let's cross our fingers and see if crashes go down in a week or so...
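One way to iterate a record without ever calling hasMore() is sketched below. This is not the NoScript 1.9.9.36 code, which is not quoted in this bug; `makeCountingRecord` and `collectAddresses` are hypothetical names, and the sketch assumes that getNextAddrAsString() signals exhaustion by throwing (the mock throws a plain Error to imitate that).

```javascript
// Hypothetical stand-in for an nsIDNSRecord that counts hasMore() calls,
// so the sketch can show that the loop below never uses it.
function makeCountingRecord(addresses) {
  let index = 0;
  const record = {
    hasMoreCalls: 0,
    hasMore() {
      record.hasMoreCalls += 1;
      return index < addresses.length;
    },
    getNextAddrAsString() {
      if (index >= addresses.length) {
        // Assumption: the real method throws when no address is left.
        throw new Error("NS_ERROR_NOT_AVAILABLE");
      }
      return addresses[index++];
    },
  };
  return record;
}

// Drain the record using only getNextAddrAsString(); the exception marks
// the end of the list. Note that this also swallows any other error.
function collectAddresses(record) {
  const collected = [];
  for (;;) {
    try {
      collected.push(record.getNextAddrAsString());
    } catch (e) {
      break;
    }
  }
  return collected;
}

const countingRecord = makeCountingRecord(["69.63.184.30", "63.245.209.77"]);
const collectedAddresses = collectAddresses(countingRecord);
console.log(collectedAddresses.length, countingRecord.hasMoreCalls);
```

The trade-off is that a try/catch loop cannot tell "no more addresses" apart from a genuine failure, so a stricter version would inspect the exception before breaking.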

Scoobidiver (away)

Updated

•

13 years ago

Depends on: 716345

Scoobidiver (away)

Comment 32

•

13 years ago

There are many startup crashes with this crash signature in 10.0b4. It doesn't seem related to NoScript. More reports at: https://crash-stats.mozilla.com/report/list?signature=PR_EnumerateAddrInfo

Crash Signature: [@ PR_EnumerateAddrInfo] [@ PR_EnumerateAddrInfo | nsDNSRecord::GetNextAddr]

Keywords: crash

Scoobidiver (away)

Comment 33

•

13 years ago

For crashes in 10.0b4, I filed bug 718389.

Wayne Mery (:wsmwk)

Comment 34

•

10 years ago

(In reply to Scoobidiver (away) from comment #33)
> For crashes in 10.0b4, I filed bug 718389.

I've closed the above bug report. Does anyone believe this crash still exists? I'm not seeing any on crash-stats

Whiteboard: [closeme 2015-06-01]

Phoenix

Comment 35

•

10 years ago

Resolved per whiteboard

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → INCOMPLETE

Whiteboard: [closeme 2015-06-01]