Closed Bug 995417 Opened 10 years ago Closed 10 years ago

Prevent our test suites from resolving non-loopback names

Categories

(Core :: Networking: DNS, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox31 --- fixed
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- wontfix
b2g-v1.3 --- wontfix
b2g-v1.3T --- wontfix
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: froydnj)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

Bug 995343 attempts to protect against a subset of the cases where our test suite contacts the external network.

A more complete fix would be to modify our DNS resolver code somewhere around <http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsHostResolver.cpp#1161> to check to see if the name that it has resolved is local or not, and if not, MOZ_CRASH or something like that.  We can use either a pref or an environment variable which is set by our test suites to determine whether to skip this check.

This will once and for all put an end to bug 617414.
And by local I mean loopback!
Summary: Prevent out test suites from resolving non-local names → Prevent out test suites from resolving non-loopback names
Summary: Prevent out test suites from resolving non-loopback names → Prevent our test suites from resolving non-loopback names
I did a quick and dirty try run with this idea to see how bad things would be... the version I did crashed on non-local connect instead of on dns.

https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542

other than marionette (which had a hard time), just 9 or 10 fails. a couple of which I know off the top of my head are false positives (necko special testing rfc 1918 rules) and can be worked around in the test. Seems triageable.
Depends on: 995599
Depends on: 995806
Depends on: 995995
(In reply to Patrick McManus [:mcmanus] from comment #2)
> I did a quick and dirty try run with this idea to see how bad things would
> be... the version I did crashed on non-local connect instead of on dns.
> 
> https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542

Is it reasonable to crash on non-local connect and to warn people appropriately in dns (NS_WARNING?) that something resolved to a non-local name?  Just crashing seems relatively unfriendly; the warning might help people figure out what's going on faster.
Flags: needinfo?(mcmanus)
(In reply to Nathan Froyd (:froydnj) from comment #3)
> (In reply to Patrick McManus [:mcmanus] from comment #2)
> > I did a quick and dirty try run with this idea to see how bad things would
> > be... the version I did crashed on non-local connect instead of on dns.
> > 
> > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
> 
> Is it reasonable to crash on non-local connect and to warn people
> appropriately in dns (NS_WARNING?) that something resolved to a non-local
> name?  Just crashing seems relatively unfriendly; the warning might help
> people figure out what's going on faster.

maybe! I don't like putting any crash path in a opt build that doesn't need to crash. Though I'm actually not inclined to use DNS at all because the parser looks up a lot of domain names for cache warming purposes and disabling that takes us farther from testing what we ship.

right now I'm just triaging the fails and trying to help diagnose them.. We can figure out what the enforcement patch ought to look like.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #4)
> (In reply to Nathan Froyd (:froydnj) from comment #3)
> > (In reply to Patrick McManus [:mcmanus] from comment #2)
> > > I did a quick and dirty try run with this idea to see how bad things would
> > > be... the version I did crashed on non-local connect instead of on dns.
> > > 
> > > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
> > 
> > Is it reasonable to crash on non-local connect and to warn people
> > appropriately in dns (NS_WARNING?) that something resolved to a non-local
> > name?  Just crashing seems relatively unfriendly; the warning might help
> > people figure out what's going on faster.
> 
> maybe! I don't like putting any crash path in a opt build that doesn't need
> to crash.

I should have been clearer: that crash path would only be if a MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set.  Or would you be opposed to that too?
Depends on: 996009
Depends on: 996019
Depends on: 996031
(In reply to Nathan Froyd (:froydnj) from comment #5)
> (In reply to Patrick McManus [:mcmanus] from comment #4)
> > (In reply to Nathan Froyd (:froydnj) from comment #3)
> > > (In reply to Patrick McManus [:mcmanus] from comment #2)
> > > > I did a quick and dirty try run with this idea to see how bad things would
> > > > be... the version I did crashed on non-local connect instead of on dns.
> > > > 
> > > > https://tbpl.mozilla.org/?tree=Try&rev=acd81b7ac542
> > > 
> > > Is it reasonable to crash on non-local connect and to warn people
> > > appropriately in dns (NS_WARNING?) that something resolved to a non-local
> > > name?  Just crashing seems relatively unfriendly; the warning might help
> > > people figure out what's going on faster.
> > 
> > maybe! I don't like putting any crash path in a opt build that doesn't need
> > to crash.
> 
> I should have been clearer: that crash path would only be if a
> MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set.  Or
> would you be opposed to that too?

I think this is ultimately what we would want to do, once we've fixed all of the existing cases where we attempt to connect to a non-local address in our tests.  Note that the NS_WARNING will only work in debug builds though, but hopefully all future failures will hit both debug and non-debug builds alike.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #6)
> > I should have been clearer: that crash path would only be if a
> > MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set.  Or
> > would you be opposed to that too?
> 
> I think this is ultimately what we would want to do, once we've fixed all of
> the existing cases where we attempt to connect to a non-local address in our
> tests.  Note that the NS_WARNING will only work in debug builds though,

Indeed.  Ideally people are testing debug builds on try or locally and would see this warning pointing out the problem, rather than trying to figure out what exact thing the socket transport service is grousing about.
(In reply to comment #7)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #6)
> > > I should have been clearer: that crash path would only be if a
> > > MOZ_CRASH_ON_NONLOCAL_HOSTS environment variable or somesuch were set.  Or
> > > would you be opposed to that too?
> > 
> > I think this is ultimately what we would want to do, once we've fixed all of
> > the existing cases where we attempt to connect to a non-local address in our
> > tests.  Note that the NS_WARNING will only work in debug builds though,
> 
> Indeed.  Ideally people are testing debug builds on try or locally and would
> see this warning pointing out the problem, rather than trying to figure out
> what exact thing the socket transport service is grousing about.

We could also make the MOZ_CRASH string argument very descriptive, get it to point to an MDN article, etc. as needed.
Blocks: 996504
This patch is the patch I've been using locally to try to triage non-local
connection bugs.  In conjunction with patch 2 (coming soon), this seems to DTRT
for mochitest runs.
...and here's the test bits.  I believe there's actually a third place we setup
environment variables, but maybe that's been changed nowadays...
This patch series induces all sorts of interesting oranges, mostly to do with IPC:

https://tbpl.mozilla.org/?tree=Try&rev=49e9c7a82979

Running most of the crashing tests standalone locally reveals no problems, although they do crash when run in directory batches.  Some subtle race in the networking code?
Depends on: 997188
Depends on: 997311
Depends on: 997341
Depends on: 997820
Depends on: 998298
Depends on: 998302
Depends on: 998303
Depends on: 999397
Depends on: 999518
OS: Mac OS X → All
Hardware: x86 → All
Depends on: 1009042
Depends on: 1010322
I am seeing test_speculative_connect.js xpcshell failures locally:

 7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 204] 1 == 1
 7:48.00 BAD CONNECT: connecting to 10.0.0.1 0
 7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 207] "object" != undefined
 7:48.00 TEST-PASS | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | [test_hostnames_resolving_to_addresses : 212] "object" != undefined
 7:48.00 <<<<<<<
PROCESS-CRASH | /opt/build/froydnj/build-mc/_tests/xpcshell/netwerk/test/unit/test_speculative_connect.js | application crashed [Unknown top frame]

Happens on try, too:

https://tbpl.mozilla.org/php/getParsedLog.php?id=39672912&tree=Try

Should we be checking for IsIPAddrLocal in addition to IsLoopBackAddress:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/DNS.cpp#181

?

I'm also seeing strange failures on try on OSX:

https://tbpl.mozilla.org/php/getParsedLog.php?id=39672348&tree=Try

I think this is because our Macs don't properly resolve localhost (!), which seems to be a surprisingly common problem (the first link is actually the most interesting):

https://discussions.apple.com/thread/2613868
http://apple.stackexchange.com/questions/99996/which-setting-in-osx-could-block-ping-localhost

Not really sure what to do about that one.  Patrick, can you shed some networking expertise on these two problems?
Flags: needinfo?(mcmanus)
(In reply to Nathan Froyd (:froydnj) from comment #12)
> I am seeing test_speculative_connect.js xpcshell failures locally:
> 

I think that test does a bunch of off host things well beyond the rfc 1918 stuff (that we should keep around). can you file a separate bug about it, assign to :sworkman and cc me?

> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=39672348&tree=Try
> 
> I think this is because our Macs don't properly resolve localhost (!),

I'm not familiar with that. where are they trying to connect to? (I would assume it would just fail and that shouldn't trigger the new assert code, right?)
Flags: needinfo?(mcmanus)
Depends on: 1011503
Small update.
Attachment #8407076 - Attachment is obsolete: true
Also set the appropriate environment variable for xpcshell.
Attachment #8407077 - Attachment is obsolete: true
Depends on: 1018400
Depends on: 1018414
Depends on: 1022785
Depends on: 1022798
Depends on: 1022801
Depends on: 1023483
All right, we are close enough to having all the appropriate dependent bugs
fixed that we can think about reviewing these patches and landing them in short
order.  First up, the networking changes.

Bikeshedding on the message welcome.
Attachment #8423866 - Attachment is obsolete: true
Attachment #8437909 - Flags: review?(mcmanus)
Now for the umpteen different places to set the appropriate environment
variable so things will crash when we connect to external hosts.
Attachment #8423867 - Attachment is obsolete: true
Attachment #8437910 - Flags: review?(jmaher)
Depends on: 1023638
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS

Review of attachment 8437910 [details] [diff] [review]:
-----------------------------------------------------------------

remoteautomation.py/automation.py.i - for android tests which use this we need to reference the webserver on the host machine, this is by default non localhost.  We also have b2g tests as well to consider.  Should we be doing this for talos and reftest?
Attachment #8437910 - Flags: review?(jmaher) → review-
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS

(In reply to Joel Maher (:jmaher) from comment #18)
> remoteautomation.py/automation.py.i - for android tests which use this we
> need to reference the webserver on the host machine, this is by default non
> localhost.  We also have b2g tests as well to consider.  Should we be doing
> this for talos and reftest?

reftests pick up MOZ_DISABLE_NONLOCAL_CONNECTIONS from one of these files (I forget which one offhand).  Talos should do this eventually, too.

We don't need to worry about remote tests needing to access the host machines because the networking changes actually check whether we're talking to localhost or something on the local network:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/DNS.cpp#181

and the host machines have appropriate IP addresses in this case.  (I ran into this very problem initially and almost opted not to set MOZ_DISABLE_NONLOCAL_CONNECTIONS for remote tests, but finally realized that I needed to allow local network connections, not just loopback ones.)

Example try run with these patches and a few miscellaneous ones from not-yet-fixed bugs:

https://tbpl.mozilla.org/?tree=Try&rev=82ef10561051

You can see the green for Android and B2G tests.

So, with that explanation at hand, re-r?'ing
Attachment #8437910 - Flags: review- → review?(jmaher)
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS

Review of attachment 8437910 [details] [diff] [review]:
-----------------------------------------------------------------

thanks for the context here.  The try run shows xpcshell failures on B2G, not sure if that is related to this patch or not.
Attachment #8437910 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #20)
> thanks for the context here.  The try run shows xpcshell failures on B2G,
> not sure if that is related to this patch or not.

Thanks for the review, my fault for not explaining things fully the first time around.  The B2G xpcshell failures are from bug 1023638, which just landed on inbound this morning.
Comment on attachment 8437910 [details] [diff] [review]
part 2 - testing infrastructure changes to set MOZ_DISABLE_NONLOCAL_CONNECTIONS

Thank you for your work on this bug!

After this initial patch lands (I think it's best to land this first before anything regresses), it would be great if we could add the define to more of the locations listed in bug 1023483 comment 3 - since aiui cppunit tests, jetpack etc won't have it enabled, unless they happen to inherit from automation.py/automationutils.py.

But work for another bug :-)
Depends on: 1024585
Depends on: 1024588
No longer depends on: 998298
Comment on attachment 8437909 [details] [diff] [review]
part 1 - netwerk/ changes for crashing on non-local connections

Review of attachment 8437909 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsSocketTransport2.cpp
@@ +1190,5 @@
> +                           "Non-local network connections are disabled and a connection "
> +                           "attempt to %s was made.  You should only access hostnames "
> +                           "available via the test networking proxy (if running mochitests) "
> +                           "or from a test-specific httpd.js server (if running xpcshell tests).",
> +                           mHost.get());

thanks.. lets print out the address too - we've already been bitten by different resolvers giving different answers (i.e. does-not-exist vs catchall). mNetAddr.GetAddress() iirc.

r+ with that
Attachment #8437909 - Flags: review?(mcmanus) → review+
Depends on: 1025959
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3bfa5ab131e
https://hg.mozilla.org/integration/mozilla-inbound/rev/51342b493983

Landed with small changes to the error message wording and the addition of the IP address thus resolved.  Try runs:

https://tbpl.mozilla.org/?tree=Try&rev=c4a2cc9381ef (almost there except for Android M5)
https://tbpl.mozilla.org/?tree=Try&rev=911517506d3a (with current-this-morning m-c, looking green enough that I'm going to cancel the rest of the runs)

I'll make a post to dev-platform.
Flags: in-testsuite+
Assignee: nobody → nfroyd
I've mentioned the need for MOZ_DISABLE_NONLOCAL_CONNECTIONS to be defined in new harnesses, here:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#8.29_Must_avoid_patterns_known_to_cause_non_deterministic_failures

I'll file bugs for defining it in the other places mentioned in bug 1023483 comment 3 that don't overlap with those already set, so we get coverage in things like Jetpack too.
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/a3bfa5ab131e
https://hg.mozilla.org/mozilla-central/rev/51342b493983
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Depends on: 1026958
Depends on: 1026987
Depends on: 1027004
Depends on: 1027125
Depends on: 1027287
Actually, b2g30 is going to outlive esr24, so we should probably aim to get it on there as well.
https://tbpl.mozilla.org/?tree=Try&rev=90c9016d084e
> Aurora - https://tbpl.mozilla.org/?tree=Try&rev=63c92347e1ae

Looks green to me (amongst the jobs I'd expect to actually be so on Aurora). I'll aim to uplift this there tomorrow morning.

> Beta -   https://tbpl.mozilla.org/?tree=Try&rev=11e59c657ccc

Only one relevant failure in xpcshell:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42378982&tree=Try

This appears to be bug 995806, which I had mistakenly marked Gecko 31 as unaffected at the time. However, the patch in bug 995806 is going to need rebasing for that branch, so I've pinged maxim in the bug.

> Actually, b2g30 is going to outlive esr24, so we should probably aim to get
> it on there as well.
> https://tbpl.mozilla.org/?tree=Try&rev=90c9016d084e

This is failing in test_accounts.js due to trying to hit api.accounts.firefox.com:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42378299&tree=Try

It appears that's bug 995599, but that was uplifted to b2g30. I've pinged rnewman in the bug for help since markh is away until August.

It's also hitting Webapp update checks on B2G desktop mochitests:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42383005&tree=Try

I'll try backporting bug 1025959 to fix that. Not sure why I marked the B2G branches as unaffected at the time.
Beta looks green with bug 995806 landed. I'll push these patches there soon.

b2g30 is still hitting B2G desktop mochitest failures, however:
https://tbpl.mozilla.org/php/getParsedLog.php?id=42573795&tree=Try

17:07:05     INFO -  *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
17:07:05     INFO -  Non-local network connections are disabled and a connection attempt to inapp-pay-test.paas.allizom.org (63.245.215.73) was made.  You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.

Looks like the webapp updates need to be disabled harder?
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #31)
> 17:07:05     INFO -  *** UTM:SVC TimerManager:notify - notified
> @mozilla.org/b2g/webapps-update-timer;1
> 17:07:05     INFO -  Non-local network connections are disabled and a
> connection attempt to inapp-pay-test.paas.allizom.org (63.245.215.73) was
> made.  You should only access hostnames available via the test networking
> proxy (if running mochitests) or from a test-specific httpd.js server (if
> running xpcshell tests). Browser services should be disabled or redirected
> to a local server.

Noting that other logs linked in this bug also show marketplace.allizom.org connection attempts, the most relevant hit I can find on this appears to be in build/test/integration/helper.js:
https://github.com/mozilla-b2g/gaia/blob/v1.4/build/test/integration/helper.js#L46

Conveniently, bug 983573 massively refactored that code and landed on 2.0+, which would explain why this is only hitting v1.4. Unfortunately, that patch doesn't have a chance of being uplifted to v1.4. So I think we're looking for a way to pref-off or disable our way to victory on this one.

George, does my theory make any sense? Any ideas for what I might be able to do to avoid hitting the network from this code during mochitest runs?
Flags: needinfo?(gduan)
Note that webapp update checks are supposed to be disabled via:
http://mxr.mozilla.org/mozilla-b2g30_v1_4/source/testing/profiles/prefs_general.js#176
Bug 984274 disables webapp updates harder and apparently does the trick! Will land this on b2g30 once that patch is ready for landing & uplift.
https://tbpl.mozilla.org/?tree=Try&rev=cce952525b28
Flags: needinfo?(gduan)
Argh, bug 984274 caused a couple B2G mochitests to perma-fail, so I had to back it and this out. :(
Depends on: 1036028
Depends on: 1038871
Depends on: 1046863
Depends on: 1049688
Depends on: 1054650
Depends on: 1056769
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: