Closed Bug 1579424 Opened 5 years ago Closed 5 years ago

network ID: detect on MacOS when VPN overrides default gateway

Categories

(Core :: Networking, enhancement, P1)

Unspecified
macOS
enhancement

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: michal, Assigned: CuveeHsu)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(5 files, 1 obsolete file)

Bug 1567616 fixed this on Linux. We should do something similar on MacOS too.

Assignee: nobody → kershaw

I studied how VPNs work. It seems that NordVPN adds two routes to my routing table.

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
0/1                10.8.2.1           UGSc          117        0   utun4
128.0/1            10.8.2.1           UGSc            1        0   utun4

This forces all traffic to be routed through the VPN instead of the default gateway.
The gateway 10.8.2.1 has no MAC address, so our current implementation does not work.
I was wondering whether we should use the gateway's IP instead of its MAC address to calculate the network ID.

Michal, could you remind me again why we want to use the gateway's MAC address?
Can we just use the gateway's IP address?
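
To illustrate why these two routes override the default gateway: a /1 route is more specific than the /0 default route, so longest-prefix matching selects it for every destination. Below is a minimal sketch of that lookup. The 192.168.0.1 default route on en0 is an assumed example, and `lookup` is a hypothetical helper, not Firefox code.

```python
import ipaddress

# The two routes NordVPN added, plus an assumed pre-existing default route.
ROUTES = [
    ("0.0.0.0/0", "192.168.0.1", "en0"),    # assumption: original default gateway
    ("0.0.0.0/1", "10.8.2.1", "utun4"),
    ("128.0.0.0/1", "10.8.2.1", "utun4"),
]


def lookup(destination):
    """Return (gateway, interface) of the longest-prefix route matching destination."""
    address = ipaddress.ip_address(destination)
    candidates = []
    for prefix, gateway, ifname in ROUTES:
        network = ipaddress.ip_network(prefix)
        if address in network:
            candidates.append((network.prefixlen, gateway, ifname))
    # The most specific (longest) prefix wins.
    _, gateway, ifname = max(candidates)
    return gateway, ifname
```

Every IPv4 address falls in either 0.0.0.0/1 or 128.0.0.0/1, so the /0 route is never selected while the VPN routes exist.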

Flags: needinfo?(michal.novotny)
Priority: P2 → P1

(In reply to Kershaw Chang [:kershaw] from comment #1)

I did some study of how VPN works. It seems that NordVPN adds two routes in my route table.

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
0/1                10.8.2.1           UGSc          117        0   utun4
128.0/1            10.8.2.1           UGSc            1        0   utun4

This forces all traffic to be routed to VPN instead of the default gateway.

This is just one of several ways to override the default GW. You shouldn't try to interpret all routes; instead, get a route for some predefined host, like we do in NetlinkService:
https://searchfox.org/mozilla-central/rev/f372e8a46ef7659ef61be9938ec2a3ea34d343c6/netwerk/system/netlink/NetlinkService.cpp#512

The gateway 10.8.2.1 has no MAC address, so our current implementation does not work.
I was wondering maybe we should use the gateway's IP instead of MAC address to calculate network id.

Michal, could you remind me again why do we want to use gateway's MAC address?
Can we just use gateway's IP address?

MAC addresses are available only for TAP devices, which use ethernet frames. We should use the best information available, i.e. the MAC address for TAP devices and, for TUN devices, the IP of the GW (or the network address if no GW is specified).
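
The "best information available" rule can be sketched as a simple priority order. This is only an illustration: `RouteInfo` and `best_identifier` are hypothetical names, and the MAC value in the usage note is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteInfo:
    network: str                       # destination network of the route
    gateway_ip: Optional[str] = None   # None when the route has no gateway
    gateway_mac: Optional[str] = None  # None for TUN devices (no ethernet frames)


def best_identifier(route):
    """Pick the most specific identifier available for a route."""
    if route.gateway_mac:              # TAP or ethernet: MAC is known
        return "mac:" + route.gateway_mac.lower()
    if route.gateway_ip:               # TUN with a gateway: fall back to its IP
        return "ip:" + route.gateway_ip
    return "net:" + route.network      # TUN without a gateway: network address
```

For example, `best_identifier(RouteInfo("0.0.0.0/1", gateway_ip="10.8.2.1"))` yields an IP-based identifier because no MAC is known for the utun4 gateway.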

Flags: needinfo?(michal.novotny)

This is just one possibility how to override default GW. You shouldn't try to interpret all routes, instead get a route for some predefined host like we do in NetlinkService:
https://searchfox.org/mozilla-central/rev/f372e8a46ef7659ef61be9938ec2a3ea34d343c6/netwerk/system/netlink/NetlinkService.cpp#512

Unfortunately, NetLink is not available on MacOS.
It seems that on MacOS the first entry is always either the default gateway or the route with the longest matching prefix (in the VPN case, the destination is 0/1). I think we can just use the first entry of the route table without iterating over all routes.

Michal, could you remind me again why do we want to use gateway's MAC address?
Can we just use gateway's IP address?

MAC addresses are available only in case of TAP devices which use ethernet frames. We should use the best information available, i.e. MAC address for TAP devices, IP of GW or network address if GW is not specified for TUN devices.

I also think we should do this.
So, first we take the IP address of the route table's first entry and check whether it can be found in the ARP table. If not, we just use the IP address to calculate the network ID.

(In reply to Kershaw Chang [:kershaw] from comment #3)

This is just one possibility how to override default GW. You shouldn't try to interpret all routes, instead get a route for some predefined host like we do in NetlinkService:
https://searchfox.org/mozilla-central/rev/f372e8a46ef7659ef61be9938ec2a3ea34d343c6/netwerk/system/netlink/NetlinkService.cpp#512

Unfortunately, NetLink is not available on MacOS.

I know, but there must be some other mechanism with similar functionality.

I also think we should do this.
So, first we take the IP address of the route table's first entry and check if the IP address can be found in ARP table. If not, we just use the IP address to calculate network id.

This probably won't work correctly just after a VPN connection is established using a TAP device. The check might run before or after the MAC address is discovered, so the network ID would vary.

Attached file Bug 1579424 - WIP (obsolete) (deleted)

(In reply to Michal Novotny [:michal] from comment #4)

(In reply to Kershaw Chang [:kershaw] from comment #3)

I know, but there must be some other mechanism with similar functionality.

It seems we could try to use NKE, but I don't have time to study it right now.

I just uploaded a draft patch. It hashes the name of the gateway's network interface together with that interface's IP and MAC address to produce the network ID.
This is not that robust, but it's able to identify some common VPN networks for now.
Since I am OOO next week, Junior will help to complete this bug.

I also think we should do this.
So, first we take the IP address of the route table's first entry and check if the IP address can be found in ARP table. If not, we just use the IP address to calculate network id.

This probably won't work correctly just after establishing VPN connection using TAP device. The check might be done before or after the MAC address is discovered and the network ID would vary.

I've tried some VPN software, but I can't find one that uses TAP. Do you know how to test this?

Assignee: kershaw → juhsu

I've tried some VPN software, but I can't find one that uses TAP. Do you know how to test this?

FWIW, the Mozilla VPN used through Viscosity supports TAP; it adds rows with the network interface vtap0:

Destination        Gateway            Flags        Refs      Use   Netif Expire
default            192.168.0.1        UGSc          139        0     en0       
default            10.48.240.1        UGScI           1        0   vtap0       

However, as comment 2 said, each VPN client has its own implementation. (At least my tun setup is different from kershaw's.)

(In reply to Kershaw Chang [:kershaw] from comment #6)

I've tried some VPN software, but I can't find one that uses TAP. Do you know how to test this?

It's easy to set up your own VPN. You could also just override the default route on an ethernet/WLAN device, and we should be able to detect it. E.g. your default GW is 192.168.0.1 on en0 and you can redirect the traffic via a host 192.168.0.100 by adding the following routes:

0.0.0.0/5 via 192.168.0.100 dev en0
8.0.0.0/7 via 192.168.0.100 dev en0
11.0.0.0/8 via 192.168.0.100 dev en0
12.0.0.0/6 via 192.168.0.100 dev en0
16.0.0.0/4 via 192.168.0.100 dev en0
32.0.0.0/3 via 192.168.0.100 dev en0
64.0.0.0/2 via 192.168.0.100 dev en0
128.0.0.0/3 via 192.168.0.100 dev en0
160.0.0.0/5 via 192.168.0.100 dev en0
168.0.0.0/6 via 192.168.0.100 dev en0
172.0.0.0/12 via 192.168.0.100 dev en0
172.32.0.0/11 via 192.168.0.100 dev en0
172.64.0.0/10 via 192.168.0.100 dev en0
172.128.0.0/9 via 192.168.0.100 dev en0
173.0.0.0/8 via 192.168.0.100 dev en0
174.0.0.0/7 via 192.168.0.100 dev en0
176.0.0.0/4 via 192.168.0.100 dev en0
192.0.0.0/9 via 192.168.0.100 dev en0
192.128.0.0/11 via 192.168.0.100 dev en0
192.160.0.0/13 via 192.168.0.100 dev en0
192.169.0.0/16 via 192.168.0.100 dev en0
192.170.0.0/15 via 192.168.0.100 dev en0
192.172.0.0/14 via 192.168.0.100 dev en0
192.176.0.0/12 via 192.168.0.100 dev en0
192.192.0.0/10 via 192.168.0.100 dev en0
193.0.0.0/8 via 192.168.0.100 dev en0
194.0.0.0/7 via 192.168.0.100 dev en0
196.0.0.0/6 via 192.168.0.100 dev en0
200.0.0.0/5 via 192.168.0.100 dev en0
208.0.0.0/4 via 192.168.0.100 dev en0
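
The route list above can be checked mechanically. The sketch below (a hypothetical helper built on Python's `ipaddress` module, not part of any patch) computes which IPv4 ranges these 30 routes leave uncovered.

```python
import ipaddress

# Destination prefixes of the 30 routes listed above.
ROUTES = """
0.0.0.0/5 8.0.0.0/7 11.0.0.0/8 12.0.0.0/6 16.0.0.0/4 32.0.0.0/3 64.0.0.0/2
128.0.0.0/3 160.0.0.0/5 168.0.0.0/6 172.0.0.0/12 172.32.0.0/11 172.64.0.0/10
172.128.0.0/9 173.0.0.0/8 174.0.0.0/7 176.0.0.0/4 192.0.0.0/9 192.128.0.0/11
192.160.0.0/13 192.169.0.0/16 192.170.0.0/15 192.172.0.0/14 192.176.0.0/12
192.192.0.0/10 193.0.0.0/8 194.0.0.0/7 196.0.0.0/6 200.0.0.0/5 208.0.0.0/4
""".split()


def uncovered(prefixes):
    """Return the IPv4 networks not covered by any of the given prefixes."""
    remaining = [ipaddress.ip_network("0.0.0.0/0")]
    for prefix in prefixes:
        route = ipaddress.ip_network(prefix)
        next_remaining = []
        for block in remaining:
            if route.subnet_of(block):
                # Carve the route out of the block, keeping the rest.
                next_remaining.extend(block.address_exclude(route))
            elif not block.subnet_of(route):
                # Disjoint from the route: keep the block untouched.
                next_remaining.append(block)
        remaining = next_remaining
    return [str(net) for net in ipaddress.collapse_addresses(remaining)]
```

`uncovered(ROUTES)` returns `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, and `224.0.0.0/3`, so traffic to every other destination is diverted to 192.168.0.100 without touching the 0.0.0.0/0 route.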

At the moment we don't have a reliable way to get the MAC address of the gateway for the case in comment 4.
Netlink doesn't work on OS X. Apple developer support even recommended against relying on the MAC address or BSSID in a shipping product. (See this; it is filed under Core OS components, which should cover OS X.)

Network Kernel Extensions provide APIs for

  • filter KPIs (kernel programming interfaces) at different layers,
  • interface KPIs to gather information about interfaces and so on,
  • protocol plumber KPIs.

The only usable KPI here might be the interface KPI, which plays a role similar to getifaddrs.

I tried to find hints in Apple developer support, with no luck.

On the other hand, the remote MAC is not consistent for a VPN. That is, the gateway's MAC address is subject to change.
Given the above context, hash(ip, interface name?) might be another way to generate the network ID.
OS X indexes interface names (e.g., vtap0, vtap1, ..., utun0, utun1, ...), which might make a good key for the hash.

Here's the implementation plan:

  1. Hash the MACs of all gateways for the predefined destinations (as comment 2 said). Off the top of my head, the predefined list is {default, 0/1, 128.0/1}.
  2. If the MAC isn't available via the ARP table, consider hashing all the IPs and interface names (this needs testing with different VPN clients).
  3. Otherwise, hash the MAC/IP of all qualified interfaces. "Qualified" means IPv4 and layer 2, for calculating the IPv4 network ID.
  4. Create telemetry probes to see what portion of IDs come from 1, 2, or 3.

Hello michal,
Could you check if the implementation plan in comment 9 makes sense?
On the other hand, Comment 8 looks like tun since tap is on layer 2.

Flags: needinfo?(michal.novotny)

(In reply to Junior [:junior] from comment #9)

At this moment we don't have a reliable way to get mac address of gateway for the case in comment 4.
netlink doesn't work for OSX. Developer support even suggested shipping product against MAC address or BSSID. (see [this]
For Network Kernel Extension

I don't understand the suggestion. BSSID is available on WiFi which uses ethernet frames, so MAC address must be available.

On the other hand, remote MAC is not consistent for a VPN. That is, gateway mac address is up to change.

If VPN uses TAP device, then it's not likely that MAC address of the gateway is going to change.

Here's the implementation plan:

  1. Hash all mac with predefined destination (as comment 2 said). The predefined list OTOMH is {default, 0/1, 128.0/1}

Checking just 0.0.0.0/1 and 128.0.0.0/1 isn't enough. Routes listed in comment #8 are also very often used to override the default gateway. And I guess that on Mac there can be also multiple 0.0.0.0/0 routes with different priority and/or in different tables.

  2. If the mac isn't available via arp table. Consider hash all the ip's and interface name (need to test with different VPN clients)

If the MAC isn't in arp table and the device uses ethernet frames, we should wait until MAC address is discovered.

Flags: needinfo?(michal.novotny)

At this moment we don't have a reliable way to get mac address of gateway for the case in comment 4.
netlink doesn't work for OSX. Developer support even suggested shipping product against MAC address or BSSID. (see [this]
For Network Kernel Extension

I don't understand the suggestion. BSSID is available on WiFi which uses ethernet frames, so MAC address must be available.

Apple has its own security policy for application developers. For example, iOS developers have been unable to get the user's MAC address for years.
Here's the suggestion, quoted from the Apple forum:
"WARNING Given the privacy implications of this it’s likely that this information will not be available in the long term. In fact, we tried to make it unavailable in iOS 9 but withdrew that change after it caused a host of compatibility problems (you can read the backstory in this seven page thread). For a high school final project it’s fine to use this; you could even mention the state of this API in your write up as a perfect example of the tension between technical solutions and social acceptance. However, I strongly recommend against folks using it in a shipping product."

On the other hand, remote MAC is not consistent for a VPN. That is, gateway mac address is up to change.

If VPN uses TAP device, then it's not likely that MAC address of the gateway is going to change.

I mean the MAC address is subject to change between different connections with the same network settings.

Here's the implementation plan:

  1. Hash all mac with predefined destination (as comment 2 said). The predefined list OTOMH is {default, 0/1, 128.0/1}

Checking just 0.0.0.0/1 and 128.0.0.0/1 isn't enough. Routes listed in comment #8 are also very often used to override the default gateway. And I guess that on Mac there can be also multiple 0.0.0.0/0 routes with different priority and/or in different tables.

Thanks. Both make sense.

  2. If the mac isn't available via arp table. Consider hash all the ip's and interface name (need to test with different VPN clients)

If the MAC isn't in arp table and the device uses ethernet frames, we should wait until MAC address is discovered.

The thing is, we don't have a reliable way to know when the MAC address has been discovered. (Maybe there is one.)
For now, perhaps the best I can do is wait for a fixed, heuristically chosen number of seconds?

(In reply to Junior [:junior] from comment #12)

Apple has their own security policy for application developer. For example iOS developer can't gather the user's Mac address years ago.
Here's the suggestion I quote from apple forum:
"WARNING Given the privacy implications of this it’s likely that this information will not be available in the long term. In fact, we tried to make it unavailable in iOS 9 but withdrew that change after it caused a host of compatibility problems (you can read the backstory in this seven page thread). For a high school final project it’s fine to use this; you could even mention the state of this API in your write up as a perfect example of the tension between technical solutions and social acceptance. However, I strongly recommend against folks using it in a shipping product."

It probably makes sense in case of iOS, which is this case. I really don't believe MAC is not available on MacOS.

If VPN uses TAP device, then it's not likely that MAC address of the gateway is going to change.

I mean the MAC address should be up to change for different connections with the same network setting.

Are we talking about MAC address of the gateway on the local network, or MAC address of the gateway in VPN?

The thing is we don't have a reliable way for the MAC address discovery. (Maybe there is one.)
At the moment what I can do might be wait for a fixed heuristic second?

If there is no way to listen for changes in ARP table, then we might try to get the MAC again after some time.
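
A bounded retry of this kind could look like the sketch below. Everything here is an assumption for illustration: `lookup_mac` stands in for whatever ARP table query is used, and the attempt count and delay are arbitrary.

```python
import time


def wait_for_mac(lookup_mac, gateway_ip, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Poll lookup_mac(gateway_ip) until it returns a MAC or attempts run out.

    lookup_mac is a hypothetical callable returning a MAC string, or None
    while the ARP entry is still incomplete.
    """
    for attempt in range(attempts):
        mac = lookup_mac(gateway_ip)
        if mac:
            return mac
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give ARP resolution time to complete
    return None  # caller falls back to another identifier
```

Passing `sleep` as a parameter keeps the helper testable without real delays. A caller that receives `None` would still need a fallback, such as the gateway IP.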

I mean the MAC address should be up to change for different connections with the same network setting.

Are we talking about MAC address of the gateway on the local network, or MAC address of the gateway in VPN?

arp(gateway in routing table)

This doesn't answer my question. Anyway, if the VPN uses a TAP device, then it behaves like any other ethernet network. So when the traffic is routed via a VPN, there must be some gateway in the VPN which has some MAC address, and it won't change. If it changes, it's a network change.

The thing is we don't have a reliable way for the MAC address discovery. (Maybe there is one.)
At the moment what I can do might be wait for a fixed heuristic second?

If there is no way to listen for changes in ARP table, then we might try to get the MAC again after some time.

Bug 1584165 is about to do this.

Note of my observation.
tun case:

Destination        Gateway            Flags        Refs      Use   Netif Expire
default            10.128.44.1        UGSc          138        0     en0       
default            link#23            UCSI            0        0  utun10       

The routing table shows that the gateway is a link-layer entry (link#23).
LLADDR can't get the MAC from the sockaddr_dl:
https://searchfox.org/mozilla-central/rev/05a22d864814cb1e4352faa4004e1f975c7d2eb9/netwerk/system/mac/nsNetworkLinkService.mm#121

DST GATEWAY MASK
0.0.0.0 10.128.44.1 132.0.5.4 
0.0.0.0 0.0.0.0 124.0.5.4 

The masks in the route table dump look weird to me.

tap case:
Please see comment 7 for an example.
The gateway IP on the tap interface is not in the ARP table.
Here's another clue from the arp command:

$ arp 10.48.240.1
? (10.48.240.1) at (incomplete) on vtap0 ifscope [ethernet]

The parsed mask for tap is also weird.

DST GATEWAY MASK
0.0.0.0 10.128.44.1 128.0.5.4 
0.0.0.0 10.48.240.1 132.0.5.4 

This is the table without VPN FYI

DST GATEWAY MASK
0.0.0.0  10.128.44.1 124.0.5.4 

Therefore, we can tell it's a VPN, but need to find another way to hash.
Will upload patch later.

Note that we are still missing some parts, such as the predefined destination filter, off the top of my head.
Maybe there are more parts missing.
Viscosity is the only client I have tested. I could try more free VPNs later.
ProtonVPN allegedly fails with this approach.

This approach is assuming the ARP table is stable.
Therefore Bug 1584165 is needed.

Here are some results from the US:

NordVPN

  • IKEv2
    -- additional default layer 2 gateway with ifname: ipsec0
  • OpenVPN TCP or UDP
    -- additional default layer 2 gateway with ifname: utun3

Surfshark, ProtonVPN Basic (without Secure Core, which can't be tried for free?)

  • additional default layer 2 gateway with ifname: ipsec0

PrivateVPN

  • OpenVPN
    -- additional 0 and 128.0 layer 3 gateway (same IP) with ifname: utun3
  • L2TP
    -- additional default layer 2 gateway with ifname: ppp0

ExpressVPN (ipv6 parse failed)

  • Auto, OpenVPN UDP, OpenVPN TCP
    -- additional 0 and 128.0 layer 3 gateway (same IP) with ifname: utun3
  • IKEv2
    -- additional default layer 2 gateway with ifname: ipsec0
  • L2TP
    -- additional default layer 2 gateway with ifname: ppp0

It looks like we can detect that there's a VPN, and tell VPNs with different protocols apart, but switching between VPNs that use the same protocol with a layer 2 gateway will give the same network ID.

Let's do the arp table cache as a follow up since we might or might not need this.

Any chance to move this forward? Thanks.

Flags: needinfo?(michal.novotny)
Group: mozilla-employee-confidential
Flags: needinfo?(michal.novotny)

I traced again how the network ID works with Netlink, to answer my own question.
It looks like the implementation plan in comment 9 is overturned.

Instead of parsing the routing table, filtering with predefined destination, and worrying about big/little endian, we go with the plan:

Set up a socket with network.netlink.route.check.IPv4 to get the gateway information from the kernel.
Therefore intranet setups should be fine (I haven't tested this).
Since socket(), read(), and write() are blocking, we need to put them on another thread (the socket thread?).
Note that IPv6 always returns sa_len == 0, so it makes no sense to change how we get the key for IPv6.

Once the network-id-change event lands, we will need to adjust for the multithreaded approach here.

Questions:
What I don't understand is: it looks like we're still gathering the routing table?
https://searchfox.org/mozilla-central/rev/1fe0cf575841dbf3b7e159e88ba03260cd1354c0/netwerk/system/netlink/NetlinkService.cpp#1443-1516
Or we only gather the default destination, instead of predefined destination.

Also, should we reuse the fall-back algorithm in P3 for the gateway provided by kernel?

Anything I missed off the top of your head, michal?

Flags: needinfo?(michal.novotny)

Since socket() read() write() are blocking, we need to put them in another thread (socket thread?)

We already did it off main thread.

Questions:
Which I don't understand is: looks like we're still gathering the routing table?
https://searchfox.org/mozilla-central/rev/1fe0cf575841dbf3b7e159e88ba03260cd1354c0/netwerk/system/netlink/NetlinkService.cpp#1443-1516
Or we only gather the default destination, instead of predefined destination.

I haven't got an answer yet. I hash both the default gateways from the routing table and the route to the predefined address from the kernel.

(In reply to Junior [:junior] from comment #27)

Questions:
Which I don't understand is: looks like we're still gathering the routing table?
https://searchfox.org/mozilla-central/rev/1fe0cf575841dbf3b7e159e88ba03260cd1354c0/netwerk/system/netlink/NetlinkService.cpp#1443-1516
Or we only gather the default destination, instead of predefined destination.

On Linux/Android we hash all default routes we find, no matter what priority they have or which tables they are stored in. Then we hash the information about where the packets are sent. E.g. let's say we have the following 2 default routes and a response to the route query; then we'll hash it all:
default via 192.168.1.1 dev eth0
default via 10.0.0.1 dev wlan0
23.219.91.27 via 192.168.1.1 dev eth0 src 192.168.1.100

When for whatever reason the traffic is switched to use wlan0 then we would get a different ID because the hashed data would be:
default via 192.168.1.1 dev eth0
default via 10.0.0.1 dev wlan0
23.219.91.27 via 10.0.0.1 dev wlan0 src 10.0.0.50

Even if we fail to detect some default route, we would still be able to differentiate the ID according to where the traffic is routed, e.g.:
default via 192.168.1.1 dev eth0
23.219.91.27 via 10.0.0.1 dev wlan0 src 10.0.0.50
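
The three scenarios above can be reproduced with a small sketch. The text-line input format, SHA-1, and base64 encoding are assumptions chosen for illustration, not taken from NetlinkService; `network_id` is a hypothetical helper.

```python
import base64
import hashlib


def network_id(default_routes, probe_route):
    """Hash all default routes plus the route-query answer into one ID."""
    digest = hashlib.sha1()  # assumption: any stable hash would do here
    for line in list(default_routes) + [probe_route]:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent lines cannot run together
    return base64.b64encode(digest.digest()).decode("ascii")


DEFAULTS = ["default via 192.168.1.1 dev eth0", "default via 10.0.0.1 dev wlan0"]

via_eth0 = network_id(DEFAULTS, "23.219.91.27 via 192.168.1.1 dev eth0 src 192.168.1.100")
via_wlan0 = network_id(DEFAULTS, "23.219.91.27 via 10.0.0.1 dev wlan0 src 10.0.0.50")
# Third scenario: one default route was not detected, traffic still goes via wlan0.
missed_default = network_id(DEFAULTS[:1], "23.219.91.27 via 10.0.0.1 dev wlan0 src 10.0.0.50")
```

Because the route-query answer is part of the hashed data, `via_eth0`, `via_wlan0`, and `missed_default` all differ, even though the first two share the same set of default routes.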

Flags: needinfo?(michal.novotny)

Cool, it looks like they take the same approach.
A minor difference is that the Mac OS code hashes "0.0.0.0" instead of the default gateways because netmask information is lacking.

Attachment #9100354 - Attachment description: Bug 1579424 - P4 bail out non-predefined destination in routing table → Bug 1579424 - P4 hash the routing of predefine IP
Attachment #9100100 - Attachment description: Bug 1579424 - P1 Traverse the whole routing table → Bug 1579424 - P1 Traverse the whole routing table, r=michal
Attachment #9100101 - Attachment description: Bug 1579424 - P2 Calculate network id not only using the main gateway → Bug 1579424 - P2 Calculate network id not only using the main gateway, r=michal
Attachment #9100102 - Attachment description: Bug 1579424 - P3 Use ifname and ip as a fail over → Bug 1579424 - P3 Use ifname and ip as a fail over, r=michal
Attachment #9100354 - Attachment description: Bug 1579424 - P4 hash the routing of predefine IP → Bug 1579424 - P4 asking kernel the gateway of pre-defined address, r=michal
Attachment #9100355 - Attachment description: Bug 1579424 - P5 module log for network link service in OSX → Bug 1579424 - P5 module log for network link service in OSX, r=michal
Depends on: 1595630

Sorry for pushing the review but I don't want to miss the train.

Flags: needinfo?(michal.novotny)

Done. Sorry for the delay.

Flags: needinfo?(michal.novotny)
Pushed by juhsu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/224557872391
P1 Traverse the whole routing table, r=michal
https://hg.mozilla.org/integration/autoland/rev/500126439162
P2 Calculate network id not only using the main gateway, r=michal
https://hg.mozilla.org/integration/autoland/rev/1d6992a1811e
P3 Use ifname and ip as a fail over, r=michal
https://hg.mozilla.org/integration/autoland/rev/bc41fd81784b
P4 asking kernel the gateway of pre-defined address, r=michal
https://hg.mozilla.org/integration/autoland/rev/1159b7bcfbd2
P5 module log for network link service in OSX, r=michal

This warning seems to fire on every startup, is that indicative of some problem or can we turn the warning off?
https://searchfox.org/mozilla-central/rev/04d8e7629354bab9e6a285183e763410860c5006/netwerk/system/mac/nsNetworkLinkService.mm#606

Flags: needinfo?(juhsu)

I've removed the preference in bug 1596419 and forgot to update this code. I'll file a bug to fix this.

Flags: needinfo?(juhsu)
Depends on: 1600811

(In reply to Michal Novotny [:michal] from comment #37)

I've removed the preference in bug 1596419 and forgot to update this code. I'll file a bug to fix this.

In fact, I removed it in bug 1593693 which landed earlier. Anyway, it will be fixed in bug 1600811.

Depends on: 1600820

Hi there, sorry to comment on a closed bug but I was hoping this would solve an issue I was having but it hasn't seemed to.

I'm using Firefox 72.0.1 on macOS 10.14.6 (18G2022), but this issue was present from at least Firefox 70.

The issue I see is:

I have Firefox open, and try visiting a site that requires me to connect to my company's VPN. For example, I might hit GitHub's SSO signin page because I'm trying to visit a repo that requires SSO.

I then connect to my company's VPN using Cisco AnyConnect (4.8.01090) and other software resumes working, eg Slack.

In Firefox, if I then try to proceed directly with the page I visited before connecting to the VPN, such as the GitHub SSO page, my requests hang. I have to either wait some time or force refresh to get the page to load again.

My hunch here is that Firefox is not detecting the network change that came from connecting to the VPN and is still trying to use an idle connection established before connecting to the VPN. Waiting or force reloading has it use a new connection which then works.

Is this scenario meant to be covered by this bug? I had searched a bit before finding this one and had a hard time finding anything that described my trouble like this issue.

Thanks!

(In reply to dpiddy from comment #39)

My hunch here is that Firefox is not detecting the network change that came from connecting to the VPN and is still trying to use an idle connection established before connecting to the VPN. Waiting or force reloading has it use a new connection which then works.

Could you please check at about:networking#networkid what's the network ID before and after connecting to the VPN?

Flags: needinfo?(dpiddy)

While connected to the VPN: lO+izgovRRgEQaAYYGaNS3ONc7k=
After disconnecting from the VPN: Wd9/djvPMUrowsnGQsQ1HTIVmS0=

With "Autorefresh every 3 seconds" checked I saw it change shortly after disconnecting, and after reconnecting it changed back to lO+izgovRRgEQaAYYGaNS3ONc7k=.

Flags: needinfo?(dpiddy)

Does it sound like the issue I'm seeing should be covered by this bug or should I open a new one? Thanks!

Attachment #9098506 - Attachment is obsolete: true