Closed
Bug 1439231
Opened 7 years ago
Closed 7 years ago
Firefox TRR Mode 3 semi-reliably crashes Mac
Categories
(Core :: Networking, defect, P1)
RESOLVED
FIXED
People
(Reporter: ekr, Assigned: bagder)
References
Details
(Whiteboard: [necko-triaged][trr])
Repro
1. Turn on mode 3
2. Restart Firefox
3. Type something in the URL bar
Results:
Crash with the following stack trace
*** Panic Report ***
panic(cpu 3 caller 0xffffff800ec0f8af): assertion failed: inp->inp_flowhash != 0, file: /BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4570.41.2/bsd/netinet/tcp_output.c, line: 1860
Backtrace (CPU 3), Frame : Return Address
0xffffff9225bcb800 : 0xffffff800e84f606
0xffffff9225bcb850 : 0xffffff800e97c654
0xffffff9225bcb890 : 0xffffff800e96e149
0xffffff9225bcb910 : 0xffffff800e801120
0xffffff9225bcb930 : 0xffffff800e84f03c
0xffffff9225bcba60 : 0xffffff800e84edbc
0xffffff9225bcbac0 : 0xffffff800ec0f8af
0xffffff9225bcbc60 : 0xffffff800ec1cbb4
0xffffff9225bcbcc0 : 0xffffff800ed722bc
0xffffff9225bcbde0 : 0xffffff800ed82b93
0xffffff9225bcbed0 : 0xffffff800ed82891
0xffffff9225bcbf40 : 0xffffff800edfa978
0xffffff9225bcbfa0 : 0xffffff800e801906
BSD process name corresponding to current thread: firefox
Mac OS version:
17D47
Comment 1•7 years ago (Reporter)
Obviously this is a macOS defect, but given that this seems to be the only thing in Firefox that triggers it, we should probably try to figure out how to avoid it.
Comment 2•7 years ago
FWIW, a Google search turns up a similar problem related to a VPN, but that's the only search result for that assertion.
Assignee: nobody → daniel
Blocks: 1434852
Updated•7 years ago (Assignee)
Priority: -- → P1
Whiteboard: [necko-triaged][trr]
Comment 3•7 years ago (Assignee)
Do you have any further details on the stack with some Firefox code? I mean, which particular function/method in Firefox triggers this problem?
Comment 4•7 years ago (Reporter)
No. Because the entire operating system crashes, no stack gets gathered.
Comment 5•7 years ago (Reporter)
At least none I was able to find.
Comment 6•7 years ago (Assignee)
Not that it helps much, but it looks like it is this assert: https://github.com/apple/darwin-xnu/blob/master/bsd/netinet/tcp_output.c#L1860
Comment 7•7 years ago (Assignee)
Ack, I could easily reproduce this on my Mac (a Mac mini fully updated with the latest macOS version). I also repeatedly get the *exact* same backtrace as the one shown in the original post.
When this crash happens, the machine reboots instantly, so it is really hard to figure out exactly what Firefox did to trigger it:
1. Regular MOZ_LOG logging of "nsHostResolver" to a file isn't flushed often enough, so the file ends up blank after the reboot.
2. Running 'mach run --debug' doesn't help because the debugger doesn't catch any problem before the reboot.
Comment 8•7 years ago (Assignee)
I also tried using "sync" with MOZ_LOG, but the last lines it caught were not really helpful:
[838:DNS Resolver #1]: D/nsHostResolver CompleteLookup: [dns host name] has [IP]
[838:DNS Resolver #1]: D/nsHostResolver nsHostResolver record 0x10344f680 calling back dns users
[838:Socket Thread]: D/nsHostResolver Checking blacklist for host [dns host name], host record [0x10344f680].
(repeated a few times)
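For anyone trying to reproduce this, the synchronous logging setup mentioned above can be scripted roughly as follows. This is a minimal sketch, not the exact invocation used here: `moz_log_env`, the log level 5 and the log file path are assumptions for illustration, while `MOZ_LOG`, `MOZ_LOG_FILE`, the `sync` option and the `nsHostResolver` module are the standard Gecko logging knobs.

```python
import os


def moz_log_env(modules, log_file, level=5, sync=True, base_env=None):
    """Build an environment dict that enables Gecko MOZ_LOG logging.

    `moz_log_env` is a hypothetical helper; level 5 (verbose) and the
    file path passed in by the caller are assumptions for illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    parts = []
    if sync:
        # "sync" makes each log line get written synchronously, so
        # more of the log survives an instant reboot.
        parts.append("sync")
    parts.extend(f"{module}:{level}" for module in modules)
    env["MOZ_LOG"] = ",".join(parts)
    env["MOZ_LOG_FILE"] = log_file
    return env


if __name__ == "__main__":
    env = moz_log_env(["nsHostResolver"], "/tmp/trr-crash.log")
    print(env["MOZ_LOG"], env["MOZ_LOG_FILE"])
    # The dict can then be passed to a launcher, for example
    # subprocess.run([...], env=env), with whatever command starts Firefox.
```

Even with `sync`, the last lines written before a kernel panic may still not reach the disk, which matches what was seen above.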
Comment 9•7 years ago (Assignee)
(Apple at least looked at my bug report yesterday)
"Engineering has determined that your bug report (37706926) is a duplicate of 34406902 and will be closed."
Comment 10•7 years ago
Daniel and I had a good meeting on this bug today.
1 - Daniel has observed through testing that the crash is linked to AAAA lookups. If we tweak TRR to not look up AAAA, the crash appears to be gone (though the testing isn't extensive).
2 - We observe that neither Daniel nor ekr has end-to-end v6. Daniel has a link-local v6 address.
3 - Logging with TRR off (mode 0) shows that no v6 address is returned to nsHostResolver from the system resolver. This is a bit different from other platforms, e.g. Linux sees v6 addresses here and then has to fall back when it cannot use them.
4 - Hypothesis: the system resolver normally filters v6 addresses when v6 is not confirmed to be working. When TRR bypasses the system resolver and tries to use addresses for which there is no connectivity (maybe no route?), the kernel panics.
5 - Todo: Daniel to have TRR honor the disable-v6 Gecko pref. This is not very useful (it's normally set to enable, independent of the connectivity).
6 - Todo: Daniel to add a pref to disable AAAA only on Mac as a temporary workaround.
7 - Todo: Patrick to ask around and see if there is a way to query when v6 would be returned by the system resolver.
8 - Todo: Daniel to consider the feasibility of adding a v6 probe for AAAA example.com using the system resolver to determine when to bypass v6. Don't do it in TRR-only mode.
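To make the probe idea in item 8 concrete, here is a rough sketch in Python rather than Gecko code. The function names are hypothetical, and the behaviour in TRR-only mode (skip the probe and leave AAAA lookups enabled) is an assumption, since the notes above only say not to probe in that mode.

```python
import socket


def system_resolver_returns_v6(host="example.com", resolve=socket.getaddrinfo):
    """Return True if the system resolver hands back any IPv6 address.

    `resolve` is injectable so the check can be exercised without
    network access; by default it is the real socket.getaddrinfo.
    """
    try:
        results = resolve(host, None, socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        # socket.gaierror is a subclass of OSError: treat a failed
        # lookup as "no v6 available".
        return False
    return any(family == socket.AF_INET6 for family, *_ in results)


def should_query_aaaa(trr_only_mode, probe=system_resolver_returns_v6):
    """Decide whether TRR should send AAAA queries (hypothetical policy)."""
    if trr_only_mode:
        # Assumption: in TRR-only mode the system resolver is not
        # consulted, so the probe is skipped and AAAA stays enabled.
        return True
    return probe()


if __name__ == "__main__":
    print("query AAAA:", should_query_aaaa(trr_only_mode=False))
```

This only mirrors what the system resolver chooses to return. It does not test actual v6 connectivity.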
Comment 11•7 years ago
It has been suggested that this might be related to a known TFO (TCP Fast Open) bug in macOS that is believed to be fixed in the 10.13.4 beta.
Comment 12•7 years ago (Assignee)
Confirmed! It is certainly TFO related. When I switch off "network.tcp.tcp_fastopen_enable", I can leave the AAAA code in there, and I've been able to click around and open and close many tabs to an extent that was not previously possible on the Mac.
Comment 13•7 years ago (Assignee)
bug 1444453 has landed, which indirectly solves this one as well.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED