arm64 nightly: tab crash on github documentation/wiki pages
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox-esr60 | --- | unaffected |
| firefox65 | --- | unaffected |
| firefox66 | --- | fixed |
| firefox67 | --- | fixed |
People
(Reporter: unixsmurf, Assigned: mgaudet)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [Feb 14: awaiting crash report to move out of Core:General])
Attachments
(1 file)
text/x-phabricator-request (deleted) | lizzard: approval-mozilla-beta+
User Agent: Mozilla/5.0 (Windows NT 10.0; rv:67.0) Gecko/20100101 Firefox/67.0
Steps to reproduce:
On arm64 windows nightly, loaded https://help.github.com/articles/associating-an-email-with-your-gpg-key/.
Actual results:
Loads seemingly fine, then after 5-10 seconds, the tab crashes. Clicking "Restore this tab" just repeats the process.
I have seen this on multiple documentation/wiki pages on github - not just the github documentation, but project documentation as well.
First noticed a day or two ago. Definitely affects 67.0a1 (2019-01-30) and (2019-01-31).
Tested both on Lenovo Yoga C630 and HP Envy X2. Both running windows version 1803 on the "windows insider" slow track.
Expected results:
Tab not crashed.
Comment 1•6 years ago
Not reproducible for me.
Tested on following builds. Given link is loaded and no crashes observed.
Build ID 20190130215539
User Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0
Build ID 20190205023948
User Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0
Comment 2•6 years ago
Can you provide a crash report ID from about:crashes? Thanks.
For the future, you can file the bug from the crash report website and it will pre-fill fields in Bugzilla.
Comment 3•6 years ago (Reporter)
Hmm.
about:crashes says only "No crash reports have been submitted."
"Tab crash reporter" contains only
Gah. Your tab just crashed.
We can help!
Choose Restore This Tab to reload the page.
And "Close tab", "Restore this tab" buttons.
Sorry, I don't know the URL to the crash report website, and google isn't helping me.
Comment 4•6 years ago
Hmm… do you have crash reporting turned on? If you open about:preferences and search for crash, do you have
[x] "Allow Nightly to send technical and interaction data to Mozilla"
checked?
Comment 5•6 years ago (Reporter)
(Apologies for delay - the last few nights' arm64 Windows builds have been completely non-functional, but now working again.)
Yes, "Allow Nightly to send technical and interaction data to Mozilla" is checked.
Comment 6•6 years ago
Ted, any ideas for why there are no crash reports for this user?
Comment 7•6 years ago
There are some known issues with crash reporting on aarch64-windows currently, but I'm not actively involved in any of that work.
Comment 8•6 years ago (Reporter)
Umm, this bug wasn't about crashreporting - that was just something I was asked to do to provide more input to the actual problem. So I'm not sure RESOLVED:DUPLICATE of the crash reporting bug is the appropriate state?
Comment 9•6 years ago
Ok, moving to Core::General then, since it's very unlikely this issue is in browser/ code.
If you're able to use a debugger (lldb or gdb) then you could get the crash stack. Otherwise we'll have to wait for bug 1526276 or until someone else can reproduce this.
Comment 10•6 years ago
Andrew, since you have Lenovo Yoga C630, can you please try to repro?
Comment 11•6 years ago
(The crash reporting stuff is being worked on in bug 1526276.)
This reliably reproduces for me but of course due to bug 1526276 we can't tell why so back to Core:General we go :)
Note that Leif reported this before Ion was an option and I have Ion turned on and still crash after a few seconds so it's not related to Ion.
Comment 12•6 years ago
> This reliably reproduces for me but of course due to bug 1526276 we can't
> tell why so back to Core:General we go :)
Y'know... WinDbg is your friend :-)
This is a crash in JITted code. The memory at xip1 (aka x17) is inaccessible. I can't get a stack because we don't generate proper unwind info. I don't suppose anyone from JS could glance at this disassembly and magically know where it came from?
000001dc`c7ac1590 9100039f mov sp,x28
000001dc`c7ac1594 cb30ef9c sub x28,x28,xip0 sxtx #3
000001dc`c7ac1598 9278df9c and x28,x28,#-0x100
000001dc`c7ac159c 9100039f mov sp,x28
000001dc`c7ac15a0 aa1c03f1 mov xip1,x28
000001dc`c7ac15a4 ea01003f tst x1,x1
000001dc`c7ac15a8 540000a0 beq 000001dc`c7ac15bc
000001dc`c7ac15ac f8408458 ldr x24,[x2],#8
000001dc`c7ac15b0 f8008638 str x24,[xip1],#8 <<<<<<<<<<<<<< crash here
000001dc`c7ac15b4 f1000610 subs xip0,xip0,#1
000001dc`c7ac15b8 54ffffa1 bne 000001dc`c7ac15ac
000001dc`c7ac15bc b94000f0 ldr wip0,[x7]
000001dc`c7ac15c0 d100439f sub sp,x28,#0x10
000001dc`c7ac15c4 a9bf4384 stp x4,xip0,[x28,#-0x10]!
000001dc`c7ac15c8 cb1c0273 sub x19,x19,x28
000001dc`c7ac15cc d378de73 lsl x19,x19,#8
Comment 13•6 years ago
Actually maybe I can take a stab at this...
xip1 was set equal to sp. They have the value of 00000033c7dea700.
That address is in a MEM_RESERVE region from 00000033c7c00000 to 00000033c7ded000.
The next block above that (starting at 00000033c7ded000) has PAGE_READWRITE|PAGE_GUARD bits, suggesting that the next block above that was our stack.
It sounds like we grew the stack in such a sudden increment that we didn't take a guard page fault?
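The address reasoning above can be checked with a few lines of arithmetic. This is only a sketch using the values quoted in this comment; the variable names are made up for illustration and are not WinDbg or Windows API names.

```python
# Values quoted in the comment above; names are illustrative only.
fault_addr = 0x00000033C7DEA700     # value of xip1 / sp at the crash
reserve_start = 0x00000033C7C00000  # start of the MEM_RESERVE region
guard_start = 0x00000033C7DED000    # end of that region, start of the guard block

# The faulting address lies inside the reserved (uncommitted) region...
in_reserved = reserve_start <= fault_addr < guard_start

# ...and sits this many bytes below the start of the guard block.
bytes_below_guard = guard_start - fault_addr

print(in_reserved)             # True
print(hex(bytes_below_guard))  # 0x2900
```

So the store landed 0x2900 bytes below the guard block, in memory that was reserved but never committed, which is consistent with the stack pointer having jumped past the guard page.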
Comment 14•6 years ago
Lars, this is sounding an awful lot like bug 1351278 comment 21 -- with a similar github repro too. Are you aware of any reason that this might remain unfixed on arm64?
Comment 15•6 years ago
000001dc`c7ac1594 cb30ef9c sub x28,x28,xip0 sxtx #3
xip0 was 8768, so we subtracted 70k from the stack all in one go. Definitely the same type of issue...
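For reference, the "70k" figure follows from the `sxtx #3` operand, which shifts the register value left by 3 (multiplies by 8) before the subtraction. A small sketch of that arithmetic, assuming 4 KiB pages (the page size is an assumption, not stated in this bug):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

xip0 = 8768               # register value reported above
adjustment = xip0 << 3    # `sxtx #3`: value shifted left by 3, i.e. times 8

print(adjustment)               # 70144
print(adjustment / PAGE_SIZE)   # 17.125
```

Under that page-size assumption, the stack pointer moved down by roughly 17 pages in a single instruction, far more than one guard page can cover.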
Comment 16•6 years ago
David, can you go back and edit comment 12 and put triple-backticks around the backtrace so that it's possible to read it properly? Presumably the backticks in the addresses from windbg are confusing markdown in a major way.
Anyway, that's compiled JS code that we're looking at (x28 used as a stack pointer gives it away). It's possible that it's a similar problem to what we had with apply in bug 1351278. But the fix there was in platform-independent code, so there's no reason to believe that that bug in particular should be biting on arm64.
Comment 17•6 years ago
(In reply to Lars T Hansen [:lth] from comment #16)
> David, can you go back and edit comment 12 and put triple-backticks around the backtrace so that it's possible to read it properly? Presumably the backticks in the addresses from windbg are confusing markdown in a major way.
Hmph, it looked fine on my end, and I have no edit button. We must be using a different interface. :-)
Comment 18•6 years ago
Lars, in bug 1351278 comment 22 you mentioned a possible exception for arm64. Is that still the case? It was quite some time ago so I admit it's a long shot.
Comment 19•6 years ago
Oh! A search for "sxtx" pointed me to it right away: it's vixl. https://searchfox.org/mozilla-central/rev/38035ee92463c9e9fbca729ac7e66476ac7eb27a/js/src/jit/arm64/Trampoline-arm64.cpp#121
Comment 22•6 years ago (Assignee)
Yeah, let me take a look.
Comment 23•6 years ago (Assignee)
Comment 24•6 years ago (Reporter)
I just got a tab crash on a different site, and the crash reporter finally appeared.
So I figured I would go back and generate a crash report for the aforementioned github URL.
But I can't - it's no longer crashing!
This is on 67.0a1 (2019-02-20) (64-bit).
Comment 25•6 years ago
Comment 26•6 years ago
bugherder
Comment 27•6 years ago
I'm guessing we'll want this on 66 as well. Please nominate for Beta approval when you get a chance.
Comment 28•6 years ago (Assignee)
Comment on attachment 9044291 [details]
Bug 1524419: Incrementally touch stack on arm64 r?tcampbell
Beta/Release Uplift Approval Request
- Feature/Bug causing the regression: None
- User impact if declined: Potential crashes on aarch64 windows
- Is this code covered by automated tests?: Unknown
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Changes stack touching to match the windows ABI, so should only improve things.
- String changes made/needed: None.
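To illustrate why touching the stack incrementally matters, here is a toy model of guard-page stack growth. It is a sketch under stated assumptions, not the actual patch and not a real OS API: `ToyStack`, `grow_in_one_jump`, and `grow_incrementally` are hypothetical names, the page size is assumed to be 4 KiB, and the model assumes a single guard page directly below the committed stack.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class AccessViolation(Exception):
    """Raised when an access lands below the guard page (toy model)."""


class ToyStack:
    """Toy model: committed pages, one guard page below, reserved below that."""

    def __init__(self, committed_low):
        self.committed_low = committed_low  # lowest committed address

    def touch(self, addr):
        if addr >= self.committed_low:
            return  # already committed, nothing to do
        guard_low = self.committed_low - PAGE_SIZE
        if addr >= guard_low:
            # Hit the guard page: commit it; the guard moves one page down.
            self.committed_low = guard_low
            return
        # Skipped past the guard page into merely reserved memory.
        raise AccessViolation(hex(addr))


def grow_in_one_jump(stack, sp, size):
    """Move sp down by `size` and touch only the final address."""
    new_sp = sp - size
    stack.touch(new_sp)
    return new_sp


def grow_incrementally(stack, sp, size):
    """Move sp down by `size`, touching one address per page on the way."""
    new_sp = sp - size
    probe = sp
    while probe - PAGE_SIZE > new_sp:
        probe -= PAGE_SIZE
        stack.touch(probe)
    stack.touch(new_sp)
    return new_sp
```

In this model, a single 70144-byte jump from a stack with one committed page raises `AccessViolation`, while the incremental version walks the guard page down one page at a time and succeeds.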
Comment 29•6 years ago
Matthew, touching the stack is only needed on Windows, right? Can we use an #ifdef to disable it for the other platforms?
Comment 30•6 years ago
I thought touching the stack on all platforms was deliberate, no? (bug 1488763 comment 2)
Comment 31•6 years ago
Touching stack is not /strictly/ needed on other platforms, but it doesn't seem worth having divergent behavior over. Linux has still experienced things like stack-clash and Bug 909094.
Comment 32•6 years ago
Comment on attachment 9044291 [details]
Bug 1524419: Incrementally touch stack on arm64 r?tcampbell
Should help avoid a crash and help with testing arm64 on beta.
OK for uplift to beta 12.
Comment 33•6 years ago
bugherder uplift
Comment 34•6 years ago
I couldn't reproduce the issue to check if it's fixed on Firefox Nightly 67.0a1 (2019-01-30) and (2019-01-31) aarch64 builds and non aarch builds on Lenovo Yoga C630-13Q50 with Windows 10.
Leif Lindholm could you please check if the issue is fixed on the latest Firefox nightly and on Firefox 66.0b12?
Thanks.
Comment 35•5 years ago (Reporter)
(In reply to Hani Yacoub from comment #34)
> I couldn't reproduce the issue to check if it's fixed on Firefox Nightly 67.0a1 (2019-01-30) and (2019-01-31) aarch64 builds and non aarch builds on Lenovo Yoga C630-13Q50 with Windows 10.
> Leif Lindholm could you please check if the issue is fixed on the latest Firefox nightly and on Firefox 66.0b12?
Whoops, sorry, did not see the notification for this:
I have not seen this issue again since it started working from 67.0a1 (2019-02-20) (64-bit) as mentioned above.
Comment 36•5 years ago
This has been on Release for some time. User did not see the issue since he logged the bug. Removing the qe-verify+ flag.
Comment 37•5 years ago (Reporter)
User most certainly saw the issue since he logged the bug. Just not since he reported (in comments above) the bug having gone away with 67.0a1 (2019-02-20). Reported against 67.0a1 (2019-01-30).