Open Bug 1446819 Opened 7 years ago Updated 2 years ago

Crash [@ mozalloc_abort] with UNIMPLEMENTED abort message and Debugger

Categories

(Core :: JavaScript Engine, defect, P3)

ARM64
Linux
defect

Tracking

()

ASSIGNED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix

People

(Reporter: decoder, Assigned: nbp, NeedInfo)

References

(Blocks 3 open bugs)

Details

(5 keywords, Whiteboard: [jsbugmon:update,bisect][arm64:m3])

Crash Data

Attachments

(2 files, 1 obsolete file)

The following testcase crashes on mozilla-central revision fcb11e93adf5+ (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize --enable-simulator=arm64, run with --fuzzing-safe):

evaluate(`
  var g = newGlobal(() => eval(fn(\`...\${pattern}, []\`)), SyntaxError);
  g.parent = this;
  g.eval("Debugger(parent).onEnterFrame = function() {};");
  (function() {
    for (var assertEq = -10; assertEq < 10; assertEq++) {}
  } (function() {}))
`);

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x000000000049236a in mozalloc_abort (msg=msg@entry=0x1263858 "Redirecting call to abort() to mozalloc_abort\n") at memory/mozalloc/mozalloc_abort.cpp:33
#0 0x000000000049236a in mozalloc_abort (msg=msg@entry=0x1263858 "Redirecting call to abort() to mozalloc_abort\n") at memory/mozalloc/mozalloc_abort.cpp:33
#1 0x0000000000492340 in abort () at memory/mozalloc/mozalloc_abort.cpp:80
#2 0x00000000009bf3c0 in vixl::Simulator::VisitUnallocated (this=<optimized out>, instr=<optimized out>) at js/src/jit/arm64/vixl/Simulator-vixl.cpp:749
#3 0x000000000094f5b3 in vixl::Decoder::VisitUnallocated (this=<optimized out>, instr=0xfcba3680138) at js/src/jit/arm64/vixl/Decoder-vixl.cpp:872
#4 0x00000000009d4d75 in vixl::Decoder::Decode (instr=<optimized out>, this=<optimized out>) at js/src/jit/arm64/vixl/Decoder-vixl.h:158
#5 vixl::Simulator::ExecuteInstruction (this=0x7ffff5f3b000) at js/src/jit/arm64/vixl/MozSimulator-vixl.cpp:195
#6 0x00000000009d73cc in vixl::Simulator::Run (this=0x7ffff5f3b000) at js/src/jit/arm64/vixl/Simulator-vixl.cpp:68
#7 0x00000000009d523d in vixl::Simulator::RunFrom (first=0xfcba364f970, this=0x7ffff5f3b000) at js/src/jit/arm64/vixl/Simulator-vixl.cpp:76
#8 vixl::Simulator::call (this=0x7ffff5f3b000, entry=entry@entry=0xfcba364f970 "\376w\277\251\375\003", argument_count=argument_count@entry=8) at js/src/jit/arm64/vixl/MozSimulator-vixl.cpp:326
#9 0x0000000000622e4e in EnterBaseline (data=..., cx=0x7ffff5f16000) at js/src/jit/BaselineJIT.cpp:151
#10 js::jit::EnterBaselineAtBranch (cx=0x7ffff5f16000, fp=0x7ffff4573140, pc=<optimized out>) at js/src/jit/BaselineJIT.cpp:226
#11 0x000000000056d278 in Interpret (cx=0x7ffff5f16000, state=...) at js/src/vm/Interpreter.cpp:2038
#12 0x000000000056d805 in js::RunScript (cx=0x7ffff5f16000, state=...) at js/src/vm/Interpreter.cpp:417
#13 0x00000000005709bd in js::ExecuteKernel (cx=0x7ffff5f16000, script=..., script@entry=..., envChainArg=..., newTargetValue=..., evalInFrame=..., evalInFrame@entry=..., result=result@entry=0x7ffff4573090) at js/src/vm/Interpreter.cpp:700
#14 0x0000000000570ea1 in js::Execute (cx=cx@entry=0x7ffff5f16000, script=script@entry=..., envChainArg=..., rval=rval@entry=0x7ffff4573090) at js/src/vm/Interpreter.cpp:733
#15 0x0000000000a17d76 in ExecuteScript (cx=cx@entry=0x7ffff5f16000, scope=scope@entry=..., script=script@entry=..., rval=rval@entry=0x7ffff4573090) at js/src/jsapi.cpp:4712
#16 0x0000000000a18354 in ExecuteScript (cx=0x7ffff5f16000, envChain=..., scriptArg=..., rval=0x7ffff4573090) at js/src/jsapi.cpp:4731
#17 0x0000000000a183ea in JS_ExecuteScript (cx=<optimized out>, envChain=..., scriptArg=..., scriptArg@entry=..., rval=...) at js/src/jsapi.cpp:4752
#18 0x000000000046d5f9 in Evaluate (cx=0x7ffff5f16000, argc=<optimized out>, vp=<optimized out>) at js/src/shell/js.cpp:2025
#19 0x00000000005796fd in js::CallJSNative (cx=0x7ffff5f16000, native=0x46cb30 <Evaluate(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/vm/JSContext-inl.h:290
[...]
#33 0x0000000000442f92 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:9410

rax 0x0 0
rbx 0x7ffff6eef700 140737336243968
rcx 0x7ffff6c212dd 140737333301981
rdx 0x0 0
rsi 0x7ffff6ef0770 140737336248176
rdi 0x7ffff6eef540 140737336243520
rbp 0x7fffffffbe80 140737488338560
rsp 0x7fffffffbe70 140737488338544
r8 0x7ffff6ef0770 140737336248176
r9 0x7ffff7fe4780 140737354024832
r10 0x0 0
r11 0x0 0
r12 0x7ffff5f59068 140737319899240
r13 0xfcba3680138 17367294279992
r14 0x7fffffffc100 140737488339200
r15 0x7fffffffc070 140737488339056
rip 0x49236a <mozalloc_abort(char const*)+42>
=> 0x49236a <mozalloc_abort(char const*)+42>: movl $0x0,0x0
0x492375 <mozalloc_abort(char const*)+53>: ud2
Tail of the instruction trace:

...
0x00000b1333c73b24 910003fc mov x28, sp
0x00000b1333c73b28 f940039c ldr x28, [x28]
0x00000b1333c73b2c f840879e ldr x30, [x28], #8
0x00000b1333c73b30 9100039f mov sp, x28
0x00000b1333c73b34 a8c15f93 ldp x19, x23, [x28], #16
0x00000b1333c73b38 9100639c add x28, x28, #0x18 (24)
0x00000b1333c73b3c 9100c2f7 add x23, x23, #0x30 (48)
0x00000b1333c73b40 72001c1f tst w0, #0xff
0x00000b1333c73b44 54000040 b.eq #+0x8 (addr 0xb1333c73b4c)
0x00000b1333c73b48 d61f0260 br x19
0x00000b1333ca4138 ffff00ac unallocated (Unallocated)
Unallocated instruction at 0xb1333ca4138: 0xffff00ac

This looks bad but it also looks a lot like bug 1445992 and they both could in principle be caused by bug 1445907 so it's hard to say for sure without fixing some other things first.
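For readers trying to interpret the faulting word, here is a minimal sketch that splits 0xffff00ac into fields. The field layout (low 15 bits as a size, one flag bit, high 16 bits all ones) is an assumption modelled on how a constant-pool header might be marked; it is not stated anywhere in this bug, and `decodeAsPoolHeader` is a hypothetical helper, not an engine function.

```javascript
// Hypothetical helper: interpret a 32-bit word as a constant-pool header.
// ASSUMED layout (not confirmed in this bug): bits 0-14 hold a size,
// bit 15 is a flag, and bits 16-31 are all ones as a marker.
function decodeAsPoolHeader(word) {
  return {
    size: word & 0x7fff,
    isNatural: ((word >>> 15) & 1) === 1,
    marker: word >>> 16,
  };
}

// The word the simulator refused to decode in the trace above.
const header = decodeAsPoolHeader(0xffff00ac);
console.log(header.marker.toString(16), header.size, header.isNatural);
// prints "ffff 172 false"
```

If that assumed layout is right, the branch target held pool metadata rather than an instruction, which would be consistent with the PoolHeader diagnosis given later in this bug. That reading remains a guess until checked against the backend source.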
Priority: -- → P3
I just noticed that this is triggering very frequently, so I'm marking it as a fuzzblocker. Because only a few, not very powerful machines are fuzzing ARM64, getting top crashers out of the way is really important for finding other bugs.
Flags: needinfo?(lhansen)
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update,bisect][fuzzblocker]
Looks like a wild pointer loaded and used for a branch target. This is JS, not wasm, maybe Sean has some cycles to look into it (since it crashes on Simulator).
Flags: needinfo?(lhansen) → needinfo?(sstangl)
Sean, do you have any updates on this ARM64 abort? It's blocking ARM64 fuzzing from finding other bugs.
Whiteboard: [jsbugmon:update,bisect][fuzzblocker] → [jsbugmon:update,bisect][fuzzblocker][geckoview:crow]
I'm not able to reproduce the crash. Decoder, is this still showing up in fuzzing? If it's a fuzzblocker, could you please attach another testcase that works with the latest m-c? A bunch of ARM64 bugs were fixed in the meantime, and it's possible this could be a dup of one of those.
Flags: needinfo?(choller)
This is an automated crash issue comment:

Summary: Crash [@ mozalloc_abort]
Build version: mozilla-central revision 3c9d69736f4a
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize --enable-simulator=arm64
Runtime options: --fuzzing-safe

Testcase:

var lfLogBuffer = `
  var g = newGlobal("/* 3/* */");
  g.parent = this;
  g.eval("Debugger(parent).onEnterFrame = function() {};");
  \`Assertion failed: expected exception \${ctor.name}, got \${exc}\`;
`;
untemplate = function(s) {
  return s.replace(/\\/g, '\\\\\\').replace(/`/g, '\\\`').replace(/\$/g, '\\\$');
}
loadFile(lfLogBuffer);
loadFile(lfLogBuffer);
function loadFile(lfVarx) {
  try {
    evaluate(lfVarx);
  } catch (lfVare) {}
  lfVarx = untemplate(lfVarx);
}

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x0000000000495e4a in mozalloc_abort (msg=msg@entry=0x12973d0 "Redirecting call to abort() to mozalloc_abort\n") at memory/mozalloc/mozalloc_abort.cpp:34
#0 0x0000000000495e4a in mozalloc_abort (msg=msg@entry=0x12973d0 "Redirecting call to abort() to mozalloc_abort\n") at memory/mozalloc/mozalloc_abort.cpp:34
#1 0x0000000000495e20 in abort () at memory/mozalloc/mozalloc_abort.cpp:81
#2 0x0000000000a192b0 in vixl::Simulator::VisitUnallocated (this=<optimized out>, instr=<optimized out>) at js/src/jit/arm64/vixl/Simulator-vixl.cpp:751
#3 0x00000000009ac393 in vixl::Decoder::VisitUnallocated (this=<optimized out>, instr=0x1a89d02e4244) at js/src/jit/arm64/vixl/Decoder-vixl.cpp:872
#4 0x00000000009e9f35 in vixl::Decoder::Decode (instr=<optimized out>, this=<optimized out>) at js/src/jit/arm64/vixl/Decoder-vixl.h:158
#5 vixl::Simulator::ExecuteInstruction (this=this@entry=0x7ffff5f3b000) at js/src/jit/arm64/vixl/MozSimulator-vixl.cpp:195
#6 0x0000000000a3475c in vixl::Simulator::Run (this=0x7ffff5f3b000) at js/src/jit/arm64/vixl/Simulator-vixl.cpp:70
#7 0x00000000009fdffc in vixl::Simulator::call (this=0x7ffff5f3b000, entry=entry@entry=0x1a89d02b9960 "\376w\277\251\375\003", argument_count=argument_count@entry=8) at js/src/jit/arm64/vixl/MozSimulator-vixl.cpp:327
#8 0x000000000067673b in EnterBaseline (data=..., cx=0x7ffff5f17000) at js/src/jit/BaselineJIT.cpp:151
#9 js::jit::EnterBaselineAtBranch (cx=0x7ffff5f17000, fp=0x7ffff458b390, pc=<optimized out>) at js/src/jit/BaselineJIT.cpp:226
#10 0x00000000005a9209 in Interpret (cx=0x7ffff5f17000, state=...) at js/src/vm/Interpreter.cpp:2037
[...]
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:9301

rax 0x0 0
rbx 0x7ffff6ef6700 140737336272640
rcx 0x7ffff6c282ad 140737333330605
rdx 0x0 0
rsi 0x7ffff6ef7770 140737336276848
rdi 0x7ffff6ef6540 140737336272192
rbp 0x7fffffffc590 140737488340368
rsp 0x7fffffffc580 140737488340352
r8 0x7ffff6ef7770 140737336276848
r9 0x7ffff7fe4780 140737354024832
r10 0x0 0
r11 0x0 0
r12 0x7ffff5f59068 140737319899240
r13 0x1a89d02e4244 29179205534276
r14 0x7fffffffc820 140737488341024
r15 0x7fffffffc790 140737488340880
rip 0x495e4a <mozalloc_abort(char const*)+42>
=> 0x495e4a <mozalloc_abort(char const*)+42>: movl $0x0,0x0
0x495e55 <mozalloc_abort(char const*)+53>: ud2
Still reproduces per comment 6.
Flags: needinfo?(choller)
I took a look at the crash in comment 6. The ARM64 code is failing in the EnterJIT trampoline in the OSR case -- it sets a fake return address of 0x0, and then calls out to js::InitBaselineFrameForOsr() to fill in the return address. That makes its way to initForOsr(), which, since fp->isDebuggee() is true, calls Debugger::replaceFrameGuts(). It looks like replaceFrameGuts() doesn't like the layout of the ARM64 stack, even though the code itself looks identical to the x64/ARM/etc. trampoline code used by every other platform.

I'm not familiar with the Debugger internals, but I'll take a look. In the meantime, since this was blocking fuzzing, it looks like you can work around the issue by passing --ion-osr=off, or by avoiding Debugger objects.
(In reply to Sean Stangl [:sstangl] from comment #8)
> I'm not familiar with the Debugger internals, but I'll take a look. In the
> meantime, since this was blocking fuzzing, it looks like you can work around
> the issue by passing --ion-osr=off, or avoiding using Debugger objects.

The test in comment 6 also reproduces with --ion-osr=off. Avoiding the Debugger object is not trivial because you can get it back in various ways (newGlobal, evalInWorker, etc).
Sorry about the long wait. I'm still not exactly sure what is causing the issue, but this is a workaround that will enable you to continue fuzzing by making sure that the erroneous condition never occurs.

In summary, the problem is that the `masm.currentOffset()` stored in the PCMappingEntry used for OSR winds up pointing to the PoolHeader, instead of to the guard-jump just before the PoolHeader. So it's an off-by-1 error, but having read through the logic, I still haven't been able to find the source. Most likely something occurs when copying, but stepping through that codepath, I didn't observe where it went wrong. But it's tricky to map offsets to BufferOffsets.

As a workaround, we can just flush the buffer before emitting a toggledCall(), and then fix up the `masm.currentOffset()` to ignore the constant pool entirely. This patch should be suitable for getting the fuzzers back up, and hopefully they'll be able to find the same bug in different scenarios, which will help to get a better understanding of what's happening.

The code that ARM64 uses here is shared with ARM, so ARM should theoretically exhibit the same errors.
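To make the off-by-one concrete, here is a toy model of an instruction buffer that dumps constants into an inline pool. It is only a sketch: `ToyAssembler` and all of its methods are hypothetical and do not mirror the real MacroAssembler API, and the pool shape (guard jump, then header, then data) is taken from the description above rather than from the source.

```javascript
// Toy model only: names are hypothetical, not SpiderMonkey APIs.
class ToyAssembler {
  constructor() {
    this.words = [];   // emitted 32-bit slots, tagged by kind
    this.pending = []; // constants waiting to be dumped into a pool
  }
  currentOffset() { return this.words.length * 4; }
  emit(name) { this.words.push({ kind: "insn", name }); }
  addConstant(value) { this.pending.push(value); }
  // Dump pending constants as: guard jump, pool header, data words.
  flushPool() {
    if (this.pending.length === 0) return;
    this.words.push({ kind: "insn", name: "guard-jump" });
    this.words.push({ kind: "pool-header", size: this.pending.length });
    for (const value of this.pending) {
      this.words.push({ kind: "pool-data", value });
    }
    this.pending = [];
  }
  kindAt(offset) { return this.words[offset / 4].kind; }
}

// Failure mode described above: the recorded offset is one word too far,
// so it names the pool header instead of the guard jump before it.
const buggy = new ToyAssembler();
buggy.emit("loop-body");
buggy.addConstant(42);
const guardOffset = buggy.currentOffset(); // where the guard jump will land
buggy.flushPool();
const recorded = guardOffset + 4;          // the off-by-one
console.log(buggy.kindAt(guardOffset), buggy.kindAt(recorded));
// prints "insn pool-header"

// Workaround shape: flush first, then record the offset after the pool.
const fixed = new ToyAssembler();
fixed.emit("loop-body");
fixed.addConstant(42);
fixed.flushPool();
const osrOffset = fixed.currentOffset();
fixed.emit("toggled-call");
console.log(fixed.kindAt(osrOffset));
// prints "insn"
```

In the toy, flushing before recording guarantees that no pool can be inserted at the recorded position, which is why the workaround hides the symptom without explaining where the stray offset comes from.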
Flags: needinfo?(sstangl)
Attachment #8986602 - Flags: review?(tcampbell)
Whoops -- let a newline slip in there: forgot to commit.
Attachment #8986602 - Attachment is obsolete: true
Attachment #8986602 - Flags: review?(tcampbell)
Attachment #8986603 - Flags: review?(tcampbell)
Comment on attachment 8986603 [details] [diff] [review]
0001-Bug-1446819-Flush-constant-pools-before-recording-OS.patch

Review of attachment 8986603 [details] [diff] [review]:
-----------------------------------------------------------------

r=me to unblock fuzzing on ARM64
Attachment #8986603 - Flags: review?(tcampbell) → review+
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ae88dace273
Flush constant pools before recording OSR offsets. r=tcampbell
Keywords: checkin-needed
This needs either leave-open or a follow-up bug right? This looks like a potential serious bug in the ARM64 backend so I'm a bit worried about papering over it.
Flags: needinfo?(sstangl)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Please open a follow up for the arm64 issue
Assignee: nobody → sstangl
Reopening this bug to track the real fix for ARM64. Or should we wait for the ARM64 fuzzers to find the same crash in a different code path (as suggested in comment 10) and then file a follow-up bug?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
P1 because jandem says "This looks like a potential serious bug in the ARM64 backend so I'm a bit worried about papering over it" (comment 14) and sstangl is now working on Ion for ARM64.
Priority: P3 → P2
Whiteboard: [jsbugmon:update,bisect][fuzzblocker][geckoview:crow] → [jsbugmon:update,bisect][fuzzblocker][geckoview:p1]
64=wontfix because we don't plan to ship ARM64 builds of Fennec or GV 64.
Whiteboard: [jsbugmon:update,bisect][fuzzblocker][geckoview:p1] → [jsbugmon:update,bisect][fuzzblocker][arm64:m3]

With the patch from comment 13 reverted, I am unable to reproduce the crash with the testcase from either comment 0 or comment 6, even when using --no-ion.

I am running the test suite to see whether I can reproduce it. If not, I will submit a patch to revert this change and mark this bug as works-for-me.

Comment on attachment 9046409 [details]
Bug 1446819 - Revert temporary fix. r=

Decoder & Gary, are you able to reproduce the original issue when applying this patch?
I was not able to reproduce it with our current test suite.

Attachment #9046409 - Flags: feedback?(nth10sd)
Attachment #9046409 - Flags: feedback?(choller)

Comment on attachment 9046409 [details]
Bug 1446819 - Revert temporary fix. r=

Sorry, clearing feedback? for now. I was out on PTO recently, and in the meantime ARM64 IonMonkey has landed and a whole bunch of fuzzing bugs have come in since. With the other things I'm rushing to get through, I'm not sure I'd prioritise this over fuzzing ARM64 m-c itself. We'd file new bugs going forward.

Attachment #9046409 - Flags: feedback?(nth10sd)

Removing [fuzzblocker] whiteboard tag based on comment 12.
Taking over the bug as I have a patch waiting for feedback.

(In reply to Jan de Mooij [:jandem] from comment #14)
> This needs either leave-open or a follow-up bug right? This looks like a
> potential serious bug in the ARM64 backend so I'm a bit worried about
> papering over it.

I agree, we should investigate whether this is still an issue instead of keeping this work-around. However, the test suite runs fine without the currently applied patch.

I am now waiting on fuzzers to see if they can catch this issue again.

Assignee: sstangl → nicolas.b.pierron
Status: REOPENED → ASSIGNED
Flags: needinfo?(sstangl)
Whiteboard: [jsbugmon:update,bisect][fuzzblocker][arm64:m3] → [jsbugmon:update,bisect][arm64:m3]

This bug has a work-around and is no longer a fuzz blocker.
Thus lowering the priority to P3.

Priority: P2 → P3
QA Whiteboard: qa-not-actionable
Severity: critical → S3

The following patch is waiting for review from an inactive reviewer:

ID     | Title                                  | Author | Reviewer Status
D21040 | Bug 1446819 - Revert temporary fix. r= | nbp    | sstangl: Resigned from review

:nbp, could you please find another reviewer or abandon the patch if it is no longer relevant?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nicolas.b.pierron)