Closed
Bug 1446307
Opened 7 years ago
Closed 6 years ago
ToggledCallSize() does not properly account for pools on ARM64
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla65
People
(Reporter: lth, Assigned: sstangl)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
application/x-javascript
|
Details | |
(deleted),
patch
|
sstangl
:
review+
|
Details | Diff | Splinter Review |
SoftIron Overdrive 1000, OpenSUSE
mozilla-inbound 408372:91a98a3ddd28 (m-c to m-i merge changeset)
Configure with --enable-debug --enable-optimize
Run jit_test.py without options
Errors:
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-02.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-03.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-04.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-bug1233178.js
When running with --no-baseline there are no errors, ergo, JS baseline compiler failure.
Normally this passes on the simulator when running on x64 linux, so probably a hardware-only failure, but needs deeper investigation.
Reporter | ||
Comment 1•7 years ago
|
||
But, the tests also do not fail when run with --baseline-eager --no-threads.
Reporter | ||
Updated•7 years ago
|
Blocks: arm64-baseline
Reporter | ||
Updated•7 years ago
|
No longer blocks: Fennec-ARM64
Comment 2•7 years ago
|
||
Reproduces reliably, sounds actionable -> P1
Flags: needinfo?(sstangl)
Priority: -- → P1
Comment 3•7 years ago
|
||
<jorendorff> [various stupid questions]
<lth> i'm imagining a missing icache flush when patching something
Assignee | ||
Comment 4•7 years ago
|
||
I don't have any ARM64 hardware at the moment, so if it doesn't reproduce in the simulator, I can't debug it.
Reporter | ||
Comment 5•7 years ago
|
||
To make things even more interesting, this is *only* for --enable-debug --enable-optimize builds, not for --disable-optimize --enable-debug or --enable-optimize --disable-debug.
Reporter | ||
Comment 6•7 years ago
|
||
For the time being I run tests on hardware manually, roughly weekly.
This morning a few more tests are failing and there are some new failure modes. Again this is an --enable-debug --enable-optimize build. My guess is that the underlying problem is the same however.
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-02.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-03.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-04.js
Exit code: -4
FAIL - debug/Script-getOffsetsCoverage-bug1233178.js
Exit code: -11
FAIL - debug/bug1240803.js
Exit code: -11
FAIL - debug/bug1246605.js
Assertion failure: thingKind == getAllocKind(), at /home/lhansen/m-i/js/src/gc/GC.cpp:563
Exit code: -11
FAIL - debug/bug-1248162.js
Exit code: -11
FAIL - debug/bug1252453.js
Reporter | ||
Comment 7•7 years ago
|
||
Simplified test case. Run this without flags (on hardware) to get an illegal instruction crash.
If run with --no-baseline there's no crash (unsurprising).
If run with --baseline-eager there's no crash (more interesting).
Reporter | ||
Comment 8•7 years ago
|
||
So there's a bug here with the constant pool header, but fixing that does not seem to make a difference. The bug is: On ARM64, sizeof(Instruction) is 1, so when WritePoolHeader() divides the pool size by sizeof(Instruction) to get the number of instructions to store in the pool header, it gets the wrong result. We must divide by 4 instead.
There appears to be some code somewhere that does not properly skip a constant pool that is emitted in the middle of a toggled call sequence. The code looks like this:
0x20b8b8dfbd8: add x28, x28, #0x8
0x20b8b8dfbdc: mov sp, x28 ; SyncStackPtr
0x20b8b8dfbe0: ldr x17, 0x20b8b8dfc4c ; First instruction of toggled call
0x20b8b8dfbe4: b 0x20b8b8dfc54 ; Branch across pool
=> 0x20b8b8dfbe8: .inst 0xffff001b ; Pool header (corrected)
... ; constants
0x20b8b8dfc54: nop ; Second instruction of toggled call
0x20b8b8dfc58: mov x16, #0x28b8
0x20b8b8dfc5c: movk x16, #0xb6e0, lsl #16
...
The program crashing at the point of => indicates that somebody knows that address and has generated code that uses it, but has not taken into account that there may be a pool there.
We could hack around this probably by flushing the pool before the patchable call but this seems wrong. (On the other hand, this is a poster child for suppressing constant pools if I ever saw one.)
Given that we don't crash with --baseline-eager this could have something to do with bailouts. What I know for sure is that we don't do any toggling of the calls for the debugger before the crash, because neither ToggleCall nor toggleDebugTraps is called.
The toggled calls are brittle due to a combination of circumstances:
- the offset that is returned for the toggled sequence is the offset to the SyncStackPtr,
and there's a comment explaining why this has to be so
- code that skips pools for toggled calls therefore needs to be aware that the sequence
can have other code in it than the toggled call instructions themselves or the pool
- mostly this seems to be done right, /in the places i've found/, but comments suggest
vaguely that that the right code was added later, because the comments are not quite
right
Reporter | ||
Comment 9•7 years ago
|
||
Oh yeah, the TC attached also crashes for --enable-debug --disable-optimize.
Reporter | ||
Comment 10•7 years ago
|
||
jit::EnterBaselineAtBranch() calls BaselineScript::nativeCodeForPC() which correctly returns the address of the SyncStackPtr's "mov" instruction. It then increments this value to skip past the handler:
data.jitcode = baseline->nativeCodeForPC(fp->script(), pc);
if (fp->isDebuggee()) {
MOZ_RELEASE_ASSERT(baseline->hasDebugInstrumentation());
data.jitcode += MacroAssembler::ToggledCallSize(data.jitcode);
}
However the size of the handler returned by ToggledCallSize() on ARM64 does not account for a possible constant pool in the middle of the handler. It does handle a pool before the initial LDR, but not before the NOP. It can also handle a pool at the beginning of the sequence, but then not a SyncStackPtr after the pool; or a pool after the SyncStackPtr, which is handled properly.
Superficially it looks like ARM could have the same problem - pools appear not to be handled at all, and toggledCall() does not have any guard that the instructions are not split up.
Reporter | ||
Comment 11•7 years ago
|
||
This fixes the problem (comment 10) on ARM64.
Can you take a look at the corresponding ARM code and judge whether it needs to be fixed there too? I expect debugging has not been exercised heavily on ARM...
I haven't scanned the rest of the arm64 code to see if there might be similar issues elsewhere, but I'll try to find time to do so.
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Flags: needinfo?(sstangl)
Attachment #8965649 -
Flags: review?(sstangl)
Reporter | ||
Updated•7 years ago
|
Summary: jit-test failures on ARM64 hardware, JS baseline JIT implicated → ToggledCallSize() does not properly account for pools on ARM64
Assignee | ||
Comment 13•7 years ago
|
||
Comment on attachment 8965649 [details] [diff] [review]
bug1446307-arm64-pool-fixes.patch
Review of attachment 8965649 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/arm64/MacroAssembler-arm64.h
@@ +1968,2 @@
>
> + const Instruction* cur = (const Instruction*)code;
This should use the InstructionIterator rather than open-coding its logic.
It would also be prudent to MOZ_ASSERT() that the instructions are indeed LDR/ADR and NOP/BLR.
@@ +1973,5 @@
> + cur = cur->skipPool();
> + cur = cur->NextInstruction(); // LDR/ADR
> + cur = cur->skipPool();
> + cur = cur->NextInstruction(); // NOP/BLR
> + return (uint8_t*)cur - code;
On ARM, the only remaining use of ToggledCallSize is to skip past the call -- so really, it should be some function like NextInstructionAfterToggledCall(), the size being an implementation detail that's leaking.
Attachment #8965649 -
Flags: review?(sstangl) → review+
Reporter | ||
Comment 14•7 years ago
|
||
Those are both good ideas but I appear to be swamped with other things. Since this is a JS-side issue, reassigning for rewrite + landing.
Assignee: lhansen → sstangl
Sean, is this still an issue?
status-firefox63:
--- → ?
Flags: needinfo?(sstangl)
Assignee | ||
Comment 16•6 years ago
|
||
I rebased the patch. ARM64 doesn't have an InstructionIterator yet, and ToggledCallSize is cross-platform and so I didn't want to fix that up either.
Can land as-is.
Attachment #8965649 -
Attachment is obsolete: true
Flags: needinfo?(sstangl)
Attachment #9022277 -
Flags: review+
Assignee | ||
Updated•6 years ago
|
Keywords: checkin-needed
Comment 17•6 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/24f10efb4ce1
Compute toggled call size properly; fix constant pool header. r=sstangl
Keywords: checkin-needed
Comment 18•6 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•