Closed
Bug 1421445
Opened 7 years ago
Closed 7 years ago
AMD Bobcat 64-bit crash mitigation
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla59
People
(Reporter: tcampbell, Assigned: tcampbell)
References
Details
Attachments
(3 files, 4 obsolete files)
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
As an attempt to reduce crashes in Bug 1281759, I propose adding NOPs before JIT code buffers. Looking at the 64-bit crashes on FF57 / FF58, the reported fault instruction is a few bytes before the code starts in the middle of the JitCode* pointer.
This patch observes that we over-allocate those buffers by sizeof(void*) due to incorrect alignment math. By stealing that data to fill with NOPs we remain size neutral and need fewer checks if CPU is affected. In the future, if this fix does not provide stability improvements, we remove the NOPs and have a size reduction.
Looking at crash data, it seems that most of the WRITE faults are crashing writing to address of %rax, which is consistent with trying to execute the JitCode* pointer as code. Currently, 75% of the crashes we see for this 64-bit FF57 AMD Bobcat signature are WRITE faults.
Assignee | ||
Comment 1•7 years ago
|
||
Add JitCodeHeader struct and fix allocation size computation to save one word per jit code buffer.
Assignee | ||
Comment 2•7 years ago
|
||
Add NOP-slide before jit code buffers. This is size-neutral when including previous patch. As a result, I don't try and be too clever about only applying this to affected machines.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → tcampbell
Status: NEW → ASSIGNED
Assignee | ||
Updated•7 years ago
|
Attachment #8932634 -
Flags: feedback?(jdemooij)
Comment 3•7 years ago
|
||
Comment on attachment 8932634 [details] [diff] [review]
0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch
Review of attachment 8932634 [details] [diff] [review]:
-----------------------------------------------------------------
Makes sense.
::: js/src/jit/Ion.cpp
@@ +739,5 @@
> return trampolineCode(p->value());
> }
>
> +void
> +JitCodeHeader::init(JitCode* jitCode) {
Nit: { on its own line.
Attachment #8932634 -
Flags: feedback?(jdemooij) → feedback+
Assignee | ||
Comment 4•7 years ago
|
||
Cleaned up assert to check end of buffer is within range.
Attachment #8932633 -
Attachment is obsolete: true
Attachment #8932919 -
Flags: review?(jdemooij)
Assignee | ||
Comment 5•7 years ago
|
||
Fixed parens/whitespace.
Attachment #8932634 -
Attachment is obsolete: true
Attachment #8932920 -
Flags: review?(jdemooij)
Assignee | ||
Comment 6•7 years ago
|
||
Oops.. s/code/codeStart in the macro.
Attachment #8932919 -
Attachment is obsolete: true
Attachment #8932919 -
Flags: review?(jdemooij)
Attachment #8932923 -
Flags: review?(jdemooij)
Assignee | ||
Comment 7•7 years ago
|
||
Try run with Windows PGO builds in progress: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1b201532226789ebad03cb8431f34538966e91a
Comment 8•7 years ago
|
||
Comment on attachment 8932923 [details] [diff] [review]
0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch
Review of attachment 8932923 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/Linker.cpp
@@ +51,4 @@
>
> // Bump the code up to a nice alignment.
> codeStart = (uint8_t*)AlignBytes((uintptr_t)codeStart, CodeAlignment);
> + MOZ_ASSERT(codeStart + masm.bytesNeeded() <= result + bytesNeeded);
Good to have this assert.
Attachment #8932923 -
Flags: review?(jdemooij) → review+
Updated•7 years ago
|
Attachment #8932920 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 9•7 years ago
|
||
Try run from the final patches (with s/code/codeStart/ fix): https://treeherder.mozilla.org/#/jobs?repo=try&revision=67a3d0645a7fa634429fcc95bf51b6c9fc3c316f
Comment 10•7 years ago
|
||
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/797d9e71b648
Don't waste space allocating jit code buffer. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/39f221c258fc
Add NOP-slide before JIT code buffers on AMD Bobcat. r=jandem
Comment 11•7 years ago
|
||
Backed out for frequent Valgrind test failures
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=39f221c258fc3cbb4162fd891ff68b6631f7ceb9&filter-searchStr=98b6ca8dacff858ad2151e9c6cc26eeff630bc64
https://treeherder.mozilla.org/logviewer.html#?job_id=148805049&repo=mozilla-inbound&lineNumber=35487
https://hg.mozilla.org/integration/mozilla-inbound/rev/33826eb2a21661bd8712411f2d32626503ced303
Flags: needinfo?(tcampbell)
Comment 12•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/797d9e71b648
https://hg.mozilla.org/mozilla-central/rev/39f221c258fc
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox59:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment 13•7 years ago
|
||
Reopening bug because the patches were backed out on inbound. The patches have been merged to central, but the backout hasn't yet.
Status: RESOLVED → REOPENED
status-firefox57:
--- → affected
status-firefox58:
--- → affected
Resolution: FIXED → ---
Comment 14•7 years ago
|
||
Sorry about that and thanks for already reopening.
https://hg.mozilla.org/mozilla-central/rev/33826eb2a216
Target Milestone: mozilla59 → ---
Updated•7 years ago
|
Priority: -- → P3
Assignee | ||
Comment 15•7 years ago
|
||
The Valgrind failure is that we pack instructions to close to the following JitCodeHeader data leading to decoding garbage and walking off the page and SEGV-ing Valgrind itself.
This patch adds a word of padding at end of JitCode buffer, but only on MOZ_VALGRIND.
Jan, does this seem too questionable to land? I'll follow up on the valgrind side to see if it should be fixed.
Flags: needinfo?(tcampbell)
Attachment #8935645 -
Flags: review?(jdemooij)
Assignee | ||
Comment 16•7 years ago
|
||
Another point that :pbone made is to check the Intel guides on recommendations for padding between code and data because they recommend not having garbage nearby.
Assignee | ||
Comment 17•7 years ago
|
||
Comment 18•7 years ago
|
||
Uhm, btw, could you add a check for linux kernel <4.14, alongside cpuid check, in NeedAmdBugWorkaround function?
Assignee | ||
Comment 19•7 years ago
|
||
For reference, this is the kernel patch that mirh is referring to: https://github.com/torvalds/linux/commit/bfc1168de949cd3e9ca18c3480b5085deff1ea7c
Also for reference, this is what :pbone is talking about in Intel guides:
> Assembly/Compiler Coding Rule 57. (M impact, L generality) If (hopefully read-only) data must
> occur on the same page as code, avoid placing it immediately after an indirect jump. For example,
> follow an indirect jump with its mostly likely target, and place the data after an unconditional branch.
After some discussion with :sewardj, it seems the Valgrind failure deserves further review and the assessment in Comment 15 is not correct. Valgrind shouldn't be looking any further than the basic block.
Assignee | ||
Comment 20•7 years ago
|
||
Comment on attachment 8935645 [details] [diff] [review]
0001-Bug-1421445-Work-around-Valgrind-bug-concerning-JitC.patch
This patch is not the right approach. I have an alternate that only puts the pad at the end of ExecutableAllocator pool that passes valgrind.
Attachment #8935645 -
Attachment is obsolete: true
Attachment #8935645 -
Flags: review?(jdemooij)
Assignee | ||
Comment 21•7 years ago
|
||
This patch add's UD2 ops at the end of JitCode to avoid perf impact of indirect branch followed by data bytes. This also mitigates the Valgrind crash.
Attachment #8936677 -
Flags: review?(jdemooij)
Assignee | ||
Comment 22•7 years ago
|
||
Comment 23•7 years ago
|
||
Comment on attachment 8936677 [details] [diff] [review]
0003-Bug-1421445-Insert-UD2-instructions-after-JitCode.patch
Review of attachment 8936677 [details] [diff] [review]:
-----------------------------------------------------------------
Excellent.
Attachment #8936677 -
Flags: review?(jdemooij) → review+
Comment 24•7 years ago
|
||
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a031f41d0fb9
Don't waste space allocating jit code buffer. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/90a8cf71fa46
Add NOP-slide before JIT code buffers on AMD Bobcat. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/af47243ec0d7
Insert UD2 instructions after JitCode. r=jandem
Comment 25•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a031f41d0fb9
https://hg.mozilla.org/mozilla-central/rev/90a8cf71fa46
https://hg.mozilla.org/mozilla-central/rev/af47243ec0d7
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment 26•7 years ago
|
||
Is that something you would like to uplift in 58?
Flags: needinfo?(tcampbell)
Assignee | ||
Comment 27•7 years ago
|
||
Comment on attachment 8932923 [details] [diff] [review]
0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch
Approval Request Comment
[Feature/Bug causing the regression]:
AMD Bobcat CPU bug
[User impact if declined]:
This attempts to mitigate some of the AMD Bobcat crash signatures which are the top 64-bit crash on 57 and 58.
[Is this code covered by automated tests?]:
Automated tests cover regressions on normal CPUs. There is no automated test for the specific CPU fault.
[Has the fix been verified in Nightly?]:
Nightly has not regressed with these patches.
[Needs manual test from QE? If yes, steps to reproduce]:
No. We will watch the crash stats after landing.
[List of other uplifts needed for the feature/fix]:
All three patches on this bug are needed.
[Is the change risky?]:
Low-moderate
[Why is the change risky/not risky?]:
We adjust memory layout of a key structure, which is a risk. This code is heavily covered by many existing test suites so if nightly is stable, I don't expect problems in uplift.
[String changes made/needed]:
None
Flags: needinfo?(tcampbell)
Attachment #8932923 -
Flags: approval-mozilla-beta?
Comment on attachment 8932923 [details] [diff] [review]
0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch
Sounds promising to fix a top crash in 58, let's uplift for the 58.0b12 build next week.
Attachment #8932923 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 29•7 years ago
|
||
Ted, do you expect that we will see any change in the AMD crash volume from Nightly users if this fix is working? Or are there too few AMD Bobcat users running Nightly?
In the four days since this fix landed in Nightly 59 (2017-12-14), there have been 103 EnterJit and 16 EnterBaseline crashes from Win64 Firefox users. In the four days preceding the fix, there were 104 EnterJit and zero EnterBaseline crashes.
Flags: needinfo?(tcampbell)
Comment 30•7 years ago
|
||
bugherder uplift |
Comment 31•7 years ago
|
||
Ted and I chatted on Slack. He said the mitigation didn't fix the AMD crashes. It just made the crashes weirder, such as impossible faults in register move instructions. He doesn't think we should back out the code change. It cleans up the JIT code and it shouldn't make these AMD crashes worse.
Flags: needinfo?(tcampbell)
Comment 32•7 years ago
|
||
Any update on this?
Might KAISER/Spectre/Meltdown workarounds-mitigation you or OS implement help (or worsen?) the issue?
Also before we all forget again, adding that little check I suggest in comment 18 wouldn't hurt.
You need to log in
before you can comment on or make changes to this bug.
Description
•