Closed
Bug 980067
Opened 11 years ago
Closed 11 years ago
ARM: not flushing icache correctly when coalescing ranges in different vmas
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
()
RESOLVED
FIXED
mozilla30
People
(Reporter: jandem, Assigned: jandem)
References
Details
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
mjrosenb
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
With the patch for bug 939562, we're getting b2g crashes on real hardware (not the emulator).
The problem seems to be that we're coalescing icache flushes, and apparently Linux doesn't handle this correctly when the range spans multiple virtual memory regions (a while ago Marty sent a message to the Linux kernel list about this [0]).
In this case we are Baseline-compiling a script and as part of this we also compile an IC stub. The IC code and Baseline code end up in different regions with a virtual memory boundary between them. The result is that the kernel doesn't flush the icache for the script code. This leads to random crashes and other problems that are very hard to debug and track down.
It's not clear why this bites us only now. It's probably not very common to coalesce cache flushes spanning multiple VM regions right next to each other.
Nicolas, can you confirm that fixing AutoFlushCache::update to always flush fixes the problem? Marty/Nicolas can you check how much this (and other fixes) would affect our (benchmark) performance?
[0] http://comments.gmane.org/gmane.linux.ports.arm.kernel/181277
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(mrosenberg)
Comment 1•11 years ago
|
||
Nasty! Great find, Jan.
Assignee | ||
Comment 2•11 years ago
|
||
Marty, this is probably the simplest fix that still avoids eagerly flushing everything.
Assignee | ||
Comment 3•11 years ago
|
||
Attachment #8386684 -
Attachment is obsolete: true
Attachment #8386684 -
Flags: review?(mrosenberg)
Attachment #8386685 -
Flags: review?(mrosenberg)
Comment 4•11 years ago
|
||
Comment on attachment 8386685 [details] [diff] [review]
Patch
I checked manually, and multiple times on a device, I can no longer reproduce the loop crash of the home screen.
I've setup AWFY to run this patch on top of recent changes (as if it landed), to see if there is any impact. The result should be uploaded in a few minutes.
Flags: needinfo?(nicolas.b.pierron)
Comment 5•11 years ago
|
||
Comment on attachment 8386685 [details] [diff] [review]
Patch
Review of attachment 8386685 [details] [diff] [review]:
-----------------------------------------------------------------
Since this is a correctness issue, let's get this in, and worry about perf later.
Attachment #8386685 -
Flags: review?(mrosenberg) → review+
Assignee | ||
Comment 6•11 years ago
|
||
Comment 7•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(mrosenberg)
Assignee | ||
Comment 8•11 years ago
|
||
Comment on attachment 8386685 [details] [diff] [review]
Patch
[Approval Request Comment]
> Bug caused by (feature/regressing bug #):
JIT bug on ARM.
> User impact if declined:
Crashes or correctness problems on ARM hardware (Android/b2g). Bug 939562 exposed this by crashing some b2g apps, but it's possible it could also cause problems without that patch.
> Testing completed (on m-c, etc.):
On m-c.
> Risk to taking this patch (and alternatives if risky):
Low.
> String or IDL/UUID changes made by this patch:
None.
Attachment #8386685 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
status-firefox29:
--- → affected
status-firefox30:
--- → fixed
Updated•11 years ago
|
Attachment #8386685 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 9•11 years ago
|
||
Comment 10•11 years ago
|
||
A use pattern in Odin is to define the entire code object as needing to be flushed early and to expect that flushing of smaller ranges within this will be suppressed.
This patch appears to degrade this. OnSamePage only returns true if the start or end are on the same page, not if one region is well within the other - this does not appear to match the comments.
Would it be safe to check if the regions overlap and in this case coalesce them?
Alternatively could the pages be flushed one at a time? Might this even be necessary, might the kernel split vma regions even when they are part of the same unchanged allocation.
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 11•11 years ago
|
||
(In reply to Douglas Crosher [:dougc] from comment #10)
> Would it be safe to check if the regions overlap and in this case coalesce
> them?
That should be fine I think. The main problem in this bug was coalescing flushes for memory regions from different allocations that just happened to be close to each other. As long as you know both flushes touch memory from the same allocation, coalescing should be ok.
Flags: needinfo?(jdemooij)
You need to log in
before you can comment on or make changes to this bug.
Description
•