Open Bug 1175349 Opened 9 years ago Updated 2 years ago

Content process hangs if asm.js AOT hangs

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

Tracking

REOPENED

People

(Reporter: azakai, Unassigned)

Attachments

(1 file)

(deleted), application/x-javascript
Attached file src.cpp.o.js (deleted)
The attached js file, if loaded as a script tag in an html page, will hang the content process. The script takes a very long time to AOT, 30 seconds in the shell for me. In the browser, the UI just shows the spinner, and the slow script options never appear. All the user can do is close the browser or kill the content process manually.

(In fact, compilation never seems to finish for me, even after a minute or 2, so there could be another bug here which I can file separately. Just mentioning that here, because if you load that script, you will need to kill your content process.)
Blocks: e10s
I think the lack of a slow-script dialog is caused by the fact that asm.js AOT (and parsing in general) does not check the interrupt flag (which is how the slow-script dialog gets triggered).  Testing locally, I can also kill the process by closing the tab (not just closing the browser), so I think e10s is doing its job here and this bug is unrelated to e10s.

As for the workload itself, is this just bug 1174230?
No longer blocks: e10s
Summary: [e10s] Content process hangs if asm.js AOT hangs → Content process hangs if asm.js AOT hangs
(Yes, the workload is that bug.)

Interesting that closing that tab can help, but it relies on the user knowing which tab it is - and that knowledge is needed at a time when all tabs show a spinner and nothing else. I assumed that was an e10s bug.
Well, that's more a consequence of the fact that, by default, e10s has a single content process (personally I have dom.ipc.processCount=10, so I'm able to use other tabs, up to 9 :).  I think the plan with e10s is to gradually increase this limit.  I think the real bug here is the algorithmic regression you bisected.  Even if Odin checked the interrupt flag at common pinch-points (e.g., once per function), it wouldn't help here since presumably all the time is spent in the regalloc for one function.
I guess there isn't a feasible way in general to detect that AOT is taking too long and interrupting it?

That makes me think even more that this is an e10s issue. e10s *can* have a general way to detect that the content process is hanging, I was discussing this with Bill a while back. Such a general mechanism would catch AOT hangs among everything else.
There's another bug I can't find now where layout is slow and we don't show the hang UI. We really just need to find the right circumstances for showing it. I guess we could just measure the time between spins of the event loop.
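To make the "time between spins of the event loop" idea concrete, here is a minimal sketch of such a watchdog. It is an illustration only: `EventLoopWatchdog`, `NotifySpin`, and `IsHung` are hypothetical names, not existing Gecko APIs, and the threshold is an arbitrary assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog: the content process calls NotifySpin() on every
// event-loop iteration; a monitor (e.g. in the parent process) calls IsHung().
class EventLoopWatchdog {
public:
    explicit EventLoopWatchdog(std::chrono::milliseconds threshold)
        : threshold_(threshold), lastSpin_(0) {
        NotifySpin(Clock::now());
    }

    // Record the time of the most recent event-loop spin.
    void NotifySpin(Clock::time_point now) {
        lastSpin_.store(now.time_since_epoch().count(),
                        std::memory_order_relaxed);
    }

    // True if more than `threshold_` has elapsed since the last spin.
    bool IsHung(Clock::time_point now) const {
        Clock::duration last(lastSpin_.load(std::memory_order_relaxed));
        return now.time_since_epoch() - last > threshold_;
    }

private:
    std::chrono::milliseconds threshold_;
    std::atomic<Clock::rep> lastSpin_;  // ticks since the clock's epoch
};
```

Because the check depends only on the event loop failing to spin, it would fire for a slow AOT compile, slow layout, or anything else that blocks the loop, without any of those needing to cooperate.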
(In reply to Alon Zakai (:azakai) from comment #4)
> I guess there isn't a feasible way in general to detect that AOT is taking
> too long and interrupting it?

There is a feasible way (at least in SpiderMonkey): check the interrupt flag; we just (historically) haven't done that anywhere inside the parser/compiler b/c it's "supposed" to be O(n).  Maybe we should (although I'd like to see this switch regression fixed).

But having a hang monitor like you and Bill are saying does sound like a great general e10s feature and it would cover up for weak points.
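As a rough illustration of what checking the interrupt flag once per function would and would not buy, here is a small standalone sketch. Everything in it (`FakeFunction`, `CompileAll`, the `onUnit` hook) is hypothetical and stands in for the real parser/compiler; it is not SpiderMonkey code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one asm.js function awaiting compilation.
struct FakeFunction {
    std::size_t workUnits;  // pretend cost of compiling this function
};

enum class CompileResult { Done, Interrupted };

// Compiles each function in turn, polling the interrupt flag once per
// function (the "pinch-point"). onUnit runs after every unit of work so a
// caller can simulate the flag being raised in the middle of compilation.
CompileResult CompileAll(const std::vector<FakeFunction>& funcs,
                         const std::atomic<bool>& interruptRequested,
                         const std::function<void()>& onUnit,
                         std::size_t& unitsDone) {
    for (const FakeFunction& f : funcs) {
        if (interruptRequested.load(std::memory_order_relaxed)) {
            return CompileResult::Interrupted;
        }
        for (std::size_t i = 0; i < f.workUnits; ++i) {
            ++unitsDone;
            onUnit();
        }
    }
    return CompileResult::Done;
}
```

If the flag is raised while a single large function is being compiled, that function still runs to completion, because the poll only happens between functions. A check at this granularity therefore helps with many small functions but not with one pathological function.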
(In reply to Luke Wagner [:luke] from comment #3)
> Even if Odin checked the interrupt flag at common pinch-points
> (e.g., once per function), it wouldn't help here since presumably all the
> time is spent in the regalloc for one function.

To be fully precise, isn't it that the main thread is waiting for the helper threads to join? (in the case where parallel compilation isn't available, that would indeed be regalloc that takes all the time).

Maybe the main thread could wait for these threads to join *and* wake up every X ms to check for the interrupt flag?
We'd still need to cancel the runaway worker task on the helper thread.  Also, we have sequential configurations.
Long-running Ion compilations can be canceled by other threads, via MIRGenerator::cancel().  This flag is frequently checked by the various stages of backend compilation; it's just a separate thing from the interrupt flag on the runtime.
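Putting the last few comments together, one possible shape is sketched below: the main thread waits for the helper with a timeout, polls the interrupt flag on each wake-up, and on interrupt sets a separate cancel flag that the backend checks between stages. This is a simplified model under stated assumptions; `CompileTask`, `RunBackend`, and `WaitForCompile` are hypothetical names, the stages are simulated with sleeps, and the real MIRGenerator::cancel() machinery is not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical compile task shared between the main thread and one helper.
struct CompileTask {
    std::atomic<bool> cancelled{false};  // plays the role of the cancel flag
    std::mutex lock;
    std::condition_variable finished;
    bool done = false;       // guarded by lock
    bool succeeded = false;  // guarded by lock
};

// Helper thread: run the backend stages, checking for cancellation between
// stages. Each stage is simulated by a sleep.
void RunBackend(CompileTask& task, int stages,
                std::chrono::milliseconds perStage) {
    bool ok = true;
    for (int i = 0; i < stages; ++i) {
        if (task.cancelled.load()) {
            ok = false;
            break;
        }
        std::this_thread::sleep_for(perStage);
    }
    {
        std::lock_guard<std::mutex> guard(task.lock);
        task.done = true;
        task.succeeded = ok;
    }
    task.finished.notify_all();
}

// Main thread: wait for the helper, waking every pollInterval to check the
// interrupt flag. On interrupt, request cancellation and keep waiting until
// the helper notices and exits.
bool WaitForCompile(CompileTask& task,
                    const std::atomic<bool>& interruptRequested,
                    std::chrono::milliseconds pollInterval) {
    std::unique_lock<std::mutex> guard(task.lock);
    while (!task.done) {
        task.finished.wait_for(guard, pollInterval);
        if (!task.done && interruptRequested.load()) {
            task.cancelled.store(true);
        }
    }
    return task.succeeded;
}
```

The sketch only covers the parallel case. In a sequential configuration there is no waiting main thread to do the polling, so the compile loop itself would have to check the interrupt flag.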
Priority: -- → P3
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Severity: normal → S3