Closed Bug 1066381 Opened 10 years ago Closed 10 years ago

Trivial Infinite Loop on e10s shows spinner

Categories

(Firefox :: General, defect)

35 Branch
x86_64
Windows 8.1
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
e10s m4+ ---

People

(Reporter: suirtemedb, Assigned: billm)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Build ID: 20140911064110

Steps to reproduce:

Navigated to crashmybroswer.com and initiated the trivial infinite loop test


Actual results:

The other tabs that I had displayed black web pages


Expected results:

the other tabs should have not had any difference to them.
This issue is reproducible on the latest Nightly build id:20140911064110
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Untriaged → General
tracking-e10s: --- → ?
e10s does not use a separate process for each tab. All tabs run in one content process that will be sandboxed from the OS.
This may be a dupe of m3 bug 1059494. Investigating..
Assignee: nobody → jmathies
(In reply to Jim Mathies [:jimm] from comment #3)
> This may be a dupe of m3 bug 1059494. Investigating..

No, looks like it's just a situation where the content process gets hung in a tight script loop, and for some reason this can cause other tabs to stop painting when selected. Chrome stays responsive.
Generally we operate correctly here - after a few seconds, the slow script dialog pops up, and that gives the user to break out of the loop. The only problem here has to do with painting other tabs prior to the slow script dialog.

I'm not sure this should block m2.

David, any thoughts here on the lack of a paint refresh?
Flags: needinfo?(davidp99)
Attached file inifnite loop test case (deleted) —
It looks to me like, as you suggest, the content process is spinning on the script.  Therefore, it can't handle any repaint messages (or any other messages).  Only the chrome content can re-draw itself as it and the compositor are the main process which is not stalled by the script.

The display for the tab running the script does show content (even when you leave and come back) but that's an old draw.  You can see this by commenting out lines 16 and 22 in hungscript.html (the setTimeout call).  If you do this then the content process never gets a chance to redraw as red before it enters the loop -- so leaving the tab and returning still shows a green bar since it doesn't really redraw (it just reuses the old graphics objects in the compositor).

So this is really by design.  Changing the behavior will be big.  One-process-per-tab would do it.  So would keeping around the compositor objects for tabs we aren't looking at (which would be far too wasteful).  Otherwise, they would have to be redrawn from content... content which only exists on the stalled process.
Flags: needinfo?(davidp99)
jimm: I've confirmed the stuff in comment 7 on IRC.  There are occasional bugs (such as Bug 1047058) where objects stick around but layers/drawtargets are cleared out for big performance and resource reasons.
Also, I'm sure there is newer information on this but Chrome's original solution was the per-tab option:
http://blog.chromium.org/2008/09/multi-process-architecture.html
Deferring to M3 because this problem was not hit in normal browsing.
We're going to look to do some sort of "dumb" spinner on foreground tabs that can't paint for v1, and maybe offer the user the opportunity to kill the offending script in a future rev.

<jimm> blassey: bug 1066381 looks pretty bad
<firebot> https://bugzil.la/1066381 — NEW, jmathies — Trivial Infinite Loop on e10s causes other tabs to display black pages
<jimm> on mac you get a black content rect for every tab, on windows it's a white content rect.
<jimm> and it doesn't go away until the slow script dialog comes up and you cancel the script.
<blassey> jimm: if you go to crashmybrowser.com
<jimm> there's a test case in the bug if you want to avoid that site
<blassey> right, my point is that you've gotta go to a site that's trying to do something stupid
<jimm> yep
<blassey> sorry... are you arguing for m2 rather than m3?
<jimm> blassey: based on the feedback so far it feels more like a "we can't fix this unless we go to process per tab." 
<blassey> jimm: we could also do supersnappy
<jimm> supersnappy?
<blassey> seperate event loops per tab
<blassey> that get prioritized differently
<blassey> the project got killed
<blassey> but I still like the idea
<ttaubert> I thought most of the stuff was done, mostly only fixing tests was left
<blassey> ttaubert: I think that's correct
<jimm> bug 718121
<firebot> https://bugzil.la/718121 — NEW, bhackett1024 — Allow chrome and content to run on different threads
<billm> jimm: can't we just draw a spinner or something if conent isn't drawing?
<billm> the parent process could do that
<blassey> billm: could we interupt the script when the tab changes?
<blassey> or maybe after a delay if that tab changes
<billm> blassey: we could, although that would break run-to-completion for the script
<blassey> understood
<blassey> I guess I've never believed that to be as important as most other people do
<blassey> so I'm biased here
<billm> blassey: or we could overlay the tab with an option for the user to terminate the script
<billm> and remove it when the script recovers
<blassey> that might be nice
<billm> but I still think we need some sort of indication that a tab is busy. it might be doing something slow that's not JS.
<blassey> "http://crashmybrowser.com is hanging your browser, would you like to stop it?"
<billm> I kinda thought that's what bug 1067095 would do
<firebot> https://bugzil.la/1067095 — DUPLICATE — Black screen when switching tabs
<blassey> billm: if we can't draw, we should overlay
<blassey> if the reason is we're in content js, we should offer to interupt
<billm> er, wrong bug
<billm> bug 1009628
<firebot> https://bugzil.la/1009628 — FIXED, davidp99 — Need mozAfterRemotePaint event for remote iframes
<blassey> billm: that wasn't in the scope of those bugs
<jimm> if you open up a bunch of tabs, and then load that test case, you can easily flip between tabs multiple times before the slow script dialog forces into the foreground
<jimm> I don't think a spinner overlay on every tab during that time period of going to look good.
<billm> jimm: it seems no worse than what we do now. I guess I don't see this as a big problem.
<blassey> jimm: right now you can't switch tabs
<billm> blassey: why not? that would be a bug.
<jimm> I can switch tabs, I just don't get any drawing in those tabs
<blassey> billm: I was referring to non-e10s
<jimm> ah
<billm> blassey: do you think it's better to prevent switching tabs or worse?
<blassey> billm: worse
<billm> ok, I agree
<blassey> I'd rather let you switch tabs
<blassey> and have a "hey.. .that other tab is killing you right now, want to kill it?" 
<blassey> and I think we can be more agressive about it with e10s
<blassey> since our chrome is functional
<billm> I think with a spinner we'd be in pretty good shape. and a "kill JS" button would be gravy.
<billm> I guess we could also offer to kill the entire process
<jimm> can we detect which tab is the problem from chrome?
<billm> not immediately. we'd have to wait for the JS operation callback.
<blassey> billm: we could send an interupt
<blassey> which will trigger the slow script dialog
<jimm> sounds like chrome isn't aware there's a problem
<jimm> until the ssd triggers
<blassey> jimm: we can detect that
<blassey> if you switch tabs and we don't get a mozAfterRemotePaint in x seconds
<blassey> then your child is probably hung
<billm> even without switching tabs
<blassey> and we can send a interupt
<jimm> wouldn't a slow loading site also trigger that?
<jimm> (relying on mozAfterRemotePaint)
<blassey> jimm: yes, after a timeout
<blassey> I'm saying that if we're concerned about this interaction in particular
<blassey> we could be more liberal about sending an interupt when the user has switched tabs and the child is unresponsive
<jimm> I'm not familiar with how we would send an interupt or really what that means exactly. :)
<jimm> some sort of js interupt sent to the tab we switch away from telling it to stop processing?
<blassey> I was assuming we wree talking about a signal
<billm> the JS engine allows you to set a flag to request an "operation callback". the next time JS code hits a back edge or function entry, it will leave JS and call the "operation callback". from there we can do whatever we want.
<billm> it's much safer than an interrupt because you know you're in a reasonable state (and not in the middle of malloc or something)
<blassey> yes, that is nice
<billm> that's how we implement the slow script dialog, but it can be used for all sorts of things
<blassey> billm: would it make sense to do some basic message handling at that point
<billm> yeah, we'd have to be careful. we don't want to process arbitrary messages, but we could look out for special messages.
<billm> maybe we could use a separate channel
<billm> I think this is probably a post-version 1 sort of thing though
<jimm> billm: would the js engine be able to decide if the script was slow at that point?
<jimm> I'm not sure what we would do with that operation callback when we switch tabs away from the hung script.
<jimm> afaik, the js engine doesn't consider the script hung at that point.
<billm> jimm: at that point chrome could ask the user whether to terminate the script
<billm> we do know how long the script has been running for
<billm> the JS engine doesn't do any elaborate analysis. if the script has been running for >10 seconds, it puts up the dialog.
<jimm> seems like you would get a lot of false positives, but I'm willing to take a swing at this all the same. sounds like I should start with the slow script dialog, figure out how that works and work from there. 
<jimm> I see
<billm> jimm: I don't think we should do this now. I'd much rather we just put up a spinner or something.
<blassey> backing this truck up
<billm> doing the operation callback thing will be a ton of work and it's unlikely to benefit people much
<blassey> the black tabs are not any worse than the behavior in non-e10s
<blassey> where your entire browser is hung
<blassey> putting a spinner in would probably be a nice improvement
<blassey> but I don't think we need much more than that now
<jimm> ok, sounds good. will post this conversation to the bug.
<billm> I don't know. given that some people see black and others see white, I'm  guessing we're just displaying random memory.
<billm> that's what happened with the previous tab switching bug at least
<billm> and that seems worse than our existing behavior to me
<blassey> billm: I think that's just platform differences between if the memory is 0'd out or f'd out
<billm> blassey: could we at least use a consistent color?
<blassey> billm: I'd assume if we do a spinner we would
<billm> I'm not saying this should be M2, but probably M3 or M4
<blassey> we'd have a "design" that will be the busy place holder
<blassey> agreed
<billm> ok, sounds good
Depends on: 1049551
Pushing this back a little since bug 1049551 will help mitigate issues with unpainted tabs.
Assignee: jmathies → gwright
Isn't this basically a dupe of bug 1059494?
Attached image chromepopup.png (deleted) —
(In reply to Jim Mathies [:jimm] from comment #13)
> Isn't this basically a dupe of bug 1059494?

It isn't, that bug is specific to hung cpows.
Summary: Trivial Infinite Loop on e10s causes other tabs to display black pages → Trivial Infinite Loop on e10s shows spinner
Changing this up based on a discussion with gwright who chatted with billm about this.
Assignee: gwright → wmccloskey
The new hang notification works great on this!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: