Closed
Bug 761933
Opened 12 years ago
Closed 12 years ago
Gaia homescreen doesn't repaint while panning around when loaded out of process
Categories
(Core :: General, defect, P1)
Tracking
()
RESOLVED
FIXED
blocking-basecamp | + |
People
(Reporter: cjones, Assigned: cjones)
References
Details
If the homescreen is loaded in <iframe ... remote="true">, its content process crashes. Not sure why yet.
Assignee | ||
Updated•12 years ago
|
Blocks: b2g-e10s-work
Updated•12 years ago
|
blocking-basecamp: --- → +
Comment 1•12 years ago
|
||
How are we doing here?
Updated•12 years ago
|
Priority: -- → P1
Assignee | ||
Comment 2•12 years ago
|
||
Will see if fixing this ameliorates bug 779968.
Assignee: nobody → jones.chris.g
Blocks: 779968
Assignee | ||
Comment 3•12 years ago
|
||
The crash is gone (maybe I was testing the wrong thing?), but there are some other bugs.
Assignee | ||
Comment 4•12 years ago
|
||
Well, the good news (although possibly bad news) is that the layers txn counter and composition fps stay around 60. Need to retest with the stop-excessive-recomposites patches.
Bad news is that, if I pan around really quickly on the homescreen, the content just stops moving. It doesn't start moving again until I stop panning around. We're not repainting unnecessarily or anything silly like that. First suspicion is that we're not marking the homescreen active/visible properly and it stays at -10 nice or something.
Assignee | ||
Comment 5•12 years ago
|
||
Yay, with bug 778036 the not-repainting goes away.
But the fps is still the same :(.
Assignee | ||
Comment 6•12 years ago
|
||
Since I'm about 80% sure I was originally testing with the system app, I'll go ahead and morph this into the other blocker that it is, and leave open until bug 778036 lands so that we can verify the fix still works.
Summary: Crash when loading gaia homescreen app out of process → Gaia homescreen doesn't repaint while panning around when loaded out of process
Assignee | ||
Comment 8•12 years ago
|
||
I wasn't seeing this with patches from bug 778036, but now I am again, even after it landed :(.
Assignee | ||
Comment 9•12 years ago
|
||
The things we can try here, in parallel, are
1. revert m-c to 8/15-era revision, re-apply bug bug 778036 and re-confirm that perf is good. Bisect from there to tip to see what regressed this.
2. check that the system app isn't invalidating or processing any new messages that cause the b2g process to do nontrivial work. Some changes for async pan/zoom might have caused this.
3. drop the process priority of the b2g process to the same as foreground content processes, and instead just increase the priority of the compositor thread
I'll look at (1). Anyone want to volunteer to poke at (2) or (3)?
Assignee | ||
Comment 10•12 years ago
|
||
On http://hg.mozilla.org/mozilla-central/rev/f111e43bc076 and https://github.com/mozilla-b2g/gaia/commit/7ec42eac729ef7e0e464969797c6d3fb56dac2d5 , with bug 778036 applied, I still see the bug.
Time to poke at gaia a bit ...
Assignee | ||
Comment 11•12 years ago
|
||
Can't reproduce my results on https://github.com/mozilla-b2g/gaia/commit/d5f5e024109ce8b788e052fdef8d4f2c7689f4b2 with this patch applied
diff --git a/apps/system/js/bootstrap.js b/apps/system/js/bootstrap.js
index 1cd9052..8dfe405 100644
--- a/apps/system/js/bootstrap.js
+++ b/apps/system/js/bootstrap.js
@@ -51,6 +51,7 @@ function startup() {
home.src = src;
home.setAttribute('mozapp', src.replace('index.html', 'manifest.webapp'));
home.setAttribute('mozbrowser', 'true');
+ home.setAttribute('remote', 'true');
home.dataset.zIndexLevel = 'homescreen';
var windows = document.getElementById('windows');
windows.parentNode.insertBefore(home, windows);
i.e., the pre-homescreen-as-app change to gaia.
So back to the drawing board. Grrrrr.
Assignee | ||
Comment 12•12 years ago
|
||
Transaction counter is still showing that we're not overdrawing or anything like that. The entire gfx pipeline just shuts down while panning around.
The cuddle animations are still running doubly-interpolated (content main thread along with compositor thread) at 60fps, which is odd ... the content process can repaint at 60fps alongside the compositor resampling at 60fps, but it can't repaint at *all* while the main thread is routing input events.
Assignee | ||
Comment 13•12 years ago
|
||
Making the master-process priority equal to the foreground priority (and also equal to background, to eliminate possibility of bugs) doesn't change anything. Even prioritizing the foreground process above the master process doesn't help. Aieeeee!
Assignee | ||
Comment 14•12 years ago
|
||
(the setpriority() calls are correct and aren't returning error codes)
Assignee | ||
Comment 15•12 years ago
|
||
Thinking back, I might have been testing on the Nexus S. A kernel- or event-hub-specific problem might be contributing to this. Will retest on nxs in a bit to double check.
Assignee | ||
Comment 16•12 years ago
|
||
In the first 1 second of fast-panning, we process 39 events in the content process, and they're generally received 10-30ms apart. There's a 200ms outlier gap though, early in the sequence. This seems to be when we prerender the adjacent screen.
We're not using input-event compression, which is a known bug, and this is causing the queue to back up a bit.
But the event handling everything on the content-process side looks ok ...
Assignee | ||
Comment 17•12 years ago
|
||
Alright, we're not sending PLayers:Update messages at all while panning. Something is going badly wrong.
Assignee | ||
Comment 18•12 years ago
|
||
I'm seeing 2-3 PuppetWidget::Invalidate/Paint total during panning, but per discussion with mattwoodrow I think this is expected.
Now to see if we're getting to nsViewManager::ProcessPendingUpdatesForView/nsRefreshDriver::Notify ...
Assignee | ||
Comment 19•12 years ago
|
||
nsRefreshDriver::Notify isn't being called. Matt, any ideas about that?
Checking now whether ProcessPendingUpdatesForView is being called separately ...
Assignee | ||
Comment 20•12 years ago
|
||
No, we don't hit ProcessPendingUpdatesForView either.
Assignee | ||
Comment 21•12 years ago
|
||
The symptom here is that while panning around in OOP web content (the gaia homescreen), we don't repaint at all.
What's happening is
- content receives touch events
- it sets the -moz-transform of some elements based on the touch events
- the restyles required by those changes are being posted
- we're scheduling a view manager flush
- but there are no refresh observers registered with that refresh driver, so we never hit nsRefreshDriver::Notify() and never repaint
In the steady state of no animation or image refresh observers, what's supposed to observe refresh driver ticks? For some reason, whatever that is seems not to be running in content processes.
When style changes happen, nsCSSFrameConstructor::PostRestyleEventInternal calls mPresShell->GetPresContext()->RefreshDriver()->AddStyleFlushObserver(mPresShell). That's what triggers a new refresh driver tick to flush styles on the presshell.
Comment 23•12 years ago
|
||
I changed this code in my original OMTA patch to no longer request that refresh driver tick. Maybe thats what breaks here? dz probably kept the basic design.
Assignee | ||
Comment 24•12 years ago
|
||
This is all restyling from explicit -moz-transform changes, no animations. It works in-process.
Assignee | ||
Comment 25•12 years ago
|
||
Animations work fine OOP too, 60fps (with sidecar interpolation).
Assignee | ||
Comment 26•12 years ago
|
||
Here's approximately what's happening (with lots of logging, which is making things about 4x more costly)
- the homescreen starts on the page with just the search icon
- the content process is processing touchmove quickly, keeping up with dispatch from the parent
* because we're on the page with just the icon, restyles and invalidates are cheap
* content process is sleeping between events
- we process the first refresh driver tick, which repaints the second page
- the second page is more complex (lots more icons). repainting is relatively expensive
* while repainting, the event queue continues filling up
- we being flushing event queue
* restyles take longer now because we're processing both pages (?)
* content process no longer sleeping between events
- for some reason, we don't process more refresh driver ticks
So there are three bad things going on here
1. not compressing input events. This is the worst.
2. restyles taking longer than they should. Exacerbates (1)
3. not process refresh-driver timer even though it's fired. Related to (1), but possibly indicates event priority problem.
bent, do you have any idea why (3) might be occurring?
Comment 27•12 years ago
|
||
Should we send timer events from the parent to avoid priority issues?
Assignee | ||
Comment 28•12 years ago
|
||
That will just create different performance problems.
Assignee | ||
Comment 29•12 years ago
|
||
Based on some unscientific tests, I think we'll be able free up 8-10MB of locked memory by moving the homescreen OOP. And that's not counting the system wallpaper, which the system app is still rendering in my test. That might free up another MB.
Assignee | ||
Comment 30•12 years ago
|
||
The panning fps on OOP homescreen has gone down by about 10fps since last I checked. Grrrrr. Have some more work to do to move it OOP.
Assignee | ||
Comment 31•12 years ago
|
||
The work here is done. Optimizations remain.
You need to log in
before you can comment on or make changes to this bug.
Description
•