Closed Bug 714404 (nativefennecgllayers) Opened 13 years ago Closed 12 years ago

Use asynchronous GL layers where available instead of the software layer client

Categories

(Firefox for Android Graveyard :: General, defect, P1)

ARM
Android
defect

Tracking

(blocking-fennec1.0 beta+, fennec12+)

RESOLVED DUPLICATE of bug 725095
Tracking Status
blocking-fennec1.0 --- beta+
fennec 12+ ---

People

(Reporter: pcwalton, Unassigned)

References

Details

(Keywords: meta, mobile, perf)

We want to use asynchronous GL layers for the second or third Fennec Native release, at least on devices where it is supported. This will provide numerous immediate benefits, including:

* Much less checkerboarding due to the layers infrastructure's tile cache.
* Correct handling of position:fixed.
* Accelerated video and WebGL.
* A framework to do asynchronous panning of iframes and overflow:scroll elements.
* General performance and reliability improvements due to the use of more robust GL code that's been shipping on desktop for nearly a year.

The idea is to keep the Java compositor code around, but to reduce its role to drawing of decorations (the checkerboard, the scroll bars, the page shadow, the background, and the FPS meter). We will need to introduce a way to register callbacks on the Gecko compositor thread so that the Java compositor is able to draw before and after the layer tree is rendered (but before glFlush() or glFinish() is called).

To support asynchronous panning and zooming, we also need some way for widget to be able to apply a scale and translation to the layers in the layer tree without touching the Gecko event loop. These transforms would be separate from the current notion of layer transforms; they would purely be under widget control and content would be unable to affect them. For the first version it should be fine if widget is only able to apply global transformations to the entire layer tree (translation and scaling), similar to the way gfxContext can have a scale and/or translation applied to it today. Eventually we would like to be able to take individual nsIContentViews and apply translations to them, so that the user is able to asynchronously pan subdocument frames.

I don't think we need a true framework for asynchronous scrolling at the engine level right now; B2G needs it, and we can migrate to its implementation once it's ready, but Fennec Native already has asynchronous scrolling via a combination of Java, widget, and browser.js, and that code can remain.
(In reply to Patrick Walton (:pcwalton) from comment #0)
> * Much less checkerboarding due to the layers infrastructure's tile cache.

There are some repaint optimizations but there's not a tile cache.

> The idea is to keep the Java compositor code around, but to reduce its role
> to drawing of decorations (the checkerboard, the scroll bars, the page
> shadow, the background, and the FPS meter).

There's already support for drawing a checkerboard (although it was turned off in favor of using the page's background color) and an FPS meter.  There's no support for scroll bars.  I'm not sure what page shadow and background you're referring to.

> To support asynchronous panning and zooming, we also need some way for
> widget to be able to apply a scale and translation to the layers in the
> layer tree without touching the Gecko event loop.

This already exists but there's no pure-C++ binding to it.  romaxa was adding one last I heard.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> There are some repaint optimizations but there's not a tile cache.

That would be useful to add, mostly so that we can zoom out without seeing lots of checkerboards and without rendering. A tile cache at various positions and sizes would help us here by allowing us to retain areas outside the visible rect at a smaller size. I believe iOS has this these days.

But, either way, we'll still get benefits. Being able to avoid copying around areas that haven't changed when scrolling will be the biggest win in the battle against checkerboarding. Another win will be to use the logic of the layers system to decide when to buffer outside the visible rect instead of unconditionally rendering what is often a very large region (1024x2048 or more).

> > The idea is to keep the Java compositor code around, but to reduce its role
> > to drawing of decorations (the checkerboard, the scroll bars, the page
> > shadow, the background, and the FPS meter).
> 
> There's already support for drawing a checkerboard (although it was turned
> off in favor of using the page's background color) and an FPS meter. 
> There's no support for scroll bars.  I'm not sure what page shadow and
> background you're referring to.

When you overscroll, we draw a shadow and a background behind the page. I'd still like to see whether we can keep that in Java, because we do want to be able to draw it before Gecko is up. All we need is a callback that allows us to draw some extra stuff using the currently-selected GL context before and after the layer tree is drawn. We already draw everything using Android's direct bindings to GLES APIs (not the javax.microedition.khronos.opengles stuff); the cross-language barrier shouldn't be a concern here.
(In reply to Patrick Walton (:pcwalton) from comment #2)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> > There are some repaint optimizations but there's not a tile cache.
> 
> That would be useful to add, mostly so that we can zoom out without seeing
> lots of checkerboards and without rendering. A tile cache at various
> positions and sizes would help us here by allowing us to retain areas
> outside the visible rect at a smaller size. I believe iOS has this these
> days.
> 

It sounds like you want something like rings of pre-rendered content at diminishing scale factors, I assume off the critical re-render path for the main content area.  I'm not sure a tile cache is the best way to do that, but maybe.

> But, either way, we'll still get benefits. Being able to avoid copying
> around areas that haven't changed when scrolling will be the biggest win in
> the battle against checkerboarding.

That optimization already exists.

> Another win will be to use the logic of
> the layers system to decide when to buffer outside the visible rect instead
> of unconditionally rendering what is often a very large region (1024x2048 or
> more).
> 

That logic doesn't exist in gecko yet (it lives in frontend JS in xul-fennec), but I'm convinced it should be in gecko going forward.  We want to add it for b2g.

> > > The idea is to keep the Java compositor code around, but to reduce its role
> > > to drawing of decorations (the checkerboard, the scroll bars, the page
> > > shadow, the background, and the FPS meter).
> > 
> > There's already support for drawing a checkerboard (although it was turned
> > off in favor of using the page's background color) and an FPS meter. 
> > There's no support for scroll bars.  I'm not sure what page shadow and
> > background you're referring to.
> 
> When you overscroll, we draw a shadow and a background behind the page. I'd
> still like to see whether we can keep that in Java, because we do want to be
> able to draw it before Gecko is up.

I see.  There's really not a good solution for doing that in layers without slowing down startup, even bypassing general gecko startup with something like liblayers.so.  We'll need to hand off some state like that from java->gecko when gecko comes up anyway, so doing this initially in java then switching to gecko could be an option.

> All we need is a callback that allows us
> to draw some extra stuff using the currently-selected GL context before and
> after the layer tree is drawn. We already draw everything using Android's
> direct bindings to GLES APIs (not the javax.microedition.khronos.opengles
> stuff); the cross-language barrier shouldn't be a concern here.

That's not terribly hard in theory, but it requires a new layers rendering interface.  It also ties us to compositing on a java-visible thread, which should be OK as long as the java code doesn't need a java event loop.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> It sounds like you want something like rings of pre-rendered content at
> diminishing scale factors, I assume off the critical re-render path for the
> main content area.

Yes, that would work too.

> That's not terribly hard in theory, but it requires a new layers rendering
> interface.  It also ties us to compositing on a java-visible thread, which
> should be OK as long as the java code doesn't need a java event loop.

The Java code doesn't need a Java event loop (we don't even use a Java event loop right now for rendering; all rendering is on a separate thread). Something as simple as gfxContext::AddPreRenderHook(nsRunnable *) and gfxContext::AddPostRenderHook(nsRunnable *) would be fine.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> That logic doesn't exist in gecko yet (it lives in frontend JS in
> xul-fennec), but I'm convinced it should be in gecko going forward.  We want
> to add it for b2g.

We don't need this for the first version; we can use the clipSubdocument stuff that we're currently using. But it'd be better to switch over to Gecko's buffering once it's ready.
(In reply to Patrick Walton (:pcwalton) from comment #4)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> > That's not terribly hard in theory, but it requires a new layers rendering
> > interface.  It also ties us to compositing on a java-visible thread, which
> > should be OK as long as the java code doesn't need a java event loop.
> 
> The Java code doesn't need a Java event loop (we don't even use a Java event
> loop right now for rendering; all rendering is on a separate thread).
> Something as simple as gfxContext::AddPreRenderHook(nsRunnable *) and
> gfxContext::AddPostRenderHook(nsRunnable *) would be fine.

That's eminently doable.
tracking-fennec: --- → +
Priority: -- → P1
tracking-fennec: + → 11+
tracking-fennec: 11+ → 12+
Depends on: 723036
Currently I have panning and zooming standing up.

Problems:
(1) Cannot cope with surface loss. Reverts to Java compositor on surface loss (which is extremely common from e.g. bringing up the URL bar), which makes the browser unusable thereafter.
(2) Incorrect transforms for zooming. Shouldn't be hard to fix this one.
(3) An occasional startup crash (seems to be a race).
(4) Racy viewport updates.
(5) Adreno 200 crashes are not solved yet.
(6) Decorations in the Java compositor are not implemented yet.
(7) We do not yet buffer areas outside the visible rect.
(8) We still checkerboard a lot, although significantly less than the Java compositor.

(1) through (6) are reasonably straightforward (although #4 is a slight risk, we've solved these issues before in the Java compositor).

(7) is an unknown -- Chris, do you know how whether this is straightforward? I'd like to ask layers to retain some Thebes layer content outside the visible area, so that panning doesn't result in checkerboards.

(8) should be improved by the solution to (7) and by removing gfxImageSurface from the Java compositor. Ultimately, I think we do want Skia here -- pixman is not competitive on mobile. But I would not gate shipping on Skia at this time.
(In reply to Patrick Walton (:pcwalton) from comment #7)
> (7) is an unknown -- Chris, do you know how whether this is straightforward?
> I'd like to ask layers to retain some Thebes layer content outside the
> visible area, so that panning doesn't result in checkerboards.

Can we use the displayport API here?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #8)
> (In reply to Patrick Walton (:pcwalton) from comment #7)
> > (7) is an unknown -- Chris, do you know how whether this is straightforward?
> > I'd like to ask layers to retain some Thebes layer content outside the
> > visible area, so that panning doesn't result in checkerboards.
> 
> Can we use the displayport API here?

It'd sure make it easy :)

Also I realized that (8) may be simpler than I thought -- it may just be that we flood the compositor thread with too many AsyncRender messages during an asynchronous kinetic scroll and the new thebes content can't get a word in edgewise due to the backlog. I suspect this may be the case because of two observations:

(1) Synchronous scrolling with GL layers (in which all the content had to be rerendered at each frame) was quite performant -- 50-55 fps.
(2) When I break in the debugger during a fling, I almost always see the main Gecko thread blocked sending layer updates during a paint, suggesting that the compositor thread is busy.
(In reply to Patrick Walton (:pcwalton) from comment #9)
 > Also I realized that (8) may be simpler than I thought -- it may just be
> that we flood the compositor thread with too many AsyncRender messages
> during an asynchronous kinetic scroll and the new thebes content can't get a
> word in edgewise due to the backlog.

Probably we should ensure that the compositor never has more than one pending AsyncRender event (i.e. as new messages arrive, pending events are discarded).

If we still find that Gecko being blocked on the compositor is hurting us, we'll need to make PLayers updates async, but I think we can get away without this at least for v1.
I'm working a patch prevent multiple composite event from being in the queue, we already do this for async plugins. I'm going to put this logic into ScheduleComposition.

It looks like async PLayers would help a lot. We could also consider double buffering the shared surfaces, only swapping with the layer and tagging it as dirty during a PLayers sync update. The compositer would upload them within the Composite call if the layer is dirty where Gecko isn't blocked on us. This would prevent gecko from blocking the 5-10ms required to upload. This should be a simple architectural change while still buying us the benefit of not blocking gecko on uploads.
(In reply to Ali Juma [:ajuma] from comment #10)
> (In reply to Patrick Walton (:pcwalton) from comment #9)
>  > Also I realized that (8) may be simpler than I thought -- it may just be
> > that we flood the compositor thread with too many AsyncRender messages
> > during an asynchronous kinetic scroll and the new thebes content can't get a
> > word in edgewise due to the backlog.
> 
> Probably we should ensure that the compositor never has more than one
> pending AsyncRender event (i.e. as new messages arrive, pending events are
> discarded).
> 
> If we still find that Gecko being blocked on the compositor is hurting us,
> we'll need to make PLayers updates async, but I think we can get away
> without this at least for v1.

I don't think it's too much of a problem -- I've never seen composition slower than 15ms or so, which is a drop in the bucket compared to all the other sources of jank that can hog the Gecko event loop on mobile.
Timed painting on timecube.com while panning quickly. (Due to the scaled background, this is a site that pixman is especially poor at rendering.) For each paint on the Gecko thread:

Basic layers painting: mean 68ms = 14fps, std. dev. 9ms.
Layer forwarding: mean 149ms = 5fps, std. dev. 130ms.

So it looks like most of the problem is that we're indeed falling down with layer forwarding. With basic layer painting alone, we can render at 14 fps, but with layer forwarding this drops us down to 5 fps and a huge standard deviation. I wonder if the easiest thing to do is to make CompositorParent::ScheduleRenderOnCompositorThread() semi-synchronous, in that it never allows another AsyncRender message to be placed in the queue until the first one is acked. (Might also want to throw a mutex or something around ForwardTransaction and ScheduleRenderOnCompositorThread, so that ForwardTransaction always has priority.)

But note that 14fps is still pretty bad. We're creating a dummy gfxImageSurface in widget and rendering basic layers to it -- can this be avoided?
(In reply to Patrick Walton (:pcwalton) from comment #13)
> I wonder if the easiest thing to do is to make
> CompositorParent::ScheduleRenderOnCompositorThread() semi-synchronous, in
> that it never allows another AsyncRender message to be placed in the queue
> until the first one is acked.

This must be done regardless. It's a simple patch, I will try to find some time this weekend or if not I'll finish it early Monday. Let's no make any syncronouse, here's how we do it with async plugins: http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginInstanceChild.cpp#3358

We need to call ScheduleComposite instead of Composite. ScheduleComposite should post a task if and only if one isn't currently posted.

Perhaps we can discuss your measurements on Monday since the numbers seem extremely high I'm curious to know exactly how you are measuring this (are they cumulative since you need to paint before forwarding). If you have custom timing code you can check it in with an IFDEF.
Depends on: land-maple
Depends on: 726371
Depends on: 726467
Depends on: 725102
Depends on: 726817
Depends on: 726827
Depends on: 726868
Depends on: 727937
Depends on: 727948
Keywords: meta, mobile, perf
Depends on: 728379
No longer depends on: 727937
Depends on: 728000
Depends on: 729082
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
blocking-fennec1.0: --- → beta+
Depends on: 727142
Depends on: 662891
Depends on: 731353
Depends on: 732206
No longer blocks: checkerboarding
Depends on: 734164
No longer depends on: 729082
No longer depends on: omtc
Depends on: omtc
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.