Bugzilla

Dale Harvey (:daleharvey)

Comment 6

•

11 years ago

Attached image Screen shot of profile while scrolling naver.com (deleted) — Details

I profiled scrolling on naver.com. The result looks different from Dale's profile, so I would like to share it. I used a Nexus 4 for this.

In this profile, the child spends a long time in Paint because it is waiting for the main process to allocate the buffer (SendGrallocBufferConstructor). As I understand it, TextureClientPool should avoid that allocation by keeping some buffers in its pool. However, the number of active buffers (mOutstandingClients) is bigger than the maximum number of texture clients managed by this pool (sMaxTextureClients), so no textures are kept in the pool and the child requests a new buffer almost every time.

As a small test, I increased sMaxTextureClients from 50 to 200, and the white screen appeared less often.
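
To make the behaviour described in this comment concrete, here is a minimal sketch of a capped buffer pool. It is not the Gecko TextureClientPool: `ToyTexturePool`, `SimulateScroll` and all member names are hypothetical, and the rule that a returned buffer is dropped once outstanding plus pooled buffers would exceed the cap is an assumption based on the description above. The workload numbers are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a capped buffer pool. NOT the Gecko TextureClientPool:
// all names are hypothetical and the recycling rule is an assumption.
class ToyTexturePool {
public:
  explicit ToyTexturePool(std::size_t maxClients) : mMaxClients(maxClients) {}

  // Hand out a buffer: reuse a pooled one if possible, otherwise count a
  // fresh (slow, cross-process) allocation.
  void Get() {
    if (mPooled > 0) {
      --mPooled;
    } else {
      ++mFreshAllocations;
    }
    ++mOutstanding;
  }

  // Take a buffer back. It is kept for reuse only if outstanding + pooled
  // buffers stay within the cap; otherwise it is dropped.
  void Return() {
    assert(mOutstanding > 0);
    --mOutstanding;
    if (mOutstanding + mPooled < mMaxClients) {
      ++mPooled;
    }
  }

  std::size_t FreshAllocations() const { return mFreshAllocations; }

private:
  std::size_t mMaxClients;
  std::size_t mOutstanding = 0;
  std::size_t mPooled = 0;
  std::size_t mFreshAllocations = 0;
};

// Simulate scrolling: keep `working` buffers outstanding, and on each of
// `steps` frames return `churn` buffers and request `churn` new ones.
// Returns the number of fresh allocations made after the initial fill.
inline std::size_t SimulateScroll(std::size_t maxClients, std::size_t working,
                                  std::size_t churn, std::size_t steps) {
  ToyTexturePool pool(maxClients);
  for (std::size_t i = 0; i < working; ++i) pool.Get();
  const std::size_t afterFill = pool.FreshAllocations();
  for (std::size_t s = 0; s < steps; ++s) {
    for (std::size_t i = 0; i < churn; ++i) pool.Return();
    for (std::size_t i = 0; i < churn; ++i) pool.Get();
  }
  return pool.FreshAllocations() - afterFill;
}
```

With 110 buffers in flight and 10 recycled per frame over 20 frames, `SimulateScroll(50, 110, 10, 20)` performs 200 fresh allocations, while `SimulateScroll(200, 110, 10, 20)` performs none, which is the qualitative effect of raising the cap in this toy model.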

Comment 7

•

11 years ago

I should have mentioned, mine was on a hamachi device, apologies

Updated

•

11 years ago

Component: Gaia::Browser → Graphics: Layers

Product: Firefox OS → Core

Version: unspecified → 30 Branch

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

11 years ago

Whiteboard: [systemsfe][c=handeye p= s= u=1.4] → [c=handeye p= s= u=1.4]

Comment 8

•

11 years ago

I don't know what we can do here. Increasing the number of maximum clients means a better chance of OOM. At some point, you're trying to draw more than this device can cache or keep up with. Especially given that we're hitting the desktop site as mentioned above.

Comment 9

•

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #8)
> I don't know what we can do here. Increasing the number of maximum clients
> means a better chance of OOM. At some point, you're trying to draw more
> than this device can cache or keep up with. Especially given that we're
> hitting the desktop site as mentioned above.

I think we should get a more realistic test case here. Desktop sites will always have problems, as they aren't optimized for mobile devices.

I'm renoming this because I don't think it's realistic to set the no checkerboarding requirement on non-optimized sites for mobile.

blocking-b2g: 1.4+ → 1.4?

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 10

•

11 years ago

Tapas, can you please help check if this is happening on an optimized site for the phone?

Flags: needinfo?(tkundu)

Reporter

Comment 11

•

11 years ago

(In reply to Preeti Raghunath(:Preeti) from comment #10)
> Tapas,
>
> Can you please help check if this is happening on an optimized site for the
> phone?

It does not come on optimized site for the phone. I tested with youtube, yahoo and cnbc mobile websites.

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 12

•

11 years ago

(In reply to Tapas Kumar Kundu from comment #11)
> (In reply to Preeti Raghunath(:Preeti) from comment #10)
> > Tapas,
> >
> > Can you please help check if this is happening on an optimized site for the
> > phone?
>
> It does not come on optimized site for the phone. I tested with youtube,
> yahoo and cnbc mobile websites.

Are you saying you can or can't reproduce this with these mobile optimized sites? I can't tell by your comment here.

Flags: needinfo?(tkundu)

Reporter

Comment 13

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #12)
> (In reply to Tapas Kumar Kundu from comment #11)
> Are you saying you can or can't reproduce this with these mobile optimized
> sites? I can't tell by your comment here.

I CANNOT reproduce this issue with mobile optimized web sites.

Flags: needinfo?(tkundu)

Updated

•

11 years ago

Summary: Browser app scrolling shows white screen occasionally → Browser app scrolling shows white screen occasionally on Desktop Sites

Milan Sreckovic [:milan] (needinfo for best results)

Comment 14

•

11 years ago

Inder, since this is not seen on mobile sites, we wouldn't block on this. Please assess and let me know.

Flags: needinfo?(ikumar)

Inder

Comment 15

•

11 years ago

Moving the ni to Vikram, who can assess the requirement from a perf perspective.

Flags: needinfo?(ikumar) → needinfo?(mvikram)

Mandyam Vikram

Comment 16

•

11 years ago

The problem is that an end user will not know if he is hitting a mobile or desktop site. Also, as everyone knows, not all websites direct mobiles to a mobile-friendly site.

Is this a regression from when APZC was introduced? (I'm not sure, because I thought the browser always supported APZC.) Can we quantify the memory increase from increasing the buffer pool? We could consider making this a pref value, as some devices may not be that memory constrained.

Flags: needinfo?(mvikram)

Dave Huseby [:huseby]

Comment 17

•

11 years ago

During triage we were wondering if adding checkerboarding over the background color to indicate motion would be an acceptable interim solution for this bug.

Flags: needinfo?(milan)

Comment 18

•

11 years ago

That suggestion has certainly been put forward before, including variations such as having a different pattern for different applications, or only doing it in applications and not in the browser. Something like that is probably doable in the 2.0 timeframe if we decide to prioritize it.

Flags: needinfo?(milan)

Michael Vines [:m1] [:evilmachines]

Comment 19

•

11 years ago

Milan, please respond to comment 16.

Flags: needinfo?(milan)

Comment 20

•

11 years ago

Maybe we can try FF for Android using the same Gecko version on a QRD device to better level-set the issue. Might as well try Chrome too. If they do no better, then I think we should reconsider spending further v1.4 time on this.

Flags: needinfo?(tkundu)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 21

•

11 years ago

(In reply to Mandyam Vikram from comment #16)
> ...
> Is this a regression from when APZC was introduced? (I'm not sure, because I
> thought the browser always supported APZC.) Can we quantify the memory
> increase from increasing the buffer pool? We could consider making this a pref
> value, as some devices may not be that memory constrained.

Yes, the browser has supported APZ since the start. I don't know if we have devices that run both 1.0 and 1.4, which we would need in order to decide whether this should be marked as a regression.

Flags: needinfo?(milan)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 22

•

11 years ago

(In reply to Michael Vines [:m1] [:evilmachines] from comment #20)
> Maybe we can try FF for Android using the same Gecko version on a QRD device
> to better level-set the issue. Might as well try Chrome too. If they do no
> better, then I think we should reconsider spending further v1.4 time on this.

I tested Firefox Aurora [1] on an msm8x26 running Android KitKat. The browser scrolls fine on www.cnbc.com on Android and does not show any white screen when scrolling fast. The same is observed with Chrome.

So if we want to make v1.4 FFOS as good as 'firefox for android' then we should fix this issue in v1.4.

[1] https://www.mozilla.org/en-US/mobile/aurora/

Flags: needinfo?(tkundu)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 23

•

11 years ago

BenWa, let's see if there is something obvious here.

Assignee: nobody → bgirard

blocking-b2g: 1.4? → 1.4+

Benoit Girard (:BenWa)

Comment 24

•

11 years ago

(In reply to Andre Graziani (:graziani) from comment #6)
> Created attachment 8397502 [details]
> Screen shot of profile while scrolling naver.com
>
> I profiled scrolling on naver.com. The result looks different from Dale's
> profile, so I would like to share it. I used a Nexus 4 for this.
>
> In this profile, the child spends a long time in Paint because it is
> waiting for the main process to allocate the buffer
> (SendGrallocBufferConstructor).

The pool is meant to make this a bit better, but in general bug 959089 should be a better solution.

(In reply to Mandyam Vikram from comment #16)
> Can we quantify the memory
> increase from increasing the buffer pool? We could consider making this a pref
> value, as some devices may not be that memory constrained.

Yes, sMaxTextureClients * 256 * 256 * 4 gives you an upper bound in bytes. Adding a preference for this is a good idea but we should discuss this in a different bug (clone of this bug is fine).

(In reply to Tapas Kumar Kundu from comment #22)
> So if we want to make v1.4 FFOS as good as 'firefox for android' then we should
> fix this issue in v1.4.

On Firefox for Android we just use GL textures. They are slower than gralloc overall, but they can be faster when there is a lot of GPU tile allocation and the compositor thread is busy, and therefore slow to service the incoming gralloc allocations. Bug 959089 will hopefully close this gap. From the profile in comment 6, this is what we're seeing, which would explain why Firefox for Android is faster.

My suggestion here is to divert all effort to bug 959089.
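
As a worked example of the upper-bound formula in the comment above, the sketch below evaluates it for the two pool sizes mentioned in comment 6 (50 and 200). The function name `PoolUpperBoundBytes` is hypothetical, and reading the factors as 256x256-pixel tiles at 4 bytes per pixel is an assumption about what the constants stand for.

```cpp
#include <cstdint>

// Upper bound, in bytes, on the memory held by pooled tiles, following the
// formula from the comment above: maxTextureClients * 256 * 256 * 4.
// Assumption: 256x256-pixel tiles at 4 bytes per pixel. Name is hypothetical.
constexpr std::uint64_t PoolUpperBoundBytes(std::uint64_t maxTextureClients) {
  return maxTextureClients * 256u * 256u * 4u;
}

// 50 clients  -> 13,107,200 bytes (12.5 MiB)
static_assert(PoolUpperBoundBytes(50) == 13107200ULL, "50 clients");
// 200 clients -> 52,428,800 bytes (50 MiB)
static_assert(PoolUpperBoundBytes(200) == 52428800ULL, "200 clients");
```

Under these assumptions, raising the limit from 50 to 200 raises the worst-case footprint from 12.5 MiB to 50 MiB per pool.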

Depends on: 959089

Updated

•

11 years ago

Depends on: 996458

Milan Sreckovic [:milan] (needinfo for best results)

Comment 25

•

11 years ago

(In reply to Benoit Girard (:BenWa) from comment #24)
> Adding a preference for this is a good idea but we should discuss this in a
> different bug (clone of this bug is fine).

Opened bug 996458

Comment 26

•

11 years ago

(In reply to Benoit Girard (:BenWa) from comment #24)
> ...
>
> My suggestion here is to divert all effort to bug 959089.

That may be the only practical thing to do now. It would probably disqualify this bug from being 1.4: there are a lot of changes in that bug, and it depends on more work that needs to be done. We have our explanation as to why we're slower, so at this point I would prefer we reconsider this as a blocker. Re-sending to triage.

blocking-b2g: 1.4+ → 1.4?

Comment 27

•

11 years ago

Inder, based on risk, we'd like to move this to 2.0.

Flags: needinfo?(ikumar)

Inder

Comment 28

•

11 years ago

Preeti -- ok. fine by me. Moving ni to Vikram to get his input as well.

Flags: needinfo?(ikumar) → needinfo?(mvikram)

Mike Lee [:mlee]

Updated

•

11 years ago

Whiteboard: [c=handeye p= s= u=1.4] → [c=handeye p= s= u=]

Mike Lee [:mlee]

Updated

•

11 years ago

Status: NEW → ASSIGNED

Mandyam Vikram

Comment 29

•

11 years ago

Ok. I guess we don't have too many options.

Flags: needinfo?(mvikram)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 30

•

11 years ago

Moving this to 2.0.

blocking-b2g: 1.4? → 2.0?

Updated

•

11 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=993473

Mike Lee [:mlee]

Comment 31

•

11 years ago

Minusing from 2.0 since in past releases desktop sites have not been expected to be problem-free when viewed in the fxOS browser. If that understanding has changed and the involved teams are committing to supporting full desktop sites, feel free to renom.

blocking-b2g: 2.0? → -

Priority: P1 → P3

Target Milestone: 1.4 S4 (28mar) → ---

Comment 32

•

11 years ago

(In reply to Benoit Girard (:BenWa) from comment #24)
> My suggestion here is to divert all effort to bug 959089.

Bug 959089 has landed on master, and visually it seems that checkerboarding was reduced when scrolling desktop pages. But it still happens a lot.

This is the new profile I got after the patch from bug 959089, scrolling under the same conditions as reported in comment 6:
http://people.mozilla.org/~bgirard/cleopatra/#report=5c319354028788c710d0c381cda655405b153635

The time spent by the child process waiting for buffer allocation is about 30% of the total.

Comment 33

•

11 years ago

Thanks for capturing this. Can you please enable some debug info in SimpleTextureClientPool (at the top of the file) to see why we aren't reusing tiles more efficiently?

Comment 34

•

11 years ago

Attached file texture recycle log for simpletiles (deleted) — Details

Note that I had to enable "Simple Tiling" in the developer menu to get this log. By default the SimpleTextureClientPool path is not taken; TextureClientPool is used instead. With Simple Tiling enabled, the logs show that about 25% of the requested textures are newly allocated and 75% are recycled.

Comment 35

•

11 years ago

Hey Andre. Sorry about misdirecting you. We want to use the default tile pool.

Looking at TextureClientPool, I don't think we are running into sMaxTextureClients. Could you add a printf into the GetTextureClient path to verify? More likely we shrink down to sMinCacheSize, which is 0, and then have to allocate our way back up.

That said, there is another bug here. We aren't flushing the pool on a low-memory notification. We only do that after the timer. That needs to be fixed as well.

Can you add the printf and see what mOutstandingClients looks like during your tests? We can whip up a patch that keeps more tiles around as long as there is no memory pressure.
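
The difference between the two eviction policies discussed in this comment can be sketched roughly as follows. This is a toy model, not the TextureClientPool code: `ToyRecyclingPool`, its methods and `DemoPolicies` are hypothetical. The only facts taken from the thread are that the pool shrinks to sMinCacheSize (0) after a timer, and that flushing on a low-memory notification is the behaviour being proposed.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the two eviction policies. NOT Gecko code; names hypothetical.
class ToyRecyclingPool {
public:
  // A tile comes back and is kept for reuse.
  void Return() { ++mPooled; }

  // Returns true if a pooled tile could be reused, false if the caller
  // would have to request a new buffer from the parent process.
  bool TryReuse() {
    if (mPooled == 0) return false;
    --mPooled;
    return true;
  }

  // Timer-driven policy: shrink the cache down to a minimum size.
  void OnShrinkTimer(std::size_t minCacheSize) {
    if (mPooled > minCacheSize) mPooled = minCacheSize;
  }

  // Memory-pressure policy: drop everything, but only when notified.
  void OnLowMemory() { mPooled = 0; }

  std::size_t Pooled() const { return mPooled; }

private:
  std::size_t mPooled = 0;
};

// Usage example contrasting the two policies.
inline void DemoPolicies() {
  ToyRecyclingPool timed;
  for (int i = 0; i < 5; ++i) timed.Return();
  timed.OnShrinkTimer(0);        // minimum cache size of 0 empties the pool
  assert(!timed.TryReuse());     // next request needs a fresh allocation

  ToyRecyclingPool pressure;
  for (int i = 0; i < 5; ++i) pressure.Return();
  assert(pressure.TryReuse());   // tiles survive while memory is fine
  pressure.OnLowMemory();
  assert(!pressure.TryReuse());  // and are dropped under memory pressure
}
```

In this model the timer policy forces a fresh allocation after every idle period, whereas the pressure policy pays that cost only after a low-memory notification.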

Comment 36

•

11 years ago

Attached patch Only clear pool on low memory. (deleted) — Details — Splinter Review

Comment 37

•

11 years ago

Andre, want to try this patch? After a few initial allocations we should never wait, except if we run low on memory. Also, please log how many outstanding texture clients we have with a printf so we know what we are dealing with here.

Tapas: "The time spent by the child process waiting for buffer allocation is about 30% of the total." Are you guys allocating without MAP_POPULATE again in your vendor code? 30% of the time spent in allocation sounds a lot like memset exercising the page fault handler of the kernel. It would be good if you could measure where this time is spent. If it's in the kernel, it's probably something we should fix in the vendor code as well (the caching stuff above is a separate bug that avoids the parent process round-trip).

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 38

•

11 years ago

Attached file Log number of outstanding clients (deleted) — Details

Andreas, I think line 80 was deleted by mistake in your patch; without it the pool will always be empty. So I inserted it back for my tests. This log contains the number of outstanding clients and the number of textures in the pool.

Reporter

Comment 39

•

11 years ago

(In reply to Andreas Gal :gal from comment #37)
> Tapas: "The time spent by the child process waiting for buffer allocation is
> about 30% of the total." Are you guys allocating without MAP_POPULATE again
> in your vendor code? 30% of the time spent in allocation sounds a lot like
> memset exercising the page fault handler of the kernel. It would be good if
> you could measure where this time is spent. If it's in the kernel, it's
> probably something we should fix in the vendor code as well (the caching
> stuff above is a separate bug that avoids the parent process round-trip).

Thanks for pointing me to this. I am looking into it and will update ASAP.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 40

•

11 years ago

(In reply to Andreas Gal :gal from comment #37)
> Tapas: "The time spent by the child process waiting for buffer allocation is
> about 30% of the total." Are you guys allocating without MAP_POPULATE again
> in your vendor code? 30% of the time spent in allocation sounds a lot like
> memset exercising the page fault handler of the kernel. It would be good if
> you could measure where this time is spent. If it's in the kernel, it's
> probably something we should fix in the vendor code as well (the caching
> stuff above is a separate bug that avoids the parent process round-trip).

Can you please point me to the exact gecko function/line number where you are seeing this delay during buffer allocation? Thanks a lot for your help.

Flags: needinfo?(gal)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

11 years ago

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

11 years ago

Flags: needinfo?(andre.graziani)

Comment 41

•

11 years ago

Hi Tapas,

I am going to rewrite my statement to make it clearer: "The time spent by the child process waiting for the parent process to allocate a buffer is about 30% of the total."

I got this from the profile in comment 32. Here is exactly the same profile, but with a better range:
http://people.mozilla.org/~bgirard/cleopatra/#report=f4ef254bbb5a77a213007d24f05d4b1a3bbe73d5

If you keep expanding the methods with the highest running time, you end up here:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ISurfaceAllocator.cpp#318
where 30% of the time is spent. However, AFAIK, the allocation happens in the parent process, so you may need to go deeper into the parent side.

Flags: needinfo?(andre.graziani)

Comment 42

•

11 years ago

The parent handles this in SharedBufferManagerParent::RecvAllocateGrallocBuffer, which does:

sp<GraphicBuffer> outgoingBuffer = new GraphicBuffer(aSize.width, aSize.height, aFormat, aUsage);

In previous versions of your silicon gonk we have seen an mmap in the gralloc/ion code that doesn't pre-map the entire buffer and then does a memset, which for larger buffers causes hundreds of page faults, which is slow.
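
The effect described above can be illustrated outside of gralloc with a plain anonymous mapping. The sketch below is not the vendor gralloc/ion code: the helper `FaultsDuringMemset` is hypothetical and uses only standard POSIX/Linux calls (`mmap`, `getrusage`) to count minor page faults taken while a fresh mapping is filled. MAP_POPULATE is Linux-specific, so it is guarded; on other systems the flag is simply not applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <sys/resource.h>

// Returns the number of minor page faults taken while memset-ing a fresh
// anonymous mapping of `bytes` bytes, or -1 if the mapping failed.
// With `prefault` set (and MAP_POPULATE available), the kernel is asked to
// map the pages up front, inside the mmap call itself.
// Hypothetical helper for illustration; not taken from gralloc or Gecko.
inline long FaultsDuringMemset(std::size_t bytes, bool prefault) {
  assert(bytes > 0);
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
  if (prefault) flags |= MAP_POPULATE;
#else
  (void)prefault;  // flag not available on this platform
#endif
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (p == MAP_FAILED) return -1;

  struct rusage before {};
  struct rusage after {};
  getrusage(RUSAGE_SELF, &before);
  std::memset(p, 0xff, bytes);  // touches every page of the mapping
  getrusage(RUSAGE_SELF, &after);

  munmap(p, bytes);
  return after.ru_minflt - before.ru_minflt;
}
```

On a Linux system one would expect `FaultsDuringMemset(size, true)` to report far fewer faults than `FaultsDuringMemset(size, false)` for a large `size`, but the exact numbers depend on the kernel and its configuration, so none are quoted here.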

Flags: needinfo?(gal)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 43

•

11 years ago

110 is a lot of textures. Why are we doing that?

Reporter

Comment 44

•

11 years ago

(In reply to Andreas Gal :gal from comment #42)
> The parent handles this in
> SharedBufferManagerParent::RecvAllocateGrallocBuffer, which does:
>
> sp<GraphicBuffer> outgoingBuffer = new GraphicBuffer(aSize.width,
> aSize.height, aFormat, aUsage);
>
> In previous versions of your silicon gonk we have seen an mmap in the
> gralloc/ion code that doesn't pre-map the entire buffer and then does a
> memset, which for larger buffers causes hundreds of page faults, which is slow.

I just confirmed that SharedBufferManagerParent::RecvAllocateGrallocBuffer() takes negligible time to create the gralloc buffer. However, there is a big IPC delay between ISurfaceAllocator::AllocGrallocBuffer() and SharedBufferManagerParent::RecvAllocateGrallocBuffer(), and this delay happens randomly while scrolling www.cnbc.com.

I used the following gaia/gecko for profiling:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/master&id=ed5d408dc1120b035ebce9a809499c30fbfb4582
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/master&id=45c69de2af9d2504a8baac39ca759403931d5158

Please NI me for a faster response.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

11 years ago

Flags: needinfo?(andre.graziani)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

11 years ago

Flags: needinfo?(gal)

Comment 45

•

11 years ago

Do we peg both CPUs at 100% during this time? Any idea why there is a scheduling delay?

Flags: needinfo?(gal)