Closed Bug 961584 Opened 11 years ago Closed 11 years ago

deallocating gralloc buffers on the compositor is slow

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 939348

People

(Reporter: gal, Unassigned)

Details

Attachments

(1 file, 3 obsolete files)

We yank badly when launching or re-launching the settings app on nexus-5. The homescreen fades out and the settings up fades in and about half way through the animation we lose several frames. I captured the gralloc allocation and de-allocation traffic: I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 150, 2, 307) -> 0xac2abe80, time = 372 us I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 150, 2, 307) -> 0xac2abf00, time = 309 us I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 150, 2, 307) -> 0xae941180, time = 385 us I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 150, 2, 307) -> 0xae941680, time = 330 us I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 60, 2, 307) -> 0xae941a80, time = 255 us I/Gecko ( 2647): GrallocBufferAcctor::Create -- new GraphicBuffer(1080, 60, 2, 307) -> 0xae941c00, time = 203 us I/Gonk ( 2647): Setting nice for pid 2800 to 18 I/Gonk ( 2647): Changed nice for pid 2800 from 1 to 18. I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 64, 32, time = 56 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 64, 32, time = 27 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1039, 1422, time = 117 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1039, 1422, time = 100 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 360, 32, time = 31 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 360, 32, time = 31 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 225, time = 45 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 225, time = 49 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 150, time = 63 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 150, time = 1 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 60, time = 33 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 60, time = 1 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 150, time = 610 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 150, time = 27 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 60, time = 386 us I/Gecko ( 2647): GrallocBufferAcctor::~GrallocBufferActor 1080, 60, time = 23 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 64, 32, time = 162 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 64, 32, time = 61 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 1039, 1422, time = 4074 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 1039, 1422, time = 4387 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 360, 32, time = 88 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 360, 32, time = 100 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 1080, 225, time = 1219 us I/Gecko ( 2800): GrallocBufferAcctor::~GrallocBufferActor 1080, 225, time = 1040 us We knew that gralloc allocation is expensive but here actually the de-allocation kills us.
Is this on the content side? I'd imagine these can trivially be made into async messages.
This is the compositor. The compositor gets stalled due to this. The homescreen going out of view is deallocating gralloc buffers because we GC it and release the layer tree.
I tried sending the gralloc buffers to a separate thread for deallocation (once the compositor gets them). That helps a little but the allocations themselves are still causing yank. We have to get the allocations away from the compositor. If the MozSurface business is going to take a non-trivial amount of time we need some dirty hack in the meantime. With these kind of pause times we really can't keep going with the current architecture any longer.
This patch allocates gralloc buffers on the client side and sends them to the parent. This seems to work flawlessly with my Nexus 5. Newer QC chipsets don't use pmem, which may be been restricted to the parent and hence might not work. However, I am not even sure we actually do that right now. For newer chipsets there is no pmem and gralloc is just another form of ashmem which should not be privileged. Can someone try this on the production devices?
Attachment #8362416 - Flags: feedback?(sotaro.ikeda.g)
Attachment #8362416 - Attachment is obsolete: true
Attachment #8362416 - Flags: feedback?(sotaro.ikeda.g)
Attachment #8362418 - Flags: feedback?(sotaro.ikeda.g)
I tried the patch on 2 devices, a keon and a Buri, here are the issues I'm seeing. Keon: - No animations for app launch/app close. Some other animations of the system app works OK. Buri: - Some glitches inside apps. I tried to get a screenshot but the screenshots are fine. - I opened the gallery app to see the screenshot and after a while it crashed (OOM). But when the homescreen tried to start again it fails with the following in logcat: E/OomLogger( 132): [Kill]: select 753 ((Preallocated a), adj 2, size 3304, to kill E/OomLogger( 132): [Kill]: select 755 ((Preallocated a), adj 2, size 3309, to kill E/OomLogger( 132): [Kill]: send sigkill to 755 ((Preallocated a), adj 2, size 3309 I/Gecko ( 753): ###################################### forms.js loaded I/Gecko ( 753): ############################### browserElementPanning.js loaded I/Gecko ( 753): ######################## BrowserElementChildPreload.js loaded I/GonkMemoryPressure( 132): Checking to see if memory pressure is over. I/GonkMemoryPressure( 132): Memory pressure is over. E/Sandbox ( 753): install_syscall_filter() failed E/memalloc( 753): /dev/pmem: Failed to open pmem device: Permission denied E/memalloc( 753): /dev/pmem: failed to initialize pmem area W/memalloc( 753): Falling back to ashmem W/memalloc( 753): Falling back to ashmem E/memalloc( 753): ashmem: ASHMEM_CACHE_FLUSH_RANGE failed fd = 33 E/memalloc( 753): ashmem: ASHMEM_CACHE_FLUSH_RANGE failed fd = 36 E/memalloc( 753): ashmem: ASHMEM_CACHE_FLUSH_RANGE failed fd = 33 Not sure why behaviors are different on the 2 devices.
Yeah, makes sense, on the older devices the child has no access to pmem. The good news is we can still do this conditionally. We can set a pref that allows the client to allocate directly. We can keep the old code and add a new constructor that does this, and use the old code otherwise.
IMO I rather we work towards an IPC message to a dedicated thread to deal with Gralloc. This will have two additional IPC context switches but well have a patch that works on all devices. Rather then maintaining two code paths, one of which isn't fast enough.
If I understand the new architecture of our existing vendor and the general architecture of our new vendor correctly neither requires privilege to allocate a gralloc buffer. So really /dev/pmem is a thing of the past. Gralloc on the newer architectures is just ashmem. On our older chipset where we have /dev/pmem we have smaller screens and smaller gralloc buffers as a result, making the pause times not as obvious, so if this bug fixes that for new devices going forward and the code still works (albeit slower) on the old hardware, I would say this is a lot less effort and a lower complexity fix. Warning: I might be wrong on the architecture piece so lets consult with our chipset friends.
(In reply to Benoit Girard (:BenWa) from comment #8) > IMO I rather we work towards an IPC message to a dedicated thread to deal > with Gralloc. This will have two additional IPC context switches but well > have a patch that works on all devices. Rather then maintaining two code > paths, one of which isn't fast enough. I think that gralloc allocation performance is critical enough that if the improvement over the dedicated thread solution is measurable we should do both.
gal: BenWa, we will need 2 context switch, you have to allocate in the parent, and you still need a sync ipc to register the actor in the compositor [08:54am] gal: with my patch you have 1 context switch (locally generate the gralloc buffer, register the actor with the serialized copy of it in the compositor) [08:54am] gal: here is how we could make this work on the old devices [08:55am] gal: what I did in the patch is the baseline, to register a grallocbufferactor the client has to supply the gralloc buffer [08:55am] gal: this works on new devices [08:55am] gal: for old devices we add an additional protocol on a separate thread that just allocates a gralloc buffer and can receive a __delete__ for it [08:55am] gal: the client will get the gralloc buffer from that thread if it can't allocate locally [08:55am] gal: then you have 1 path with an optimization [08:56am] gal: we can rip out the current maybemagicgrallocbuffer stuff [08:56am] gal: what do you think? At least one of the vendors is saying that to avoid DOS we should keep the allocation in the parent so we will need to spin up a 2nd thread in the parent and talk to it with a new protocol to actually allocate the gralloc buffers but it seems we can keep the rest of the code here and then just delete the current path.
attachment 8362418 [details] [diff] [review] seems good. I did not recognize that the content process can allocate gralloc buffer directly in recent android. I feel Bug 950112 way of thing could be used to allocate gralloc buffer in older android. I am going to confirm it.
We just need to confirm for how long we'd need to maintain that old path; if we can get it working "well enough", we can just focus on the second, modern path. The two paths are both relatively small, so it's not a huge amount of have both of them around.. and they're doing a very isolated workload ("allocate a graphicbuffer"), so should have little impact to the rest of the system if we have to make changes there. I'm mostly concerned about the chipsets in future low-end/low-memory devices. Will they have a new enough chipset to avoid pmem? Also note that even on devices with pmem, it's basically known that we overuse pmem...
One concern to attachment 8362418 [details] [diff] [review] is that any content process can allocate any amount of gralloc buffer without a permission from b2g process. current b2g also does not filter the request from a content process. But it is possible if it want to do.
If followings are not a problem for b2g, attachment 8362418 [details] [diff] [review] seems OK. - In recent android, content process could allocate any types of gralloc buffers. - Allocate gralloc buffer in content process without confirmation by b2g is OK from security point of view.
I changed attachment 8362418 [details] [diff] [review] as to allocate gralloc buffer via android::IGraphicBufferAlloc. By this patch, gralloc buffer is allocated in b2g process then delivered to content process. I think this is safer way than attachment 8362418 [details] [diff] [review]. Binder's ipc is multithreaded ipc, gralloc allocation performance should be better than gecko's ipc. I confirmed that the patch works on nexus-4. But the patch does not work on buri device. gralloc buffer allocation was OK. But during OpenGL rendering on b2g process, it failed to get genlock like the following. It's cause seems same as in Bug 959089 comment0. Originally allocated gralloc buffer's are closed soon after allocation. That seems badly affect to genlock. 01-20 15:57:03.649 134 268 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x2, err=Invalid argument fd=130) 01-20 15:57:03.659 134 268 W Adreno200-EGLSUB: <LockImage:1942>: genlock_lock_buffer GENLOCK_READ_LOCK failed 01-20 15:57:03.659 134 268 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x2, err=Not a typewriter fd=163) 01-20 15:57:03.659 134 268 W Adreno200-EGLSUB: <LockImage:1942>: genlock_lock_buffer GENLOCK_READ_LOCK failed 01-20 15:57:03.659 134 268 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x2, err=Not a typewriter fd=150) 01-20 15:57:03.659 134 268 W Adreno200-EGLSUB: <LockImage:1942>: genlock_lock_buffer GENLOCK_READ_LOCK failed 01-20 15:57:03.659 134 268 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x2, err=Not a typewriter fd=171) 01-20 15:57:03.659 134 268 W Adreno200-EGLSUB: <LockImage:1942>: genlock_lock_buffer GENLOCK_READ_LOCK failed
Attachment #8362711 - Flags: feedback?(gal)
Gal, can I have a feedback to attachment 8362711 [details] [diff] [review]? It fixes the concern in Comment 15.
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
(In reply to Sotaro Ikeda [:sotaro] from comment #16) > Created attachment 8362711 [details] [diff] [review] > Add android::IGraphicBufferAlloc's gralloc allocation to attachment 8362418 [details] [diff] [review] > [details] [diff] [review] > > I confirmed that the patch works on nexus-4. > > But the patch does not work on buri device. gralloc buffer allocation was > OK. But during OpenGL rendering on b2g process, it failed to get genlock > like the following. It's cause seems same as in Bug 959089 comment0. > Originally allocated gralloc buffer's are closed soon after allocation. That > seems badly affect to genlock. I am going to try another way to fix the above genlock problem.
This patch is created for master hamachi. I confirmed thebes layers rendered as normal! Gralloc buffer is allocated by using android's binder ipc. By the previous patch, originally allocated gralloc buffers are deleted soon after creation in b2g proces. By this behavior, somehow genlock handle becomes invalid in b2g process. This patch keep the originally allocated gralloc buffer alive when the gralloc buffer is used by GrallocBufferActor. Camera and video app has a problem. To fix the problem, change around ImageBridge and GonkNativeWindow becomes necessary.
Attachment #8362711 - Attachment is obsolete: true
Attachment #8362711 - Flags: feedback?(gal)
Attachment #8363224 - Flags: feedback?(gal)
After applying attachment 8363224 [details] [diff] [review], I never saw hw composer rendering.
Comment on attachment 8362418 [details] [diff] [review] Allocate gralloc buffers on the client and share them with the parent. I have a concern about Comment 15. I still tend to allocate gralloc buffer from b2g. But it is a kind of platform policy. If the policy is OK, from implementation seems no problem. And ICS b2g phone's problem could be fixed like attachment 8363224 [details] [diff] [review].
Attachment #8362418 - Flags: feedback?(sotaro.ikeda.g) → feedback+
Comment on attachment 8363224 [details] [diff] [review] Add gralloc buffer allocator as android service to attachment 8362418 [details] [diff] [review] Need more investigation.
Attachment #8363224 - Flags: feedback?(gal)
About attachment 8362418 [details] [diff] [review], I did not confirmed gralloc buffers are actually used. symptom seems just fallback to shmem.
(In reply to Sotaro Ikeda [:sotaro] from comment #23) > About attachment 8362418 [details] [diff] [review], I did not confirmed > gralloc buffers are actually used. symptom seems just fallback to shmem. I am going to confirm it.
Sotaro, I made a new patch that allocates gralloc buffers on the image bridge and then shares them with the compositor. That works much better. I recommend we do that instead of this patch.
The feedback from chipset partners is that we should continue to allocate gralloc in the parent only, by the way.
(In reply to Andreas Gal :gal from comment #25) > Sotaro, I made a new patch that allocates gralloc buffers on the image > bridge and then shares them with the compositor. That works much better. I > recommend we do that instead of this patch. Thanks.
Comment on attachment 8363224 [details] [diff] [review] Add gralloc buffer allocator as android service to attachment 8362418 [details] [diff] [review] This patch does not work. obsolete it.
Attachment #8363224 - Attachment is obsolete: true
I moved the patches of "gralloc buffer allocation by using binder ipc" to Bug 950112. In the bug, I implemented the following 2 ways. From the investigation, it becomes clear that if we limit the usage of "gralloc buffer allocation by using binder ipc" only after JB4.3(fence support), it is relatively easy. But if we want to use it in ICS and JB4.2(no fence support), we are going to face unknown genlock's limitations' problem. [1] Use android::IGraphicBufferAlloc as gralloc allocator - This works only for gonk since android 4.3(fence support). - ICS can also use this way. But somehow genlock becomes invalid like Bug 961584 Comment 16. Originally allocated gralloc buffers are soon destoryed in b2g. It might affect to the above genlock problem. [2] Create custom gralloc allocator - Create custom allocator as to prevent the above genlock problem. - At least, it seems need to keep alive the Originally allocated gralloc buffers.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: