Closed Bug 775244 Opened 12 years ago Closed 12 years ago

Implement a shared memory pool

Categories

(Core :: IPC, defect)


Tracking


RESOLVED INVALID

People

(Reporter: nical, Assigned: nical)

References

Details

With async-video (Bug 598868) we use a lot of Shmems to transfer images from the content side to the compositor side. The current SharedImage pool is not going to work well with what we need to do to remove the extra video frame copies. We need a better generic system to reuse Shmems as much as possible.

What I need for async-video is a ShmemFactory that would contain several pools of shared memory, each pool being associated with a shmem size. In pseudo-C++, something like:

    ShmemFactory {
      map<Size, list<Shmem> > mPools;
      ShmemAllocator* mAllocator;
      MessageLoop* mLoop;
      int mSizeThreshold;
      int mTimeout;
      bool mStop;
      void Recycle(Shmem* aShmem);             // dispatches an async task
      Shmem* AllocateSync(unsigned int aSize); // dispatches a sync task
      void Stop();
      void Destroy();
      void Flush();
      void RecycleNow(Shmem* aShmem);          // only within mLoop
      Shmem* AllocateNow(unsigned int aSize);  // only within mLoop
    };

AllocateSync and RecycleSync would dispatch synchronous tasks on mLoop (in my case it would be the ImageBridgeChild's thread message loop).

On allocation requests:
- if mStop is true, return a nil pointer
- if we have a pool associated with a shmem size within [aSize..aSize+mSizeThreshold], pop a shmem from the pool and return it
- otherwise, ask mAllocator to allocate a shmem of the requested size and return it
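The size-bucketed lookup above can be sketched as follows. This is a minimal illustration, not the real implementation: the Shmem, ShmemAllocator, and mLoop machinery is replaced by a hypothetical FakeShmem buffer, and only the bucket logic from the allocation rules is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <vector>

// Hypothetical stand-in for a Shmem segment: just a sized byte buffer.
struct FakeShmem {
    explicit FakeShmem(size_t aSize) : mData(aSize) {}
    size_t Size() const { return mData.size(); }
    std::vector<char> mData;
};

// Sketch of the size-bucketed pool lookup: reuse a pooled segment whose
// size falls within [aSize, aSize + threshold], otherwise allocate fresh.
class PoolSketch {
public:
    explicit PoolSketch(size_t aSizeThreshold) : mSizeThreshold(aSizeThreshold) {}

    FakeShmem* AllocateNow(size_t aSize) {
        // lower_bound finds the first bucket whose size is >= aSize.
        auto it = mPools.lower_bound(aSize);
        if (it != mPools.end() &&
            it->first <= aSize + mSizeThreshold &&
            !it->second.empty()) {
            FakeShmem* shmem = it->second.front();
            it->second.pop_front();
            return shmem;
        }
        // No suitable bucket: would be mAllocator->AllocShmem(aSize) here.
        return new FakeShmem(aSize);
    }

    void RecycleNow(FakeShmem* aShmem) {
        // Find or create the bucket matching this segment's exact size.
        mPools[aShmem->Size()].push_back(aShmem);
    }

private:
    std::map<size_t, std::list<FakeShmem*>> mPools;  // size -> free segments
    size_t mSizeThreshold;
};
```

A segment recycled at size 100 would then satisfy a later request for 96 bytes when the threshold is 16, since 100 lies within [96, 112].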
On recycle requests:
- if mStop is true, ask mAllocator to deallocate the shmem
- else find or create a pool associated with the shmem's size and add the shmem to it

Flush asks mAllocator to deallocate all the pooled Shmems.

In the IPDL protocol's destruction sequence, the first thing is to:
- call Stop(), which:
  - calls Flush()
  - sets mStop to true
Then the allocator schedules a task in its own message loop, and this task will:
- call Destroy(), which:
  - sets mLoop to null
  - sets mAllocator to null

After this the ShmemFactory is unable to do anything (in particular to touch the IPDL protocol); it is just in a state in which it waits for its reference count to reach zero.

We'd have a per-sub-pool timeout which would release the sub-pool's shmems automatically if it has not been touched for a certain period of time.

The destruction sequence relies on the fact that PImageBridge (just like PCompositor) has a two-step Stop/Destroy destruction sequence, with Destroy() being called by a task scheduled after Stop() is called. This ensures that if tasks are dispatched just before mStop goes true, they will be processed before the factory and the protocol reach a state in which they can't allocate memory anymore. No task would be dispatched while mStop is true, so no task would arrive after Destroy(). This destruction scheme is a bit scary but I see no other solution that reconciles all the constraints introduced by having the shared memory used outside the allocator's thread.

In the case of async video, mAllocator would be the ImageBridgeChild singleton, and all ImageBridgeContainers would use the same ShmemFactory. With Firefox moving toward a cross-process architecture, I suppose we will eventually need mechanisms for Shmem reuse outside of async video, so this implementation would be generic enough, with the only constraint being this two-step destruction sequence. Even if we only ever use it with async-video, it will still be worth doing it like this.
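The Stop/Destroy ordering can be sketched as a tiny skeleton. This is an assumed simplification: the allocator and loop are stand-in pointers rather than real ShmemAllocator/MessageLoop objects, and the task scheduling is elided; the point is only the state transitions described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical skeleton of the two-step teardown: Stop() flushes the pools
// and forbids new work; a task scheduled *after* Stop() calls Destroy(),
// which severs the links to the loop and allocator. After that the factory
// only waits for its reference count to reach zero.
class ShmemFactorySketch {
public:
    void* AllocateSync(size_t) {
        if (mStop) {
            return nullptr;  // rule: return nil once Stop() has run
        }
        return &mDummy;      // would dispatch a sync task to mLoop
    }

    void Stop() {
        Flush();       // hand every pooled segment back to the allocator
        mStop = true;  // from now on, no task may be dispatched
    }

    // Runs from a task scheduled after Stop(), so any task dispatched just
    // before mStop went true has already been processed by the loop.
    void Destroy() {
        mLoop = nullptr;
        mAllocator = nullptr;
    }

    bool Stopped() const { return mStop; }

private:
    void Flush() { /* would deallocate all pooled Shmems via mAllocator */ }

    void* mAllocator = &mDummy;  // stand-in for ShmemAllocator*
    void* mLoop = &mDummy;       // stand-in for MessageLoop*
    bool mStop = false;
    char mDummy = 0;
};
```

Once Stop() has run, every allocation request observes mStop and returns null, which is what guarantees nothing touches the protocol between Stop() and Destroy().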
By the way, we could even add something fancy like:

    AllocateAsync(aSize, MsgLoop aCallbackLoop, Task aCallback)

to have an asynchronous API, but I don't think it would play very well with the way video works. It would be nice to build new features on top of an asynchronous API though.
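A sketch of what that asynchronous variant could look like, under assumed names (FakeLoop, Task): instead of blocking, the caller's loop later runs a callback that hands over the allocated buffer. Dispatching to a message loop is simulated by queuing a closure.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Task = std::function<void()>;

// Hypothetical stand-in for a MessageLoop: tasks queue up until Run().
struct FakeLoop {
    std::vector<Task> tasks;
    void Post(Task t) { tasks.push_back(std::move(t)); }
    void Run() {
        for (auto& t : tasks) t();
        tasks.clear();
    }
};

// Sketch of AllocateAsync: the Shmem (here, a raw byte buffer) is delivered
// on the caller's own loop once the allocation completes, so the caller
// never blocks on the allocator's thread.
void AllocateAsync(size_t aSize, FakeLoop& aCallbackLoop,
                   std::function<void(std::vector<char>)> aCallback) {
    std::vector<char> shmem(aSize);  // would come from the pool/allocator
    aCallbackLoop.Post([aCallback, shmem]() { aCallback(shmem); });
}
```

The callback fires only when the target loop spins, which is exactly why this shape fits poorly with video's synchronous frame handoff.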
Why not just copy the images to a buffer object in the first place? Then just copy or output the image from the buffer, and call a finish() when you're done with it. Also, instead of the two-step Stop/Destroy sequence, couldn't you just use a barrier around the event that needs to finish first? I'm not very familiar with the APIs so I apologize if this doesn't translate correctly.
(In reply to Yev from comment #1)
> Why not just copy the images to a buffer object in the first place? Then
> just copy or output the image from the buffer, and call a finish() when
> you're done with it.

I assume you mean a GL buffer object, right? On some platforms we don't use OpenGL-based compositing. On these platforms we presently don't use off-main-thread compositing either, but it is likely that we will eventually have an OMTC + non-GL layers configuration (I heard something about Windows XP but I am not sure...). So in some cases we don't want to use GL buffer objects.

In most cases, however, we want to use OpenGL compositing with off-main-thread compositing. The best thing there would be to copy the images directly into GPU memory, as you suggest. The thing is that we are trying to make off-main-thread compositing not only cross-thread but also cross-process. It presently works like this on Boot to Gecko, for instance. In this configuration, sharing GL objects between processes can be tricky and involves platform-specific APIs. On B2G we are actually doing what you said, using gralloc buffers. On some other platforms, though, it is going to take a while before we can do such optimizations. In the meantime we have to rely on shared memory.

> Also, instead of the two step Stop/Destroy sequence couldn't you just use a
> barrier around the event that needs to finish first?

The two-step destruction sequence comes from the fact that shared resources can be transferred from one process to another asynchronously. Consider two processes, "Compositor" and "Content":
- Content sends an image to Compositor.
- Compositor uses the image, and sends it back asynchronously to Content when it doesn't need it any more, so that Content can reuse it.
At some point Content needs to shut down.
There is a builtin message in the system we use for resource sharing and inter-process communication (IPDL) that destroys one side and sends a message (__delete__) to the other side, which automatically destroys that side as well. What can happen is that Content sends __delete__ to Compositor while Compositor is sending an image back to Content. In that case, the message from Compositor to Content may arrive after Content is destroyed, and you get a bad crash. So this two-step tear-down sequence is there to ensure that all asynchronous messages have arrived, and no more will be sent, before we complete the destruction with the __delete__ message. You can't just put a barrier around the reception of these messages because there may or may not be messages coming from the compositor at that moment, and when there are, you don't know how many there are.

> I'm not very familiar with the APIs so I apologize if this doesn't translate correctly

No problem :)
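The race above can be illustrated with a toy model (assumed names, not real IPDL): each direction of the channel delivers in order, but the two directions are independent, so a Compositor-to-Content message can still be in flight when Content processes the teardown it initiated.

```cpp
#include <deque>
#include <string>

// Toy actor: an alive flag plus an in-order inbox for one direction.
struct ToyActor {
    bool alive = true;
    std::deque<std::string> inbox;
};

// One-step teardown: Content sends __delete__ and dies immediately.
// Returns true if a message is left addressed to a dead actor (the crash).
bool OneStepTeardownRaces(ToyActor& content, ToyActor& compositor) {
    compositor.inbox.push_back("__delete__");   // Content -> Compositor
    content.inbox.push_back("recycle-frame");   // Compositor -> Content, in flight
    content.alive = false;                      // Content torn down right away
    return !content.inbox.empty() && !content.alive;
}

// Two-step teardown: Content stops initiating work, drains every in-flight
// message, and only then sends __delete__ and goes away.
bool TwoStepTeardownRaces(ToyActor& content, ToyActor& compositor) {
    content.inbox.push_back("recycle-frame");   // in flight at Stop() time
    content.inbox.clear();                      // drain pending messages first
    compositor.inbox.push_back("__delete__");   // now it is safe to finish
    content.alive = false;
    return !content.inbox.empty() && !content.alive;
}
```

The one-step version leaves a message addressed to a destroyed actor; the two-step version drains the inbox before __delete__ is ever sent, which is the whole point of the Stop/Destroy handshake.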
Would implementing OpenCL as the OMTC be a possibility? It allows much more control, such as a wait_for command that can be used on Content so that it doesn't send __delete__ to Compositor at the wrong time. I'm pretty sure OpenCL works on GPUs and pretty much any modern CPU (ARMv7 supports it too). There are plenty of OpenGL sharing protocols as well.
(In reply to Yev from comment #3)
> Would implementing OpenCL as the OMTC be a possibility? It allows much more
> control, such as a wait_for command that can be used on Content so that it
> doesn't send _delete_ to compositor at the wrong time. I'm pretty sure
> OpenCL works on gpu and pretty much any modern cpu (Arm 7 supports it too).
> There are plenty of OpenGL sharing protocols as well.

I don't think that OpenCL has enough support out there yet, or that it can get around the complexity of our sharing model.
With the way async-video has evolved since I filed this bug (especially with bug 790716 and bug 773440), the need for a generic shared memory pool is less pressing. It is likely that if we need to reduce the number of shmems we use with video, we will implement a different solution than the one proposed here.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID