Closed Bug 538323 Opened 15 years ago Closed 15 years ago

Implement video painting using layers

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(2 files, 2 obsolete files)

Plan for this bug: -- Create mozilla::layers::ImageLayer -- An ImageLayer has a thread-safe ImageContainer (layers are not thread-safe) -- Create mozilla::layers::Image (also thread-safe) to represent immutable image buffers -- ImageContainer has a method SetImage() (can be called by any thread) -- Images can have various formats. Here, we just want planar-YUV for now. -- ImageContainers act as an image factory. ImageContainers and Images are abstract, the implementation classes can be specific to the backend -- Each format has its own interface for writing data into the image -- The BasicLayers implementation will do YUV conversion in software, when the data is written in, using oggplay code for now -- nsVideoFrame creates an ImageLayer using the LayerManager for the window, passes the ImageContainer to the decoder -- The decoder thread creates an Image for each video frame it decodes. (This gives us the ability to decode directly into an Image, which we'll need on the N900 so the DSP decoder can write into a texture buffer without the CPU touching the data) -- When it's time to display a frame, the video decoder thread calls ImageContainer::SetImage() -- nsVideoFrame creates an nsDisplayVideo custom display item that returns the ImageLayer -- For fallback rendering (i.e. the LayerManager being rendered to is not the window's LayerManager), we'll get the current Image and create a temporary ImageLayer to render the it. (Any Image can be rendered in any ImageContainer; Image objects will have to expose a gfxASurface getter to support this. Unfortunate, but I think it's the only way...) Later we'll do various extensions: -- Add packed-YUV format with Image implementation using bc-cat texture streaming, for N900 -- Alternative Image implementations using D3D and OGL for YUV conversion -- Replace oggplay YUV code -- Add RGB-565 format for NPP_DrawImage plugin rendering on N900 (avoiding format conversion) -- Extend ImageContainer to support a queue of timestamped Images, and have video store its entire queue of frames in the ImageContainer (this probably gives slightly smoother playback, but may not really be necessary)
(In reply to comment #0) > -- The decoder thread creates an Image for each video frame it decodes. (This > gives us the ability to decode directly into an Image, which we'll need on the Will this mean that frames that may be frame skipped later will take up CPU (or GPU) time to do yuv conversion, etc?
That's up to the layer backends. For the software backend (BasicLayers), I plan to do YUV conversion when you call ImageContainer::SetImage(), which you only call when you actually want to display a frame. With the trunk Ogg backend, nsOggDecodeStateMachine::PlayVideo would call SetImage instead of calling oggplay_yuv2bgra and mDecoder->SetRGBData. So in that case the answer to your question is "no". On the N900, if we have Leonora on the DSP decoding directly into Images that are buffers allocated by bc-cat ... I'm not sure. I don't know when YUV conversion happens in that setup. If you're using a GL or D3D GPU program to do YUV conversion, I imagine that would run only at paint time. However, someone will have to decide when to upload the frame data to VRAM. If you upload early and then decide to skip the frame, you wasted the cost of the upload. I don't know if there's any downside to uploading "just in time". Although I don't know what the optimal setup is, I think any of the options can be implemented behind the interface I've sketched out.
(In reply to comment #0) > -- The decoder thread creates an Image for each video frame it decodes. (This > gives us the ability to decode directly into an Image, which we'll need on the > N900 so the DSP decoder can write into a texture buffer without the CPU > touching the data) This is a bit difficult to do right now because with the trunk decoder, we have to make a PlanerYUVImage reference the YUV buffer of an OggPlayDataHeader, but OggPlayDataHeaders can only be accessed from a single thread and Images can be accessed by any thread. So for now I'm going to just create an Image in nsOggDecodeStateMachine::PlayVideo and call SetImage immediately.
Instead of having the frame own an ImageLayer which owns an ImageContainer, I think we're going to need to have the element (or decoder) own an ImageContainer, and have the frame hook an ImageLayer up to that ImageContainer. We need to be able to support decoding of <video> elements without frames (e.g., for canvas.drawImage()).
Attached patch Part 1: create ImageLayer infrastructure (obsolete) (deleted) — Splinter Review
Attachment #425101 - Flags: superreview?(dbaron)
Attachment #425101 - Flags: review?(jmuizelaar)
Is it possible to make sure that the PlanarY/U/V data have a pitch that is a multiple of 16? It would greatly help putting a good SSE2 implementation in the GFx code.
Theora doesn't guarantee it, but we already do a memcpy in oggplay_data_handle_theora_frame (http://mxr.mozilla.org/mozilla-central/source/media/liboggplay/src/liboggplay/oggplay_data.c#412), so we can adjust that to guarantee whatever alignment we want. We'd either need to assume a particular alignment is guaranteed or break the OggPlay API to expose the chosen alignment. Either way is fine--we're removing OggPlay soon, so I don't think we should be too precious about useful local hacks temporarily.
I've added mYStride and mUVStride fields to PlanarYUVImage::Data. We'll definitely want these. I haven't changed anything to do with alignment guarantees. It might be best to simply recommend 16-byte alignment, but not require it. There are some issues to do with Theora's requirements for chroma subsampling at the edges of the image. Matthew will study the problem and give us some advice, and we might need to update PlanarYUVImage::Data again. But the current interface wouldn't cause regressions, so we can go ahead with it for now.
Attachment #425101 - Attachment is obsolete: true
Attachment #426619 - Flags: superreview?(dbaron)
Attachment #426619 - Flags: review?(jmuizelaar)
Attachment #425101 - Flags: superreview?(dbaron)
Attachment #425101 - Flags: review?(jmuizelaar)
Theora supports the notion of a picture region, which is a crop rectangle that exists within the full video frame output by the decoder. It's specified as an arbitrary width and height and offset. The offset is restricted to 0-255. The video frame is required to be a multiple of 16 in both height and width (with the chroma planes divided by two in one or both directions depending on the video format), so for any video that is not a multiple of 16x16, the picture region will be used to crop the video frame to the correct output size. It's also possible to specify the picture region arbitrarily during encoding, so it's possible to produce a video where the video frame will be large and the picture region small (e.g. you could produce a 1920x1080 video with a 314x159 picture region originating at 161x80). Depending on the colour conversion routine, it may be faster to convert the entire frame (especially if Theora is modified to provide memory alignment guarantees for the video frames) or just the picture region. If the conversion routine converts only the picture frame, it must also take into account that the chroma samples (when subsampled) are sited relative to the picture frame--so the correct chroma sample to use for a given luma sample depends on whether the picture offset is odd or even. For reference, this is detailed in the Theora spec at http://www.theora.org/doc/Theora.pdf, specifically sections 2.1, 2.2, and 4.4.4. Note that the current liboggplay code does not handle all cases correctly. For example, it only supports even picture region offsets.
It sounds to me like the best way to handle that is for nsOggDecoder to supply the full video frame as an ImageLayer, then set a clip rect on the layer and adjust its transform to crop to the picture region. That requires no changes to the currently proposed layers API. Layer backends can still optimize YUV conversion since they have access to the cliprect. Does that make sense?
Yeah, that sounds perfect.
To make layer backend implementation easier, and because it's normally what we want in the display list code too, we decided to make the layer cliprect apply after any transform on the layer (instead of before). That means when we want to crop to the picture region for Theora, and the video is being transformed to fit the CSS content-box, we may have to create an intermediate container layer to which we attach the transform.
This adds a reftest to make sure that cliprects are being applied after the transform that we use to fit the video layer to its CSS content-box.
Attachment #425102 - Attachment is obsolete: true
Attachment #427056 - Flags: review?(kinetik)
Attachment #425102 - Flags: review?(kinetik)
Comment on attachment 427056 [details] [diff] [review] Part 2: restructure video painting to use ImageLayers (v2) Looks great. My only comment is that anything that mentions "YUV" should be renamed to "YCbCr" because it's more accurate (YUV refers to analog signals) and less ambiguous (U and V can mean different things for different formats). It doesn't matter right now, but at some point in the future we might need to add a member to PlanarYUVImage::Data that specifies the colour space. Right now, Theora only supports "unknown", Rec 470M, and Rec 470BG, which we currently all treat as the same without problems (so I assume is correct). It's possible there might be other colour spaces added in the future, the most obvious ones being "full range" (0-255 rather than 16-235) and "HD" (Rec 709).
Attachment #427056 - Flags: review?(kinetik) → review+
I thought about color spaces for layers in general. I think we have to specify what color space each leaf layer or Image is in. I think that ThebesLayers and CairoImages are in the device output color space, so I've added comments specifying that. For PlanarYCbCrImage it's less clear; it seems to me that liboggplay's YUV routines (which we're currently using) expect the "REC 470M" color space (see Theora spec section 4.3.1), so I've specified that as the PlanarYCbCrImage color space. In the future we should add a color space parameter to PlanarYCbCrImage::Data. In the future we should also probably take another look at color spaces and color correction with layers in general. For GPU-based layer backends it might make more sense to have ThebesLayers and CairoImages be sRGB.
(In reply to comment #13) > To make layer backend implementation easier, and because it's normally what we > want in the display list code too, we decided to make the layer cliprect apply > after any transform on the layer (instead of before). That means when we want > to crop to the picture region for Theora, and the video is being transformed to > fit the CSS content-box, we may have to create an intermediate container layer > to which we attach the transform. I'm a little uneasy about this decision. Can you talk more about the rationale? From my quick look, it seems that neither CoreAnimation nor Clutter have the concept of a post transform clip-rect.
Most of our clipping comes from the CSS "overflow" property being non-"visible". So if you have a video using a layer transform, or we're using layer transforms for CSS transforms (which we will, of course) and you have a CSS transform applied to a chunk of content, you normally will not be clipping inside the transformed content, but you will almost always be clipping the transformed content in some way, at least to the viewport. Note that to avoid proliferating container layers just for clipping, we push cliprects down the layer tree as far as possible. For example, a typical layer tree would look like ContainerLayer (root) ThebesLayer (background and scrollbars) (clipped to viewport) VideoLayer (clipped to scrollport (viewport minus scrollbars)) (transformed for letterboxing) ThebesLayer (content over video) (clipped to scrollport (viewport minus scrollbars)) ThebesLayer (CSS-transformed content) (clipped to scrollport (viewport minus scrollbars)) If we require clipping to happen before transformation, then we'd have to have something like ContainerLayer (root) ThebesLayer (background and scrollbars) (clipped to viewport) ContainerLayer (clipping to the scrollport) VideoLayer (transformed for letterboxing) ThebesLayer (content over video) ThebesLayer (CSS-transformed content) That requires an extra ContainerLayer. We'd also have to do some analysis to minimize the number of extra ContainerLayers. Bas also assures me that processing the cliprect after transformation is easier to implement in the D3D/OpenGL backends. Maybe Clutter and Core Animation don't support this directly because they don't normally expect to render inside a container with scrollbars? Does this make sense?
Blocks: 551277
Comment on attachment 426619 [details] [diff] [review] Part 1 v2: Create ImageLayer infrastructure >+ /** >+ * Layers that paint themselves, such as ImageLayers, should paint >+ * in response to this method call. aContext will already have been >+ * set up to account for all the properties of the layer (transform, >+ * opacity, etc). >+ */ >+ virtual void Paint(gfxContext* aContext) {} >+ It would be useful to have a comment explaining what role BasicImplData has and why it has virtual functions. >-#ifdef DEBUG >+ #ifdef DEBUG > PRBool InConstruction() { return mPhase == PHASE_CONSTRUCTION; } > PRBool InDrawing() { return mPhase == PHASE_DRAWING; } > PRBool IsBeforeInTree(Layer* aBefore, Layer* aLayer); > #endif Indenting is fine, but indent the #if and #endif the same amount. >+ * We need to be able to hold a reference to a gfxASurface from Image >+ * subclasses. This is potentially a problem since Images can be addrefed >+ * or released off the main thread. We can ensure that we never AddRef >+ * a gfxASurface off the main thread, but we might want to Release due >+ * when an Image is destroyed off the main thread. some words missing around "due when" The aFormats argument to ImageContainer::CreateImage should probably be |const Image::Format*| (const added). Should ImageContainer::SetImage and GetImage be called SetCurrentImage and GetCurrentImage instead? Maybe call the aSize argument to GetAsSurface aSizeResult? >+ * be used ont he main thread, since layer managers should only be "ont he" -> "on the" I tend to think Manager() and GetAsSurface() should have an assertion that NS_IsMainThread(). I find the class hierarchy introduced here a bit confusing (esp. the *ImplData and the distinctions between ImageLayer and BasicImageLayer, etc.); some comments at the top of classes (or somewhere else?) saying how they relate to others might help a bit. > CPPSRCS = \ >+ BasicImages.cpp \ > BasicLayers.cpp \ > $(NULL) please use tabs to match the surrounding code >+#define THEBES_INLINE_DECL_THREADSAFE_REFCOUNTING(_class) >+public: >+ nsrefcnt AddRef(void) { >+ NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt"); >+ nsrefcnt count = PR_AtomicIncrement((PRInt32*)&mRefCnt); >+ NS_LOG_ADDREF(this, count, #_class, sizeof(*this)); >+ return mRefCnt; >+ } >+ nsrefcnt Release(void) { >+ NS_PRECONDITION(0 != mRefCnt, "dup release"); >+ nsrefcnt count = PR_AtomicDecrement((PRInt32 *)&mRefCnt); >+ NS_LOG_RELEASE(this, count, #_class); >+ if (count == 0) { >+ mRefCnt = 1; /* stabilize */ >+ NS_DELETEXPCOM(this); >+ return 0; >+ } >+ return mRefCnt; >+ } >+protected: >+ nsAutoRefCnt mRefCnt; These should both |return count| rather than |return mRefCnt|. sr=dbaron I think I'd prefer that somebody else do the superreviews for the layers stuff if you can find an alternative.
Attachment #426619 - Flags: superreview?(dbaron) → superreview+
(In reply to comment #19) > It would be useful to have a comment explaining what role BasicImplData > has and why it has virtual functions. OK > Indenting is fine, but indent the #if and #endif the same amount. Oops, didn't mean to indent. > The aFormats argument to ImageContainer::CreateImage should probably > be |const Image::Format*| (const added). yep. > Should ImageContainer::SetImage and GetImage be called SetCurrentImage > and GetCurrentImage instead? Hmm ... yes. Especially since we might move to supporting a queue of timestamped images here. > Maybe call the aSize argument to GetAsSurface aSizeResult? OK. > I tend to think Manager() and GetAsSurface() should have an assertion > that NS_IsMainThread(). Yes. > I find the class hierarchy introduced here a bit confusing (esp. the > *ImplData and the distinctions between ImageLayer and BasicImageLayer, > etc.); some comments at the top of classes (or somewhere else?) saying > how they relate to others might help a bit. OK. > These should both |return count| rather than |return mRefCnt|. Ah, good catch. > I think I'd prefer that somebody else do the superreviews for the layers > stuff if you can find an alternative. Sure, I'll try vlad in future.
Comment on attachment 426619 [details] [diff] [review] Part 1 v2: Create ImageLayer infrastructure >+ // Pass ownership of the buffer to the surface >+ imgSurface->SetData(&imageSurfaceDataKey, mBuffer.forget(), DestroyBuffer); >+ >+ nsRefPtr<gfxASurface> result; >+ >+#if defined(XP_MACOSX) >+ nsRefPtr<gfxQuartzImageSurface> quartzSurface = >+ new gfxQuartzImageSurface(imgSurface); >+ if (quartzSurface) { >+ result = quartzSurface.forget(); >+ } >+#endif >+ if (!result) { >+ result = imgSurface.forget(); >+ } >+ mSurface = result.get(); >+ >+ return result.forget(); Can we just do the following?: nsRefPtr<gfxASurface> result = imgSurface; #if defined(XP_MACOSX) nsRefPtr<gfxQuartzImageSurface> quartzSurface = new gfxQuartzImageSurface(imgSurface); if (quartzSurface) { result = quartzSurface; } #endif mSurface = result.get(); return result.forget(); >+ >+/** >+ * Our image container is very simple. It's really just a factory >+ * for the image objects. We use a Monitor to synchronize access to >+ * mImage. >+ */ >+class BasicImageContainer : public ImageContainer { >+public: >+ BasicImageContainer(BasicLayerManager* aManager) : >+ ImageContainer(aManager), mMonitor("BasicImageContainer") >+ {} >+ virtual already_AddRefed<Image> CreateImage(Image::Format* aFormats, >+ PRUint32 aNumFormats); >+ virtual void SetImage(Image* aImage); >+ virtual already_AddRefed<Image> GetImage(); >+ virtual already_AddRefed<gfxASurface> GetAsSurface(gfxIntSize* aSize); >+ >+protected: >+ Monitor mMonitor; >+ nsRefPtr<Image> mImage; >+}; >+ >+/** >+ * Returns true if aFormat is in the given format array. >+ */ >+static PRBool >+FormatInList(Image::Format* aFormats, PRUint32 aNumFormats, >+ Image::Format aFormat) >+{ >+ for (PRUint32 i = 0; i < aNumFormats; ++i) { >+ if (aFormats[i] == aFormat) { >+ return PR_TRUE; >+ } >+ } >+ return PR_FALSE; >+} >+ >+already_AddRefed<Image> >+BasicImageContainer::CreateImage(Image::Format* aFormats, >+ PRUint32 aNumFormats) >+{ >+ nsRefPtr<Image> image; >+ // Prefer cairo surfaces because they're native for us >+ if (FormatInList(aFormats, aNumFormats, Image::CAIRO_SURFACE)) { >+ image = new BasicCairoImage(); >+ } else if (FormatInList(aFormats, aNumFormats, Image::PLANAR_YUV)) { >+ image = new BasicPlanarYUVImage(); >+ } >+ return image.forget(); >+} The CreateImage interface feels weird to me. All of the users pass in a single entry list and assert if they don't get what they want. Maybe we could add something like this later if the need arises? >+ enum Format { >+ /** >+ * The PLANAR_YUV format creates a PlanarYUVImage. All backends should >+ * support this format, because the Ogg video decoder depends on it. >+ * The maximum image width and height is 16384. >+ */ >+ PLANAR_YUV, YCbCr? perhaps YUV is the better choice...
Attachment #426619 - Flags: review?(jmuizelaar) → review+
(In reply to comment #21) > (From update of attachment 426619 [details] [diff] [review]) > >+ enum Format { > >+ /** > >+ * The PLANAR_YUV format creates a PlanarYUVImage. All backends should > >+ * support this format, because the Ogg video decoder depends on it. > >+ * The maximum image width and height is 16384. > >+ */ > >+ PLANAR_YUV, > > YCbCr? perhaps YUV is the better choice... Technically it's YCbCr I believe.
(In reply to comment #21) > Can we just do the following?: > > nsRefPtr<gfxASurface> result = imgSurface; > #if defined(XP_MACOSX) > nsRefPtr<gfxQuartzImageSurface> quartzSurface = > new gfxQuartzImageSurface(imgSurface); > if (quartzSurface) { > result = quartzSurface; > } > #endif > mSurface = result.get(); > > return result.forget(); Sure > The CreateImage interface feels weird to me. All of the users pass > in a single entry list and assert if they don't get what they want. Maybe we > could add something like this later if the need arises? We're going to need a multi-element list as soon as we add support for bc-cat packed-YCbCr images on the N900. We'll only support that format on the N900 with OpenGL. > >+ enum Format { > >+ /** > >+ * The PLANAR_YUV format creates a PlanarYUVImage. All backends should > >+ * support this format, because the Ogg video decoder depends on it. > >+ * The maximum image width and height is 16384. > >+ */ > >+ PLANAR_YUV, > > YCbCr? perhaps YUV is the better choice... Yeah, it's PLANAR_YCBCR in my tree. Sorry I forgot to post the updated patch.
Whiteboard: [needs landing]
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing]
During review of the D3D10 backend I realized that the Data object contains the width/height of the data associated with it, which in a sense means that these can be different for each Image. However, in all likelyhood the width/height will be the same for every Image per ImageContainer. Could we make this fact more explicit and require that every Image per ImageContainer have the same size? This will make having a pool of textures much easier.
There are video formats (chained Oggs) where not every frame has the same size.
(In reply to comment #26) > There are video formats (chained Oggs) where not every frame has the same size. However, in ogg and, I assume, most other video formats (I only checked quicktime) the size is still per chain and not per frame. From a performance point of view it makes more sense to be able to reuse frames than it does to reuse ImageContainers across chain changes, especially since frames are likely more expensive to allocate than ImageContainers.
The decoder runs on its own thread, and we can't change the ImageLayer's ImageContainer off the main thread. Synchronizing with the main thread could introduce unnecessary stutter (especially with off-main-thread rendering). > This will make having a pool of textures much easier. I'm not sure why. Isn't it fairly easy to optimize for the common case where a frame has the same size as a texture in the pool?
Having said that, it's also true that things would go rather wrong with the current setup if the frame size changes, since the size of the ImageLayer changes without reflow having happened. In practice, with off-main-thread-rendering, you'd see the new frame rendered with the old transform briefly until the main thread was able to update the transform to letterbox the new frame and possibly reflow to fit the new frame size. To address that, I think we need to set a size on the ImageLayer as well, and if the size of the ImageLayer and the size of the ImageContainer's current frame don't match, the ImageLayer will letterbox to fit. If we're doing that, we should probably just delegate all letterboxing to ImageLayer and not use a transform here at all. I'll file a bug for that. I stand by comment #28 though.
(In reply to comment #28) > The decoder runs on its own thread, and we can't change the ImageLayer's > ImageContainer off the main thread. Synchronizing with the main thread could > introduce unnecessary stutter (especially with off-main-thread rendering). > I'm not sure this would be that bad. I don't know that people have the expectation that switching chains is something that should happen smoothly. I'm also not sure that the visual glitch caused by not reflowing synchronously is that much better. Do any other players seemlesly do this? > > This will make having a pool of textures much easier. > > I'm not sure why. Isn't it fairly easy to optimize for the common case where a > frame has the same size as a texture in the pool? Perhaps, but it's not as obvious that it's the correct thing to do. For example neither the software nor the D3D implementation do this :). I'd prefer we make it more explicit in the design.
(In reply to comment #31) > (In reply to comment #28) > > The decoder runs on its own thread, and we can't change the ImageLayer's > > ImageContainer off the main thread. Synchronizing with the main thread could > > introduce unnecessary stutter (especially with off-main-thread rendering). > > I'm not sure this would be that bad. I don't know that people have the > expectation that switching chains is something that should happen smoothly. > I'm also not sure that the visual glitch caused by not reflowing > synchronously is that much better. Do any other players seemlesly do this? I don't know. Another case where this could be important is variable-bit-rate streaming. It's possible that in response to a bandwidth change we might want to change from one resolution to another while playing. In that case, we definitely don't want to stutter. If we do what I suggested in bug 533091, there would be no visual glitches. > > I'm not sure why. Isn't it fairly easy to optimize for the common case > > where a frame has the same size as a texture in the pool? > > Perhaps, but it's not as obvious that it's the correct thing to do. For > example neither the software nor the D3D implementation do this :). I'd > prefer we make it more explicit in the design. The software and D3D backends currently don't do any surface caching at all, but I think that's just because we haven't bothered yet so I wouldn't draw conclusions from that :-). I'm happy to add documentation to ImageContainer that says implementations should optimize for the case where Images in the container all have the same size, if that helps.
(In reply to comment #32) > (In reply to comment #31) > > (In reply to comment #28) > > > I'm not sure why. Isn't it fairly easy to optimize for the common case > > > where a frame has the same size as a texture in the pool? > > > > Perhaps, but it's not as obvious that it's the correct thing to do. For > > example neither the software nor the D3D implementation do this :). I'd > > prefer we make it more explicit in the design. > > The software and D3D backends currently don't do any surface caching at all, > but I think that's just because we haven't bothered yet so I wouldn't draw > conclusions from that :-). I'm happy to add documentation to ImageContainer > that says implementations should optimize for the case where Images in the > container all have the same size, if that helps. Alright, we'll see how it goes. Another thing that we should do is have the video decoder decode into an area of memory allocated by the layers backend. I've filed bug 553278 on this issue.
No longer blocks: 551277
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: