Remote D2D canvas drawing to the GPU process
Categories
(Core :: Graphics, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox69 | --- | fixed |
People
(Reporter: bobowen, Assigned: bobowen)
References
(Blocks 1 open bug)
Details
Attachments
(14 files, 29 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Assignee | ||
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Comment 6•6 years ago
|
||
Comment 7•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 8•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Assignee | ||
Comment 12•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 14•6 years ago
|
||
Assignee | ||
Comment 15•6 years ago
|
||
Assignee | ||
Comment 16•6 years ago
|
||
Assignee | ||
Comment 17•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 18•6 years ago
|
||
Assignee | ||
Comment 19•6 years ago
|
||
Assignee | ||
Comment 20•6 years ago
|
||
Assignee | ||
Comment 21•6 years ago
|
||
Assignee | ||
Comment 22•6 years ago
|
||
Assignee | ||
Comment 23•6 years ago
|
||
Assignee | ||
Comment 24•6 years ago
|
||
Assignee | ||
Comment 25•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 26•6 years ago
|
||
Assignee | ||
Comment 27•6 years ago
|
||
Assignee | ||
Comment 28•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 29•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 30•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 31•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 32•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 33•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 34•6 years ago
|
||
Assignee | ||
Comment 35•6 years ago
|
||
Assignee | ||
Comment 36•6 years ago
|
||
Updated•6 years ago
|
Comment 37•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 38•6 years ago
|
||
Assignee | ||
Comment 39•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 40•6 years ago
|
||
Assignee | ||
Comment 41•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 42•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 43•6 years ago
|
||
Comment 44•6 years ago
|
||
Comment 45•6 years ago
|
||
Comment 46•6 years ago
|
||
Assignee | ||
Comment 47•6 years ago
|
||
Assignee | ||
Comment 48•6 years ago
|
||
Assignee | ||
Comment 49•6 years ago
|
||
Assignee | ||
Comment 50•6 years ago
|
||
Assignee | ||
Comment 51•6 years ago
|
||
Assignee | ||
Comment 52•6 years ago
|
||
Comment 53•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 54•6 years ago
|
||
Hey Ryan, we're nearly two months in here with little movement. What's the ETA on first run through?
Comment 55•6 years ago
|
||
Apologies, I've been preoccupied trying to ship scroll anchoring in 66. I've glanced through these patches and hope to do a first pass this week.
Comment 56•6 years ago
|
||
Comment 57•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 58•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #56)
Comment on attachment 9028022 [details] [diff] [review]
Part 13: Make the recording of surface data more efficientReview of attachment 9028022 [details] [diff] [review]:
::: gfx/2d/RecordedEventImpl.h
@@ +2727,2 @@}
}Was the difference from this change actually noticeable? I'd expect at least
some of this transformation to be performed by the compiler.
Looking at the disassembly (in Nightly) it appears to be doing the call to BytesPerPixel and the two multiplications on every iteration. Also for the SizeCollector the call to BytesPerPixel and a single multiplication is done per iteration (it drops the other multiplication because the pointer position isn't used).
I doubt it makes a big difference in the grand scheme of things, I think I did this before I found more important performance wins. I thought it was worth keeping anyway.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #57)
Comment on attachment 9027981 [details] [diff] [review]
Part 9: Add a D3D11 device to be used on canvas threads in the GPU processReview of attachment 9027981 [details] [diff] [review]:
Does it work to share the D3D11 device with the compositor? That would make
some of the texture synchronization easier and would help avoid the problems
that we've run into with texture sharing between devices. It would also be a
memory usage improvement.
When I tried this originally it crashed pretty quickly, but if we used the Direct2D synchronization (from part 15) around all of the Direct 3D and DXGI calls on the compositor thread as well, then that might well solve those problems.
I guess we could add a flag to the Direct3D texture code to set when we call Factory::CreateDrawTargetForD3D11Texture (and anywhere else) and then always use the synchronization.
Updated•6 years ago
|
Comment 59•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 60•6 years ago
|
||
Thanks rhunt (and jrmuizel) for starting to look at these, I know how busy you all are.
(In reply to Ryan Hunt [:rhunt] from comment #59)
Comment on attachment 9028027 [details] [diff] [review]
Part 14: Refactor Path recording to record Arc properlyReview of attachment 9028027 [details] [diff] [review]:
I'm not sure if this will conflict with this commit [1] by Bas that landed a
bit back.
Yeah, thanks, doesn't look like the rebase will be too bad.
+#define NEXT_PARAMS(_type) \
Can you undefine this macro when it's done being used? Just to ensure it
plays nice with unified builds.
Will do thanks.
@@ +60,5 @@
aPathSink.Close();
break;
default:
MOZ_MAKE_COMPILER_ASSUME_IS_UNREACHABLE(
"We control mOpTypes, so this should never happen.");
This is only true if we validate the incoming bytes of the PathOps
constructor that takes a stream. Do we have plans to do this, and if not
what will happen here?
I think I was originally using a similar case statement in the recording before I hit on the idea of having it all in one vector<uint8_t>, where this did hold. I obviously didn't change that when I copied it, thanks I'll fix it.
I think here I will return false and I need to check for that in the caller.
In the other cases I think I should probably just crash.
@@ +120,5 @@
- return newPathOps;
+}+size_t
+PathOps::Size() constWhat is this used for? Does it make sense to compute this when building
paths and transmit it along with recordings?
It's only used in RecordedPathCreation::OutputSimpleEventInfo, which isn't called normally, only if you add in some logging.
So, I didn't think it was worth computing up front as it isn't normally needed.
::: gfx/2d/PathRecording.h
@@ +43,5 @@
- bool StreamToSink(PathSink &aPathSink) const;
- PathOps TransformedCopy(const Matrix &aTransform) const;
- size_t Size() const;
It might be good to document whether this is size in bytes or amount of
items.
Will do.
::: gfx/2d/RecordedEventImpl.h
@@ +726,5 @@ReferencePtr mRefPtr;
FillRule mFillRule;
- PathOps* mPathOps;
- bool mOwnPathOps = false;
My understanding is that this is new optimization to prevent a copy when
recording a path.In this case, RecordedPathCreation is being initialized as a temporary in
DrawTargetRecording::EnsurePathStored where it's guaranteed that the path
will outlive this event, so it's okay that we take a pointer to the PathOps
and don't free it or worry about a dangling pointer.I'm a bit worried about this being broken unintentionally in the future. It
took me a bit to figure it out.Would it make more sense to instead use a RefPtr<PathRecording> to the
source path? Or as an additional pointer to ensure the lifetime for the
direct pointer works out. This would ensure it stays alive no matter how the
caller uses this class.Alternatively, if there's no good solution to this then please add a comment
explaining what's going on.
I think I was trying to use the same member for the recording and playback (because we only need the PathOps for playback).
But I think you're right, it'd be better if I just held a RefPtr<PathRecording> as well.
Comment 61•6 years ago
|
||
Comment 62•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #58)
Does it work to share the D3D11 device with the compositor? That would make
some of the texture synchronization easier and would help avoid the problems
that we've run into with texture sharing between devices. It would also be a
memory usage improvement.When I tried this originally it crashed pretty quickly, but if we used the Direct2D synchronization (from part 15) around all of the Direct 3D and DXGI calls on the compositor thread as well, then that might well solve those problems.
I guess we could add a flag to the Direct3D texture code to set when we call Factory::CreateDrawTargetForD3D11Texture (and anywhere else) and then always use the synchronization.
You might have seen this already, but this is documented at https://docs.microsoft.com/en-us/windows/desktop/direct2d/multi-threaded-direct2d-apps
Ultimately, only one thread can be using the D3DDevice's immediate context at a time, so we need to lock around all the compositor threads usage of the device.
This does mean that <canvas> work would block the compositor, whereas with separate devices, we can have some amount of preemption.
The lower memory usage, and less cross-device sharing is appealing though, would make a good follow-up investigation.
Comment 63•6 years ago
|
||
Comment 64•6 years ago
|
||
Comment 65•6 years ago
|
||
Comment 66•6 years ago
|
||
Comment 67•6 years ago
|
||
Assignee | ||
Comment 68•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #61)
Comment on attachment 9028447 [details] [diff] [review]
Part 12: Add CanvasParent, CanvasChild and RecordedTextureData
Thanks for looking at all of this and all of the feedback.
I'm tied up with other work at the moment, but I'll try and answer questions and then update the patches when I get a chance.
We have some gfx documentation in gfx/docs/, do you think you could put your
explainer doc into that folder as part of this work?
Yes that's the plan after it's had some more work and formatting, although I wasn't sure where to put it, so that answers that. :-)
::: gfx/layers/client/TextureRecorded.h
...
Any reason for the naming to be reversed for the filename?
I think I named the class before I realised that the other files and classes were the other way around, so I'll change it to be consistent.
::: gfx/layers/ipc/CanvasChild.cpp
@@ +139,5 @@
+already_AddRefed<gfx::DataSourceSurface>
+CanvasChild::GetDataSurface(const gfx::SourceSurface* aSurface)
+{
- MOZ_ASSERT(aSurface);
You mentioned in the notes that you want to make this work from other
threads. What other threads are consumers of this?Maybe add a main thread assert/warning for the meantime?
I think I saw a situation where a canvas texture ended up on the paint thread, but yeah I'll add an assertion to make this obvious until I can do the follow-up.
::: gfx/layers/ipc/CompositorBridgeChild.cpp
@@ +258,5 @@
...
- if (gfx::gfxVars::RemoteCanvasEnabled()) {
Should we do this lazily? Maybe a TODO even if you don't want to do so now.
Yeah, I was more concerned about making the creation of the ring buffer lazy, but the whole thing probably could be as well.
I'll add as another follow-up.
::: gfx/layers/ipc/PCanvas.ipdl
@@ +19,5 @@
- async CreateTranslator(TextureType aTextureType, Handle aReadHandle,
CrossProcessSemaphoreHandle aReaderSem,
CrossProcessSemaphoreHandle aWriterSem);
- async StartTranslation();
My understanding is that this gets called when we detect that the reader has
transitioned into the Stopped state (which we can check from the shmem ring
buffer), and we need this explicit API to tell it to wake up again?Calling this ResumeTranslation, or NotifyNewDataAvailable or something along
those lines.Documenting both methods here would be really useful.
I think ResumeTranslation is much better thanks, I'll rename.
Sorry I'd missed these as I was going through adding more comments (there are no doubt other places as well).
Assignee | ||
Comment 69•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #62)
(In reply to Bob Owen (:bobowen) from comment #58)
Does it work to share the D3D11 device with the compositor? That would make
some of the texture synchronization easier and would help avoid the problems
that we've run into with texture sharing between devices. It would also be a
memory usage improvement.When I tried this originally it crashed pretty quickly, but if we used the Direct2D synchronization (from part 15) around all of the Direct 3D and DXGI calls on the compositor thread as well, then that might well solve those problems.
I guess we could add a flag to the Direct3D texture code to set when we call Factory::CreateDrawTargetForD3D11Texture (and anywhere else) and then always use the synchronization.
You might have seen this already, but this is documented at https://docs.microsoft.com/en-us/windows/desktop/direct2d/multi-threaded-direct2d-apps
Yeah, that is where I found this originally to make the sharing across the canvas threads work.
I leave the possibility of sharing devices to a follow-up as you suggest.
Assignee | ||
Comment 70•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #63)
...
- if (gfxVars::UseWebRender() || gfxVars::RemoteCanvasEnabled()) {
Why does remote canvas mean we need to create a DirectComposition device?
Hmm, I think this might be a hangover from when I was trying to force the creation of the device that the DrawTargetD2D1 depended on.
I'll try and get rid of it.
(In reply to Matt Woodrow (:mattwoodrow) from comment #64)
Comment on attachment 9027992 [details] [diff] [review]
Part 10: Add a CanvasTranslator and canvas recorded events
...
RefPtr<ID3D11Device> device =
gfx::DeviceManagerDx::Get()->GetCanvasDevice();
Can we also assert on device being non-null, since D3D11TextureData::Create
will go looking for a different one if it's null, and we probably don't want
that.
Ah yes, I'll add a check for that.
- TextureMap::const_iterator result = mTextureDatas.find(aDrawTarget);
Should we be holding the mTextureDatasMonitor when accessing mTextureDatas?
This should be on the same thread that might modify it, so I think we're OK here.
Only WaitForTextureData gets call from the compositor thread, which is why that and the modifications hold the Monitor.
We have the DataMutex class to make the equivalent pattern impossible when
using Mutexes, would be nice if we could have a Monitor equivalent.
Yeah, it would be good to have this checked with assertions, the class would have to hold the thread that it can be modified on, as it could be any one of the canvas threads for each translation run, but that should be doable.
Does DataMutex allow for this sort of case, where we only modify on one thread and don't want to lock for read access on that thread?
::: gfx/layers/RecordedCanvasEventImpl.h
@@ +218,5 @@
+inline bool
+RecordedTextureUnlock::PlayCanvasEvent(CanvasTranslator* aTranslator) const
+{
- TextureData* textureData = aTranslator->LookupTextureData(mDT);
The translator's mTextureDatas owns these TextureData* (using UniquePtr),
and is also protected by a Monitor (suggesting that it's accessed from
multiple threads).Are we guaranteed that this won't be deleted from under us between the
LookupTextureData call, and the Unlock?Maybe we could have assertions around which threads can read/write the
mTextureDatas map, and only allow accessing the raw pointers from the
(hopefully single) write thread?
Hmm, the compositor thread is the only other thread that accesses that is not the (current) write thread.
With a call to WaitForTextureData through LookupTextureDataForClientDrawTarget.
I don't think the TextureData can normally get deleted, because the compositor thread holds a read lock on the associated TextureClient in the content process.
That might break down in some failure scenarios though and I agree it's a non-obvious guarantee either way.
Actually, the compositor thread only uses this to call Serialize to get a SurfaceDescriptor to create its TextureHost.
I should probably just call Serialize on the canvas thread and store the SurfaceDescriptors in a separate container and change this Monitor to protect that.
We would add on the canvas thread and remove on the compositor thread (think this set-up only happens once per texture).
mTextureDatas would then only be accessed on a single thread at a time.
Assignee | ||
Comment 71•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #65)
Comment on attachment 9028448 [details] [diff] [review]
Part 15: Spread the playback of canvas recordings across multiple threads in
the GPU process
...+class MOZ_RAII AutoMoz2DSerialize final
AutoSerializeWithMoz2D?
That's a much better name, thanks.
::: gfx/layers/RecordedCanvasEventImpl.h
@@ +175,5 @@return false;
}
- gfx::AutoMoz2DSerialize moz2dSerialize(
aTranslator->GetReferenceDrawTarget()->GetBackendType());
It would be nice if we could somehow verify that we're not missing any
callers that access our d3d device (current or future).Have you tried running with D2D and the D3D device's debugging modes enabled?
I used one of these to track down a couple of problems I had with all of this, but I'm not sure I had both of these turned on.
I'll look into it.
Assignee | ||
Comment 72•6 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
Comment on attachment 9028446 [details] [diff] [review]
Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorderReview of attachment 9028446 [details] [diff] [review]:
::: gfx/2d/RecordedEvent.h
@@ +194,5 @@
- @param aRecordedEvent the event to record
- */
- template<class RE>
- void RecordEvent(const RE* aRecordedEvent)
Do you know how much of a performance difference it makes having 3 different
implementations of this code (RecordEvent, write and WriteInParts)? Why is
MemWriter faster? Is it just because we only have to call UpdateWriteCount
once per-event?
MemWriter is much faster because there are no checks for space and the compiler can just merge any continuous fixed length WriteElements in the specific RecordedEventDerived<Derived>::Record(MemWriter) into direct memory movements in assembly.
The compiler also does a similar thing instead of calling memcpy for write, but only for individual WriteElements because of the need to check for space each time.
If I make write more complicated, then the compiler stops doing this, although I don't think it's as important as using MemWriter. Maybe I could convince the compiler to still do this by letting it know which paths are more likely.
It would be even nicer to change the RecordedEvents to do one fixed length write and optionally one (or more) variable length writes, then the compiler would probably always optimize. That would be a large separate refactor though, so I decided not to go down that route. It would probably help on the performance of the reading side as well though.
Are there any callers to write() that happen when RecordEvent isn't on the
stack?
I don't think so.
It seems like we could just implement write as the contents of WriteInParts,
and call UpdateWriteCount once, at the end of RecordEvent.
Quite possibly, but from what I remember this was more about getting the compiler to do multiple simple memory movements in assembly, instead of multiple calls to memcpy.
(In reply to Matt Woodrow (:mattwoodrow) from comment #67)
Comment on attachment 9028446 [details] [diff] [review]
Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorderReview of attachment 9028446 [details] [diff] [review]:
Would it be possible to make the RingBuffer a standalone class (ideally in
ipc/glue/)?I'm finding it hard to reason about the RecordedEvent logic interleaved the
strict ring buffer code, and I think that's making this feel more
complicated than it needs to.If we did that then we could get another reviewer (without requiring
gfx/moz2d experience) to look over the thread safety of the actual ring
buffer, which I think would be valuable.
I think this is a bit difficult with the way the recording is currently set up in Moz2D.
The recording is done through virtual calls (through RecordToStream), with the actual recording (in RecordedEventImpl.h) not usable directly from outside of Moz2D.
In order to get the performance benefits described above the writes need to be in Moz2D not the other side of a virtual call and Moz2D can't include code from elsewhere (and certainly as it stands the ring buffer isn't generic enough to put into mfbt).
I did make some attempts to untangle this, but I don't think it's a small job.
If the RecordedEvent definitions and some of the templated code in RecordedEventImpl.h could be moved back into RecordedEvent.h then I think using a Stream defined outside of Moz2D would be easier.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 73•6 years ago
|
||
Stealing the review back from rhunt, because I probably have more time then he does now.
Comment 74•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #68)
You mentioned in the notes that you want to make this work from other
threads. What other threads are consumers of this?Maybe add a main thread assert/warning for the meantime?
I think I saw a situation where a canvas texture ended up on the paint thread, but yeah I'll add an assertion to make this obvious until I can do the follow-up.
This does seem likely, if you wrap the <canvas> in some other effect that needs to be drawn on the content side (any filter without WR, complex filter chains with WR), then we'd end up trying to draw the <canvas> DT with our paint thread recordings.
I don't think sending a request to the main thread and waiting will help here, since it's very possible for the main thread to already be waiting on the paint threads to complete.
Assignee | ||
Comment 75•6 years ago
|
||
Assignee | ||
Comment 76•6 years ago
|
||
r=jrmuizel - rebased version of part 1 in case you want to apply the roll-up and compile.
Comment 77•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #72)
(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
Comment on attachment 9028446 [details] [diff] [review]
Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorderReview of attachment 9028446 [details] [diff] [review]:
::: gfx/2d/RecordedEvent.h
@@ +194,5 @@
- @param aRecordedEvent the event to record
- */
- template<class RE>
- void RecordEvent(const RE* aRecordedEvent)
Do you know how much of a performance difference it makes having 3 different
implementations of this code (RecordEvent, write and WriteInParts)? Why is
MemWriter faster? Is it just because we only have to call UpdateWriteCount
once per-event?MemWriter is much faster because there are no checks for space and the compiler can just merge any continuous fixed length WriteElements in the specific RecordedEventDerived<Derived>::Record(MemWriter) into direct memory movements in assembly.
The compiler also does a similar thing instead of calling memcpy for write, but only for individual WriteElements because of the need to check for space each time.
If I make write more complicated, then the compiler stops doing this, although I don't think it's as important as using MemWriter. Maybe I could convince the compiler to still do this by letting it know which paths are more likely.It would be even nicer to change the RecordedEvents to do one fixed length write and optionally one (or more) variable length writes, then the compiler would probably always optimize. That would be a large separate refactor though, so I decided not to go down that route. It would probably help on the performance of the reading side as well though.
Are there any callers to write() that happen when RecordEvent isn't on the
stack?I don't think so.
It seems like we could just implement write as the contents of WriteInParts,
and call UpdateWriteCount once, at the end of RecordEvent.Quite possibly, but from what I remember this was more about getting the compiler to do multiple simple memory movements in assembly, instead of multiple calls to memcpy.
(In reply to Matt Woodrow (:mattwoodrow) from comment #67)
Comment on attachment 9028446 [details] [diff] [review]
Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorderReview of attachment 9028446 [details] [diff] [review]:
Would it be possible to make the RingBuffer a standalone class (ideally in
ipc/glue/)?I'm finding it hard to reason about the RecordedEvent logic interleaved the
strict ring buffer code, and I think that's making this feel more
complicated than it needs to.If we did that then we could get another reviewer (without requiring
gfx/moz2d experience) to look over the thread safety of the actual ring
buffer, which I think would be valuable.I think this is a bit difficult with the way the recording is currently set up in Moz2D.
The recording is done through virtual calls (through RecordToStream), with the actual recording (in RecordedEventImpl.h) not usable directly from outside of Moz2D.
In order to get the performance benefits described above the writes need to be in Moz2D not the other side of a virtual call and Moz2D can't include code from elsewhere (and certainly as it stands the ring buffer isn't generic enough to put into mfbt).
I did make some attempts to untangle this, but I don't think it's a small job.
If the RecordedEvent definitions and some of the templated code in RecordedEventImpl.h could be moved back into RecordedEvent.h then I think using a Stream defined outside of Moz2D would be easier.
I'm confused by this. It seems like the EventRingBuffer abstraction already allows CanvasEventRingBuffer to be outside of Moz2D. Can't CanvasEventRingBuffer be the RingBuffer standalone class?
Comment 78•6 years ago
|
||
Also, it seems like this kind of thing would be useful for the WebGL remoting work. Do you know what's happening there? Can we share infrastructure?
Assignee | ||
Comment 79•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #77)
(In reply to Bob Owen (:bobowen) from comment #72)
...I'm finding it hard to reason about the RecordedEvent logic interleaved the
strict ring buffer code, and I think that's making this feel more
complicated than it needs to.If we did that then we could get another reviewer (without requiring
gfx/moz2d experience) to look over the thread safety of the actual ring
buffer, which I think would be valuable.I think this is a bit difficult with the way the recording is currently set up in Moz2D.
The recording is done through virtual calls (through RecordToStream), with the actual recording (in RecordedEventImpl.h) not usable directly from outside of Moz2D.
In order to get the performance benefits described above the writes need to be in Moz2D not the other side of a virtual call and Moz2D can't include code from elsewhere (and certainly as it stands the ring buffer isn't generic enough to put into mfbt).
I did make some attempts to untangle this, but I don't think it's a small job.
If the RecordedEvent definitions and some of the templated code in RecordedEventImpl.h could be moved back into RecordedEvent.h then I think using a Stream defined outside of Moz2D would be easier.I'm confused by this. It seems like the EventRingBuffer abstraction already allows CanvasEventRingBuffer to be outside of Moz2D. Can't CanvasEventRingBuffer be the RingBuffer standalone class?
Sorry I wasn't very clear there, I was talking about the known length (at compile time) memcpys.
This means that the compiler can convert multiple contiguous (in the case of MemWriter in EventRingBuffer::RecordEvent) or single WriteElements (in the case of EventRingBuffer::write) straight into assembly, instead of lots of calls to memcpy for small lengths.
If that part of EventRingBuffer is outside of Moz2D the other side of a virtual call then I don't think the compiler can do that.
The rest can and is outside of Moz2D. Obviously some of it needs to be because it uses things that Moz2D can't.
I guess that could be moved out of layers and somewhere more generic, but it would still be very specific for use with Moz2D recording, because of the above.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #78)
Also, it seems like this kind of thing would be useful for the WebGL remoting work. Do you know what's happening there? Can we share infrastructure?
It has its own more generic producer-consumer queue (at least last time I saw it). It seemed quite complicated for the simple char* writes and reads that Moz2D recording needs.
It's possible that these problems can be solved and we could eventually use more common ring buffer code, but in order to get these optimizations I think Moz2D recording would need some rework.
Comment 80•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #79)
Sorry I wasn't very clear there, I was talking about the known length (at compile time) memcpys.
This means that the compiler can convert multiple contiguous (in the case of MemWriter in EventRingBuffer::RecordEvent) or single WriteElements (in the case of EventRingBuffer::write) straight into assembly, instead of lots of calls to memcpy for small lengths.
If that part of EventRingBuffer is outside of Moz2D the other side of a virtual call then I don't think the compiler can do that.The rest can and is outside of Moz2D. Obviously some of it needs to be because it uses things that Moz2D can't.
I guess that could be moved out of layers and somewhere more generic, but it would still be very specific for use with Moz2D recording, because of the above.
I've filed bug 1529677 to move the writing of the event type into RecordToStream. If we take that change and move some of the reading complexity so that we only adjust the counts once per event it seems like it should be possible consolidate more of the ring buffer code in CanvasEventRingBuffer.
i.e. the code in UpdateWriteCount and UpdateReadCount would be moved to virtual functions, with these functions ideally (when not reading/writing in parts) only being called once per event.
Does this seem workable?
Assignee | ||
Comment 81•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
(In reply to Bob Owen (:bobowen) from comment #79)
...
I've filed bug 1529677 to move the writing of the event type into RecordToStream. If we take that change and move some of the reading complexity so that we only adjust the counts once per event it seems like it should be possible consolidate more of the ring buffer code in CanvasEventRingBuffer.i.e. the code in UpdateWriteCount and UpdateReadCount would be moved to virtual functions, with these functions ideally (when not reading/writing in parts) only being called once per event.
Does this seem workable?
Moving the writing of the event type does seem like an improvement.
Although I don't think having the single update to the write and read counts are the main benefit for MemWriter, I think it
more comes from the efficiency of the memory copying.
Indeed once you have taken the hit of doing the copies separately (either through memcpy or more efficient means) there might be a benefit in updating the counts quickly, because it means the other side can then start reading or writing, if it is waiting.
I'm happy to try and move more stuff into CanvasEventRingBuffer though and see if it has a detectable effect on performance.
Comment 82•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #81)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
I'm happy to try and move more stuff into CanvasEventRingBuffer though and see if it has a detectable effect on performance.
It would also be good to get some gtests for the ring buffer.
Comment 83•6 years ago
|
||
Assignee | ||
Comment 84•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #82)
(In reply to Bob Owen (:bobowen) from comment #81)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
I'm happy to try and move more stuff into CanvasEventRingBuffer though and see if it has a detectable effect on performance.It would also be good to get some gtests for the ring buffer.
More testing would definitely be good, at the moment it very much relies on integration testing with the canvas tests.
I've not used gtests or unit tested something similar to this before, do you know of any good examples I could look at?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #83)
Comment on attachment 9028001 [details] [diff] [review]
Part 11: Add DataSurfaceProvider to be used to retrieve surface data from a
SourceSurfaceRecordingReview of attachment 9028001 [details] [diff] [review]:
I think my preference would be for Canvas to deal with getting the pixel
data for a DrawTargetRecording without needing DrawTargetRecording to know
about it. Is that possible?
Possibly, my only reservation is when the SourceSurface is handed off to something else which then needs the data.
I guess Canvas would know when that will happen though, I'll look into it.
I should be able to get back to this soon.
Assignee | ||
Comment 85•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #53)
Comment on attachment 9028447 [details] [diff] [review]
Part 12: Add CanvasParent, CanvasChild and RecordedTextureDataReview of attachment 9028447 [details] [diff] [review]:
The IPC parts look reasonable.
(I was going to point out CrossProcessSemaphore::CloseHandle, to save
resources after the semaphore has been shared, but it's a no-op on Windows
and this is Windows-only, so it looks like you won't need it.)
I must have missed this originally.
This is currently Windows only, but should be fairly easy to extend to other platforms.
At first I thought I was already doing this, but that was for the ring buffer shared memory.
I see this is basically the same for CrossProcessSemaphore's internal shared memory on posix.
I've added them locally to part 8 that creates the CrossProcessSemaphores, thanks.
Comment 86•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #84)
More testing would definitely be good, at the moment it very much relies on integration testing with the canvas tests.
I've not used gtests or unit tested something similar to this before, do you know of any good examples I could look at?
We have some stuff in https://searchfox.org/mozilla-central/source/gfx/tests/gtest. I'm not sure what kind of testing we have for IPC things.
Assignee | ||
Comment 87•6 years ago
|
||
I've rebased and made changes addressing up to comment 80.
While running performance tests to get a baseline for the suggested changes in that comment, I've hit issues where occasionally the reading (or possible recording) of RecordedPathCreation (or maybe RecordedPathDestruction) is failing. I'm struggling at the moment to see what the issue is.
I'll pick this up again when I get back from PTO.
Updated•6 years ago
|
Assignee | ||
Comment 88•6 years ago
|
||
I fixed the race I had, which was in the code when I had a clash between the Reader moving from Waiting to Stopped and the Writer moving it from Waiting to Processing. This only showed up on my new dev machine, I think because of the large number of cores available.
I'm going to push new patches to Phabricator.
Unfortunately, I think that means previously reviewed patches will need to be r+ed again.
(In reply to Bob Owen (:bobowen) from comment #81)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
(In reply to Bob Owen (:bobowen) from comment #79)
...
I've filed bug 1529677 to move the writing of the event type into RecordToStream. If we take that change and move some of the reading complexity so that we only adjust the counts once per event it seems like it should be possible consolidate more of the ring buffer code in CanvasEventRingBuffer.i.e. the code in UpdateWriteCount and UpdateReadCount would be moved to virtual functions, with these functions ideally (when not reading/writing in parts) only being called once per event.
Does this seem workable?
Moving the writing of the event type does seem like an improvement.
Although I don't think having the single update to the write and read counts are the main benefit for MemWriter, I think it
more comes from the efficiency of the memory copying.
Indeed once you have taken the hit of doing the copies separately (either through memcpy or more efficient means) there might be a benefit in updating the counts quickly, because it means the other side can then start reading or writing, if it is waiting.I'm happy to try and move more stuff into CanvasEventRingBuffer though and see if it has a detectable effect on performance.(In reply to Bob Owen (:bobowen) from comment #84)
I've moved most of the reading and writing logic back into CanvasEventRingBuffer, Only the MemWriter part is still in EventRingBuffer. This doesn't seem to affect the performance, which is mainly from using MemWriter most of the tome anyway.
I think it generally makes thing easier to follow in CanvasEventRingBuffer.
This is done on top of the patch for bug 1529677.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #82)
(In reply to Bob Owen (:bobowen) from comment #81)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
I'm happy to try and move more stuff into CanvasEventRingBuffer though and see if it has a detectable effect on performance.It would also be good to get some gtests for the ring buffer.
More testing would definitely be good, at the moment it very much relies on integration testing with the canvas tests.
I've not used gtests or unit tested something similar to this before, do you know of any good examples I could look at?
I've not had chance to look at this yet, but I'd like to get something landed (preffed off) and make this one of the follow-ups before enabling on Nightly.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #83)
Comment on attachment 9028001 [details] [diff] [review]
Part 11: Add DataSurfaceProvider to be used to retrieve surface data from a
SourceSurfaceRecordingReview of attachment 9028001 [details] [diff] [review]:
I think my preference would be for Canvas to deal with getting the pixel
data for a DrawTargetRecording without needing DrawTargetRecording to know
about it. Is that possible?Possibly, my only reservation is when the SourceSurface is handed off to something else which then needs the data.
I guess Canvas would know when that will happen though, I'll look into it.
I should be able to get back to this soon.
I've managed to do this in what is hopefully an acceptable way.
In order to be able to intervene and fetch the data, I've made PersistentBufferProvider::BorrowSnapshot allowable inside or outside of a lock and made CanvasRenderingContext2D use it whenever it fetches a SourceSurface for which we require (or might require) the data.
I think this also removes the possibility of that needing to happen off the main thread.
Assignee | ||
Comment 89•6 years ago
|
||
Assignee | ||
Comment 90•6 years ago
|
||
This patch modifies DoWithEvent so that we can more easily have a
DoWithEventFromStream callable from outside of Moz2D similar to
LoadEventFromStream. We will add that in a later patch for the new
EventRingBuffer. It also changes the only user of LoadEventFromStream
over to it, so we can can get rid of it and LoadEvent entirely.
Depends on D27076
Assignee | ||
Comment 91•6 years ago
|
||
Depends on D27078
Assignee | ||
Comment 92•6 years ago
|
||
Depends on D27079
Assignee | ||
Comment 93•6 years ago
|
||
Otherwise, we crash in the content process when we try to record this.
Depends on D27080
Assignee | ||
Comment 94•6 years ago
|
||
This is ground work for when we will be returning a recording TextureData for
certain types in subsequent patches.
Depends on D27082
Assignee | ||
Comment 95•6 years ago
|
||
This is so we don't need to lock the previous back buffer when it might also be
locked by the compositor. These locks are generally for copying to the next
back buffer or when getting image data from the previous back buffer.
This also makes it easier to asynchronously cache the DataSourceSurface in the
GPU process, when a page is using getImageData. This is done in a later patch.
Depends on D27086
Assignee | ||
Comment 96•6 years ago
|
||
These are to be used as part of recording canvas drawing in the content
processes and playing it back in the GPU process through shared memory.
Depends on D27088
Assignee | ||
Comment 97•6 years ago
|
||
Depends on D27089
Assignee | ||
Comment 98•6 years ago
|
||
These are extensions to the Moz2D RecordedEvents to record and play back canvas
texture related functions in the GPU process.
The CanvasTranslator handles the playback of these and the Moz2D ones.
Depends on D27090
Assignee | ||
Comment 99•6 years ago
|
||
Depends on D27091
Assignee | ||
Comment 100•6 years ago
|
||
RecordedTextureData records TextureData calls for play back in the GPU process.
CanvasChild and CanvasParent set up the recorder and translator.
They also help to manage the starting of translation and co-ordinating the
translation with the frame transactions.
This patch also includes other changes to wire up recording and playback.
Depends on D27092
Assignee | ||
Comment 101•6 years ago
|
||
Depends on D27094
Assignee | ||
Comment 102•6 years ago
|
||
This also improves the recording and translation speeds.
Depends on D27095
Assignee | ||
Comment 103•6 years ago
|
||
Depends on D27096
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 104•6 years ago
|
||
Bob, one thing I was just thinking about that may have missed. How does this change handle image data? Does it get put in the recording every frame? e.g. If I'm drawing a bunch of 10k x 10k images to canvas every frame, how is that handled?
Assignee | ||
Comment 105•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #104)
Bob, one thing I was just thinking about that may have missed. How does this change handle image data? Does it get put in the recording every frame? e.g. If I'm drawing a bunch of 10k x 10k images to canvas every frame, how is that handled?
I haven't done anything specific for cases like that and I may be misunderstanding what you mean, but ...
In general assuming we're talking about the same images (SourceSurfaces?) being used over and over again then I think the caching will kind of work as it does now. Except you have a SourceSurfaceRecording cached in the content process that references the real SourceSurface in the GPU process.
If you have any examples of this sort of behaviour then I can test to see how it goes.
Comment 106•6 years ago
|
||
Hey bob I've been asked to help review "Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorder" -- is the ring buffer design copied or based off any other implementation/literature? It's a pretty subtle thing to review so I want to be sure I have all the context I can.
Comment 107•6 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #105)
If you have any examples of this sort of behaviour then I can test to see how it goes.
What prevents this example from consuming infinite memory?
<html>
<canvas width=1920 height=1920 id=dest></canvas>
<script>
c = document.getElementById("dest").getContext("2d");
i = 0;
function loadImage() {
var img = new Image;
img.onload = function() {
c.drawImage(img, 0, 0);
img.src = "";
loadImage();
}
img.src = "http://lorempixel.com/1920/1920/?" + i++
}
loadImage();
</script>
Updated•6 years ago
|
Assignee | ||
Comment 108•6 years ago
|
||
(In reply to Alexis Beingessner [:Gankro] from comment #106)
Hey bob I've been asked to help review "Part 8: Add a CanvasEventRingBuffer and CanvasDrawEventRecorder" -- is the ring buffer design copied or based off any other implementation/literature? It's a pretty subtle thing to review so I want to be sure I have all the context I can.
Hi Alexis, I just read the basic theory and then implemented from scatch to fit in with the fairly simple requirements for Moz2D recording and our IPC and synchronisation classes. Sorry to be brief on mobile.
Comment 109•5 years ago
|
||
I wrote a sync version of the script above but it consumes infinite memory on current Firefox so maybe it's ok not to worry too much about.
<html>
<canvas width=1920 height=1920 id=src></canvas>
<canvas width=1920 height=1920 id=dest></canvas>
<script>
c = document.getElementById("dest").getContext("2d");
src = document.getElementById("src").getContext("2d");
for (var i = 0; i < 100; i++ ) {
src.fillStyle = `rgb(${255.*Math.random()}, 0, 255, 1)`;
src.fillRect(Math.random()*1920, Math.random()*1920, 150, 100);
}
for (var i = 0; i<10000; i++) {
img = src.getImageData(0, 0, 1920, 1920);
c.putImageData(img, 0, 0);
}
</script>
Assignee | ||
Comment 110•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #109)
I wrote a sync version of the script above but it consumes infinite memory on current Firefox so maybe it's ok not to worry too much about.
Yeah, this eats impressive amounts of memory in the content process with the remote canvas enabled and disabled.
In the dev build it crashes the GPU process because I currently have a gfxDevCrash if an event fails to play back and I think it fails at some point because of a timeout.
If I remove the gfxDevCrash is seems to behave in a fairly similar manner either way.
I might need to think about removing the gfxDevCrash and handle things differently before we enable on Nightly or possible before we roll out.
However I think having the crash is useful at the moment, but maybe I should have another gfxDevCrash at the actual timeout, to distinguish between that and a failure for a different reason.
Comment 111•5 years ago
|
||
Comment 112•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/be11539bd8d8
https://hg.mozilla.org/mozilla-central/rev/274a9d9596a2
https://hg.mozilla.org/mozilla-central/rev/b2e0e341bb82
https://hg.mozilla.org/mozilla-central/rev/76f6050cb57f
https://hg.mozilla.org/mozilla-central/rev/c83dbcc4dade
https://hg.mozilla.org/mozilla-central/rev/258c6c199656
https://hg.mozilla.org/mozilla-central/rev/a2720ec3086f
https://hg.mozilla.org/mozilla-central/rev/fcc9f5f6dfe1
https://hg.mozilla.org/mozilla-central/rev/0e6cf27e9728
https://hg.mozilla.org/mozilla-central/rev/04067aec22bb
https://hg.mozilla.org/mozilla-central/rev/e32d6c3ba887
https://hg.mozilla.org/mozilla-central/rev/4357d695b8d5
https://hg.mozilla.org/mozilla-central/rev/09b2e60abc85
https://hg.mozilla.org/mozilla-central/rev/2fd1ba3d96cd
https://hg.mozilla.org/mozilla-central/rev/2195b79ea888
Updated•5 years ago
|
Updated•5 years ago
|
Description
•