Open Bug 900785 Opened 11 years ago Updated 1 year ago

Report missed frames to the profiler and to tests

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect

Tracking

()

People

(Reporter: roc, Unassigned)

References

Details

(Whiteboard: [leave open])

Attachments

(3 files, 7 obsolete files)

(deleted), patch
avih
: review+
Details | Diff | Splinter Review
(deleted), patch
avih
: review+
Details | Diff | Splinter Review
(deleted), patch
bas.schouten
: review+
avih
: feedback+
Details | Diff | Splinter Review
On Windows >= 7 at least, and maybe other platforms, I think we can use platform APIs to detect when we missed a frame --- i.e. consecutive screen refreshes with no intervening composite by Gecko. This would be helpful for monitoring and debugging smoothness issues like in bug 894128. It seems like the best place to observe this is in the compositor. But we'd like to be able to relate it back to main thread refresh driver ticks.
I'm thinking of the following setup: -- Give each nsRefreshDriver::Tick a unique ID and include that ID in PROFILER_LABEL("nsRefreshDriver", "Tick"). -- Pass that ID through to the layer system (and with OMTC, over to the compositor thread). Also pass along a boolean indicating if this update is intended to be the next frame after the previous update. That will be false when the refresh driver was sleeping before the current tick. -- When the compositor detects a missed frame that should not have been missed, issue a PROFILER_LABEL with the ID of the missed tick. Or if we detect that there was no layer-tree update during the missed frame interval, issue a different PROFILER_LABEL (it must have been the compositor's fault alone). I think that should take care of the profiler side. For tests I think we should have a per-nsIWidget count of missed frames, protected by a mutex, updated by the compositor and readable via some preffable DOM API.
Nick points out that we can use the mMostRecentRefresh of the Tick as its ID. We already pass this to the layer system using LayerManager::SetPaintStartTime.
This is pretty much exactly what I was planning. I was planning on capturing the period of time where the refresh driver is 'inactive'. So the profiler UI can ignore these periods and during the periods of time where the refresh driver is active it can infer miss frames and frame latency. I plan on writing custom UI to show frame time budget but I'm not entirely sure how to visualize everything yet. Attachment 773765 [details] [diff] will let us insert more flexible data into the profile.
What I have in mind will probably take long so feel free to go for a simpler solution in the mean time.
I've written the code (for the D3D10 main-thread layer manager at least) but I can't test it because I can't get SPS to work at all. Soon after startup it just disables itself. I've disabled the mLockedForPrivateBrowsing code too.
Hmm, maybe I didn't disable that code hard enough.
Attached patch D3D10 integration with SPS (obsolete) (deleted) — — Splinter Review
This seems to work OK but reports a lot of missed frames. It could be a bug in the reporting mechanism, but it looks just as likely that we really do miss a lot of frames.
Comment on attachment 785621 [details] [diff] [review] D3D10 integration with SPS Review of attachment 785621 [details] [diff] [review]: ----------------------------------------------------------------- I think the patch refers to a previous experimental patch on the same subject. E.g. the following are not at m-c: ::: layout/base/nsRefreshDriver.cpp @@ -348,5 @@ > > PreciseRefreshDriverTimerWindowsDwmVsync(double aRate, bool aPreferHwTiming = false) > : PreciseRefreshDriverTimer(aRate) > , mPreferHwTiming(aPreferHwTiming) > - , mLastRefreshesDisplayed(0) Not in m-c. @@ -457,5 @@ > - if (refreshDriver->mLastRefreshesDisplayed > 0) { > - uint64_t framesDisplayedSinceLastTick = > - timingInfo.cRefreshesDisplayed - refreshDriver->mLastRefreshesDisplayed; > - if (framesDisplayedSinceLastTick == 0) { > - printf("Redundant tick!\n"); This whole TimerTick is not part of m-c.
Attached patch correct patch (obsolete) (deleted) — — Splinter Review
This patch almost always reports a missed frame at the start of a burst of animation. The very first frame isn't reported, of course, because the refresh driver reports its not animated, but the next frame is reported as missed. This may just be a bug in the instrumentation, or it may be a real bug.
Attached patch even more correct patch (obsolete) (deleted) — — Splinter Review
Attachment #785621 - Attachment is obsolete: true
Attachment #786059 - Attachment is obsolete: true
This patch seems to consistently say that the second frame of an animation is late by the number of frames the compositor was suspended for before the animation started. So that's clearly a bug, but the code looks right to me, so I need to dig deeper.
DwmGetCompositionTimingInfo is messing with me: *** CheckMissedFrames time=364980604 mScreenRefreshCounterAtLastPresent=473 cRefreshesDisplayed=474 refresh driver tick=616314625582062 *** CheckMissedFrames time=364980632 mScreenRefreshCounterAtLastPresent=474 cRefreshesDisplayed=478 refresh driver tick=616314677929375 *** CheckMissedFrames time=364980655 mScreenRefreshCounterAtLastPresent=478 cRefreshesDisplayed=479 refresh driver tick=616314730276688 *** CheckMissedFrames time=364981367 mScreenRefreshCounterAtLastPresent=479 cRefreshesDisplayed=481 refresh driver tick=616315915690034 *** CheckMissedFrames time=364981383 mScreenRefreshCounterAtLastPresent=481 cRefreshesDisplayed=523 refresh driver tick=616315942708002 *** CheckMissedFrames time=364981398 mScreenRefreshCounterAtLastPresent=523 cRefreshesDisplayed=524 refresh driver tick=616315995055315 *** CheckMissedFrames time=364981414 mScreenRefreshCounterAtLastPresent=524 cRefreshesDisplayed=525 refresh driver tick=616316022073283 *** CheckMissedFrames time=364981431 mScreenRefreshCounterAtLastPresent=525 cRefreshesDisplayed=526 refresh driver tick=616316047402628 time is in ms. The big jump between 364980655 and 364981367 was a gap between smooth scrolling operations. You can see that during that time (712ms), cRefreshesDisplayed only increases by 2. Then in the next 16ms interval it jumps by 42 (which is close to 712ms at 60fps).
Attached patch fixed "late composition info" bug (obsolete) (deleted) — — Splinter Review
The secret is to NOT pass the correct HWND to DwmGetCompositionTimingInfo. This seems to work well in my limited testing. I'll clean this up and ask for review.
Attachment #786085 - Attachment is obsolete: true
Attached patch Part 1: Factor out DWM API glue into its own file (obsolete) (deleted) — — Splinter Review
Attachment #786700 - Flags: review?(avihpit)
Attachment #786701 - Flags: review? → review?(bas)
Comment on attachment 786701 [details] [diff] [review] Part 2: Detect missed screen refreshes and report them to the profiler Review of attachment 786701 [details] [diff] [review]: ----------------------------------------------------------------- You might also want to check out 'profiler_set_frame_number'. The visualization for this never got ported to the new timeline but it will show toggle-able delimiters for the frames using a checkbox in the lower level of the cleopatra UI. With a bit of UI work we can show useful frames information in the timeline.
How should I use it? It's completely undocumented. Maybe you should write the patch :-)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18) > How should I use it? It's completely undocumented. Maybe you should write > the patch :-) I'd like to add fancy support for frames to the profiler but it will likely take time that I don't have ATM. I'll likely get back to this when I continue to work on OMTC. Let's just take these patches for now with the intention of improving it later.
I tested a bit with the patch |fixed "late composition info" bug| and it consistently didn't detect all missed frames which I was able to notice visually. I'm not sure yet if has different detection rate compared to the earlier "even more correct patch" patch, or if I just didn't test the earlier patch enough. I'll re-test both patches hopefully later this week, and update this bug with the results.
Comment on attachment 786701 [details] [diff] [review] Part 2: Detect missed screen refreshes and report them to the profiler Review of attachment 786701 [details] [diff] [review]: ----------------------------------------------------------------- I'm not overly eager to add more stuff into layers to add other unrelated data we want to get. I'd sort of prefer a 'presentation callback' we can hang other logic off. But I suppose for now getting something in is useful enough to belay that. ::: gfx/layers/d3d10/LayerManagerD3D10.cpp @@ +63,5 @@ > cairo_user_data_key_t gKeyD3D10Texture; > > LayerManagerD3D10::LayerManagerD3D10(nsIWidget *aWidget) > + : mWidget(aWidget), > + mScreenRefreshCounterAtLastPresent(0) nit: I prefer the comma on this line as with other Layers code. But I don't feel too strongly about it. @@ +885,5 @@ > > +void > +LayerManagerD3D10::CheckForMissedFrames() > +{ > + if (!HasDwmGetCompositionTimingInfo() || !mWidget) { If we're not passing an HWND is there any reason to care if we don't have a widget here? It probably doesn't matter much though.
Attachment #786701 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #21) > I'm not overly eager to add more stuff into layers to add other unrelated > data we want to get. I'd sort of prefer a 'presentation callback' we can > hang other logic off. But I suppose for now getting something in is useful > enough to belay that. I think because this data is tightly coupled to platform-specific compositing details, it's ok to have it here. But I see that there is other stuff being crammed into LayerManager (tab-switching times, really?) that should be removed or moved to a callback. > nit: I prefer the comma on this line as with other Layers code. But I don't > feel too strongly about it. Fixed. I do feel strongly about it :-). > @@ +885,5 @@ > > > > +void > > +LayerManagerD3D10::CheckForMissedFrames() > > +{ > > + if (!HasDwmGetCompositionTimingInfo() || !mWidget) { > > If we're not passing an HWND is there any reason to care if we don't have a > widget here? It probably doesn't matter much though. Good point. Taken out. (Although it doesn't matter much.)
Attached patch Part 1: Move DWM APIs to WinUtils (deleted) — — Splinter Review
Attachment #786700 - Attachment is obsolete: true
Attachment #786700 - Flags: review?(avihpit)
Attachment #787319 - Flags: review?(avihpit)
Carrying forward Bas' r+.
Attachment #786701 - Attachment is obsolete: true
Attachment #787323 - Flags: review+
Comment on attachment 787321 [details] [diff] [review] Part 2: Make nsRefreshDriver use the WinUtils DWM APIs instead of loading the DLL itself Review of attachment 787321 [details] [diff] [review]: ----------------------------------------------------------------- Thank you for taking this ugliness away :) r=me
Attachment #787321 - Flags: review?(avihpit) → review+
Comment on attachment 787319 [details] [diff] [review] Part 1: Move DWM APIs to WinUtils Review of attachment 787319 [details] [diff] [review]: ----------------------------------------------------------------- r=me for moving the DWM stuff into WinUtils. However, I tested again the earlier patches: "fixed late composition..." (same detection "system" as the latest patch) which uses the desktop window (0) definitely detects less (visually confirmed) missed frames than the earlier patch "even more correct patch". I tested all combinations of window (from widget or the Desktop) x timinginfo.cRefreshesDisplayed/cRefresh/cRefreshFrameDisplayed/cDXRefresh; Of these 8 combinations, the widget window with either cRefreshesDisplayed or cRefreshFrameDisplayed worked well and the same (As far as I could judge). Other combinations either detected less, or detected nothing (i.e. the counter was always 0). I'm also not sure I noticed this "initial burst of detected jank" with window/cRefreshesDisplayed , but as far as detecting smooth animation glitches, I think this combination still gives the better value, even if it does have the initial false positives burst.
Attachment #787319 - Flags: review?(avihpit) → review+
Attached patch part 3 v2 (deleted) — — Splinter Review
Avi, can you please try this and verify that it works well for you? This basically uses *both* counters and reports a missed frame if either of them indicate a miss. This should work pretty well.
Attachment #787323 - Attachment is obsolete: true
Attachment #788789 - Flags: feedback?(avihpit)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28) > This basically uses *both* counters and reports a missed frame if either of > them indicate a miss. This should work pretty well. It does! :)
Comment on attachment 788789 [details] [diff] [review] part 3 v2 Review of attachment 788789 [details] [diff] [review]: ----------------------------------------------------------------- Oops. I was expecting the ni? flag to clear when I posted.. but it was on the patch. This patch works well.
Attachment #788789 - Flags: feedback?(avihpit) → feedback+
Comment on attachment 788789 [details] [diff] [review] part 3 v2 Review of attachment 788789 [details] [diff] [review]: ----------------------------------------------------------------- It's a bit more complicated now unfortunately.
Attachment #788789 - Flags: review?(bas)
Comment on attachment 788789 [details] [diff] [review] part 3 v2 Review of attachment 788789 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/Layers.h @@ +142,5 @@ > > public: > LayerManager() > + : mId(0) > + , mDestroyed(false) nit: Should we init the two new members here just for good measure? ::: gfx/layers/d3d10/LayerManagerD3D10.cpp @@ +888,5 @@ > } > > +void > +LayerManagerD3D10::CheckForMissedFrames() > +{ I'm sort of worried about the overhead this has - then again, once per frame isn't that often. @@ +898,5 @@ > + > + DWM_TIMING_INFO timingInfoWindow; > + timingInfoWindow.cbSize = sizeof(DWM_TIMING_INFO); > + HRESULT hrWindow = WinUtils::dwmGetCompositionTimingInfoPtr( > + (HWND)mWidget->GetNativeData(NS_NATIVE_WINDOW), &timingInfoWindow); nit: I'd prefer a little horizontal whitespacing here to make it easier to read.
Attachment #788789 - Flags: review?(bas) → review+
I created an configurable experimental vsync patch at bug 856427 comment 127, which includes parts 1, 2, 3v2 from this bug. One of its options is to call DwmFlush() on various places. On some of those, some/many missed frames are not detected. It's possible that the reduced detection rate is due to very tight timing between (optional) DwmFlush and the missed frames detection code. This is where the flushes were added (invoked when the number matches a configuration bit): > if (mTarget) { > PaintToTarget(); > } else { > + WinUtils::DwmFlushDebug(4); > + > mSwapChain->Present(0, 0); > > CheckForMissedFrames(); > + > + WinUtils::DwmFlushDebug(8); > } roc, would you still expect it to detect missed frames if either of those flushes are invoked?
(In reply to Bas Schouten (:bas.schouten) from comment #32) > nit: Should we init the two new members here just for good measure? Sure. > I'm sort of worried about the overhead this has - then again, once per frame > isn't that often. Yes I assume it will be OK. > nit: I'd prefer a little horizontal whitespacing here to make it easier to > read. Sure. (In reply to Avi Halachmi (:avih) from comment #33) > This is where the flushes were added (invoked when the number matches a > configuration bit): > > > if (mTarget) { > > PaintToTarget(); > > } else { > > + WinUtils::DwmFlushDebug(4); > > + > > mSwapChain->Present(0, 0); > > > > CheckForMissedFrames(); > > + > > + WinUtils::DwmFlushDebug(8); > > } > > roc, would you still expect it to detect missed frames if either of those > flushes are invoked? Yes. I don't understand this. Maybe moving the CheckForMissedFrames check helps?
Landing the cleanup patches: https://hg.mozilla.org/integration/mozilla-inbound/rev/3f00d003742c https://hg.mozilla.org/integration/mozilla-inbound/rev/533ac6b8ee38 Not landing the actual missed-frame detection patch for now due to comment #33.
Whiteboard: [leave open]
The color on the titlebar and tabbar are also different. I also notice a few other subtle differences.
(In reply to Gary [:streetwolf] from comment #36) > Tried this patch on inbound and it changes the titlebar under Windows 8 to > what it looked like in non-aero mode under Windows 7 > > Bad: > http://hg.mozilla.org/integration/mozilla-inbound/rev/533ac6b8ee38 > > http://imageshack.us/a/img706/9128/j6w.png This changeset is part 2 which should really be harmless. If anything, the regression is probably due to part 1 which moves the DWM stuff from nsUXThemeData to WinUtils, and maybe somehow manages to mess something up.
On a positive note the problems with Chrome corruption I was getting reported at https://bugzilla.mozilla.org/show_bug.cgi?id=837489#c125 hasn't happened since running with this patch. It still is too early to say it is 100% fixed but so far so good.
Comment on attachment 787319 [details] [diff] [review] Part 1: Move DWM APIs to WinUtils Review of attachment 787319 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/windows/nsToolkit.cpp @@ +77,5 @@ > nsToolkit::Startup(HMODULE hModule) > { > nsToolkit::mDllInstance = hModule; > nsUXThemeData::Initialize(); > + WinUtils::Initialize(); We might need to initialize WinUtils before nxUXThemeData.
(In reply to Avi Halachmi (:avih) from comment #41) > We might need to initialize WinUtils before nxUXThemeData. Confirmed that swapping the init order fixes the win8 theme issue.
(In reply to Avi Halachmi (:avih) from comment #42) > (In reply to Avi Halachmi (:avih) from comment #41) > > We might need to initialize WinUtils before nxUXThemeData. > > Confirmed that swapping the init order fixes the win8 theme issue. Hope that swapping things will still get rid of my corruption.
(In reply to Gary [:streetwolf] from comment #43) > Hope that swapping things will still get rid of my corruption. Which one would you prefer? ;) Even if it'll break, we'll still have a good pointer to start investigation.
Depends on: 905680
(In reply to Avi Halachmi (:avih) from comment #44) > (In reply to Gary [:streetwolf] from comment #43) > > Hope that swapping things will still get rid of my corruption. > > Which one would you prefer? ;) > Even if it'll break, we'll still have a good pointer to start investigation. I'm sure people would argue that the titlebar should be appropriate to the OS one is using. Do you have a Try Server I can test out the change you made to see if I still have corruption. So far I haven't had any.
I guess using the correct init order will bring the corruption back. Part 1 is a refarctoring which shouldn't have any functional effect. I also posted on bug 837489 pointing here, since it might help narrowing down the issue.
(In reply to Avi Halachmi (:avih) from comment #46) > I guess using the correct init order will bring the corruption back. Part 1 > is a refarctoring which shouldn't have any functional effect. I also posted > on bug 837489 pointing here, since it might help narrowing down the issue. But there is a lot more to part 2 than just the order of these two things. Maybe the ordering doesn't matter as it was the other things part 2 did? So far still no corruption.
Part 2 is also a technically refactor which shouldn't have any change in functionalily. I'd be very surprised if part 2 affects theme/aero or if it "fixes" this corruption.
Are you saying then that this patch should have nothing to do with fixing my corruption?
My corruption in the chrome is that I usually get boxes and bits of boxes in my tab strip, navigation bar, and bookmarks toolbar. What's interesting is that the color of these boxes is whatever the color is of the titlebar. Now that this patch gives me a Windows 7 sort of titlebar/caption bar this might explain the lack of corruption. Seems too coincidental.
(In reply to Gary [:streetwolf] from comment #49) > Are you saying then that this patch should have nothing to do with fixing my > corruption? Exactly. It _should_ have nothing to do with fixing the corruption, but due to a bug at the patch it apparently accidentally did. And now that we know what change affected this corruption, maybe the corruption could also be fixed faster. Also, you could test yourself if disabling Aero on your system "fixes" the corruption (and report back to bug 837489).
There is no Aero in Windows 8. I'm surprised that this patch was able to make a Windows 8 titlelbar/caption bar essentially a Windows 7 one in classic mode. Whatever the case I hope that someone can figure out what is fixing my corruption and provide a patch for it. The corruption bug doesn't seem to be on any developers radar.
(In reply to Gary [:streetwolf] from comment #52) > There is no Aero in Windows 8. I'm surprised that this patch was able to > make a Windows 8 titlelbar/caption bar essentially a Windows 7 one in > classic mode. > > Whatever the case I hope that someone can figure out what is fixing my > corruption and provide a patch for it. The corruption bug doesn't seem to > be on any developers radar. That's false. Aero exists in Windows 8. The Aero Glass theme was replaced with the a flatter Metro Theme.
Maybe so but there is no way to turn it off and there isn't a classic mode. All I can see are High Contrast Themes which I wouldn't use at all.
Gary, this discussion doesn't belong here. I've posted the relevant info already at bug 837489 in the hope that it might help solve that issue. If you need further assistance or got some more info, please use the correct bug. Also, DWM ("Aero") can be disabled on windows 8, but it's not trivial. Use google if you really want to try it. But then again, even if you don't manage to disable it yourself, you don't really have to as it was just a suggestion to confirm this hypothesis. The patches which were posted here _should_ have no effect on anything other than if they contain bugs, such as the uninitialized Theme stuff due to incorrect initialization order. Please use bug 837489 if you want to continue this discussion. Thanks.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #23) > Created attachment 787319 [details] [diff] [review] > Part 1: Move DWM APIs to WinUtils I think this might have broken glass rendering on Windows (bug 906089). I am in the middle of a clobber so I can't check right away, but I think it might be due to this line - http://mxr.mozilla.org/mozilla-central/source/widget/windows/WinUtils.cpp#79 MSDN states no such thing. I think roc might have been looking at the wrong msdn page. :)
(In reply to Jim Mathies [:jimm] from comment #56) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #23) > > Created attachment 787319 [details] [diff] [review] > > Part 1: Move DWM APIs to WinUtils > > I think this might have broken glass rendering on Windows (bug 906089). I am > in the middle of a clobber so I can't check right away, but I think it might > be due to this line - > > http://mxr.mozilla.org/mozilla-central/source/widget/windows/WinUtils.cpp#79 > > MSDN states no such thing. I think roc might have been looking at the wrong > msdn page. :) While I don't know if this is correct or not, I can confirm that the only MDSN pages stating that are for some oddball "compatibility with 16-bit Windows" functions. Should definitely get a second look.
I found that >31 comment on MSDN page for LoadModule. For LoadLibrary its: If the function succeeds, the return value is a handle to the module. If the function fails, the return value is NULL. To get extended error information, call GetLastError.
The LoadLibrary return value should be fixed, but the cause for the broken glass issue was comment 41 (init order), which got fixed in bug 905680.
Depends on: 906350
(In reply to khagaroth from comment #58) > I found that >31 comment on MSDN page for LoadModule. > > For LoadLibrary its: > > If the function succeeds, the return value is a handle to the module. > > If the function fails, the return value is NULL. To get extended error > information, call GetLastError. I knew that sounded familiar. While this turned out to not be the cause of the glass problem we should still get it touched up just in case. Filed bug 906350.
Assignee: roc → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: