Open Bug 1299117 Opened 8 years ago Updated 2 years ago

Implement Time to First Meaningful Paint (TMP) for Telemetry/Profiler

Categories

(Core :: Layout, defect, P3)

All
Unspecified
defect

Tracking

()

ASSIGNED
Performance Impact none

People

(Reporter: Dominik, Assigned: mstange)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

First Meaningful Visual is the time when a page's primary content appeared on the screen. A current alternative definition is also: Time from navigation to the most significant visual change to the screen that average users can observe. The PRD for the whole Phase 1 of the Progressive Web Metrics program is here: https://docs.google.com/document/d/1RLqGVNL43B4oxovgR2DqmElXcBvJHJekT1bcfYooEbk/edit#
Blocks: 1298380
Summary: Implement Time to First Meaningful Visual (TTFMVisual) → Implement Time to First Meaningful Paint (TMP)
Clarified the minimal scope for this bug, which is only telemetry and profiler. Developer facing numbers depend on validation and spec stabilizing.
Keywords: dev-doc-needed
Summary: Implement Time to First Meaningful Paint (TMP) → Implement Time to First Meaningful Paint (TMP) for Telemetry/Profiler
For reference, Chrome's UMA metric tracking bug: https://bugs.chromium.org/p/chromium/issues/detail?id=632081 Also interesting, an issue with FB's chat using long-polling that causes the network to never fully idle: https://bugs.chromium.org/p/chromium/issues/detail?id=658276
Given we have assigned most bugs now and that Dan and you thought it mapped to Gecko's internal, can we get this bug on track?
Flags: needinfo?(bugs)
Our current design for these "Time to First X" metrics is to compute the sum of the elapsed times for a list of finer-grained events that occur prior to the X event. We compare that sum to the actual stopTime - startTime captured for the X event, and account for differences introduced by parallelism. This allows us to capture comparable performance data in all the cases we care about: non-e10s, e10s/APZ, and Quantum. Markus owns the design and implementation for us here.
Assignee: nobody → mstange
Flags: needinfo?(bugs)
This is one of the key page load metrics for Quantum to optimize. Having this for Firefox 53 is critical so we can start benchmarking.
Kanru, one of the bugs I mentioned, this is important and depends on some layout understanding. :dholbert and :mstange offered mentoring.
Flags: needinfo?(kchen)
Flags: needinfo?(wpan)
Flags: needinfo?(tlee)
Flags: needinfo?(kchen)
I'm working on a patch.
Status: NEW → ASSIGNED
Flags: needinfo?(wpan)
Flags: needinfo?(tlee)
Attachment #8829499 - Flags: review?(bkelly)
mstange: for validating the measurement, how could we get it for a page load locally when not in the profile?
Flags: needinfo?(mstange)
I intend to add a patch that adds the profiler markers, in this bug. I just wanted to upload what I had. I'm thinking one "fmp candidate" marker per candidate, and then a "first meaningful paint" marker which contains a list of the candidates that were considered for that marker.
Flags: needinfo?(mstange)
A random idea. The changes made by Markus are based on the number of frames on the frame tree. I am wondering if the number of display items is a better parameter than frames? During loading a document, it is very often that JS code create a bunch of hidden content on the DOM tree. display items are what is going to paint. They are more close to user's experience.
I like that idea if creation of display items reflects progress for above the fold layout. I am not familiar enough to tell how stable the number of display items is and how close it correlates to page load progress.
The number of display items is dependent on the size we choose for the displayport. Counting display items could be a better heuristic than counting frames. We definitely need to play around with different heuristics and look at profiles to determine which one works best. Harald: How do you feel about leaving out the telemetry part of the patch until we've had a chance to test the current patches for a while? Also, is the data representation I'm using for the profiler marker detailed enough for your purposes? The current patch just adds one marker, at network idle, with the timestamp of the FMP, and an additional data structure containing the networkIdle timestamp and an array containing the timestamp and layout significance score of each candidate paint. Thinker: Would you like to try out different heuristics once this patch has landed and we've had a chance to try it out? I'd be happy to hand over the remaining work.
Attachment #8829499 - Flags: review?(bkelly)
Attachment #8837874 - Flags: review?(hkirschner)
Comment on attachment 8837874 [details] Bug 1299117 - Add a profiler marker for first meaningful paint that carries a list of candidates. https://reviewboard.mozilla.org/r/112870/#review114686 LGTM, will be exciting to have as visual milestones in the frontend
Attachment #8837874 - Flags: review?(hkirschner) → review+
Attachment #8829499 - Flags: review?(bkelly) → review?(bugs)
Comment on attachment 8829499 [details] Bug 1299117 - Measure time to first meaningful paint and report it to Telemetry. https://reviewboard.mozilla.org/r/106582/#review115092 We do need to be able to handle background loads to have any useful metrics, and based on the comments the patch isn't doing that. Necko folks might have some ideas for this. ::: layout/painting/FirstMeaningfulPaintDetector.h:33 (Diff revision 4) > + * start and "network stable". > + * The layout significance of a paint is a number that's based on the scroll > + * height of the document's root scrollable frame during that paint and on the > + * number of layout objects that have been created for this document since the > + * last paint. > + * The network stable time is defined as the end of the first 500ms interval What happens if there never is such interval? I assume that may happen quite often with certain web pages. Please add a comment. ::: layout/painting/FirstMeaningfulPaintDetector.h:79 (Diff revision 4) > + void NetworkIdleTimerFired(); > + > + /** > + * Returns whether the load group of mPresContext's document has any pending > + * (non-background) requests. > + */ This looks wrong. Why non-background? XHR is using by defaul background request, and XHR often brings in new content to the page. Yet some XHRs are in non-background to get progress events. Also, what happens with websocket? And EventSource also uses LOAD_BACKGROUND. ::: layout/painting/FirstMeaningfulPaintDetector.h:82 (Diff revision 4) > + * Returns whether the load group of mPresContext's document has any pending > + * (non-background) requests. > + */ > + bool HasPendingRequest(); > + > + nsPresContext* mPresContext; This needs a comment why this is always valid pointer. ::: layout/painting/FirstMeaningfulPaintDetector.cpp:52 (Diff revision 4) > + mDocShell = static_cast<nsDocShell*>(mPresContext->GetDocShell()); > + nsCOMPtr<nsIWebProgress> webProgress = do_GetInterface(static_cast<nsIDocShell*>(mDocShell)); > + if (webProgress) { > + mListener = new FMPDListener(*this); > + nsresult rv = webProgress->AddProgressListener(mListener, > + nsIWebProgress::NOTIFY_STATE_REQUEST | Why only _REQUEST?
Attachment #8829499 - Flags: review?(bugs) → review-
Markus, are you still working on this Time To First Meaningful Paint telemetry probe?
Flags: needinfo?(mstange)
Whiteboard: [qf:p3]
P1, reported as key metric.
Whiteboard: [qf:p3] → [qf:p1]
Attachment #8837874 - Attachment is obsolete: true
(In reply to Chris Peterson [:cpeterson] from comment #21) > Markus, are you still working on this Time To First Meaningful Paint > telemetry probe? Sorry for the delay here. I've picked it back up.
Flags: needinfo?(mstange)
Comment on attachment 8829499 [details] Bug 1299117 - Measure time to first meaningful paint and report it to Telemetry. https://reviewboard.mozilla.org/r/106582/#review115092 > This looks wrong. Why non-background? XHR is using by defaul background request, and XHR often brings in new content to the page. > Yet some XHRs are in non-background to get progress events. > > Also, what happens with websocket? > > And EventSource also uses LOAD_BACKGROUND. I've changed it to return true if there is any request, not just any non-background request. I haven't done anything about WebSocket. My guess is that it's not relevant to the majority of page loads that we care about, but we can still add detection for it if we deem it useful. (Ideally that would be done by someone other than me, though.) > Why only _REQUEST? Because we react to these notifications by checking whether there is a request in the document's load group. So we only need to be notified if there is a chance that the answer to that question has changed.
We are looking into this measurement to be reported in webpagetest. As one possible way of exposing it to the WPT addon, could we add a pref that exposes TMP to the Navigation Timing API, from which the addon can read it? ni? Markus for input on how to expose this ni? bsmedberg who looks into WPT and benchmarking TMP
Flags: needinfo?(mstange)
Flags: needinfo?(benjamin)
(In reply to :Harald Kirschner :digitarald from comment #27) > We are looking into this measurement to be reported in webpagetest. As one > possible way of exposing it to the WPT addon, could we add a pref that > exposes TMP to the Navigation Timing API, from which the addon can read it? Sure. We can add it to PerformanceTiming. We don't seem to have PerformanceNavigationTiming, which https://github.com/tdresser/time-to-first-meaningful-paint suggests to add this to.
Flags: needinfo?(mstange)
(In reply to :Harald Kirschner :digitarald from comment #27) > We are looking into this measurement to be reported in webpagetest. As one > possible way of exposing it to the WPT addon, could we add a pref that > exposes TMP to the Navigation Timing API, from which the addon can read it? I've added a patch which implements performance.timing.firstMeaningfulPaint, behind the pref dom.performance.first-meaningful-paint.enabled. Setting that pref to true exposes this information to web pages, not just to the WPT add-on. Since it's off by default, I think that's ok.
Comment on attachment 8829499 [details] Bug 1299117 - Measure time to first meaningful paint and report it to Telemetry. https://reviewboard.mozilla.org/r/106582/#review129050 Could you run this in a debug build to check whether the assertion in OnStatusChange fails. (since I think it will). And please test also that XHRs end up blocking reporting, since that is what we're trying to do here I think. (XHRs are by default using LOAD_BACKGROUND) ::: layout/painting/FirstMeaningfulPaintDetector.h:50 (Diff revision 7) > + * The constructor will set a progress listener on the docshell that > + * contains this prescontext, so that network stable time can be detected. > + * We store a pointer to the prescontext and will notify the prescontext's > + * document's nsDOMNavigationTiming once we know the first meaningful > + * paint's timestamp. > + * The pres context is expected to outlast this object. In other words, this Nit, 'prescontext' I think. That is how you've written it elsewhere. ::: layout/painting/FirstMeaningfulPaintDetector.h:51 (Diff revision 7) > + * contains this prescontext, so that network stable time can be detected. > + * We store a pointer to the prescontext and will notify the prescontext's > + * document's nsDOMNavigationTiming once we know the first meaningful > + * paint's timestamp. > + * The pres context is expected to outlast this object. In other words, this > + * object has to be destroyed before the pres context is destroyed. ditto ::: layout/painting/FirstMeaningfulPaintDetector.cpp:55 (Diff revision 7) > + nsCOMPtr<nsIWebProgress> webProgress = do_GetInterface(static_cast<nsIDocShell*>(mDocShell)); > + if (webProgress) { > + mListener = new FMPDListener(*this); > + nsresult rv = webProgress->AddProgressListener(mListener, > + nsIWebProgress::NOTIFY_STATE_REQUEST | > + nsIWebProgress::NOTIFY_STATUS); Why nsIWebProgress::NOTIFY_STATUS? Especially given OnStatusChange has MOZ_ASSERT(false, "Unexpected notification.");
Attachment #8829499 - Flags: review?(bugs)
Flags: needinfo?(benjamin)
I confirmed locally that pending XHRs cause HasPendingRequest() to return false. A green-ish try push is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eff2f96b6a4849f83085474f8e9245a2cd88c9a8
Comment on attachment 8854214 [details] Bug 1299117 - Expose performance.timing.firstMeaningfulPaint behind a pref. https://reviewboard.mozilla.org/r/126164/#review132510 What is this about? Which spec defines the .webidl part?
Comment on attachment 8854214 [details] Bug 1299117 - Expose performance.timing.firstMeaningfulPaint behind a pref. https://reviewboard.mozilla.org/r/126164/#review132516 ::: modules/libpref/init/all.js:182 (Diff revision 2) > pref("dom.performance.enable_notify_performance_timing", false); > > // Enable Permission API's .revoke() method > pref("dom.permissions.revoke.enable", false); > > +// Whether to expose firstMeaningfulPaint on PerformanceTiming I would probably leave this out. bool prefs should default to false.
Attachment #8854214 - Flags: review?(bugs) → review+
Comment on attachment 8857725 [details] Bug 1299117 - Add a profiler marker for first meaningful paint that carries a list of candidates. https://reviewboard.mozilla.org/r/129666/#review132520 ::: layout/painting/FirstMeaningfulPaintDetector.cpp:142 (Diff revision 1) > if (timing) { > timing->NotifyFirstMeaningfulPaintForRootContentDocument(mCandidateTimeStamp); > } > + PROFILER_MARKER_PAYLOAD( > + "FirstMeaningfulPaint", > + new FirstMeaningfulPaintMarkerPayload(mCandidateTimeStamp, Hrm, allocating/freeing memory even if profiler isn't running :/ Is that a generic issue? Could we first check if profiler is running and only if it is, se PROFILER_MARKER_PAYLOAD
Attachment #8857725 - Flags: review?(bugs) → review+
Comment on attachment 8829499 [details] Bug 1299117 - Measure time to first meaningful paint and report it to Telemetry. https://reviewboard.mozilla.org/r/106582/#review132534 I think I'd like to see some tweak to the timer usage still. Not sure when to cancel it though. ::: layout/base/nsPresContext.cpp:1009 (Diff revision 8) > doc->AddCharSetObserver(this); > UpdateCharSet(doc->GetDocumentCharacterSet()); > } > + > + if (IsRootContentDocument()) { > + mFirstMeaningfulPaintDetector = MakeUnique<FirstMeaningfulPaintDetector>(this); We want to create the object only if telemetry is enabled, right? Of if the pref to enable fmp in .webidl is set. ::: layout/painting/FirstMeaningfulPaintDetector.h:36 (Diff revision 8) > + * number of layout objects that have been created for this document since the > + * last paint. > + * The network stable time is defined as the end of the first 500ms interval > + * during a document's life cycle during which there were no pending network > + * requests for this document. If there is no such interval, then no first > + * meaningful paint will be reported. Do we end up creating the timer still all the time? Since we shouldn't ::: layout/painting/FirstMeaningfulPaintDetector.cpp:114 (Diff revision 8) > +{ > + if (mDone || HasPendingRequest()) { > + return; > + } > + > + if (mNetworkStableTimer) { So at some point we really should just give up with the timer. I'm worried about having more and more all-the-time running timers consuming CPU and causing wakeups
Attachment #8829499 - Flags: review?(bugs) → review-
Comment on attachment 8857725 [details] Bug 1299117 - Add a profiler marker for first meaningful paint that carries a list of candidates. https://reviewboard.mozilla.org/r/129666/#review132520 > Hrm, allocating/freeing memory even if profiler isn't running :/ Is that a generic issue? > > Could we first check if profiler is running and only if it is, se PROFILER_MARKER_PAYLOAD I went back and forth on this when I wrote it. I had it wrapped in an "if (profiler_is_active())" first, but then I decided that one small allocation per pageload shouldn't be much of an issue and removed the if. But having the check doesn't really hurt so I'll re-add it.
Comment on attachment 8829499 [details] Bug 1299117 - Measure time to first meaningful paint and report it to Telemetry. https://reviewboard.mozilla.org/r/106582/#review132534 > We want to create the object only if telemetry is enabled, right? Of if the pref to enable fmp in .webidl is set. Good point. I'll create a static function FirstMeaningfulPaintDetector::CreateIfNeeded() that does those checks. > Do we end up creating the timer still all the time? Since we shouldn't We do not schedule the timer as long as there is still at least one pending request. We only schedule the timer once the number of pending requests has reached zero. The sequence is: [Wait for requests to be zero] -> [schedule timer] -> [wait for 500ms] -> [check if number is still zero, if yes: finish] So the timer is only rescheduled multiple times if the page keeps jumping between having no pending network requests and having at least one pending network request. And if the page really does that, then I expect those frequent new network requests to cause a lot more wakeups and CPU usage than a 500ms timer would.
We aren't going to be using this metric for QF as far as I know, so I don't think this needs to be P1 any more. Putting it back into the triage pool.
Whiteboard: [qf:p1] → [qf]
Whiteboard: [qf] → [qf-]
Priority: -- → P3

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Performance Impact: --- → -
Whiteboard: [qf-]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: