Use counters are not recorded properly when a tab is closed
Categories
(Core :: DOM: Core & HTML, defect, P2)
Tracking
()
People
(Reporter: bzbarsky, Assigned: edgar)
References
(Blocks 1 open bug)
Details
Attachments
(6 files, 1 obsolete file)
STEPS TO REPRODUCE:
- Start a nightly from today or later with a clean profile.
- Open a tab with http://web.mit.edu/bzbarsky/test-use-counter.html in it. This just exercises a codepath I know has a usecounter in today's build.
- Open a second tab with
about:memory
in it. - Close the tab opened in step 2.
- Minimize memory usage in
about:memory
. - Open
about:telemetry
. - Type
USE_COUNTER
in the search field at top right and hit enter.
EXPECTED RESULTS: There is a histogram for the USE_COUNTER2_HTMLDOCUMENTNAMEDGETTERHIT_PAGE
counter (and one for USE_COUNTER2_HTMLDOCUMENTNAMEDGETTERHIT_DOCUMENT
).
ACTUAL RESULTS: There is no such histogram.
ADDITIONAL INFORMATION:
- If I load a different page from http://web.mit.edu between steps 2 and 4, that does not change things.
- If I load a different page from http://web.mit.edu between steps 2 and 4, and switch the order of steps 4 and 5 (so memory minimization happens before tab closing and gets to destroy the test document when it happens), then I do see the histograms.
We collect the use counters in ~Document
. It seems plausible that when we close the tab we just kill the process without destroying its documents and therefore never collect use counters for anything that is loaded or in bfcache at that point?
This seems like it would bias our data, esp. for sites that claim you should close the tab after using them (e.g. bank sites).
Reporter | ||
Comment 1•5 years ago
|
||
[Tracking Requested - why for this release]: We are not getting correct data from our use counter telemetry.
Comment 2•5 years ago
|
||
I'd argue that having the code in ~Document
is wrong since destructors shouldn't have side effects; they should just handle memory management (and may be deferred or skipped for reasons of memory management). Instead, it should live in code that runs at a known time as we navigate away from the document.
Not sure if you're taking this on bz, but if not, Hsin-Yi, can you help find an owner?
Reporter | ||
Comment 4•5 years ago
|
||
I don't plan to fix this; I just ran into it when trying to understand the data from a use counter I had added that wasn't making sense to me.
Comment 5•5 years ago
|
||
(In reply to Liz Henry (:lizzard PTO till Aug 10 n-i to :pascald instead) from comment #3)
Not sure if you're taking this on bz, but if not, Hsin-Yi, can you help find an owner?
Will work on it.
Comment 6•5 years ago
|
||
Thanks Edgar!
Assignee | ||
Comment 7•5 years ago
|
||
We do destruct Document
and try to collect the data by calling Telemetry::Accumulate when we close the tab, it seems that Telemetry doesn't successfully report the data collected during XPCOM_shutdown procedure to parent process?
Assignee | ||
Comment 8•5 years ago
|
||
Telemetry schedules content data reporting in a IPC timer, but the content process is already in shutdown procedure, so the timeout handler doesn't have chance to be executed.
Reporter | ||
Comment 9•5 years ago
|
||
We do destruct Document and try to collect the data by calling Telemetry::Accumulate when we close the tab
Fwiw, I wasn't seeing us do that (via printfs added to ~Document and the like). It's possible that we have multiple issues going on here...
Assignee | ||
Comment 10•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #9)
We do destruct Document and try to collect the data by calling Telemetry::Accumulate when we close the tab
Fwiw, I wasn't seeing us do that (via printfs added to ~Document and the like). It's possible that we have multiple issues going on here...
The callstack for my case is something like,
#0 0x00007fdd35a2c2f0 in mozilla::dom::Document::ReportUseCounters(mozilla::dom::Document::UseCounterReportKind) (this=0x7fdd25ad9000, aKind=mozilla::dom::Document::UseCounterReportKind::eDefault) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:14465
#1 0x00007fdd35a2b061 in mozilla::dom::Document::~Document() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:1767
#2 0x00007fdd3786ec45 in nsHTMLDocument::~nsHTMLDocument() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/html/nsHTMLDocument.cpp:158
#3 0x00007fdd3786ec69 in nsHTMLDocument::~nsHTMLDocument() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/html/nsHTMLDocument.cpp:158
#4 0x00007fdd35a2d12e in mozilla::dom::Document::DeleteCycleCollectable() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:1910
#5 0x00007fdd35aca715 in mozilla::dom::Document::cycleCollection::DeleteCycleCollectable(void*) (this=0x7fdd3fc907d0 <mozilla::dom::Document::_cycleCollectorGlobal>, p=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.h:496
#6 0x00007fdd333aca7c in SnowWhiteKiller::MaybeKillObject(SnowWhiteKiller::SnowWhiteObject&) (this=0x7ffcacc85550, aObject=...) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2429
#7 0x00007fdd333a3408 in SnowWhiteKiller::~SnowWhiteKiller() (this=0x7ffcacc85550) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2416
#8 0x00007fdd33393633 in nsCycleCollector::FreeSnowWhite(bool) (this=0x7fdd49e88ee0, aUntilNoSWInPurpleBuffer=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2609
#9 0x00007fdd33396c73 in nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (this=0x7fdd49e88ee0, aCCType=ShutdownCC, aManualListener=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3584
#10 0x00007fdd333965d7 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (this=0x7fdd49e88ee0, aCCType=ShutdownCC, aBudget=..., aManualListener=0x0, aPreferShorterSlices=false) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3413
#11 0x00007fdd33396371 in nsCycleCollector::ShutdownCollect() (this=0x7fdd49e88ee0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3357
#12 0x00007fdd333975ce in nsCycleCollector::Shutdown(bool) (this=0x7fdd49e88ee0, aDoCollect=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3645
#13 0x00007fdd33399204 in nsCycleCollector_shutdown(bool) (aDoCollect=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3999
#14 0x00007fdd3358a184 in mozilla::ShutdownXPCOM(nsIServiceManager*) (aServMgr=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/build/XPCOMInit.cpp:706
#15 0x00007fdd33589d25 in NS_ShutdownXPCOM(nsIServiceManager*) (aServMgr=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/build/XPCOMInit.cpp:564
#16 0x00007fdd3b69ae2c in XRE_TermEmbedding() () at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:223
#17 0x00007fdd34132c92 in mozilla::ipc::ScopedXREEmbed::Stop() (this=0x7fdd49eb2978) at /home/edgar/Workspace/mercurial/mozilla-central/ipc/glue/ScopedXREEmbed.cpp:90
#18 0x00007fdd384e566c in mozilla::dom::ContentProcess::CleanUp() (this=0x7fdd49eb2000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/ipc/ContentProcess.cpp:211
#19 0x00007fdd3b69baf3 in XRE_InitChildProcess(int, char**, XREChildData const*) (aArgc=13, aArgv=0x7ffcacc86048, aChildData=0x7ffcacc85ec0) at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:758
#20 0x00007fdd3b6a6e27 in mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) (this=0x7fdd49e026c0, argc=15, argv=0x7ffcacc86048, aChildData=0x7ffcacc85ec0) at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/Bootstrap.cpp:67
#21 0x00005575466b418a in content_process_main(mozilla::Bootstrap*, int, char**) (bootstrap=0x7fdd49e026c0, argc=15, argv=0x7ffcacc86048) at /home/edgar/Workspace/mercurial/mozilla-central/browser/app/../../ipc/contentproc/plugin-container.cpp:56
#22 0x00005575466b42e9 in main(int, char**, char**) (argc=16, argv=0x7ffcacc86048, envp=0x7ffcacc860d0) at /home/edgar/Workspace/mercurial/mozilla-central/browser/app/nsBrowserApp.cpp:267
Assignee | ||
Comment 11•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #0)
- Open a tab with http://web.mit.edu/bzbarsky/test-use-counter.html in it. This just exercises a codepath I know has a usecounter in today's
BTW, the link in comment #0 seems broken, I got a "Page Not Found", so I try to reproduce the bug with following script.
<!DOCTYPE html>
<form name="x"></form>
<button id="btn">click me</button>
<pre id="result"></pre>
<script>
function w(msg) {
result.textContent += msg + '\n';
}
btn.addEventListener("click", function(e) {
w(document.x);
});
</script>
Assignee | ||
Comment 12•5 years ago
|
||
Reporter | ||
Comment 13•5 years ago
|
||
BTW, the link in comment #0 seems broken
Er, sorry, should be http://web.mit.edu/bzbarsky/www/test-use-counter.html
The callstack for my case is something like,
That's in a debug build, right? It's coming through ContentProcess::CleanUp
which is only called in debug build per its documentation. In opt builds, at least according to the documentation and code inspection of ContentChild::ActorDestroy
, we land at ProcessChild::QuickExit
and kill the process without doing the CC pass that you're seeing collect the document.
So it does sound like we have two possible issues here:
- In an opt build the "report the use counters" code is not even reached.
- Even if it were reached (as in debug builds), it might not report them successfully due to being unable to send the IPC message.
Right?
Assignee | ||
Comment 14•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #13)
That's in a debug build, right?
Yes.
- In an opt build the "report the use counters" code is not even reached.
- Even if it were reached (as in debug builds), it might not report them successfully due to being unable to send the IPC message.
Right?
Exactly. Looks like we need to find a proper time to report data for Document, maybe in Document::OnPageHide and it requires change some logic to ensure that we don't report data duplicately.
Assignee | ||
Comment 15•5 years ago
|
||
Assignee | ||
Comment 16•5 years ago
|
||
Assignee | ||
Comment 17•5 years ago
|
||
Recording use counters in ~Document doesn't work as expected for some cases, for instance,
user closes tab right after using them:
- In opt build, Document destructor isn't even reached given that we just kill the process
without doing cleanup; - In debug build, we reach the recording code, but it doesn't report use counters successfully
due to it is unable to send the IPC message.
This patch moves the recording code at the time that user navigates away from current page,
and since user might also navigate back same document from bfcache, in order to not report
duplicated data, we introduce new flags to keep track of whether the use counters need to
be recorded.
Assignee | ||
Comment 18•5 years ago
|
||
Reporter | ||
Comment 19•5 years ago
|
||
I'm trying to understand the proposed bfcache behavior. It would be pretty helpful if the commit message described the behavior being aimed for, not just the means used to accomplish it...
Assignee | ||
Comment 20•5 years ago
|
||
(In reply to Edgar Chen [:edgar] from comment #17)
Created attachment 9084630 [details]
Bug 1568950 - Part 1: Record use counters when user navigates away from current page;
This also affects the Telemetry::*_DOCUMENTS_DESTROYED
, it is still recorded only once per document or page, but it is not recorded on document destroy any more, maybe we should rename they. However, I am not sure if renaming it will break the usage of use counters data. :froydnj, would you mind shed some light here? Thank you.
Assignee | ||
Comment 21•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #19)
I'm trying to understand the proposed bfcache behavior. It would be pretty helpful if the commit message described the behavior being aimed for, not just the means used to accomplish it...
So the proposed behavior is something that, if a user
- navigate to PAGE_X, and PAGE_X use FEATURE_A.
- navigate to PAGE_Y
- navigate back to PAGE_X from history.
- PAGE_X now use another FEATURE_B.
In this case, we should record the usage of FEATURE_B, but don't want to record FEATURE_A twice, does this make sense?
(I will revise commit message a bit, thanks)
Reporter | ||
Comment 22•5 years ago
|
||
So the proposed behavior is that every time we get pagehide for a page we report use counters that are newly set from the time the document was created or the last time we reported, whichever happened last? That makes sense, I think, as long as the "number of pageloads" and "number of document creations" counters that I think telemetry uses as the denominator is not affected by this change.
Comment 23•5 years ago
|
||
(In reply to Edgar Chen [:edgar] from comment #20)
(In reply to Edgar Chen [:edgar] from comment #17)
Created attachment 9084630 [details]
Bug 1568950 - Part 1: Record use counters when user navigates away from current page;This also affects the
Telemetry::*_DOCUMENTS_DESTROYED
, it is still recorded only once per document or page, but it is not recorded on document destroy any more, maybe we should rename they. However, I am not sure if renaming it will break the usage of use counters data. :froydnj, would you mind shed some light here? Thank you.
Renaming it to something else means you just have discontinuous data: you have one histogram before the change and one, different, histogram after the change.
I think as long as its still fundamentally recorded once per document, you should be fine keeping the name.
Assignee | ||
Comment 24•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #23)
Renaming it to something else means you just have discontinuous data: you have one histogram before the change and one, different, histogram after the change.
I think as long as its still fundamentally recorded once per document, you should be fine keeping the name.
That is what I am worried about. I will keep the name and revise the comment around it a bit. Thanks.
Assignee | ||
Comment 25•5 years ago
|
||
(In reply to Edgar Chen [:edgar] from comment #14)
Looks like we need to find a proper time to report data for Document, maybe in Document::OnPageHide
I just realized this still doesn't work properly for the svg document which is from <img> element, in such setup the pagehide of svg document is called from https://searchfox.org/mozilla-central/rev/3a61fb322f74a0396878468e50e4f4e97e369825/image/SVGDocumentWrapper.cpp#54 which still have same issue as comment #13. We need to have a specifial handle for this case.
Comment 26•5 years ago
|
||
Will this also fix bug 1413266?
Updated•5 years ago
|
Assignee | ||
Comment 27•5 years ago
|
||
Assignee | ||
Comment 28•5 years ago
|
||
Remove nsIDOMWindowUtils.forceUseCounterFlush calls give that we won't manually
report use counters in real case.
Assignee | ||
Comment 29•5 years ago
|
||
Assignee | ||
Comment 30•5 years ago
|
||
Assignee | ||
Comment 31•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 32•5 years ago
|
||
Comment 33•5 years ago
|
||
Assignee | ||
Comment 34•5 years ago
|
||
(In reply to Tim Smith π¨βπ¬ [:tdsmith] from comment #26)
Will this also fix bug 1413266?
I am not sure if this bug could also fix it, the scenario of bug 1413266 is different from this bug.
But now we report use counters earlier, I would expect bug 1413266 could benefit from this a bit.
Comment 35•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/81f72abb79ff
https://hg.mozilla.org/mozilla-central/rev/57d7a634ef8f
https://hg.mozilla.org/mozilla-central/rev/0521dcbe7c20
https://hg.mozilla.org/mozilla-central/rev/1c1c4ca0bc14
https://hg.mozilla.org/mozilla-central/rev/2e365eaea75b
Updated•5 years ago
|
Comment 36•5 years ago
|
||
I have reproduced this issue using Firefox 70.0a1 (2019.07.25) on Win 10 x64.
I can confirm this issue is fixed, I verified using Firefox 70.0b8 Win 10 x64, Ubuntu 18.04 x64 and macOS 10.13.6.
Updated•5 years ago
|
Description
•