Closed
Bug 828153
Opened 12 years ago
Closed 9 years ago
Better Firefox Health Report integration with Talos
Categories
(Firefox Health Report Graveyard :: Client: Desktop, defect, P4)
Firefox Health Report Graveyard
Client: Desktop
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: gps, Unassigned)
References
Details
Attachments
(1 file)
(deleted),
patch
|
k0scist
:
review+
taras.mozilla
:
feedback-
|
Details | Diff | Splinter Review |
Firefox Health Report doesn't currently integrate well with Talos.
For starters, it appears that the database is initialized during actual tests, not during the special first application run Talos uses to initialize the profile. This is because FHR initialization is on a timer and this timer may not fire during the initial profile set-up application run. For this first app run, Talos has a special page, getInfo.html. We should initialize FHR from this page (assuming we want FHR running during Talos tests).
During many Talos tests, FHR appears to be performing its initialization routine in the background. This could skew Talos results and add noise. While we likely want FHR enabled for some Talos tests (especially those that measure startup and shutdown time), we could possibly do without it for others (e.g. scrolling perf). Or, we could at least delay running these Talos tests until FHR is fully initialized.
I'm not an expert on Talos and the tests therein. But, I'm pretty sure the current situation is far from ideal. I defer to someone else for a recommended course of action.
Comment 1•12 years ago
|
||
how do we turn it on and off?
Reporter | ||
Comment 2•12 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #1)
> how do we turn it on and off?
The healthreport.service.enabled pref can be used to short-circuit the service initialization on startup. It's my understanding we'd need to update PerfConfigurator.py and push out a new Talos. This could take a few days.
Comment 3•12 years ago
|
||
would this be:
on:
healthreport.service.enabled=1
off:
healthreport.service.enabled=0
is this on by default? has this landed yet? We can easily land a patch and deploy a talos, the process is:
1) write/review patch
2) run on try for select platforms/tests
3) upload new talos.zip to build network
4) deploy to talos.json in the tree
This usually takes 1-2 days. If we were to do different things with the health report in getInfo.html or turn it on/off per test, it could take a couple more days.
Reporter | ||
Comment 4•12 years ago
|
||
healthreport.service.enabled=false would be the way to turn it off.
This hasn't landed in m-c yet. It is running on services-central.
I think we should preemptively land Talos support for easily disabling FHR globally via this pref.
We could follow that up with new bug(s) for getInfo.html or per-test/suite options.
Comment 5•12 years ago
|
||
Reporter | ||
Comment 6•12 years ago
|
||
Vladan: Would you mind posting your Talos + FHR findings in this bug?
Flags: needinfo?(vdjeric)
Comment 7•12 years ago
|
||
Comment on attachment 699972 [details] [diff] [review]
disable health reporting in talos (1.0)
I wish we kept these preferences in order. Also, it might be nice to have a comment as to why we set this preference.
Attachment #699972 -
Flags: review?(jhammel) → review+
Comment 8•12 years ago
|
||
Comment on attachment 699972 [details] [diff] [review]
disable health reporting in talos (1.0)
Testing without FHR is dangerous. It's a path that ever user will be hitting. I'm not comfortable disabling that until unless there is an equivalent test.
Attachment #699972 -
Flags: feedback-
Comment 9•12 years ago
|
||
Good point Taras- I will hold off on this until we have a better story for testing with this preference on.
Comment 10•12 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #6)
> Vladan: Would you mind posting your Talos + FHR findings in this bug?
I re-ran FHR Talos numbers today:
FHR disabled: https://tbpl.mozilla.org/?tree=Try&rev=a20fa3fbd9ac
FHR enabled: https://tbpl.mozilla.org/?tree=Try&rev=a92dcbbd1400
The top observation is that shutdown times have regressed by ~200% on Win7, ~100% on WinXP + Linux, and ~20% on Mac OS X.
This might indeed be because the FHR DB is getting created at shutdown during Talos tests. It needs to be investigated & fixed ASAP because it's also making the other Talos measures meaningless, i.e. no FHR work gets done until shutdown.
Also, we probably shouldn't be creating the DB at shutdown at all and instead we should wait for an idle time during a future session to do it.
Flags: needinfo?(vdjeric)
Reporter | ||
Comment 11•12 years ago
|
||
For more comparison purposes:
Console logging: https://tbpl.mozilla.org/?tree=Try&rev=0a8d26e4aea8
60s Startup Delay (instead of 10s): https://tbpl.mozilla.org/?tree=Try&rev=b4d9af5cb6f3
Reporter | ||
Comment 12•12 years ago
|
||
OK. The console logging tells me a lot.
http://gps.pastebin.mozilla.org/2064339 is the relevant FHR shutdown logs from a Windows run. We see it takes 2.229s wall time for FHR to shut down. This is similar to a Linux64 (http://gps.pastebin.mozilla.org/2064351) which takes 1.986s. Compare this with an OS X run (http://gps.pastebin.mozilla.org/2064343) where it took 0.113s!
FWIW, the OS X times are very similar to my own MBP, where I have an SSD. I wonder if the OS X Talos machines are running SSDs?
Anyway, if we look at the Windows logs, every SQLite query that modifies the DB takes 125-175ms to complete. Because of a sub-optimal implementation of session recording, we are performing 13 such queries during shutdown. 13 * 150ms = 1.950s.
This reinforces the point that DB writes on shutdown are bad. We need to refactor the session recording code ASAP (bug 827157) and eliminate as many of (hopefully all) these writes on shutdown.
As a quick fix, I /think/ if we move the writes to a transaction, we'll only incur 1 disk flush at transaction commit time. These operations are all related, so they should be incurring within a transaction anyway.
As an aside, 100+ ms to perform a DB write seems like a lot, even for magnetic (I'm assuming) disks. I guess there is a lot more disk head movement than I thought there would be! Who knows.
Comment 13•12 years ago
|
||
(In reply to comment #12)
> FWIW, the OS X times are very similar to my own MBP, where I have an SSD. I
> wonder if the OS X Talos machines are running SSDs?
They aren't.
> This reinforces the point that DB writes on shutdown are bad. We need to
> refactor the session recording code ASAP (bug 827157) and eliminate as many of
> (hopefully all) these writes on shutdown.
Is that something that we can do within the next couple of days? We usually do not let regressions of this magnitude stay in the tree...
Reporter | ||
Comment 14•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=67dffc73b9eb is a trivial change to perform most of the shutdown writes in a transaction. If my theory about I/O and transactions is correct, this should recover a majority of the regression.
A more in depth solution depends on what Product and Metrics have to say in bug 830495 and its effect on bug 827157.
Assignee: jmaher → gps
Reporter | ||
Comment 15•12 years ago
|
||
I just posted a patch in bug 827157 that removes all shutdown-specific DB writes. I expect this will recover all lost shutdown performance losses.
Reporter | ||
Comment 16•12 years ago
|
||
My theory about transactions appears to be correct! tp5n_shutdown_paint times for Windows:
Before FHR: 1000-1250
Current FHR: 3200-4000
Transaction: 1626.0
Transactions FTW!
Reporter | ||
Comment 17•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=bf784d3cf643 combined the better session recording in bug 827157 with the transactions off main thread patch in bug 830209. I'm optimistic the latter will have some benefit to Talos since there is no longer main thread blocking ~10s after startup.
Comment 18•12 years ago
|
||
the question for this bug is "do we want to disable FHR for talos?" From what Taras says we shouldn't. I believe his reasoning is valid. Are we looking for <10% perf differences before going live? Do we want to have this on for some tests and not others?
Reporter | ||
Comment 19•12 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #18)
> the question for this bug is "do we want to disable FHR for talos?" From
> what Taras says we shouldn't. I believe his reasoning is valid. Are we
> looking for <10% perf differences before going live? Do we want to have
> this on for some tests and not others?
There are multiple questions for this bug that were posed in the original comment. We still have a number of paths we can proceed down. I don't believe we've made any concrete decisions yet.
Reporter | ||
Comment 20•12 years ago
|
||
One of the reports we received was a regression for Kraken.
Looking at the logs at https://tbpl.mozilla.org/php/getParsedLog.php?id=18845314&tree=Try&full=1, it appears that the FHR database is being created while Kraken, Dromaeo, etc run during the dromaeojs Talos test. This is certainly throwing off numbers.
From what limited I know about these benchmarks, I /think/ we should disable FHR completely or at least ensure the database schema is created *before* the benchmarks start.
Comment 21•12 years ago
|
||
(In reply to comment #20)
> From what limited I know about these benchmarks, I /think/ we should disable
> FHR completely or at least ensure the database schema is created *before* the
> benchmarks start.
Do you mean disable FHR on Talos, or in Firefox?
Comment 22•12 years ago
|
||
I assume disabling it on Talos. If we do that we should have a test which show the effect of FHR. Is there any way to wait until the database schema is created?
Comment 23•12 years ago
|
||
Marking as P4 because we'll be addressing this fully after FHR on Android. We don't want to land a partial change in this area, because we don't want to hide the impact of FHR on Talos.
Priority: -- → P4
Reporter | ||
Updated•12 years ago
|
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Reporter | ||
Comment 24•11 years ago
|
||
I'm unassigning bugs I haven't worked on in a while.
Assignee: gps → nobody
Comment 25•9 years ago
|
||
FHR is going away per bug 1209088, wontfix.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
|
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•