Closed
Bug 834003
Opened 12 years ago
Closed 12 years ago
Compare telemetry histograms on a Talos run on a PGO versus a non PGO build
Categories
(Core :: General, defect)
RESOLVED
FIXED
People
(Reporter: ehsan.akhgari, Assigned: vladan)
Attachments
(3 files, 3 obsolete files)
See <https://wiki.mozilla.org/Buildbot/Talos> for how to run Talos locally. For Tp5, you can get the pageset zip file by pinging :jmaher or somebody on #releng.
I think the interesting tests will be Tp5 and Ts. The JS tests are explicitly uninteresting, since we have no reason to consider disabling PGO for JS.
Reporter
Comment 1•12 years ago
This is the nightly without PGO: <http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/01/2013-01-24-05-41-58-mozilla-central/>. This is the previous nightly with PGO: <https://tbpl.mozilla.org/php/getParsedLog.php?id=19055132&tree=Firefox&full=1>
Assignee
Comment 2•12 years ago
Assignee
Comment 3•12 years ago
Assignee
Comment 4•12 years ago
Assignee
Comment 5•12 years ago
Attachment #708922 -
Attachment is obsolete: true
Assignee
Comment 6•12 years ago
Attachment #708923 -
Attachment is obsolete: true
Assignee
Comment 7•12 years ago
Attachment #708925 -
Attachment is obsolete: true
Assignee
Comment 8•12 years ago
I ran the Talos suite locally on a slow Win7 laptop using the Jan 23rd (PGO) and Jan 24th (no PGO) Nightly builds and then wrote a script to compare the gathered Telemetry data.
The first attachment shows Telemetry measurements that suffered 1% or greater regressions when PGO was disabled, e.g. GC, CC, image decode, page load, session restore, search service initialization, etc. You will also notice regressions in several MOZ_SQLITE_* histograms in this file. These SQLite operations are I/O bound and their histograms are not meaningful to this experiment, but they do help explain regressions in other operations which use SQLite, e.g. PLACES_FRECENCY_CALC_TIME_MS.
The second attachment shows measures that were unaffected by disabling PGO -- unsurprisingly, these are mostly I/O bound operations. The third attachment has a list of histograms that seemingly benefited from disabling PGO. Some of these improvements are clearly I/O timing noise (e.g. DNS_LOOKUP_TIME, FX_SESSION_RESTORE_WRITE_FILE_MS, MOZ_SQLITE*), but others are a bit harder to explain:
- All the cache lock wait times improved. This might be an I/O artifact
- GC_SLICE_MS improved by 27% but almost twice as many GCs were done
- EVENTLOOP_UI_LAG_EXP_MS, a measure of browser responsiveness, improved by 12.9%
- Gradient generation time improved by 2.2% (probably noise)
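The three-way split across the attachments (regressed, unaffected, improved) can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the `{"sum": ..., "count": ...}` histogram layout, the comparison of means, the function names `mean` and `classify`, and the symmetric 1% cutoff for improvements are all assumptions made for the example.

```python
def mean(hist):
    """Mean of a histogram given as {"sum": total, "count": n}.

    This layout is a simplified, hypothetical stand-in for real Telemetry data.
    """
    return hist["sum"] / hist["count"] if hist["count"] else 0.0


def classify(pgo, no_pgo, threshold=1.0):
    """Bucket each histogram by the percent change in its mean without PGO.

    A positive change means the measured time went up when PGO was disabled.
    """
    result = {"regressed": {}, "unaffected": {}, "improved": {}}
    # Only compare histograms that were recorded in both runs.
    for name in sorted(pgo.keys() & no_pgo.keys()):
        base = mean(pgo[name])
        if base == 0:
            continue  # no usable baseline for this histogram
        change = (mean(no_pgo[name]) - base) / base * 100.0
        if change >= threshold:
            bucket = "regressed"
        elif change <= -threshold:  # assumed symmetric cutoff
            bucket = "improved"
        else:
            bucket = "unaffected"
        result[bucket][name] = round(change, 1)
    return result
```

Calling `classify(pgo_run, no_pgo_run)` on two such dictionaries returns one mapping per category, keyed by histogram name, with the percent change as the value.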
A few notes on methodology:
- I used the Nightly builds Ehsan linked to in comment 1. They're from different days, so there might be some variation from patches that landed on mozilla-central on January 23rd.
- I limited my script to the histograms collected by Telemetry. The simpleMeasurements Telemetry data (e.g. startup & shutdown timings) isn't meaningful since the entire Talos suite is run in a single Firefox session. We can refer to the real Talos numbers for PGO impact on startup & shutdown times. The Telemetry chromeHang & slowSQL data isn't relevant to this experiment.
- I had to configure Talos to run only 45 of the 100 pages in the benchmark since it would error out after benchmarking ~50% of the pages and I didn't want to waste time debugging the test scripts.
- I used the histograms from bug 833917 plus about 80 other timing-based histograms, which could be easily identified by names of the form *_MS
- The test machine was an E-350 laptop with a mechanical hard drive, Windows 7, 2GB RAM shared with video card, power options set to max performance
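Picking histograms by name, as described in the methodology notes above, might look something like this. It is a sketch under assumptions: `select_timing_histograms` is a hypothetical helper, and the `explicit` argument merely stands in for the list from bug 833917, whose contents are not reproduced here.

```python
from fnmatch import fnmatchcase


def select_timing_histograms(names, explicit=(), pattern="*_MS"):
    """Return the histogram names to compare, sorted alphabetically.

    A name is kept if it is in the explicit list (a placeholder for the
    histograms taken from another source) or matches the glob pattern.
    """
    wanted = set(explicit)
    return sorted(n for n in names if n in wanted or fnmatchcase(n, pattern))
```

For example, `select_timing_histograms(["GC_SLICE_MS", "DNS_LOOKUP_TIME"])` keeps only `GC_SLICE_MS`, while passing `explicit=["DNS_LOOKUP_TIME"]` keeps both.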
Assignee
Updated•12 years ago
Attachment #708939 -
Attachment is patch: false
Assignee
Comment 9•12 years ago
Please note that most of the compared histograms are time measures and not necessarily performance measures -- you'll need to understand the Telemetry probe to interpret the regression.
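As a rough illustration of why a shorter time is not automatically a performance win, take the GC_SLICE_MS figures from comment 8: the mean improved by 27% while almost twice as many GCs ran. The sketch below assumes, purely for the sake of the arithmetic, that the number of recorded samples scales with the GC count and that "almost twice" is rounded up to a factor of 2.0; `relative_total_time` is a hypothetical helper.

```python
def relative_total_time(mean_change_pct, count_ratio):
    """Total time relative to the baseline run (1.0 means unchanged).

    mean_change_pct: percent change in the mean sample (negative = faster).
    count_ratio: how many times more samples were recorded (assumed value).
    """
    return (1 + mean_change_pct / 100.0) * count_ratio


# 27% shorter mean, roughly 2x as many samples (both inputs rounded).
total = relative_total_time(-27, 2.0)
```

Under those assumptions the total comes out at about 1.46 times the baseline, so more time would be spent overall even though each sample got shorter.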
Reporter
Comment 10•12 years ago
Thanks a lot, Vladan, this is super helpful!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 11•12 years ago
The TOTAL_CONTENT_PAGE_LOAD_TIME result was surprising to me, because I usually think of page load as being mostly I/O-bound. Maybe it's something in the network cache.