Bug 631571 (Closed)
Opened 14 years ago • Closed 13 years ago
Measure browser responsiveness in Talos
Categories: Testing :: Talos, defect
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: ted, Assigned: jmaher)
References
Details
Attachments (3 files, 3 obsolete files)
(deleted), patch | jmaher: review+ | Details | Diff | Splinter Review
(deleted), patch | jmaher: review+ | Details | Diff | Splinter Review
(deleted), patch | wlach: review+ | Details | Diff | Splinter Review
We'd like to measure browser responsiveness so we can find spots where we're currently not very responsive, and ensure that we improve the metric. In addition, we want to make sure that large architectural changes like e10s don't harm responsiveness of common tasks.
The scope of this work will be to get some measure of responsiveness out of the browser (an estimate of how long we're taking to service events), and have Talos exercise various functionality while sampling that measurement, and then have this all running on Tinderbox and reporting something useful.
We discussed this work with Blizzard a few months ago, I think Sheila took some notes. I'll have to dig through my email to see if I can find them.
Comment 1 • 14 years ago (Reporter)
The patches in bug 606574 are currently written to fire a tracing event every 50ms (so 20/s), and report how long that event took to get processed. Right now the output just looks like:
MOZ_EVENT_INTERVAL <duration>
where duration is in integer milliseconds. We may need to tweak that to be more useful. The output goes to stdout by default, but can be redirected to a file by setting MOZ_INSTRUMENT_EVENT_LOOP_OUTPUT to a filename.
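For anyone post-processing that output, a minimal parsing sketch (assuming only the exact line format described above) might look like:

```python
import re

# Assumed format from the description above: "MOZ_EVENT_INTERVAL <duration>",
# where <duration> is in integer milliseconds.
LINE_RE = re.compile(r"^MOZ_EVENT_INTERVAL (\d+)$")

def parse_intervals(lines):
    """Yield event-servicing durations (in ms) from instrumentation output."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield int(m.group(1))

# Example: instrumentation lines interleaved with unrelated output.
log = ["MOZ_EVENT_INTERVAL 109", "some other output", "MOZ_EVENT_INTERVAL 51"]
print(list(parse_intervals(log)))  # -> [109, 51]
```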
Comment 2 • 14 years ago
I should also have this in my email, I'll go hunting.
Comment 3 • 14 years ago
From the summary of the last meeting here's the top 7 items requested:
1. Startup
a. Time between startup and visible window
b. Time between visible window appearing and the browser is completely responsive (type in a url)
2. Switching tabs
3. Opening a new window
4. Quit the browser
5. Awesome bar search results (how fast it returns).
6. Right click menu.
7. Responsiveness in foreground tab during page load.
Comment 4 • 14 years ago
Good list, but mobile needs at least two more:
1) What about scrolling time?
2) What about zoom-in/zoom-out time?
I think everyone would benefit if we started measuring these.
Comment 5 • 14 years ago (Reporter)
The instrumentation I'm adding in bug 606574 won't let us measure "how long does it take to do X?", I figured the existing Talos harness was good enough for that (Tpan, etc). What it will let us measure is "how responsive is the browser while doing X?", which we currently have no data on.
Specifically, it will tell you how long events take to be serviced at 20Hz samples, so we can perform an action (load a page), and determine roughly how responsive we are until the action finishes.
Comment 6 • 14 years ago
The UX team is helping to come up with a list of UI events that we should prioritize the measurement and optimization of. I'll post it here once it's been gathered.
Re comment #5: We should do some manual testing of common events (new tab, scroll on large page) to verify that attempting to correlate responsiveness via the event loop logging is useful for these higher-level user scenarios.
Comment 7 • 14 years ago (Reporter)
I've run through a few non-scientific tests while developing the patch, and it's definitely producing measurable data. You can watch the event servicing time spike when you load a new page or do other intensive tasks. I've got somewhat-outdated try server builds here that you can try out if you'd like to see what it produces:
http://people.mozilla.com/~tmielczarek/event-loop-instrument/
Comment 8 • 14 years ago
That's fantastic, thanks Ted!
The next bit would be to find a way to make the correlation, either in the harness, or maybe by doing post-processing of the event loop logs and a log we spit out from the high-level tests, possibly via timestamp?
Comment 9 • 14 years ago (Reporter)
Yeah, I hadn't quite worked that out yet. I was worried about getting timestamps that actually line up, given how finicky system clocks can be. I'm using mozilla::TimeStamp internally for interval measurement, but it doesn't actually let you get a wallclock time, only intervals between points in time.
Comment 10 • 13 years ago (Reporter)
As a note, we're punting on Talos measurement for this quarter. We're going to figure out how to measure something useful on developer machines first, then we'll figure out what we want to automate.
Updated • 13 years ago (Reporter)
Assignee: ted.mielczarek → anodelman
Comment 11 • 13 years ago
From discussions in email/IRC, the initial tests we are going to measure responsiveness during are:
- window open
- tab open
- tp5
Comment 12 • 13 years ago
As a proof of concept, I flipped on Ted's responsiveness measurement while running a single cycle of tdhtml (a test that cycles through a few DHTML-heavy web pages).
Here's the output from the instrumentation:
MOZ_EVENT_TRACE start 1311194309792
MOZ_EVENT_TRACE sample 1311194310055 109
MOZ_EVENT_TRACE sample 1311194310158 51
MOZ_EVENT_TRACE sample 1311194311994 152
MOZ_EVENT_TRACE sample 1311194315275 67
MOZ_EVENT_TRACE sample 1311194320416 244
MOZ_EVENT_TRACE sample 1311194322645 87
MOZ_EVENT_TRACE stop 1311194326678
We can see that we exceeded the responsiveness measure of the instrumentation several times - but I don't know how to correlate that to exactly what the browser was doing at the time. Ted - do you have any advice here?
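As a sketch of what a harness-side consumer could do, the trace above can be split into a start time, the slow-event samples, and a stop time (the sample fields are a timestamp and a duration in milliseconds, per the discussion in this bug):

```python
def parse_trace(lines):
    """Parse MOZ_EVENT_TRACE output into (start_ts, samples, stop_ts),
    where samples is a list of (timestamp, duration_ms) pairs."""
    start = stop = None
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] != "MOZ_EVENT_TRACE":
            continue
        if parts[1] == "start":
            start = int(parts[2])
        elif parts[1] == "stop":
            stop = int(parts[2])
        elif parts[1] == "sample":
            samples.append((int(parts[2]), int(parts[3])))
    return start, samples, stop

log = [
    "MOZ_EVENT_TRACE start 1311194309792",
    "MOZ_EVENT_TRACE sample 1311194310055 109",
    "MOZ_EVENT_TRACE stop 1311194326678",
]
print(parse_trace(log))  # -> (1311194309792, [(1311194310055, 109)], 1311194326678)
```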
Comment 13 • 13 years ago (Reporter)
If we're just cycling through pages, maybe we can get load start times for each page? Then developers would be able to see which pages were the cause of the unresponsive periods.
Comment 14 • 13 years ago
I was going to ask the same question w.r.t. tp5 as mentioned in Comment 11. How are you going to measure responsiveness during the Tp5 test? Is that a measure of time before we load the page? i.e. from "starting page n load" to "page n load finished"? Is that a measure of time from the event of "page n loaded" to "page n+1 loaded"?
Comment 15 • 13 years ago (Reporter)
The numbers in comment 12 are simply samples from when the event loop took longer than 50ms to respond to an event. You can correlate them with whatever you want, provided you have timestamps; the third field is a unix timestamp in milliseconds.
But most of our Talos tests don't try to drill down into too much detail, do they? They simply give you a number.
Comment 16 • 13 years ago
Here's the mixed together results for tdhtml for comparison:
NOISE: MOZ_EVENT_TRACE start 1311278181023
NOISE: MOZ_EVENT_TRACE sample 1311278181283 106
NOISE: MOZ_EVENT_TRACE sample 1311278181453 67
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/colorfade.html (next: http://localhost/page_load_test/dhtml/diagball.html)
NOISE: MOZ_EVENT_TRACE sample 1311278183256 219
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/diagball.html (next: http://localhost/page_load_test/dhtml/fadespacing.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/fadespacing.html (next: http://localhost/page_load_test/dhtml/imageslide.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/imageslide.html (next: http://localhost/page_load_test/dhtml/layers1.html)
NOISE: MOZ_EVENT_TRACE sample 1311278186559 90
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers1.html (next: http://localhost/page_load_test/dhtml/layers2.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers2.html (next: http://localhost/page_load_test/dhtml/layers4.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers4.html (next: http://localhost/page_load_test/dhtml/layers5.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers5.html (next: http://localhost/page_load_test/dhtml/layers6.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers6.html (next: http://localhost/page_load_test/dhtml/meter.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/meter.html (next: http://localhost/page_load_test/dhtml/movingtext.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/movingtext.html (next: http://localhost/page_load_test/dhtml/mozilla.html)
NOISE: MOZ_EVENT_TRACE sample 1311278191699 240
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/mozilla.html (next: http://localhost/page_load_test/dhtml/replaceimages.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/replaceimages.html (next: http://localhost/page_load_test/dhtml/scrolling.html)
NOISE: MOZ_EVENT_TRACE sample 1311278193913 71
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/scrolling.html (next: http://localhost/page_load_test/dhtml/slidein.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidein.html (next: http://localhost/page_load_test/dhtml/slidingballs.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidingballs.html (next: http://localhost/page_load_test/dhtml/textslide.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/textslide.html (next: http://localhost/page_load_test/dhtml/zoom.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/zoom.html (next: http://localhost/page_load_test/dhtml/colorfade.html)
I think that if I added a timestamp to the pageload information we might be able to do a proper correlation.
Comment 17 • 13 years ago
Still I'm not sure what metric we want to be monitoring - should we do something like "amount of unresponsive events during test"? Then you'd still need a human to look over the results to see where things got worse.
Comment 18 • 13 years ago
And again with more timestamps:
NOISE: MOZ_EVENT_TRACE start 1311279480976
NOISE: MOZ_EVENT_TRACE sample 1311279481244 114
NOISE: 1311279482448: Cycle 1: loaded http://localhost/page_load_test/dhtml/colorfade.html (next: http://localhost/page_load_test/dhtml/diagball.html)
NOISE: MOZ_EVENT_TRACE sample 1311279483176 147
NOISE: 1311279483773: Cycle 1: loaded http://localhost/page_load_test/dhtml/diagball.html (next: http://localhost/page_load_test/dhtml/fadespacing.html)
NOISE: 1311279485150: Cycle 1: loaded http://localhost/page_load_test/dhtml/fadespacing.html (next: http://localhost/page_load_test/dhtml/imageslide.html)
NOISE: 1311279485852: Cycle 1: loaded http://localhost/page_load_test/dhtml/imageslide.html (next: http://localhost/page_load_test/dhtml/layers1.html)
NOISE: MOZ_EVENT_TRACE sample 1311279486457 68
NOISE: 1311279486567: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers1.html (next: http://localhost/page_load_test/dhtml/layers2.html)
NOISE: 1311279487112: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers2.html (next: http://localhost/page_load_test/dhtml/layers4.html)
NOISE: 1311279487657: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers4.html (next: http://localhost/page_load_test/dhtml/layers5.html)
NOISE: 1311279488342: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers5.html (next: http://localhost/page_load_test/dhtml/layers6.html)
NOISE: 1311279488885: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers6.html (next: http://localhost/page_load_test/dhtml/meter.html)
NOISE: 1311279489824: Cycle 1: loaded http://localhost/page_load_test/dhtml/meter.html (next: http://localhost/page_load_test/dhtml/movingtext.html)
NOISE: 1311279490634: Cycle 1: loaded http://localhost/page_load_test/dhtml/movingtext.html (next: http://localhost/page_load_test/dhtml/mozilla.html)
NOISE: MOZ_EVENT_TRACE sample 1311279491566 212
NOISE: 1311279492340: Cycle 1: loaded http://localhost/page_load_test/dhtml/mozilla.html (next: http://localhost/page_load_test/dhtml/replaceimages.html)
NOISE: 1311279493135: Cycle 1: loaded http://localhost/page_load_test/dhtml/replaceimages.html (next: http://localhost/page_load_test/dhtml/scrolling.html)
NOISE: MOZ_EVENT_TRACE sample 1311279493730 64
NOISE: 1311279493743: Cycle 1: loaded http://localhost/page_load_test/dhtml/scrolling.html (next: http://localhost/page_load_test/dhtml/slidein.html)
NOISE: 1311279495274: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidein.html (next: http://localhost/page_load_test/dhtml/slidingballs.html)
NOISE: 1311279495934: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidingballs.html (next: http://localhost/page_load_test/dhtml/textslide.html)
NOISE: 1311279496723: Cycle 1: loaded http://localhost/page_load_test/dhtml/textslide.html (next: http://localhost/page_load_test/dhtml/zoom.html)
NOISE: 1311279497428: Cycle 1: loaded http://localhost/page_load_test/dhtml/zoom.html (next: http://localhost/page_load_test/dhtml/colorfade.html)
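With timestamps on both streams, the correlation becomes mechanical. The sketch below (helper names are mine, and it assumes exactly the line formats shown in the log above) attributes each slow sample to the page that was loading when the event loop stalled:

```python
import bisect
import re

# Line formats assumed from the mixed tdhtml log above.
LOAD_RE = re.compile(r"NOISE: (\d+): Cycle \d+: loaded (\S+)")
SAMPLE_RE = re.compile(r"NOISE: MOZ_EVENT_TRACE sample (\d+) (\d+)")

def attribute_samples(lines):
    """Pair each slow-event sample with the first page whose 'loaded'
    timestamp follows it, i.e. the page being loaded during the stall.
    Returns a list of (page_url_or_None, duration_ms)."""
    loads, samples = [], []
    for line in lines:
        m = LOAD_RE.match(line)
        if m:
            loads.append((int(m.group(1)), m.group(2)))
            continue
        m = SAMPLE_RE.match(line)
        if m:
            samples.append((int(m.group(1)), int(m.group(2))))
    load_times = [t for t, _ in loads]
    result = []
    for ts, dur in samples:
        i = bisect.bisect_left(load_times, ts)
        page = loads[i][1] if i < len(loads) else None
        result.append((page, dur))
    return result
```

This is only one plausible attribution rule; samples that fire after the last page finishes get no page (None).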
Comment 19 • 13 years ago (Reporter)
(In reply to comment #17)
> Still I'm not sure what metric we want to be monitoring - should we do
> something like "amount of unresponsive events during test"? Then you'd
> still need a human to look over the results to see where things got worse.
This is the crux of the issue. bsmedberg at one point suggested reporting something like the max/median/mean values, but the latter two probably aren't quite right since we're reporting only samples >50ms. I could change the code to report all samples again, but it's a lot (20 samples per second at most).
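For reference, those summary statistics are cheap to compute; the caveat is exactly the one above — with only the >50ms samples recorded, median and mean describe the tail of the distribution, not overall responsiveness:

```python
import statistics

def summarize(durations_ms):
    """Max/median/mean over the recorded samples. Note: if only slow
    (>50 ms) samples are logged, median and mean describe the tail."""
    return {
        "max": max(durations_ms),
        "median": statistics.median(durations_ms),
        "mean": statistics.mean(durations_ms),
    }

# Durations from the comment 12 trace.
print(summarize([109, 51, 152, 67, 244, 87]))
```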
Comment 20 • 13 years ago
I don't think we should treat responsiveness the same way we treat speed performance numbers: there isn't a single number to report, in most cases.
Instead, I suspect it almost needs to be like "responsiveness mochitesting": we have a series of actions that we know do or could or used to cause the event loop to go away for more than 50ms. If we've fixed the problem, any latency over 50ms would cause the test to fail/go orange/whatever. If we haven't yet fixed the problem, we measure the latency and make sure it doesn't get worse.
There are some issues with this, mostly the random pauses which we can't control yet (GC/CC), so perhaps we need a way to blacklist/measure those separately from the main test item.
Comment 21 • 13 years ago (Reporter)
I filed bug 674606 on bsmedberg's "responsiveness mochitest" idea, after discussion we're going to continue to work on this as well, since it should produce useful data in the short term.
Comment 22 • 13 years ago
Comment 23 • 13 years ago
Graph server patch already applied to graphs-stage for testing.
Attachment #560439 - Flags: review?(jmaher)
Comment 24 • 13 years ago
Currently missing:
- ability to parametrize time between pings to event loop/threshold for recording slow pings as already described in bug 679427 (waiting on checkin, deployment)
- determination of the 'alpha' variable to be used in calculating the responsiveness metric
- further testing to ensure that the responsiveness test option doesn't wobble the regular tp5 results (this would make roll out a lot simpler)
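The 'alpha' mentioned above suggests an exponentially weighted average over the event-loop delay samples. The sketch below is my illustration of that shape only, not the calculation actually landed in Talos, and the alpha value is precisely what the comment says still needs to be determined:

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average of event-loop delays (ms).
    Larger alpha weights recent stalls more heavily. Illustrative only;
    not the metric from the attached patch."""
    avg = None
    for s in samples:
        avg = s if avg is None else alpha * s + (1 - alpha) * avg
    return avg

# Durations from the comment 12 trace, with a hypothetical alpha.
print(ewma([109, 51, 152, 67, 244, 87], alpha=0.5))
```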
Updated • 13 years ago (Assignee)
Attachment #560439 - Flags: review?(jmaher) → review+
Comment 25 • 13 years ago
(In reply to alice nodelman [:alice] [:anode] from comment #24)
> Currently missing:
> - ability to parametrize time between pings to event loop/threshold for
> recording slow pings as already described in bug 679427 (waiting on checkin,
> deployment)
> - determination of the 'alpha' variable to be used in calculating the
> responsiveness metric
> - further testing to ensure that the responsiveness test option doesn't
> wobble the regular tp5 results (this would make roll out a lot simpler)
Is this the full set of things you see standing between the talos user responsiveness test and rolling it out to staging?
How do you determine the alpha variable? I'm not clear on what that step entails. (sorry if you explained it already and I missed it).
Version: unspecified → Trunk
Comment 26 • 13 years ago
There's currently a discussion over email between jlebar and bz about how to calculate a useful metric from the responsiveness data we are collecting.
Ted has checked in the ability to set the time between event loop pings/threshold for recording pings to moz-central, so that is done.
I will be working to complete the further testing to determine if adding the responsiveness test will perturb the general tp5 results - that will decide if we can roll the test out as part of standard tp5 or if it needs to be split off into a separate test run (which would be expensive, as tp has our longest test run time).
Comment 27 • 13 years ago
Adding correct names to graph server db, now that mozAfterPaint tests are standard. Already applied to graphs-stage.
Attachment #563464 - Flags: review?(jmaher)
Comment 28 • 13 years ago
Attachment #560438 - Attachment is obsolete: true
Updated • 13 years ago (Assignee)
Attachment #563464 - Flags: review?(jmaher) → review+
Comment 29 • 13 years ago (Assignee)
As a note, the SQL pages haven't landed on the graph server. I am not sure if this bug is ready for those. Please let me know if we need to add more to the server or if I should just land these to the graphs repository.
Comment 30 • 13 years ago (Assignee)
Updated patch with the latest responsiveness calculation, rebased for bitrot.
Assignee: anodelman → jmaher
Attachment #563467 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Comment 31 • 13 years ago (Assignee)
Updated patch; it passes on Talos staging. The numbers for tp5 do not change, so we don't need to stage this side by side.
Attachment #567451 - Attachment is obsolete: true
Attachment #568714 - Flags: review?(wlachance)
Comment 32 • 13 years ago
Comment on attachment 568714 [details] [diff] [review]
have talos support 'responsiveness' test option (1.0)
Overall looks like a nice straightforward addition to Talos. My only comment:
+def responsiveness_Metric(val_list):
Should probably be responsiveness_metric, going by PEP8, which is endorsed by the Mozilla style guidelines (https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide#Python_practices). I know we have 3(!) different styles in this file but we might as well settle on the "right" one for new code.
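For illustration, a PEP8-compliant rename could look like the following. The body is a guessed sum-of-squares shape, not the reviewed implementation (the real formula lives in the attached patch and isn't reproduced in this bug):

```python
def responsiveness_metric(val_list):
    """PEP8-named version of responsiveness_Metric. The sum-of-squares
    body here is illustrative only: squaring penalizes a few long
    event-loop stalls more heavily than many short ones."""
    return sum(float(v) ** 2 for v in val_list) / 1000000.0

print(responsiveness_metric([1000]))  # -> 1.0
```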
Attachment #568714 - Flags: review?(wlachance) → review+
Comment 33 • 13 years ago
Comment on attachment 568714 [details] [diff] [review]
have talos support 'responsiveness' test option (1.0)
Oh, one other minor comment: do we actually need to specify responsiveness: False for most of the tests in the config file? It looks like we just assume the value is false if it's not set...
Comment 34 • 13 years ago (Assignee)
landed on build-talos:
http://hg.mozilla.org/build/talos/rev/f9abfefec8aa
Comment 35 • 13 years ago
Glad to see that this has been resolved. Has this update been deployed or when is it expected to be deployed?
Comment 36 • 13 years ago (Assignee)
This is deployed, so when builds start running again we will see results :) I will post a link in this bug to a graph server view showing the responsiveness data.
Comment 37 • 13 years ago (Assignee)
We had to back this out due to OSX 10.5.8 and 10.6.2 failing to terminate the browser and Fedora crashing.
Comment 38 • 13 years ago (Assignee)
OK, ready to deploy again in bug 702351.
Comment 39 • 13 years ago (Assignee)
Here are the data points:
http://graphs-new.mozilla.org/graph.html#tests=[[130,1,1],[130,1,12],[130,1,13],[130,1,21],[130,1,22]]&sel=1322091900253,1322167344714&displayrange=7&datatype=running
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 40 • 13 years ago
Thanks Joel. I added a link to the project Snappy wiki [1]. We'll review during next week's call.
[1]https://wiki.mozilla.org/Performance/Snappy#Current_Infrastructure