Bug 1610389 (Open) - Opened 5 years ago, updated 2 years ago

Investigate Desktop Browsertime issues.

Categories

(Testing :: Performance, task, P3)

Version: 3
Type: task

Tracking

(Not tracked)

People

(Reporter: sparky, Unassigned)

References

(Blocks 2 open bugs)

Details

This bug is for investigating the issues we are seeing when we run browsertime on desktop. The biggest issue is that, for a given page, Raptor shows Firefox as faster while Browsertime shows Chrome as faster (and vice versa).

The other issue is that the average results don't show a consistent increase or decrease that would be indicative of a change in overhead - something like that would be expected, and perfectly reasonable, given that we are using two different tools. This leads us to the question of which tool is reporting the correct values; we can't tell at this stage.
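To make the "flip" concrete, here is a minimal sketch of the kind of per-site comparison involved; the replicate values are invented purely for illustration and are not real measurements from either harness:

    # flip_check.py - toy sketch; the numbers below are made up for
    # illustration only, they are not real Raptor/Browsertime results.
    from statistics import mean

    # harness -> browser -> site -> list of pageload replicates (ms)
    results = {
        "raptor": {
            "firefox": {"amazon": [520, 540, 530]},
            "chrome":  {"amazon": [560, 555, 570]},
        },
        "browsertime": {
            "firefox": {"amazon": [590, 600, 585]},
            "chrome":  {"amazon": [545, 550, 540]},
        },
    }

    def faster_browser(harness, site):
        """Return whichever browser has the lower mean for this harness/site."""
        ff = mean(results[harness]["firefox"][site])
        cr = mean(results[harness]["chrome"][site])
        return "firefox" if ff < cr else "chrome"

    for site in ["amazon"]:
        winners = {harness: faster_browser(harness, site) for harness in results}
        if len(set(winners.values())) > 1:
            print(f"{site}: FLIP - {winners}")
        else:
            print(f"{site}: consistent - {winners}")

A flip in this sense is what we keep seeing across harnesses, and it is not the kind of uniform shift you would expect from tool overhead alone.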

:nalexander has a theory that the additional processes running for browsertime (on top of the existing processes like mitmproxy) are a bit too much and might be causing the problem. This would explain why mobile tests give good results, since they use a host machine for many, if not all, of those processes.

The ideal solution here is to have those processes run on a host machine for desktop as well. That said, this would be a large amount of work if the theory is true, so we first need to determine whether that's actually the case, which is the purpose of this bug.

What we have tried so far to test this theory (note that we need Chrome on the machine to be able to tell whether the issue is properly resolved):

  1. Testing on large vs. xlarge Linux instances. Not fruitful: no drastic changes in variance, and we also don't have Chrome on these instances.
  2. Testing on MacMinis vs. MBPs. Some changes in metrics were found; they aren't incredibly significant, but they suggest we may be onto something.

The next step is to get Chrome on the two MBP machines and then redo the second test. It should be installed on them shortly (for another reason): bug 1607708.

If after retrying with Chrome we still don't find anything (quite possible, because the Mini and MBP machines aren't very different in terms of specs), then we will have to set up something locally (host machine + target machine) to test the change in metrics.

For some more context, here are some docs that show this issue a bit better:

  1. First document for the full analysis: https://docs.google.com/document/d/1qCCWdlzQZSVzi1d7SKm8ReyJKX5KzdTnqMkHN59Cd9w/edit?usp=sharing
  2. Last report on the desktop analysis discussing the flip in metrics: https://docs.google.com/document/d/15kjwG7tMNDon66Wt_iD4992NxfYk9p17BcM9fxZzNGY/edit?usp=sharing

We started looking at this issue on Windows 10, where we are comparing the values obtained with different settings for how long we wait before checking whether the page load is complete and how often we perform that check - a graph for this will be posted here shortly.
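As a rough outline of that sweep, something along these lines is what we are running; it assumes browsertime is on PATH and that the relevant options are --pageCompleteWaitTime and --pageCompleteCheckPollTimeout (exact option names and defaults can differ between browsertime versions, so treat this as a sketch rather than the exact commands used):

    # sweep_page_complete.py - sketch only; the option names are my best
    # guess at the browsertime flags and may differ between versions.
    import subprocess

    SITE = "https://www.example.com"   # placeholder, not one of the real test pages

    for wait_ms in (1000, 5000, 10000):      # how long to wait before the first check
        for poll_ms in (200, 500, 1500):     # how often to re-run the check
            out_dir = f"results/wait{wait_ms}_poll{poll_ms}"
            cmd = [
                "browsertime", SITE,
                "-b", "firefox",
                "-n", "5",
                "--pageCompleteWaitTime", str(wait_ms),
                "--pageCompleteCheckPollTimeout", str(poll_ms),
                "--resultDir", out_dir,
            ]
            subprocess.run(cmd, check=True)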

I just talked with Greg about this bug, and for reference here is a graph view from Perfherder which shows the difference between the web extension and browsertime:

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=mozilla-central,2180078,1,13&series=mozilla-central,2006432,1,10&timerange=1209600

As you can see, the browsertime results are much noisier than the webextension ones.
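For a rough sense of how "noisier" can be quantified here, this is the kind of per-series spread calculation behind that statement; the replicate values below are placeholders, not data points pulled from the Perfherder series linked above:

    # noise_compare.py - toy sketch; the values are placeholders, not the
    # actual data points from the Perfherder series linked above.
    from statistics import mean, stdev

    def cv(values):
        """Coefficient of variation (stddev / mean) as a percentage."""
        return 100.0 * stdev(values) / mean(values)

    webext      = [512, 515, 510, 518, 514, 511]   # web extension replicates (ms)
    browsertime = [480, 530, 455, 560, 500, 540]   # browsertime replicates (ms)

    print(f"webextension CV: {cv(webext):.1f}%")
    print(f"browsertime  CV: {cv(browsertime):.1f}%")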

:whimboo it's true for that case, but the majority of tests show a decrease or no change in variance (amazon is also a low-value test that rarely catches unique regressions, i.e. the test is not very useful). Here are some graphs that break down the variance changes (in blue/light-blue) across platforms and high-value tests: https://docs.google.com/document/d/1qCCWdlzQZSVzi1d7SKm8ReyJKX5KzdTnqMkHN59Cd9w/edit?usp=sharing

You'll notice that ~7% (4/54) of the high-value test+platform combinations exhibit an increase in variance - we can probably add some small fixes over time to make them more stable, but it's nothing too concerning.

Regarding the flip issue, I hit a pyplot bug that interfered with interpreting the data. After some sanity checks using the Perfherder graph view and comparing it with my local results, I'm now 100% confident in these results. They can be found in this report: https://docs.google.com/document/d/1NUNlOb97CWBFgJXm2f-7IlZZxWJ1OuM8WFY8kPgi5Y0/edit?usp=sharing

The next step is to try to match our data with what Peter sees by enabling the HAR export and visual metrics. If we can match up, then we can be sure that our setup is working and that the reason Peter sees Firefox slower than Chrome is these additional settings. That said, none of this will explain why Chrome is slower with Browsertime than with the webextension (whereas Firefox is faster with Browsertime).
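For reference, the kind of invocation we plan to compare against looks roughly like the following; it assumes browsertime's --video and --visualMetrics options and its default HAR export (flag names may vary between versions, so this is an outline rather than the exact command):

    # har_visualmetrics_run.py - outline only; the flag names are my best
    # guess at browsertime's options and may differ between versions.
    import subprocess

    cmd = [
        "browsertime", "https://www.example.com",   # placeholder URL
        "-b", "firefox",
        "-n", "5",
        "--video",           # record video so visual metrics can be computed
        "--visualMetrics",   # SpeedIndex and friends
        # HAR export is on by default in browsertime; we leave it enabled
        # (i.e. we do not pass --skipHar) to match Peter's configuration.
    ]
    subprocess.run(cmd, check=True)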

Blocks: 1650133
Blocks: 1612042
Severity: normal → S3