Open Bug 1795511 Opened 2 years ago Updated 1 year ago

Switch motionmark to use 'ramp' mode and report complexity score

Categories

(Testing :: Raptor, task, P2)

Default
task

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: jrmuizel, Assigned: aglavic, NeedInfo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxp])

Attachments

(1 file, 2 obsolete files)

To get stable numbers for comparing Firefox with WebRender vs. without, we chose the current configuration. (See bug 1423267 comment 3.)

However, I don't think this configuration is working well:

  1. The units reported here are (ms) but maybe they're fps? https://treeherder.mozilla.org/jobs?repo=mozilla-central&revision=c14f7934269f333be9e65958c7a012899b3123bd&group_state=expanded&selectedTaskRun=bYAN6l1qTH63dIoAgKRD0w.0
  2. The values seem to cap at around 60 (which suggests that they are fps)
  3. This configuration is not representative of the way that people actually run MotionMark
  4. Chrome appears to do worse than Firefox in CI but that doesn't match the results when running it manually.

In bug 1778575 we're looking to fix frame scheduling, which should make the measurements we get in ramp mode much more stable. I don't think we need to wait for that to land before changing the mode, though. For now, I'd rather have numbers that are closer to what MotionMark reports than numbers that are stable.

Summary: Switch motionmark to use ramp and report complexity → Switch motionmark to use 'ramp' mode and report complexity

Joel, does this seem reasonable?

Flags: needinfo?(jmaher)

In general this seems reasonable. If the numbers most people get are not represented by our CI tests, then we should change our CI. Keep in mind we can also change the labels we use (the default is 'ms'; we can add 'fps') and make sure things are marked lower_is_better or higher_is_better as appropriate.

Keep in mind we also have older hardware that runs these tests; maybe it is representative. There are plans in place to upgrade the CPU (and keep the Intel GPU), with prototypes being ordered this month.

I would leave this up to the perf tooling team to prioritize/change/review as needed. :kimberlythegeek, can you chime in here if there are other things to consider.

Flags: needinfo?(jmaher) → needinfo?(ksereduck)

:jrmuizel Could you provide more information on how the configuration is not representative, and on using ramp mode?

Flags: needinfo?(ksereduck) → needinfo?(jmuizelaar)
Priority: -- → P3
Severity: -- → S3
Whiteboard: [perftest:triage]

When you run https://browserbench.org/MotionMark/ in its default configuration it uses ramp mode. The constant complexity mode that we run it in is only accessible through https://browserbench.org/MotionMark/developer.html.

Flags: needinfo?(jmuizelaar) → needinfo?(ksereduck)

:jrmuizel, regarding point (4), have you seen this on multiple machines and platforms, or only your own so far?

Also, can you elaborate on why you want the complexity to be reported? We could add this to our extra-options, but it's unclear if we'll ever have more than 1 complexity variation of motionmark running at once.

Type: enhancement → task
Priority: P3 → P2
Flags: needinfo?(ksereduck) → needinfo?(jmuizelaar)

(In reply to Greg Mierzwinski [:sparky] from comment #5)

:jrmuizel, regarding point (4), have you seen this on multiple machines and platforms, or only you're own so far?

I've run it on a couple of other machines now and the results are mixed.

Also, can you elaborate on why you want the complexity to be reported? We could add this to our extra-options, but it's unclear if we'll ever have more than 1 complexity variation of motionmark running at once.

Complexity is the score reported by MotionMark when you run it in its default configuration. I just want that. That will prevent tests from getting capped at 60 fps like they currently do.
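A toy sketch (not MotionMark source; the per-item cost model is made up) of why a fixed-workload fps metric saturates at the display refresh rate while a ramp-mode complexity score can still separate two browsers of different speeds:

```python
# Hedged illustration: with a fixed workload, any browser faster than the
# display just reports the refresh rate; a complexity score does not clamp.

VSYNC_HZ = 60  # typical display refresh rate

def reported_fps(frame_time_ms):
    # Presentation is vsync-limited, so reported fps clamps at the refresh rate.
    return min(1000.0 / frame_time_ms, VSYNC_HZ)

def ramp_complexity(frame_time_ms, per_item_cost_ms=0.05):
    # Toy model: how many extra animated items fit in one 60 Hz frame budget.
    budget_ms = 1000.0 / VSYNC_HZ
    return (budget_ms - frame_time_ms) / per_item_cost_ms

# A 2 ms/frame browser and an 8 ms/frame browser both report 60 fps,
# but their ramp complexities differ.
print(reported_fps(2.0), reported_fps(8.0))
print(ramp_complexity(2.0), ramp_complexity(8.0))
```

Both browsers clamp to the same fps, which is exactly the indistinguishability described above; only the complexity numbers differ.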

Flags: needinfo?(jmuizelaar)

Ah ok, perfect, thanks for the additional info!

Summary: Switch motionmark to use 'ramp' mode and report complexity → Switch motionmark to use 'ramp' mode and report complexity score
Whiteboard: [perftest:triage]

Who should do this work?

The Jira task wasn't set up properly, so it evaded our grooming filter; sorry about that. We'll find someone to look into this at the next grooming session (on Monday, Dec 19).

Assignee: nobody → aglavic
Status: NEW → ASSIGNED

:jrmuizel a few questions about the switch:

  1. Would you prefer mean or median for the complexity scores?
  2. What are the units for complexity score? Should we use a unit of 'score'?
  3. Do you want this to be changed for both motionmark-html and motionmark-animometer?
Flags: needinfo?(jmuizelaar)

As well if we are tracking score, is lower still better?

(In reply to Andrej Glavic (:andrej) from comment #10)

:jrmuizel a few questions about the switch:

  1. Would you prefer mean or median for the complexity scores?
  2. What are the units for complexity score? Should we use a unit of 'score'?
  3. Do you want this to be changed for both motionmark-html and motionmark-animometer?
Attachment #9310719 - Attachment is obsolete: true
Priority: P2 → P1

(In reply to Andrej Glavic (:andrej) from comment #10)

:jrmuizel a few questions about the switch:

  1. Would you prefer mean or median for the complexity scores?

probably the median

  2. What are the units for complexity score? Should we use a unit of 'score'?

yep, score seems best

  3. Do you want this to be changed for both motionmark-html and motionmark-animometer?

Yes

As well if we are tracking score, is lower still better?

No, higher is better
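The answers above can be summarized in a small sketch; the function and dictionary keys here are illustrative, not Raptor's actual API:

```python
# Sketch, assuming per-run complexity scores from several benchmark iterations.
from statistics import median

def summarize(scores):
    # Median (per the answer above) is robust to a single noisy run;
    # the score is unitless and higher is better, unlike an ms-style metric.
    return {"value": median(scores), "unit": "score", "lower_is_better": False}

print(summarize([310.2, 295.8, 401.5]))  # value is the median, 310.2
```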

Flags: needinfo?(jmuizelaar)

Since we are already changing the parameters for the controller, would you like to keep all other existing preferences listed below?

  • test-interval=15
  • display=minimal
  • tiles=big
  • frame-rate=30
  • kalman-process-error=1
  • kalman-measurement-error=4
  • time-measurement=performance
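For illustration, the preferences above map naturally onto URL query parameters for the developer page; the parameter names below simply mirror that list and the `controller=ramp` entry is the mode change proposed in this bug, not a verified harness option:

```python
# Hypothetical sketch: build a MotionMark developer.html URL from the
# preference list above. Parameter names are assumptions, not verified.
from urllib.parse import urlencode

BASE = "https://browserbench.org/MotionMark/developer.html"

options = {
    "controller": "ramp",  # the mode change proposed in this bug
    "test-interval": 15,
    "display": "minimal",
    "tiles": "big",
    "frame-rate": 30,
    "kalman-process-error": 1,
    "kalman-measurement-error": 4,
    "time-measurement": "performance",
}

url = BASE + "?" + urlencode(options)
print(url)
```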
Flags: needinfo?(jmuizelaar)

I think the defaults look more like:

  • frame-rate=50
  • test-interval=30

I think everything else can stay the same.

Flags: needinfo?(jmuizelaar)
Attachment #9311131 - Attachment is obsolete: true
Priority: P1 → P2

We are working on changing MotionMark to use ramp mode, but for Chrome and Chromium on Macs, switching to ramp mode gives us a return value of 1 for all tests and subtests:
https://treeherder.mozilla.org/jobs?repo=try&revision=344b651c2a66fb39b8e4b65fe033d0a7117fc8ed
This is on the 1300 M2 machines, but the 1015 machines behaved similarly.

We've been seeing a number of scoring issues with MotionMark in general, including a reported score of 0 for Chrome on the Multiply test on very fast devices. But that doesn't affect all tests, so I suspect something else is going wrong here. We're hoping to fix some of the structural scoring problems in MotionMark 2. In the meantime, how difficult would it be to make a brand new taskcluster job so we can at least track ramp results for Firefox?

Flags: needinfo?(aglavic)

We can definitely do that :) I can look into it and get to it sometime soon after All Hands!

Flags: needinfo?(aglavic)

Leaving the needinfo open.

Flags: needinfo?(aglavic)
