Closed Bug 1138953 Opened 9 years ago Closed 9 years ago

compare.py --compare-e10s should default to compare only a specific revision, not a range in history

Tracking

(e10s-)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

e10s

---

People

(Reporter: jmaher, Assigned: jmaher)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

talos compare.py e10s to compare e10s results to the same revision (0.9) 9 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch		Details \| Diff \| Splinter Review
updated with stddev/geomean, etc. (0.92) 9 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch		Details \| Diff \| Splinter Review
new format for e10s with --master-revision and support for graphs (0.93) 9 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch		Details \| Diff \| Splinter Review
update compare.py that supports per revision comparison for e10s (1.0) 9 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch	wlach : review+	Details \| Diff \| Splinter Review
bug1138953.v5.patch 9 years ago Avi Halachmi (:avih) (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Description

•

9 years ago

      No description provided.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 1

•

9 years ago

Attached patch talos compare.py e10s to compare e10s results to the same revision (0.9) (obsolete) (deleted) — Details — Splinter Review

Assignee: nobody → jmaher

Status: NEW → ASSIGNED

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 2

•

9 years ago

it is also noted that we should have a mode where we print out:
<geomean> (<num-datapoints>) +/- <stddev> (<stddev-as-%-of-geomean>) (<min> - <max>)

instead of just:
<min> - <max>

this is useful for comparing specific revisions and history.  I am thinking it should be --full-stats or something like that.

Likewise we should have a: --master-rev which does what this patch is doing, then we could easily compare between two types of test for a given rev, or compare anything to a different branch overtime.

Avi Halachmi (:avih)

Comment 3

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #2)
> it is also noted that we should have a mode where we print out:
> <geomean> (<num-datapoints>) +/- <stddev> (<stddev-as-%-of-geomean>) (<min>
> - <max>)

Er.. (<stddev-as-%-of-geomean>%) (added the '%' to make it more readable).

Also we really don't have use for so much accuracy when the values are typically so noisy. Limiting the displayed values to 3-4 significant digits would also make it more readable.

Avi Halachmi (:avih)

Comment 4

•

9 years ago

Also, this addition is not only for fun. Being able to compare apples to apples (same kind of data on both sides), including the number of datapoints each side of the comparison uses will enable us to understand better how valid the comparison is.

If, for instance one side uses 100 data points and the other side has one, it's a valuable information to have when assessing how valid the comparison is, and the same goes for the other types of data this addition brings.

Jim Mathies [:jimm]

Updated

•

9 years ago

Blocks: e10s-tests

tracking-e10s: --- → -

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 5

•

9 years ago

this is what is looks like with my latest bits:
python compare.py --compare-e10s --rev eea6188b9b05 --pgo --verbose --branch Firefox --platform Linux --test tresize
   test		Master geo (count) +/- stddev (min, max)	:test	geo (count) +/- stddev (min, max)

Linux:
:) tresize           	 16.8 (48) +/-   0.3 ( 16.4,  18.6)	:	 13.4 (3) +/-   0.2 ( 13.2,  13.6)

Avi Halachmi (:avih)

Comment 6

•

9 years ago

Thanks.

So, I thought a bit more about it. Personally I don't think the min/max values are valuable, especially when we have stddev which gives some representation of the distribution, and also min/max are highly affected by outliers, while stddev is much less so - while still taking all values into account.

Maybe to simplify it and also to help interpret the output better, each side will show only geomean, stddev (possibly only as % of geomean), # of data points, and between the two - the change in percentage between the two geomeans. I could also leave without the smiley since with this suggestion the improvement/regression would appear clearly at the middle as percentage, but we could also leave it for fun :)

e.g.

test     g.mean stddev  points   change    g.mean stddev points

tXYZ     16.8   +/- 10% (48#)    [+20%]    20.2   +/- 9% (3#)

We can also probably round the percentage values to integers, since even for low percentage values we don't really care much if it's 2.2% or 2.6%.

Joel, what do you think?

Avi Halachmi (:avih)

Comment 7

•

9 years ago

Also, for canvasmark/dromaeo_dom/dromaeo_css/V8 - higher is better. Does the smiley take that into account? maybe also the change% at the middle should add a '*' for tests where higher is better?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 8

•

9 years ago

yes, we have a reverse tests coded into compare.py.  I agree on dropping the digits and making these whole numbers.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 9

•

9 years ago

Attached patch updated with stddev/geomean, etc. (0.92) (obsolete) (deleted) — Details — Splinter Review

jmaher@jmaher-ThinkPad-X230:~/mozilla/talos/talos$ python compare.py --compare-e10s --rev eea6188b9b05 --pgo --verbose --branch Firefox --platform Linux --test tresize
   test		Master gmean stddev points	change	Test	gmean stddev points

Linux:
:) tresize           	   16 +/-  0 (48#)	[-20.4%]	   13 +/-  0 (3#)
jmaher@jmaher-ThinkPad-X230:~/mozilla/talos/talos$ python compare.py --compare-e10s --rev eea6188b9b05 --pgo --verbose --branch Firefox --platform Linux --test tresize --master-revision eea6188b9b05
   test		Master gmean stddev points	change	Test	gmean stddev points

Linux:
:) tresize           	   16 +/-  0 (3#)	[-20.8%]	   13 +/-  0 (3#)

Attachment #8571967 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 10

•

9 years ago

Attached patch new format for e10s with --master-revision and support for graphs (0.93) (obsolete) (deleted) — Details — Splinter Review

python compare.py --compare-e10s --rev eea6188b9b05 --pgo --verbose --branch Firefox --platform Win7 --test tresize --master-revision eea6188b9b05 --print-graph-url 
   test		Master gmean stddev points	change	Test	gmean stddev points

Win7:
:( tresize           	   14 +/-  0 (3#)	[1.8%]	   15 +/-  0 (3#)	http://goo.gl/YvHlcI

Attachment #8573316 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 11

•

9 years ago

Attached patch update compare.py that supports per revision comparison for e10s (1.0) (obsolete) (deleted) — Details — Splinter Review

this has been verified by avih to produce some useful results.

Attachment #8573324 - Attachment is obsolete: true

Attachment #8573375 - Flags: review?(wlachance)

William Lachance (:wlach)

Comment 12

•

9 years ago

Comment on attachment 8573375 [details] [diff] [review]
update compare.py that supports per revision comparison for e10s (1.0)

I don't really have a lot to say about this, if it works for you, by all means land it. :)

Attachment #8573375 - Flags: review?(wlachance) → review+

Avi Halachmi (:avih)

Comment 13

•

9 years ago

Attached patch bug1138953.v5.patch (deleted) — Details — Splinter Review

This only aligns the formatting compared to the previous patch.

Attachment #8573375 - Attachment is obsolete: true

Attachment #8573435 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 14

•

9 years ago

Comment on attachment 8573435 [details] [diff] [review]
bug1138953.v5.patch

Review of attachment 8573435 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

Attachment #8573435 - Flags: review?(jmaher) → review+

Avi Halachmi (:avih)

Comment 15

•

9 years ago

https://hg.mozilla.org/build/talos/rev/a3e616ae0e30

thanks ;)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 16

•

9 years ago

https://hg.mozilla.org/build/talos/rev/85995e968f39

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

9 years ago

Depends on: 1151466

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

9 years ago

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.