811779 - Expand set of reftests running on m-i/m-c/try

Assignee

Description

•

12 years ago

We are currently running b2g emulator reftest-sanity tests on the core branches. We should expand the set of reftests being run to all passing tests.

Andrew Halberstadt [:ahal]

Assignee

Comment 1

•

12 years ago

Attached patch: Base patch (obsolete) (deleted)

From August to October I triaged most of the failing/random reftests that cropped up and ended up with this patch. It's been a while since I ran it, so there are probably new failures by now.

I plan on checking it in to cedar and trying to get a stable green run again.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Depends on: 811783

Andrew Halberstadt [:ahal]

Assignee

Comment 2

•

12 years ago

Pushed to cedar: https://hg.mozilla.org/projects/cedar/rev/a0b15032b295

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

12 years ago

Most of the chunks are getting killed because they take more than an hour to run. On my desktop they took around 20 minutes, but I guess the slaves aren't as powerful.

We'll want to:
1) Figure out how to speed them up
2) Use more chunks
3) Possibly get the emulators running on mac where wait times aren't as high

This is all outside the scope of this bug. It'll be a slow process.

Aki Sasaki (not active)

Comment 4

•

12 years ago

I overwrote local changes on Cedar in the latest merge: https://hg.mozilla.org/projects/cedar/diff/5b7cce7a7f1b/layout/reftests/font-inflation/reftest.list

Andrew Halberstadt [:ahal]

Assignee

Comment 5

•

12 years ago

For posterity, here are :cjones' rankings of B2G reftest importance:

== Critical ==
I wouldn't consider shipping a phone without knowing the exact state
of these tests.  In order of importance.

crashtests
layout/reftests/reftest-sanity
layout/reftests/bugs
layout/reftests/invalidation


== High priority ==
Tests that are critical to the project and for which desktop/android
coverage is *not* mostly sufficient.  In no particular order.

content/canvas/test/reftest
image/test/reftest
gfx/tests/reftest
layout/reftests/position-dynamic-changes
layout/reftests/text
layout/reftests/canvas
layout/reftests/svg/smil
layout/reftests/svg/as-image
layout/reftests/font-inflation
layout/reftests/transform
layout/reftests/image
layout/reftests/scrolling
layout/reftests/forms
layout/reftests/css-gradients
layout/reftests/ogg-video
layout/reftests/transform-3d
layout/reftests/layers
layout/reftests/flexbox
layout/reftests/webm-video
layout/reftests/selection
layout/reftests/css-selectors
layout/reftests/css-calc
layout/reftests/font-face


== Normal priority ==
Tests that we should run but for which desktop/android coverage *is*
mostly sufficient.  In no particular order.

content/html/content/reftests
content/test/reftest
layout/reftests/border-radius
layout/reftests/cssom
layout/reftests/text-shadow
layout/reftests/columns
layout/reftests/list-item
layout/reftests/table-width
layout/reftests/css-ui-valid
layout/reftests/css-optional
layout/reftests/box-sizing
layout/reftests/bidi
layout/reftests/font-matching
layout/reftests/table-background
layout/reftests/text-indent
layout/reftests/marquee
layout/reftests/image-element
layout/reftests/indic-shaping
layout/reftests/line-breaking
layout/reftests/datalist
layout/reftests/css-transitions
layout/reftests/svg
layout/reftests/css-visited
layout/reftests/css-charset
layout/reftests/table-dom
layout/reftests/counters
layout/reftests/css-parsing
layout/reftests/unicode
layout/reftests/text-decoration
layout/reftests/box-ordinal
layout/reftests/abs-pos
layout/reftests/table-overflow
layout/reftests/css-placeholder
layout/reftests/css-default
layout/reftests/text-transform
layout/reftests/text-overflow
layout/reftests/pagination
layout/reftests/image-rect
layout/reftests/z-index
layout/reftests/percent-overflow-sizing
layout/reftests/object
layout/reftests/font-features
layout/reftests/image-region
layout/reftests/inline-borderpadding
layout/reftests/css-disabled
layout/reftests/pixel-rounding
layout/reftests/native-theme
layout/reftests/box-shadow
layout/reftests/table-bordercollapse
layout/reftests/floats
layout/reftests/css-import
layout/reftests/text-svgglyphs
layout/reftests/generated-content
layout/reftests/table-anonymous-boxes
layout/reftests/w3c-css
layout/reftests/first-line
layout/reftests/box-properties
layout/reftests/css-mediaqueries
layout/reftests/css-valid
layout/reftests/css-invalid
layout/reftests/first-letter
layout/reftests/css-ui-invalid
layout/reftests/box
layout/reftests/css-enabled
layout/reftests/backgrounds
layout/reftests/ib-split
layout/reftests/tab-size
layout/reftests/border-image
layout/reftests/css-submit-invalid
layout/reftests/margin-collapsing
layout/reftests/dom
layout/reftests/css-required
layout/reftests/css-valuesandunits
layout/reftests/mathml
parser/htmlparser/tests/reftest
editor/reftests
netwerk/test/reftest
toolkit/content/tests/reftests
widget/reftests

== Completely worthless ==
(Just a waste of CPU cycles, please don't run.)

dom/plugins/test/reftest
editor/reftests/xul
layout/reftests/printing
layout/reftests/xul
layout/reftests/xul-document-load
layout/xul
toolkit/themes/pinstripe/reftests

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

12 years ago

I disabled everything except the critical and high priority tests on cedar. Unfortunately, because chunking doesn't take skipped tests into account and the skipped tests are unevenly distributed, some chunks are still timing out: they still run 1000+ tests while other chunks run only about 100-200.
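A rough model of the problem described in this comment, not the harness's actual chunking code. The function names and the test data are hypothetical; the point is only that splitting the full list first and dropping skipped tests afterwards leaves the chunks uneven, while filtering first evens them out.

```python
def chunk(tests, total_chunks):
    """Split tests into total_chunks contiguous slices of near-equal length."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def runnable(tests):
    """Keep only the tests that are not marked as skipped."""
    return [t for t in tests if not t["skip"]]


# Hypothetical manifest: 4000 tests, of which the last 3000 are skipped.
tests = [{"name": f"test-{i}", "skip": i >= 1000} for i in range(4000)]

# Chunk first, filter afterwards: all the real work lands in one chunk.
before = [len(runnable(c)) for c in chunk(tests, 4)]

# Filter first, chunk afterwards: the work is spread evenly.
after = [len(c) for c in chunk(runnable(tests), 4)]

print(before)  # [1000, 0, 0, 0]
print(after)   # [250, 250, 250, 250]
```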

Andrew Halberstadt [:ahal]

Assignee

Comment 7

•

12 years ago

The easiest solution would probably be to create a separate reftest_b2g.list root manifest. This way we wouldn't technically be skipping everything and all chunks would see an even distribution of tests.
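As a sketch of what generating such a root manifest could look like: the script below emits one `include` line per directory, using the critical list from comment 5 (crashtests are left out since they are a separate suite). The helper name, the header comment, and the assumption that each directory has its own reftest.list and that the manifest lives in layout/reftests are illustrative only.

```python
import posixpath

# Directories from the "Critical" ranking in comment 5.
CRITICAL = [
    "layout/reftests/reftest-sanity",
    "layout/reftests/bugs",
    "layout/reftests/invalidation",
]


def root_manifest(directories, relative_to="layout/reftests"):
    """Build manifest text with one include line per test directory.

    Paths are written relative to the directory that would hold the
    root manifest (assumed here to be layout/reftests).
    """
    lines = ["# B2G-only root manifest (illustrative)"]
    for directory in directories:
        path = posixpath.relpath(directory, relative_to)
        lines.append(f"include {path}/reftest.list")
    return "\n".join(lines) + "\n"


print(root_manifest(CRITICAL))
```

Because only the wanted directories are included, nothing in this manifest is skipped, so the chunker would see only tests that actually run.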

cmtalbert

Comment 8

•

12 years ago

Yes, for now, comment 7 is the way forward. The chunking problem itself is being worked on in bug 818156.

Depends on: 818156

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Depends on: 820958

Andrew Halberstadt [:ahal]

Assignee

Comment 9

•

12 years ago

Attached patch: Patch 1.0 - Enable larger set of b2g reftests (deleted)

I'm fairly confident that the set of reftests enabled by this patch is green enough on cedar to get the ball rolling.

Notes:
* the two important files to look at are layout/reftests/reftest.list, which shows the overall set of tests that will be run, and layout/tools/reftest/runreftestb2g.py, where I had to turn off <iframe mozbrowser> because bug 785074 causes tons of additional failures
* I used skip-if instead of random or fails-if to avoid unnecessary test slave load
* after landing this patch we'll need to update the mozharness configs to point to the root manifest instead of reftest-sanity
* if you'd rather I create a separate root manifest for B2G as opposed to skip-if'ing everything in the main one, I can attach a new patch

There are obviously still some fundamental problems with the reftest harness on B2G and this patch isn't going to make anyone happy (including myself). But at the end of the day it will put us in a better position in terms of test coverage than we are currently.

Attachment #681547 - Attachment is obsolete: true

Attachment #699322 - Flags: review?(jgriffin)

Attachment #699322 - Flags: feedback?(jones.chris.g)

Jonathan Griffin (:jgriffin)

Comment 10

•

12 years ago

Comment on attachment 699322 [details] [diff] [review]
Patch 1.0 - Enable larger set of b2g reftests

Review of attachment 699322 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like we're still getting the odd random orange on cedar; I guess we can cover those with new bugs, or skip those as well if they become too frequent.

Attachment #699322 - Flags: review?(jgriffin) → review+

Andrew Halberstadt [:ahal]

Assignee

Comment 11

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d932f2172ce2

Whiteboard: [automation-needed-in-aurora][automation-needed-in-b2g18]

Ed Morley [:emorley]

Comment 12

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla21

Ed Morley [:emorley]

Comment 13

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2

Ed Morley [:emorley]

Comment 14

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2

Andrew Halberstadt [:ahal]

Assignee

Comment 15

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b

This patch applied cleanly to aurora, but there were massive differences on the b2g-18 branch. I don't think that merging it by hand will produce a green test run anyway, so I'd advocate not turning these tests on there and waiting for the next merge.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Whiteboard: [automation-needed-in-aurora][automation-needed-in-b2g18]

Andrew Halberstadt [:ahal]

Assignee

Comment 17

•

12 years ago

I made a typo when merging the root manifest to aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/a6a8dd94822b

Jonathan Griffin (:jgriffin)

Comment 18

•

12 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #15)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b
> 
> This patch applied cleanly to aurora, but there were massive differences on
> the b2g-18 branch. I don't think that merging it by hand will produce a
> green test run anyway, so I'd advocate not turning these tests on there and
> waiting for the next merge.

There aren't any planned merges to b2g18.  We may need to bite the bullet and hide the tests on b2g18 until we can exclude all of the failures there.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 19

•

12 years ago

Comment on attachment 699322 [details] [diff] [review]
Patch 1.0 - Enable larger set of b2g reftests

Sorry, this f? hit me at a really bad crunch time.  I didn't look through all the manifest changes but it's usually bad form to disable tests without a bug to re-enable or comment explaining why.

Attachment #699322 - Flags: feedback?(jones.chris.g)

Andrew Halberstadt [:ahal]

Assignee

Comment 20

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #19)
> Sorry, this f? hit me at a really bad crunch time

No worries, I mostly just wanted you to be aware of this horrible patch and the fact that more tests are running.

> it's usually bad form to disable tests without
> a bug to re-enable or comment explaining why.

Agreed, though:

A) We are running 10 chunks at 30+ minutes each for over 5 hours of B2G reftest per push. Realistically these tests are never coming back on with emulators as we just don't have capacity for even this much. When we switch to pandaboards I'll re-enable everything, re-triage on pandas and emulator reftests will be phased out.

B) There are so many failures (possibly in the thousands) that I don't know if it is harness, platform, emulator or test related. The best I could do is comment with a tracking bug which isn't much more useful than nothing at all.

Daniel Holbert [:dholbert]

Comment 21

•

11 years ago

Are there any tracking bugs filed on further-increasing the number of reftests that are run on B2G?

We've got a frightening number of entire subdirectories marked as "skip-if(B2G)", from this bug's changeset - this part in particular, tweaking the toplevel reftest.list file:
 http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/reftest.list

(I'm assuming the situation was worse beforehand, but this still leaves us in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans to get better. :))

Jonathan Griffin (:jgriffin)

Comment 22

•

11 years ago

ahal, can you answer dholbert?

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Assignee

Comment 23

•

11 years ago

(In reply to Daniel Holbert [:dholbert] from comment #21)
> Are there any tracking bugs filed on further-increasing the number of
> reftests that are run on B2G?
> 
> We've got a frightening number of entire subdirectories marked as
> "skip-if(B2G)", from this bug's changeset - this part in particular,
> tweaking the toplevel reftest.list file:
>  http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/reftest.list
> 
> (I'm assuming the situation was worse beforehand, but this still leaves us
> in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans
> to get better. :))

Yes, I'd really like to get more tests enabled as well. The main problem is that reftests can't be run on the Ubuntu AWS VMs (bug 818968), so we have to keep them running on actual hardware. Combined with the fact that they are *very* slow on the emulators (~40 minutes for 500 tests), we can't just wholesale enable them without getting long test backups.

That being said, since we've moved other tests off of the Fedora pool, we do have a bit of spare capacity to enable some more tests. I've blogged/posted to dev.b2g about this in the past but no one seemed interested.

To answer your question, there aren't any bugs filed to enable specific swathes of tests, but if you would like to file some, feel free to make them block the 'b2g-reftest' main tracking bug. Or if you want to give me a list of the currently disabled tests that you think would be most useful to enable, I'd be happy to enable them when I have a few spare cycles.

Flags: needinfo?(ahalberstadt)

Phil Ringnalda (:philor)

Comment 24

•

11 years ago

Not really 40 minutes for 500 tests; it's actually more like 25 minutes for 500 tests, plus 15 minutes of setup/teardown time per chunk. We could easily add another 1000 tests and not lose anything, just by switching from 10 chunks to 5.
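The arithmetic behind that estimate, worked through with the figures quoted in this comment. The current total of 5000 tests (10 chunks of roughly 500) is an assumption inferred from the thread, not a measured number.

```python
RUN_MIN_PER_500 = 25         # run time for 500 tests, per this comment
OVERHEAD_MIN_PER_CHUNK = 15  # setup/teardown per chunk, per this comment


def total_minutes(num_tests, num_chunks):
    """Total machine time across all chunks, in minutes."""
    run = num_tests * RUN_MIN_PER_500 / 500
    return run + num_chunks * OVERHEAD_MIN_PER_CHUNK


def per_chunk_minutes(num_tests, num_chunks):
    """Duration of a single chunk, assuming tests are split evenly."""
    return (num_tests / num_chunks) * RUN_MIN_PER_500 / 500 + OVERHEAD_MIN_PER_CHUNK


# Assumed current state: 5000 tests in 10 chunks.
print(total_minutes(5000, 10))     # 400.0
# 1000 more tests, but only 5 chunks.
print(total_minutes(6000, 5))      # 375.0
# Each individual chunk gets longer, though.
print(per_chunk_minutes(5000, 10)) # 40.0
print(per_chunk_minutes(6000, 5))  # 75.0
```

Under these assumptions, halving the chunk count saves 75 minutes of setup/teardown, which more than pays for the 50 minutes needed to run 1000 extra tests. The trade-off is that each chunk runs longer.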
