Closed Bug 1650043 Opened 4 years ago Closed 4 years ago

mach try auto runs xpcshell tests and android tests for browser-window-only changes (and doesn't run the right tests)

Categories

(Firefox Build System :: Task Configuration, defect)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1639164

People

(Reporter: Gijs, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Attached file: Tests run from mach try auto (deleted)

+++ This bug was initially created as a clone of Bug #1638485 +++

https://treeherder.mozilla.org/#/jobs?repo=try&revision=37f286d688c18ef52899f351ec3bedc63a3aa53a

I changed:

browser/base/content/browser-siteProtections.js
browser/components/controlcenter/content/protectionsPanel.inc.xhtml

along with a number of the browser-chrome tests and the head.js file in browser/base/content/test/protectionsUI/, and it ran the tasks listed in the attachment.

The xpcshell runs were not necessary, and neither were the android runs. It could probably do without running arm or asan, too (i.e., it ran too much).

On the flip side, the tests did not run on macOS or Windows opt/shippable. On all the Windows debug platforms, it ran 0 or just one mochitest-bc job, but that job ran a bunch of unrelated tests, and didn't run the affected tests... (from looking at https://firefoxci.taskcluster-artifacts.net/bSvFLQTyRzKLRlMnviB1Jw/0/public/logs/live_backing.log ). So it also didn't run enough tests...

Summary: mach try auto runs xpcshell tests and android tests for browser-window-only changes (and is confused about artifact vs. non-artifact builds) → mach try auto runs xpcshell tests and android tests for browser-window-only changes (and doesn't run the right tests)

In the future, could you file separate bugs for the separate problems you encounter? Since they are separate problems, having separate bugs makes it easier to have separate discussions and avoid losing track.

(In reply to :Gijs (he/him) from comment #0)

> The xpcshell runs were not necessary, and neither were the android runs. It could probably do without running arm or asan, too (i.e., it ran too much).
>
> On the flip side, the tests did not run on macOS or Windows opt/shippable. On all the Windows debug platforms, it ran 0 or just one mochitest-bc job, but that job ran a bunch of unrelated tests, and didn't run the affected tests... (from looking at https://firefoxci.taskcluster-artifacts.net/bSvFLQTyRzKLRlMnviB1Jw/0/public/logs/live_backing.log ). So it also didn't run enough tests...

The platform selection is still suboptimal, that's bug 1639164.

The test you modified was run in test-linux1804-64/debug-mochitest-browser-chrome-e10s-2, test-linux1804-64/debug-mochitest-browser-chrome-fis-e10s-1 and test-macosx1014-64/debug-mochitest-browser-chrome-e10s-2. We currently run manifests that have a high probability of failure (in this case it was selected with 99% confidence) on three separate platforms.

I'm going to resolve this as duplicate of bug 1639164, since AFAICS most of the problems you described are related to that.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE

(In reply to Marco Castelluccio [:marco] from comment #1)

> The test you modified was run in test-linux1804-64/debug-mochitest-browser-chrome-e10s-2, test-linux1804-64/debug-mochitest-browser-chrome-fis-e10s-1 and test-macosx1014-64/debug-mochitest-browser-chrome-e10s-2. We currently run manifests that have a high probability of failure (in this case it was selected with 99% confidence) on three separate platforms.

Not sure if you knew about this feature, so I'll mention it: on Treeherder, click the small filter icon (to the left of the number "1").
Two new fields will appear next to "Active Filters". Open the drop-down that says "select filter field" and choose "test path", then type the test path into the field that says "enter field value" and press "add". You'll see all the tasks that run the test.

(In reply to Marco Castelluccio [:marco] from comment #1)

> In the future, could you file separate bugs for the separate problems you encounter? Since they are separate problems, having separate bugs makes it easier to have separate discussions and avoid losing track.

Yeah, sorry, guilty as charged.

So I'm a bit confused. Is it expected that we built on Windows and then basically ran a bunch of unrelated tests, but not the tests that are actually modified in the commit? It would seem to me that at a minimum, if test files themselves are modified in the commit, those tests should be run on all applicable platforms... Is there already a separate bug on file about this?

Flags: needinfo?(mcastelluccio)

(In reply to :Gijs (he/him) from comment #3)

> So I'm a bit confused. Is it expected that we built on Windows and then basically ran a bunch of unrelated tests, but not the tests that are actually modified in the commit?

The ML tool is not perfect. In order to be able to catch a given percentage of regressions, it has to select some false positive tests.
It takes into account not only the specific files you are modifying, but also the directories, the components, and so on. To simplify, you can think of it as building heuristic rules from those characteristics.
If it only considered files, it'd probably have fewer false positives, but it wouldn't be able to catch as many regressions.

Currently, each selected test is then spread across multiple configurations according to its probability of failure. The test you modified was spread across the three configurations I listed above; some other tests (false positives) were spread across other configurations.
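As a rough illustration of the behaviour described above (not the actual scheduler code), the selection and spreading could be sketched as follows. The thresholds, the round-robin assignment, the function name `schedule`, and the sample manifest names used with it are all assumptions made for this sketch; only the "three platforms for high-probability manifests" figure comes from this bug.

```python
from typing import Dict, List

# Hypothetical values: the real thresholds are not stated in this bug.
SELECT_THRESHOLD = 0.7           # below this, a manifest is not scheduled at all
HIGH_CONFIDENCE_THRESHOLD = 0.9  # at or above this, the manifest counts as "high probability"
HIGH_CONFIDENCE_CONFIGS = 3      # from comment 1: such manifests run on three platforms


def schedule(predictions: Dict[str, float], configs: List[str]) -> Dict[str, List[str]]:
    """Map each selected manifest to the configurations it would run on.

    `predictions` maps a manifest path to the predicted probability of failure.
    Configurations are handed out round-robin, which is an assumption of this
    sketch rather than the documented behaviour.
    """
    plan: Dict[str, List[str]] = {}
    next_config = 0
    for manifest, confidence in sorted(predictions.items()):
        if confidence < SELECT_THRESHOLD:
            continue  # not selected
        wanted = HIGH_CONFIDENCE_CONFIGS if confidence >= HIGH_CONFIDENCE_THRESHOLD else 1
        count = min(wanted, len(configs))
        plan[manifest] = [configs[(next_config + i) % len(configs)] for i in range(count)]
        next_config = (next_config + count) % len(configs)
    return plan
```

Under these assumptions, a manifest predicted at 0.99 lands on three configurations, one predicted at 0.75 lands on a single configuration, and one predicted at 0.2 is not scheduled. That would be consistent with the modified test running on only three of the available platforms while other platforms ran different (false positive) tests.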

> It would seem to me that at a minimum, if test files themselves are modified in the commit, those tests should be run on all applicable platforms... Is there already a separate bug on file about this?

There is not, but I'm not sure we want that. There are a lot of different configurations (double digits); if we have reason to believe the test is platform-independent, it's not worth running it on all configurations.

Flags: needinfo?(mcastelluccio)

(In reply to Marco Castelluccio [:marco] from comment #4)

> The ML tool is not perfect. In order to be able to catch a given percentage of regressions, it has to select some false positive tests.

That said, I think we can increase the current threshold, since according to the data we have, we would schedule far fewer tasks and lose only a little accuracy.
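The trade-off mentioned here can be illustrated with a small sketch. It assumes a list of past predictions, each a (confidence, actually failed) pair, and measures "accuracy" as the share of real failures that would still have been scheduled; both the data layout and that choice of metric are assumptions for illustration, not the evaluation actually used.

```python
from typing import List, Tuple


def evaluate(history: List[Tuple[float, bool]], threshold: float) -> Tuple[int, float]:
    """Return (number of tests scheduled, share of real failures caught).

    `history` holds hypothetical past predictions as (confidence, actually_failed).
    """
    selected = [failed for confidence, failed in history if confidence >= threshold]
    total_failures = sum(1 for _, failed in history if failed)
    caught = sum(1 for failed in selected if failed)
    recall = caught / total_failures if total_failures else 1.0
    return len(selected), recall


# Made-up data, purely to show the shape of the trade-off.
history = [
    (0.99, True), (0.95, True), (0.85, False), (0.80, True),
    (0.60, False), (0.55, False), (0.50, False), (0.40, True),
]

for threshold in (0.5, 0.8, 0.9):
    scheduled, recall = evaluate(history, threshold)
    print(f"threshold={threshold}: scheduled={scheduled}, failures caught={recall:.0%}")
```

With this made-up data, raising the threshold from 0.5 to 0.8 schedules 4 tests instead of 7 while still catching the same share of failures; raising it further to 0.9 starts to miss failures.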
