Closed Bug 1399401 Opened 7 years ago Closed 6 years ago

Upgrade all win7/win10 gecko workers to generic-worker 10.7.8

Categories

(Taskcluster :: Services, enhancement)

Product:

Component:

Type:

enhancement

Priority:

Not set

Severity:

normal

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla61

People

(Reporter: pmoore, Assigned: pmoore)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Screenshot gecko-t-win7-32-gpu-b showing drive mounted as E: instead of Y: 7 years ago Pete Moore [:pmoore][:pete] (deleted), image/png		Details
Escape backslashes in GCLI screenshot test 7 years ago J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow) (deleted), patch	pmoore : review+	Details \| Diff \| Splinter Review
Github Pull Request for OpenCloudConfig 7 years ago Pete Moore [:pmoore][:pete] (deleted), text/x-github-pull-request	grenade : review+	Details
gecko patch: enable coalescing on win7/win10 worker types 6 years ago Pete Moore [:pmoore][:pete] (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review

Pete Moore [:pmoore][:pete]

Assignee

Description

•

7 years ago

These worker types currently run outdated versions of generic-worker:

> gecko-t-win7-32: generic-worker 8.2.0
> gecko-t-win7-32-gpu: generic-worker 8.2.0
> gecko-t-win10-64: generic-worker 8.3.0

We should upgrade these worker types to 10.2.2 for the following benefits:

* all tasks user-sandboxed (dedicated user for each task, which is deleted after the task completes)
* more secure (no access to secrets on machines)
* more realistic environment (winlogon session belonging to a regular user, with full dedicated desktop environment)
* tasks cannot (intentionally or accidentally) interfere with each other
* latest features available
* includes several bug fixes and logging/monitoring improvements
* avoid needing to maintain two different branches of generic-worker

Currently when upgrading we hit these failures:

> Windows 7 opt:
> tc-M-e10s(5 bc1)
>
> Windows 7 debug:
> tc-M-e10s(5 bc5) tc-M(5 bc1 bc7)
>
> windows7-32-stylo-disabled opt:
> tc-M-e10s(5 bc2)
>
> windows7-32-stylo-disabled debug:
> tc-M-e10s(5 bc2)

> Windows 10 x64 opt:
> tc-X(X)
>
> Windows 10 x64 debug:
> tc-X(X) tc-M-e10s(5)
>
> windows10-64-stylo-disabled opt:
> tc-M-e10s(5)
>
> windows10-64-stylo-disabled debug:
> tc-M-e10s(5)

Fixing these failures will allow us to roll out new worker features to these worker types.

This bug originates from https://bugzilla.mozilla.org/show_bug.cgi?id=1382204#c59

Pete Moore [:pmoore][:pete]

Assignee

Comment 1

•

7 years ago

Add this commit to your try push to switch to generic-worker 10.2.2: * https://hg.mozilla.org/try/raw-rev/835faadcf252b3b019476c713ab5459ccc6af951

Pete Moore [:pmoore][:pete]

Assignee

Comment 3

•

7 years ago

Hi Joel, Is this something you can help me with? Many thanks, Pete

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

7 years ago

:pmoore, you can followup with :mattn for the alert/dialog/notification failures and :rstrong for the xpcshell failures related to installation/updating. I could help after the migration, but as it stands many on our team are at full capacity for the rest of the month.

Flags: needinfo?(jmaher)

Pete Moore [:pmoore][:pete]

Assignee

Comment 5

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #4)
> :pmoore, you can followup with :mattn for the alert/dialog/notification
> failures and :rstrong for the xpcshell failures related to
> installation/updating.
>
> I could help after the migration, but as it stands many on our team are at
> full capacity for the rest of the month.

Matt, could you help me diagnose these alert/dialog/notification failures?

Robert, could you help me diagnose the xpcshell failures?

Many thanks guys!

Pete

Flags: needinfo?(robert.strong.bugs)

Flags: needinfo?(MattN+bmo)

Pete Moore [:pmoore][:pete]

Assignee

Comment 6

•

7 years ago

For convenience I've made a new try push from latest mozilla-central revision: * https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa0ec3f9f9c876dc0ec7b8d8237d86f206dcbb51 I just did this by applying the patch from comment 1 against the latest mozilla-central revision (893fe1549e1e).

Matthew N. [:MattN]

Updated

•

7 years ago

Depends on: 1398491

Pete Moore [:pmoore][:pete]

Assignee

Comment 7

•

7 years ago

I spoke to Robert over IRC and he kindly pointed me to https://bugzilla.mozilla.org/show_bug.cgi?id=1067756#c21 That would explain it, because after the worker upgrade, task users would not have write access to these directories.

Flags: needinfo?(robert.strong.bugs)

Pete Moore [:pmoore][:pete]

Assignee

Comment 8

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> I spoke to Robert over IRC and he kindly pointed me to
> https://bugzilla.mozilla.org/show_bug.cgi?id=1067756#c21
>
> That would explain it, because after the worker upgrade, task users would
> not have write access to these directories.

Rebuilding win7/win10 beta worker types in OpenCloudConfig:

* https://tools.taskcluster.net/groups/L6NZFeUNSlabxdmqJSgcPw

Thanks Robert!

Pete Moore [:pmoore][:pete]

Assignee

Comment 9

•

7 years ago

https://github.com/mozilla-releng/OpenCloudConfig/commit/b20fe867f48dde8dd040b339eef21047ca5b728d

Pete Moore [:pmoore][:pete]

Assignee

Comment 10

•

7 years ago

rstrong has also just highlighted to me that the privilege granted in bug 1353889 for the GenericWorker account will need to be granted to the task users e.g. for gecko-t-win10-64-beta that would be: https://github.com/mozilla-releng/OpenCloudConfig/blob/b20fe867f48dde8dd040b339eef21047ca5b728d/userdata/Manifest/gecko-t-win10-64-beta.json#L1147

Pete Moore [:pmoore][:pete]

Assignee

Comment 11

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> For convenience I've made a new try push from latest mozilla-central
> revision:
>
> * https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa0ec3f9f9c876dc0ec7b8d8237d86f206dcbb51
>
> I just did this by applying the patch from comment 1 against the latest
> mozilla-central revision (893fe1549e1e).

gecko-1-b-win2012-beta was broken, hopefully fixed now with:

https://github.com/mozilla-releng/OpenCloudConfig/commit/bae0c1c78410197cb64a2e9e122b37e6e515255e

When the rollout of the new AMIs completes, we should be able to retrigger those broken builds.

AMI rollouts are happening in:

* https://tools.taskcluster.net/groups/bv4SHLSIT3eEsKUbriOoVw/tasks/LjPipx7TQTKl9IuMaACpZw/runs/0

Pete Moore [:pmoore][:pete]

Assignee

Comment 12

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #10)
> rstrong has also just highlighted to me that the privilege granted in bug
> 1353889 for the GenericWorker account will need to be granted to the task
> users
>
> e.g. for gecko-t-win10-64-beta that would be:
> https://github.com/mozilla-releng/OpenCloudConfig/blob/b20fe867f48dde8dd040b339eef21047ca5b728d/userdata/Manifest/gecko-t-win10-64-beta.json#L1147

Applied to beta worker types:

* https://github.com/mozilla-releng/OpenCloudConfig/commit/02a45d1e675b25f95bf764b095fa037181d5aa2b

Pete Moore [:pmoore][:pete]

Assignee

Comment 13

•

7 years ago

Rollout: * https://tools.taskcluster.net/groups/APKkrkSaRXG8pbg4izPsFA

Pete Moore [:pmoore][:pete]

Assignee

Comment 14

•

7 years ago

New push with fixes: * https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdd9ec4a5222cf3a3b82db8d155015e655fbc986

Pete Moore [:pmoore][:pete]

Assignee

Comment 15

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #14)
> New push with fixes:
>
> * https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdd9ec4a5222cf3a3b82db8d155015e655fbc986

This hasn't fixed the tc-X(X) tests yet. I'm now checking whether the change to make the 'Mozilla Maintenance Service' folder writable to task users worked, using this test task:

* https://tools.taskcluster.net/groups/OO_g0scBShmyWdRDzYQ7mg/tasks/OO_g0scBShmyWdRDzYQ7mg/details

Pete Moore [:pmoore][:pete]

Assignee

Comment 16

•

7 years ago

Backslash escaping syntax mistake in previous task, new task: * https://tools.taskcluster.net/groups/Z_FR3TFAQ-axtTqdVOV93w/tasks/Z_FR3TFAQ-axtTqdVOV93w/details

Pete Moore [:pmoore][:pete]

Assignee

Comment 17

•

7 years ago

Updated task:

* https://tools.taskcluster.net/groups/DlCMdhvqS5us3EIgcm6Fiw/tasks/DlCMdhvqS5us3EIgcm6Fiw/runs/0/logs/public%2Flogs%2Flive.log

Looks like my change[1] didn't work for some reason:

Z:\task_1505581330>echo hellooo 1>"C:\Program Files (x86)\Mozilla Maintenance Service\hello.txt"
Access is denied.
[taskcluster 2017-09-16T17:05:27.754Z] Exit Code: 1

---
[1] https://github.com/mozilla-releng/OpenCloudConfig/commit/b20fe867f48dde8dd040b339eef21047ca5b728d
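The `echo ... 1>file` check above is a write probe: try to create a file in the directory and see whether the OS refuses. A minimal sketch of the same probe in Python follows. The function name `can_write` is hypothetical and not part of generic-worker or OpenCloudConfig; the probe only reports whether the current user can create a file, not why access was denied.

```python
import os
import tempfile


def can_write(directory):
    """Return True if the current user can create (and remove) a file in directory."""
    try:
        # mkstemp raises an OSError subclass (PermissionError, FileNotFoundError, ...)
        # when the file cannot be created.
        fd, path = tempfile.mkstemp(prefix="write-probe-", dir=directory)
    except OSError:
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    # Directory taken from the task log above; on other machines it may not exist,
    # in which case the probe simply reports False.
    target = r"C:\Program Files (x86)\Mozilla Maintenance Service"
    print(target, "writable:", can_write(target))
```

Run inside a task, this would report the access a task user has, which is the case that matters here.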

Pete Moore [:pmoore][:pete]

Assignee

Comment 18

•

7 years ago

I still haven't got around to further investigating the issue in comment 17, and I'll be out for a few days now. Rob, if you get the time to look into this, it would be awesome, otherwise I can have a look when I'm back next week. Basically, I made a change to a manifest in OCC so that a directory is read/writable to Everyone, rolled everything out, but I can't write to that directory in a task. All the links are in comment 17. Like I say, I can also take a look when I'm back next week. Thanks guys!

Flags: needinfo?(rthijssen)

Robert Strong (they/them - no direct email)

Comment 19

•

7 years ago

iirc the build system Windows images have the maintenance service installed. Do these systems?

Rob Thijssen [:grenade (EET/UTC+0300)]

Comment 20

•

7 years ago

my guess is that the command isn't succeeding because of missing quotes around the arg with spaces in it.

eg this line (https://github.com/mozilla-releng/OpenCloudConfig/commit/b20fe867f48dde8dd040b339eef21047ca5b728d#diff-87907bc6a1f2a26aacbddd5425eea212R977):

reads:
"C:\\Program Files (x86)\\Mozilla Maintenance Service",

but should read:
"\"C:\\Program Files (x86)\\Mozilla Maintenance Service\"",
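To illustrate why the extra escaped quotes matter, here is a minimal sketch. It assumes the manifest argument ends up in a space-separated command line; how OpenCloudConfig actually assembles the command is not shown in this bug. `some-tool` and `split_command_line` are hypothetical names used only for the illustration.

```python
import json


def split_command_line(line):
    """Split on spaces that are outside double quotes; the quotes are kept."""
    tokens, current, in_quotes = [], "", False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == " " and not in_quotes:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


# The two manifest strings from the comment, as raw JSON text.
unquoted_json = r'"C:\\Program Files (x86)\\Mozilla Maintenance Service"'
quoted_json = r'"\"C:\\Program Files (x86)\\Mozilla Maintenance Service\""'

# JSON decoding removes the outer quotes; only the escaped inner quotes survive.
unquoted = json.loads(unquoted_json)
quoted = json.loads(quoted_json)

# "some-tool" is a placeholder command, not the one used in the manifest.
unquoted_tokens = split_command_line("some-tool " + unquoted)
quoted_tokens = split_command_line("some-tool " + quoted)

if __name__ == "__main__":
    print(unquoted_tokens)
    print(quoted_tokens)
```

With the first form the path is broken up at every space; with the second form it reaches the command as a single argument.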

Flags: needinfo?(rthijssen)

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1370877

Pete Moore [:pmoore][:pete]

Assignee

Comment 21

•

7 years ago

(In reply to Rob Thijssen (:grenade - UTC+3) from comment #20)
> my guess is that the command isn't succeeding because of missing quotes
> around the arg with spaces in it.
>
> eg this line
> (https://github.com/mozilla-releng/OpenCloudConfig/commit/b20fe867f48dde8dd040b339eef21047ca5b728d#diff-87907bc6a1f2a26aacbddd5425eea212R977):
>
> reads:
> "C:\\Program Files (x86)\\Mozilla Maintenance Service",
>
> but should read:
> "\"C:\\Program Files (x86)\\Mozilla Maintenance Service\"",

Thanks Rob! I'm trying a push with this now, to see if it helps. Thanks for looking. :)

https://github.com/mozilla-releng/OpenCloudConfig/commit/3357d441046aad0442f2032aed6d7cc78bf996c1

Pete Moore [:pmoore][:pete]

Assignee

Comment 22

•

7 years ago

\o/ it worked https://tools.taskcluster.net/groups/OYQXXYwcS02jjJ8CkLPIXg/tasks/OYQXXYwcS02jjJ8CkLPIXg/runs/0/logs/public%2Flogs%2Flive.log

Pete Moore [:pmoore][:pete]

Assignee

Comment 23

•

7 years ago

Retrying failed xpcshell tasks: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdd9ec4a5222cf3a3b82db8d155015e655fbc986&filter-searchStr=xpcshell&duplicate_jobs=visible

Pete Moore [:pmoore][:pete]

Assignee

Comment 24

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #23)
> Retrying failed xpcshell tasks:
>
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdd9ec4a5222cf3a3b82db8d155015e655fbc986&filter-searchStr=xpcshell&duplicate_jobs=visible

That fixed xpcshell! Now just the mochitests left.

Joel Maher ( :jmaher ) (UTC -8)

Comment 25

•

7 years ago

great stuff

Pete Moore [:pmoore][:pete]

Assignee

Comment 26

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #25)
> great stuff

All credit goes to rstrong and grenade for that one :-)

New try push against latest mozilla central head revision:

* https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3e3fd07da461fc449a0038c51b69db7392a2e2f&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded

Pete Moore [:pmoore][:pete]

Assignee

Comment 27

•

7 years ago

Mozilla central push this try push is based on: * https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=756e10aa8bbd416cbc49b7739f78fb81d5525477&filter-searchStr=windows

Pete Moore [:pmoore][:pete]

Assignee

Comment 28

•

7 years ago

Problem due to bug 1398748 - worked around by using C: drive for downloads and caches, and made new try push:

Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cf5af74253f36b7a7dfc31f59f524a691cf6483&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows

Based on mozilla-central push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=e1f660fc493bc4bf2f91b6df94bc98e8e3840c42&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows

Pete Moore [:pmoore][:pete]

Assignee

Comment 29

•

7 years ago

Hey Matthew, Would you be able to have a look at the latest try push, and let me know if there is anything new there that isn't already being managed in a different bug? https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cf5af74253f36b7a7dfc31f59f524a691cf6483&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable Many thanks! Pete

Joel Maher ( :jmaher ) (UTC -8)

Comment 30

•

7 years ago

there are a lot of failures which all seem to be related to popups/notifications/other windows. For example, bc1 already runs on taskcluster VM and passes as tier-1, but it is failing with this change. I verified in the log that we are running:

10:54:42 INFO - 17 INFO TEST-START | browser/base/content/test/alerts/browser_notification_do_not_disturb.js
10:54:45 INFO - GECKO(2968) | MEMORY STAT | vsize 1742MB | vsizeMaxContiguous 131597346MB | residentFast 260MB | heapAllocated 108MB
10:54:45 INFO - 18 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_do_not_disturb.js | took 3117ms
10:54:45 INFO - 19 INFO checking window state
10:54:45 INFO - 20 INFO TEST-START | browser/base/content/test/alerts/browser_notification_open_settings.js
10:54:47 INFO - GECKO(2968) | MEMORY STAT | vsize 1791MB | vsizeMaxContiguous 131597346MB | residentFast 311MB | heapAllocated 140MB
10:54:47 INFO - 21 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_open_settings.js | took 2163ms
10:54:47 INFO - 22 INFO checking window state
10:54:47 INFO - 23 INFO TEST-START | browser/base/content/test/alerts/browser_notification_remove_permission.js
10:54:48 INFO - GECKO(2968) | MEMORY STAT | vsize 1792MB | vsizeMaxContiguous 131597346MB | residentFast 312MB | heapAllocated 142MB
10:54:48 INFO - 24 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_remove_permission.js | took 785ms
10:54:48 INFO - 25 INFO checking window state
10:54:48 INFO - 26 INFO TEST-START | browser/base/content/test/alerts/browser_notification_replace.js
10:54:49 INFO - GECKO(2968) | MEMORY STAT | vsize 1792MB | vsizeMaxContiguous 131597346MB | residentFast 296MB | heapAllocated 117MB
10:54:49 INFO - 27 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_replace.js | took 462ms
10:54:49 INFO - 28 INFO checking window state
10:54:49 INFO - 29 INFO TEST-START | browser/base/content/test/alerts/browser_notification_tab_switching.js
10:54:49 INFO - GECKO(2968) | MEMORY STAT | vsize 1783MB | vsizeMaxContiguous 131597346MB | residentFast 278MB | heapAllocated 104MB
10:54:49 INFO - 30 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_tab_switching.js | took 759ms

but on your try push I see failures like this:

07:58:53 INFO - 2 INFO TEST-START | browser/base/content/test/alerts/browser_notification_open_settings.js
07:59:38 INFO - TEST-INFO | started process screenshot
07:59:38 INFO - TEST-INFO | screenshot: exit 0
07:59:38 INFO - Buffered messages logged at 07:58:53
07:59:38 INFO - 3 INFO Entering test bound test_settingsOpen_observer
07:59:38 INFO - 4 INFO Opening a dummy tab so openPreferences=>switchToTabHavingURI doesn't use the blank tab.
07:59:38 INFO - 5 INFO Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 174}]
07:59:38 INFO - 6 INFO simulate a notifications-open-settings notification
07:59:38 INFO - 7 INFO TEST-PASS | browser/base/content/test/alerts/browser_notification_open_settings.js | The notification settings tab opened -
07:59:38 INFO - Buffered messages logged at 07:58:54
07:59:38 INFO - 8 INFO Leaving test bound test_settingsOpen_observer
07:59:38 INFO - 9 INFO Entering test bound test_settingsOpen_button
07:59:38 INFO - 10 INFO Adding notification permission
07:59:38 INFO - 11 INFO Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 174}]
07:59:38 INFO - 12 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element ‘-moz-tree-line’. Ruleset ignored due to bad selector." {file: "chrome://global/content/xul.css" line: 654}]
07:59:38 INFO - 13 INFO Waiting for notification
07:59:38 INFO - Buffered messages finished
07:59:38 ERROR - 14 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/alerts/browser_notification_open_settings.js | Test timed out -
07:59:38 INFO - GECKO(3176) | MEMORY STAT | vsize 685MB | vsizeMaxContiguous 804MB | residentFast 195MB | heapAllocated 63MB
07:59:38 INFO - 15 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_open_settings.js | took 45078ms
07:59:38 INFO - Not taking screenshot here: see the one that was previously logged
07:59:38 ERROR - 16 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/alerts/browser_notification_open_settings.js | Found a tab after previous test timed out: http://example.org/browser/browser/base/content/test/alerts/file_dom_notifications.html -
07:59:38 INFO - 17 INFO checking window state
07:59:38 INFO - 18 INFO TEST-START | browser/base/content/test/alerts/browser_notification_remove_permission.js
08:00:23 INFO - Not taking screenshot here: see the one that was previously logged
08:00:23 INFO - Buffered messages logged at 07:59:38
08:00:23 INFO - 19 INFO Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 174}]
08:00:23 INFO - Buffered messages finished
08:00:23 ERROR - 20 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/alerts/browser_notification_remove_permission.js | Test timed out -
08:00:23 INFO - GECKO(3176) | MEMORY STAT | vsize 684MB | vsizeMaxContiguous 804MB | residentFast 195MB | heapAllocated 65MB
08:00:23 INFO - 21 INFO TEST-OK | browser/base/content/test/alerts/browser_notification_remove_permission.js | took 45328ms

given that, it looks like there is an issue with focus with the worker changes you are making :pete. Can you run the test on a loaner before/after your changes and watch what is going on? I suspect the answer might be obvious.

Flags: needinfo?(MattN+bmo)

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1403490

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

No longer blocks: 1403490

Dustin J. Mitchell [:dustin] (he/him)

Comment 31

•

7 years ago

There was some investigation of those notification-related things in bug 1364517.

Matthew N. [:MattN]

Comment 32

•

7 years ago

I'm hoping that attachment 8916804 [details] will fix any test failures related to browser/base/content/test/alerts/*. The issue being fixed is a race condition not specific to Windows 10 though so it may not be sufficient.

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1352791

Matthew N. [:MattN]

Comment 33

•

7 years ago

> Depends on: 1352791 FYI, even though that bug is still open, I landed a fix there for Windows 10. Is that bug still blocking this landing? Does this change make the Win7 failures worse?

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1373551

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Summary: Upgrade all win7/win10 gecko workers to generic-worker 10.2.2 or later → Upgrade all win7/win10 gecko workers to generic-worker 10.2.3

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1394557

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1343049

Joel Maher ( :jmaher ) (UTC -8)

Comment 34

•

7 years ago

Pete, what are the next steps on this bug?

Pete Moore [:pmoore][:pete]

Assignee

Comment 35

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #34)
> Pete, what are the next steps on this bug?

I've just landed https://github.com/mozilla-releng/OpenCloudConfig/pull/111 which updates all of our beta worker types to be identical to our production worker types, except for generic-worker version and configuration. I've also rigorously updated our worker type definitions in the aws provisioner to make sure the beta worker types also match the production versions.

When the OpenCloudConfig changes have propagated to AWS, I'll trigger a new try job using the beta worker types to see what issues remain. This is much like I did in previous comments - just refreshing to latest versions, and then triggering a new try push.

I suspect my try push will have to wait for tomorrow as it takes a couple of hours for all the changes to propagate, but hopefully by this time tomorrow we should have a new completed try push that we can evaluate.

Joel Maher ( :jmaher ) (UTC -8)

Comment 36

•

7 years ago

please do a --rebuild 20 on your try push, that will help get data on failure rates.

Pete Moore [:pmoore][:pete]

Assignee

Comment 37

•

7 years ago

So I've been having some problem getting the last beta worker type updated - gecko-t-win10-64-gpu-b - I'm going to have another try now - all my deploys until now have been failing due to either not being able to get instances, or the instances I had losing network connectivity (so it isn't possible to see what is going wrong). This could just be a case of bug 1372172 hitting us during AMI creation.

See e.g. last failed OCC task just now:

https://tools.taskcluster.net/groups/R4iy9VIJSB6tG1f4UwVgmA/tasks/DTdqlTWNRC-naG3mtCgitg/runs/0/logs/public%2Flogs%2Flive.log

That links to a papertrail log that stops outputting for 95 minutes, between 14:50:01 and 16:25:05 CET:

https://papertrailapp.com/groups/2488493/events?q=i-09768a467658be83e

Dec 01 14:48:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Dec 01 14:48:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is running.
Dec 01 14:48:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: instance appears to be initialising.
Dec 01 14:50:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Dec 01 14:50:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is running.
Dec 01 14:50:01 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com HaltOnIdle: instance appears to be initialising.
Dec 01 16:25:05 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com Microsoft-Windows-GroupPolicy: Shutdown script failed. GPO Name : Local Group Policy GPO File System Path : C:\Windows\System32\GroupPolicy\Machine Script Name: C:\scripts\set_user_data.ps1
Dec 01 16:25:05 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com Microsoft-Windows-Kernel-PnP: The driver \Driver\WudfRd failed to load for the device SWD\WPDBUSENUM\{70ffd6cb-3efa-11e7-9146-806e6f6e6963}#0000000000100000.
Dec 01 16:25:05 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com Service_Control_Manager: The CldFlt service failed to start due to the following error: The request is not supported.
Dec 01 16:25:05 i-09768a467658be83e.gecko-t-win10-64-gpu-b.usw2.mozilla.com OpenCloudConfig: Windows update service is running

The above log extract indicates a problem, since HaltOnIdle is scheduled to run every 2 minutes, which it does up until 14:50, and then for 95 minutes we have no logging until we see the machine is rebooted. This suggests the worker is still running, but either loses network connectivity or the papertrail integration breaks. I did not reboot the machine from the outside, so I presume it rebooted as part of the environment preparation it was internally performing.

The observant log reader may also notice a repeated message earlier in the log:

"An error occurred (InstanceLimitExceeded) when calling the RunInstances operation: You have requested more instances (2) than your current instance limit of 1 allows for the specified instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit."

This occurred multiple times in the log before the above extract, because there was another g3.4xlarge instance running in us-west-2, when we have a limit of 1. This other running instance was probably a runaway instance from a previously timed-out OCC task for a previous push. By terminating that rogue instance, I was able to get the task to continue. The price of this, though, was a delay that ate into the maxRunTime of the task. So the task might have been successful if it had been able to spawn a g3.4xlarge instance in us-west-2 immediately, rather than needing to wait for someone to terminate the running one.

For future reference, such a rogue instance can be seen under:

https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#Instances:search=g3.4xlarge;sort=instanceId

I've now ensured that there are no g3.4xlarge instances running in us-west-2, and made a new OCC push to try to rebuild gecko-t-win10-64-gpu-b again:

https://tools.taskcluster.net/groups/GMo_mDgUSZqubrB74-7rrA/tasks/Kz1LVo5fS5ak5U1nWjR0MQ/details

I may not be around when this completes, but if it does complete successfully, I have prepared a try patch here:

https://bugzilla.mozilla.org/show_bug.cgi?id=1400012#c9

that makes it trivial to run a try push against the beta worker types - so if anyone wants to make a try push once the above task completes successfully, this patch should do the trick.
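The gap in the HaltOnIdle output can also be found mechanically, by comparing consecutive timestamps against the expected 2-minute interval. A minimal sketch, using the three distinct timestamps from the extract in this comment; the year (2017) and the 30-second slack are assumptions made for illustration, since the log lines carry no year and the scheduling jitter is unknown.

```python
from datetime import datetime, timedelta

# Assumption: the log lines have no year; 2017 is used only to build a full date.
ASSUMED_YEAR = 2017

# Distinct timestamps copied from the papertrail extract in this comment.
STAMPS = ["Dec 01 14:48:01", "Dec 01 14:50:01", "Dec 01 16:25:05"]


def parse(stamp):
    """Parse a 'Dec 01 14:48:01' style timestamp into a datetime."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def find_gaps(stamps, expected=timedelta(minutes=2), slack=timedelta(seconds=30)):
    """Return (start, end, gap) for consecutive entries further apart than expected + slack."""
    times = sorted(parse(s) for s in stamps)
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > expected + slack]


if __name__ == "__main__":
    for start, end, gap in find_gaps(STAMPS):
        print(f"no log lines between {start:%H:%M:%S} and {end:%H:%M:%S} (gap {gap})")
```

The same check could be run over a full papertrail export to spot other silent periods on an instance.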

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1422870

Pete Moore [:pmoore][:pete]

Assignee

Comment 38

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #37)
> So I've been having some problem getting the last beta worker type updated -
> gecko-t-win10-64-gpu-b - I'm going to have another try now - all my deploys
> until now have been failing due to either not being able to get instances,
> or the instances I had losing network connectivity (so it isn't possible to
> see what is going wrong).

All problems updating gecko-t-win10-64-gpu-b have been solved now, so I have made a new try push here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=67581af6162e8c0dfaaa726c3fda298ef576a846&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable

Let's see how that goes! :-)

Pete Moore [:pmoore][:pete]

Assignee

Comment 39

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #38)
> (In reply to Pete Moore [:pmoore][:pete] from comment #37)
> > So I've been having some problem getting the last beta worker type updated -
> > gecko-t-win10-64-gpu-b - I'm going to have another try now - all my deploys
> > until now have been failing due to either not being able to get instances,
> > or the instances I had losing network connectivity (so it isn't possible to
> > see what is going wrong).
>
> All problems updating gecko-t-win10-64-gpu-b have been solved now, so I have
> made a new try push here:
>
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=67581af6162e8c0dfaaa726c3fda298ef576a846&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable
>
> Let's see how that goes! :-)

Some problems with Y: drive not getting mounted on some win 7 gpu jobs - but other than that, I think worth taking a look at already.

Flags: needinfo?(jmaher)

Pete Moore [:pmoore][:pete]

Assignee

Comment 40

•

7 years ago

Rob any idea what the Y: drive mounting problem on gecko-t-win7-32-gpu-b might be related to? Thanks!

Flags: needinfo?(rthijssen)

Pete Moore [:pmoore][:pete]

Assignee

Comment 41

•

7 years ago

Attached image Screenshot gecko-t-win7-32-gpu-b showing drive mounted as E: instead of Y: (deleted) — Details

From this screenshot from a problematic worker[1] we see that the Y: drive got mounted as E: instead of Y:

--
[1] https://tools.taskcluster.net/provisioners/aws-provisioner-v1/worker-types/gecko-t-win7-32-gpu-b/workers/us-east-1/i-03115ac894967073d

Assignee: nobody → pmoore

Status: NEW → ASSIGNED

Pete Moore [:pmoore][:pete]

Assignee

Comment 42

•

7 years ago

I spotted drive letters can be mapped in DriveLetterConfig.xml[1]. I think at the moment we are mounting drives in rundsc.ps1[2]. See EC2 docs[3] for details on the DriveLetterConfig.xml file.

If using DriveLetterConfig.xml works, could that be an alternative to mounting in rundsc.ps1?

I checked one of our instances, and saw the file exists, but contains no mappings at the moment:

Z:\task_1512497232>type "C:\Program Files\Amazon\Ec2ConfigService\Settings\DriveLetterConfig.xml"
<?xml version="1.0" standalone="yes"?>
<DriveLetterMapping>
</DriveLetterMapping>

--
[1] C:\Program Files\Amazon\Ec2ConfigService\Settings\DriveLetterConfig.xml
[2] https://github.com/mozilla-releng/OpenCloudConfig/blob/cebf4fc5888510550a09f1ccdcf0d4001d7c32ec/userdata/rundsc.ps1#L306-L367
[3] http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_WinAMI.html#UsingConfigInterface_WinAMI
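Checking whether the file holds any mappings can be scripted rather than eyeballed. A minimal sketch that parses the XML shown above and lists whatever child entries the root element contains. It deliberately makes no assumption about the names of the mapping elements, since none appear in this file; `list_mappings` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# File content exactly as printed by the `type` command in this comment.
DRIVE_LETTER_CONFIG = """<?xml version="1.0" standalone="yes"?>
<DriveLetterMapping>
</DriveLetterMapping>"""


def list_mappings(xml_text):
    """Return one (tag, {child tag: text}) pair per entry under the root element."""
    root = ET.fromstring(xml_text)
    return [
        (entry.tag, {field.tag: (field.text or "").strip() for field in entry})
        for entry in root
    ]


if __name__ == "__main__":
    mappings = list_mappings(DRIVE_LETTER_CONFIG)
    print(f"{len(mappings)} mapping(s) configured")
```

For the file above the list is empty, which matches the observation that no mappings are configured.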

Joel Maher ( :jmaher ) (UTC -8)

Comment 43

•

7 years ago

this is failing to get rid of xendpriv.exe (look at the logs for 'cl'). This is a hack I put in to allow the clipboard to run on the machines. We cannot remove the file due to access denied; I suspect you need to remove this in the setup, or fix the taskcluster worker to allow access to that file.

outside of this, many tests are failing for prompt/notification/multi-window. That is a common theme from earlier when looking at taskcluster workers for windows in the past.

Flags: needinfo?(jmaher)

Rob Thijssen [:grenade (EET/UTC+0300)]

Comment 44

•

7 years ago

most likely explanation is that gw started running before occ was able to assign the correct drive letter. we see in the logs that this is often the case on windows 2012 where we use newer gw. since the worker type experiencing the incorrect drive mappings is also running a newer gw, i suspect this is also the case here. when occ detects that gw has started before occ, it simply terminates itself (as a workaround to other issues experienced earlier) since we can't have both running.

note that using ec2 to assign the drive letters in DriveLetterConfig.xml will not be 100% effective as gw also doesn't wait for ec2config to complete before it starts up.

imo the best fix is to add a check inside gw to wait until occ has set the ready state flag before attempting to run tasks. there's simply nothing we can do in occ to get the drive mappings correct if gw starts before occ has run.
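The proposed fix amounts to a startup gate: the worker polls for a readiness signal and only claims tasks once it is present. A minimal sketch of such a gate, assuming the "ready state flag" is a file on disk. That is an assumption for illustration only; this bug does not say how occ exposes the flag, and `wait_for_ready` and its parameters are hypothetical, not generic-worker code.

```python
import os
import time


def wait_for_ready(flag_path, timeout=600.0, interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until flag_path exists; return True if it appeared, False on timeout."""
    deadline = clock() + timeout
    while True:
        if os.path.exists(flag_path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical flag location, used only for this illustration.
    flag = r"C:\occ-ready.flag"
    if wait_for_ready(flag, timeout=10.0, interval=1.0):
        print("environment ready, safe to start claiming tasks")
    else:
        print("timed out waiting for the ready flag")
```

A timeout is included so that a worker whose setup never completes can give up and report the failure instead of waiting forever.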

Flags: needinfo?(rthijssen)

Pete Moore [:pmoore][:pete]

Assignee

Comment 45

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #43)
> this is failing to get rid of xendpriv.exe (look at the logs for 'cl').
> This is a hack I put in the allow clipboard to run on the machines- we
> cannot remove the file due to access denied, I suspect you need to remove
> this in the setup, or fix the taskcluster worker to allow access to that
> file.
>
> outside of this, many tests are failing for
> prompt/notification/multi-window- that is a common theme from earlier when
> looking at taskcluster workers for windows in the past.

Indeed - it looks like this file is included on the workers. Rob, do you know why this file is there, and where it comes from? Should this whole "C:\Program Files\Citrix\XenTools" directory be there at all?

-----

[taskcluster 2017-12-06T11:26:35.251Z] Worker Type (gecko-t-win7-32-beta) settings:
[taskcluster 2017-12-06T11:26:35.251Z] {
[taskcluster 2017-12-06T11:26:35.251Z]   "aws": {
[taskcluster 2017-12-06T11:26:35.251Z]     "ami-id": "ami-18038d77",
[taskcluster 2017-12-06T11:26:35.251Z]     "availability-zone": "eu-central-1c",
[taskcluster 2017-12-06T11:26:35.251Z]     "instance-id": "i-00f0b6a2a91125f87",
[taskcluster 2017-12-06T11:26:35.251Z]     "instance-type": "c4.2xlarge",
[taskcluster 2017-12-06T11:26:35.251Z]     "local-ipv4": "10.147.50.97",
[taskcluster 2017-12-06T11:26:35.251Z]     "public-hostname": "ec2-18-194-58-53.eu-central-1.compute.amazonaws.com",
[taskcluster 2017-12-06T11:26:35.251Z]     "public-ipv4": "18.194.58.53"
[taskcluster 2017-12-06T11:26:35.251Z]   },
[taskcluster 2017-12-06T11:26:35.251Z]   "config": {
[taskcluster 2017-12-06T11:26:35.251Z]     "deploymentId": "bec3aef21ffa",
[taskcluster 2017-12-06T11:26:35.251Z]     "runTasksAsCurrentUser": false
[taskcluster 2017-12-06T11:26:35.251Z]   },
[taskcluster 2017-12-06T11:26:35.251Z]   "generic-worker": {
[taskcluster 2017-12-06T11:26:35.251Z]     "go-arch": "386",
[taskcluster 2017-12-06T11:26:35.251Z]     "go-os": "windows",
[taskcluster 2017-12-06T11:26:35.251Z]     "go-version": "go1.9",
[taskcluster 2017-12-06T11:26:35.251Z]     "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.3.1",
[taskcluster 2017-12-06T11:26:35.251Z]     "revision": "bc1ecb9aa266105bf8a936fa451bff4e2a35843e",
[taskcluster 2017-12-06T11:26:35.251Z]     "source": "https://github.com/taskcluster/generic-worker/tree/bc1ecb9aa266105bf8a936fa451bff4e2a35843e",
[taskcluster 2017-12-06T11:26:35.251Z]     "version": "10.3.1"
[taskcluster 2017-12-06T11:26:35.251Z]   },
[taskcluster 2017-12-06T11:26:35.251Z]   "machine-setup": {
[taskcluster 2017-12-06T11:26:35.251Z]     "ami-created": "2017-12-05 14:36:21.569Z",
[taskcluster 2017-12-06T11:26:35.251Z]     "manifest": "https://github.com/mozilla-releng/OpenCloudConfig/blob/bec3aef21ffac1363747d6d5dc49079be1b61d1c/userdata/Manifest/gecko-t-win7-32-beta.json"
[taskcluster 2017-12-06T11:26:35.251Z]   }
[taskcluster 2017-12-06T11:26:35.251Z] }
[taskcluster 2017-12-06T11:26:35.251Z] Task ID: a_J-nNAvT2S_g1HbHQnypg
[taskcluster 2017-12-06T11:26:35.251Z] === Task Starting ===
[taskcluster 2017-12-06T11:26:36.299Z] Uploading redirect artifact public/logs/live.log to URL https://clbduniaaaawak4uxpi4qn4c3mgrkwj5uxhp3xkefmvo3mhn.taskcluster-worker.net:60023/log/TorEb--jSeqsgkgKBjqzPw with mime type "text/plain; charset=utf-8" and expiry 2017-12-06T11:27:35.889Z
[taskcluster 2017-12-06T11:26:36.738Z] Executing command 0: dir "C:\Program Files\Citrix\XenTools\XenDPriv.exe"

Z:\task_1512559551>dir "C:\Program Files\Citrix\XenTools\XenDPriv.exe"
 Volume in drive C is OSDisk
 Volume Serial Number is FC62-2D8F

 Directory of C:\Program Files\Citrix\XenTools

04/08/2014 04:07 PM 12,288 XenDPriv.exe
 1 File(s) 12,288 bytes
 0 Dir(s) 11,514,023,936 bytes free
[taskcluster 2017-12-06T11:26:36.780Z] Exit Code: 0
[taskcluster 2017-12-06T11:26:36.780Z] Success Code: 0x0
[taskcluster 2017-12-06T11:26:36.780Z] User Time: 15.6001ms
[taskcluster 2017-12-06T11:26:36.780Z] Kernel Time: 0s
[taskcluster 2017-12-06T11:26:36.780Z] Wall Time: 30ms
[taskcluster 2017-12-06T11:26:36.780Z] Peak Memory: 2273280
[taskcluster 2017-12-06T11:26:36.780Z] Result: SUCCEEDED
[taskcluster 2017-12-06T11:26:36.780Z] === Task Finished ===
[taskcluster 2017-12-06T11:26:36.780Z] Task Duration: 42ms

Joel Maher ( :jmaher ) (UTC -8)

Comment 46

•

7 years ago

Pete, that is related to the xen vm toolchain that amazon uses for its workers. We need some of the Xen tools, but not that specific file which luckily works for fixing our clipboard problems.

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1394757

Pete Moore [:pmoore][:pete]

Assignee

Comment 47

•

7 years ago

I added the following to remove it from the golden AMIs (see https://github.com/mozilla-releng/OpenCloudConfig/commit/17db37e19674751ff1baacb9da438f494a148663):

+ {
+   "ComponentName": "DeleteXenDPriv.exe",
+   "ComponentType": "CommandRun",
+   "Comment": "See https://bugzilla.mozilla.org/show_bug.cgi?id=1399401#c43 and https://bugzilla.mozilla.org/show_bug.cgi?id=1394757",
+   "Command": "cmd.exe",
+   "Arguments": [
+     "/c",
+     "del",
+     "/f",
+     "/q",
+     "\"C:\\Program Files\\Citrix\\XenTools\\XenDPriv.exe\""
+   ],
+   "Validate": {
+     "PathsNotExist": [
+       "C:\\Program Files\\Citrix\\XenTools\\XenDPriv.exe"
+     ]
+   }
+ },

But the logs on a live worker show this isn't able to delete the file:

20171206164250-DeleteXenDPriv.exe-stderr.log
============================================
Access is denied.

Running attrib shows that the file is not read-only, which was my first thought about why we are not able to delete it:

Z:\task_1512656869>attrib "C:\Program Files\Citrix\XenTools\XenDPriv.exe"
A I C:\Program Files\Citrix\XenTools\XenDPriv.exe

Rob (:grenade) suggested it might be because the file is in use. I'll look more in depth into bug 1394757 to see how this file was deleted in the test setup code before.

Note deleting this file during test setup no longer works, because tests do not run as admin. Also deleting it during test setup is bad because it changes system state - i.e. tests running before a test that deletes this file could well behave differently to tests that run after this system file is deleted - therefore better for the file not to make it into a live environment in the first place, and for the test environment to be consistent between test runs - which is why I have chosen to delete it entirely from the worker environment in OpenCloudConfig.

Joel Maher ( :jmaher ) (UTC -8)

Comment 48

•

7 years ago

what we do in the test script is:
1) kill XenDPriv.exe
2) rename the file (but deleting is fine as well)

it will try to restart all the time if you just kill the process and the file exists.
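The kill-then-rename order described above could be sketched roughly as follows. This is only an illustration, assuming Node.js is available and the code runs with admin rights; it is not the code from the actual test script. The `disableExecutable` function, its `ops` parameter and the `.disabled` suffix are hypothetical names chosen for the example. `taskkill /F /IM` is the standard Windows command for force-killing a process by image name.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");

// Kill the process first, then rename the executable so that whatever keeps
// restarting it can no longer find it. `ops` is a hypothetical hook that lets
// callers substitute both operations (handy for a dry run); the defaults need
// admin rights on a Windows machine.
function disableExecutable(exePath, ops = {}) {
  const kill = ops.kill ||
    ((imageName) => execFileSync("taskkill", ["/F", "/IM", imageName]));
  const rename = ops.rename || fs.renameSync;

  const imageName = exePath.split(/[\\/]/).pop();
  try {
    kill(imageName);
  } catch (err) {
    // taskkill exits non-zero when no matching process is running; carry on.
  }
  const disabledPath = exePath + ".disabled"; // arbitrary suffix for this sketch
  rename(exePath, disabledPath);
  return disabledPath;
}

// Dry run that only records the order of operations.
const steps = [];
const renamedTo = disableExecutable(
  "C:\\Program Files\\Citrix\\XenTools\\XenDPriv.exe",
  {
    kill: (name) => steps.push("kill " + name),
    rename: (from, to) => steps.push("rename -> " + to),
  }
);
console.log(steps.join("\n"));
```

Doing the rename second matters: if the file still exists under its original name after the kill, the process is simply started again.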

Rob Thijssen [:grenade (EET/UTC+0300)]

Comment 49

•

7 years ago

pmoore: if this file exists on the base ami, we can either remove it there and bake a new base ami or put the logic to kill it with fire in this method: https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/rundsc.ps1#L105 i'd kind of like it in the remove-legacystuff method just so there's a record in source code that we are doing so deliberately but not too fussed since this bug is also a good record.

Pete Moore [:pmoore][:pete]

Assignee

Comment 50

•

7 years ago

(In reply to Rob Thijssen (:grenade UTC+2) from comment #49) > i'd kind of like it in the remove-legacystuff method just so there's a > record in source code that we are doing so deliberately but not too fussed > since this bug is also a good record. For now I've removed it in the manifest, but we can move it to rundsc.ps1 if you prefer. --- I've made a new try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce439beeb616415a842e11a69d1ad10a58117eef&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable --- (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #36) > please do a --rebuild 20 on your try push, that will help get data on > failure rates. I've done this in the new try push above - hope I got the syntax right (I just added it to the end)! :-)

Joel Maher ( :jmaher ) (UTC -8)

Comment 51

•

7 years ago

Pete, from comment 43: outside of this, many tests are failing for prompt/notification/multi-window- that is a common theme from earlier when looking at taskcluster workers for windows in the past. this holds true with your latest push, none of the test failures were fixed, so --rebuild 20 seemed a bit overkill.

Pete Moore [:pmoore][:pete]

Assignee

Comment 52

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #51) > Pete, from comment 43: > outside of this, many tests are failing for > prompt/notification/multi-window- that is a common theme from earlier when > looking at taskcluster workers for windows in the past. Thanks - do you know if these problems were ever solved? > this holds true with your latest push, none of the test failures were fixed, > so --rebuild 20 seemed a bit overkill. Sorry, I didn't realise --rebuild 20 would also rerun tasks that passed. Indeed, I was pretty alarmed to see how many tasks got generated when I came in this morning! I won't be doing that again unless there is some very strong justification.

Pete Moore [:pmoore][:pete]

Assignee

Comment 53

•

7 years ago

Summary of permanent failures
=============================

Windows 7 opt:
1) test-windows7-32/opt-mochitest-browser-chrome-e10s-3
2) test-windows7-32/opt-mochitest-chrome-3

Windows 7 debug:
3) test-windows7-32/debug-mochitest-5
4) test-windows7-32/debug-mochitest-browser-chrome-7
5) test-windows7-32/debug-mochitest-browser-chrome-e10s-4
6) test-windows7-32/debug-mochitest-clipboard

Windows 10 opt:
7) test-windows10-64/opt-mochitest-browser-chrome-e10s-3
8) test-windows10-64/opt-mochitest-chrome-3
9) test-windows10-64/opt-mochitest-e10s-5

Windows 10 debug:
10) test-windows10-64/debug-mochitest-browser-chrome-e10s-1
11) test-windows10-64/debug-mochitest-chrome-3
12) test-windows10-64/debug-mochitest-e10s-5

Pete Moore [:pmoore][:pete]

Assignee

Comment 54

•

7 years ago

One of the failures in test-windows7-32/debug-mochitest-clipboard is:

00:09:34 ERROR - 138 INFO TEST-UNEXPECTED-FAIL | devtools/client/commandline/test/browser_cmd_screenshot.js | arg.filename.value (for 'screenshot C:\Users\task_1512691342\AppData\Local\Temp\TestScreenshotFile.png') - Got C:\Users ask_1512691342\AppData\Local\Temp\TestScreenshotFile.png, expected C:\Users\task_1512691342\AppData\Local\Temp\TestScreenshotFile.png

Here we can see the failure is simply because an escape sequence in the path is being interpreted, i.e. `C:\Users\task` -> `C:\Users<tab>ask` because `\t` is being treated as the tab character. This is clearly a buggy test that needs fixing. I suspect there is something funny going on here:

https://dxr.mozilla.org/mozilla-central/rev/457b0fe91e0d49a5bc35014fb6f86729cd5bac9b/devtools/client/commandline/test/browser_cmd_screenshot.js#106
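To illustrate the mechanism, here is a minimal sketch of how a parser that processes backslash escapes in typed input would mangle this path, and how doubling the backslashes beforehand avoids it. `unescapeInput` is a hypothetical stand-in written for this example, not the real GCLI parser, and `escapeBackslashes` is likewise just an illustrative helper; the real parser's rules may well differ.

```javascript
// Hypothetical stand-in for a command-line parser that interprets a few
// backslash escapes (\t, \n, \\) in typed input. NOT the real GCLI parser.
function unescapeInput(typed) {
  const escapes = { t: "\t", n: "\n", "\\": "\\" };
  return typed.replace(/\\([tn\\])/g, (match, ch) => escapes[ch]);
}

// Double every backslash so the path survives one round of unescaping.
function escapeBackslashes(path) {
  return path.replace(/\\/g, "\\\\");
}

const path =
  "C:\\Users\\task_1512691342\\AppData\\Local\\Temp\\TestScreenshotFile.png";

// Typing the raw path: "\t" in "\task_..." is turned into a tab character.
const mangled = unescapeInput("screenshot " + path);

// Typing the escaped path: every "\\" collapses back to a single "\".
const intact = unescapeInput("screenshot " + escapeBackslashes(path));

console.log(JSON.stringify(mangled));
console.log(JSON.stringify(intact));
```

Under these assumptions, only task directories whose name starts with a character that forms a recognised escape (such as `t` in `task_...`) are affected, which would explain why the problem only shows up once tasks run under `C:\Users\task_<n>`.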

Flags: needinfo?(jmaher)

Pete Moore [:pmoore][:pete]

Assignee

Comment 55

•

7 years ago

Hi Matt, Do you have any ideas about what might be the cause of the failures in comment 50 (and comment 54)? Or do you know if there are any open existing bugs that I can make dependencies of this bug if any of them are currently being investigated? Thanks!

Flags: needinfo?(MattN+bmo)

Joel Maher ( :jmaher ) (UTC -8)

Comment 56

•

7 years ago

:pmoore, interesting file on the clipboard failure, could we make it upper case Task to avoid this? I agree we should look into a fix for the test. :jryans, I see you in the file commit history often for browser_cmd_screenshot.js, would you happen to know where we get the filename value and why it might interpret a \t in the full path as a <tab> character? ^^ see comment 54.

Flags: needinfo?(jmaher) → needinfo?(jryans)

Pete Moore [:pmoore][:pete]

Assignee

Comment 57

•

7 years ago

Out of curiosity I've triggered the tasks from comment 53 again (just once each) but configured the task users to be in the Administrators group (using the "osGroups" feature in generic-worker[1]). I put them in a single task group here: https://tools.taskcluster.net/groups/caeMQxVJQf6ix0UiYgXnvQ I'm curious if this will fix any of them. -- [1] https://docs.taskcluster.net/reference/workers/generic-worker/payload

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Comment 58

•

7 years ago

Attached patch Escape backslashes in GCLI screenshot test (deleted) — Details — Splinter Review

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #56) > :jryans, I see you in the file commit history often for > browser_cmd_screenshot.js, would you happen to know where we get the > filename value and why it might interpret a \t in the full path as a <tab> > character? ^^ see comment 54. Hmm, that's a fun one! I think this patch should fix the issue, but I don't have a simple way to verify it myself.

Flags: needinfo?(jryans)

Pete Moore [:pmoore][:pete]

Assignee

Comment 59

•

7 years ago

Comment on attachment 8935897 [details] [diff] [review]
Escape backslashes in GCLI screenshot test

Review of attachment 8935897 [details] [diff] [review]:
-----------------------------------------------------------------

::: devtools/client/commandline/test/browser_cmd_screenshot.js
@@ +104,3 @@
> check: {
> args: {
> filename: { value: "" + file.path },

Shouldn't the replace be on line 106 instead of line 103? I think line 103 just creates a description, whereas line 106 is the filename that is passed through.

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Flags: needinfo?(jryans)

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Comment 60

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #59) > Comment on attachment 8935897 [details] [diff] [review] > Escape backslashes in GCLI screenshot test > > Review of attachment 8935897 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: devtools/client/commandline/test/browser_cmd_screenshot.js > @@ +104,3 @@ > > check: { > > args: { > > filename: { value: "" + file.path }, > > Shouldn't the replace be on line 106 instead of line 103 ? I think line 103 > just creates a description, whereas line 106 is the filename that is passed > through. I believe the `setup` is the text to enter, while the `check` block states the expected value certain arguments should have after parsing. The issue seems to be related to how we parse the backslashes in the input, so that's why I modified the `setup` to escape the text entered. However if the patch doesn't work and your version does, that's fine too!

Flags: needinfo?(jryans)

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1360198

Pete Moore [:pmoore][:pete]

Assignee

Comment 61

•

7 years ago

(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #60) > (In reply to Pete Moore [:pmoore][:pete] from comment #59) > > Comment on attachment 8935897 [details] [diff] [review] > > Escape backslashes in GCLI screenshot test > > > > Review of attachment 8935897 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > ::: devtools/client/commandline/test/browser_cmd_screenshot.js > > @@ +104,3 @@ > > > check: { > > > args: { > > > filename: { value: "" + file.path }, > > > > Shouldn't the replace be on line 106 instead of line 103 ? I think line 103 > > just creates a description, whereas line 106 is the filename that is passed > > through. > > I believe the `setup` is the text to enter, while the `check` block states > the expected value certain arguments should have after parsing. > > The issue seems to be related to how we parse the backslashes in the input, > so that's why I modified the `setup` to escape the text entered. > > However if the patch doesn't work and your version does, that's fine too! Thanks Ryan! Trying your patch in * https://treeherder.mozilla.org/#/jobs?repo=try&revision=98eea9e5205be091f8f78af72c48c87f4c544870&filter-tier=1&filter-tier=2&filter-tier=3&duplicate_jobs=visible&group_state=expanded&filter-searchStr=windows&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Summary: Upgrade all win7/win10 gecko workers to generic-worker 10.2.3 → Upgrade all win7/win10 gecko workers to generic-worker 10.4.1

Pete Moore [:pmoore][:pete]

Assignee

Comment 62

•

7 years ago

The try push in comment 61 is looking much better! Note, the try push is based on this mozilla-central push, which has some starred failures already: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=6f5fac320fcb6625603fa8a744ffa8523f8b3d71&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-searchStr=windows I've retriggered failures, to see if they are intermittent.

Pete Moore [:pmoore][:pete]

Assignee

Comment 63

•

7 years ago

Hey Joel, Is there anyone that can help me with the last couple of failures? https://tinyurl.com/ycatybqe Many thanks! Pete

Flags: needinfo?(jmaher)

Matthew N. [:MattN]

Comment 64

•

7 years ago

I don't really see any obvious pattern. The notification tests have been disabled since they were intermittently failing on the old worker :(

Flags: needinfo?(MattN+bmo)

Joel Maher ( :jmaher ) (UTC -8)

Comment 65

•

7 years ago

the failures are all prompts/multi-window failures, when you are logged into a session, can you use the browser and get prompts and multiple windows? Can you run the tests locally in a vnc/rdp session and reproduce the failures? Once we get to that point, it will be easier to determine who can help.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Flags: needinfo?(jmaher)

Pete Moore [:pmoore][:pete]

Assignee

Comment 66

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #65) > the failures are all prompts/multi-window failures, when you are logged into > a session, can you use the browser and get prompts and multiple windows? > Can you run the tests locally in a vnc/rdp session and reproduce the > failures? Once we get to that point, it will be easier to determine who can > help. You're quite right - these are the next things to check. In order to do that, I've implemented a (rather basic) native RDP worker feature that will allow us to RDP in while the task is running (see bug 1172273). This is subtly different to the existing Windows loaner procedure as it will (hopefully, if it works) get you in to the actual session running the task with the real task user. I've released an alpha release of the worker which I'm deploying to our beta worker types in https://tools.taskcluster.net/groups/PKEayE2bRg-RnPWtAYYPHQ so when that deployment is complete, I will test out the new RDP procedure, and see if I can see what is going wrong.

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1172273

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1433854

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1433854

No longer depends on: 1433854

Pete Moore [:pmoore][:pete]

Assignee

Comment 67

•

7 years ago

So I'm able to watch tests running now via RDP. Example try push (5887a82a1d0416c0724ee355f59d3c90e6fcb83f):

* https://tinyurl.com/ydywxj2k

I connected initially with my native screen resolution, which seems to have caused the screen resolution update to 1280x1024 to fail, so tests did not run. I then reconnected with 1280x1024 resolution, and was able to manually run the task, only to discover it passed.

I will trigger the task again, connecting via RDP with 1280x1024 from the start, to see whether the tests then run, and whether we get the same failure[1] that we consistently get on the upgraded worker when we don't connect via RDP.

--
[1] https://public-artifacts.taskcluster.net/FZncnsa3QmWrvwUNYgfWyg/0/public/test_info/mozilla-test-fail-screenshot_jbm8ac.png

Pete Moore [:pmoore][:pete]

Assignee

Comment 68

•

7 years ago

Note: in order to connect via RDP, the workflow is:

1) Add the following patches to your gecko (firefox) checkout, to enable the beta worker types:
> curl -L 'https://bug1399401.bmoattachments.org/attachment.cgi?id=8935897' | hg import -
> curl -L 'https://bug1400012.bmoattachments.org/attachment.cgi?id=8948627' | hg import -
2) Prepare any other commits for changes you'd like to test, as normal, and push to try.
3) Find the failing task you want to play with in treeherder, and visit it in the taskcluster task inspector
4) Go to Actions -> Edit Task
5) In the "payload" section add "rdpInfo": "ldap/<ldapUser>/rdpinfo.txt" (e.g. "ldap/pmoore@mozilla.com/rdpinfo.txt")
6) Add "generic-worker:allow-rdp:aws-provisioner-v1/<workerType>" to scopes list, e.g.
> scopes:
> - 'generic-worker:allow-rdp:aws-provisioner-v1/gecko-t-win7-32-beta'
7) Ask somebody in #taskcluster to grant you the generic-worker:allow-rdp:aws-provisioner-v1/<workerType> scope and queue:get-artifact:ldap/<ldapUser>/* for the workerType(s) and ldap user you use
8) Run the task, and when it starts, go to "Run Artifacts" to see the rdpInfo.txt file appear with rdp connection information
9) Enter the connection information into your RDP client of choice
10) Connect with screen resolution 1280x1024!

Pete Moore [:pmoore][:pete]

Assignee

Comment 69

•

7 years ago

Note, bug 1436002 will simplify step 7 in comment 68. :)

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1436002

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1368961

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: tc-stability

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1333957

Chris Cooper [:coop] (he/him)

Updated

•

7 years ago

No longer blocks: tc-stability

Chris Cooper [:coop] (he/him)

Updated

•

7 years ago

Blocks: tc-stability

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Summary: Upgrade all win7/win10 gecko workers to generic-worker 10.4.1 → Upgrade all win7/win10 gecko workers to generic-worker 10.5.1

Pete Moore [:pmoore][:pete]

Assignee

Comment 70

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #68)
> Note: in order to connect via RDP, the workflow is:

.... <snip/> ....

This is now a little bit simpler (step 5 changed, step 7 removed):

1) Add the following patches to your gecko (firefox) checkout, to enable the beta worker types:
> curl -L 'https://bug1399401.bmoattachments.org/attachment.cgi?id=8935897' | hg import -
> curl -L 'https://bug1400012.bmoattachments.org/attachment.cgi?id=8948627' | hg import -
2) Prepare any other commits for changes you'd like to test, as normal, and push to try.
3) Find the failing task you want to play with in treeherder, and visit it in the taskcluster task inspector
4) Go to Actions -> Edit Task
5) Add rdpInfo to the payload section:
> payload:
> rdpInfo: 'login-identity/<login-identity>/rdpinfo.txt'
For example, 'login-identity/mozilla-ldap/pmoore@mozilla.com/rdpinfo.txt' (check https://tools.taskcluster.net/credentials to see what your login identity is, e.g. you should have the scope queue:create-artifact:login-identity/<login-identity>/*).
6) Add "generic-worker:allow-rdp:aws-provisioner-v1/<workerType>" to scopes list, e.g.
> scopes:
> - 'generic-worker:allow-rdp:aws-provisioner-v1/gecko-t-win7-32-beta'
7) Run the task, and when it starts, go to "Run Artifacts" to see the rdpInfo.txt file appear with rdp connection information
8) Enter the connection information into your RDP client of choice
9) Connect with screen resolution 1280x1024!
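For anyone scripting steps 5 and 6 rather than editing the task by hand, the two changes to the task definition amount to something like the sketch below. The `addRdpAccess` helper and the sample task object (including its `command`) are hypothetical and exist only for illustration; the `rdpInfo` artifact path and the `generic-worker:allow-rdp:aws-provisioner-v1/<workerType>` scope are the ones given in the steps above.

```javascript
// Hypothetical helper: returns a copy of a task definition with the RDP
// artifact path (step 5) and the allow-rdp scope (step 6) added.
function addRdpAccess(task, loginIdentity, workerType) {
  const edited = JSON.parse(JSON.stringify(task)); // leave the input untouched

  edited.payload = edited.payload || {};
  edited.payload.rdpInfo = `login-identity/${loginIdentity}/rdpinfo.txt`;

  const scope = `generic-worker:allow-rdp:aws-provisioner-v1/${workerType}`;
  edited.scopes = edited.scopes || [];
  if (!edited.scopes.includes(scope)) {
    edited.scopes.push(scope);
  }
  return edited;
}

// Made-up minimal task definition, just to show the result.
const originalTask = {
  workerType: "gecko-t-win7-32-beta",
  scopes: [],
  payload: { command: ["echo hello"] },
};

const rdpTask = addRdpAccess(
  originalTask,
  "mozilla-ldap/pmoore@mozilla.com",
  originalTask.workerType
);
console.log(JSON.stringify(rdpTask, null, 2));
```

The edited definition still has to be submitted through Actions -> Edit Task (or an equivalent task creation call) by someone who holds the matching scopes.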

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1358545

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1439517

Chris Cooper [:coop] (he/him)

Comment 71

•

7 years ago

Pete: can I ask you to trigger another try run so I can look at current results? My try access is still broken and this will allow me to retrigger as necessary.

Flags: needinfo?(pmoore)

Chris Cooper [:coop] (he/him)

Comment 72

•

7 years ago

(In reply to Chris Cooper [:coop] from comment #71) > Pete: can I ask you to trigger another try run so I can look at current > results? My try access is still broken and this will allow me to retrigger > as necessary. I spoke with Joel and Kendall in the TC migration mtg today. Pete: can I ask you to collate a list of the currently failing tests (from a new Try run, hopefully) in a new bug comment? I'm going to look at the failures myself using your loaner method and then write that process up so we can get a dev to help.

Pete Moore [:pmoore][:pete]

Assignee

Comment 73

•

7 years ago

(In reply to Chris Cooper [:coop] from comment #72) > (In reply to Chris Cooper [:coop] from comment #71) > > Pete: can I ask you to trigger another try run so I can look at current > > results? My try access is still broken and this will allow me to retrigger > > as necessary. No problem - I've made a try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=85b4ef4fa06f4a75d9b50f8a3de2a3ecab3f7afd > > I spoke with Joel and Kendall in the TC migration mtg today. > > Pete: can I ask you to collate a list of the currently failing tests (from a > new Try run, hopefully) in a new bug comment? I'm going to look at the > failures myself using your loaner method and then write that process up so > we can get a dev to help. I'll be gone by the time this try push completes - but following that treeherder link above should be the authoritative source of the information. Note - I made it from running step 1 and 2 from comment 70. If anyone is investigating failures, that same comment explains how to retrigger the task with an interactive loaner, and troubleshoot the issue while the task is actually running.

Flags: needinfo?(pmoore)

Pete Moore [:pmoore][:pete]

Assignee

Comment 74

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #73) > No problem - I've made a try push: > > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=85b4ef4fa06f4a75d9b50f8a3de2a3ecab3f7afd The jobs are not currently running due to bug 1443595.

Depends on: 1443595

Chris Cooper [:coop] (he/him)

Comment 75

•

7 years ago

From jmaher via email:

"here is the most recent push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85b4ef4fa06f4a75d9b50f8a3de2a3ecab3f7afd&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-searchStr=x64

here you can see:
c3 -
* toolkit/content/tests/chrome/test_bug360437.xul
* toolkit/content/tests/chrome/test_dialogfocus.xul
* toolkit/content/tests/chrome/test_showcaret.xul
* toolkit/content/tests/widgets/test_menubar.xul
* toolkit/mozapps/downloads/tests/chrome/test_unknownContentType_delayedbutton.xul

5 -
* toolkit/components/prompts/test/test_prompts.html
* toolkit/components/prompts/test/test_modal_prompts.html

bc2/bc3 -
* browser/components/customizableui/test/browser_panelUINotifications_multiWindow.js

these are all tests that deal with focus, specifically multi modal/window test cases. In many of the screenshots you can see we pop up the window, but it is opaque in that you can see a shadow of it in the foreground.

Ideally you could watch a test run locally and then compare it to a loaner and see the difference. Most of these tests send keys to specific windows to type/click/hotkey. I wonder if there is some quirk where the windows or keys are getting crossed between users such as current user and administrator."

Joel Maher ( :jmaher ) (UTC -8)

Comment 76

•

7 years ago

here are the components for the failures:

c3 -
* toolkit/content/tests/chrome/test_bug360437.xul
** Toolkit :: Find Toolbar
* toolkit/content/tests/chrome/test_dialogfocus.xul
** Toolkit :: XUL Widgets
* toolkit/content/tests/chrome/test_showcaret.xul
** Toolkit :: XUL Widgets
* toolkit/content/tests/widgets/test_menubar.xul
** Core :: XUL
* toolkit/mozapps/downloads/tests/chrome/test_unknownContentType_delayedbutton.xul
** Toolkit :: Downloads API

5 -
* toolkit/components/passwordmgr/test/mochitest/test_prompt.html
** Toolkit :: Password Manager
* toolkit/components/prompts/test/test_modal_prompts.html
** Toolkit :: General

bc2/bc3 -
* browser/components/customizableui/test/browser_panelUINotifications_multiWindow.js
** Firefox :: Toolbars and Customization

Ideally there is something in the code of the tests that is in common and not seen in other tests, so that we could pinpoint the actions which seem to cause failure in this new environment.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445350

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445356

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445366

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445372

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445377

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1445385

Joel Maher ( :jmaher ) (UTC -8)

Comment 77

•

7 years ago

the test failures in mochitest-e10s-5 are concerning because when I disable the above mentioned tests, the next test(s) in the list start failing in the same way (timeout). It looks like we would end up disabling all prompt and modal tests for password manager and in toolkit general; the other failures seem to go away cleanly once the tests are disabled.

One observation: many of these failures are on the 3rd window, i.e. we have the harness, we open a new window for a test, and that new window opens a dialog or another new window.

Pete Moore [:pmoore][:pete]

Assignee

Comment 78

•

7 years ago

Latest try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa4347639d11ec7d65ab987e82595b07bd5ec1c2

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Summary: Upgrade all win7/win10 gecko workers to generic-worker 10.5.1 → Upgrade all win7/win10 gecko workers to generic-worker 10.7.1

Chris Cooper [:coop] (he/him)

Comment 79

•

7 years ago

Pete and I met yesterday and discussed this. Here's a summation of our thoughts.

Windows has a few user experience interactions (pop-ups, messages, modal windows) that appear on first-run. Since the new worker is using a new user for every run, these interactions may appear every single time we run a test unless we find the correct settings to toggle them off. I can recall this happening before on Mac.

We don't know if this is the actual cause, and the timing of these interactions is unknown. Three ways we could proceed here:

1) To quote Pete, we should add a "big, dirty sleep" to the start of the test run, say 5 minutes. This will give us enough time to establish an RDP connection before the test starts to see if there's an errant popup, etc. stealing focus. It may also give the pop-ups enough time to clear on their own before the test starts.
2) Failing that, we could use a Windows sys call to try to figure out which window has focus during each test.
3) We could do a screen recording of an entire test run. This would allow us to step through, rewind, etc. to observe a behavior that may be too quick to manually notice otherwise.

Pete Moore [:pmoore][:pete]

Assignee

Comment 80

•

7 years ago

(In reply to Chris Cooper [:coop] from comment #79) > 3) We could do a screen recording of an entire test run. This would allow us > to step through, rewind, etc to observe a behavior that may be too quick to > manually notice otherwise. https://www.dvdvideosoft.com/products/dvd/Free-Screen-Video-Recorder.htm looks like it might do the trick here.

Pete Moore [:pmoore][:pete]

Assignee

Comment 81

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #80)
> (In reply to Chris Cooper [:coop] from comment #79)
>
> > 3) We could do a screen recording of an entire test run. This would allow us
> > to step through, rewind, etc to observe a behavior that may be too quick to
> > manually notice otherwise.
>
> https://www.dvdvideosoft.com/products/dvd/Free-Screen-Video-Recorder.htm
> looks like it might do the trick here.

I had some issues installing "Free Screen Video Recorder", so I'm taking a look at "OBS Studio" instead: https://obsproject.com/

Pete Moore [:pmoore][:pete]

Assignee

Comment 82

•

7 years ago

(In reply to Chris Cooper [:coop] from comment #79)
> 1) To quote Pete, we should add a "big, dirty sleep" to the start of the
> test run, say 5 minutes. This will give us enough time to establish an RDP
> connection before the test starts to see if there's an errant popup, etc
> stealing focus. It may also give the pop-ups enough time to clear on their
> own before the test starts.

I've made a new try push to try this out:

remote: View your change here:
remote:   https://hg.mozilla.org/try/rev/04c887284be5672c06d78ae93624c0624e33e722
remote:
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=04c887284be5672c06d78ae93624c0624e33e722

Pete Moore [:pmoore][:pete]

Assignee

Comment 83

•

7 years ago

Note, I've (hopefully) fixed the issue with the taskbar on both Windows 7 and Windows 10 not being hidden in bug 1433851 and am testing in a new try push: https://tinyurl.com/ycwrff4e

Pete Moore [:pmoore][:pete]

Assignee

Comment 84

•

7 years ago

(In reply to Pete Moore [:pmoore][:pete] from comment #82)
> (In reply to Chris Cooper [:coop] from comment #79)
> > 1) To quote Pete, we should add a "big, dirty sleep" to the start of the
> > test run, say 5 minutes. This will give us enough time to establish an RDP
> > connection before the test starts to see if there's an errant popup, etc
> > stealing focus. It may also give the pop-ups enough time to clear on their
> > own before the test starts.
>
> I've made a new try push to try this out:
>
> remote: View your change here:
> remote:   https://hg.mozilla.org/try/rev/04c887284be5672c06d78ae93624c0624e33e722
> remote:
> remote: Follow the progress of your build on Treeherder:
> remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=04c887284be5672c06d78ae93624c0624e33e722

I forgot to say: the big dirty sleep didn't help. :(

Pete Moore [:pmoore][:pete]

Assignee

Comment 85

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #77)
> the test failures in mochitest-e10s-5 are concerning because when I disable
> the above mentioned tests, the next test(s) in the list start failing in the
> same way (timeout). This looks to be that we would end up disabling all
> prompt and modal tests for password manager and in toolkit general- the
> other failures seem to go away clean with disabling tests.
>
> One observation I noticed was many of these failures are on the 3rd window,
> so we have the harness and we open a new window for a test and that new
> window opens a dialog or another new window.

This is now resolved.

TL;DR: It was due to the settings here[1].

In release 10.7.7 I've upgraded generic-worker to Go 1.10, and in the process, rediscovered these STARTUPINFO settings[3]. Go 1.10 makes it possible to use the Go standard library to make CreateProcessAsUser system calls, which was not possible before, so I have migrated the worker to use the standard library for spawning task user processes, and in doing so done away with these flags.

While migrating, I discovered that the STARTUPINFO flags are controlled by the standard library in Go 1.10, and do not allow for any customisation. From the MSDN docs[3,4] we see the difference between the flag settings I was using, and those used in the standard library:

Pre generic-worker 10.7.7
=========================

Before, we set the following STARTUPINFO process flags[1]:

    si.Flags = win32.STARTF_FORCEOFFFEEDBACK | syscall.STARTF_USESHOWWINDOW
    si.ShowWindow = syscall.SW_SHOWMINNOACTIVE

From the MSDN docs[3,4]:

STARTF_FORCEOFFFEEDBACK
    Indicates that the feedback cursor is forced off while the process is starting. The Normal Select cursor is displayed.

STARTF_USESHOWWINDOW
    The wShowWindow member contains additional information.

SW_SHOWMINNOACTIVE
    Displays the window as a minimized window. This value is similar to SW_SHOWMINIMIZED, except the window is not activated.

Post generic-worker 10.7.7
==========================

Now we set the flags like this[2]:

    si.Flags = STARTF_USESTDHANDLES

From the MSDN docs[3]:

STARTF_USESTDHANDLES
    The hStdInput, hStdOutput, and hStdError members contain additional information.....

Conclusion
==========

The problem here was with SW_SHOWMINNOACTIVE, which creates a non-activated window. When rereading the docs, I was reminded that our failures were focus related, which led me to try adopting the standard library instead, to see if that solved the issue. Of course we could have continued to use our custom runlib library, and adapted the flags, but this seemed like a good opportunity to simplify our codebase, and use the new feature of the Go standard library.

--
[1] https://github.com/taskcluster/runlib/blob/4ab38b9ff487347cfe9707ca800d305baab444b5/subprocess/subprocess_windows.go#L139-L140
[2] https://github.com/golang/go/blob/go1.10/src/syscall/exec_windows.go#L311-L320
[3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686331%28v=vs.85%29.aspx
[4] https://msdn.microsoft.com/en-us/library/windows/desktop/ms633548(v=vs.85).aspx
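
The difference between the two flag settings can be sketched in portable Go. This is an illustration only, not the runlib or standard library code: startupInfo is a hypothetical, trimmed-down stand-in for the Windows STARTUPINFO struct, the helper names are made up, and the constants are redefined locally (with their Windows SDK values) so the sketch compiles on any platform.

```go
package main

import "fmt"

// Values as defined in the Windows SDK headers, redefined here so the
// sketch does not depend on the Windows-only parts of package syscall.
const (
	STARTF_USESHOWWINDOW    = 0x00000001
	STARTF_FORCEOFFFEEDBACK = 0x00000080
	STARTF_USESTDHANDLES    = 0x00000100

	SW_SHOWMINNOACTIVE = 7
)

// startupInfo is a hypothetical stand-in holding only the two
// STARTUPINFO fields discussed above (dwFlags and wShowWindow).
type startupInfo struct {
	Flags      uint32
	ShowWindow uint16
}

// oldSettings mirrors the flags set before generic-worker 10.7.7.
func oldSettings() startupInfo {
	return startupInfo{
		Flags:      STARTF_FORCEOFFFEEDBACK | STARTF_USESHOWWINDOW,
		ShowWindow: SW_SHOWMINNOACTIVE,
	}
}

// newSettings mirrors the flags set from generic-worker 10.7.7 onwards.
func newSettings() startupInfo {
	return startupInfo{Flags: STARTF_USESTDHANDLES}
}

// startsMinimizedInactive reports whether the settings ask for the
// process's first window to be shown minimized without being activated.
// ShowWindow is only honoured when STARTF_USESHOWWINDOW is set.
func startsMinimizedInactive(si startupInfo) bool {
	return si.Flags&STARTF_USESHOWWINDOW != 0 && si.ShowWindow == SW_SHOWMINNOACTIVE
}

func main() {
	fmt.Println("old settings request inactive window:", startsMinimizedInactive(oldSettings()))
	fmt.Println("new settings request inactive window:", startsMinimizedInactive(newSettings()))
}
```

The check on STARTF_USESHOWWINDOW matters because Windows ignores the wShowWindow value unless that flag is present, so the new settings cannot request a non-activated window even by accident.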

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1448197

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Depends on: 1447265

Pete Moore [:pmoore][:pete]

Assignee

Comment 86

•

7 years ago

I believe all blocking issues have now been resolved - but I'm on PTO - so will look at rolling out generic-worker next week when I'm back.

Pete Moore [:pmoore][:pete]

Assignee

Comment 87

•

7 years ago

Latest try push with new settings. All failures were intermittents that passed on retry: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7af721a1d8a445af27f3ceaea939c75d6eb6266a&group_state=expanded

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Summary: Upgrade all win7/win10 gecko workers to generic-worker 10.7.1 → Upgrade all win7/win10 gecko workers to generic-worker 10.7.8

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Blocks: 1180187

Pete Moore [:pmoore][:pete]

Assignee

Comment 88

•

7 years ago

Currently preparing the deployment...

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

7 years ago

Blocks: 1441580

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

Attachment #8935897 - Flags: review+

Comment 89

•

7 years ago

Pushed by pmoore@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1f8e34bd956b Escape backslashes in GCLI screenshot test,r=pmoore

Andrei Ciure[:aciure]

Comment 90

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1f8e34bd956b

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 91

•

7 years ago

As Pete wrote, this upgrade hasn't been done yet, so I will reopen for now.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Pete Moore [:pmoore][:pete]

Assignee

Comment 92

•

7 years ago

Attached file Github Pull Request for OpenCloudConfig (deleted)

This should do all the magic.

I've spent a couple of days double and triple checking worker type definitions, and am reasonably confident everything has been accounted for. This is really a *major* upgrade, so the potential for something to go wrong is higher than normal.

I've taken a snapshot of the (confidential) worker type definitions, which I will share with the team, so that they can be rolled back if needed. This means if any problems are discovered the rollback process is two-fold:

1) Revert PR 128 from OpenCloudConfig
2) Request that somebody in the taskcluster team reverts the worker type definitions to their current state (i.e. to the versions I am sending them in an email this afternoon)

Attachment #8964886 - Flags: review?(rthijssen)

Pete Moore [:pmoore][:pete]

Assignee

Comment 93

•

7 years ago

Deployment has begun: https://tools.taskcluster.net/groups/FZxJu8K_R7SpRRifgThGeA

Rob Thijssen [:grenade (EET/UTC+0300)]

Updated

•

7 years ago

Attachment #8964886 - Flags: review?(rthijssen) → review+

Pete Moore [:pmoore][:pete]

Assignee

Comment 94

•

7 years ago

We haven't had any complaints of problems yet, so I'll close this now. Please reopen if any issues appear!

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Resolution: --- → FIXED

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1451682

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

7 years ago

No longer depends on: 1448197

Joel Maher ( :jmaher ) (UTC -8)

Comment 95

•

6 years ago

:pmoore, can we remove this comment and code? https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/coalesce.py#30

Flags: needinfo?(pmoore)

Pete Moore [:pmoore][:pete]

Assignee

Updated

•

6 years ago

Status: RESOLVED → REOPENED

Flags: needinfo?(pmoore)

Resolution: FIXED → ---

Pete Moore [:pmoore][:pete]

Assignee

Comment 96

•

6 years ago

Attached patch gecko patch: enable coalescing on win7/win10 worker types (deleted)

Nice spot, Joel! Does this look ok?

Attachment #9023219 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 97

•

6 years ago

Comment on attachment 9023219
gecko patch: enable coalescing on win7/win10 worker types

Review of attachment 9023219:
-----------------------------------------------------------------

cool!

Attachment #9023219 - Flags: review?(jmaher) → review+

Chris Cooper [:coop] (he/him)

Comment 98

•

6 years ago

Pete: is this live now?

Flags: needinfo?(pmoore)

Comment 99

•

6 years ago

Pushed by pmoore@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cef0a23a3849 enable coalescing on Windows 7 and Windows 10 worker types,r=jmaher

Pete Moore [:pmoore][:pete]

Assignee

Comment 100

•

6 years ago

It should be soon - I've just pushed to mozilla-inbound, and this bug should get automatically closed when it lands on mozilla-central, so let's leave it open.

Fingers crossed! Thanks for chasing me up. :)

Flags: needinfo?(pmoore)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 101

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cef0a23a3849

Status: REOPENED → RESOLVED

Closed: 7 years ago → 6 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Integration → Services

Bogdan Tara[:bogdan_tara | bogdant]

Updated

•

4 years ago

Blocks: 1654126

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

No longer blocks: 1654126
