Closed Bug 1356448 Opened 8 years ago Closed 7 years ago

Enable "GPU" process on Windows software backends by default on Nightly

Categories

(Core :: Graphics, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: milan, Assigned: dvander)

References

(Depends on 1 open bug)

Details

Attachments

(2 files)

See bug 1356091. We're going to enable this on Nightly and see what results look like for sync telemetry and overall talos performance, etc.
Assignee: nobody → milan
Comment on attachment 8858188 [details] Bug 1356448: Allow software backed compositor process in nightly. https://reviewboard.mozilla.org/r/130126/#review132768
Attachment #8858188 - Flags: review?(dvander) → review+
Pushed by msreckovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2e59e89efef9 Allow software backed compositor process in nightly. r=dvander
Depends on: 1356554
Backed out for frequently crashing test_saveHeapSnapshot_e10s_01.js, test_ChildHistograms.js and more telemetry tests: https://hg.mozilla.org/integration/autoland/rev/d09143959b1af76d12d4429e92b1cd07544d0bef Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2e59e89efef982bee73573f6861f50574bab6b84&filter-searchStr=80650d961e4abde99d8e85461c8b0848d620a6f2 Please check the various xpcshell failures on Windows 7 VM debug.
Flags: needinfo?(milan)
Keep in mind there is a perf regression from this change, tracked in bug 1356554:

== Change summary for alert #6034 (as of April 14 2017 03:36 UTC) ==

Regressions:

20% basic_compositor_video summary windows8-64 opt e10s 4.46 -> 5.35
17% basic_compositor_video summary windows7-32 opt e10s 4.34 -> 5.1
17% basic_compositor_video summary windows7-32 pgo e10s 4.33 -> 5.06

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6034

It went away when we backed out.
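For readers checking the numbers in alert #6034, the percentages are plain relative changes between the before and after values. The sketch below is only an illustration: the function name `percent_change` and the `alert_rows` table are made up for this example, and the inputs are the rounded values quoted in the alert, not the raw Talos data.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# Rounded (before, after) values as quoted in alert #6034.
# The dictionary itself is a hypothetical container for this sketch.
alert_rows = {
    "windows8-64 opt": (4.46, 5.35),
    "windows7-32 opt": (4.34, 5.1),
    "windows7-32 pgo": (4.33, 5.06),
}

for platform, (before, after) in alert_rows.items():
    print(f"{platform}: {percent_change(before, after):.1f}%")
```

Because the alert only shows rounded inputs, the recomputed figures land within about a percentage point of the ones listed rather than matching them exactly.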
Mostly assertions about things being on the main thread.
Perhaps interesting that the try run in comment 2 came back green for X, while comment 5 shows the error from the actual push.
Stealing with permission to look at the backout reasons.
Assignee: milan → dvander
Flags: needinfo?(milan)
The bug here appears to be bug 1356365, which is quite nice as maybe now we can reproduce it. Shot in the dark: VideoBridgeParent isn't holding a compositor thread reference, so its ActorDestroy message outlives the compositor thread.
The main-thread assert this manifests as is a red herring. The video bridge protocol shuts down as an xpcom-shutdown observer, which fires after we've turned off the crash reporter. Bill's bad-MessageChannel-assertion works via a crash reporter annotation, and the crash reporter gets confused.
Depends on: 1356365
Pushed by danderson@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6c4cc6a04cc4 Allow software backed compositor process in nightly. r=dvander
Somehow, after this landed, the dt5 tests turned red, e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=93709330&repo=mozilla-inbound. It seems this is also related to comment #6.
Flags: needinfo?(dvander)
Backout by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/aeea139bf767 Backed out changeset 6c4cc6a04cc4 for dt5 perma failures
Backing this out resolved the performance regression in bug 1356554.
The DevTools failures here look similar to those in bug 1319248. I believe I have fixes for the tests, so I'll try to land those soon over in that bug.
I'm mildly confused that this pref even changed anything. It should have no effect on Windows 7 whatsoever since we're accelerated there. In fact it should really yield very little additional testing on try.
Depends on: 1319248
Flags: needinfo?(dvander)
Milan, this try run [1] has a bunch of logging for when feature state changes. For Windows 7 R-e10s columns, I see what we'd expect [2]:

00:03:16 INFO - Changing 1/Direct3D11 Compositing level default to available (<unknown>)

I.e., D3D11 is allowed. However on the M-e10s columns [3], I see:

01:57:22 INFO - GECKO(2836) | Changing 1/Direct3D11 Compositing level default to available (<unknown>)
01:57:22 INFO - GECKO(2836) | Changing 1/Direct3D11 Compositing level env to blacklisted (#BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR)

This is really surprising. I didn't know we're running most of our Windows 7 tests without acceleration. Even more odd is that they're listed under the same operating system, but it seems like these tests must be running on different machine configurations. Do you know anything about this?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a9017f1d5ce51bf3968f27c38218258429f2a00
[2] https://archive.mozilla.org/pub/firefox/try-builds/danderson@mozilla.com-8a9017f1d5ce51bf3968f27c38218258429f2a00/try-win32/try_win7_vm_gfx_test-reftest-e10s-bm128-tests1-windows-build776.txt.gz
[3] https://archive.mozilla.org/pub/firefox/try-builds/danderson@mozilla.com-8a9017f1d5ce51bf3968f27c38218258429f2a00/try-win32/try_win7_vm_test-mochitest-e10s-5-bm137-tests1-windows-build2658.txt.gz
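Comparing feature-state lines across test logs like the ones quoted above can be done mechanically. The following is a rough sketch, not part of the patch: the regular expression is inferred only from the three sample lines in this comment, the name `last_logged_status` is hypothetical, and treating the last logged change as the effective state is a simplifying assumption rather than a description of how Gecko resolves feature levels.

```python
import re

# Pattern inferred from the sample lines in this bug only, e.g.
# "Changing 1/Direct3D11 Compositing level env to blacklisted (#REASON)".
CHANGE_RE = re.compile(
    r"Changing (?P<index>\d+)/(?P<feature>.+?) level (?P<level>\w+) "
    r"to (?P<status>\w+) \((?P<reason>[^)]*)\)"
)


def last_logged_status(log_lines):
    """Map each feature name to its last logged (level, status, reason).

    Assumption for this sketch: the most recent line wins.
    """
    result = {}
    for line in log_lines:
        match = CHANGE_RE.search(line)
        if match:
            result[match.group("feature")] = (
                match.group("level"),
                match.group("status"),
                match.group("reason"),
            )
    return result


reftest_log = [
    "00:03:16 INFO - Changing 1/Direct3D11 Compositing level default to available (<unknown>)",
]
mochitest_log = [
    "01:57:22 INFO - GECKO(2836) | Changing 1/Direct3D11 Compositing level default to available (<unknown>)",
    "01:57:22 INFO - GECKO(2836) | Changing 1/Direct3D11 Compositing level env to blacklisted (#BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR)",
]

print(last_logged_status(reftest_log))
print(last_logged_status(mochitest_log))
```

Run over full logs from both job types, a helper like this would make it easy to spot which suites end up with D3D11 blocked.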
Flags: needinfo?(milan)
Windows 7 mochitests run on c3 instances in AWS, but reftests run on G2 instances in AWS. The G2 instances have a dedicated GPU and are much more expensive (and there is a smaller pool of them available for us to use). Here are some more details about AWS instance types: https://aws.amazon.com/ec2/instance-types/
Thanks Joel, I know I get confused on what we're running where all the time.
Flags: needinfo?(milan)
I wasn't able to get the tests in bug 1319248 fixed quickly, so for now I've disabled them on Windows. You should be able to land this change without hitting DevTools failures now.
Comment on attachment 8858188 [details] Bug 1356448: Allow software backed compositor process in nightly. https://reviewboard.mozilla.org/r/130126/#review138872
Attachment #8858188 - Flags: review+
Pushed by msreckovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/97a3a6e6550b Allow software backed compositor process in nightly. r=dvander
Backed out for frequently failing xpcshell's test_ChildScalars.js and test_ChildHistograms.js on Windows 7 VM debug: https://hg.mozilla.org/integration/autoland/rev/67bcd1fe0009aee013bd350a00b1e50a3ea05c05

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=97a3a6e6550b0e5e6df000a0392e54fe5cc844e3&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

Failure log example: https://treeherder.mozilla.org/logviewer.html#?job_id=96340266&repo=autoland

10:53:08 WARNING - PROCESS-CRASH | toolkit/components/telemetry/tests/unit/test_ChildScalars.js | application crashed [@ CrashReporter::AnnotateCrashReport(nsACString const &,nsACString const &)]
10:53:08 INFO - Crash dump filename: c:\users\cltbld\appdata\local\temp\xpc-other-wyffeb\0d7bccbd-8098-42c4-9f63-8425d06ebe8a.dmp
10:53:08 INFO - Operating system: Windows NT
10:53:08 INFO - 6.1.7601 Service Pack 1
10:53:08 INFO - CPU: x86
10:53:08 INFO - GenuineIntel family 6 model 62 stepping 4
10:53:08 INFO - 8 CPUs
10:53:08 INFO - GPU: UNKNOWN
10:53:08 INFO - Crash reason: EXCEPTION_BREAKPOINT
10:53:08 INFO - Crash address: 0x60fbf25b
10:53:08 INFO - Assertion: Unknown assertion type 0x00000000
10:53:08 INFO - Process uptime: 9 seconds
10:53:08 INFO - Thread 4 (crashed)
10:53:08 INFO - 0 xul.dll!CrashReporter::AnnotateCrashReport(nsACString const &,nsACString const &) [nsExceptionHandler.cpp:97a3a6e6550b : 2354 + 0x18]
10:53:08 INFO - eip = 0x60fbf25b esp = 0x03edf65c ebp = 0x03edf73c ebx = 0x03edf750
10:53:08 INFO - esi = 0x00000932 edi = 0x03edf75c eax = 0x00000000 ecx = 0x6b9406ef
10:53:08 INFO - edx = 0x00000060 efl = 0x00000206
10:53:08 INFO - Found by: given as instruction pointer in context
10:53:08 INFO - 1 xul.dll!mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() [MessageChannel.cpp:97a3a6e6550b : 680 + 0x2a]
10:53:08 INFO - eip = 0x5ebe144c esp = 0x03edf744 ebp = 0x03edf768
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 2 xul.dll!MessageLoop::~MessageLoop() [message_loop.cc:97a3a6e6550b : 173 + 0x15]
10:53:08 INFO - eip = 0x5ebb1c01 esp = 0x03edf770 ebp = 0x03edf78c
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 3 xul.dll!MessageLoop::`scalar deleting destructor'(unsigned int) + 0xb
10:53:08 INFO - eip = 0x5ebb270a esp = 0x03edf794 ebp = 0x03edf798
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 4 xul.dll!nsThread::ThreadFunc(void *) [nsThread.cpp:97a3a6e6550b : 531 + 0xc]
10:53:08 INFO - eip = 0x5e82431b esp = 0x03edf7a0 ebp = 0x03edf7c0
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 5 nss3.dll!_PR_NativeRunThread [pruthr.c:97a3a6e6550b : 397 + 0x6]
10:53:08 INFO - eip = 0x5e2740ca esp = 0x03edf7c8 ebp = 0x03edf7e0
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 6 nss3.dll!pr_root [w95thred.c:97a3a6e6550b : 95 + 0xa]
10:53:08 INFO - eip = 0x5e26822d esp = 0x03edf7e8 ebp = 0x03edf7ec
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 7 ucrtbase.dll + 0x3d5ef
10:53:08 INFO - eip = 0x6b94d5ef esp = 0x03edf7f4 ebp = 0x03edf828
10:53:08 INFO - Found by: call frame info
10:53:08 INFO - 8 kernel32.dll + 0x53c45
10:53:08 INFO - eip = 0x75763c45 esp = 0x03edf830 ebp = 0x03edf834
10:53:08 INFO - Found by: previous frame's frame pointer
10:53:08 INFO - 9 ntdll.dll!__RtlUserThreadStart + 0x27
10:53:08 INFO - eip = 0x773137f5 esp = 0x03edf83c ebp = 0x03edf874
10:53:08 INFO - Found by: previous frame's frame pointer
10:53:08 INFO - 10 ntdll.dll!_RtlUserThreadStart + 0x1b
10:53:08 INFO - eip = 0x773137c8 esp = 0x03edf87c ebp = 0x03edf88c
10:53:08 INFO - Found by: call frame info
Flags: needinfo?(milan)
Looks like another protocol shutdown bug; I'll investigate.
Flags: needinfo?(milan)
Backout by kwierso@gmail.com: https://hg.mozilla.org/mozilla-central/rev/96605941c002 Backed out changeset 97a3a6e6550b for frequently failing xpcshell's test_ChildScalars.js and test_ChildHistograms.js on Windows 7 VM debug. r=backout a=merge
99% sure the backout reason is bug 1360697.
With bug 1360697 landed, I've pushed a few try runs to see (1) whether this passes the xpcshell tests now, and (2) whether disabling oop media decoding helps suppress the talos regression.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=958ca93f780c&newProject=try&newRevision=65f69f1ceb020345803505a630d907954ef2546e&framework=1&showOnlyImportant=0 This is a try run comparing the patch in this bug to another version disabling out-of-process media decoding. It looks like the regression indeed goes away. Both try runs seemed clean otherwise. I'll do a new patch tomorrow that disables the video decoder for software compositing.
This should do the trick for Talos, or any configuration where the first compositor was software.
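The rule described here (skip out-of-process video decoding whenever the compositor is software) can be pictured as a small predicate. This is only a sketch of the decision as stated in this bug; `should_use_remote_video_decoder` and its parameters are hypothetical names, not identifiers from the actual patch, which lives in Gecko's C++ code.

```python
def should_use_remote_video_decoder(gpu_process_enabled: bool,
                                    compositor_is_software: bool) -> bool:
    """Decide whether video decoding should happen out of process.

    Assumption for this sketch: remote decoding is only worthwhile when
    the GPU process is running and the compositor is hardware accelerated,
    since the software case showed the basic_compositor_video regression.
    """
    if not gpu_process_enabled:
        return False
    return not compositor_is_software


# A software-composited Talos machine keeps decoding in-process.
print(should_use_remote_video_decoder(gpu_process_enabled=True,
                                      compositor_is_software=True))
```

Keeping the check in one place would mean a later switch from software to accelerated compositing only has to flip a single input.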
Attachment #8867057 - Flags: review?(matt.woodrow)
Attachment #8867057 - Flags: review?(matt.woodrow) → review+
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dc9185b73e07 Don't use out-of-process video decoding with software compositors. (bug 1356448, r=mattwoodrow)
https://hg.mozilla.org/integration/mozilla-inbound/rev/87e95e385f23 Allow software backed compositor process in nightly. r=dvander
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
It looks like we have an improvement in memory from AWSY:

== Change summary for alert #6693 (as of May 17 2017 08:08 UTC) ==

Improvements:

4% Resident Memory summary windows7-32-vm opt 385,340,463.95 -> 370,743,513.58

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6693

Thanks!