Closed
Bug 1374038
Opened 7 years ago
Closed 7 years ago
Nightly 2017-06-17 prevents Windows 10 shutdown
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
VERIFIED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | + | verified |
firefox56 | + | verified |
People
(Reporter: ssusbauer, Assigned: benjamin)
References
Details
(Keywords: regression)
Attachments
(2 files)
(deleted),
text/x-review-board-request
|
Details | |
(deleted),
text/x-review-board-request
|
jimm
:
review+
mstange
:
review+
lizzard
:
approval-mozilla-beta+
|
Details |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20170617030206
Steps to reproduce:
Running Nightly 2017-06-17 (x64). Shutdown or reboot Windows without first exiting Nightly.
Actual results:
Windows displays "This program is preventing shutdown" and eventually dumps back to desktop. The Nightly UI is hard frozen, "not responding" and I have to manually kill with Windows. There is nothing shown in about:crashes. Same behavior if I hit "cancel" on the Windows shutdown screen to return to the desktop.
Expected results:
Nightly should close immediately.
I experience this on two computers, Nightly exits immediately when closing manually. I've run through Mozregression, I do not experience this behavior in 2017-06-16, I do experience this in build dfb091ee but none of the other ones that come up. Here is the output from Mozregression:
2017-06-18T01:03:33: INFO : Narrowed inbound regression window from [ff45f0da, b4d533bb] (4 revisions) to [c956b50f, b4d533bb] (2 revisions) (~1 steps left)
2017-06-18T01:03:33: DEBUG : Starting merge handling...
2017-06-18T01:03:33: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=b4d533bb18fc216234c2f18717502050663ba5d8&full=1
2017-06-18T01:03:34: DEBUG : Found commit message:
Bug 1368386 - Only call mInvalidRegion.GetRegion() when an up-to-date valid region is requested. r=mattwoodrow
MozReview-Commit-ID: IHb0fNAzyjj
2017-06-18T01:03:34: INFO : The bisection is done.
2017-06-18T01:03:34: INFO : Stopped
Updated•7 years ago
|
Blocks: 1368386
Has Regression Range: --- → yes
Component: Untriaged → Layout: Web Painting
Keywords: regression
Product: Firefox → Core
Comment 2•7 years ago
|
||
This is very surprising.
Steven, can you check whether the build at https://queue.taskcluster.net/v1/task/UPYbXgstSyiijD2WtsG5hQ/runs/0/artifacts/public/build/firefox-56.0a1.en-US.win64.zip exhibits the problem?
Yes, this build does appear to have the problem. I have noticed that for all of the affected builds, if I start Firefox in Safe Mode the problem goes away until I run normally again. Manually disabling hardware acceleration or setting the content process to 1 does not seem to have an affect.
Comment 4•7 years ago
|
||
(In reply to Steven S. from comment #3)
> Yes, this build does appear to have the problem.
Thanks! This was the build from revision https://hg.mozilla.org/integration/autoland/rev/1c4c304c065706ffb014ca972f23cf19c586c910 , which is just one push *before* the push that landed bug 1368386. So bug 1368386 is not the cause of this regression.
> I have noticed that for all
> of the affected builds, if I start Firefox in Safe Mode the problem goes
> away until I run normally again. Manually disabling hardware acceleration or
> setting the content process to 1 does not seem to have an affect.
Ah.
It sounds like it's rather tricky to get consistent results during the regression range search. Can you try again anyway, now that you know more about the things you need to look out for?
Updated•7 years ago
|
Comment 7•7 years ago
|
||
from dupe bug: 1374284
Affected:
Nightly56.0a1(default e10s enabled)
Not affected:
Beta55.0b2(both e10s disabled by accessibility tools and force e10s enabled)
Reproducible: always
Nightly56.0a1(default e10s on)
Steps To reproduce:
1. Launch Nightly with clean profile(no addon)
2. Sign out from right click of Start button of windows taskbar
3. Wait for few seconds( ~10sec)
AR:
Nightly Not Responding. And Fails to sigh out. You must forcibly sign out
ER:
Successfully proceed sign out
Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c9669890cf99c9db2374cf81055c19cb42e382c4&tochange=bb9f906e70e30d56c386717ee9a45b7d3c774573
Regressed by:
bb9f906e70e3 Markus Stange — Bug 1330185 - Use a top-level PProfiler protocol to control the profiler in other processes. r=billm,njn
In Ver55, I confirmed that it was fixed when merge day (central to beta).
i.e,
Reproduced on
https://hg.mozilla.org/mozilla-central/rev/f9605772a0c9098ed1bcaa98089b2c944ed69e9b
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170612041720
But not reproduced on
https://hg.mozilla.org/releases/mozilla-beta/rev/2191d7f87e2e69af1db8b3459522abe7c1db4c2e
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170612062327
Blocks: 1330185
Status: UNCONFIRMED → NEW
status-firefox54:
--- → unaffected
status-firefox55:
--- → unaffected
status-firefox56:
--- → affected
Component: Layout: Web Painting → Untriaged
Ever confirmed: true
Keywords: regressionwindow-wanted
Updated•7 years ago
|
Thanks,
I found the same.
When I was manually asking mozregression gui for 2017-06-16 or later, it is grabbing builds from mozilla-central that do not appear to have the issue.
Comment 9•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #4)
> So bug 1368386 is not the cause of this regression.
(In reply to Alice0775 White from comment #7)
> Regressed by:
> bb9f906e70e3 Markus Stange — Bug 1330185 - Use a top-level PProfiler
> protocol to control the profiler in other processes. r=billm,njn
Haha, I shouldn't have believed I'd be let off the hook that easily.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Comment 10•7 years ago
|
||
We're hanging while calling Close() on the message channel in the parent process. This happens when the ProfilerParentTracker is destroyed from the UniquePtr destructor during global object destruction when all other threads have already shut down. So we're waiting for the IPC thread to signal us at a time at which there is no IPC thread any more.
> ntdll.dll!NtWaitForAlertByThreadId()
> ntdll.dll!RtlSleepConditionVariableSRW()
> KernelBase.dll!SleepConditionVariableSRW()
> > mozglue.dll!mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl & lock)
> xul.dll!mozilla::CondVar::Wait(unsigned int aInterval)
> xul.dll!mozilla::ipc::MessageChannel::SynchronouslyClose()
> xul.dll!mozilla::ipc::MessageChannel::Close()
> xul.dll!mozilla::ProfilerParentTracker::~ProfilerParentTracker()
> xul.dll!mozilla::UniquePtr<mozilla::ProfilerParentTracker,mozilla::DefaultDelete<mozilla::ProfilerParentTracker> >::reset(mozilla::ProfilerParentTracker * aPtr)
> ucrtbase.dll!<lambda>(void)()
> ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) & __ptr64,<lambda_3883c3dff614d5e0c5f61bb1ac94921c> >()
> ucrtbase.dll!_execute_onexit_table()
> xul.dll!dllmain_crt_process_detach(const bool is_terminating)
> xul.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved)
> ntdll.dll!LdrpCallInitRoutine()
> ntdll.dll!LdrShutdownProcess()
> ntdll.dll!RtlExitUserProcess()
> kernel32.dll!ExitProcessImplementation()
> ucrtbase.dll!exit_or_terminate_process()
> ucrtbase.dll!common_exit()
> xul.dll!nsWindow::ProcessMessage(unsigned int msg, unsigned __int64 & wParam, __int64 & lParam, __int64 * aRetValue)
> xul.dll!nsWindow::WindowProcInternal(HWND__ * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam)
> xul.dll!CallWindowProcCrashProtected(__int64(*)(HWND__ *, unsigned int, unsigned __int64, __int64) aWndProc, HWND__ * aHWnd, unsigned int aMsg, unsigned __int64 aWParam, __int64 aLParam)
> xul.dll!nsWindow::WindowProc(HWND__ * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam)
> user32.dll!UserCallWinProcCheckWow()
> user32.dll!DispatchClientMessage()
> user32.dll!__fnDWORD()
> ntdll.dll!KiUserCallbackDispatcherContinue()
> win32u.dll!NtUserMessageCall()
> user32.dll!RealDefWindowProcWorker()
> user32.dll!DefWindowProcW()
> user32.dll!UserCallWinProcCheckWow()
> user32.dll!CallWindowProcW()
> xul.dll!nsWindow::WindowProcInternal(HWND__ * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam)
> xul.dll!CallWindowProcCrashProtected(__int64(*)(HWND__ *, unsigned int, unsigned __int64, __int64) aWndProc, HWND__ * aHWnd, unsigned int aMsg, unsigned __int64 aWParam, __int64 aLParam)
> xul.dll!nsWindow::WindowProc(HWND__ * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam)
> user32.dll!UserCallWinProcCheckWow()
> user32.dll!DispatchClientMessage()
> user32.dll!__fnDWORD()
> ntdll.dll!KiUserCallbackDispatcherContinue()
> win32u.dll!NtUserPeekMessage()
> user32.dll!PeekMessageW()
> msctf.dll!CThreadInputMgr::PeekMessageW(struct tagMSG *,struct HWND__ *,unsigned int,unsigned int,unsigned int,int *)
> xul.dll!nsAppShell::ProcessNextNativeEvent(bool mayWait)
> xul.dll!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal * thr, bool mayWait)
> xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult)
> xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate)
> xul.dll!MessageLoop::RunHandler()
> xul.dll!MessageLoop::Run()
> xul.dll!nsBaseAppShell::Run()
> xul.dll!nsAppShell::Run()
> xul.dll!nsAppStartup::Run()
> xul.dll!XREMain::XRE_mainRun()
> xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig)
> xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig)
> firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp)
> firefox.exe!wmain(int argc, wchar_t * * argv)
> firefox.exe!__scrt_common_main_seh()
> kernel32.dll!BaseThreadInitThunk()
> ntdll.dll!RtlUserThreadStart()
Comment hidden (mozreview-request) |
Comment 12•7 years ago
|
||
I don't know if this patch is a good idea. It seems very risky.
According to https://connect.microsoft.com/VisualStudio/feedback/details/3111120/vs-2015-ucrt-exit-function-is-racy-with-respect-to-other-threads , the fact that _exit(0) runs the destructor of my UniquePtr could be considered a bug.
It might be nicer to have a termination function which does not run these destructors. Is there such a thing?
Updated•7 years ago
|
Component: Untriaged → Gecko Profiler
Comment 13•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #12)
> It might be nicer to have a termination function which does not run these
> destructors. Is there such a thing?
That would be TerminateProcess, and according to the Connect bug, the Chrome folks switched to that as a workaround. I don't know our shutdown code well enough to say whether it makes sense for us to do the same.
Comment 14•7 years ago
|
||
mozreview-review |
Comment on attachment 8879318 [details]
Bug 1374038 - Run ClearOnShutdown handlers during WM_ENDSESSION.
https://reviewboard.mozilla.org/r/150602/#review156316
I don't know what the right answer is here. I think the `TerminateProcess` route is a little less risky (and better maintenance-wise, etc.)...maybe? Anybody who's doing things that they want saved to disk should be listening for `profile-before-change` or some variant thereof. WDYT?
::: widget/windows/nsWindow.cpp:5139
(Diff revision 1)
> + KillClearOnShutdown(ShutdownPhase::WillShutdown);
> + KillClearOnShutdown(ShutdownPhase::Shutdown);
> + KillClearOnShutdown(ShutdownPhase::ShutdownThreads);
> + KillClearOnShutdown(ShutdownPhase::ShutdownLoaders);
> + KillClearOnShutdown(ShutdownPhase::ShutdownFinal);
This seems pretty risky; I'd be worried about hidden dependencies between these shutdown stages and the work performed by the handlers for the actual xpcom shutdown notifications (that we don't actually do here).
OTOH, I'd wager the reason we haven't taken the `TerminateProcess` course (or, uh, `quick_exit`, I think, on Unix-y platforms) is that we're worried about the possibility that real, useful work is down in those static destructors that wouldn't get called. I think anybody who is trying to do useful work in destructors in that fashion is asking for trouble, but I can understand the concern.
Comment 15•7 years ago
|
||
The mochitest-2 failures on the associated try run also look kind of concerning to me, in addition to comment 14.
Flags: needinfo?(mstange)
Comment 16•7 years ago
|
||
(In reply to Steven S. from comment #0)
>
> Windows displays "This program is preventing shutdown" and eventually dumps
> back to desktop. The Nightly UI is hard frozen, "not responding" and I have
> to manually kill with Windows.
I get the exact same thing for a week or two now, only on v55 beta (e10s=on). Was something pushed to beta that might do this?
Comment 17•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #14)
> I don't know what the right answer is here. I think the `TerminateProcess`
> route is a little less risky (and better maintenance-wise, etc.)...maybe?
> Anybody who's doing things that they want saved to disk should be listening
> for `profile-before-change` or some variant thereof. WDYT?
Sounds reasonable... I think. I don't think I know enough about this problem to really make an informed decision here, though.
I haven't looked into the mochitest-2 failures yet.
Flags: needinfo?(mstange)
Comment 18•7 years ago
|
||
Comment on attachment 8879318 [details]
Bug 1374038 - Run ClearOnShutdown handlers during WM_ENDSESSION.
(In reply to Markus Stange [:mstange] from comment #17)
> (In reply to Nathan Froyd [:froydnj] from comment #14)
> > I don't know what the right answer is here. I think the `TerminateProcess`
> > route is a little less risky (and better maintenance-wise, etc.)...maybe?
> > Anybody who's doing things that they want saved to disk should be listening
> > for `profile-before-change` or some variant thereof. WDYT?
>
> Sounds reasonable... I think. I don't think I know enough about this problem
> to really make an informed decision here, though.
Join the club! :)
> I haven't looked into the mochitest-2 failures yet.
Canceling review while the test failures and an alternate path forward are looked at, then.
Attachment #8879318 -
Flags: review?(nfroyd)
Comment 19•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #14)
> I don't know what the right answer is here. I think the `TerminateProcess`
> route is a little less risky (and better maintenance-wise, etc.)...maybe?
> Anybody who's doing things that they want saved to disk should be listening
> for `profile-before-change` or some variant thereof. WDYT?
the problem is that, unless the consumer uses AsyncShutdown (And not all of our "writers" do), even if the work begins at profile-before-change, it's unlikely to finish there, since we move most IO off the main thread.
Is this still reproducible in Nightly? Andrei, can someone on your team check the STR in comment 7?
Flags: needinfo?(andrei.vaida)
Comment 21•7 years ago
|
||
I am still seeing this on Nightly 20170712030204
Comment 22•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #20)
> Is this still reproducible in Nightly? Andrei, can someone on your team
> check the STR in comment 7?
Hello,
Yes, I can confirm that this issue is still reproducible with latest Nightly 56.0a1 (2017-07-12) on Windows 10 x64.
Flags: needinfo?(andrei.vaida)
Markus, can you follow up? Is this a possible source of updates failing?
Flags: needinfo?(mstange)
tracking-firefox56:
--- → +
Comment hidden (mozreview-request) |
Assignee | ||
Comment 26•7 years ago
|
||
I think we should quick-exit here. We aren't trying to clean up memory in this case: we're just trying to save the profile and then get out of the way as quickly as possible. If static destructors are being relied on to save profile data, we have much bigger problems.
Comment 27•7 years ago
|
||
mozreview-review |
Comment on attachment 8887201 [details]
Bug 1374038 - _exit() can call static destructors on Windows! So replace it with TerminateProcess which can't, +mstange
https://reviewboard.mozilla.org/r/157970/#review163078
Thank you!
I agree with your assessment.
Attachment #8887201 -
Flags: review?(mstange) → review+
Comment 28•7 years ago
|
||
mozreview-review |
Comment on attachment 8887201 [details]
Bug 1374038 - _exit() can call static destructors on Windows! So replace it with TerminateProcess which can't, +mstange
https://reviewboard.mozilla.org/r/157970/#review163086
Attachment #8887201 -
Flags: review?(jmathies) → review+
Comment 29•7 years ago
|
||
Pushed by bsmedberg@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9acc3b670795
_exit() can call static destructors on Windows! So replace it with TerminateProcess which can't, r=jimm r=mstange
Comment 30•7 years ago
|
||
Did you intend to set `process = GetCurrentProcessId()`? I suspect that WaitForSingleObject wouldn't know what to do with the -1 pseudo-handle (is it a process? a thread? something else?).
Flags: needinfo?(benjamin)
Comment 31•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #30)
> Did you intend to set `process = GetCurrentProcessId()`?
Or maybe that's not right either. You might need to DuplicateHandle to get a real handle. I'm not sure offhand.
Assignee | ||
Comment 32•7 years ago
|
||
I don't know for certain that WaitForSingleObject can deal with the -1 pseudohandle, but it appears to work in local testing (I used the tool at http://stefanstools.sourceforge.net/SendMessage.html to do local testing).
Flags: needinfo?(benjamin)
Comment 33•7 years ago
|
||
Oh, the GetCurrentThread() pseudo-handle is -2, so maybe WaitForSingleObject knows how to distinguish them after all. If all is well then sorry for the noise!
Assignee | ||
Comment 34•7 years ago
|
||
See https://blogs.msdn.microsoft.com/oldnewthing/20040302-00/?p=40443 which addresses this!
Comment 35•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Emil, you mentioned in email you were still able to reproduce this issue with the 7-18 Nightly. Can you verify the fix here tomorrow (I guess with the 7/20 nightly.) Thanks!
Flags: qe-verify+
Updated•7 years ago
|
status-firefox-esr52:
--- → unaffected
Comment 37•7 years ago
|
||
Hi Liz,
I can confirm that this issue is not reproducible anymore with latest Nightly 56.0a1 (2017-07-20) both e10s enabled and disabled, on Windows 10 x64.
Comment 38•7 years ago
|
||
Confirmed working, thank you for fixing this really annoying thing!
Comment 39•7 years ago
|
||
I have just checked on 55.0b10 to see if Beta is indeed unaffected. It seems that with e10s enabled (1/1 Enabled by default) I can reproduce this issue. Having e10s disabled the issue can't be reproduced.
Can you please take a look at this, Markus?
Comment 40•7 years ago
|
||
You're right, 55 is affected. The bug that caused this (bug 1330185) landed in 55.
Updated•7 years ago
|
Comment 41•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #40)
> You're right, 55 is affected. The bug that caused this (bug 1330185) landed
> in 55.
So is there any barrier to uplifting this to beta? There are still two weeks left in this beta cycle. (and it's really annoying)
(PS: I mentioned that I'm experiencing this on beta way back in comment #16 )
Assignee | ||
Updated•7 years ago
|
Assignee: mstange → benjamin
Assignee | ||
Comment 42•7 years ago
|
||
Comment on attachment 8887201 [details]
Bug 1374038 - _exit() can call static destructors on Windows! So replace it with TerminateProcess which can't, +mstange
Approval Request Comment
[Feature/Bug causing the regression]: proximate cause is bug 1330185; root bug existed far earlier
[User impact if declined]: users with the profiler will hang on OS shutdown/restart on Windows
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: already in the bug
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: hard to say
[Why is the change risky/not risky?]: This is fairly isolated code that I expect would not be very risky. However, shutdown effects are always hard to predict: especially if there are 3rd-party DLLs in our process, this could possibly cause those to change their behavior in ways that are user-visible.
[String changes made/needed]: None
It might be worth testing this shutdown codepath with the more common injected DLLs: antivirus and accessibility tools; just for sanity checking.
Attachment #8887201 -
Flags: approval-mozilla-beta?
Comment on attachment 8887201 [details]
Bug 1374038 - _exit() can call static destructors on Windows! So replace it with TerminateProcess which can't, +mstange
Nice catch, Ciprian. Let's uplift this and see if we can get it into today's beta build.
Attachment #8887201 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 44•7 years ago
|
||
bugherder uplift |
Comment 45•7 years ago
|
||
Verified fixed on 55.0b11, as well. (e10s on/off)
Updated•7 years ago
|
Updated•6 years ago
|
Flags: in-qa-testsuite+
You need to log in
before you can comment on or make changes to this bug.
Description
•