Closed Bug 1124880 Opened 10 years ago Closed 10 years ago

shutdownhang in mozilla::net::nsHttpConnectionMgr::Shutdown() on pending UDP socket close

Categories

(Core :: Networking: HTTP, defect)

Product:

Component:

Version:

36 Branch

Platform:

x86

Windows NT

Type:

defect

Priority:

Not set

Severity:

critical

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla39

Tracking Flags:

Tracking

Status

firefox36

+

wontfix

firefox37

+

wontfix

firefox38

+

fixed

firefox39

+

fixed

relnote-firefox

---

36+

People

(Reporter: kairo, Assigned: mayhemer)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(4 files, 6 obsolete files)

v1 10 years ago Honza Bambas (:mayhemer) (deleted), patch	mcmanus : feedback+	Details \| Diff \| Splinter Review
v1.1 10 years ago Honza Bambas (:mayhemer) (deleted), patch		Details \| Diff \| Splinter Review
windows kernel debug session showing hang 10 years ago Matt McHenry (deleted), text/plain		Details
windows kernel debug session showing hang, 2nd time 10 years ago Matt McHenry (deleted), text/plain		Details
Possible work-around: Windows Firewall rule to block SSDP UDP connections 10 years ago Matt McHenry (deleted), image/png		Details
v1.2 10 years ago Honza Bambas (:mayhemer) (deleted), patch		Details \| Diff \| Splinter Review
v1.3 10 years ago Honza Bambas (:mayhemer) (deleted), patch	mcmanus : review+	Details \| Diff \| Splinter Review
v1.3.1 10 years ago Honza Bambas (:mayhemer) (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review
v1.3.2 10 years ago Honza Bambas (:mayhemer) (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review
v1.3.2 10 years ago Honza Bambas (:mayhemer) (deleted), patch	mayhemer : review+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Reporter

Description

•

10 years ago

This bug was filed from the Socorro interface and is report bp-35103a28-d12f-4999-9066-4f6ed2150122. ============================================================= This is probably what a good part of bug 1103833 is mutating to after the fix of bug 1104317 - at least from the looks of early 36.0b2 data where this constitutes >1% of all crashes. Find stats and more of those reports at https://crash-stats.mozilla.com/report/list?signature=shutdownhang+%7C+WaitForSingleObjectEx+%7C+WaitForSingleObject+%7C+PR_Wait+%7C+nsThread%3A%3AProcessNextEvent%28bool%2C+bool%2A%29+%7C+NS_ProcessNextEvent%28nsIThread%2A%2C+bool%29+%7C+mozilla%3A%3Anet%3A%3AnsHttpConnectionMgr%3A%3AShutdown%28%29

Sylvestre Ledru [:Sylvestre]

Comment 1

•

10 years ago

Important crash, tracking.

status-firefox36: --- → affected

tracking-firefox36: --- → +

Comment 2

•

10 years ago

This is (for me) exactly the same crash, this time on 38.0 a1 https://crash-stats.mozilla.com/report/index/6ac6325d-a96b-424d-b994-3e14b2150126 but the signature is a little bit different to this https://crash-stats.mozilla.com/report/index/6b24ad47-c392-406b-ad63-6bf7c2150126 and so this crash on 38.0 a1 is not linked to an 'Related Bug' like this one here

Sylvestre Ledru [:Sylvestre]

Comment 3

•

10 years ago

David, could you help on this? thanks

Flags: needinfo?(dmajor)

Comment 4

•

10 years ago

Patrick, any idea why nsHttpConnectionMgr::Shutdown is taking so long? (For context: bug 1038342 added a watchdog that turns hung shutdowns into crashes) Here's the stack of the main thread: WaitForSingleObject PR_Wait nsThread::ProcessNextEvent(bool, bool*) NS_ProcessNextEvent(nsIThread*, bool) mozilla::net::nsHttpConnectionMgr::Shutdown() mozilla::net::nsHttpHandler::Observe(nsISupports*, char const*, wchar_t const*) nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) nsObserverService::Release() ScopedXPCOMStartup::~ScopedXPCOMStartup() XREMain::XRE_main(int, char** const, nsXREAppData const*) XRE_main

Flags: needinfo?(dmajor) → needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 5

•

10 years ago

its pretty generic.. it just means the socket thread isn't completing. maybe honza has more insight

Flags: needinfo?(mcmanus) → needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 6

•

10 years ago

The report this bug was for filed for: Thread 4 @kernel32.dll!CloseHandleImplementation|ntdll.dll!NtClose Thread 5 @ntdll.dll!ZwRemoveIoCompletion Thread 9 (sts) @ws2_32.dll!closesocket|ntdll.dll!NtClose Thread 35 (dns) @ws2_32.dll!closesocket|ntdll.dll!NtClose Arbitrary reports from the list: https://crash-stats.mozilla.com/report/index/ade2c3b2-0397-4061-878b-7414b2150122 Thread 4 (sts) @ws2_32.dll!connect|ntdll.dll!ZwDeviceIoControlFile Thread 1 (google breakpad) @ntdll.dll!NtClose https://crash-stats.mozilla.com/report/index/8cdee535-2dad-4261-bb34-869072150122 Thread 2 @ntdll.dll!NtRemoveIoCompletion Thread 4 (sts) @ntdll.dll!NtClose https://crash-stats.mozilla.com/report/index/4234021a-bc59-4757-aab6-4a8952150122 Thread 4 (sts) @ntdll.dll!NtClose Thread 36 @ntdll.dll!NtRemoveIoCompletion There is usually NtClose on the stack. Seems like a windows bug.

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 7

•

10 years ago

Also, in cases we are stuck on closing a socket in sts it is always a UDP socket. There are reports with no stuck (being inside poll) on sts too, tho: https://crash-stats.mozilla.com/report/index/60d37d17-db0c-4db0-bbe3-ab84b2150122 https://crash-stats.mozilla.com/report/index/d1c60009-08a3-4bac-91cb-9fae72150123

Sylvestre Ledru [:Sylvestre]

Comment 8

•

10 years ago

Honza, any idea if we could workaround this windows bug? thanks

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 9

•

10 years ago

Tentatively assigning to me, this needs some deep investigation. No idea for a workaround, no idea of the cause at all.

Assignee: nobody → honzab.moz

Flags: needinfo?(honzab.moz)

Reporter

Comment 10

•

10 years ago

Also note that the comments in the URL given in comment #0 sound like those are really troubled installations, sounds like we would not only run into hangs on shutdown but all over the place.

Reporter

Updated

•

10 years ago

Summary: crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::net::nsHttpConnectionMgr::Shutdown() → shutdownhang in mozilla::net::nsHttpConnectionMgr::Shutdown()

Updated

•

10 years ago

Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] [@ shutdownhang | WaitForSingleObjectEx | WaitForSi…

Updated

•

10 years ago

Crash Signature: , bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] [@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread* → , bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] [@ shutdownhang | WaitForSingleObjectEx | PR_Wait | mozilla::ReentrantMonitor::Wait(unsigned int) | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*

Sylvestre Ledru [:Sylvestre]

Comment 11

•

10 years ago

I don't see that bug fixed for 36. Wontfix but tracking it for the next releases.

status-firefox36: affected → wontfix

status-firefox37: --- → affected

status-firefox38: --- → affected

tracking-firefox37: --- → +

tracking-firefox38: --- → +

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 12

•

10 years ago

Release Note Request (optional, but appreciated) [Why is this notable]: Users will see crash notifications more often. [Suggested wording]: Firefox hangs on shutdown are now reported as crashes, in order to help Mozilla diagnose and fix the specific issues during shutdown. [Links (documentation, blog post, etc)]: (not sure. maybe bug 1038342, bug 1104317. a blog post might help explain this.)

relnote-firefox: --- → ?

Sylvestre Ledru [:Sylvestre]

Comment 13

•

10 years ago

Added to the release notes. Thanks Liz!

relnote-firefox: ? → 36+

Reporter

Updated

•

10 years ago

Crash Signature: , bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] → , bool) | mozilla::net::nsHttpConnectionMgr::Shutdown()] [@ shutdownhang | ntdll.dll@0x3c6bc]

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Version: Trunk → 36 Branch

Comment 14

•

10 years ago

Any idea to hang out (kill) in the Windows TM permanent task Firefox.exe or Nightly.exe which remains after this crash?

Honza Bambas (:mayhemer)

Assignee

Comment 15

•

10 years ago

The slow UDP close (most common cause of this hang) is also known to chromium guys [1], but there is no solution right now on their side (AFAIK, no deeper check on that). According [2] closesocket tries to send unsent data and when network has changed (cable unplugged, wifi changed) it might take up to 5 secs. The blog post claims this is by default since Vista, but I can see reports from version 5.1.2600 (windows XP) tho. Anyway, one of the reports says Fx crashes when network is switched. According our shutdown telemetry [3] seems like there is some distribution of slow shutdowns already. It may be that when some amount of UDP sockets piles up, we are slower and go over the 60s shutdown watchdog limit. My vote is to force abortive close on UDP sockets, first just on Win, according [4]: "If the l_onoff member of the linger structure is nonzero and l_linger member is zero, closesocket is not blocked even if queued data has not yet been sent or acknowledged. This is called a hard or abortive close, because the socket's virtual circuit is reset immediately, and any unsent data is lost." [1] https://code.google.com/p/chromium/issues/detail?id=165382#c81 [2] http://blogs.msdn.com/b/winsdk/archive/2013/10/09/udp-closesocket-takes-upto-5-seconds-to-return-in-disconnect-remote-host-down-scenario-due-to-pending-data-to-send.aspx [3] http://telemetry.mozilla.org/#filter=beta%2F36%2FSHUTDOWN_PHASE_DURATION_TICKS_PROFILE_CHANGE_TEARDOWN%2Fsaved_session%2FFirefox&aggregates=Submissions&evoOver=Builds&locked=true&sanitize=true&renderhistogram=Table [4] https://msdn.microsoft.com/en-us/library/windows/desktop/ms737582%28v=vs.85%29.aspx

Status: NEW → ASSIGNED

Honza Bambas (:mayhemer)

Assignee

Comment 16

•

10 years ago

SO_LINGER is not purposed for UDP sockets, at least not on Windows.

Honza Bambas (:mayhemer)

Assignee

Comment 17

•

10 years ago

One way to at least mitigate this would be to create a separate thread for *each* PR_Close() call on each UDP socket. Then we just need to find a spot to join all the treads during shutdown. Sounds a bit crazy but I can imagine this be upliftable.

Honza Bambas (:mayhemer)

Assignee

Comment 18

•

10 years ago

Attached patch v1 (obsolete) (deleted) — Details — Splinter Review

I so far haven't found any general way to avoid blocks on the socket thread. I believe select() call may also be slow from time to time (few reports are inside select). This should mitigate the udp close delay, if it can be avoided by parallelization of close() calls on respective sockets. Roughly tested. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d9d59454a466

Attachment #8566308 - Flags: review?(mcmanus)

Patrick McManus [:mcmanus]

Comment 19

•

10 years ago

Comment on attachment 8566308 [details] [diff] [review] v1 Review of attachment 8566308 [details] [diff] [review]: ----------------------------------------------------------------- the try build is red. You've done a nice job documenting that this is a known issue and parallelism seems the right mitigation. I wonder if we should be using the existing threadpool mechanism here - maybe unlimited isn't the right thing either. did you see the one blocked in connect? I wonder if we should be testing sockets at connect time to see if they are blocking or not. I think we should do something like this, but only expect it to be a partial mitigation. I think it should get a lot of pre-release air time - implications of spawning a lot of threads might take a while to see. ::: netwerk/base/nsUDPSocket.cpp @@ +114,5 @@ > + void AddObserver(); > + void JoinAndRemove(); > + > + PRFileDesc *mFd; > + PRThread* mThread; PRThread *mThread (throughout) @@ +156,5 @@ > + // Released after the thread is done. > + mSelf = this; > + mThread = PR_CreateThread(PR_USER_THREAD, ThreadFunc, this, > + PR_PRIORITY_NORMAL, PR_GLOBAL_THREAD, > + PR_JOINABLE_THREAD, 128 * 1024); that stack seems huge.. and given that we might have a lot in parallel and VM space is an issue on 32 bit platforms... @@ +165,5 @@ > + > + // Observer service must be worked with on the main thread only. > + // This method is called usually on the socket thread. > + nsCOMPtr<nsIRunnable> event = NS_NewRunnableMethod( > + this, &nsUDPSocketCloseThread::AddObserver); can this deadlock if the main thread is trying to shut down the connection manager?

Attachment #8566308 - Flags: review?(mcmanus) → feedback+

Honza Bambas (:mayhemer)

Assignee

Comment 20

•

10 years ago

(In reply to Patrick McManus [:mcmanus] from comment #19) > Comment on attachment 8566308 [details] [diff] [review] > v1 > > Review of attachment 8566308 [details] [diff] [review]: > ----------------------------------------------------------------- > > the try build is red. > > You've done a nice job documenting that this is a known issue and > parallelism seems the right mitigation. I wonder if we should be using the > existing threadpool mechanism here - maybe unlimited isn't the right thing > either. I don't think there will be that much sockets to close, they are also closing during browser lifetime. We can have our favorite telemetry (I love it :)) to check how far with the numbers we go (can go in with this patch in) > > did you see the one blocked in connect? I wonder if we should be testing > sockets at connect time to see if they are blocking or not. I've seen blocks in connect(), select() and some others I think. I believe there is a general problem with winsock, but google is silent about it. > > I think we should do something like this, but only expect it to be a partial > mitigation. I think it should get a lot of pre-release air time - > implications of spawning a lot of threads might take a while to see. Yes, I am afraid we have to. > > ::: netwerk/base/nsUDPSocket.cpp > @@ +114,5 @@ > > + void AddObserver(); > > + void JoinAndRemove(); > > + > > + PRFileDesc *mFd; > > + PRThread* mThread; > > PRThread *mThread (throughout) I never know.. > > @@ +156,5 @@ > > + // Released after the thread is done. > > + mSelf = this; > > + mThread = PR_CreateThread(PR_USER_THREAD, ThreadFunc, this, > > + PR_PRIORITY_NORMAL, PR_GLOBAL_THREAD, > > + PR_JOINABLE_THREAD, 128 * 1024); > > that stack seems huge.. and given that we might have a lot in parallel and > VM space is an issue on 32 bit platforms... Yep, something to research tho. Nightly is our friend here probably. > > @@ +165,5 @@ > > + > > + // Observer service must be worked with on the main thread only. > > + // This method is called usually on the socket thread. > > + nsCOMPtr<nsIRunnable> event = NS_NewRunnableMethod( > > + this, &nsUDPSocketCloseThread::AddObserver); > > can this deadlock if the main thread is trying to shut down the connection > manager? No, we spin the MT loop when shutting down. Also, this just adds an observer, nothing else.

Comment 21

•

10 years ago

> > VM space is an issue on 32 bit platforms... > Yep, something to research tho. Nightly is our friend here probably. FWIW we don't see a huge amount of address space exhaustion on Nightly these days. (I would guess that 64-bit builds are a big reason, and the daily restarts probably help too)

Honza Bambas (:mayhemer)

Assignee

Comment 22

•

10 years ago

Attached patch v1.1 (obsolete) (deleted) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c41b1761691 Now includes telemetry.

Attachment #8566308 - Attachment is obsolete: true

Attachment #8567077 - Flags: review?(mcmanus)

Honza Bambas (:mayhemer)

Assignee

Comment 23

•

10 years ago

Improved stack size: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f61e11563d3

Honza Bambas (:mayhemer)

Assignee

Comment 24

•

10 years ago

And attempt to fix linux-static-analyzes build fail: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9546d95b460

Honza Bambas (:mayhemer)

Assignee

Comment 25

•

10 years ago

Few failures (comment 23): - netwerk/test/unit/test_udp_multicast.js, socket is expected to be closed on return of nsUDPSocket::Close() ; fixed as: no use of the thread in this method - Assertion failure: sActiveThreadsCount > 0, on x64 - sActiveThreadsCount is 32bit, hence unaligned ; fixed as: using uintptr_t - [1], Main app process: killed by out-of-range signal, number 117 + some UDP socket warning in the console ; probably coming from dom\network\tests\test_udpsocket.html and is probably related to the first error in this list [1] https://treeherder.mozilla.org/logviewer.html#?job_id=5074251&repo=try

Dragana Damjanovic [:dragana]

Comment 26

•

10 years ago

Just for your information. I will open a new bug for the test_udp_multicast.js failure. The patch from bug 1131557 makes it fail reliably on 32bit linux, but the patch is not the cause. Something strange is going on, i am still not sure what is the problem. I am working on it...

Patrick McManus [:mcmanus]

Updated

•

10 years ago

Attachment #8567077 - Flags: review?(mcmanus) → review+

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1135405

Honza Bambas (:mayhemer)

Assignee

Comment 27

•

10 years ago

(In reply to Honza Bambas (:mayhemer) from comment #25) > Few failures (comment 23): > - netwerk/test/unit/test_udp_multicast.js, socket is expected to be closed > on return of nsUDPSocket::Close() ; fixed as: no use of the thread in this > method > - [1], Main app process: killed by out-of-range signal, number 117 + some > UDP socket warning in the console ; probably coming from > dom\network\tests\test_udpsocket.html and is probably related to the first > error in this list Looking more at the code, it seems like calling PR_Close directly from ::Close() is nice but won't fix these problems. I believe there is a listener on the socket class at that time and we go to the socket thread with OnMsgClose that ends up in OnSocketDetached. So, Dragana, I'll wait for your patch of the failing test first.

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Attachment #8567077 - Flags: review+

Dragana Damjanovic [:dragana]

Comment 28

•

10 years ago

(In reply to Honza Bambas (:mayhemer) from comment #25) > - [1], Main app process: killed by out-of-range signal, number 117 + some > UDP socket warning in the console ; probably coming from > dom\network\tests\test_udpsocket.html and is probably related to the first > error in this list > > > [1] https://treeherder.mozilla.org/logviewer.html#?job_id=5074251&repo=try the udp socket warning can be cause by https://hg.mozilla.org/integration/fx-team/annotate/37e5f630049b/toolkit/modules/secondscreen/SimpleServiceDiscovery.jsm#l163 SimpleServiceDiscovery.jsm does not shut down at all. This service was causing bug 1085266 (comment 13 explains it a bit)

Comment 29

•

10 years ago

Attached file windows kernel debug session showing hang (deleted) — Details

I believe that the following three crash reports are all instances of this bug: bp-163de6ab-bc47-4e3d-90d5-f91992150304 3/4/2015 16:53 bp-5a081db2-ffde-4d6c-857f-3924c2150304 3/4/2015 10:02 bp-311f753d-21d3-4164-83e8-fe9c92150303 3/3/2015 11:17 The attached debug log is according to the instructions at http://blogs.technet.com/b/markrussinovich/archive/2005/08/17/unkillable-processes.aspx. (Apologies for the initial flailing.) I've been hitting this bug a couple of times a day for the last few days; let me know if I can gather any more information to help! :)

Comment 31

•

10 years ago

so has this been introduced by bug 1054959?

Comment 32

•

10 years ago

I looked for a preference to disable device discovery by UDP (I'm assuming that's the part of that bug you're suspecting), but couldn't find anything. (I searched for "device", "video", and "discover" in about:config.) If you can point me to it, I'd be happy to turn it off and report back after a week or so. It's also not obvious to me whether that feature exists in the build I'm using or was backed out; I'm on: > Build identifier: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0

Comment 33

•

10 years ago

Attached file windows kernel debug session showing hang, 2nd time (deleted) — Details

This corresponds to https://crash-stats.mozilla.com/report/index/2b690fb1-b9a5-4ce5-9c7f-c64192150309. It happened after being in a 3.5-hour meeting, so on wireless and with a few suspend-resumes thrown in. But Firefox was shut down during most of the time (I'm almost positive it was shut down during all of the suspend/resume cycles). In general I've been able to reduce the frequency of this issue by exiting FF whenever I would normally alt-tab away from it, only leaving it active when I'm directly using it.

Comment 35

•

10 years ago

Attached image Possible work-around: Windows Firewall rule to block SSDP UDP connections (deleted) — Details

I dug around a bit in the code in bug 1054959, and found that SimpleServiceDiscovery.jsm uses UDP port 1900. I created a Windows Firewall rule to block outbound connections on that remote port, and haven't seen this bug since then. I never would've thought of that -- thank you philipp!

Updated

•

10 years ago

Blocks: 1054959

Comment 36

•

10 years ago

My problem with Firefox not connecting and not restarting is now gone. I had some trouble with the motherboard on my computer. I reflashed it, as well it will now only work with one stick of ram, but at least I have no problem with Firefox. The motherboard uses the Intel G33/G31 chipset. I don't know if it had gone bad somehow. I did have some trouble with Window Media Player. Hope this may help.

Chandan Kumar Baba

Comment 37

•

10 years ago

(In reply to Matt McHenry from comment #35) > Created attachment 8577239 [details] I am unable to see the properties window (digging into firewall outbound rule,but not able to get the properties window of firefox) if you able to gives the steps,then it would be great for me. > Possible work-around: Windows Firewall rule to block SSDP UDP connections > > I dug around a bit in the code in bug 1054959, and found that > SimpleServiceDiscovery.jsm uses UDP port 1900. I created a Windows Firewall > rule to block outbound connections on that remote port, and haven't seen > this bug since then. I never would've thought of that -- thank you philipp!

Lawrence Mandel [:lmandel] (use needinfo)

Comment 38

•

10 years ago

(In reply to Matt McHenry from comment #35) > Created attachment 8577239 [details] > Possible work-around: Windows Firewall rule to block SSDP UDP connections > > I dug around a bit in the code in bug 1054959, and found that > SimpleServiceDiscovery.jsm uses UDP port 1900. I created a Windows Firewall > rule to block outbound connections on that remote port, and haven't seen > this bug since then. I never would've thought of that -- thank you philipp! If this is correct, this bug should be fixed in 37+ as tab casting will be disabled by default via bug 1142521. We should be able to tell by the crash volume in 37 Beta 6.

Comment 39

•

10 years ago

(In reply to Chandan Kumar Baba from comment #37) > (In reply to Matt McHenry from comment #35) > > Possible work-around: Windows Firewall rule to block SSDP UDP connections > > I am unable to see the properties window (digging into firewall outbound > rule,but not able to get the properties window of firefox) > if you able to gives the steps,then it would be great for me. At this point it sounds like you might be better off grabbing a FF 37 beta 6+ build. But if you still want to create a firewall rule, here's how: Start -> Windows Firewall right-click on Outbound Rules in the left-hand pane and select "New Rule ..." select the "Port" radio button and click next select the "UDP" radio button, enter 1900 in the "Specific remote ports" text box, and click next select the "block the connection" radio button and click next select all three of the "Domain", "Private" and "Public" check boxes and click next name your rule, paste the URL of this bug into the description, and click finish find your new rule in the list and double-click to open it on the "Programs and Services" tab, select the "This program" radio button click the browse button and locate firefox.exe

Lawrence Mandel [:lmandel] (use needinfo)

Comment 41

•

10 years ago

I still see this crash in 37.0b6 so looks like bug 1142521 did not resolve this issue. We're now too late in 37 to take a fix so I've marked 37 as wontfix.

status-firefox37: affected → wontfix

status-firefox39: --- → affected

tracking-firefox39: --- → +

Honza Bambas (:mayhemer)

Assignee

Comment 42

•

10 years ago

Attached patch v1.2 (obsolete) (deleted) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1ba053c2b5c

Attachment #8567077 - Attachment is obsolete: true

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Attachment #8581758 - Attachment filename: 1124880-new-thread.patch → v1.2

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1143866

Honza Bambas (:mayhemer)

Assignee

Comment 43

•

10 years ago

Comment on attachment 8581758 [details] [diff] [review] v1.2 Patrick, I've added telemetry and few other cosmetic changes. It's not for carrying r+, so asking again for a full r. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a800b82c5725

Attachment #8581758 - Flags: review?(mcmanus)

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Attachment #8581758 - Attachment description: 1124880-new-thread.patch → v1.2

Attachment #8581758 - Attachment filename: v1.2 → 1124880-new-thread.patch

Honza Bambas (:mayhemer)

Assignee

Comment 44

•

10 years ago

Attached patch v1.3 (obsolete) (deleted) — Details — Splinter Review

Fixed (finally!) the thread count assertion failure. https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5743ef11575

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Attachment #8582498 - Flags: review?(mcmanus)

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Attachment #8581758 - Attachment is obsolete: true

Attachment #8581758 - Flags: review?(mcmanus)

Patrick McManus [:mcmanus]

Comment 45

•

10 years ago

Comment on attachment 8582498 [details] [diff] [review] v1.3 Review of attachment 8582498 [details] [diff] [review]: ----------------------------------------------------------------- if the problem persists because a single thread an block too long I guess we can move to a timeout and kill the thread (and leak the socket) approach, right? ::: netwerk/base/nsUDPSocket.cpp @@ +137,5 @@ > + static bool sPastShutdown; > + > + // Active threads (roughly) counter, modified only on the main thread > + // and used only for telemetry reports. > + static uintptr_t sActiveThreadsCount; not really a review comment, but why uintptr_t if the var never holds a ptr? it seems accumulate just wants uint32_t

Attachment #8582498 - Flags: review?(mcmanus) → review+

Honza Bambas (:mayhemer)

Assignee

Comment 46

•

10 years ago

(In reply to Patrick McManus [:mcmanus] from comment #45) > Comment on attachment 8582498 [details] [diff] [review] > v1.3 > > Review of attachment 8582498 [details] [diff] [review]: > ----------------------------------------------------------------- > > if the problem persists because a single thread an block too long I guess we > can move to a timeout and kill the thread (and leak the socket) approach, > right? Yes, probably have it win specific. TerminateThread would be the way. > > ::: netwerk/base/nsUDPSocket.cpp > @@ +137,5 @@ > > + static bool sPastShutdown; > > + > > + // Active threads (roughly) counter, modified only on the main thread > > + // and used only for telemetry reports. > > + static uintptr_t sActiveThreadsCount; > > not really a review comment, but why uintptr_t if the var never holds a ptr? > it seems accumulate just wants uint32_t My poor attempt for alignment enforcement. uint32_t it is.

Honza Bambas (:mayhemer)

Assignee

Comment 47

•

10 years ago

Attached patch v1.3.1 (obsolete) (deleted) — Details — Splinter Review

Attachment #8582498 - Attachment is obsolete: true

Attachment #8583239 - Flags: review+

Benjamin Smedberg

Comment 48

•

10 years ago

Don't ever use TerminateThread. If the thread is in any critical section (including the allocator) the whole process will deadlock.

Honza Bambas (:mayhemer)

Assignee

Comment 49

•

10 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #48) > Don't ever use TerminateThread. If the thread is in any critical section > (including the allocator) the whole process will deadlock. And I think that is our case, I believe NtClose keeps one. So, we should slowly turn to MS for help here...

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 50

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8be35539cc88

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 51

•

10 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5044ad98d160 for build bustage: https://treeherder.mozilla.org/logviewer.html#?job_id=8086380&repo=mozilla-inbound

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 52

•

10 years ago

Attached patch v1.3.2 (obsolete) (deleted) — Details — Splinter Review

Attachment #8583239 - Attachment is obsolete: true

Attachment #8584524 - Flags: review+

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Flags: needinfo?(honzab.moz)

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 53

•

10 years ago

Something's seriously wrong with that attachment.

Keywords: checkin-needed

Honza Bambas (:mayhemer)

Assignee

Comment 54

•

10 years ago

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #53) > Something's seriously wrong with that attachment. Hmm.. how that happened... Re-attaching, sorry.

Honza Bambas (:mayhemer)

Assignee

Comment 55

•

10 years ago

Attached patch v1.3.2 (deleted) — Details — Splinter Review

Attachment #8584524 - Attachment is obsolete: true

Attachment #8584632 - Flags: review+

Honza Bambas (:mayhemer)

Assignee

Comment 56

•

10 years ago

Seems ok now.

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 57

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/88b311e87678

Keywords: checkin-needed

Phil Ringnalda (:philor)

Comment 58

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/88b311e87678

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox39: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla39

Nickolay_Ponomarev

Comment 59

•

10 years ago

FWIW, there is a number of bugs saying "Firefox can't connect, hangs on shutdown, cannot be killed" recently, and at least one crash id linking the hang to the stack here. bug 1125686 (I CCed everyone there), bug 1137994, bug 1105577, bug 1142981, bug 1141627, bug 1143969, bug 646189 comment 13. ...and a forum thread saying this is new in 36 http://forums.mozillazine.org/viewtopic.php?f=38&t=2916383

Reporter

Comment 60

•

10 years ago

Looks like we still have crashes in 39 dev edition and Nightly 40 which all came out after this patch landed in 39: https://crash-stats.mozilla.com/search/?product=Firefox&version=39.0a2&version=40.0a1&process_type=browser&process_type=content&signature=~nsHttpConnectionMgr%3A%3AShutdown&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Sylvestre Ledru [:Sylvestre]

Comment 61

•

10 years ago

Honza, can we have an uplift to beta? Thanks

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 62

•

10 years ago

1. Some of the reports in comment 60 show we are stuck in PR_Poll. This could be an effect of a known bug with signaling through the pollable event. It may be that there is infinite timeout put to PR_Poll. We could modify nsSocketTransportService::PollTimeout() to return no more then few seconds timeout (5/7/10 or so) to let the STS thread loop occasionally. With Dragana's patch for event processing we might fix this bug. 2. https://crash-stats.mozilla.com/report/index/5ba8f826-a4b2-4131-a63c-e4b112150404#allthreads, thread 11 (socket thread) shows a dialog via nsAutodial::DialDefault: RasDialDlgW nsAutodial::DialDefault(wchar_t const*) nsNativeConnectionHelper::OnConnectionFailed(wchar_t const*) nsSocketTransport::RecoverFromError() mozilla::net::nsHttpConnectionMgr::nsHalfOpenSocket::OnTransportStatus(nsITransport*, nsresult, __int64, __int64) 3. Some reports show we are stuck in connect(), few of them show we are stuck in close() (nsNSSSocketInfo::CloseSocketAndDestroy). 4. https://crash-stats.mozilla.com/report/index/5acfd3bf-631b-46fd-b67b-731e42150407#allthreads, thread 9 https://crash-stats.mozilla.com/report/index/73b21efe-1098-4285-b57d-771342150402#allthreads, thread 8 NtWaitForSingleObject ntdll.dll@0x1e1d6 RtlpEnterCriticalSectionContended RtlpEnterCriticalSectionContended PR_Lock mozilla::net::nsWSAdmissionManager::GetSessionCount(int&) mozilla::net::WebSocketChannel::StopSession(nsresult) mozilla::net::WebSocketChannel::ReleaseSession() mozilla::net::WebSocketChannel::ProcessInput(unsigned char*, unsigned int) mozilla::net::WebSocketChannel::OnInputStreamReady(nsIAsyncInputStream*) nsInputStreamReadyEvent::Run() nsThread::ProcessNextEvent(bool, bool*) NS_ProcessNextEvent(nsIThread*, bool) nsSocketTransportService::Run() 5. https://crash-stats.mozilla.com/report/index/27ae8618-8676-436d-8056-c6db72150406#allthreads, thread 5 https://crash-stats.mozilla.com/report/index/bb6d4bd5-75b5-475d-a926-4207d2150404#allthreads, thread 4 PR_Lock nsSocketTransportService::Dispatch(nsIRunnable*, unsigned int) http://hg.mozilla.org/releases/mozilla-aurora/annotate/239508ee646f/netwerk/base/nsSocketTransportService2.cpp#l128 nsSocketTransportService::Run() http://hg.mozilla.org/releases/mozilla-aurora/annotate/239508ee646f/netwerk/base/nsSocketTransportService2.cpp#l815 nsThread::ProcessNextEvent(bool, bool*) NS_ProcessNextEvent(nsIThread*, bool) I'll file bugs. mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) MessageLoop::RunHandler() MessageLoop::Run() nsThread::ThreadFunc(void*) _PR_NativeRunThread pr_root _callthreadstartex

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1152040

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1152041

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1152046

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1152047

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Depends on: 1152048

Honza Bambas (:mayhemer)

Assignee

Comment 63

•

10 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #61) > Honza, can we have an uplift to beta? Thanks I can, what is the deadline for it? (kinda busy now)

Sylvestre Ledru [:Sylvestre]

Comment 64

•

10 years ago

Honzam, today please (or tomorrow morning if you can't). Stability is not good in 38, any fix is good to take! Thanks

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 65

•

10 years ago

There needs to be uplifted also patch from bug 1135405 then, AFAIK.

Honza Bambas (:mayhemer)

Assignee

Comment 66

•

10 years ago

Comment on attachment 8584632 [details] [diff] [review] v1.3.2 applies clearly on Beta but not tested on try. LAND AFTER BUG 1135405! Approval Request Comment [Feature/regressing bug #]: none [User impact if declined]: shutdown hang, inability to browse [Describe test coverage new/current, TreeHerder]: landed on m-c on 2015-03-29, currently on m-a [Risks and why]: this not a simple patch and carries some risk, tho should be small. This creates new threads, there is no limit for it. [String/UUID change made/needed]: none

Flags: needinfo?(honzab.moz)

Attachment #8584632 - Flags: approval-mozilla-beta?

Sylvestre Ledru [:Sylvestre]

Comment 67

•

10 years ago

Comment on attachment 8584632 [details] [diff] [review] v1.3.2 Shutdown hangs have been an important issue lately. I am taking the risk as we are early in the beta cycle and we can always backout this change if it introduces more regressions that it fixes. Should be in 38 beta 3 Thanks Honza!

Attachment #8584632 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 68

•

10 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/b04842ef36ca

status-firefox38: affected → fixed

Sylvestre Ledru [:Sylvestre]

Comment 69

•

10 years ago

Robert, you mentioned during that the last channel meeting that we still have crashes caused by this bug. Do you confirm?

Flags: needinfo?(kairo)

Reporter

Comment 70

•

10 years ago

Yes, it should have been fixed in 38.0b3 but it's still at #9 with 1.4% of all crashes in 38.0b4. I'm reopening.

Status: RESOLVED → REOPENED

Flags: needinfo?(kairo)

Resolution: FIXED → ---

Honza Bambas (:mayhemer)

Assignee

Comment 71

•

10 years ago

kairo, I filed new bugs I can see stacks for, see blocks. This one is fixed.

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Resolution: --- → FIXED

Summary: shutdownhang in mozilla::net::nsHttpConnectionMgr::Shutdown() → shutdownhang in mozilla::net::nsHttpConnectionMgr::Shutdown() on pending UDP socket close

Honza Bambas (:mayhemer)

Assignee

Comment 72

•

10 years ago

If you want an umbrella bug for this issue, please open a new one and move the blocking bugs there (and make this one blocking it too, left closed). Thanks.

Reporter

Comment 73

•

10 years ago

New bugs are fine as long as they have the right entries in crash signatures and the crash keywords set. Thanks for looking into this!

Sylvestre Ledru [:Sylvestre]

Comment 74

•

10 years ago

Honza, this bug is still presented during our channel meeting and I would like to see it fixed for 38. I looked at the "block" and it seems unrelated (send video to device). Are you sure about the bug number? Thanks

Flags: needinfo?(honzab.moz)

Reporter

Updated

•

10 years ago

Keywords: topcrash-win

Honza Bambas (:mayhemer)

Assignee

Comment 75

•

10 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #74) > Honza, this bug is still presented during our channel meeting and I would > like to see it fixed for 38. > I looked at the "block" and it seems unrelated (send video to device). Are > you sure about the bug number? Thanks It probably means someone believes this bug is caused by that one (is its regression). What do you want me to do? The patch here is on beta (38). Other bugs with different cause but identical symptoms are tracked. I don't have time to work on them right now tho.

Flags: needinfo?(honzab.moz)

Reporter

Comment 76

•

10 years ago

It means that this is the only bug on file that is connected with the signature (see comment #73), and the signature is still happening in large enough volume to be a concern. Is there any bug filed that should be about the majority of the issues left with this signature?

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 77

•

10 years ago

What about all of the open "Depends on" bugs?

Flags: needinfo?(honzab.moz)

Reporter

Comment 78

•

10 years ago

(In reply to Honza Bambas (:mayhemer) from comment #77) > What about all of the open "Depends on" bugs? None of them have the crash signature and crash keyword set, so they do not appear on crash-stats for the signature.

Reporter

Comment 79

•

10 years ago

(In reply to Honza Bambas (:mayhemer) from comment #75) > The patch here is on beta (38). Other bugs with different cause but > identical symptoms are tracked. I don't have time to work on them right now > tho. Also none of them are tracked for any release via tracking flags. So the meaning of "tracked" is an issue as well. ;-)

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

Blocks: 1158189

Honza Bambas (:mayhemer)

Assignee

Comment 80

•

10 years ago

I cloned this to bug 1158189. Please set the flags, I don't have perms.

Honza Bambas (:mayhemer)

Assignee

Updated

•

10 years ago

No longer blocks: 1054959

No longer depends on: 1152041, 1152046, 1152047, 1152048, 1135405, 1152040

You need to log in before you can comment on or make changes to this bug.