Closed Bug 1217807 Opened 9 years ago Closed 9 years ago

Assertion failure: GetThread() == NS_GetCurrentThread(), at netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp:746

Categories

(Core :: Networking: DNS, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla45
blocking-b2g 2.5+
feature-b2g 2.5+
Tracking Status
firefox45 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: schien, Assigned: xeonchen)

References

Details

Attachments

(3 files, 12 obsolete files)

(deleted), patch
xeonchen
: review+
gwagner
: feedback+
Details | Diff | Splinter Review
(deleted), patch
xeonchen
: review+
Details | Diff | Splinter Review
(deleted), patch
xeonchen
: review+
Details | Diff | Splinter Review
This assertion happened on latest m-c debug build with TV Gaia. I saw multiple similar assertions happened on the same file but different line (RegisterOperator and GetAddrInfoOperator). https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp#536 https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp#746 In latest master, discoverable is default on for TV Gaia so it's easy to enter reboot cycle.
The root cause is those callback functions might be invoked under certain condition. However, our code didn't handle it. We'll need to double check the mDNSResponder API document. https://developer.apple.com/library/mac/documentation/Networking/Reference/DNSServiceDiscovery_CRef/index.html During code investigation I also found reference count issue in ResolveOperator and GetAddrInfoOperator. |mDeleteProtector| set to nullptr too early. This will potentially cause use-after-free issue. https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp#623 https://dxr.mozilla.org/mozilla-central/source/netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp#748
Flags: needinfo?(xeonchen)
Flags: needinfo?(kechang)
B2G with debug build doesn't boot any more. Seems like its caused by bug 1200132. Can we back out asap?
Blocks: 1200132
Severity: normal → blocker
blocking-b2g: --- → 2.5+
Priority: -- → P1
STR: Flash 'make pruduction' gaia build on debug gecko.
Attached patch disable-mdns.patch (deleted) — Splinter Review
Backout bug 1200132 will not be an easy job because the are several modification based on it. We can disable this feature first for unblocking people. How do you think?
Attachment #8678715 - Flags: review?(xeonchen)
Attachment #8678715 - Flags: feedback?(anygregor)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #5) > Created attachment 8678715 [details] [diff] [review] > disable-mdns.patch > > Backout bug 1200132 will not be an easy job because the are several > modification based on it. We can disable this feature first for unblocking > people. How do you think? As long as this fixes the startup crash I am fine with disabling.
Comment on attachment 8678715 [details] [diff] [review] disable-mdns.patch Thanks for the quick turnaround!
Attachment #8678715 - Flags: feedback?(anygregor) → feedback+
Attachment #8678715 - Flags: review?(xeonchen) → review+
Use |ServiceWatcher| to hold |*Operators| to make sure they are not destroyed before stopping any potential callbacks.
Flags: needinfo?(xeonchen)
Attachment #8678773 - Flags: feedback?(schien)
Need to check-in disable-mdns.patch first.
Keywords: checkin-needed
Whiteboard: [leave open]
Assignee: nobody → xeonchen
Status: NEW → ASSIGNED
Comment on attachment 8678773 [details] [diff] [review] [WIP] 0001-Bug-1217807-use-ServiceWatcher-to-extend-life-cycle-.patch Review of attachment 8678773 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the quick patch, @xeonchen! Verified ok on my local environment and here is my feedback. ::: dom/presentation/provider/MulticastDNSDeviceProvider.cpp @@ +864,5 @@ > UnregisterService(aReason); > > + if (mDiscoverable) { > + NS_DispatchToCurrentThread( > + NS_NewRunnableMethod(this, &MulticastDNSDeviceProvider::RegisterService)); I'm worried about if there is any possible |mDiscoverable| change or |RegisterService| happened before this Runnable being executed. Need to double check the error handling in |RegisterService|. ::: netwerk/dns/mdns/libmdns/MDNSResponderOperator.h @@ +52,5 @@ > > nsresult Start() override; > nsresult Stop() override; > + using MDNSResponderOperator::AddRef; > + using MDNSResponderOperator::Release; use |NS_DECL_ISUPPORTS_INHERITED| instead.
Attachment #8678773 - Flags: feedback?(schien) → feedback+
Removing blocking flag since feature is disabled. Please add a test for the failing use-case when you re-enable it.
Severity: blocker → normal
blocking-b2g: 2.5+ → ---
Priority: P1 → P2
Depends on: 1218660
* Check flags in callback functions to accept multiple callback calls. * Handle life-cycle of |ResolveOperator| and |GetAddrInfoOperator|.
Attachment #8679367 - Flags: feedback?(schien)
Attachment #8679367 - Flags: feedback?(kechang)
Attached patch [WIP] network-event-handler-in-server.patch (obsolete) (deleted) — Splinter Review
1. monitor network interface change to restart TCPPresentationServer 2. notify MDNSDeviceProvider server port change to redo service registration 3. allow MDNSDeviceProvider::RegisterService to do service registration without restart TCPPresentationServer 4. found and fix RegisterService <-> OnRegistrationFail infinite loop [TODO] please help integrate with your current patch and provide corresponding test case.
Attachment #8679418 - Flags: feedback?(xeonchen)
Comment on attachment 8679418 [details] [diff] [review] [WIP] network-event-handler-in-server.patch Review of attachment 8679418 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/presentation/provider/TCPPresentationServer.js @@ +230,5 @@ > + if (this._isServiceInit()) { > + this.close(); > + > + try { > + this.startService(); Is this called only when the service was initialized?
Attachment #8679418 - Flags: feedback?(xeonchen) → feedback+
Comment on attachment 8679418 [details] [diff] [review] [WIP] network-event-handler-in-server.patch Review of attachment 8679418 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/presentation/provider/TCPPresentationServer.js @@ +230,5 @@ > + if (this._isServiceInit()) { > + this.close(); > + > + try { > + this.startService(); Yes, I intended to do so. There is no point to start the server socket if it's never created. Also, that's the reason I named this function "restartService".
Attachment #8678773 - Attachment is obsolete: true
Attachment #8679367 - Attachment is obsolete: true
Attachment #8679367 - Flags: feedback?(schien)
Attachment #8679367 - Flags: feedback?(kechang)
Flags: needinfo?(kechang)
Attachment #8680020 - Flags: review?(schien)
@junior: please help to review the TCPPresentationServer part, thanks.
Attachment #8679418 - Attachment is obsolete: true
Attachment #8680023 - Flags: review?(schien)
Attachment #8680023 - Flags: review?(juhsu)
Comment on attachment 8680023 [details] [diff] [review] Part 2: Handle network online/offline event in TCPPresentationServer Review of attachment 8680023 [details] [diff] [review]: ----------------------------------------------------------------- Generally good to me, but I'd like to have a final check :) ::: dom/presentation/provider/TCPPresentationServer.js @@ +43,5 @@ > /** > * 0 or undefined indicates opt-out parameter, and a port will be selected > * automatically. > */ > let serverSocketPort = (aPort !== 0) ? aPort : -1; let serverSocketPort = (typeof aPort !== "undefined" && aPort !== 0) ? aPort : -1; @@ +231,5 @@ > + this.close(); > + > + try { > + this.startService(); > + this._listener && this._listener.onPortChange(this._port); If |this._listener| is not set, we should call |onPortChange| once |this._listener| is set. ::: dom/presentation/tests/xpcshell/test_tcp_control_channel.js @@ +187,3 @@ > tps.listener = { > + onPortChange: function(aPort) { > + Assert.notEqual(aPort, port, 'TCPPresentationServer port changed'); I'm not sure whether the port is always changed. I'm afraid of causing a intermittent test failure. Possibly just a |Assert.ok|. Breaking test would lead a time-out. However if I misunderstand something, just go ahead.
Attachment #8680023 - Flags: review?(juhsu) → feedback+
Comment on attachment 8680023 [details] [diff] [review] Part 2: Handle network online/offline event in TCPPresentationServer Review of attachment 8680023 [details] [diff] [review]: ----------------------------------------------------------------- I'll let @junior to handle this review. I shouldn't review the patch that actually created by myself. BTW, you might want to change the "User" section in patch file. ::: dom/presentation/provider/TCPPresentationServer.js @@ +43,5 @@ > /** > * 0 or undefined indicates opt-out parameter, and a port will be selected > * automatically. > */ > let serverSocketPort = (aPort !== 0) ? aPort : -1; Oh, why not just use |aPort > 0|?
Attachment #8680023 - Flags: review?(schien)
Attachment #8680023 - Attachment is obsolete: true
Attachment #8680516 - Flags: review?(schien)
Attachment #8680516 - Flags: review?(juhsu)
Attachment #8680516 - Attachment is obsolete: true
Attachment #8680516 - Flags: review?(schien)
Attachment #8680516 - Flags: review?(juhsu)
@schien: please review: nsITCPPresentationServer.idl MulticastDNSDeviceProvider.cpp test_multicast_dns_device_provider.js @junior: please review: nsITCPPresentationServer.idl TCPPresentationServer.js test_tcp_control_channel.js
Attachment #8680532 - Flags: review?(schien)
Attachment #8680532 - Flags: review?(juhsu)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #22) > Comment on attachment 8680023 [details] [diff] [review] > Part 2: Handle network online/offline event in TCPPresentationServer > > Review of attachment 8680023 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'll let @junior to handle this review. I shouldn't review the patch that > actually created by myself. > BTW, you might want to change the "User" section in patch file. > > ::: dom/presentation/provider/TCPPresentationServer.js > @@ +43,5 @@ > > /** > > * 0 or undefined indicates opt-out parameter, and a port will be selected > > * automatically. > > */ > > let serverSocketPort = (aPort !== 0) ? aPort : -1; > > Oh, why not just use |aPort > 0|? Both work for me, but we still need to check the |undefined| condition.
Comment on attachment 8680532 [details] [diff] [review] [v3] Part 2: Handle network online/offline event in TCPPresentationServer Review of attachment 8680532 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me, thanks! ::: dom/presentation/tests/xpcshell/test_tcp_control_channel.js @@ +187,3 @@ > tps.listener = { > + onPortChange: function(aPort) { > + Assert.notEqual(aPort, 0, 'TCPPresentationServer port changed'); 'TCPPresentationServer port changed and the port should be valid'
Attachment #8680532 - Flags: review?(juhsu) → review+
Comment on attachment 8680532 [details] [diff] [review] [v3] Part 2: Handle network online/offline event in TCPPresentationServer Review of attachment 8680532 [details] [diff] [review]: ----------------------------------------------------------------- I can only gave feedback on this patch and remember to remove "r=schien" before landing.
Attachment #8680532 - Flags: review?(schien) → feedback+
Comment on attachment 8680020 [details] [diff] [review] Part 1: use ServiceWatcher to extend life cycle of mDNS operators Review of attachment 8680020 [details] [diff] [review]: ----------------------------------------------------------------- Please follow the instruction in https://groups.google.com/d/msg/mozilla.dev.platform/uDP_7COsu_o/FHijLdSLCAAJ to make MDNSResponderOperator log module thread safe. I saw the |LOG_E| has been used on multiple threads. ::: dom/presentation/provider/MulticastDNSDeviceProvider.cpp @@ +864,5 @@ > UnregisterService(aReason); > > + if (mDiscoverable) { > + NS_DispatchToCurrentThread( > + NS_NewRunnableMethod(this, &MulticastDNSDeviceProvider::RegisterService)); This part of modification is not necessary and should take only the final modification in part 2 patch. ::: netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp @@ +612,5 @@ > return ResetService(service); > } > > nsresult > ResolveOperator::Stop() There is no point to implement this function since we change to public inheritance. @@ +739,5 @@ > return ResetService(service); > } > > nsresult > GetAddrInfoOperator::Stop() ditto. @@ +777,5 @@ > if (NS_WARN_IF(NS_FAILED(info->SetAddress(addressStr)))) { return; } > > + if (aFlags & kDNSServiceFlagsMoreComing) { > + guard.release(); > + } Please add a comment for it.
Attachment #8680020 - Flags: review?(schien) → review-
Follow reviewer's comment, and also remove some function redeclaration by changing private to public inheritance.
Attachment #8680020 - Attachment is obsolete: true
Attachment #8681098 - Flags: review?(schien)
adjust comment, and replace PRLogModuleInfo* by LazyLogModule
Attachment #8680532 - Attachment is obsolete: true
Attachment #8681108 - Flags: review?(juhsu)
Comment on attachment 8681098 [details] [diff] [review] [v2] Part 1: use ServiceWatcher to extend life cycle of mDNS operators Review of attachment 8681098 [details] [diff] [review]: ----------------------------------------------------------------- lgtm! Thanks for the quick update.
Attachment #8681098 - Flags: review?(schien) → review+
Comment on attachment 8681108 [details] [diff] [review] [v4] Part 2: Handle network online/offline event in TCPPresentationServer Review of attachment 8681108 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/presentation/provider/MulticastDNSDeviceProvider.cpp @@ +264,5 @@ > if (NS_WARN_IF(NS_FAILED(rv = mPresentationServer->GetPort(&servicePort)))) { > return rv; > } > > + if (!servicePort) { add a comment to address your intention ::: dom/presentation/tests/xpcshell/test_tcp_control_channel.js @@ +182,5 @@ > }; > } > > function setOffline() { > + let port = tps.port; remove this line
Attachment #8681108 - Flags: review?(juhsu) → review+
Do we know what went wrong with the initial landing and could we add a test for this case?
add comment and carry r+
Attachment #8681108 - Attachment is obsolete: true
Attachment #8681129 - Flags: review+
(In reply to Gregor Wagner [:gwagner] from comment #33) > Do we know what went wrong with the initial landing and could we add a test > for this case? Yes we know what went wrong, but unfortunately it is not that easy to test. :(
fix test error on B2G platform.
Attachment #8681129 - Attachment is obsolete: true
Attachment #8681819 - Flags: review?(juhsu)
remove debug code and fix test error on B2G platform.
Attachment #8681819 - Attachment is obsolete: true
Attachment #8681819 - Flags: review?(juhsu)
Attachment #8681825 - Flags: review?(juhsu)
(In reply to Gary Chen [:xeonchen] from comment #35) > (In reply to Gregor Wagner [:gwagner] from comment #33) > > Do we know what went wrong with the initial landing and could we add a test > > for this case? > > Yes we know what went wrong, but unfortunately it is not that easy to test. > :( We have code that is not testable? :/
Attachment #8681825 - Flags: review?(juhsu) → review+
(In reply to Gregor Wagner [:gwagner] from comment #38) > (In reply to Gary Chen [:xeonchen] from comment #35) > > (In reply to Gregor Wagner [:gwagner] from comment #33) > > > Do we know what went wrong with the initial landing and could we add a test > > > for this case? > > > > Yes we know what went wrong, but unfortunately it is not that easy to test. > > :( > > We have code that is not testable? :/ The root cause of this bug is related to |nsIDNSServiceDiscovery| implementation on B2G platform. The implementation is actually an integration of 3rd-party library to serve as an XPCOM component. I wouldn't say it's impossible to test, but I'll suggest to open a follow-up bug if we have to deal with this.
Keywords: checkin-needed
blocking-b2g: --- → 2.5+
(In reply to Gary Chen [:xeonchen] from comment #39) > (In reply to Gregor Wagner [:gwagner] from comment #38) > > (In reply to Gary Chen [:xeonchen] from comment #35) > > > (In reply to Gregor Wagner [:gwagner] from comment #33) > > > > Do we know what went wrong with the initial landing and could we add a test > > > > for this case? > > > > > > Yes we know what went wrong, but unfortunately it is not that easy to test. > > > :( > > > > We have code that is not testable? :/ > > The root cause of this bug is related to |nsIDNSServiceDiscovery| > implementation on B2G platform. > The implementation is actually an integration of 3rd-party library to serve > as an XPCOM component. > > I wouldn't say it's impossible to test, but I'll suggest to open a follow-up > bug if we have to deal with this. Opened Bug 1220952 for follow-up.
Hi, this failed to apply: renamed 1217807 -> 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch applying 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch patching file netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp Hunk #2 FAILED at 20 1 out of 10 hunks FAILED -- saving rejects to file netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp.rej patch failed, unable to continue (try -v) patch failed, rejects left in working directory errors during apply, please fix and refresh 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch
Flags: needinfo?(xeonchen)
Keywords: checkin-needed
rebase inbound, carry r+
Attachment #8681098 - Attachment is obsolete: true
Attachment #8682372 - Flags: review+
rebase inbound, carry r+
Attachment #8681825 - Attachment is obsolete: true
Attachment #8682375 - Flags: review+
(In reply to Carsten Book [:Tomcat] from comment #42) > Hi, this failed to apply: > > renamed 1217807 -> > 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch > applying 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch > patching file netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp > Hunk #2 FAILED at 20 > 1 out of 10 hunks FAILED -- saving rejects to file > netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp.rej > patch failed, unable to continue (try -v) > patch failed, rejects left in working directory > errors during apply, please fix and refresh > 0001-Bug-1217807-Part-1-use-ServiceWatcher-to-extend-life.patch I've fixed this issue, thanks!
Flags: needinfo?(xeonchen)
Keywords: checkin-needed
Whiteboard: [leave open]
feature-b2g: --- → 2.5+
Comment on attachment 8682372 [details] [diff] [review] [v3] Part 1: use ServiceWatcher to extend life cycle of mDNS operators NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): 1217807 User impact if declined: Presentation API discovery may be broken. Testing completed: TreeHerder Risk to taking this patch (and alternatives if risky): handlers should handle multiple callbacks well, otherwise the result is unexpected. String or UUID changes made by this patch: N/A
Attachment #8682372 - Flags: approval‑mozilla‑b2g44?
Comment on attachment 8682375 [details] [diff] [review] [v8] Part 2: Handle network online/offline event in TCPPresentationServer NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): 1217807 User impact if declined: Presentation API discovery may be broken. Testing completed: TreeHerder Risk to taking this patch (and alternatives if risky): behavior on service registration is different now, if something highly depends on the previous implementation, it may not work. String or UUID changes made by this patch: UUID of nsITCPPresentationServerListener is changed.
Attachment #8682375 - Flags: approval‑mozilla‑b2g44?
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
Comment on attachment 8682372 [details] [diff] [review] [v3] Part 1: use ServiceWatcher to extend life cycle of mDNS operators Setting the approved flag. Patch already landed on 11/5
Attachment #8682372 - Flags: approval‑mozilla‑b2g44? → approval‑mozilla‑b2g44+
Comment on attachment 8682375 [details] [diff] [review] [v8] Part 2: Handle network online/offline event in TCPPresentationServer Setting the approved flag. Patch already landed on 11/5
Attachment #8682375 - Flags: approval‑mozilla‑b2g44? → approval‑mozilla‑b2g44+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: