Closed Bug 181230 Opened 22 years ago Closed 22 years ago

profile-change-net-teardown must work

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.3beta

People

(Reporter: KaiE, Assigned: darin.moz)

References

Details

(Keywords: topembed+)

Attachments

(1 file, 4 obsolete files)

Working on bugs 177260 and 97622 I ran into an additional problem.
I thought I had a good workaround, but the workaround requires direct
manipulation of the lower NSPR data structures from outside the socket transport
thread, and that is possibly dangerous.

If, at the time we need to shut down PSM/NSS in order to switch profiles, there
are still sockets alive, we have a real problem.

Because the SSL library is layering itself on the file descriptor stack, the
resources that SSL is holding into NSS can not be freed until the socket is closed.


Actual behaviour:
After the profile-change-net-teardown phase is over, there are still open sockets.

Requested behaviour:
During the profile-change-net-teardown phase, all open sockets must be closed
synchronously.
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: topembed
Priority: -- → P2
Target Milestone: --- → mozilla1.3alpha
Attached patch test patch (obsolete) (deleted) β€” β€” Splinter Review
this patch ensures that any active connections will not be added as idle
connections after the net-teardown event fires.
Attached file AIM conversation with kai (deleted) β€”
kai says the "test patch" doesn't help.
Darin, in your test patch you try to modificate the http handler. I would have
expected that the problem lies somewhere else. I see that mail is also not
behaving correctly and stops us from switching profiles, because SSL sockets
have not been closed.

Are you able to tear down all sockets the hard way in the socket transport code,
regardless where they come from?
kai: my assumption was that socket transport shutdown was happening, but that
some connections were becoming IDLE (on a background thread) at the same time
that we were processing the net-teardown event.  i think my patch closes this
hole, but obviously there is something else going on as well.  i'm hoping to
spend time on this today.
Attached patch v1 patch (obsolete) (deleted) β€” β€” Splinter Review
this patch fixes the "going offline while security dialog is visible" problem.
Attachment #106996 - Attachment is obsolete: true
kai: can you please give this latest patch a try and see if it fixes all of your
testcases?  thx!
Sorry, this patch does not change the behaviour, I still see the same problems.

I would have assumed there is only one approach than can help us:
When we receive the call to SetOffline, go to the socket transport code,
synchronously iterate over the list of all sockets known by it, call close on
each, and do not return from the notification until that is done.

But that is probably dangerous, because other code has still references to the
socket.

So I suggest, instead of closing the files, which invalidates the file
descriptors and makes other components crash: Use the hack that I proposed in
bug 177260, but which you found to be unsafe when done from outside the socket
thread.

So the new approach might be:

- SetOffline received
- iterate over all sockets
- for each socket, do the following hack:
  - use PR_NewTCPSocket to create a new unconnected socket
  - swap all the contents of the socket filedescriptor structure
    between the old and the hack socket
  - close the fd that is now no longer referenced by others
> I would have assumed there is only one approach than can help us:
> When we receive the call to SetOffline, go to the socket transport code,
> synchronously iterate over the list of all sockets known by it, call close on
> each, and do not return from the notification until that is done.

that is what the code tries to do.  it cancels all of the socket transport
requests, which must be processed on the socket transport thread.  the thread
which shutdown the socket transport service (the main thread) must wait until
the socket transport thread exits (it does a join).  this makes the call to
shutdown the socket transport synchronous.

kai: what i need is a testcase.  i used the testcase of going offline while the
security dialog is showing to arrive at this patch.  is that what you were
testing?  before my patch i noticed PR_Close being called after the security
dialog is closed, but with my patch PR_Close is called right away.  please
describe the steps you used to repro the problem.  thx!
The testcase is:

- add printf statements as the first line to nsNSSIOLayerClose and
nsSSLIOLayerConnect in file nsNSSIOLayer.cpp

- compile in optimized mode (I tested on Linux)

- open two browser windows

- in one window, open https://meine.db24.de - wait for the security window and
leave it open

- watch the console, you'll see the output from the open function

- in the other window, click the statusbar offline icon


Expected behaviour: The SSL close function should have been reached as soon as
you clicked the offline icon. As a result, the console log should show the
printf output from the close function.

Actual behavior: No output on the console.



Another testcase:
- Configure mail to use IMAP / SSL.
- Configure edit/privacy/certificates to "ask every time"
- Open a mail and a browser window
- in one window, connect to the server
- should it prompt to select a certificate, cancel
- you should see the prompt to enter your mail password
- with the password prompt dialog open, switch to the browser window
- click the offline icon.

Expected: close is reached immediately.
Actual: Nothing happens.
> i used the testcase of going offline while the
> security dialog is showing to arrive at this patch.  is that what you were
> testing?  before my patch i noticed PR_Close being called after the security
> dialog is closed, but with my patch PR_Close is called right away

Hmm, I can not confirm that, on my Linux system, close is not reached right
away. It is not reached until I confirm the security warning dialog.
Attached patch v1.1 patch (obsolete) (deleted) β€” β€” Splinter Review
OK, this patch is similar to the other patch in that it only attempts to fix
the problem for HTTPS.
Attachment #107248 - Attachment is obsolete: true
Attached patch v1.2 patch (obsolete) (deleted) β€” β€” Splinter Review
this one should really do the trick for HTTPS.
Attachment #107379 - Attachment is obsolete: true
This latest patch does indeed work for the https case. (Although it does not yet
work for mail/ssl, as discussed).
Blocks: 182803
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: review?(dougt)
Discussed in edt.  Plussing because we need this for profile sharing.
Keywords: topembedtopembed+
Darin, is your intention to implement the general solution (which would include
tearing down mail ssl sockets) within the scope of this bug?
Kai:

yes, it is.  but i want to get this HTTP patch in because i think it is anyways
a good idea, and it might even fix a topcrash.
Ok, thanks!
Attachment #107386 - Flags: superreview?(rpotts) → superreview+
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: superreview+
Attachment #107386 - Flags: review?(dougt)
Attachment #107386 - Flags: review+
Comment on attachment 107386 [details] [diff] [review]
v1.2 patch

seeking drivers approval to check this into the trunk for moz 1.3 alpha.  this
patch not only helps fix this bug for HTTPS connections, but it might also
solve topcrash bug 179391.  i think this patch has moderate risk.
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: superreview+
Attachment #107386 - Flags: approval1.3a?
Comment on attachment 107386 [details] [diff] [review]
v1.2 patch

a=asa for checkin to 1.3a
Attachment #107386 - Flags: approval1.3a? → approval1.3a+
landed patch on trunk.  next, i'll look into how to fix this in general. 
hopefully this patch will help with the topcrash shutdown bug.
Nominating to block 1.3b, because not having this fix results in crash bug 182803. 
Flags: blocking1.3b?
Blocks: 187501
Comment on attachment 107386 [details] [diff] [review]
v1.2 patch

In the past, when people landed patches on the trunk, but decided to keep the
bug open, I saw they marked the landed patches as obsolete to minimize
potential confusion. I think that's a good idea. Marking obsolete.
Attachment #107386 - Attachment is obsolete: true
should be fixed by my patch for bug 176919.
Depends on: 176919
Target Milestone: mozilla1.3alpha → mozilla1.3beta
Blocks: 188558
FIXED with patch from bug 176919.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Flags: blocking1.3b?
*** Bug 188558 has been marked as a duplicate of this bug. ***
V/fixed.

This has worked for some time. I haven't tested it with a set of unit cases, since profile changing is available only in seamonkey UI. 

I'd like to do this in the future as an high-level integration test, maybe as an extension of the networking-core functional test I wrote (which pokes a lot of network protocols at once). Bother me if you care.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: