Closed
Bug 865314
Opened 12 years ago
Closed 8 years ago
ssl parallelism to new host restricted to 1
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
Tracking | Status | |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: mcmanus, Assigned: mcmanus)
References
Details
(Whiteboard: [necko-active])
Attachments
(1 file, 6 obsolete files)
(deleted),
patch
|
mcmanus
:
review+
|
Details | Diff | Splinter Review |
on first connect to a new ssl host we restrict the parallelism to 1 until the ssl handshake is complete. We do that in order to identify spdy hosts (which won't use the parallelism).
In retrospect that is optimizing for the wrong thing. It is designed to avoid connection terminations but what we really want to avoid is un-necessary parallel active connections. To fix this we can just lift the restrict-to-1 restriction in the case of hosts which aren't known to speak spdy (hosts known to speak that will still have it in place). If we should happen to have more than 1 active spdy connection to the same host then we will just gracefully shut down the extra ones.
In practice I expect any extra connections to carry 0 transactions - because the first one to establish will be able to dispatch the whole queue pretty much immediately due to the muxxing nature of spdy. but HTTP/1 connections will get more parallelism and therefore better latency properties.
There is a significant side effect here. The first SSL connection to a host is also the one that will trigger OCSP (which is afterwards cached) and also establish resumable SSL sessions. So with this change we can expect to see more OCSP and non-resumed handshakes - though it should all be done in parallel and I would expect not actually add latency.
https://tbpl.mozilla.org/?tree=Try&rev=02d84ac9e07f
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Comment 3•12 years ago
|
||
Comment 4•12 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/83a790e5acd8 as the apparent cause of fairly frequent Linux failures in test_spdy.js, like https://tbpl.mozilla.org/php/getParsedLog.php?id=22224595&tree=Mozilla-Inbound (2 in 10 frequent in the retriggers on your push on Linux64 opt, which seems to have been the most failure-prone).
Comment 5•12 years ago
|
||
Interestingly enough it also seems, probably, perhaps, I think maybe, to have moved the heinous complex of Android shutdown certificate crashes in bug 761987 and friends away from being shutdown crashes and into being earlier on during test crashes. I triggered a crapload of Android jsreftests on your push in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=96a806212cac and got several of those in-a-test ones; I triggered a crapload on your parent push in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=ba1bd4eccd7c and I'm quite looking forward to seeing whether I can manage to not get any at all there.
Comment 6•12 years ago
|
||
And indeed, all of the bug 761987 crashes on your parent were during shutdown, so somehow that patch made Android both more likely to crash @nssCertificate_Destroy and made it do it during test runs instead of only at shutdown.
Assignee | ||
Comment 7•12 years ago
|
||
dougt, did you say there was a promising mismatched allocator patch for 761987? (see comments 5 and 6 in this bug)
I want to repush this patch when I figure out comment 4, but I'd hate to spread out some orange that at least is centralized right now. The patch will give us a latency win in setting up parallel http/1 ssl connections.
Flags: needinfo?(doug.turner)
Comment 8•12 years ago
|
||
I was hopeful that it was bug 850332 which landed a week ago.
Flags: needinfo?(doug.turner)
Comment 9•12 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #7)
> dougt, did you say there was a promising mismatched allocator patch for
> 761987? (see comments 5 and 6 in this bug)
>
> I want to repush this patch when I figure out comment 4, but I'd hate to
> spread out some orange that at least is centralized right now. The patch
> will give us a latency win in setting up parallel http/1 ssl connections.
There must be a race condition in the certificate validation logic. I will fix it as soon as I can find it, which I am going to start on today.
Assignee | ||
Comment 10•12 years ago
|
||
updated for whitespace and bitrot
Attachment #741599 -
Attachment is obsolete: true
Attachment #743233 -
Flags: review+
Updated•11 years ago
|
Assignee | ||
Comment 11•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #8368640 -
Flags: review+
Assignee | ||
Updated•11 years ago
|
Attachment #743233 -
Attachment is obsolete: true
Assignee | ||
Comment 12•11 years ago
|
||
the new patch is just updated for bitrot.
there is a try build over here
https://tbpl.mozilla.org/?tree=Try&rev=9532f191d24e
I triggered ~20 JS1 runs on android 4.. and got 2 free arenalist shutdown crashes from it.
I more or less read that as situation unchanged - what do you think brian?
Flags: needinfo?(brian)
Comment 13•11 years ago
|
||
Thanks Patrick. We need to fix the underlying issue. I will work with cviecco and keeler to solve it.
FWIW, I suspect the main problem is that the socket transport service thread is not implementing the nsNSSShutDownObject interface, and thus it doesn't know to stop all the SSL threads and avoid starting up new ones when NSS shuts down. If there are any networking team people available to work on that particular part of the problem, please let me know.
Depends on: 934902
Flags: needinfo?(brian)
Assignee | ||
Comment 14•11 years ago
|
||
i'd be happy to help myself
Comment 15•11 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #12)
> I triggered ~20 JS1 runs on android 4.. and got 2 free arenalist shutdown
> crashes from it.
>
> I more or less read that as situation unchanged - what do you think brian?
Yesterday I ran 775 runs of jreftest J1 and there were 3 such crashes on Android 2.2 opt and 1 such crash on Android 4.0 opt. 4/775 = 5/1000 failures. Also, those failures occur intermittently even without this patch on a different try run. So, I don't think that the crash bugs need to block this bug from landing. What do you think, Patrick?
https://tbpl.mozilla.org/?tree=Try&rev=b70890f16240
Assignee | ||
Comment 16•11 years ago
|
||
dkeeler landed something a few weeks ago that I wondered at the time if it would help with my experience in comment 12.. perhaps that is the improvement you see in comment 15.
let me think for a bit about what that was.. I don't recall exactly off the top of my head.
Flags: needinfo?(mcmanus)
Comment 17•11 years ago
|
||
I think it is bug 967629 that you are referring to.
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Brian Smith (:briansmith, was :bsmith; NEEDINFO? for response) from comment #17)
> I think it is bug 967629 that you are referring to.
yes - thank you.
imo - let's go forward with this right after we get a fresh ff31 tree, to avoid a crash landing.
Assignee | ||
Comment 19•10 years ago
|
||
Attachment #8507384 -
Flags: review?(hurley)
Assignee | ||
Comment 20•10 years ago
|
||
this fell off my radar accidentally - we really want this patch.
I've un-bit rotted it and improved the logic for quickly activating transactions that are attached to secondary connections when a spdy connection is negotiated.
Assignee | ||
Comment 21•10 years ago
|
||
Comment 22•10 years ago
|
||
Comment on attachment 8507384 [details] [diff] [review]
ssl parallelism to new host should not be 1
Review of attachment 8507384 [details] [diff] [review]:
-----------------------------------------------------------------
This LGTM, but let's make sure we get a whole bunch of linux xpcshell runs before landing to make sure this isn't still susceptible to the intermittent orange that philor saw on the original patch (comment 4). I don't *think* it will be, but better safe than sorry.
Attachment #8507384 -
Flags: review?(hurley) → review+
Assignee | ||
Comment 23•10 years ago
|
||
try is complaining about an unused rv (debug only rv) on opt anyhow.. will respin and rerun tests.
Assignee | ||
Comment 24•10 years ago
|
||
Your results will be at https://tbpl.mozilla.org/?tree=Try&rev=c709b8819aea
Assignee | ||
Comment 25•10 years ago
|
||
Assignee | ||
Comment 26•10 years ago
|
||
Assignee | ||
Comment 27•10 years ago
|
||
comments 25 and 26 are the checkin and backout of the patch from comment 21.
the version from comment 24 should have been commited - the only difference is a DebugOnly declaration that is needed to avoid breaking opt builds that error on unused variables.
Assignee | ||
Comment 28•10 years ago
|
||
Comment 29•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Comment 30•10 years ago
|
||
Backed out for causing topcrashes. I'm also respinning nightlies.
https://hg.mozilla.org/mozilla-central/rev/da125623d9cb
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla36 → ---
Updated•10 years ago
|
Version: 18 Branch → Trunk
Assignee | ||
Comment 31•10 years ago
|
||
Attachment #8528591 -
Flags: review?(hurley)
Assignee | ||
Updated•10 years ago
|
Attachment #8507384 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8368640 -
Attachment is obsolete: true
Assignee | ||
Comment 32•10 years ago
|
||
this is the interdiff from the previously reviewed version to this one. fyi
Assignee | ||
Comment 33•10 years ago
|
||
The previous version had code that dealt with what happens when we get a confirmed spdy/h2 session and still have other connections pending. It tried to migrate the transactions on those half open connections onto the new one.
That was too clever by half. The transactions are actually listed twice - once as a weak ref on the half open connection and once in the pending queue for the origin connection entry. Because the patch only looked at the former it allowed for the transaction to be executed twice.
I say it was too clever, because all that needed to be done was to Abandon() the half open connection. Because the transaction is still in the pending queue it will automatically be assigned to the new connection the next time the queues are considered (which happens at the end of the function).
The previous patch also had the same problem as I fixed in 1104993 (which led to me filing that bug).
Attachment #8528591 -
Flags: review?(hurley) → review+
Assignee | ||
Comment 34•10 years ago
|
||
given that this patch has proven tricky to land I'm going to wait to do so until gecko-37 is created.
Assignee | ||
Comment 35•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Target Milestone: --- → mozilla37
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 37•10 years ago
|
||
this caused crashes (bug 1069838) so I'll backout while its investigated.
m-i is closed right now - so the backout is pending.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 38•10 years ago
|
||
Assignee | ||
Comment 39•10 years ago
|
||
comment 39 is the backout of the comment 35 push
Keywords: leave-open
Backout merged to m-c: https://hg.mozilla.org/mozilla-central/rev/36d7afaaca18
Assignee | ||
Updated•10 years ago
|
Keywords: leave-open
Assignee | ||
Updated•9 years ago
|
Whiteboard: [necko-active]
Assignee | ||
Comment 41•9 years ago
|
||
Assignee | ||
Comment 42•9 years ago
|
||
failures from dec 14 include (comment 37 had a typo)
https://bugzilla.mozilla.org/show_bug.cgi?id=1089638
which was probably a dup of
https://bugzilla.mozilla.org/show_bug.cgi?id=1111217
which in turn was fixed by cset
https://hg.mozilla.org/integration/mozilla-inbound/rev/511d0d709406
none of that is directly caused by 865314 but because cancel's of h2 spiked with this patch, the above mentioned crashes spiked too. Indeed you can find > 1000 crashes related to 1111217 in the last 4 weeks of crash stats data, but they are pretty much limited to esr 31 (predates the fix).
So nothing should prevent landing this patch, except brad lassey ran a try build of 865314 plus 1111217 and experienced two crashes that had useless stacks. Now useless stacks have not really been a signature part of this bug, so I suspect that perhaps it was more indicative of something about try (or an ephemeral unrelated issue).
Assignee | ||
Comment 43•9 years ago
|
||
hey brad, in the past you've had a knack for making this patch crash. I feel like its ready to reland but we still have an unresolved issue from 1111217 where you had a couple crashes with useless stacks that weren't explained. Enough water under the bridge to retest I think. Could you run it for a day and see if its stable for you?
there is a try build in comment 41
the try builds sets network.http.connection-retry-timeout to 1 just to make sure to exercise some code on the try tests. Feel free to set it back to 250 (which is how it will land) or just enjoy the background connections!
Flags: needinfo?(blassey.bugs)
Comment 44•9 years ago
|
||
No crashes, but the last two mornings this build has been completely hung. Unfortunately I don't have symbols so I can't get a stack for where we're hung.
Flags: needinfo?(blassey.bugs)
Assignee | ||
Comment 45•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8751206 -
Flags: review+
Assignee | ||
Updated•8 years ago
|
Attachment #8528594 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8528591 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Target Milestone: mozilla37 → mozilla49
Assignee | ||
Comment 46•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1e8d6cf54e6e3f46aa64872b3f376cc69c7effe
Bug 865314 - allow 6 parallel connects to new https origin r=hurley
Comment 47•8 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•