Closed
Bug 777354
Opened 12 years ago
Closed 11 years ago
Regression in shutdown time of httpd.js (~6s) when multiple local pages have been opened
Categories
(Testing :: General, defect)
Tracking
(firefox14 unaffected, firefox15- wontfix, firefox16- wontfix, firefox17- wontfix, firefox18 wontfix, firefox19 wontfix, firefox20 wontfix, firefox25 wontfix, firefox26 fixed, firefox27 fixed, firefox-esr10 unaffected, b2g-v1.2 fixed)
RESOLVED
FIXED
mozilla27
People
(Reporter: whimboo, Assigned: mcmanus)
References
Details
(Keywords: regression)
Attachments
(3 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
jduell.mcbugs
:
review+
|
Details | Diff | Splinter Review |
Over on bug 767332 I have noticed that at some point in the past we have introduced a big regression range in shutting down httpd.js. I did some time measurements with the Mozmill test attached to the before mentioned bug and a Nightly and Release build, whereby the latter behaves correctly.
Nightly:
TEST-START | testHTTPd1.js | setupModule
TEST-START | testHTTPd1.js | testOpenLocalPages
TEST-PASS | testHTTPd1.js | testOpenLocalPages
INFO | ** Shutdown duration: 5617
TEST-START | testHTTPd2.js | setupModule
TEST-START | testHTTPd2.js | testHttpFailure
TEST-PASS | testHTTPd2.js | testHttpFailure
TEST-START | testHTTPd2.js | teardownModule
INFO | ** Shutdown duration: 96
Release (14.0)
TEST-START | testHTTPd1.js | setupModule
TEST-START | testHTTPd1.js | testOpenLocalPages
TEST-PASS | testHTTPd1.js | testOpenLocalPages
INFO | ** Shutdown duration: 100
TEST-START | testHTTPd2.js | setupModule
TEST-START | testHTTPd2.js | testHttpFailure
TEST-PASS | testHTTPd2.js | testHttpFailure
TEST-START | testHTTPd2.js | teardownModule
INFO | ** Shutdown duration: 97
I will have to check for the regression range and hope that we can fix that before the current beta gets merged to release.
Reporter | ||
Comment 1•12 years ago
|
||
This is a huge regression somewhere in the core or httpd.js code which causes our Mozmill tests to execute absolutely slow. After nearly each single test we can expect a delay of 5-6s.
If QA should still be up in signing off from any kind of release or build, we have to get this fixed. Asking for tracking the affected branches.
status-firefox-esr10:
--- → unaffected
status-firefox14:
--- → unaffected
status-firefox15:
--- → affected
status-firefox16:
--- → affected
status-firefox17:
--- → affected
tracking-firefox15:
--- → ?
tracking-firefox16:
--- → ?
tracking-firefox17:
--- → ?
Reporter | ||
Comment 2•12 years ago
|
||
Regression range matches the one from bug 767332, which regressed on mozilla-central between the builds 2012-04-25-03-06-47 and 2012-04-26-03-05-04.
PASS: http://hg.mozilla.org/mozilla-central/rev/75c7378c87b6
FAIL: http://hg.mozilla.org/mozilla-central/rev/cc5254f9825f
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=75c7378c87b6&tochange=cc5254f9825f
Looks like I should do a hg bisect to find the causing changeset.
Reporter | ||
Comment 3•12 years ago
|
||
The first bad revision is:
changeset: 92418:c773ee0f0245
user: Patrick McManus <mcmanus@ducksong.com>
date: Wed Apr 25 08:54:42 2012 -0400
summary: bug 729133 - internal necko speculative connections r=honzab
Blocks: 729133
Keywords: regressionwindow-wanted
Reporter | ||
Updated•12 years ago
|
Component: httpd.js → Networking: HTTP
Product: Testing → Core
Summary: Regression in shutdown time for ServerSocket when multiple local pages have been opened → Regression in shutdown time of httpd.js (~6s) when multiple local pages have been opened
Reporter | ||
Comment 4•12 years ago
|
||
So steps to reproduce:
1. Create a virtual env and install mozmill: 'pip install mozmill'
2. Clone https://github.com/mozilla/mozmill
3. Run: mozmill -b %path_to_firefox% -m mutt/mutt/tests/js/frame/httpd/tests.ini
Notice the long delay after 'TEST-PASS | testHTTPd1.js | testOpenLocalPages'.
Reporter | ||
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Adding qawanted and regression-window wanted - yes, let's get on this while we're still in Beta and identify the problem here.
Keywords: qawanted,
regressionwindow-wanted
Assignee | ||
Comment 7•12 years ago
|
||
henrik already indicated the changeset that triggers the bug is the speculative connection code in necko.
but the bug is in the server not shutting down based on the new idle connection, not the new code. The idle connection is an expected artifact of the feature.
Reporter | ||
Comment 8•12 years ago
|
||
(In reply to Lukas Blakk [:lsblakk] from comment #6)
> Adding qawanted and regression-window wanted - yes, let's get on this while
> we're still in Beta and identify the problem here.
Not sure QA or I can do anything more on it.
(In reply to Patrick McManus [:mcmanus] from comment #7)
> but the bug is in the server not shutting down based on the new idle
> connection, not the new code. The idle connection is an expected artifact of
> the feature.
So how should this be solved? Is it part of the httpd.js code or is a core fix necessary? If the first, should the connection be hard-killed by getting rid of the existent idle connection? Or what would be the right behavior of the server? Shall we move it back to testing/httpd.js?
Keywords: qawanted,
regressionwindow-wanted
Reporter | ||
Comment 9•12 years ago
|
||
Ed, are you aware of any slowdowns when running our in-tree tests with mochitest and others? I just would like to know how wide-spread this issue is and how much does it affect us in testing.
Comment 10•12 years ago
|
||
Unfortunately much of the slowdowns I notice are in total end-to-end time, which are dependant on pending time + build time + test run time + setup/cleanup inbetween. These vary significantly by time of day and by week-to-week infra issues, so it's hard to notice regressions like this.
I would say to look a the GoFaster dashboard (http://brasstacks.mozilla.com/gofaster/#/executiontime/test?range=all&include_outliers=0), but it only goes back as far as June.
Believe catlee has helped espindola answer end-to-end time questions recently, so might be able to advise further; CCing.
Reporter | ||
Comment 11•12 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #10)
> Unfortunately much of the slowdowns I notice are in total end-to-end time,
> which are dependant on pending time + build time + test run time +
> setup/cleanup inbetween. These vary significantly by time of day and by
> week-to-week infra issues, so it's hard to notice regressions like this.
Wait. Probably other frameworks are not affected because they don't shutdown httpd.js after each test. So please scratch that idea.
Assignee | ||
Comment 12•12 years ago
|
||
This is a problem in httpd.js, which won't shut itself down when it has an open client connection. I don't know all that much about how httpd.js is more widely used, but presumably some users call stop() to stop accepting new connections and want to kindly wind down the ones underway.
onStopListening: function(socket, status)
{
dumpn(">>> shutting down server on port " + socket.port);
this._socketClosed = true;
if (!this._hasOpenConnections())
{
dumpn("*** no open connections, notifying async from onStopListening");
I guess the easiest thing to do if you don't want that behavior is to add a reap() function to nsIHttpServer that closes open connections.
Reporter | ||
Comment 13•12 years ago
|
||
Thanks Patrick. I will try if I can get it working with Mozmill. If it does it's hopefully easy to port forward to the original implementation under network/test/.
Lets move it back to the httpd.js component then.
Component: Networking: HTTP → httpd.js
Product: Core → Testing
Version: unspecified → 15 Branch
Assignee | ||
Comment 14•12 years ago
|
||
another approach would be something like this (completely untested and undebugged!) - to automagically close any socket that hadn't seen at least a request line at the time of stop().. this has the advantage that the calling code doesn't need to call reap() to make the right thing happen.
hth
diff --git a/netwerk/test/httpserver/httpd.js b/netwerk/test/httpserver/httpd.js
--- a/netwerk/test/httpserver/httpd.js
+++ b/netwerk/test/httpserver/httpd.js
@@ -467,16 +467,24 @@ nsHttpServer.prototype =
* the reason the socket stopped listening (NS_BINDING_ABORTED if the server
* was stopped using nsIHttpServer.stop)
* @see nsIServerSocketListener.onStopListening
*/
onStopListening: function(socket, status)
{
dumpn(">>> shutting down server on port " + socket.port);
this._socketClosed = true;
+ for (var n in this._connections) {
+ if (!n._requestStarted)
+ n.close();
+ }
+
if (!this._hasOpenConnections())
{
dumpn("*** no open connections, notifying async from onStopListening");
// Notify asynchronously so that any pending teardown in stop() has a
// chance to run first.
var self = this;
var stopEvent =
@@ -1127,16 +1135,19 @@ function Connection(input, output, serve
/**
* The request for which a response is being generated, null if the
* incoming request has not been fully received or if it had errors.
*/
this.request = null;
/** State variables for debugging. */
this._closed = this._processed = false;
+
+ /** whether or not 1st line of request has been received */
+ this._requestStarted = false;
}
Connection.prototype =
{
/** Closes this connection's input/output streams. */
close: function()
{
dumpn("*** closing connection " + this.number +
" on port " + this._outgoingPort);
@@ -1190,16 +1201,21 @@ Connection.prototype =
},
/** Converts this to a string for debugging purposes. */
toString: function()
{
return "<Connection(" + this.number +
(this.request ? ", " + this.request.path : "") +"): " +
(this._closed ? "closed" : "open") + ">";
+ },
+
+ requestStarted: function()
+ {
+ this._requestStarted = true;
}
};
/** Returns an array of count bytes from the given input stream. */
function readBytes(inputStream, count)
{
@@ -1376,16 +1392,17 @@ RequestReader.prototype =
if (!readSuccess)
return false;
// we have the first non-blank line
try
{
this._parseRequestLine(line.value);
this._state = READER_IN_HEADERS;
+ this._connection.requestStarted();
return true;
}
catch (e)
{
this._handleError(e);
return false;
}
Reporter | ||
Comment 15•12 years ago
|
||
Something along those lines? Works at least for me but not sure if it is the right way to close all the open connections. Not sure whom I should ask for feedback given that Jeff is out right now. Patrick, would you mind to have a look at? The patch doesn't include any tests yet.
Attachment #646141 -
Flags: feedback?(mcmanus)
Reporter | ||
Comment 16•12 years ago
|
||
Hm, haven't seen the latest comment. So this WIP implements the formerly proposed solution.
Assignee | ||
Comment 17•12 years ago
|
||
Comment on attachment 646141 [details] [diff] [review]
WIP
the problem here is that it shuts down already established connections that are maybe actively doing something. Jeff wrote a bunch of code to make that happen, so presumably something is using it.
so I think the approach in comment 14 works better.. if you want to go with reap it probly needs to be called from mozmill, not internal to httpd.js
but I'm not an httpd.js expert :)
Attachment #646141 -
Flags: feedback?(mcmanus)
Comment 18•12 years ago
|
||
Untracking based upon Comment 7.
Comment 19•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [away 07/27 - 08/05] from comment #11)
> (In reply to Ed Morley [:edmorley] from comment #10)
> > Unfortunately much of the slowdowns I notice are in total end-to-end time,
> > which are dependant on pending time + build time + test run time +
> > setup/cleanup inbetween. These vary significantly by time of day and by
> > week-to-week infra issues, so it's hard to notice regressions like this.
>
> Wait. Probably other frameworks are not affected because they don't shutdown
> httpd.js after each test. So please scratch that idea.
Quick drive by comment - fwiw, both reftest and xpcshell start up httpd.js on demand like mozmill does.
Comment 20•12 years ago
|
||
As I understand, this is caused by the 250ms syn-retry, right? We could then turn the syn-retry pref off for xpcshell tests. Any test depending on it, like the test for that feature it self, should explicitly change the pref and then clean it up.
Assignee | ||
Comment 21•12 years ago
|
||
it could also be some form of speculative connect. In generally we'll be doing more of this stuff by default, and we should test in the default config if at all possible.. so imo fixing the server is preferrable (and I think pretty easy).
Reporter | ||
Comment 22•12 years ago
|
||
It's probably worth to wait until Jeff Walden is back. I will hold of with other patches for now and await his feedback.
Reporter | ||
Comment 23•12 years ago
|
||
Jeff, would you mind to give us some feedback on how we should proceed here? Thanks!
Reporter | ||
Comment 24•12 years ago
|
||
Re-ping again. Jeff Walden, it would be really great to get feedback from you. This bug is causing a massive slowdown of our Mozmill tests execution, which I want to see fixed.
Severity: normal → major
Reporter | ||
Updated•12 years ago
|
status-firefox18:
--- → affected
Comment 25•12 years ago
|
||
Sigh, been juggling too many browser profiles, this tab got dropped in one I haven't opened recently. Looking now...
Flags: needinfo?(jwalden+bmo)
Comment 26•12 years ago
|
||
I don't think it's acceptable for stopping the server to kill in-progress connections by default. The approach in comment 14 is about as far as I think we might, maybe, reasonably go in the server itself in a way that would affect all httpd.js users unavoidably. A separate reap() or whatever that clients could call if they wanted wouldn't be a bad idea, either. It's just something people should have to make an affirmative effort to use -- connections once made successfully, with data transmitted, shouldn't be closed of the server's own volition.
Flags: needinfo?(jwalden+bmo)
Reporter | ||
Comment 27•12 years ago
|
||
Comment on attachment 646141 [details] [diff] [review]
WIP
Sounds like a good approach. Jeff, would you have time to work on this? I'm totally loaded with WebRTC and B2G tests at the moment.
Attachment #646141 -
Attachment is obsolete: true
Reporter | ||
Comment 28•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #27)
> Sounds like a good approach. Jeff, would you have time to work on this? I'm
> totally loaded with WebRTC and B2G tests at the moment.
Re-ping a month later. Would you be able to help us?
Flags: needinfo?(jwalden+bmo)
Reporter | ||
Comment 29•12 years ago
|
||
Jeff, can we please get feedback from you? I requested the needinfo flag a half month ago. Thanks!
status-firefox19:
--- → affected
status-firefox20:
--- → affected
Whiteboard: [qa-automation-blocked]
Comment 30•12 years ago
|
||
...given I haven't had time to get to this in that time, I think it's pretty clear the answer's going to have to be no. :-\
I can answer questions from people who have them, but I have some pretty pressing JS work that needs doing well before I could possibly get to this on my own time.
Flags: needinfo?(jwalden+bmo)
Reporter | ||
Comment 31•12 years ago
|
||
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #30)
> I can answer questions from people who have them, but I have some pretty
> pressing JS work that needs doing well before I could possibly get to this
> on my own time.
Thanks for the update Jeff. I hope you have nothing against it when I put this bug into mentored bucket. We can hopefully find someone to work on it.
Given that we will have a larger cluster for Mozmill tests soon this is no longer a real blocker for us, it's just an annoying issue.
Whiteboard: [qa-automation-blocked] → [mentor=waldo][lang=js]
Comment 32•12 years ago
|
||
Reading the bug, there isn't a clear solution. Not sure that this is a great first bug.
Whiteboard: [mentor=waldo][lang=js] → [lang=js]
Reporter | ||
Comment 33•12 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #32)
> Reading the bug, there isn't a clear solution. Not sure that this is a great
> first bug.
It indeed doesn't contain all the information necessary to fix the bug. Waldo, if you are interested to be the mentor can you please point out what would be necessary to do and give links to the source code? The following link will help in terms of necessary information: http://www.joshmatthews.net/deck.js/mentor/#onemore
Flags: needinfo?(jwalden+bmo)
Whiteboard: [lang=js] → [mentor=waldo][lang=js]
Reporter | ||
Updated•12 years ago
|
Whiteboard: [mentor=waldo][lang=js]
Comment 34•12 years ago
|
||
Source code is <http://mxr.mozilla.org/mozilla-central/source/netwerk/test/httpserver/httpd.js>. As for what to do, you'd have to skim this bug to figure it out. :-\ My recollection is that the last dozen or so comments discuss the issue, but they probably don't explain it all in TPS-note-formatted details.
This is still first-bug territory, for someone motivated and not looking for just a change-the-code-simply patch. But I probably don't have time right now to handhold anyone through a solution to it. :-(
Flags: needinfo?(jwalden+bmo)
Comment 35•11 years ago
|
||
With this patch, I am able to get test_update.js to finish, but there are various assertions:
ASSERTION: closing a connection [object Object] that we never added to the set of open connections?
ASSERTION: double-notifying?
It would be great if somebody who understands httpd.js could resolve this soon, because this is now blocking bug 754356, which is a high-priority bug for the networking team.
Reporter | ||
Comment 36•11 years ago
|
||
Jeff, it looks like we need your help here to get this blocker fixed and to finish the work on bug 754356. Could you have a look at Brian's last comment please?
Flags: needinfo?(jwalden+bmo)
Assignee | ||
Comment 37•11 years ago
|
||
Attachment #801752 -
Flags: review?(brian)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Assignee | ||
Comment 38•11 years ago
|
||
Brian. I was able to easily repro your asserts and I made a couple changes to cope with them
1] a flag to figure out the double close because closing the socket has the side effect of making the state code that is reading from the input stream see an EOF (which it assumes is coming from the other end) and close it again.
2] reorder the existing "server is closed" flag to be after we reap the idle sockets instead of before it to take care of the double-notify problem.
I'm sorry it took to long - I'm still very interested in 754356 :)
Assignee | ||
Comment 39•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #801752 -
Flags: review?(brian) → review?(hurley)
Assignee | ||
Comment 40•11 years ago
|
||
Comment on attachment 801752 [details] [diff] [review]
make httpd.js shutdown quickly with idle never used sockets
Actually I think jason said he knew something about this code
Attachment #801752 -
Flags: review?(hurley) → review?(jduell.mcbugs)
Comment 41•11 years ago
|
||
Comment on attachment 801752 [details] [diff] [review]
make httpd.js shutdown quickly with idle never used sockets
Review of attachment 801752 [details] [diff] [review]:
-----------------------------------------------------------------
nice stuff. Thanks for fixing this, Patrick. I assume you tested and it makes the shutdown regression go away? Someone should verify that that's the case (maybe Henrik can do that if mcmanus didn't get around to it).
::: netwerk/test/httpserver/httpd.js
@@ +483,1 @@
> if (!this._hasOpenConnections())
testing "if (foo)" then "if (!foo)": would be prettier as "if... else". Not a big deal.
Attachment #801752 -
Flags: review?(jduell.mcbugs) → review+
Updated•11 years ago
|
Flags: needinfo?(jwalden+bmo)
Assignee | ||
Comment 42•11 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #41)
> nice stuff. Thanks for fixing this, Patrick. I assume you tested and it
> makes the shutdown regression go away? Someone should verify that that's
> the case (maybe Henrik can do that if mcmanus didn't get around to it).
I verified that it will shutdown with an open connection that has not yet been used, and will not do so without this patch. I did not specifically verify with mozmill - I'm not well versed in that harness.
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3b2327d30b10
Reporter | ||
Comment 43•11 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #41)
> makes the shutdown regression go away? Someone should verify that that's
> the case (maybe Henrik can do that if mcmanus didn't get around to it).
Yes, I can check the fix with Mozmill. Will do it soon.
Reporter | ||
Comment 44•11 years ago
|
||
Wow, that makes a huge difference! With it a lot of our tests are running twice that fast now! Functional tests which take 15 minutes now are run in nearly 7 minutes with the patch.
Comment 45•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Updated•11 years ago
|
Flags: needinfo?(ryanvm)
Comment 46•11 years ago
|
||
status-firefox25:
--- → wontfix
status-firefox26:
--- → fixed
status-firefox27:
--- → fixed
Flags: needinfo?(ryanvm)
Comment 47•11 years ago
|
||
status-b2g-v1.2:
--- → fixed
Updated•7 years ago
|
Component: httpd.js → General
You need to log in
before you can comment on or make changes to this bug.
Description
•