Closed
Bug 662555
Opened 13 years ago
Closed 13 years ago
Race condition when measuring requestEnd
Categories
(Core :: Networking: HTTP, defect)
Core
Networking: HTTP
Tracking
()
RESOLVED
FIXED
mozilla9
People
(Reporter: mayhemer, Assigned: mayhemer)
References
Details
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
jduell.mcbugs
:
review+
|
Details | Diff | Splinter Review |
nsHttpChannel::OnStopRequest calls gHttpHandler->CancelTransaction. That method posts an event to the socket thread that cancels the transaction asynchronously (by call to nsHttpTransaction::Close). In that method we stamp the requestEnd time. Sometimes this event fires later then we read the value of requestEnd time on the main thread, giving us a null value for it.
The time (also more precise) has be stamped at nsHttpTransaction::HandleContent when we detect eof.
In general we should check if the timings are (and can be/have to be) in sync with the activity distributor.
Comment 1•13 years ago
|
||
Would it be good enough to set the requestEnd timestamp in nsHttpChannel::OnStopRequest? AFAICT we really only care about when the data finished getting to the user, not when the transaction was cancelled asynchronously.
Assignee | ||
Comment 2•13 years ago
|
||
We could do that in OnStopRequest, right, but just to note that this is also an event posted from the socket thread to the main thread. So, there is some delay. Question is, do we want to include that delay or not? All other measures are captured on the socket thread, so why not this one as well? Also these should be pure-as-can-be network only timings.
Also it shows that we don't protect the timings with a mutex. I wouldn't actually add a mutex (negative perf effect) but rather (if possible, wouldn't break the web timing API) make the values accessible only after OnStopRequest has been called, e.g. mPending == false. But it is not for this bug.
Adding a hard block for bug 658894 as it shots leak test down (undetected by try before).
Blocks: 658894
Comment 3•13 years ago
|
||
> We could do that in OnStopRequest, right, but just to note that this is also an event posted from the socket thread to the main thread. So, there is some delay. Question is, do we want to include that delay or not?
Well, OnStop is when the channel "ends" from the client's perspective, so it seems like a good place to take the measurement.
> All other measures are captured on the socket thread, so why not this one as well?
Some events (DNS resolution, TCP connection, etc) do not get notifiction on the user thread, so we can't measure them from there. But the timings are ultimately for a user to understand how much time in the network layer their code takes, and that include whatever delay we take in posting OnStop to the main thread. I'm not too worried about which thread we measure which event on, as long as the metrics are sensible. And OnStopReq seems the sensible place for requestEnd.
Assignee | ||
Comment 4•13 years ago
|
||
There are 3 points of view IMO:
1. To be precise. To exactly measure the times we spend by some activities. We are using this API for our C++ coded telemetry, that has to be precise as it can be, but we can use different ways to get the telemetry, so not a strong argument...
2. To be compatible with other ways we measure. Mainly with the activity observer that Firebug uses. NS_HTTP_ACTIVITY_SUBTYPE_RESPONSE_COMPLETE is what Firebug uses for stamping the "response end" time and it is exactly at the place I suggest, [1]. I checked we are compatible for the rest of the notification, which is good.
3. Deliver what Web Timing API wants: "This attribute (responseEnd) must return the time immediately after the user agent receives the last byte of the current document or immediately before the transport connection is closed, whichever comes first. The document here can be received either from the server, relevant application caches or from local resources." Also, there is a web version of Firebug that would like to use it. It wouldn't be good to give different numbers using XPI Firebug and web Firebug.
So, it is clear this needs to be put to EOF detection, the same place where the activity observer notifies.
[1] http://hg.mozilla.org/mozilla-central/annotate/e319657f9c92/netwerk/protocol/http/nsHttpTransaction.cpp#l1125
Assignee | ||
Comment 5•13 years ago
|
||
This is a patch that puts the stamp to the right place. We have to leave it as well in nsHttpTransaction::Close when we get an error state. In case of an error that method seems to be called before nsHttpChannel::OnStopRequest, so we are safe.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #540246 -
Flags: review?(jduell.mcbugs)
Comment 6•13 years ago
|
||
Comment on attachment 540246 [details] [diff] [review]
v1
Review of attachment 540246 [details] [diff] [review]:
-----------------------------------------------------------------
> This attribute (responseEnd) must return the time immediately
> after the user agent receives the last byte of the current document
> or immediately before the transport connection is closed, whichever comes first
You're right.
Attachment #540246 -
Flags: review?(jduell.mcbugs) → review+
Assignee | ||
Comment 7•13 years ago
|
||
Whiteboard: [inbound]
Comment 8•13 years ago
|
||
thanks for fixing this! but shouldn't this have a TimingEnabled() check? I guess I forgot that in the original codepath...
Assignee | ||
Comment 9•13 years ago
|
||
(In reply to comment #8)
> thanks for fixing this! but shouldn't this have a TimingEnabled() check? I
> guess I forgot that in the original codepath...
Probably should. I can do that in a followup.
Comment 10•13 years ago
|
||
Backed out for apparently causing reftest assertion failures on Windows debug: http://hg.mozilla.org/integration/mozilla-inbound/rev/3890c438f22d
Whiteboard: [inbound]
Assignee | ||
Comment 11•13 years ago
|
||
I have no idea why this should cause reftest assertions. Unfortunatelly logs are gone by now.
I believe this was backed out because the original version of the bug 658894 patch was causing assertion failures (changeset fcf08eff68eb) but a new version currently landed on m-c (eb5866601f88) doesn't suffer from it.
I vote for just relanding this.
Assignee | ||
Comment 12•13 years ago
|
||
Same as the v1 patch just added the TimingEnabled() check.
Attachment #540246 -
Attachment is obsolete: true
Attachment #550146 -
Flags: review+
Assignee | ||
Comment 13•13 years ago
|
||
Hmm... looks like the spec has changed in the mean time:
responseEnd: "This attribute must return the time immediately after the user agent finishes receiving the last byte of the resource from from relevant application caches or from local resources."
There is no more anything about closing the connection.
I will update the patch.
Assignee | ||
Comment 14•13 years ago
|
||
This will also fix bug 673226.
http://tbpl.mozilla.org/?tree=Try&rev=07e1908a3b94
Attachment #550146 -
Attachment is obsolete: true
Attachment #550161 -
Flags: review?(jduell.mcbugs)
Updated•13 years ago
|
Attachment #550161 -
Flags: review?(jduell.mcbugs) → review+
Updated•13 years ago
|
Keywords: checkin-needed
Comment 15•13 years ago
|
||
Sent to try:
http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=626e32292fdc
Assuming all green, will push to inbound after.
I'm happy to fix it this time, but for the future please can you make sure the format (context, commit message, author) is correct:
http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed
Thanks :-)
Comment 16•13 years ago
|
||
Keywords: checkin-needed
Whiteboard: [inbound]
Updated•13 years ago
|
Target Milestone: --- → mozilla9
Comment 17•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
You need to log in
before you can comment on or make changes to this bug.
Description
•