Closed
Bug 1513135
Opened 6 years ago
Closed 6 years ago
Downloads show as incomplete but network trace indicates data has arrived
Categories
(Core :: Networking, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla66
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox64 | --- | wontfix |
firefox65 | --- | verified |
firefox66 | --- | verified |
People
(Reporter: bmckeever, Assigned: CuveeHsu)
References
Details
(Keywords: regression, Whiteboard: [necko-triaged])
Attachments
(3 files, 1 obsolete file)
(deleted),
application/x-shockwave-flash
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details |
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Steps to reproduce:
See attached video. User indicates that they want to download a file, and my application sends that file over HTTP.
Actual results:
Firefox's download UI indicates that no data has arrived and a 0 byte ".part" file is created in the download folder. A Wireshark trace (attached) shows that all of the data has arrived, and it looks ok to me. The problem occurs about half the time with Firefox 63, and it seems to require the file to be above a certain size (a few MB). It seems like a recent regression, I wasn't able to reproduce in FF 60. Chrome downloads fine.
Expected results:
File should be saved to disk.
Attachment #9030369 -
Attachment description: tcp stream view from Wireshark showing the data arrived → text view of the Wireshark capture showing the data arrived
Comment 4•6 years ago
|
||
(In reply to bmm6o from comment #0)
> It seems like a recent regression, I wasn't able to reproduce in FF 60.
It would be useful if you could pinpoint the regression range.
https://mozilla.github.io/mozregression/quickstart.html
Has Regression Range: --- → no
status-firefox-esr60:
--- → unaffected
Component: Untriaged → Downloads API
Keywords: regression
Product: Firefox → Toolkit
I ran mozregression, the last portion of output is:
2018-12-11T13:29:19: INFO : platform_buildid: 20180816031316
2018-12-11T13:29:19: INFO : platform_changeset: 45e540e5e0fc2bfca60fe015a546fdb6f2f01318
2018-12-11T13:29:19: INFO : platform_repository: https://hg.mozilla.org/integration/autoland
2018-12-11T13:29:19: INFO : platform_version: 63.0a1
2018-12-11T13:30:28: INFO : Narrowed inbound regression window from [483ff2d2, ec225468] (3 builds) to [483ff2d2, 45e540e5] (2 builds) (~1 steps left)
2018-12-11T13:30:28: DEBUG : Starting merge handling...
2018-12-11T13:30:28: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=45e540e5e0fc2bfca60fe015a546fdb6f2f01318&full=1
2018-12-11T13:30:28: DEBUG : Found commit message:
Bug 1280629 - Part 1: Suspend the http channel if the child process is not able to consume on time r=dragana
Differential Revision: https://phabricator.services.mozilla.com/D2745
2018-12-11T13:30:28: DEBUG : Did not find a branch, checking all integration branches
2018-12-11T13:30:28: INFO : The bisection is done.
2018-12-11T13:30:28: INFO : Stopped
It does sound like that commit could have had an impact on download behavior.
The discussion in bug 1280629 makes it sound like they planned on having a way to turn this off, but neckoIpcWindowSize=0 doesn't do it and I don't see anything else obvious from the code. It would be great if there was a configuration way to work around this, please let me know if I missed something.
Disabling e10s as outlined in https://support.mozilla.org/en-US/questions/1191898 seems to resolve this issue
Comment 9•6 years ago
|
||
(In reply to bmm6o from comment #5)
> I ran mozregression, the last portion of output is:
Thank you for making the effort.
Junior Hsu, it looks like bug 1280629 caused this regression. Please have a look at this, and comment 7 in particular.
Blocks: 1280629
Has Regression Range: no → yes
Component: Downloads API → Networking
Flags: needinfo?(juhsu)
Product: Toolkit → Core
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
(In reply to bmm6o from comment #7)
> The discussion in bug 1280629 makes it sound like they planned on having a
> way to turn this off, but neckoIpcWindowSize=0 doesn't do it and I don't see
> anything else obvious from the code. It would be great if there was a
> configuration way to work around this, please let me know if I missed
> something.
Hello Reporter,
The video in Comment 0 is unfamiliar to me, so I don't know if there's a delicate setup.
I can't reproduce in FF 64 with a simple download, but the Header Content-Length: is correctly set in Comment 2.
To disable the impact on bug 1280629 for experiment,
can you set the network.http.send_window_size to 0 in about:config to reproduce it again?
If yes, can you give us a STR to reproduce or attach the log to analyze?
See instructions here: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Thanks!
Flags: needinfo?(juhsu) → needinfo?(bmckeever)
Reporter | ||
Comment 11•6 years ago
|
||
Flags: needinfo?(bmckeever)
Reporter | ||
Comment 12•6 years ago
|
||
This download succeeded
Reporter | ||
Comment 13•6 years ago
|
||
Comment on attachment 9030830
This download reproduced the problem (0 of 0 bytes, empty part file)
Reporter | ||
Comment 14•6 years ago
|
||
Hi Junior,
For more context, the application is an intranet document management web app. This setup has the ASP.Net server running on a separate VM. I have reproduced the problem in a variety of setups - Firefox on the local physical machine, on a different VM, on a different physical machine, web app on a physical machine.
See attached HTTP logs, 1 each for the default window size and window size 0. Setting window size to 0 resulted in a successful download.
Assignee | ||
Comment 15•6 years ago
|
||
(In reply to bmm6o from comment #14)
> Hi Junior,
>
> For more context, the application is an intranet document management web
> app. This setup has the ASP.Net server running on a separate VM. I have
> reproduced the problem in a variety of setups - Firefox on the local
> physical machine, on a different VM, on a different physical machine, web
> app on a physical machine.
>
> See attached HTTP logs, 1 each for the default window size and window size
> 0. Setting window size to 0 resulted in a successful download.
I see it's a pptx file.
I'm more interested in how the data is consumed.
Is it a XHR/Fetch or a hyperlink download?
If the consumer is busy or blocked and not telling the network engine how many data is consumed, we
may suspend the Http channel.
Flags: needinfo?(bmckeever)
Reporter | ||
Comment 16•6 years ago
|
||
Oh, I see. We trigger the download by setting the src property of an iframe to the download url.
Flags: needinfo?(bmckeever)
Assignee | ||
Updated•6 years ago
|
Priority: -- → P1
Whiteboard: [necko-triaged]
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → juhsu
Assignee | ||
Comment 17•6 years ago
|
||
The issue happens in the diversion (Download).
When we download resource, the child process will initiate HttpChannelChild::DivertToParent during OnStartRequest.
Send an IPC to parent, triggering HttpChannelParent::SuspendForDiversion
Usually it happens so early that we won't triggered any flow control bug 1280629.
Here the back pressure suspend is earlier than diversion suspend,
which indicates the IPC from child to parent is so slow, compared to the internet speed
(actually it's a super fast intranet, 3MB/0.1s).
We didn't send SendRecvBytes for the diversion cases
The channel is suspended by bp in the case.
Given this condition, SuspendForDiversion is the first message of no more e10s.
We should
(a) resume the channel if it is suspend by bp flow control
(b) never suspend by flow control (although we have suspend for diversion, there may some packets arrive later)
Assignee | ||
Comment 18•6 years ago
|
||
Assignee | ||
Comment 19•6 years ago
|
||
I can reproduce while download something big in local computer via HTTP.
Updated•6 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment 20•6 years ago
|
||
(In reply to Junior Hsu from comment #17)
> The issue happens in the diversion (Download).
>
> When we download resource, the child process will initiate
> HttpChannelChild::DivertToParent during OnStartRequest.
> Send an IPC to parent, triggering HttpChannelParent::SuspendForDiversion
> Usually it happens so early that we won't triggered any flow control bug
> 1280629.
>
> Here the back pressure suspend is earlier than diversion suspend,
> which indicates the IPC from child to parent is so slow, compared to the
> internet speed
> (actually it's a super fast intranet, 3MB/0.1s).
>
> We didn't send SendRecvBytes for the diversion cases
> The channel is suspended by bp in the case.
>
Why we did not? We should. Is it possible to resolve it by sending SendRecvBytes?
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #20)
> (In reply to Junior Hsu from comment #17)
> > The issue happens in the diversion (Download).
> >
> > When we download resource, the child process will initiate
> > HttpChannelChild::DivertToParent during OnStartRequest.
> > Send an IPC to parent, triggering HttpChannelParent::SuspendForDiversion
> > Usually it happens so early that we won't triggered any flow control bug
> > 1280629.
> >
> > Here the back pressure suspend is earlier than diversion suspend,
> > which indicates the IPC from child to parent is so slow, compared to the
> > internet speed
> > (actually it's a super fast intranet, 3MB/0.1s).
> >
> > We didn't send SendRecvBytes for the diversion cases
> > The channel is suspended by bp in the case.
> >
>
> Why we did not? We should. Is it possible to resolve it by sending
> SendRecvBytes?
Yes we can and this issue is resolved by SendRecvBytes.
Some quick analysis by SendRecvBytes approach
Pro: code is cleaner and consistent
Con: |SendRecvBytes| is a redundant IPC call since
(a) it's later than HttpChannelChild::DivertToParent in OnStartRequest
(b) we suspend the HttpChannelParent in HttpChannelParent::SuspendForDiversion anyway, same effect with the flow control algorithm
What do you think?
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 22•6 years ago
|
||
Updated•6 years ago
|
Attachment #9030962 -
Attachment is obsolete: true
Comment 23•6 years ago
|
||
Pushed by juhsu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fabd9a347ca2
disable flow control of HTTP e10s back pressure r=dragana
Comment 24•6 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Assignee | ||
Comment 25•6 years ago
|
||
Comment on attachment 9031284 [details]
Bug 1513135 - disable flow control of HTTP e10s back pressure
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: Bug 1280629
User impact if declined: Downloading a file > 1MB in a high speed environment (e.g., intranet) will fail
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: Bug 1513135
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): This change disables the impact form bug 1280629, acting as previous release (62 and before)
String changes made/needed: No
Attachment #9031284 -
Flags: approval-mozilla-release?
Attachment #9031284 -
Flags: approval-mozilla-beta?
Updated•6 years ago
|
Flags: qe-verify+
Comment 26•6 years ago
|
||
Comment on attachment 9031284 [details]
Bug 1513135 - disable flow control of HTTP e10s back pressure
[Triage Comment]
Sounds like a pretty big regression for users in some environments. Approving for 65.0b5, but it would also be nice if we could get some verification from QA and/or the reporter that things are behaving correctly in builds with the fix.
Attachment #9031284 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 27•6 years ago
|
||
bugherder uplift |
Reporter | ||
Comment 28•6 years ago
|
||
The fix appears to work in Nightly "66.0a1 (2018-12-14) (64-bit)". Thanks for acting on this so quickly!
Comment 29•6 years ago
|
||
I tried to reproduce the issue using an older version of Nightly from 2018-12-10, but I couldn't. Can you please give me the link that allowed you to reproduce the issue? Or if you did any additional steps please tell me. I could really use some steps to reproduce.
Thank you.
Flags: needinfo?(bmckeever)
Comment 30•6 years ago
|
||
(In reply to Junior Hsu from comment #21)
> (In reply to Dragana Damjanovic [:dragana] from comment #20)
> > (In reply to Junior Hsu from comment #17)
> > > The issue happens in the diversion (Download).
> > >
> > > When we download resource, the child process will initiate
> > > HttpChannelChild::DivertToParent during OnStartRequest.
> > > Send an IPC to parent, triggering HttpChannelParent::SuspendForDiversion
> > > Usually it happens so early that we won't triggered any flow control bug
> > > 1280629.
> > >
> > > Here the back pressure suspend is earlier than diversion suspend,
> > > which indicates the IPC from child to parent is so slow, compared to the
> > > internet speed
> > > (actually it's a super fast intranet, 3MB/0.1s).
> > >
> > > We didn't send SendRecvBytes for the diversion cases
> > > The channel is suspended by bp in the case.
> > >
> >
> > Why we did not? We should. Is it possible to resolve it by sending
> > SendRecvBytes?
>
> Yes we can and this issue is resolved by SendRecvBytes.
>
> Some quick analysis by SendRecvBytes approach
> Pro: code is cleaner and consistent
> Con: |SendRecvBytes| is a redundant IPC call since
> (a) it's later than HttpChannelChild::DivertToParent in OnStartRequest
> (b) we suspend the HttpChannelParent in
> HttpChannelParent::SuspendForDiversion anyway, same effect with the flow
> control algorithm
>
> What do you think?
We can do this, but we will need a good comment on the child that the child does not need to send SendRecvBytes when diversion starts and why. I hate our suspend-resume stuff. If we get rid of the diversion some stuff will be easier. Also check for !diverting in RecvRecvBytes.
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 31•6 years ago
|
||
(In reply to Oana Botisan from comment #29)
> I tried to reproduce the issue using an older version of Nightly from
> 2018-12-10, but I couldn't. Can you please give me the link that allowed you
> to reproduce the issue? Or if you did any additional steps please tell me. I
> could really use some steps to reproduce.
> Thank you.
My STR:
1. set a *local* HTTP server and we can *download* a >1MB file (I use nodejs but it's up to you)
2. Download it via browser
Reporter | ||
Comment 32•6 years ago
|
||
(In reply to Oana Botisan from comment #29)
> I tried to reproduce the issue using an older version of Nightly from
> 2018-12-10, but I couldn't. Can you please give me the link that allowed you
> to reproduce the issue?
The problem will only show up on an intranet so there really isn't a link anyone can give you. I don't know if Junior Hsu has been able to calculate a minimum speed required, but it's possible your network is below it.
Flags: needinfo?(bmckeever)
Assignee | ||
Comment 33•6 years ago
|
||
(In reply to bmm6o from comment #32)
> (In reply to Oana Botisan from comment #29)
> > I tried to reproduce the issue using an older version of Nightly from
> > 2018-12-10, but I couldn't. Can you please give me the link that allowed you
> > to reproduce the issue?
>
> The problem will only show up on an intranet so there really isn't a link
> anyone can give you. I don't know if Junior Hsu has been able to calculate
> a minimum speed required, but it's possible your network is below it.
It's hard to say, and depending on the computing power.
Local network (especially download from local computer) should be always fast enough.
Comment 34•6 years ago
|
||
Thank you for the info.
I managed to reproduce the issue using an older version of Nightly (2018-12-10) on Windows 10 x64.
I verified the fix using latest Nightly 66.0a1 and beta 65.0b5 on Windows 10 x64, Ubuntu 16.04 x64 and macOS 10.13. The bug is not reproducing anymore.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Reporter | ||
Comment 35•6 years ago
|
||
Since it turns out you can repro internally, would you mind removing or making private the packet captures I attached? There's nothing very sensitive, but it contains internal IP addresses and host names that I'd prefer not be public forever.
Updated•6 years ago
|
Attachment #9030369 -
Attachment is private: true
Updated•6 years ago
|
Attachment #9030370 -
Attachment is private: true
Updated•6 years ago
|
Attachment #9030830 -
Attachment is private: true
Updated•6 years ago
|
Attachment #9030831 -
Attachment is private: true
Comment 36•6 years ago
|
||
(In reply to bmm6o from comment #35)
> Since it turns out you can repro internally, would you mind removing or
> making private the packet captures I attached? There's nothing very
> sensitive, but it contains internal IP addresses and host names that I'd
> prefer not be public forever.
I've marked them as private.
Assignee | ||
Comment 37•6 years ago
|
||
Comment on attachment 9031284 [details]
Bug 1513135 - disable flow control of HTTP e10s back pressure
It's already close to next train shipping
Attachment #9031284 -
Flags: approval-mozilla-release?
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•