Closed Bug 237623 Opened 21 years ago Closed 9 years ago

download manager sometimes thinks that incomplete downloads are complete; cannot resume/retry

Categories

(Core :: Networking: HTTP, defect, P3)

defect

Tracking


RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed
relnote-firefox --- 39+

People

(Reporter: navneetkarnani, Assigned: bagder)

References

(Blocks 1 open bug, )

Details

(Keywords: dataloss, Whiteboard: [please read and respect Comment 107] [lame-network])

Attachments

(5 files, 31 obsolete files)

(deleted), application/msword
(deleted), image/png
(deleted), application/xml
(deleted), application/octet-stream
(deleted), patch (bagder: review+)
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8

I have a situation where the download is of 35MB+ and after 15MB-20MB the
download manager assumes that the download is complete. This could be due to the
source server terminating the connection or any connection related problem.


Reproducible: Always
Steps to Reproduce:


Actual Results:  
If I try to download the file again, the download manager overwrites my download
and does it all over again.

Expected Results:  
I would want a retry button in this case so that I can resume the download from
where it stands today.
Reporter,

WORKSFORME. Resuming is an FTP server feature and, as far as I know, is not
available when downloading over HTTP. The retry button in the download manager
hangs, which I semi-expect.

As for the overwrite: because we are not downloading from an FTP server, I
expect that.
This is a dupe of either bug 230870 or bug 236514. Reporter can you please dupe
against either one of these? If not, please explain in some more detail how this
is a separate bug.
The issue here is that Firefox can't tell the difference between a prematurely
terminated download (i.e. when your connection drops half way through) and a
finished download. It needs to distinguish between the two and offer a retry
option on failed downloads, though perhaps this problem will just go away with
bug 230870.
Are you sure about this? Using a recent nightly build, I started a large
download and unplugged my ethernet cord halfway through, and Firefox said the
download had failed, not completed.

Can you reproduce this with a newer build?
(In reply to comment #4)
> Can you reproduce this with a newer build?

I've stopped using the DM over the last few months, so perhaps this has
been fixed since 0.8. I'll start using it again and update this bug if I see the
problem (unplugging my ethernet cable gave the same results you got).
I've been using the download manager since my last comment and I've seen the
problem that I described in comment 3 several times. Last time I was downloading
a 4MB file, but at about 400KB the file stopped downloading and the DM showed it
in the finished state (i.e. with the "show" link instead of the "retry" link).
When I clicked on the original link on the webpage again it started downloading
but immediately leapt up to 400KB before continuing at a normal rate.

I can't believe the reporter and I are the only people who've experienced this
bug, but I can't find any similar reports.
I have faced situations similar to yours while downloading Netbeans off their
server http://www.netbeans.org

But the only difference has been that if I try to re-download the file, it
starts from zero instead of the 20MB that has already been downloaded.
Is this still a problem in recent nightly builds? Please do test with clean
profiles and no extensions. Thanks.
Confirming on: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a)
Gecko/20040504 Firefox/0.8.0+

Updating summary.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: download manager does not allow retry after download completes → download manager sometimes thinks that incomplete downloads are complete; cannot resume/retry
I got the same while downloading
http://web.archive.org/web/20021203213324/www.ctl.sri.com/escot/dist/escot_1_0_2/InstData/Windows/VM/Install_Escot.exe
It just thinks that the file download is complete, but it isn't.
I would like to confirm that I have the same problem.
My setup is Windows98SE, Firefox 0.9.1... P200 56MB ram...
I have the download manager enabled (to appear).
I frequently download a number of files at once.

Often one or two of my downloads are incomplete, but Firefox says they completed
successfully.
Frustrating later when I go to use them. For anything but very small files that I
will check right away, I don't even use Firefox to download. It's a hassle to
copy/paste to a prompt and use wget or FileZilla though...
I have also experienced this problem downloading files from 
http://www.mp3.qo.pl/ Some (not all) files will start and then show completed in
the DM. In the most recent example, the file should have been 17 MB but was
only 3.4 MB when Firefox showed it as complete. I am using a recent
(from Sep 1 2004) nightly build on a new XP installation.
This happens to me even when downloading a local file.  If I am viewing a local
directory and click on a large file it starts to download (copy) the file to my
download directory.  The download will abort partway through but the download
manager says it was complete.  Since this happens with a local copy, the problem
must involve more than just dropped network connections.  I am using version
0.9.3 on x86_64 Linux.
Possible duplicate of bug 287932?
The longer the file, the bigger the chance of seeing the problem; see bug
230451 for a better description of the problem.
Firefox 1.0.3 still has this annoying bug that finally sent me back to Mozilla.
*** Bug 291665 has been marked as a duplicate of this bug. ***
I also have this problem. It seems to happen when there is a connection problem
between me and the download host.

It seems some hosts will drop the HTTP transfer but end the TCP session nicely.
Firefox won't recognize that the total bytes and bytes received are not the same,
and doesn't reconnect to finish the transfer; the result is that files of any size
(I've had files as small as 200k not finish) are marked completed when they aren't...

If the TCP session doesn't finish nicely, Firefox will behave correctly and won't
mark the download as complete (although Firefox doesn't seem to want to resume
unfinished downloads...)

This can be replicated by setting up, say, a web server, starting a download from
it, and pulling out the network cable; FF will just sit and wait...

But if in the same scenario you kill apache instead of pulling the network cable,
the TCP connection will finish nicely (well, nicer :P) and FF will assume the
download has finished.

I assume this happens in real life when the web server you are downloading
from kills your HTTP session for some reason (i.e. TCP still finishes nicely).

This problem has been in every version of FF I have used, from 0.9 to 1.04.
Just tested this bug using 

Test Enviroment
---------------

Client: latest Firefox (Nightly Build - 8th Jun 2005) on a clean install of
Windows XP (with all updates installed).

Server: Debian Sarge running apache and squid (with all updates installed)

Results
-------

FF thinks that the file has successfully downloaded if you are proxying through
squid and kill squid. (HTTP doesn't finish nicely, TCP does.)

FF thinks that the file has successfully downloaded if you are downloading from
apache and kill apache. (HTTP doesn't finish nicely, TCP does.)

FF thinks that the file hasn't successfully downloaded if you are downloading
from apache and pull the network cable out of the server. (HTTP doesn't finish
nicely, TCP doesn't finish nicely.)
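The pattern in these three results suggests a simple decision rule, sketched below as a hypothetical illustration (the names `classifyDownload` and `DownloadState` are invented here; this is not the actual Necko or download manager code): a clean TCP close should only count as a complete download when the received byte count matches the advertised Content-Length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the completeness check these tests suggest.
enum class DownloadState { Complete, Failed };

DownloadState classifyDownload(int64_t bytesReceived,
                               int64_t contentLength,   // -1 when unknown
                               bool tcpClosedCleanly) {
    if (!tcpClosedCleanly)
        return DownloadState::Failed;   // cable pulled: FF already detects this case
    if (contentLength >= 0 && bytesReceived != contentLength)
        return DownloadState::Failed;   // killed apache/squid: clean close, short read
    return DownloadState::Complete;     // full length received (or length unknown)
}
```

With a rule like this, the "kill squid" and "kill apache" cases would be flagged as failed rather than complete, because the byte count would fall short of the Content-Length.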
(In reply to comment #18)
> Just tested this bug using ...

Do you know if the same thing happens with the Mozilla suite, or is this Firefox
specific?
Firefox: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2)
Gecko/20050609 Firefox/1.0+ ID:2005060915

Apache: 2.0.47-1.9.91.mdk (under Mandrake Linux 9.2)

Attempting downloading of a large file, server returns headers of:

HTTP/1.1 206 Partial Content
Date: Thu, 09 Jun 2005 23:55:48 GMT
Server: Apache-AdvancedExtranetServer/2.0
Last-Modified: Tue, 18 May 2004 02:08:42 GMT
ETag: "3ff4-637a8f0-6bbcde80"
Accept-Ranges: bytes
Content-Length: 102105285
Content-Range: bytes 2205739-104311023/104311024
Content-Type: application/x-executable

After a few hundred KB downloaded I start killing off httpd processes using
"kill -KILL <pid>". After a number are gone, the download stops. Once, I got a
successful download, despite the download obviously not being complete. Another
time I did get an error saving the file, but it was the error I get all the time
with successful downloads anyway, I think there is something up with my disk on
that front.
Suite also believes it downloaded fine. In fact in the transferred column it
reckons it got all 101MB.
Same here. I too experience the defect even with Firefox 1.04. Nothing
has changed. When the connection fails (slow connection, route not available,
etc.) Firefox simply marks the download as completed!

It doesn't even allow you to retry the failed download, nor does it show the
actual URL from where it downloaded.

At the least, irrespective of its opinion on the state of the download, it should
allow for a re-download!
Assignee: bugs → darin
Component: Download Manager → Networking: HTTP
Product: Firefox → Core
QA Contact: aebrahim-bmo → networking.http
Version: unspecified → Trunk
(In reply to comment #19)

> Do you know if the same thing happens with the Mozilla suite, or is this
> Firefox specific?

I used Mozilla for quite some time before FF and had only occasionally damaged
downloads. With FF in the beginning I didn't think the problem would be FF
specific but with repeated occurrences I got more and more annoyed. Since I went
back to Mozilla I haven't had a single broken download yet (I am very attentive
to this problem now :-)
I have experienced this problem in 1.04, 1.03, and 1.0Pre, but I can't remember
farther back.

When downloading a large file, usually about 600MB .iso files on my LAN, the
download will begin. Downloads will start around 45000K and level out around
11000-12000KB/sec, and then fail. It can fail at 30MB, 40MB, 50MB...130MB...
When it fails, I do not receive any errors, it (Downloads dialog) just says
done. I have not had the problem on slower networks (internet), where download
speeds are much slower.

On a Windows box, it downloads 1900KB--5000KB/sec. and it hangs (Downloads
dialog does not go away and does not say done).

The server I am downloading from is an IBM x336, 4GB RAM running Apache on SuSe
Enterprise 9. This is connected through a gig Cisco switch. My PCs are on 100MB
FULL Cisco connection. I do not see other issues with the connection, and the
Galeon downloads work through all the same hardware and software.

I have no other downloads in the list, and the Windows box is a fresh install of
Firefox.

I am able to download the complete file in Galeon, consistently. Does anyone
know if this is a bug, or is there a setting I can alter to troubleshoot this? I
have created a new profile, not adding extensions, bookmarks, themes, etc. Same
problem. This is a local LAN with a reliable connection. Both sides are set to
100/Full.

Linux 2.6.9 - 2.6.11.x kernel
Firefox 1.x
P4 3.4Ghz
2GB RAM
root filesystem is standard IDE ext3
2 SATA Raptors in RAID 0 with JFS filesystem for a storage area
Failure happens on both filesystems/drives

Winblows box:
P4 IBM Thinkcentre
512MB RAM
Standard SATA drive
Firefox 1.0Pre
I'm seeing this bug with disturbing frequency with the latest nightly trunk
(Deer Park) build, when trying to ftp the latest Firefox nightlies from the
mozilla.org mirrors. I'm not sure if it's the mirrors, the browser, my network
or what that is causing the termination, but the download manager is fooled into
thinking it was a complete download.

Also on Mac OS X -> All/All.

I'm ignorant of the ftp process, but is it possible that a server can send some
sort of termination signal that the browser incorrectly assumes means that the
download has finished instead of just aborted? Alternatively, is there any
time-out or lost-connection code in the browser that could be incorrectly
signalling a completed download instead of an aborted one?
OS: Windows XP → All
Hardware: PC → All
IIRC the download manager code treats any end of data receiving as a normal end
of file, irrespective of what other code tells it.
*** Bug 296785 has been marked as a duplicate of this bug. ***
*** Bug 214328 has been marked as a duplicate of this bug. ***
*** Bug 295408 has been marked as a duplicate of this bug. ***
The following example fails reliably for me.  I'm accessing it via a 300kbit
cable connection, in case that matters:

http://rochebruneadsl.free.fr/gcc/pdf/gccint.pdf

This is a 2,096,519 byte file.  Mozilla (FF 1.0) downloads about 1.8M and then
the progress bar in the download manager jumps to 100% and reports "done".  The
remaining 200k is not downloaded.

Here is the result of downloading using wget, which is interesting:

$ wget http://rochebruneadsl.free.fr/gcc/pdf/gccint.pdf
--12:09:04--  http://rochebruneadsl.free.fr/gcc/pdf/gccint.pdf
           => `gccint.pdf.1'
Resolving rochebruneadsl.free.fr... 212.27.40.178
Connecting to rochebruneadsl.free.fr[212.27.40.178]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,096,519 [application/pdf]

86% [==============================>      ] 1,813,007     35.32K/s    ETA 00:08

12:09:56 (34.74 KB/s) - Read error at byte 1813007/2096519 (Connection reset by
peer). Retrying.

--12:09:57--  http://rochebruneadsl.free.fr/gcc/pdf/gccint.pdf
  (try: 2) => `gccint.pdf.1'
Connecting to rochebruneadsl.free.fr[212.27.40.178]:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 2,096,519 (283,512 to go) [application/pdf]

100%[+++++++++++++++++++++++++++++++=====>] 2,096,519     31.18K/s    ETA 00:00

12:10:05 (32.37 KB/s) - `gccint.pdf.1' saved [2096519/2096519]


So it looks like the server drops the connection after 1813007 bytes.  wget
detects this, and retries with a "partial content" request [re comment #1, HTTP
certainly does support partial downloads], eventually getting all content, while
Mozilla erroneously reports it "done".

For comparison, IE6 (under Wine) downloads 76% of the file and then reports
"Internet Explorer cannot download gccint.pdf from rochebruneadsl.free.fr.  The
connection with the server was reset."  Opera 8 gets to the same point and shows
"Error"; it is then possible to select "Resume", and it will complete.
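The wget resume shown in the log above boils down to re-requesting the file with a Range header starting at the byte where the first attempt stopped; a server that honors the request answers "206 Partial Content", while one that does not answers "200 OK" with the whole file. A minimal sketch of building such a request (hypothetical helper, not wget or Mozilla code):

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch: build the HTTP request a client would send to
// resume a download from a given byte offset, using an open-ended Range.
std::string buildResumeRequest(const std::string& host,
                               const std::string& path,
                               int64_t bytesAlreadyReceived) {
    std::ostringstream req;
    req << "GET " << path << " HTTP/1.1\r\n"
        << "Host: " << host << "\r\n"
        << "Range: bytes=" << bytesAlreadyReceived << "-\r\n"  // resume point
        << "Connection: close\r\n"
        << "\r\n";
    return req.str();
}
```

For the log above, the resume request would carry `Range: bytes=1813007-`, matching the 206 response that reports 283,512 bytes to go.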

*** Bug 304985 has been marked as a duplicate of this bug. ***
*** Bug 301595 has been marked as a duplicate of this bug. ***
*** Bug 311057 has been marked as a duplicate of this bug. ***
What's even more confusing, the download window (be it progress window or download manager) incorrectly reports that the expected number of bytes has been downloaded.

In fact, this problem has persisted since Mozilla 1.0, and apparently nobody ever bothered to report it.

Attached patch patch v0 (obsolete) (deleted) — Splinter Review
Check the final progress notification versus the expected file size to determine if download was successful. I think this should solve the problem.
Attachment #210862 - Flags: review?(neil)
Comment on attachment 210862 [details] [diff] [review]
patch v0

Sorry, I can't tell if this approach is reasonable.
Attachment #210862 - Flags: review?(neil)
I'm not sure that this should be done in frontend code... necko should be able to detect this. or, if not necko, then it should be exthandler/webbrowserpersist.
(In reply to comment #35)
> Created an attachment (id=210862) [edit]
> patch v0
> 
> Check the final progress notification versus the expected file size to
> determine if download was successful. I think this should solve the problem.

This patch implements the minimum level of support that Moz should have; it just gets us what IE does, i.e. reporting an error.  All that the user can do is retry, and the download will probably then fail again in the same way.

It would be better to implement a "resume" button as Opera does, or to automatically resume as wget does.  See comment #30.  This needs to use the HTTP range mechanism to continue from the point where the previous fetch stopped.

--Phil.
Although this is a little off subject, are there any known plans to update Firefox's Download Manager completely? It sure could use it; I am having this problem too, as are many others.
I filed a bug report that appears to be a duplicate of this, bug #329895. I'm sure I've seen it in Mozilla suite, and today in a current Seamonkey tinderbox on Win98SE and 56K dialup. The possibility of timing out has often occurred to me. Does Mozilla or server software calculate estimated time of a download based on connection speed? If so, all Mozilla versions I've seen may be sending over-estimated speeds to servers.

An excessive speed estimate (or too-short time estimate) could happen because the speed isn't calculated until the user enters info in the save-as dialog box, but the download is progressing during that time into the system temp folder. After the dialog box data is entered, THEN the rate is calculated based on the data already received but not on the corresponding time interval. It's based on maybe one second of downloading after the dialog data is entered. So the rate always starts substantially higher than the actual instantaneous rate and declines as the download progresses. The estimated time is initially too small and gradually approaches a realistic estimate as the rate falls.

If servers use the initial rate or time estimate from the recipient, it could under-estimate the time, perhaps seriously, and treat the realistic download time as a stall and kill the connection.

Correctly calculating download speed and estimated time based on an instantaneous rate instead of the faulty present method might help to fix dropped downloads.

Mozilla/Seamonkey/Firefox should also compare the filesize on the server to what's been received to indicate whether a download is complete or failed.
> If servers use the initial rate or time estimate from the recipient, it could
> under-estimate the time, perhaps seriously, and treat the realistic download
> time as a stall and kill the connection.

servers don't know the time estimates from the client...

> Mozilla/Seamonkey/Firefox should also compare the filesize on the server to
> what's been received to indicate whether a download is complete or failed.

That's what this bug is about, as I understand it.

*** Bug 329895 has been marked as a duplicate of this bug. ***
(In reply to comment #39)
> Although this is a little off of subject, are there any known plans to update
> Firefox's Download Manager completely?  It sure could use it, I am having this
> problem, too, and many others.

According to the Firefox 2 wiki pages ( http://wiki.mozilla.org/Firefox2/Features ), there was a "pluggable download manager" scheduled for that release. But it now has a line through it, so I assume it's been postponed.

Could we get at least minimal checking in for the Firefox 2 branch (i.e. something based on the current patch), and worry about a better fix later on the trunk? I assume 1.8.1 is the branch leading to Fx 2?
Flags: blocking1.8.1?
*** Bug 330794 has been marked as a duplicate of this bug. ***
A new twist: using Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.9a1) Gecko/20060309 SeaMonkey/1.5a, I had an apparently unsuccessful download that unzipped OK. This Seamonkey's download manager reported the filesize to be 11054Kb, called the download done at 11052Kb, and it unzipped ok anyway with no errors. Why would there be a 2K discrepancy between original filesize and bytes downloaded?
The 2K discrepancy is probably due to the dl mgr missing the final notification. This patch should address that issue as well.
Attachment #210862 - Flags: review?(darin)
note comment 37. I'm not sure this is the best way to fix this.
Comment on attachment 210862 [details] [diff] [review]
patch v0

Delegating first review to cbiesinger.
Attachment #210862 - Flags: review?(darin) → review?(cbiesinger)
Comment on attachment 210862 [details] [diff] [review]
patch v0

ok then, review-. imo this should be done in necko or exthandler, which should report an error to the frontend.
Attachment #210862 - Flags: review?(cbiesinger) → review-
Not going to block 1.8.1 for this, but "we'll" happily consider a trunk patch once it has baked.
Flags: blocking1.8.1? → blocking1.8.1-
Priority: -- → P3
Target Milestone: --- → mozilla1.9alpha
I've had this occur frequently when pausing, then disconnecting my dial-up. Upon reconnecting, pressing the resume button causes firefox to mark the download as complete, with no option other than to delete the truncated file and start again. FF 1.5.0.7 on WinXP SP2.
I still have this problem using:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9

I almost gave up on FF downloader for big/long downloads. I find wget more reliable when dealing with difficult downloads that keep failing.
wget detects broken/failed/incomplete downloads and has no problem with resuming those.

...something I would have expected from the FF download manager since v1.0, but it's still not there. It's "stupid" for this super browser to have such weak download-managing code. :(

Some of the Firefox plug-in download managers I've tested also have this problem (I guess because all of them are simply using the same code built into Firefox, the one with this problem).

I found another open-source app built on the Mozilla code base with this same problem of incompletely downloaded files marked as complete: Participatory Culture's Democracy RSS video player.
I stopped using it for this same reason. Lots of incomplete video files downloaded as if they were complete (and then, unplayable, of course).

I'm thinking of upgrading to FF 2.0 to see if this got solved, although I'm afraid this bug's still unfixed :/
(In reply to comment #53)
> I'm thinking of upgrading to FF 2.0 to see if this got solved, although I'm
> afraid this bug's still unfixed :/

The bug still exists in Fx 2.0, or it'd be marked as closed.

We seem to have a better download manager... in the Software Update code. Could that be harnessed for this?
I'm another surprise victim of DM confusion.

I was downloading three large files simultaneously, and decided to suspend two of them so that I'd get one completed sooner. When it finished and I went to resume the other two, they both downloaded a bit more successfully, but then stopped and were marked complete. The DM knew how many bytes to expect, and the result was well short, yet it didn't note the error.

So from my perspective:

1. If the final size doesn't match the expected size, this should not be reported by default as a successful transfer.
2. If DM knows or suspects that it cannot successfully suspend and resume the transfer, then either the Pause option should be removed, be greyed out, or require the user to confirm intent despite the risk.
3. The DM should learn how to resume a download more reliably.
I've encountered this problem numerous times using Firefox 2 on a WinXP machine. The download seems to complete, but when you try to install the application you just downloaded, it fails because the files are incomplete.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070222 SeaMonkey/1.1.1

I just updated from Seamonkey 1.0.7 (I think).  I do NOT use the Download Manager.

With 1.1.1, if I download a file, the normal dialog box appears with "Close", "Launch File", "Show File Location", and the percent complete.

However, if I try to download it again it "instantly" is "downloaded", which is impossible for the huge file (10MB). The file is not actually being downloaded but instead is coming from some hidden cache. This is the case regardless of whether the "Save As" file name is kept the same and the file is overwritten, the file was removed just prior, or the name is changed.

The previous version did NOT react this way. If one repeated a download, it would actually download the file.

This is a problem particularly if the source file one is trying to download has the same name as, but a different location from, the previous one.

How does one get a fresh download with 1.1.1 now? Obviously something has changed.

thanks,
rich
Hi Rich,

Guessing here, but did you ensure that there wasn't a hidden/temp file in your download directory, and also that the browser cache was emptied?
Several things- 

1. I have previously downloaded a zillion files with several years' worth of Mozilla suite versions and then a previous version of Seamonkey. I have never had to clear the cache and I have never used the Download Manager. I have never had this problem before. I have at times cleared the cache for other reasons, though.

2. I uninstalled the previous version of Seamonkey prior to installing this Seamonkey 1.1.1.  Then I almost immediately went off to do these other downloads that exposed this problem.

3. I'm not sure what you mean by a hidden or temp file in this circumstance.  I was saving an exe for zone alarm free.  Also something I have done a lot without this kind of problem.  This filename was ordinary and the directory I was saving it to was nothing special.  This dir was something like c:\d\software\updates\mozilla.  Nowhere near places that may have temp files or even program-related files.

4. I have my Windows XP Pro set, as I have had all of my **** Windows systems since the beginning of time, to display all files and all extensions. I don't see any temp files in these dirs.

5. Are you saying that every time after I download a file I have to clear the cache? If so, there is no question that this is a bug.

This happened just after installing 1.1.1 following an uninstall of the previous version (maybe 1.0.7, I can't remember). Stands to reason that the new version is the culprit; it is what changed. I have used this procedure dozens of times with Mozilla and now a few times with Seamonkey. This is the first time I have had this problem. I have not exhaustively tested the combinations that are possible with the uninstall/install followed by repeated downloading, so I can't tell the cause nor the entire scope.

thanks,
rich
My suggestions were intended to try to find a work-around for you until this bug was fixed, and did not imply that this isn't a real bug, or that (even if they did work) my suggestions were a satisfactory long-term solution.

I have had the problem in the past where the browser continued to grab a truncated cached download instead of starting afresh, but it's not something that happened very often... usually only after certain bad/truncated downloads. From memory, emptying the browser cache got around the problem in those rare cases, but it's been a while since that happened and it's not something I can reproduce at will to test again. As for the "temp file" possibility, if the browser hasn't placed a temporary download file in the same location as the intended final download, then that's clearly not an issue.
Those instant downloads might be coming not from a temp file but from a buffer in RAM. I've seen this too, I think, though not recently. Haven't tested 1.1x much.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

This error still occurs. It just happened to me when a "network unplugged" error occurred on either my NIC or my router. As soon as Windows displayed this error in the system tray, my 70MB download (which was only at 20MB so far) immediately completed with no option to retry.

It is really sad that this has not been fixed since 2004. Please upgrade this to critical. Not sure why you developers think it is okay for a download manager to falsely claim that a download has completed successfully.

There are later bug reports which seem to be duplicates of this one, such as bug 253328 and bug 262179.
It seems like an old-fashioned flow chart ought to be able to describe the steps to perform in a download, and then one could make sure that the ifs, thens, elses and case structures to perform the steps are correct, so there's no dropthrough to the end of the function with a false report of a successful download.

I saw this bug once in the past week, though I forget whether it was a 1.5a, 1.1x or 1.0x version.

-> reassign to default owner
Assignee: darin.moz → nobody
This is still an issue on Firefox trunk. The Download Manager often reports files as DONE instead of FAILED in cases like:
- corrupt download
- interrupted download
- invalid HTTP auth credentials
- nonexistent remote file

IMHO this should get on the radar before the 3.0 release.
(In reply to comment #66)
> - interrupted download

The suggested patch should at least take care of this particular issue (the most common one that I get on the few occasions when I still try to use the Firefox downloader).

(patch v0   (5.39 KB, patch) 2006-02-06 04:45 PDT, Son Le)
[...]
-      mPercentComplete = 100;
+      // If file sizes don't match, mark the download as failed.
+      if (LL_NE(mCurrBytes, mMaxBytes))
+        mDownloadState = FAILED;
[...]


Also, when dealing with an "unknown size" file download (mainly FTP?), couldn't it try to LIST the file and parse the file size? (erm... if that's how it's done currently, forget that I ever wrote this paragraph, please ;)
(Note to self: take a closer look at nsDownloadManager.cpp)
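Stripped of the Mozilla macros, the idea in patch v0 can be restated as a small pure function. This is a sketch only: the names mirror the `mCurrBytes`/`mMaxBytes` members quoted above, but it is not the real nsDownloadManager code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the patch-v0 logic, with LL_NE spelled out as !=.
// Returns the final percent to report and whether the download failed.
struct FinalProgress {
    int  percentComplete;
    bool failed;
};

FinalProgress onFinalNotification(int64_t currBytes, int64_t maxBytes) {
    if (maxBytes > 0 && currBytes != maxBytes) {
        // Sizes disagree: report the real progress and mark the download
        // FAILED instead of forcing it to 100%/complete.
        return { static_cast<int>((100 * currBytes) / maxBytes), true };
    }
    return { 100, false };   // sizes match (or size unknown): complete
}
```

The question comments 37 and 49 raise is not about this check itself but about where it lives: done in necko (or exthandler/webbrowserpersist), every consumer would see the error, not just the Download Manager frontend.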

Does anyone know if this patch is safe to merge into the 1.8 branch? Going to review my local tree; the initial dry run failed partially, but I'm experiencing similar issues when doing multiple (>2) downloads from the same server with potentially throttled CPUs, etc. Thanks.
I can confirm that manually merging this patch into my 1.8 branch builds gives a much better experience and tags downloads much better than the previous "Completed" status that was stamped on failed files.
Attached patch updated 1.8 Branch patch (obsolete) (deleted) — Splinter Review
Attachment #284191 - Flags: review?(cbiesinger)
Comment on attachment 284191 [details] [diff] [review]
updated 1.8 Branch patch

I don't own download manager and already said that I don't like this approach (comment 37/comment 49)
Attachment #284191 - Flags: review?(cbiesinger)
I'm getting this on Camino, reported as Bug 420318. At the very least the download manager needs to provide both the expected size and the final size, regardless of the cause of the discrepancy.

If the file size is unknown, it obviously can't do anything about the problem, but if the size is known, it should report the discrepancy.
Firefox 3b5 still has this issue.

I'm using it behind a (very) slow proxy, and often the file is displayed as finished with only a few bytes downloaded, even though the download manager displayed the correct size when estimating the download duration.

I don't see any difficulty in adding a warning after comparing the two sizes.
Firefox 3 has this bug.

Is it so hard to fix? Just ensure:

expected_file_size == actual_file_size

If they are not equal, then mark the download as broken.
You had about convinced me that it was my proxy that was at fault, but I've had it happen when there was no proxy involved.

I'd also like to note that it claims the download was the *expected* size, not the *actual* size.

If the expected size doesn't match the actual size, you should get a warning, *and* you should get an option to continue the download.

I'm pondering some kind of extension to use curl or wget instead of the browser's own downloader.
I used Firefox 3.0 (final) to download files from Gmail.

When I started the download, the expected file size was ~15MB. After a while I found this file in the Download Manager with a size of ~7MB. No warnings, no messages, no difference *at all* from a correct download (and no options to retry or continue).

Isn't this exactly this bug?
Experienced this bug today running SM 1.1.13 for Linux/i586. The download was a SM 2.0a2pre nightly, 13Mb, reported complete at the indicated filesize, actual file was a partial of 9Mb.
Last I knew, SeaMonkey was not using the toolkit download manager UI. As a result, I don't think you want this bug.
I just noticed: This bug is still there with Firefox 3.0.5 (WinXP).
It's true: the bug persists in the Portuguese Firefox 3.0.5 on Portuguese Windows Vista.
The Download Manager doesn't detect the interruption of the download and shows it as COMPLETED... :-(

BUT... if you DELETE the saved file (target), keep the file's history in the Download Manager, and open the same download address (URL) in the same session, the file is resumed(?) and the download CONTINUES from the last saved part...

Why can't the FF Download Manager detect the break and do these steps itself? :-(

Thanks.
Morais
Needs fixing, and it seems like necko is the right place for the logic. Putting this high on my list of bugs.
Assignee: nobody → jduell.mcbugs
Status: NEW → ASSIGNED
Target Milestone: mozilla1.9alpha1 → ---
Any news on fixing this? Bugs tend to be forgotten...
It's still on my list.
Not sure if this should come under the same bug, but I've found that some Netgear routers have a 'Trend security' feature which, when enabled, produces the same problem in Firefox. Other browsers don't seem to have a problem when this is enabled though.
I recommend Sun Download Manager (free at java.sun.com) as a workaround.
Well, I signed up to put in an RFE to allow restarting. This is a bit off-topic, but I feel that it is part of this issue.

Looking through issues and other stuff over the last 24 hours has shown me that the Download Manager should be able to detect a bad/incomplete download and resume it.  The file size is stored in the sqlite database according to the schema.

To add to this issue: with IP, copyrights and licensing, it can be a problem to download files from various sources without going through a proxy. The issue is that some programs or sources are larger than the proxy will allow, or download so slowly that the proxy drops the connection. The result is having to start again.

I did a quick search and found that the Download Manager can pause and restart through a business's proxy server as long as it gets the correct information for:

entityID  
TotalEntitySize
mResuming
ContentLength()


in

nsIDownloadManager, nsIDownload, nsIDownloadProgressListener nsIDownloadManagerUI, nsIResumableChannel

As the ability to pause between sessions is there, the ability to resume a failed download should be possible using the same idea: store the size, file name, blocks completed and other information in the download schema.

The only issue I see is the referrer reference.  If using a proxy, the referrer link could be different.

The added ability to save the *.part file and then restart would be nice.

The Download manager would have to not close out the temporary files when they are not complete and flag them in the UI as incomplete.

I have just started to look at the code for Linux but I am not a programmer (not with much experience anyway) and I have never worked with C++ though I am trying to learn.

In general, the incomplete downloads should be marked as such, if the size is not met.  I have seen downloads that are listed at 90Meg and when 5Meg is downloaded, the proxy shuts the link and the download manager marks the file as complete.  This is on Linux with 3.5.8 from last night.

Fixing the resume for corrupted or stopped downloads will, at least as I see it, fix this bug as well.  I am going to examine the source code further.
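Resuming a partial *.part file over HTTP boils down to a byte-range request, which is the mechanism nsIResumableChannel relies on. A rough sketch in Python, under the assumption that the server advertises byte-range support (the URL and file paths would be placeholders):

```python
import os
import urllib.request

def range_header(existing_bytes):
    """Ask the server to skip the bytes we already have on disk."""
    return {"Range": "bytes=%d-" % existing_bytes} if existing_bytes else {}

def resume_download(url, target):
    """Append the missing tail of `url` to the partial file at `target`."""
    start = os.path.getsize(target) if os.path.exists(target) else 0
    req = urllib.request.Request(url, headers=range_header(start))
    with urllib.request.urlopen(req) as resp, open(target, "ab") as out:
        # 206 Partial Content means the Range header was honoured; a plain
        # 200 means the server restarted from byte 0, so discard our part.
        if start and resp.status != 206:
            out.truncate(0)
        while True:
            chunk = resp.read(65536)
            if not chunk:
                break
            out.write(chunk)
```

This is also where the proxy concern above bites: if the proxy rewrites the source URL between sessions, the stored URL no longer matches and the range request goes to the wrong place.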
Hi Robin, 
this is great, thank you for your interest. 
I did not catch the only thing - referrer problem:
"The only issue I see is the referrer reference. If using a proxy, the referrer link could be different."

Regards,
Milan
I am not 100% sure, but if my knowledge doesn't fail me, the referrer would be the link for the file. When using some web proxies, the link for the file changes on each access to the file, due to the way that some public proxies function.

I have not had time to check the schema and test this, so I am hoping my aging brain is correct.

referrer TEXT 	The referrer for the download. 

Just looking at the schema and I see this also.

source 	TEXT 	The source URI for the download. 

Now, I am talking about public proxies, similar to what you would find here:

  http://www.publicproxyservers.com/

It may have to be an option to resume a proxied download but I will have to think about it some more.
Okay.  This is the reference that I have started with.

https://developer.mozilla.org/en/The_Download_Manager_schema

This refers to downloads.sqlite in the Firefox profile directory. The data stored in this file is as per the schema. I have compared it with actual downloads.

I am going to test with proxies on the same file later.

The database does what I expected. The "referrer" is the URL of the page that you click to download. The "source" is where the file came from.

The "source" is going to be the problem when working through proxies, as the addresses for the "source" can, and in many cases will, change. We need some way to select the *.part file that goes with the source we want; the "target" field can help confirm the correct file.

I will post more after I have done some proxy tests.
Why would proxies affect what the source or the referrer is?
First, I am talking about public proxies.

Some proxies will change the URL for the source site on each access.  It is part of the encoding that the proxy uses.  I was doing some testing over the last couple of nights and I would enter a link and the source was different each time it accessed the download site.

I didn't check the referrer that much in detail yet.  Maybe tonight.

In my tests over the last couple of nights, some proxies didn't even keep the source file name. One I used last night changed the file name to browse.php, and another to init.php.

I was thinking that if the download manager checked the reported file size when the download is terminated, and then saved the file as a *.cor file indicating it is corrupted, it could allow a way to access the downloaded part. I know that the file size is transmitted with many downloads, but if the file is not complete, the download manager doesn't indicate it as incomplete. This would fix the original complaint about not marking partial downloads as incomplete.

I tried to look through the code last night, but in the short time I had, I didn't find the part that saves the *.part file during the download.
I just found out this bug I reported back in 2003 is still alive and well. I suppose the fact that it becomes rarer as the user's internet connection improves (over these 7 years I've gone from 56k up to 20mbps fiber) will make it very unlikely to get fixed.
(In reply to comment #100)
> I just found out this bug I've also reported in 2003 is still alive and well, I
> supose the fact that it becomes rarer as the user internet connection improves
> (over these 7 years i've gone from 56k up to 20mbps fiber) will make it very
> unlikely to get fixed.

There will always be those with unreliable connections, even if they are fast. Plus, as connections get faster, people download bigger files.

I'd think that such an annoying bug being nearly seven years old would make fixing it more of a priority. Can't be good for the image.
(In reply to comment #101)
> There will always be those with unreliable connections, even if they are fast.
> Plus, as connections get faster, people download bigger files.

The last time I saw this bug happen was a couple of days ago, when a friend of mine was downloading some drivers using the Firefox 4.0 nightly over a 3G link.

The installer would just fail with a "not valid archive" error, but of course we both knew what was going on, so we just used a download manager to finish the download properly. Most users would just re-download the 100mb+ file (of which half was already on the hdd) without even realizing what exactly happened.

Don't get me wrong, I would love to see this bug fixed. When I had very poor internet service this bug would happen on 90% of downloads, but after 7 years I'm not holding my breath anymore. Every other Firefox user I know and I have just learned to live with it, by keeping a download manager around and not trusting the Firefox download manager.
I use a mobile connection. This topic is still relevant after seven years.
I can not understand how this behavior can be considered acceptable. If an operation as important as downloading fails, that HAS to be reported to the user.

And it's not hard to fix. Firefox knows the size of the download. It knows it got less than the anticipated size. Even if a proxy hides network errors, it has all the information it needs to flag the download as incomplete.
(In reply to comment #104)
> And it's not hard to fix. Firefox knows the size of the download. It knows it
> got less than the anticipated size. Even if a proxy hides network errors, it
> has all the information it needs to flag the download as incomplete.

Exactly, like suggested in Comment 67
https://bugzilla.mozilla.org/show_bug.cgi?id=237623#c67

(patch v0   (5.39 KB, patch) 2006-02-06 04:45 PDT, Son Le)
[...]
-      mPercentComplete = 100;
+      // If file sizes don't match, mark the download as failed.
+      if (LL_NE(mCurrBytes, mMaxBytes))
+        mDownloadState = FAILED;
[...]

and even for FTP downloads of unknown size, we could pre-parse a LIST to get the size.
btw, this bug and JavaScript speed were 2 of the main reasons I changed to Chrome, a long time ago.

...and I'm still not sure now, but I suspect that even Chrome has an issue like this. It's harder to check with faster broadband access now.
Not sure when I get to this, so cc-ing some other likely suspects.  It would be great to fix this in FF4 timeframe if possible.

Folks: please don't add any more comments unless they've got technical details that will help with fixing this. Otherwise this bug report just gets harder to read.
Assignee: jduell.mcbugs → nobody
This annoying behavior (major bug) has existed from Firebird 0.7 through Firefox 1, 2, 3, 4, 5, 6, 7 and 8 (the version I'm using now), and today I was testing Firefox Aurora 11.0a2. Guess what! Firefox still cannot download files properly: it marks interrupted downloads as "finished" and offers no notification or resume option. Today I tried three times to download Thunderbird from the Mozilla website with Aurora 11. It downloaded 33 MB, then 19 MB, then 1.8 MB, and always ended as "finished", giving me only corrupted files. It's a shame that I have to use a special download program for downloading files instead of Firefox. BTW: Today I was using a slow mobile modem (25 kB/s). Maybe this bug is not noticeable on broadband. Please fix this for good!
Whiteboard: [lame-network]
When a file is downloaded incompletely, there is no warning from Firefox.
Firefox should issue a warning whenever the actual size of the downloaded file is smaller than the expected file size (regardless of the reason for the download failure).
(In reply to Jason Duell (:jduell) from comment #107)
> Folks: please don't add any more comments unless they've got technical
> details that will help with fixing this. Otherwise this bug report just gets
> harder to read.

Apologies for adding to the bugspam, but I wanted to point out that the Downloads Panel extension displays a details panel similar to Opera's, allowing you to see at a glance whether a download is complete. So it can serve as a workaround to alleviate this problem for the time being.
https://addons.mozilla.org/firefox/addon/download-panel/
Status: ASSIGNED → NEW
FYI, I came across this article giving workarounds for the problem that sometimes work.
http://maggiesfarm.anotherdotcom.com/archives/21021-Docs-Computin-Tips-Resuming-broken-uploads-downloads.html#extended
This issue has just broken my download of quite a large file from one of those slow file-sharing services with time restrictions, two times in a row, not to mention many other times before. I thought my issue was an exception, but now (as I found this bug report) I'm really surprised this bug is well known and, moreover, more than 8 years old: it has been present since the first version. People, this bug makes Firefox almost useless for downloading large files. IMO it's a major issue which needs to be fixed. I can't even just re-download the file without that time penalty; Firefox leaves me without any options, simply marking a 0-byte file as downloaded.
still present in 18.0.2.
I believe it's present in 22.0a1 too.
I've had this happen with s3.amazonaws.com. 22.0a1 (2013-03-25)
It happened for 2 out of 5 files today (with s3.amazonaws.com).

Firefox showed me the declared file size before starting to save the file. This means that FF has everything it needs to detect an incomplete download operation — it should just compare two numbers.

Please raise importance of this bug!
Simply copy the download URL into a separate download program. That is what I have done for ages. Firefox is unable to reliably download files, and for nine years no-one at Mozilla has cared.
The upgrade that I suggested is very easy to implement. I don't see a reason why it won't be done in the very near future.

I repeat it here:

Firefox should issue a warning whenever the actual size of the downloaded file is smaller than the expected file size (regardless of the reason for the download failure).

It will not solve all the problems, but at least the user will be aware of partial downloads.
Somebody with good writing skills, please write an article in a well-known computer publication about Firefox's inability to detect incomplete, failed downloads. This will probably attract the attention of those who are responsible for FF's development.

And God's sake, raise the importance of this bug report!
Whiteboard: [lame-network] → [please read and respect Comment 107] [lame-network]
:mak, what do you think about fixing this in the download manager?

That has been rejected in the past in favor of a necko approach, but I'm not convinced that's the right thing to do. Generic HTTP consumers have to be a little more flexible in order to be backwards compatible, but I wouldn't be surprised if the download manager could successfully enforce the content-length, as long as there was one. (Making sure chunked scenarios work OK is important, and I wouldn't enforce the trailing 0-chunk requirement; that really does come up missing all the time.)
Flags: needinfo?(mak77)
Now, writing an article is a great idea.

The idea of copying the link into a different program is not a great suggestion for many people. Heck, there are people that have a hard time just understanding that the file is actually downloaded to their computer. I have seen cases of the same file being downloaded multiple times on the same computer.

If FF is supposed to be a great browser, then it has to be complete.  A download manager that doesn't manage is just useless.
(In reply to Patrick McManus [:mcmanus] from comment #133)
> :mak, what do you think about fixing this in the download manager?

I'm all in for reopening the discussion and evaluating fixes at the DM level, and based on discussions I had with Paolo (currently the owner of the DM module) I think this is globally acknowledged.
It's pretty clear the "wait for magic Necko fixes" strategy didn't pay off, considering the age of this bug.

Now, it would be good to start by identifying all of the possible situations where this may happen, and for each one evaluate what the strategy could be to attack it at the necko and DM levels, taking the best path for each.

At this time Paolo is working on a new implementation of the Downloads API and may already have a plan to attack this in the new API, or we may start planning that.
Flags: needinfo?(mak77) → needinfo?(paolo.mozmail)
(In reply to Marco Bonardo [:mak] from comment #135)
> Now, would be good to start by identifying all of the possible situations
> where this may happen, and for each evaluate what could be the strategy to
> attack it at a necko and a DM level, and taking the best path for each.

Yes, I agree this is how we should proceed. I had already filed bug 847184 with
the goal of finding at least one such possible situation, leveraging the test
suite of the new Downloads API.

While in these days I'm focused on other tasks, it's my goal for the imminent
Networking Work Week to find as many as possible of those situations on various
platforms, and understand the best way to resolve them.
Flags: needinfo?(paolo.mozmail)
Please make sure to cover the case described in bug 856282.
@XtC4UaLL: Normally I'd agree with "reading and respecting" such a comment as comment #107; however, the comment was made in October 2010. Granted, it said they didn't know when they'd get around to it, but... 2½ years, closing in fast on 3, without any word from jduell? I know he said he put it high on his list of bugs - in January 2009, 4½ years ago - but AFAIK, no progress has been made since then? (I'd love to be proven wrong.) I can definitely see why people find it difficult to respect comment 107.
I think I was recently hit by this bug as well. I have a WebGL application that loads relatively large static models using Javascript and XMLHttpRequests. If the loading process is interrupted by reloading the page quickly enough, subsequent page reloads might get only partial data for the interrupted file from the cache. 

I have seen this happen to included JavaScript files as well, so it's clearly a generic interrupted-download issue.

It would be really nice to see this bug fixed before its 10th anniversary.
The download manager's timeout mechanism causes many incomplete downloads, and should be improved.

I was trying to download a 23MB TIFF image from the Library of Congress. The download always stopped in the middle, probably due to lag on the server's side.

I used the following manual method/algorithm to force Firefox to download the image for me:

1. The download of a 23MB image starts.

2. I watch the download speed to see if it's changing - an indication that the download is still in progress. There is no other way in Firefox to know.

3. Whenever I see that the download hangs, I wait ten seconds and then click "pause". (If I fail to click "pause" fast enough, Firefox will terminate the entire download.)

4. I wait a while. Let's say ten seconds.

5. I click "resume".

6. I wait a maximum of ten seconds to see if the download continues.

   If it does not - I go back to step 3 (click pause and wait).
   If it does - I go back to step 2.

For the given image I had to repeat this "algorithm" 5-6 times before the download was completed.

Why can't Firefox do automatically what I am forced to do manually?

Also, maybe the timeout is too short before Firefox decides to terminate a download.

This bug is very easy to fix - why does nobody do it?
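The manual procedure above is essentially a stall watchdog around pause/resume. As a toy sketch (the `transfer` object with `done`/`bytes_received`/`pause`/`resume` is hypothetical; no such Firefox API is implied):

```python
import time

STALL_TIMEOUT = 10   # seconds without progress before intervening
MAX_RETRIES = 6      # the comment above needed 5-6 rounds

def babysit(transfer):
    """Pause and resume a transfer whenever it stops making progress.

    Returns True when the transfer finishes and False when we give up,
    so a failure can at least be reported instead of shown as complete.
    """
    retries = 0
    last = transfer.bytes_received()
    while not transfer.done():
        time.sleep(STALL_TIMEOUT)
        now = transfer.bytes_received()
        if now == last:          # stalled: do the pause/resume dance
            if retries == MAX_RETRIES:
                return False
            transfer.pause()
            time.sleep(STALL_TIMEOUT)
            transfer.resume()
            retries += 1
        else:
            retries = 0
        last = now
    return True
```

The key design point is the False return value: even when the dance fails, the download ends up marked as failed rather than "Completed".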
Somebody needs to contact Boris Zbarsky (https://air.mozilla.org/mozilla-distinguished-engineer-boris-zbarsky/) to get this fixed. When he gets involved, things get done.
If his involvement doesn't help fix this nearly 10-year-old issue, probably nothing will. Then it should just be closed and accepted as a Mozilla "feature".
Wow. When I saw that this hasn't been fixed for almost 10 years, I lost all hope for a fix of another bug, which I mentioned here:

https://bugzilla.mozilla.org/show_bug.cgi?id=834870

Great job, Mozilla.
I am curious about streamed HTTP downloads... With FTP I think you automatically know the file size, I would hope; I haven't examined the protocol lately. But with HTTP several different kinds of things can happen, I should guess, two of which are:
- streamed HTTP with a filesize
- streamed HTTP without a filesize
It's the latter which can throw things off. A dropped connection (router, switch on a hop, whatever) will kill the download, and the internet isn't known for being reliable for HTTP downloads. Some companies which handle large file hosting know that, which is why they offer download managers for downloads and software products.

Google Chrome I think has this down.
So does IE.
Safari? o?

While I don't know everything that causes a TCP connection to die in the middle (too many dropped packets?), I can only guess exactly HOW to handle the streamed stuff.

Am I stating the problem correctly? I don't have a lot of knowledge in this area, because I don't know much about the underpinnings of the browser or the HTTP protocol; I am going by what I see sometimes.
Some PHP files seem to do byte streaming somehow. Perhaps that should be looked up in the PHP manual (it has helpful comments on how to do this and that), and perhaps such an implementation can be shown which causes this.
It's not an issue with the server not sending file sizes. (I would consider that a buggy server; even though the spec doesn't require it, there's no way to ensure the download succeeded without.) You can watch in the download manager and it will show the correct size, but when the connection drops, the displayed size changes to the amount it's managed to download so far, and the download is considered successful.

Perhaps this is an intentional workaround for servers that send the wrong size, but there are much better ways to deal with that. (If the download is interrupted at the same place every time, after a couple of attempts, *then* you assume the server is giving the wrong size. Or you just keep trying endlessly, in the hope that the server admin notices and fixes their broken script.)
Jim, you make an interesting point about download managers. In my experience those are rarely trustworthy and are often a vector for malware to get into your system, but unwary people use them because they're more reliable. You might say that by making such an important feature unreliable in one of the most popular browsers for *over nine years*, Mozilla is just helping malware authors...
(In reply to Jim Michaels from comment #144)
> [...]

Well, the problem is not that downloading the file fails; that is of course understandable.
The problem is showing Complete for a failed download, which is really annoying and causes many problems.

I never download large files with Firefox; this happens even with 2-3MB files, and the file size is known. I mean, Firefox shows that the download file size is 5MB, but it fails at 3MB and is shown as Completed!
There is no way to be sure unless we check the size ourselves.
For example, in the middle of downloading an mp4 clip it fails but is shown as completed; I only realise it when I'm watching the clip and it stops in the middle.
Re. download managers (comments 144 and 146)
There are some good ones out there, and others to watch out for.
Sun's Java site had a good one which went away when they became part of Oracle.  I was using that as a workaround when this bug was still young. :)
One to avoid is FileZilla, which is spyware (unless you have a firewall with fine enough controls that you can block FZ from phoning home).

All download managers do seem to share one drawback, which I hope a fix to this bug will be able to avoid -- namely, they get lost if the download site is in the habit of changing filenames every few minutes in order to force you to go through a front-end web page to retrieve files from them.  The Files sections of groups on both Google and Yahoo do this, so they are not a good place to upload your material if an alternative is available.
The solution to this bug is simply to work around the broken Firefox download manager. I use DownThemAll, which has its own built-in download manager, and for videos FlashGot, which sends the download command directly to DownThemAll or any other specified download program on your PC. Since I've done that, I don't get broken files anymore.
(In reply to Rotis from comment #149)
> Solution to this bug is simply working around this broken Firefox download
> manager. I use Down Them All, which has an own inbuilt download manager, and
> for videos FlashGot, which sents the download command directly to Down Them
> All or any other specified download program, on your PC. Since I do that I
> don't have anymore broken files.

I tried this add-on tool, but when I download a video from YouTube, I get it without any file extension (the .flv or .mp4 has disappeared). Also renaming file names in the tool is difficult or not possible. So for now I will probably stick to the old and buggy Firefox download manager.
Very often when I download videos from a website such as Youtube, the download speed is very low (I am not sure if the problem is in the website or in Firefox).

I found out that in many cases you can greatly increase the download speed simply by clicking "pause" and then "resume".
A third-party download manager isn't a "solution". It's an unacceptable workaround.

Also, this should be a higher priority. It's a data-loss bug. It should be a release blocker.
Since the download manager is a communication tool, it should be implemented as one.

In communication between two points A and B, you must verify that the data is valid. This is basic.

1. The download manager should monitor the download speed all the time. Whenever the download hangs or is too slow, follow the steps that I suggested in comments 140 and 151.

2. Compare file sizes, and a CRC, before and after.

3. The tool should report to the user on the status of his/her download in terms of completeness.

I am sure that there are many other parameters that can be checked and monitored. I am not a communication expert, but I guess that all these factors are already known to the Firefox developers, so what are we waiting for?
Question: does HTTP (maybe FTP does) implement a CRC hash/checksum on the data and include it in the packets as part of the protocol?
Does HTTP or FTP over IPv6 do this; which, if any?
I am almost sure you get at least a filesize with FTP, but I am only guessing since I don't have the protocol in front of me; there's probably an RFC for those...
Well, here is proftpd's list of FTP-related RFCs: http://www.proftpd.org/docs/rfc.html
It's clearly possible to detect broken downloads, since other browsers don't have the same problem.
It's enough to rely on the file length as reported by the HTTP or FTP server (and seen by the user). It isn't always present, but the case of its absence may just be ignored (+1 for comment #145).
As for a checksum, it is absent from HTTP and FTP headers in the common case. I'm not sure about possible extensions, but checksum verification isn't the point here, I think; the TCP protocol does some of this work.
Recently I have been getting a lot of these errors:

"file.mp4 could not be saved, because the source file could not be read. Try again later, or contact the server administrator."

A few remarks:

1. It is not true that the file could not be saved. The file was saved to disk, but was not complete (0.98GB instead of 1.2GB, in this case).

2. "The source file could not be read" - maybe it is a temporary communication problem. Did Firefox try to 'pause' and 'resume', as I suggested in previous comments? How long did Firefox try to communicate with the server before giving up on the download of a 1.2GB file?

3. "contact the server administrator." - is that a joke?

4. Error messages should NOT appear in a new window. I hate it when there are many windows open on my computer. Usually I don't even notice that window, because it hides behind other windows. This message belongs in the main window, or, even better, in the download manager window.
(In reply to travneff from comment #156)
> It's enough to rely on the file length as reported by HTTP or FTP server
> (and seen by user). It doesn't present always, but the case of absence may
> be just ignored (+1 for comment #145).
> As for checksum, it is absent in HTTP and FTP headers for a common case. Not
> sure for a possible extensions, but the checksum verification isn't a point
> here, I think. TCP protocol does some of this work.

MD5 and CRC are great, but so far not even the file size is checked when Firefox decides that a download "has completed". Firefox should at least compare file sizes before and after the download, as said in many previous comments.

In other words: at least do something.
Today I tried to download a 500MB file.

After half of the file was downloaded, the download manager stopped progressing.

Usually, in such cases, I right-click "pause" and "resume", and that solves the problem.

However, sometimes, including today, the "pause" option is inactive (light gray color). I guess it was a moody day for the DM.

I would like to use this opportunity to thank Firefox download-manager developers for not fixing this 2004 bug yet.
I am still trying to download that 500MB file. A long night ahead.

In Trial #2, the download manager stopped after 94MB, but the indication in the DM showed that the entire 500MB was there. In other words: the DM thinks that the download has completed, but actually it hasn't.

I don't understand how this bug can occur - can't the DM simply check the file size on the local disk?

In Trial #3, the DM stopped after 70MB, saying that the "source file cannot be read". Wouldn't it at least be polite of the DM to let me decide whether I want to try to continue this download, instead of terminating it early?
When I try to download a large file, Firefox assumes that it is complete when it is only half way through.
This happens with Google Chrome as well, and on other laptops, but it always happens.

You know what works? wget, the terminal download manager from GNU. When something happens to the connection, it just attempts to connect again and continues.

I am pretty sure that if wget can do it, you guys can do it too.
Hmmm. I like cURL better; maybe cURL can do a better job? It handles more protocols.
How can I change the timeout of the download manager?

As I previously wrote, many "lost" downloads can be saved manually using a simple "pause" and "resume", but the user must respond very fast when a download stops, before the download manager decides to terminate it.

Increasing the timeout would give the user more time to respond.

(This change should be temporary, until the DM automatically does this "pause" and "resume" procedure.)
Depends on: 947846
Patrick McManus is the current Necko owner, so we can conclude that the former decision has been reversed. (See comment #133 and bug 947846.)
Changing the component to clarify the new decision.
Component: Networking: HTTP → Downloads Panel
No longer depends on: 947846
Flags: blocking1.8.1-
Product: Core → Firefox
The new Downloads Panel landed long ago. Paolo, any ETA?
Flags: needinfo?(paolo.mozmail)
We've identified some possible actions here based on some research (<https://wiki.mozilla.org/User:P.A./Download_networking_improvements>), but we've not started working on this yet.
Flags: needinfo?(paolo.mozmail)
Whiteboard: [please read and respect Comment 107] [lame-network] → [please read and respect Comment 107] [lame-network][triage]
Assignee: nobody → daniel
Reading through the history of this bug, it isn't easy to fully grasp which single problem the original poster was talking about, but I've started by focusing on the part "Firefox doesn't notice/alert when my HTTP transfer gets cut off".

Firefox doesn't care about cut-off HTTP transfers where Content-Length says N bytes but the connection gets cut off when only Y bytes have been delivered (N > Y). If that's a random binary file being saved, of course it'll be broken. If it is a text/plain or text/html page, the content is of course incomplete.

I have a stupid test server of mine send Content-Length: 17, then send 6 bytes of response-body and then close the connection. Simulating an early broken TCP connection basically.
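Such a truncating server is easy to reproduce outside the tree. A self-contained sketch (not the actual test server used here) that serves Content-Length: 17 with only 6 body bytes, plus a naive client that reads until EOF:

```python
import socket
import threading

def run_truncating_server(ready):
    """Accept one request, send Content-Length: 17 but only 6 body
    bytes, then close -- simulating a prematurely cut-off transfer."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    ready.append(srv.getsockname()[1])  # publish the chosen port
    conn, _ = srv.accept()
    conn.recv(4096)  # read (and ignore) the request
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 17\r\n\r\nhello!")
    conn.close()
    srv.close()

def fetch(port):
    """Naive client: read until EOF, return (promised_length, body)."""
    c = socket.create_connection(("127.0.0.1", port))
    c.sendall(b"GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
    data = b""
    while True:
        chunk = c.recv(4096)
        if not chunk:
            break
        data += chunk
    c.close()
    head, _, body = data.partition(b"\r\n\r\n")
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"content-length:"):
            return int(line.split(b":", 1)[1]), body
    return -1, body
```

The mismatch between the promised length (17) and the bytes actually read (6) is exactly the signal the patch below acts on.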

Chrome silently doesn't show any content for this URL. curl shows the content but then exits with an error that says the transfer was partial.

Firefox mozilla-central silently renders a page with the 6 bytes. If I make it a more binary Content-Type, Firefox will instead silently store the 6 bytes as a complete file without a hint about it being cut off.

Compressed Content-encoding will ruin this scheme but I assume that if we break a compressed stream before the end of the response body, the gzip/deflate decompressor will instead signal an error so it shouldn't be as important.

The history of this bug, and I believe also the source code, seem to imply that Content-Length is unreliable and is therefore ignored like this on purpose. That's however not my experience and I would object to it. For HTTP/1.1 responses without chunked or content encoding, the Content-Length seems to generally be fairly accurate today - as long as it is larger than zero (a decade or so ago, before the wide introduction of large file support everywhere, servers sometimes wrapped the file size counters and went negative over 4GB).

I'm currently experimenting with CloseWithStatus(NS_ERROR_NET_INTERRUPT) from within nsHttpTransaction::Close(), which prevents both rendering of a broken transfer and tricking the user into thinking a binary transfer succeeded, but it is not good enough for several reasons, including: 1 - the wording of the big error message is slightly off and will lead a user to look for problems in the wrong place; I couldn't find an error code that would trigger a correct error message. 2 - generating an error message for this is probably wrong in the first place, so perhaps the error should not be user-visible like this, but it should at least not let users believe a file download was fine.

My minimal experiment patch looks like this:

diff --git a/netwerk/protocol/http/nsHttpTransaction.cpp b/netwerk/protocol/http/nsHttpTransaction.cpp
--- a/netwerk/protocol/http/nsHttpTransaction.cpp
+++ b/netwerk/protocol/http/nsHttpTransaction.cpp
@@ -825,21 +825,41 @@ nsHttpTransaction::SaveNetworkStats(bool
 
 void
 nsHttpTransaction::Close(nsresult reason)
 {
     LOG(("nsHttpTransaction::Close [this=%p reason=%x]\n", this, reason));
 
     MOZ_ASSERT(PR_GetCurrentThread() == gSocketThread);
 
     if (mClosed) {
         LOG(("  already closed\n"));
         return;
     }
 
+    if(mContentLength >= 0 &&
+       (mContentRead != mContentLength)) {
+        /* If Content-Length was given (this response was not chunked
+           encoded), the transaction is done without the full promised content
+           having been delivered */
+        reason = NS_ERROR_NET_INTERRUPT;
+    }
+
     if (mTokenBucketCancel) {
         mTokenBucketCancel->Cancel(reason);
         mTokenBucketCancel = nullptr;
     }
 
     if (mActivityDistributor) {
         // report the reponse is complete if not already reported
         if (!mResponseIsComplete)
(In reply to Daniel Stenberg [:bagder] from comment #167)

> Compressed Content-encoding will ruin this scheme but I assume that if we
> break a compressed stream before the end of the response body, the
> gzip/deflate decompressor will instead signal an error so it shouldn't be as
> important.

But the decompression occurs only later. When I download a file and Firefox does not tell me about an error, I suppose that the file is complete. I may decompress it one month later. Then, it may be too late to re-download the file. The sooner we know about the error, the better.
Can FF just rely on Content-type response header? For example, if it's "text/*" (gzipped content goes here) — things could be left as is. If it's "application/*" — be more strict.

However, it would be better to take into account the display/save method instead. For example, I don't need any additional info for a partially loaded HTML while browsing, but I'd like to have a (new) common "not completed" mark if I save it using download manager.
I'd much rather it retry at least a few times, and only give up if it disconnects at the same place every time, since the most likely reason for it disconnecting is **** wifi, and I've told it to download the file, not download part of it and stop. I hate having to tell things to try again until they manage to succeed.
(In reply to Nicolas Barbulesco from comment #168)

> But the decompression occurs only later. When I download a file and Firefox
> does not tell me about an error, I suppose that the file is complete. I may
> decompress it one month later. Then, it may be too late to re-download the
> file. The sooner we know about the error, the better.

I was talking about the automatic decompression that's done by Firefox itself on compressed content... But I just gave this a quick test, and my crude patch will also detect cut-off content-encoded content when sent with Content-Length, since "mContentRead" there is the raw data size - before being unpacked.
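The claim that the decompressor itself flags truncated streams can be checked outside Firefox. A small sketch using Python's gzip module as a stand-in for the gzip converter:

```python
import gzip
import zlib

def decompress_checked(blob):
    """Return (data, complete). gzip refuses a stream whose
    end-of-stream marker is missing, which is how a cut-off
    Content-Encoding: gzip body gets noticed at decode time."""
    try:
        return gzip.decompress(blob), True
    except (EOFError, OSError, zlib.error):
        return None, False
```

Note this only helps when the content is actually decompressed at download time; a compressed file saved to disk and unpacked a month later gets no such early warning, which is the point made in the reply above.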
daniel - for the moment I'd like to leave the part of the code that deals with unreliable content-lengths in place.. but I think we can take a step forward and throw such an error specifically for >= HTTP/1.1

1] move your check inside the existing if (!mResponseComplete) block.. all you need to do is check (mChunkedDecoder || (mContentLength >= int64_t(0))) && (mHttpVersion >= NS_HTTP_VERSION_1_1) before setting the reason, because the block is only executed if the framing did not find an end-of-file marker.

2] LOG() it

3] I would also put your content length check in the download manager - so it applies to all protocol versions... for files we can afford to be more strict.


sound good?
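Restated as executable pseudocode (a sketch of the rule above, not the actual Necko C++), the proposed channel-level check is:

```python
def is_broken_transfer(version, chunked, content_length, content_read):
    """Sketch of the proposed check: flag a connection close as a broken
    transfer only when the response carried explicit framing (chunked
    encoding or a non-negative Content-Length) that was not satisfied,
    and only for HTTP/1.1 or later. `version` is a (major, minor) tuple.

    As in the review comment above, this runs only when the framing
    did not already find a proper end-of-message marker.
    """
    if version < (1, 1):
        return False  # pre-1.1 framing is too unreliable to act on
    if chunked:
        return True   # chunked stream ended without the 0-length chunk
    return content_length >= 0 and content_read != content_length
```

So a 17-byte promise with 6 bytes delivered is flagged on HTTP/1.1 but deliberately ignored on HTTP/1.0, and a chunked body is flagged whenever the terminating chunk never arrived.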
(In reply to Patrick McManus [:mcmanus] from comment #172)
> 3] I would also put your content length check in the download manager - so
> it applies to all protocol versions... for files we can afford to be more
> strict.

I'd put it only there and also base it for auto retry and resume.  I think it's somewhat dangerous to play with the code in the channel if not strictly necessary.
At the VERY LEAST, it must be possible for the user to know if the content-length doesn't match the length of the downloaded data. That seems to be the kind of minimal functionality that would pass muster, just from a data loss standpoint. It's incomprehensible to me that this bug passed the first review without someone going "holy cow, we need to fix this, people are LOSING DATA" and fixing it. That it's not been fixed after this many years just boggles my mind.
(In reply to Honza Bambas (:mayhemer) from comment #173)
> (In reply to Patrick McManus [:mcmanus] from comment #172)
> > 3] I would also put your content length check in the download manager - so
> > it applies to all protocol versions... for files we can afford to be more
> > strict.
> 
> I'd put it only there and also base it for auto retry and resume.  I think
> it's somewhat dangerous to play with the code in the channel if not strictly
> necessary.

Seconded, any potentially blocking change should be done at the upper level.

We'll need some Telemetry first to gather data about the situation, and that can be added in the channel code as well, provided it's non-blocking.
we don't need telemetry here - this has been paralyzed by inaction way too long. anyhow, it can't do the interesting thing of separating real errors from false positives.

as far as I am concerned both changes can proceed.
I think that what I have just encountered is one of the cases of this bug.

Firefox 26, Mac OS X Mavericks.

Firefox was in the background, downloading a file. 
I was using Safari, and Safari went spinning around. 
My Mac's processor was swamped for a few minutes. 
Then, things calmed down, and everything was responding again. 
Firefox informed me that my download was complete. 
But this is not true. 
My download is ≈ 200 MiB, and by luck I felt that "it went too fast". Indeed, the full file is ≈ 400 MiB. But I could very well not have noticed any problem. 
Typically, the user just thinks the file is complete because Firefox says so. 

This behaviour of Firefox is incorrect. It means data loss.

What has happened is that Firefox went “not responding” during maybe two minutes, and so the download was interrupted.

Here is what I expect instead :

1/ * Ideally * — Don’t interrupt the download. If necessary, just stall the download for a while and then continue where it was. Firefox may not be the only problem here, the server too may need to allow that. 
2/ * If necessary * — Interrupt the download. Inform the user of the problem. And propose the user to resume downloading. From where it was if this is possible, otherwise from the beginning. 
3/ * At least * — Interrupt the download. And inform the user of the problem.

Thank you for improving this.

Nicolas
Blocks: 654577
> as far as I am concerned both changes can proceed.

FWIW I also support making the change in both the channel and the download manager--we can always back out the channel change if we suspect it's causing trouble.  Of course it's Patrick's call, and he's made it, so let's proceed and see how it goes.
No longer blocks: fxdesktopbacklog
Whiteboard: [please read and respect Comment 107] [lame-network][triage] → [please read and respect Comment 107] [lame-network]
Here's an updated take. This code detects broken transfers with HTTP/1.1 if Content-Length is used or if a chunked-encoded transfer didn't end properly (i.e. it was cut off). Updated two test cases due to the changes, and added a new test case for basic testing of this new treatment.
 
This patch also addresses Bug 654577.
Attachment #8389842 - Flags: review?(mcmanus)
Component: Downloads Panel → Networking: HTTP
Product: Firefox → Core
Comment on attachment 8389842 [details] [diff] [review]
0001-Bug-237623-detect-broken-HTTP1.1-transfers.patch

> --- a/netwerk/test/unit/test_gzipped_206.js
> +++ b/netwerk/test/unit/test_gzipped_206.js

I don't think the changes in this file are correct. This test intentionally terminates the connection in the middle and verifies that resume works correctly.
Attachment #210862 - Attachment is obsolete: true
(In reply to Masatoshi Kimura [:emk] from comment #180)
> Comment on attachment 8389842 [details] [diff] [review]
> 0001-Bug-237623-detect-broken-HTTP1.1-transfers.patch
> 
> > --- a/netwerk/test/unit/test_gzipped_206.js
> > +++ b/netwerk/test/unit/test_gzipped_206.js
> 
> I don't think changes in this file are correct. This test intentionally
> terminates the connection in the middle and verify that the resume works
> correctly.

Thanks, but the point of my code changes in this bug is that premature termination is now an error, so something needs to be modified in the test case, as otherwise it'll abort on that error (NS_ERROR_NET_INTERRUPT)!

Do you have an alternative modification to suggest to make the test case pass?
I'm not sure that this is the right level to fix it. The HTTP level needs to retain the information necessary for higher layers to determine if the transfer is complete or not, but flagging an incomplete transfer as an _error_ rather than _incomplete_ is likely to trigger a cascade of other problems... this being just one of them.

If the transfer tracks expected size and actual size, that provides enough information for the download manager to mark incomplete downloads, give the _user_ better information about the state of the transfer, and provide a UI for continuing or retrying it... without triggering other error code unnecessarily.
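The layering argued for here can be sketched like so (illustrative pseudocode, not the download manager's real API): the transport layer just reports expected vs. actual size, and the consumer classifies the result for the user:

```python
def classify_download(expected_bytes, received_bytes):
    """Classify a finished transfer for a download-manager UI.

    expected_bytes: size promised by the server, or -1 if none given.
    Returns 'complete', 'incomplete' (a candidate for resume/retry),
    or 'unknown' (no length to compare against, so no claim is made).
    """
    if expected_bytes < 0:
        return "unknown"
    if received_bytes < expected_bytes:
        return "incomplete"
    return "complete"
```

Under this split, the 400 MiB download that stopped at 200 MiB would be shown as incomplete with a resume option, without any lower layer having to raise an error.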
Comment on attachment 8389842 [details] [diff] [review]
0001-Bug-237623-detect-broken-HTTP1.1-transfers.patch

Review of attachment 8389842 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/xpc.msg
@@ +81,4 @@
>  XPC_MSG_DEF(NS_ERROR_UNEXPECTED                    , "Unexpected error")
>  XPC_MSG_DEF(NS_ERROR_OUT_OF_MEMORY                 , "Out of Memory")
>  XPC_MSG_DEF(NS_ERROR_INVALID_ARG                   , "Invalid argument")
> +XPC_MSG_DEF(NS_ERROR_NET_INTERRUPT                 , "Interrupted network transfer")

get r? from bholley (or his delegate) on this file

::: netwerk/test/unit/head_channels.js
@@ +29,4 @@
>  const CL_EXPECT_LATE_FAILURE = 0x20;
>  const CL_FROM_CACHE = 0x40; // Response must be from the cache
>  const CL_NOT_FROM_CACHE = 0x80; // Response must NOT be from the cache
> +const CL_IGNORE_CL = 0x100; // don't both to check the content-length

s/both/bother

::: netwerk/test/unit/test_chunked_responses.js
@@ +101,4 @@
>  ////////////////////////////////////////////////////////////////////////////////
>  // Test 3: OK in spite of non-hex digits after size in the length field
>  
> +test_flags[3] = CL_ALLOW_UNKNOWN_CL|CL_EXPECT_LATE_FAILURE;

update this test so it is a positive test

::: netwerk/test/unit/test_gzipped_206.js
@@ +21,4 @@
>    response.setHeader("Content-Encoding", "gzip", false);
>    response.setHeader("ETag", "Just testing");
>    response.setHeader("Cache-Control", "max-age=3600000"); // avoid validation
> +  response.setHeader("Content-Length", "" + 17);

as discussed in person, the better fix to this test is to conditionally set the content-length in each iteration
Attachment #8389842 - Flags: review?(mcmanus)
also, as discussed, update nsHttpChannel::CloseCacheEntry() to consider a failed channel with this condition to be eligible for partial (resumable with range request) caching. ask :mayhemer for review on that
(In reply to Peter da Silva from comment #182)
> I'm not sure that this is the right level to fix it. The HTTP level needs to
> retain the information necessary for higher layers to determine if the
> transfer is complete or not, but flagging an incomplete transfer as an
> _error_ rather than _incomplete_ is likely to trigger a cascade of other
> problems... this being just one of them.

We just discussed this at the recent Networking Work Week, and concluded that our best bet is to go ahead with the change at the network level, and promptly handle any follow-up bugs that might arise. We know that it is likely that some behavior changes will appear at the upper integration levels, like Aurora or Beta, but we can handle regressions with specific follow-ups, and in the worst case just revert the change, since this is a simple patch.
bholley: please take a look at the xpc.msg change I want (single-line, adding a new error code)

This patch is otherwise updated according to feedback, with test cases updated and an additional check in nsHttpChannel::CloseCacheEntry so that interrupted transfers can still be cached.
Attachment #8389842 - Attachment is obsolete: true
Attachment #8390525 - Flags: review?(bobbyholley)
Blocks: 983197
v3, removed the CloseCacheEntry change again as it isn't necessary
Attachment #8390525 - Attachment is obsolete: true
Attachment #8390525 - Flags: review?(bobbyholley)
Attachment #8390587 - Flags: review?(bobbyholley)
Comment on attachment 8390587 [details] [diff] [review]
v3-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

The xpc.msg change wasn't necessary; withdrawn!
Attachment #8390587 - Flags: review?(bobbyholley)
This patch sent to the try server shows mochitest 5 test_partially_cached_content.html failing:

https://tbpl.mozilla.org/?tree=Try&rev=7c897be5271f

Working on an update.
one thing I just thought about... is if the server gets a replacement file uploaded that has a different file hash or maybe size. could this be a part of the problem that is being seen? if this happens during a download... how would you detect that? the content-type has already been sent. not sure how this works...  maybe this particular case can't be remedied. if there was a way - good. but I was just thinking about it as a possible situation that could occur.
Well I hope this can get resolved soon, because this bug has now been open for TEN YEARS.
(In reply to Jim Michaels from comment #190)
> one thing I just thought about... is if the server gets a replacement file
> uploaded that has a different file hash or maybe size. could this be a part
> of the problem that is being seen? if this happens during a download... how
> would you detect that? the content-type has already been sent. not sure how
> this works...  maybe this particular case can't be remedied. if there was a
> way - good. but I was just thinking about it as a possible situation that
> could occur.

No, that would be a totally separate case (if at all). This bug is "simply" about the transfer being prematurely cut off and nothing signals an error about that fact.
Maybe my approach to the problem is naive, but since FF cannot download files without data loss, I've been using the add-on DownThemAll for years. DTA tells me when a download is interrupted, and I can resume it without a problem. Zero bytes of data loss with DTA. Why not contact the author of DTA and use his code in FF? Obviously the DTA code is far superior to the code FF uses. Looking at the response time to this major problem (10 years), mankind would never have reached the moon if NASA engineers had been as slow as the Mozilla coders. ;)
Attachment #8394104 - Flags: feedback?(bzbarsky)
Comment on attachment 8394104 [details] [diff] [review]
let docshell ignore NET_INTERRUPT when something has already been rendered

What about other necko consumers?  What should happen with images in the interrupt case?  Stylesheets?  External scripts?  Why are pages special?
Attachment #8394104 - Flags: feedback?(bzbarsky) → feedback-
Note that that came up in comment 133...

What do other UAs do in those situations?
(In reply to Boris Zbarsky [:bz] from comment #196)
> Note that that came up in comment 133...
> 
> What do other UAs do in those situations?

I have softened my tune on making the DL manager do this itself, because I don't actually think it has the right tools to do it correctly.

Sure, if it has a content-length and the DL doesn't match the CL then it can take action on its own. But what about other indications of transfer problems, such as aborted http/1 chunks or reset http/2 streams? The channel knows that those transactions have problems just as much as in the mismatched content-length case, but the channel consumer (e.g. dl manager) has no insight into it. That doesn't seem right. The transaction failed by the definition of the protocol, so OnStopRequest ought to reflect that, so that the consumer can decide how it wants to handle the error. The DL manager really doesn't have the tools to make that decision.

I can think of two reasons to argue against making the channel report the error, and I'd like to think neither of them carries the day:

1] If the Internet were filled with servers with broken framing, then the channel should probably be working around bad servers. Broken framing actually was common 10+ years ago, but persistent connections have really forced people to get this right on the server side, because the framing is so critical to making persistent connections work. Note that Daniel's patch only creates the OnStopRequest error when the version is >= 1.1.. so while this is out there as something to keep on the radar, I don't find the argument compelling.. and the time that has passed here is largely what has changed my mind (as I pretty much never see this problem manifest itself innocuously anymore).

2] There are just too many consumers that want to ignore NET_INTERRUPT but aren't doing so - so it's just an unsolvable compatibility problem in practice. I am hearing a little of that. If that's the feedback, then we can add something silly like nsIHttpChannel::SetRequireStrictFraming that makes this opt-in.. but I worry that there are consumers who are silently accepting NET_INTERRUPT that actually should not be.. I think the script consumer is an excellent example.. is using a truncated https:// script a security vulnerability? (Given that an attacker can inject a TCP FIN without cracking the crypto.) It certainly sounds like one to me. Other consumers that happily stream their output (like images) can probably safely ignore it. (And I believed they already were.. mistakenly?)
(In reply to Boris Zbarsky [:bz] from comment #196)
> Note that that came up in comment 133...
> 
> What do other UAs do in those situations?

e.g. url that reports too large of a C-L http://www.ducksong.com/cgi-bin/nph-bad-cl

firefox renders it.. chrome fails to render it but I sometimes get a error page and sometimes get a blank page.
(In reply to Patrick McManus [:mcmanus] from comment #198)
> e.g. url that reports too large of a C-L
> http://www.ducksong.com/cgi-bin/nph-bad-cl
> 
> firefox renders it.. chrome fails to render it but I sometimes get a error
> page and sometimes get a blank page.

IE11 also renders.
> is using a truncated https:// script a security vulnerability?

It's possible, yes.

This is why we should decide what the desired behavior is for the various scenarios.  Then we can figure out how to implement that desired behavior.  Narrowly hacking the docloader to fix the test may be the right thing, but it's hard to tell without knowing what behavior we really want in all the other cases.
(In reply to Boris Zbarsky [:bz] from comment #200)
> > is using a truncated https:// script a security vulnerability?
> 
> It's possible, yes.
> 
> This is why we should decide what the desired behavior is for the various
> scenarios.  Then we can figure out how to implement that desired behavior. 
> Narrowly hacking the docloader to fix the test may be the right thing, but
> it's hard to tell without knowing what behavior we really want in all the
> other cases.

So, since we would like to fix this download bug primarily, I'd personally expose the incompleteness information as a read-only attribute on nsIHttpChannel and let the download manager handle it in its OnStopRequest and do any 'proper' solution in a followup.

Up to you to decide.  This is my vote.

(In reply to Patrick McManus [:mcmanus] from comment #198)
> chrome fails to render it but I sometimes get a error
> page and sometimes get a blank page.

Chrome doesn't execute a partially downloaded script, IE11 and Fx do.

I have a small local php that sets the cl to some 10000 but only delivers a few bytes of some otherwise valid (but incomplete) html.  Chrome times out on it...  After a minute or so I get a pop-up dialog saying that the page doesn't answer.  Clicking 'end the page' replaces the content with an error page similar to ours.
(In reply to Patrick McManus [:mcmanus] from comment #197)

> 2] There are just too many consumers that want to ignore NET_INTERRUPT but
> aren't doing so - so it's just an unsolvable compatibility problem in
> practice. I am hearing a little of that.

I just wanted to add that at least some of those cases may actually benefit from getting this problem in their face so that they actually realize that the transfer may not be complete. Like the download manager does now.
Hey Boris,

I need to figure out what to do here, as this is actually a fairly impactful bug.

I feel like the change to throw NET_INTERRUPT is the right thing, not just in a purist sense: channels need to be aware that they are handling truncated data.

It is defined as:
/* The connection was established, but the data transfer was interrupted. */

I've looked for other things in necko that already generate it, and the most prominent user is the spdy code - which generates it when a stream gets canceled by the server before a natural finish.. that's a very tight match to the behavior introduced by this patch. The difference is probably that the issue doesn't come up very much until you tie it to socket closures, as is done here.

http://www.ducksong.com/cgi-bin/nph-bad-cl generates invalid content lengths for the html, an external js, and 3 references to 2 image urls.

chrome does not render the page.

firefox silently ignores all of those errors and displays the page with images and executes the truncated javascript.

if we add daniel's patch to firefox we also do not render the page.

if we add honza's uriloader patch to the mix we will render the html, but not execute the truncated javascript or display the images. Given that they are known to be corrupted, I think that's the right thing to do.. but let's flag some people to chime in

:seth - what do you think about images or whom should we ask? If you would like to display the truncated images do you think you should be handling the INTERRUPTED error?

boris, do you think we should be pinging someone else wrt js? Would css logically be different?
Flags: needinfo?(seth)
Flags: needinfo?(bzbarsky)
Hmm.  So the current things that can fail a channel with NS_ERROR_NET_INTERRUPT in addition to the spdy code you mention are:

1)  Anything that sends PR_END_OF_FILE_ERROR through ErrorAccordingToNSPR() over in the socket transport.

2)  The range request failure on ContinueConnect in nsHttpChannel::OnStopRequest.

3)  Maybe the bits in nsHttpTransaction::ParseHead ?

Docshell showing an error page for this error code was implemented in bug 106865, looks like; that was one of the cases that went through PR_END_OF_FILE_ERROR.

I'm pretty happy to try not running truncated JS (just like we don't run JS with an unclosed <script> tag, for similar reasons).

I'm also pretty happy to try not using truncated CSS, esp. if some other browsers already do that.

I'd like to understand Chrome's behavior here a bit better.  Does it start doing progressive rendering and then throw it away when there is a sudden connection close?
Flags: needinfo?(bzbarsky) → needinfo?(mcmanus)
(In reply to Boris Zbarsky [:bz] from comment #204)
> Hmm.  So the current things that can fail a channel with
> NS_ERROR_NET_INTERRUPT in addition to the spdy code you mention are:
> 
> 1)  Anything that sends PR_END_OF_FILE_ERROR through ErrorAccordingToNSPR()
> over in the socket transport.

right - I think that is probably the interesting one. afaict this change was meant to cover the empty transfer (but successful handshake) issue.. and we shouldn't be retriggering 106865 because daniel's patch only applies when there is an explicit mismatch in the framing information - and without some transfer we can't get any explicit framing information (i.e. response headers) to have a mismatch with.

> I'd like to understand Chrome's behavior here a bit better.  Does it start
> doing progressive rendering and then throw it away when there is a sudden
> connection close?

I updated http://www.ducksong.com/cgi-bin/nph-bad-cl to be a bit bigger.. now it has 266KB of HTML (but a content length 10x that). Chrome renders "some" of the HTML and then stops. It doesn't draw it all - it appears racy as to what has been laid out when the error comes in. The final bit of text of the document says "premature eof" but chrome in my tests never gets that far even though it is always transferred to the client. I'm going to assume when it gets the error it stops drawings/parsings in progress.

one happy thing to conclude from this is that the web is in a pretty good place wrt framing conformance.

chrome also doesn't replace the partial draw with an error page.. that's basically the same behavior honza's patch gives us except we seem to deterministically draw whatever we get.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #205)
> (In reply to Boris Zbarsky [:bz] from comment #204)
> > Hmm.  So the current things that can fail a channel with
> > NS_ERROR_NET_INTERRUPT in addition to the spdy code you mention are:
> > 
> > 1)  Anything that sends PR_END_OF_FILE_ERROR through ErrorAccordingToNSPR()
> > over in the socket transport.
> 
> right - I think that is probably the interesting one. afaict this change was
> meant to cover the empty transfer (but successful handshake) issue.. and we
> shouldn't be retriggering 106865 because daniel's patch only applies when
> there is an explicit mismatch in the framing information - and without some
> transfer we can't get any explicit framing information (i.e. response
> headers) to have a mismatch with.
> 
> > I'd like to understand Chrome's behavior here a bit better.  Does it start
> > doing progressive rendering and then throw it away when there is a sudden
> > connection close?
> 
> I updated http://www.ducksong.com/cgi-bin/nph-bad-cl to be a bit bigger..
> now it has 266KB of HTML (but a content length 10x that). Chrome renders
> "some" of the HTML and then stops. It doesn't draw it all - it appears racy
> as to what has been laid out when the error comes in. The final bit of text
> of the document says "premature eof" but chrome in my tests never gets that
> far even though it is always transferred to the client. I'm going to assume
> when it gets the error it stops drawings/parsings in progress.

For me, chrome (with my own testing page that has just a few bytes) shows no content, and then the error after some 60 seconds of some kind of a timeout...

Can you try similar 266kb test in Chrome with a javascript?

> 
> one happy thing to conclude from this is that the web is in a pretty good
> place wrt framing conformance.
> 
> chrome also doesn't replace the partial draw with an error page.. that's
> basically the same behavior honza's patch gives us except we seem to
> deterministically draw whatever we get.
(In reply to Boris Zbarsky [:bz] from comment #204)
> I'm pretty happy to try not running truncated JS (just like we don't run JS
> with an unclosed <script> tag, for similar reasons).
> 
> I'm also pretty happy to try not using truncated CSS, esp. if some other
> browsers already do that.

Just to confirm: in both cases, for CSS and JS, does "to not use" mean: start parsing/compilation as the data chunks come in, but don't ever apply/execute until the content is finally complete (with NS_OK in OnStop)?
> and we shouldn't be retriggering 106865

That bug just means we need to keep some of the current handling of NET_INTERRUPT in the docshell.  Which means to the extent that we want different handling for the case at hand we may want a different error code here.  Or something.  That would get us the desired behavior for free, I believe, without requiring the extra checking in Honza's patch.

> to not use means: do start parsing/compilation as the data chunks are coming but don't
> ever apply/execute until the content is finally complete

In theory, yes.  In practice, both use a streamloader for now, so don't do anything with the data until OnStopRequest anyway (though we'd like to start changing that).
OK - Daniel, let's introduce a new error code.. NS_ERROR_NET_PARTIAL_TRANSFER (unless there is a better suggestion) and rework the patch to use that. We'll see if that gets the desired behavior for free or if we still need something like honza's patch.. if we do need it, at least it won't conflict with existing uses.

please also change the spdy/http2 code that generates INTERRUPT to use the new error code (as it is signaling the same condition from a different protocol).
As requested, I'm attaching a new version of this patch that introduces and returns a new error code called NS_ERROR_NET_PARTIAL_TRANSFER.

It is used to signal exactly what it says: the peer indicated that more data was coming, but for some reason it never arrived; the transfer was cut off prematurely.

(I didn't obsolete the v3 version just yet since we're still discussing approaches.)
(In reply to Patrick McManus [:mcmanus] from comment #203)
> :seth - what do you think about images or whom should we ask? If you would
> like to display the truncated images do you think you should be handling the
> INTERRUPTED error?

Imagelib decodes images in a streaming fashion and already has to handle cases where data is truncated or otherwise corrupted, so I don't think that side of things needs to be updated to support this error. We can just run the existing code in imgRequest::OnStopRequest as usual, and things should just work.

However, I do think that if we hit NS_ERROR_NET_PARTIAL_TRANSFER, we need to consider carefully how this should interact with the image cache. We definitely don't want to create a situation where an image loads partially due to a transient network error, the user fixes the network problem and refreshes the page, and we display the same truncated version from the cache!

Probably the right thing is to destroy the cache entry if we encounter this error.
Flags: needinfo?(seth)
I wrote up a small local test with 4 pieces of content, all of them delivered broken, with NS_ERROR_NET_PARTIAL_TRANSFER being signaled according to the current patch. I did this by making sure the server sent a too-large Content-Length and closed the connection after having sent the actual (smaller) content.

A) HTML that used...
B) a small png image
C) CSS
D) an external javascript

Without this patch all 4 entities were used to render the page as if everything worked fine.

With this patch, only part (A) renders like before.

B - The image gets discarded in imgRequest::OnStopRequest() due to the error, so it isn't cached but also isn't displayed. Instead, one of those "image missing" symbols is used.

C - The CSS is ignored

D - The javascript seems to have been discarded; the button in my HTML that ran code from it doesn't work anymore

Detecting what's going on: Using "Web Developer->Networking" isn't much help for a user trying to diagnose what's going on. It shows the requests, it shows the responses, and it can even show the (partial) image rendered, but there's no (to me) clear hint anywhere why they don't render (or run, in the javascript case).

To me, the not-showing-the-image is the surprising and perhaps not entirely desirable part.
so let's work with :seth to decide if the image should be rendered to partial completion or not.. my experience with chrome was that this was basically undefined (it depends on how much of the image was delivered before the error was found).

let's keep the ball moving.
I altered my test slightly. In this test #2, the image (B), was a bigger JPEG that was delivered _very_ slowly. I could then see it being partially rendered as the image is being downloaded all the way during many seconds until the connection finally gets cut and then *blink* the now almost completely rendered image instead gets replaced by the (much smaller) image-not-found symbol...
(In reply to Daniel Stenberg [:bagder] from comment #214)
> I altered my test slightly. In this test #2, the image (B), was a bigger
> JPEG that was delivered _very_ slowly. I could then see it being partially
> rendered as the image is being downloaded all the way during many seconds
> until the connection finally gets cut and then *blink* the now almost
> completely rendered image instead gets replaced by the (much smaller)
> image-not-found symbol...

So, on a very slow connection (mobile somewhere in woods or satellite somewhere in a sea) I'll see the image loads but slowly.  I'll go to make a coffee with intention to later look at the image as it has download later.  When connection drops in at 99% I'll get my coffee and also a tiny "broken image" symbol instead - I will uninstall Firefox immediately - by throwing my laptop over board!! :))
Slightly off topic: could we indicate in the address bar (Larry) area that something didn't load completely on the page (mainly scripts) and offer user an option to attempt a reload only of those resources?
(In reply to Honza Bambas (:mayhemer) from comment #215)

> So, on a very slow connection (mobile somewhere in woods or satellite
> somewhere in a sea) I'll see the image loads but slowly.  I'll go to make a
> coffee with intention to later look at the image as it has download later. 
> When connection drops in at 99% I'll get my coffee and also a tiny "broken
> image" symbol instead - I will uninstall Firefox immediately - by throwing
> my laptop over board!! :))

I completely agree with that. We want it to behave better. Is this really the only error situation that can happen that _shouldn't_ clear the image? It surprises me. I'll have to read more code...
does this maybe have something to do with http fragmentation? safari has a fragmentation bug where it won't load resources completely.
(In reply to Jim Michaels from comment #218)
> does this maybe have something to do with http fragmentation? safari has a
> fragmentation bug where it won't load resources completely.

It depends on what you mean by "http fragmentation". It isn't a term I recognize and I couldn't find any definite explanation of it - nor did I find any info about any such Safari bug...
(In reply to Honza Bambas (:mayhemer) from comment #215)

> So, on a very slow connection (mobile somewhere in woods or satellite
> somewhere in a sea) I'll see the image loads but slowly.  I'll go to make a
> coffee with intention to later look at the image as it has download later. 
> When connection drops in at 99% I'll get my coffee and also a tiny "broken
> image" symbol instead - I will uninstall Firefox immediately - by throwing
> my laptop over board!! :))

Here Firefox is on the safe side: better no info than partial info.
This is a good principle. 

Firefox must not give the partial image *without saying that it is partial*.

But we can handle this case better: giving the partial image AND saying that the image is partial. We can do this by displaying the partial image with the "Broken image" sign just next to it. Ideally, the sign here would be a more precise version of the "Broken image" sign: a "Partial image" sign meaning that the image is partial. For example, we could use a torn photo missing a part.
I was about to suggest that the partial image or file still load and be indicated that it is damaged.  Reload as an option.

I would rather see a partial image and know it is broken than be shown an icon saying that only part of it downloaded. I would then have an option to re-download, or possibly to start again and have Firefox use the cache and hopefully finish what it started.

Put a border around the image?  Text at the bottom or as suggested at the side with a question to redownload the image?

This is still progress.
Anyone with an idea about a suitable person involved or interested enough in the image or UI parts to involve in this discussion? I would like to push this topic forward!

I could also imagine that an extra UI element to mark the image as "broken" isn't really worth it, but I'm certainly not qualified to tell.
(In reply to Daniel Stenberg [:bagder] from comment #222)
> Anyone with an idea about a suitable person involved or interested enough in
> the image or UI parts to involve in this discussion? I would like to push
> this topic forward!
> 
> I could also imagine that an extra UI element to mark the image as "broken"
> isn't really worth it, but I'm certainly not qualified to tell.

Daniel - I would recommend you try and wade into the image lib code and make a patch to simply map PARTIAL to OK that you can propose to :seth or joe drew. If you can get their attention they might be able to give you direction or commentary
You probably want to look at imgRequest::OnStopRequest for starters.
I think it's really unlikely that we'll come up with UI for "showing this image, but really it's only partially loaded" that makes sense to most users. So I think Seth's plan to display what we get, but not cache it, makes sense.  (The only other approach I could see doing would be to try to reload the full image in the background to see if the partial content failure was transient, and replace it with the full image if so.  But that's at best a followup bug.)

bz: so far we've been talking about throwing away partial CSS.  Does that still seem like the right answer to you?
Flags: needinfo?(bzbarsky)
Throwing away partial CSS sounds just fine.
Flags: needinfo?(bzbarsky)
Attached image UI of partial image status (deleted) —
I'd suggest to not "enhance" UI with "question to redownload" or other buttons and texts at all. Just display partial images (as seen in the attached picture). Note, there is already "Reload image" in the context menu. 

If the page is not okay (some images, CSS and images defined in CSS, etc. are partial/not loaded) the user can see it (and an experienced user can reload). Yes, FF could display info about it (see in attachment "Page not loaded correctly") but I think we should rather avoid disturbing the user like that. At most, FF could display the balloon ONLY AFTER clicking that small icon.

I vote for throwing away partial CSS.
I'm glad this is finally being worked on, but let's not bikeshed with UI ideas for broken images. The bug is the download manager not handling failed downloads correctly, not the display of incomplete files. What the UI should do when page elements fail to load is another issue.
Here's another update to the patch. The notable difference this time is that the imgRequest::OnStopRequest() function now is modified to not remove partial images from being rendered, but also to not cache them.

It will still throw away partial CSS and partial javascript.
Attachment #8390587 - Attachment is obsolete: true
Attachment #8399259 - Attachment is obsolete: true
Attachment #8411651 - Flags: feedback?(seth)
Here's what the try server says about the current patch: https://tbpl.mozilla.org/?tree=Try&rev=976834fce528
Grrr, sorry commented on wrong bug. Scratch that try server link.
Here's the try server push of the latest patch from here: https://tbpl.mozilla.org/?tree=Try&rev=2b1cd654e041 seems fine to me.

Anyone has anything else for me to test or adjust with this before we can land it?
(In reply to Daniel Stenberg [:bagder] from comment #234)
> Here's the try server push of the latest patch from here:
> https://tbpl.mozilla.org/?tree=Try&rev=2b1cd654e041 seems fine to me.
> 
> Anyone has anything else for me to test or adjust with this before we can
> land it?

you need an image peer to review the image bit.. :seth?
Attachment #8411651 - Flags: feedback?(seth) → review?(seth)
Comment on attachment 8411651 [details] [diff] [review]
v5-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

Review of attachment 8411651 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/xpc.msg
@@ +150,4 @@
>  XPC_MSG_DEF(NS_ERROR_PORT_ACCESS_NOT_ALLOWED        , "Establishing a connection to an unsafe or otherwise banned port was prohibited")
>  XPC_MSG_DEF(NS_ERROR_NET_RESET                      , "The connection was established, but no data was ever received")
>  XPC_MSG_DEF(NS_ERROR_NET_INTERRUPT                  , "The connection was established, but the data transfer was interrupted")
> +XPC_MSG_DEF(NS_ERROR_NET_PARTIAL_TRANSFER           , "A transfer was only partially done when it completed")

Now that we figured out the desired behavior in case of partial transfers for several consumers, do we have a better understanding of whether we really need the new error code?

If so, do we have a clear set of cases defining when each one of these three network error codes should be generated, or do they have different meanings based on where they are used?

For example, in bug 983187 I verified that we generate NS_ERROR_NET_RESET when an RST packet is received, even if some data was already transferred, which doesn't seem to match the short description above. This is likely to affect the result, for example, when partial image rendering is involved, but I can't tell if that is a feature or a bug :-) The error code is probably generated here:

http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsSocketTransport2.cpp#156

I also wonder whether some consumers may better detect "partial transfer" based on whether they actually received any data from the stream, instead of the specific error code with which it was closed.

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +870,5 @@
> +        (mHttpVersion >= NS_HTTP_VERSION_1_1)) {
> +
> +        if (NS_SUCCEEDED(reason) && !mResponseIsComplete) {
> +            reason = NS_ERROR_NET_PARTIAL_TRANSFER;
> +            LOG(("Partial transfer, incomplete HTTP responese received: %s",

typo: response
Well, that was my original point... the download manager knows the expected size of the transfer and can therefore provide that information in the user interface regardless of whether the underlying TCP connection was reset or timed out or otherwise terminated abnormally.

However...

If there is going to be detection of a mismatch between the size indicated in the header and the size in an otherwise complete transfer, it should be a new code. The new error is not an error at the TCP layer, but at the HTTP layer. It's distinct from a reset or interrupt.
The download manager knows that for one of the cases this error can occur, yes - when Content-Length is used for HTTP 1.x. For the other cases involving chunked decoding or SPDY/http2, it doesn't.

Partial transfer means that the error was detected in the application protocol layer, above TCP: more data was promised than was actually delivered. A partial transfer means the transfer was cut off before it could finish.
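The completeness rule described above can be condensed into an illustrative sketch (this is a simplified model, not the Gecko code in nsHttpTransaction; names and the version check are assumptions for illustration):

```javascript
// Illustrative model of "was the framing promise kept?" for an HTTP/1.x
// response. For Content-Length framing the byte count must match; for
// chunked framing the zero-length final chunk must have been seen.
function classifyClose({ httpVersion, chunked, sawLastChunk, contentLength, received }) {
  const complete = chunked
    ? sawLastChunk                 // chunked: final chunk marks completion
    : received >= contentLength;   // Content-Length: counts must line up
  // Only HTTP/1.1-and-later framing is trusted; an HTTP/1.0 close-delimited
  // body can't distinguish "done" from "cut off".
  if (httpVersion >= 1.1 && !complete) {
    return 'NS_ERROR_NET_PARTIAL_TRANSFER';
  }
  return 'NS_OK';
}
```

The same rule generalizes to SPDY/http2, where the framing promise is the stream's end marker rather than a byte count.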
(In reply to Daniel Stenberg [:bagder] from comment #238)
> The download manager knows that for one of the cases this error can occur,
> yes - when Content-Length is used for HTTP 1.x. For the other cases
> involving chunked decoding or SPDY/http2, it doesn't.

that comment made me look at the code again. tell me if this makes sense:

you need to update the spdy/http-2 paths to only generate partial for this case, rather than everywhere we currently do interrupt.

that pretty much just means testing the stream for having read a data frame to decide which error to return.. for spdy that's stream->RecvdData().. for http2 you'll need to add RecvdData() and SetRecvdData() methods. (I managed to remove them, but I guess they are coming back). You can easily do SetRecvdData() in http2session::readytoprocessdataframe().

likewise session::shutdownenumerator should be changed to generate partial instead of abort in cases of stream->recvddata()
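The recvdData rule above can be sketched as a tiny decision function (an assumption-laden simplification, not the actual SpdySession/Http2Session code):

```javascript
// Simplified model of the shutdown-enumerator change described above: when a
// session goes away, only streams that actually received response data report
// a partial transfer; untouched streams keep the generic abort so they can be
// restarted cleanly.
function shutdownErrorFor(stream) {
  return stream.recvdData
    ? 'NS_ERROR_NET_PARTIAL_TRANSFER'  // data started flowing, then was cut off
    : 'NS_ERROR_ABORT';                // nothing received; plain abort
}
```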
Another iteration. Now with added checks in the SPDY and http2 code that only returns the "partial transfer" error if data has actually been received at all.

The image part remains like in the v5 version of the patch that is still pending review so I leave v5 "open" here for that reason.
Attachment #8417236 - Flags: review?(mcmanus)
Comment on attachment 8417236 [details] [diff] [review]
v6-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

Review of attachment 8417236 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/Http2Session.cpp
@@ -140,5 @@
>    // registered an ID haven't actually been sent yet so they can always be
>    // restarted.
>    if (self->mCleanShutdown &&
>        (stream->StreamID() > self->mGoAwayID || !stream->HasRegisteredID())) {
> -    self->CloseStream(stream, NS_ERROR_NET_RESET); // can be restarted

this path should remain unchanged. It applies to transactions greater than the id of a clean goaway. They haven't actually seen any response data (we could warn if they had - but not necessary) and will be restarted by nsHttpTransaction with this code.

@@ -142,5 @@
>    if (self->mCleanShutdown &&
>        (stream->StreamID() > self->mGoAwayID || !stream->HasRegisteredID())) {
> -    self->CloseStream(stream, NS_ERROR_NET_RESET); // can be restarted
> -  } else {
> -    self->CloseStream(stream, NS_ERROR_ABORT);

this is the path that should test recvdData and choose between PARTIAL_TRANSFER and ERROR_ABORT.. basically these are streams that are either ambiguous by the goawayID or we have never received a goaway (for example when the tcp session fails)..

::: netwerk/protocol/http/SpdySession3.cpp
@@ +1993,3 @@
>      if (mDownstreamRstReason == RST_REFUSED_STREAM)
>        rv = NS_ERROR_NET_RESET;            //we can retry this 100% safely
>      else if (mDownstreamRstReason == RST_CANCEL ||

I apologize for iterating like this through review comments Daniel!

I think we only want this recvdData check on CANCEL.. PROTOCOL_ERR, INTERNAL_ERR, and UNSUPPORTED_VERSION should continue to always return NET_INTERRUPT. (same for spdy 3.1)
Attachment #8417236 - Flags: review?(mcmanus)
Comment on attachment 8411651 [details] [diff] [review]
v5-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

Review of attachment 8411651 [details] [diff] [review]:
-----------------------------------------------------------------

You'll need to update the patch to say 'r=mcmanus,seth', it looks like.

I only reviewed the changes in imgRequest.cpp; please let me know if I was supposed to look at anything else.

::: image/src/imgRequest.cpp
@@ +670,5 @@
>  
> +  bool isPartial = false;
> +  if (mImage && (status == NS_ERROR_NET_PARTIAL_TRANSFER)) {
> +    isPartial = true;
> +    status = NS_OK; // fake happy face

It might be a good idea to add some kind of debug-mode log message here. NS_WARNING is fine with me, or you could use the PR_LOG stuff.

@@ +695,5 @@
>      // loading compressed source data, which is part of our size calculus.
>      UpdateCacheEntrySize();
>    }
> +  else if (!isPartial) {
> +    // if the error isn't "just" a partial transfer

This looks like it needs a little more refinement. The cases are:

- Things succeeded (mImage && NS_SUCCEEDED(status)) and we didn't get a partial transfer. This looks to be handled correctly to me - just call UpdateCacheEntrySize.

- Things succeeded and we *did* get a partial transfer. In this case we should still call UpdateCacheEntrySize, or else what we're doing is retaining the cache entry but treating it as if it only occupied 0 bytes, which isn't true. For the cache logic to behave correctly we need to know the actual size of the cache entries. Since this case and the preceding case end up needing to run the same code, you should just get rid of the |&& !isPartial| in the first |if| condition.

- Things failed. In this case we want to call Cancel no matter what, because presumably we don't actually have an image to show. If we make the change discussed above this should work correctly, because we only set isPartial to true if mImage is non-null, and when we do that we set status to NS_OK, meaning that if isPartial is true we'll always end up in the |if| case. That does mean, though, that the check for |if (!isPartial)| in the else branch doesn't add anything and should be removed - it should still just be |else|.

All of that was a very long way of saying that the additional checks for |!isPartial| seem to me like they should not be there.

Were you running into an issue that caused you to add them?
Attachment #8411651 - Flags: review?(seth)
Argh, sorry, I got too caught up in that 'if' logic and missed the bigger picture. You actually will need three branches of the |if| now, because in the case where things succeeded and we did get a partial transfer, we want to *throw out* the cache entry.

You can just call RemoveFromCache to do this, except - isn't there always a catch? - OnStopRequest may be running off main thread, but you need to call RemoveFromCache on the main thread. You should be able to write code very similar to the code in |imgRequest::Cancel| that calls ContinueCancel on the main thread if needed. (But don't reuse ContinueCancel since you don't want to do the other thing that ContinueCancel does.)
Thanks for the reviews.

mcmanus: it's really no problem, talking about specific source code changes makes it very concrete and direct. I believe I've now fixed the nits you pointed out in my v6 version.

seth: yes thanks a lot, that was exactly the kind of review I wished for. I've now cloned the trick cancel() used and added a call that removes the partial image from the cache, I believe this is in the style you meant and it seems to work fine in my local tests.
Attachment #8411651 - Attachment is obsolete: true
Attachment #8417236 - Attachment is obsolete: true
Attachment #8417880 - Flags: review?(seth)
Attachment #8417880 - Flags: review?(mcmanus)
Comment on attachment 8417880 [details] [diff] [review]
v7-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

Review of attachment 8417880 [details] [diff] [review]:
-----------------------------------------------------------------

r+ on netwerk bits
Attachment #8417880 - Flags: review?(mcmanus) → review+
Blocks: 1008075
Blocks: 1008091
Seth, any chance you can review the last bits here?  We only need you to look at the 2 ImgRequest changes.
Flags: needinfo?(seth)
Comment on attachment 8417880 [details] [diff] [review]
v7-0001-Bug-237623-detect-broken-HTTP1.1-transfers.-r-mcm.patch

Review of attachment 8417880 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: image/src/imgRequest.cpp
@@ +743,5 @@
>    }
> +  else if (isPartial) {
> +    // Remove the partial image from the cache.
> +    this->EvictFromCache();
> +  }

This looks good, but a minor style nit: I'd suggest that one of the last two |else if|s be replaced with an |else| so that it's immediately obvious to the reader that we *always* take one of these branches.

If it were me, I'd probably use this sequence:

if (mImage && ...) {
} else if (isPartial) {
} else {
}

I'd do it that way only out of a personal preference to avoid negative conditions where possible.
Attachment #8417880 - Flags: review?(seth) → review+
Flags: needinfo?(seth)
This is a rebased/refreshed version of the patch with the minor suggestion from seth addressed.

This patch is tested in this try run:
https://tbpl.mozilla.org/?tree=Try&rev=ae6a776e72ee
Attachment #8417880 - Attachment is obsolete: true
Attachment #8436705 - Flags: review+
Keywords: checkin-needed
Do we have an explanation for the test_partially_cached_content.html perma-fail on that Try run?
Keywords: checkin-needed
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #249)
> Do we have an explanation for the test_partially_cached_content.html
> perma-fail on that Try run?

In the test I have an iframe that is supposed to load partial content. Without the patches in this bug, I can call document.getElementById('contentFrame').addEventListener('load'...) and the load listener is called. The test does its checks and continues to completion.

With the patch, however, the load event for the iframe object does not fire and the test times out. Is this expected behavior?

I can get around it by calling window.addEventListener(...) - not sure if this is the right thing to do.
(In reply to Steve Workman [:sworkman] from comment #250)

> With the patch, however, the load event for the iframe object does not fire
> and the test times out. Is this expected behavior?

Thanks Steve! I'll admit this is a territory I don't master yet so I can't answer this quickly, I need to dig deeper.

The patch basically now makes the nsHttpTransaction::Close() method set mStatus to a non-zero value as compared to 0 (OK) before. A log from a test run shows that this condition happens in this case: "Partial transfer, incomplete HTTP responese received: c-l underrun" is present.
I've now learned that when nsHttpChannel::OnStopRequest() stores NS_ERROR_NET_PARTIAL_TRANSFER in mStatus (which is the 'status' argument passed in to it after this failed HTTP transaction), it stops the test from passing.

I can make it run again with a weird hack like this:

-    mStatus = status;
+
+    if (status == NS_ERROR_NET_PARTIAL_TRANSFER) {
+        LOG(("nsHttpChannel::OnStopRequest: partial, store fake OK\n"));
+        mStatus = NS_OK;
+    }
+    else
+        mStatus = status;

I'm not suggesting this is the correct way forward, I'm just showing what I've found.
I pre-emptively apologise if I have misunderstood what's going on here from only reading the last four comments. I've been following the bug progress on CC list for 7 years but not looking at the code at all.

(In reply to Daniel Stenberg [:bagder] from comment #252)
> I can make it run again with a weird hack like this:

Red flags rising at this point.

> +    if (status == NS_ERROR_NET_PARTIAL_TRANSFER) {
> +        LOG(("nsHttpChannel::OnStopRequest: partial, store fake OK\n"));
> +        mStatus = NS_OK;
> +    }

Red flags waving and alarm bells ringing at this point.

Pretending to the next higher layer that nothing bad happened in the lower layer is what gave rise to this bug in the first place 10 years ago. It looks to me like those 3 lines would preserve this bug for HTML client javascript code just as Mozilla is on the verge of fixing the bug at the file download and image display level.

Now it may be that the test case as currently written doesn't pass with the patch installed, but is that because this fix will be visible to client code in mStatus value, so any web page application JS that has assumed only 100% or 0% transfers in IFRAMEs will be broken by the fixed behaviour? If yes, both legacy app code and this test case would have to be updated, and the test would ensure load is NOT fired? 

In particular, what does firing of the "load" event mean in the HTML DOM spec, and therefore should it fire if any transfer was only partial?

The old DOM Level 2 spec seems quite clear:
 http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-eventgroupings-htmlevents
> " The load event occurs when the DOM implementation finishes loading all content within a document, all frames within a FRAMESET, or an OBJECT element. "

A partial transfer means not all content was loaded, so I would interpret this spec to imply that any NS_ERROR_NET_PARTIAL_TRANSFER in any part of loading the page would prevent .load() from firing, even on an IFRAME. (That's what I guessed above.)

Checking the most recent draft L3 spec from last year it adds an exception:
  http://www.w3.org/TR/DOM-Level-3-Events/#event-type-load
> "A user agent MUST dispatch this event when the DOM implementation finishes loading the resource (such as the document) and any dependent resources (such as images, style sheets, or scripts). Dependent resources that fail to load MUST NOT prevent this event from firing if the resource that loaded them is still accessible via the DOM."

Well in this case the IFRAME object is still accessible from the parent page's DOM, right? So this would mean the .load() would fire on the top Document even if a NS_ERROR_NET_PARTIAL_TRANSFER happened in loading the child IFRAME.

The two specs seem to imply different browser behaviour for the same comms error sequence for the same HTML.

So even after solving the lower level issue of passing on the true value of mStatus, you may even need some code at a higher layer to decide whether Document.load() will be fired based on whether a partial transfer (or zero transfer) occurred in an IFRAME *and* which compatibility/conformance profile is being used to render the page.

I really hope I haven't just confused things further and that the above is helpful.
I suspect that test_partial_content is failing when it tries to set up the partial cache entry. I.e. the cache code stores a partial entry when it gets NS_OK and a content length that != Content-Length. But it doesn't store one when it gets NS_ERROR_NET_PARTIAL_TRANSFER. So the test fails.

I'm not sure how much partial caching really buys us anyway, and this isn't a correctness issue (we'll just download the whole object instead of the remainder). But I suspect it's an easy fix in the cache code, so let's try that.  Assuming I'm right about the cause... :)
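For reference, resuming a partial cache entry instead of re-downloading relies on a standard HTTP byte-range request. A sketch of the headers involved (the header names are standard HTTP; the wrapper function is hypothetical):

```javascript
// Build the headers for resuming a transfer from a known offset. If-Range
// makes the server fall back to a full 200 response when the entity changed,
// so we never glue mismatched halves together.
function buildResumeHeaders(bytesAlreadyHave, entityTag) {
  return {
    'Range': 'bytes=' + bytesAlreadyHave + '-',  // ask for the remainder only
    'If-Range': entityTag,                       // full re-download if changed
  };
}
```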
(In reply to Jason Duell (:jduell) from comment #254)
> I suspect that test_partial_content if failing when it tries to set up the
> partial cache entry. I.e. the cache code stores a partial entry when it gets
> NS_OK and a content length that != Content-Length.   But it doesn't store
> one when is get NS_ERROR_NET_PARTIAL_TRANSFER.  So the test fails.

I'm not so sure about that. The test carries out 2 requests. The sjs checks that the first one is not a byte-range request, but only returns part of the data. This seems to prime the cache as expected. The 2nd request is a  byte-range request for the rest of the data, and the sjs checks this. I don't see any error due to this part. (And I can make the

The problem that I see is to do with the test expecting but not getting a 'load' event being fired from the iframe in which it is loading data. I have a workaround that listens for 'load' on the window object instead, but we should make sure that the lack of these events is expected and desirable before making that change.

I'm going to look into these events later today.
Not a whole lot to add yet. I can see nsHTMLDocument::EndLoad and nsDocument::EndLoad being called for the first response, which has a content-length underrun. They're indirectly called by the nsHtml5StreamParser when it's done parsing. I see them being called with and without the patch, but, of course, no js 'load' event with the patch.

nsDocument::DispatchContentLoadedEvents looks interesting and I'd be looking there next. Daniel, maybe you could try looking at that code first. If you're stuck, maybe someone else looking at the bug can help.
(In reply to Steve Workman [:sworkman] from comment #255)

> The problem that I see is to do with the test expecting but not getting a
> 'load' event being fired from the iframe in which it is loading data. I have
> a workaround that listens for 'load' on the window object instead, but we
> should make sure that the lack of these events is expected and desirable
> before making that change.

The need for having the load event on the window object and not on the iframe seems to me to be perfectly consistent and aligned with what Andrew says (and the specs he links to) in comment 253.
If a load event fires on the window in an iframe, a load event will also fire on the iframe.  If that's not happening, there's a bug somewhere
(In reply to Daniel Stenberg [:bagder] from comment #257)
> The need for having the load event on the window object and not on the
> iframe seems to me to be perfectly consistent and aligned with what Andrew
> says (and the specs he links to) in comment 253.

Cool - yes, no load on the embedded iframe, but a load for the main window/document.

(In reply to Boris Zbarsky [:bz] from comment #258)
> If a load event fires on the window in an iframe, a load event will also
> file on the iframe.  If that's not happening, there's a bug somewhere

Ah, let me clarify, sorry: it's not the iframe's window on which the load event is firing; it's the owning window.

So, given all this, changing the test to listen for load on the main document window is ok. I have done this and the test passes.
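The observed behavior can be captured in a minimal model (an assumption-level sketch, not Gecko internals or the real test file): a truncated frame suppresses the iframe element's own 'load' event, while the owning window's 'load' still fires.

```javascript
// Model of which 'load' events fire for a frame whose content ended with the
// given status. The owning document's load always completes; the <iframe>
// element's own load fires only for a complete transfer.
function firedLoadTargets(frameStatus) {
  const targets = ['window'];
  if (frameStatus === 'NS_OK') {
    targets.push('iframe');
  }
  return targets;
}
```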

In response to Jason's comment #254, which queries whether partial data should be cached: Pat and Honza have confirmed that resumable partial data is cache-able. Resumable means byte-ranges are supported, and this test case supports them, so we're good.

Try run of Daniel's patch and the amended test patch:
https://tbpl.mozilla.org/?tree=Try&rev=f1ee7285ebd4
Attachment #8439535 - Flags: review?(mcmanus)
Attachment #8439535 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
Updating commit line with r=mcmanus.
Attachment #8439535 - Attachment is obsolete: true
Attachment #8440096 - Flags: review+
How does this affect XMLHttpRequest, streamability of CSS, etc?
(In reply to Anne (:annevk) from comment #262)
> How does this affect XMLHttpRequest, streamability of CSS, etc?

If they get partial content they might be affected, otherwise not.

Code paths that really want to handle a partial transfer in some distinct way other than treating it as one of the other possible network errors need additional logic.

A partial transfer being a transfer that has promised to deliver more data than what was actually delivered before it was closed.

Is there any particular use case you think could be affected?
What concerned me was your blog post and respecting Content-Length. Most of the stuff we have today assumes that if Content-Length was wrong, we don't need to clean up what we got so far.
A partial error is actually not that common so first of all this really shouldn't trigger very often.

In my perception, the biggest pain this has caused is what the subject of this bug is: the download manager thinking a download worked fine even though it was cut off before the whole transfer was done.

I would like to view this from the other end: this change helps us all to not make wrong assumptions about partially received content. Incomplete data is now clearly signaled. This change should not add any more need for cleanups than before. Unless I'm missing something.
I don't really care about frequency.

So it's only a signal, down-level APIs can decide whether to use or discard that information? That seems fine and would match the model described by Fetch.
(In reply to Anne (:annevk) from comment #266)

> So it's only a signal, down-level APIs can decide whether to use or discard
> that information?

Yes
(In reply to Daniel Stenberg [:bagder] from comment #263)
> If they get partial content they might be affected, otherwise not.
> 
> Code paths that really want to handle a partial transfer in some distinct
> way other than treating it as one of the other possible network errors need
> additional logic.
> 
> A partial transfer being a transfer that has promised to deliver more data
> than what was actually delivered before it was closed.
> 
> Is there any particular use case you think could be affected?

Hope this can help fixing bug 976608
https://hg.mozilla.org/mozilla-central/rev/bb7ae1cc7789
https://hg.mozilla.org/mozilla-central/rev/2707ae2da0eb
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Depends on: 1027353
Added in the release notes with the wording:
"Fix bad downloads by detecting broken HTTP1.1 transfers (237623)"
(better wording would be appreciated)
"Fix incomplete downloads being marked as complete by detecting broken HTTP1.1 transfers (bug 237623)"? (expanding on what 'bad' means here)

I'm not 100% on what the new behavior here is though: does a broken transfer now just properly cause a download to fail, or does the download manager also attempt to retry it?
Much better. Updated. Thanks :
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #272)

> I'm not 100% on what the new behavior here is though: does a broken transfer
> now just properly cause a download to fail, or does the download manager
> also attempt to retry it?

previously a certain type of broken transfer could be reported as ok in the download manager. now it is properly reported as broken and it can be resumed if the user clicks the resume button. (there was no resume button before because it was incorrectly perceived as ok).
I wonder myself if the release notes are correct.

https://www.mozilla.org/en-US/firefox/33.0a2/auroranotes/
lists this bug as fixed

https://www.mozilla.org/en-US/firefox/33.0beta/releasenotes/
lists this bug as unresolved
(In reply to karl155 from comment #276)
> I wonder myself if the release notes are correct.
> 
> https://www.mozilla.org/en-US/firefox/33.0a2/auroranotes/
> lists this bug as fixed
> 
> https://www.mozilla.org/en-US/firefox/33.0beta/releasenotes/
> lists this bug as unresolved

Still listed as unresolved in 33.0beta
Flags: needinfo?(sledru)
Depends on: 1079136
That is a bug in our release notes interface. I reported it as bug 1079136.

> Still listed as unresolved in 33.0beta
Yes and no. The tag is unresolved but the line under it shows "Resolved in v33.0a2"
Flags: needinfo?(sledru)
Depends on: 1083090
Depends on: 1083740
Depends on: 1084395
Depends on: 1085094
This 'fix' has 'broken' a lot of web sites.

Older versions of Apache put the uncompressed length in the Content-Length header (instead of the compressed length).
In particular this causes style sheets to be ignored and pages badly rendered.
Re. comment 281 - I would complain to those sites.  Then again, how hard would it be to add a right-click menu option, "Reload this page ignoring file sizes in headers"?

If you file a new bug, please tell us about it here.
Depends on: 1088850
Depends on: 1088940
How about checking both the compressed and uncompressed size?
reopened based on 1088850

(In reply to Peter da Silva from comment #284)
> How about checking both the compressed and uncompressed size?

that's only a subset of the problem. A large number of PDF issues are traced to invalid chunked encodings.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
So how about that we, in the "relaxed" state (default) add two checks:

1) if gzipped content-encoding is present: we consider an exact match for the "other size" to also be fine if the transfer ended. There's a small risk of false positives and we only detect it at the end of an HTTP transaction. If the other size doesn't match either, it is a partial error.

2) when the transaction ends, we consider leftover non-received bytes in the last chunk to be an error. That'll make us be fine with responses just missing a chunk (it seems to be most commonly the final zero sized one) but still detect most/many premature disconnects.

However, given how this area is problematic to get tested properly in real life, would it make sense to introduce these behind some additional pref checks or similar so that they can be switched on/off in case of need?

BTW, the check for #2 above can probably be made as simple as this:

--- a/netwerk/protocol/http/nsHttpTransaction.cpp
+++ b/netwerk/protocol/http/nsHttpTransaction.cpp
@@ -893,11 +893,12 @@ nsHttpTransaction::Close(nsresult reason)
         (NS_SUCCEEDED(reason) && !mResponseIsComplete)) {
 
         NS_WARNING("Partial transfer, incomplete HTTP response received");
 
         if ((mHttpVersion >= NS_HTTP_VERSION_1_1) &&
-            gHttpHandler->GetEnforceH1Framing()) {
+            (gHttpHandler->GetEnforceH1Framing() ||
+             (mChunkedDecoder && mChunkedDecoder->GetChunkRemaining()) ) ) {
             reason = NS_ERROR_NET_PARTIAL_TRANSFER;
             LOG(("Partial transfer, incomplete HTTP response received: %s",
                  mChunkedDecoder ? "broken chunk" : "c-l underrun"));
         }
(In reply to Daniel Stenberg [:bagder] from comment #286)
> So how about that we, in the "relaxed" state (default) add two checks:
> 
> 1) if gzipped content-ecoding is being present: we consider an exact match
> for the "other size" to also be fine if the transfer ended. There's a small
> risk for false positives and we only detect it at the end of an HTTP
> transaction. If the other size doesn't match either, it is a partial error.
> 

Actually, we can probably consider it OK if either we read the right number of bytes (to match the CL) or the zlib decoder gives Z_STREAM_END.. we don't need to try and match the mismatched lengths as the deflate has its own framing.
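
The Z_STREAM_END idea can be illustrated with a minimal sketch (Python for illustration only; its zlib module surfaces Z_STREAM_END as the decompressor's `.eof` flag, while Necko uses zlib from C++, so this just demonstrates the principle that deflate has its own framing):

```python
import zlib

payload = b"x" * 1000
compressed = zlib.compress(payload)

# Complete stream: the decompressor reaches the deflate framing's own end
# marker (zlib's Z_STREAM_END, exposed as .eof here).
complete = zlib.decompressobj()
complete.decompress(compressed)

# Truncated stream: no Content-Length comparison needed; the framing itself
# shows the stream never reached its end marker.
truncated = zlib.decompressobj()
truncated.decompress(compressed[: len(compressed) // 2])

print(complete.eof, truncated.eof)  # True False
```

So even when Content-Length lies, a cleanly ended compressed stream can be told apart from a cut-off one.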

> 2) when the transaction ends, we consider leftover non-received bytes in the
> last chunk to be an error. That'll make us be fine with responses just
> missing a chunk (it seems to be most commonly the final zero sized one) but
> still detect most/many premature disconnects.
> 

I think the problem with the assertion about the missing last chunk is that we only have evidence it applies to the false positives that gave us fits (i.e. the pdfs). So I agree it would keep us away from the interop problems - but we don't really have any evidence to suggest that this will catch the kind of real failure that 237623 is trying to address. Nonetheless its conservative, and that's good.
How about if the download manager downloads a file, it gets the partial error, and triggers another download. If the size for the second download matches the first, it means there's probably a server side issue, and we should keep the file.

Of course, downloading things twice might be excessive for multi-megabyte files.
(In reply to Valentin Gosu [:valentin] from comment #288)
> How about if the download manager downloads a file, it gets the partial
> error, and triggers another download. If the size for the second download
> matches the first, it means there's probably a server side issue, and we
> should keep the file.
> 
> Of course, downloading things twice might excessive for multi-megabyte files.

That, and it would only be a fix for the case #1 which I believe mcmanus's Z_STREAM_END check is a better work-around for.
How about at least getting the download manager to notify the user when there's an inconsistent download size (as was suggested earlier in this thread) and letting the user decide what to do? This wouldn't impact web pages and would avoid invisible data loss from incomplete downloads.
In addition to comment 290, it would be nice in general to be notified in some way that the server has this kind of problems in general so that they would eventually get fixed. This information should be available at least in some Web Developer component and/or in Page Info window.

It is quite possible that the person doing the development or server management does not even notice that their site is broken otherwise.
Given that we shipped Firefox 33 being unable to download from them, that should give them some idea they have a problem, right :-) 

Feel free to file a bug against our WebDev tools; it might be suitable for their network component. But such servers are thankfully a minority and I'm not sure we want to or should detect and warn for every possible broken server config in the browser.
No longer blocks: 1008091
Here's my plan to address this issue. I want to make the checks stricter but still allow some common protocol violations.

In particular I want to allow two flaws that have proven to be somewhat common in the wild and were the reasons for the previous fix being backed out again:

A - HTTP chunked responses that lack the last 0-sized chunk.

B - HTTP gzipped responses where the Content-Length is not the same as the size of the actual contents.

So, in order to allow A + B and yet be able to detect prematurely cut off transfers I want to:

1 - Detect incomplete chunks when the transfer has ended. So, if a chunked-encoded transfer ends exactly on a chunk boundary we consider that fine. Good: this will allow case (A) to be considered fine. Bad: it will make us not detect a certain amount of cut-offs.

2 - When receiving a gzipped response, we consider a gzip stream that doesn't end fine according to the gzip decompressing state machine to be a partial transfer. IOW: if a gzipped transfer ends fine according to the decompressor, we do not check for size unalignments. This allows case (B) as long as the content could be decoded.

3 - When receiving HTTP that isn't content-encoded/compressed (like in case 2) and not chunked (like in case 1), perform the size comparison between Content-Length: and the actual size received and consider a mismatch to mean a NS_ERROR_NET_PARTIAL_TRANSFER error.

Not decided yet: what to do with the pref previously added to enable the strict check, "network.http.enforce-framing.http1". Right now I keep the pref set to false by default to behave as described above, while setting it to strict will make it act like the previous version of my patch, which is: strict checking for content-length == response-size and, if chunked-encoded, checking that the stream ended with a proper 0-sized chunk.

Stay tuned for the first version of the patches for 1 - 3.
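A compact sketch of the decision logic in the plan above (Python for illustration; the helper and its parameter names are hypothetical, not Necko's actual API):

```python
# Hypothetical helper mirroring checks 1-3 above: decide whether a finished
# response should be flagged as NS_ERROR_NET_PARTIAL_TRANSFER under the
# "relaxed" framing rules. All names here are illustrative.
def is_partial_transfer(chunked, chunk_bytes_remaining,
                        content_encoded, decoder_reached_end,
                        content_length, bytes_received):
    if chunked:
        # Check 1: a missing final 0-sized chunk is tolerated; a chunk
        # cut off mid-body is a partial transfer.
        return chunk_bytes_remaining > 0
    if content_encoded:
        # Check 2: trust the compression framing instead of Content-Length.
        return not decoder_reached_end
    # Check 3: plain responses must match Content-Length exactly.
    return content_length is not None and bytes_received != content_length
```

For example, a plain response with Content-Length 100 that delivered only 80 bytes would be flagged, while a gzipped one whose stream decoded to a clean end would not, regardless of the advertised length.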
When doing chunked decoding, detect and return error for a cut off last chunk.
Attachment #8559098 - Flags: review?(mcmanus)
If the gzip decompression hasn't ended fine when the HTTP request is stopped, consider it partial.
Attachment #8559100 - Flags: review?(mcmanus)
reject "unaligned" (Content-Length is not the same size as has been received) HTTP transfers that are not content-decoded and not chunked-encoded
Attachment #8559101 - Flags: review?(mcmanus)
Comment on attachment 8559098 [details] [diff] [review]
0001-bug-237623-check-for-incomplete-chunks-on-transactio.patch

Review of attachment 8559098 [details] [diff] [review]:
-----------------------------------------------------------------

the whole patchset needs to be controllable via pref
Attachment #8559098 - Flags: review?(mcmanus) → review+
Comment on attachment 8559100 [details] [diff] [review]
0002-bug-237623-check-for-incomplete-gzip-on-transaction-.patch

Review of attachment 8559100 [details] [diff] [review]:
-----------------------------------------------------------------

the whole patchset needs to be controllable via pref

::: netwerk/streamconv/converters/nsHTTPCompressConv.cpp
@@ +87,5 @@
>  NS_IMETHODIMP
>  nsHTTPCompressConv::OnStopRequest(nsIRequest* request, nsISupports *aContext, 
>                                    nsresult aStatus)
>  {
> +    if (!mStreamEnded) {

if (!mStreamEnded && NS_SUCCEEDED(aStatus)) {
Attachment #8559100 - Flags: review?(mcmanus) → review+
Comment on attachment 8559101 [details] [diff] [review]
0003-bug-237623-reject-unaligned-HTTP-transfers-that-are-.patch

Review of attachment 8559101 [details] [diff] [review]:
-----------------------------------------------------------------

I think it's ok to morph EnforceH1Framing to mean enforce this new algorithm (which is a slight weakening) and then flip the pref to true. That's up to you, if you want that or a new pref, but the new path definitely needs a pref-based disabling mechanism.

::: netwerk/protocol/http/nsHttpTransaction.h
@@ +146,5 @@
>          mPushedStream = nullptr;
>          return r;
>      }
>      void SetPushedStream(Http2PushedStream *push) { mPushedStream = push; }
> +    void SetContentDecoding(bool decode) { mContentDecoding = decode; }

I think you can figure out mContentDecoding inside nsHttpTransaction without creating this method and having nsHttpChannel do it for you.. maybe the same place we make the pushback() decision

@@ +282,5 @@
>      bool                            mResponseTimeoutEnabled;
>      bool                            mDontRouteViaWildCard;
>      bool                            mForceRestart;
>      bool                            mReuseOnRestart;
> +    bool	                        mContentDecoding;

indentation
Attachment #8559101 - Flags: review?(mcmanus)
Thanks for the review. Did you mean something like this?

(and yes, I'll make another iteration and make the set use a pref)
Attachment #8559101 - Attachment is obsolete: true
Attachment #8559810 - Flags: review?(mcmanus)
Added a new pref for this. This is called "network.http.enforce-framing.soft" and is set true by default.

I made it this way so that the previous "network.http.enforce-framing.http1" pref still overrides this new pref if enabled, and this variable sets a middle strictness level.
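
The layering could be sketched like this (FRAMECHECK_STRICT and FRAMECHECK_BARELY match the enum names from the review comments; FRAMECHECK_LAX is an assumed name for the default level, and the pref-reading API here is a simplified stand-in, not Mozilla's):

```python
# Three strictness levels; only the first two names appear in the reviewed
# patch, FRAMECHECK_LAX is an assumed name for "no enforcement".
FRAMECHECK_LAX, FRAMECHECK_BARELY, FRAMECHECK_STRICT = range(3)

def framing_mode(prefs):
    # The old pref overrides the new one when enabled.
    if prefs.get("network.http.enforce-framing.http1", False):
        return FRAMECHECK_STRICT
    # The new pref defaults to true, giving the middle strictness level.
    if prefs.get("network.http.enforce-framing.soft", True):
        return FRAMECHECK_BARELY
    return FRAMECHECK_LAX
```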
Attachment #8559875 - Flags: review?(mcmanus)
With this additional patch that updates two tests as well to use the new pref, I get the following try-run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce6c67a8cf1e

Some oranges I can't see happen because of my patch series.
Attachment #8561281 - Flags: review?(mcmanus)
daniel - can you mark old patches on this bug as obsolete. I'm getting confused :)

(its under details)
Comment on attachment 8559810 [details] [diff] [review]
v2-0003-bug-237623-reject-unaligned-HTTP-transfers-that-a.patch

Review of attachment 8559810 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +913,5 @@
>  
>          if ((mHttpVersion >= NS_HTTP_VERSION_1_1) &&
>              (gHttpHandler->GetEnforceH1Framing() ||
> +             (mChunkedDecoder && mChunkedDecoder->GetChunkRemaining()) ||
> +             (!mChunkedDecoder && !mContentDecoding) ) ) {

!mChunkedDecoder && !mContentDecoding && mContentDecodingCheck  (maybe?)

@@ +1742,5 @@
>          }
> +
> +        if (!mContentDecodingCheck && mResponseHead) {
> +            mContentDecoding =
> +                (mResponseHead->PeekHeader(nsHttp::Content_Encoding) != nullptr);

write as !!foo not foo != nullptr
Attachment #8559810 - Flags: review?(mcmanus) → review+
Comment on attachment 8559875 [details] [diff] [review]
v3-0004-bug-237623-use-network.http.enforce-framing.soft-.patch

Review of attachment 8559875 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/init/all.js
@@ +1348,5 @@
>  
>  pref("network.http.tcp_keepalive.long_lived_connections", true);
>  pref("network.http.tcp_keepalive.long_lived_idle_time", 600);
>  
>  pref("network.http.enforce-framing.http1", false);

comment that this one should really be named hard

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1537,4 @@
>          rv = prefs->GetBoolPref(HTTP_PREF("enforce-framing.http1"), &cVar);
> +        if (NS_SUCCEEDED(rv) && cVar) {
> +            mEnforceH1Framing = FRAMECHECK_STRICT;
> +        }

one line for } else {

@@ +1542,5 @@
> +            rv = prefs->GetBoolPref(HTTP_PREF("enforce-framing.soft"), &cVar);
> +            if (NS_SUCCEEDED(rv) && cVar) {
> +                mEnforceH1Framing = FRAMECHECK_BARELY;
> +            }
> +            else {

one line
Attachment #8559875 - Flags: review?(mcmanus) → review+
Attachment #8561281 - Flags: review?(mcmanus) → review+
Attachment #8436705 - Attachment is obsolete: true
Attached patch gzip-check-use-pref-too.patch (obsolete) (deleted) — Splinter Review
Thanks for all the reviews Patrick!

It struck me that I didn't make the check for a clean gzip-stop use the new pref in my patch, so I wanted to amend the 0003-patch to do that. This is my attempt. I'm not entirely happy with reading out the http framing prefs from within the nsHTTPCompressConv code, but I also couldn't come up with a much better solution. What do you think? Do you have a better suggestion?

(This patch file is done on top of the 0003 one and I made it this delta only to make it easier to read and discuss.)
Attachment #8562023 - Flags: feedback?(mcmanus)
Comment on attachment 8562023 [details] [diff] [review]
gzip-check-use-pref-too.patch

Review of attachment 8562023 [details] [diff] [review]:
-----------------------------------------------------------------

can you actually just do a little random web browsing and ensure that this is main thread normally?

::: netwerk/streamconv/converters/nsHTTPCompressConv.cpp
@@ +70,5 @@
>  
>      else if (!PL_strncasecmp(aFromType, HTTP_DEFLATE_TYPE, sizeof(HTTP_DEFLATE_TYPE)-1))
>          mMode = HTTP_COMPRESS_DEFLATE;
>  
> +    mFailUncleanStops =

what do you think of?

mFailUncleanStops = false; // actually I would put this in the ctor
if (NS_ISMainThread()) {
 mFailUncleanStops = (Preferences::GetBool("network.http.enforce-framing.soft", false) ||
    Preferences::GetBool("network.http.enforce-framing.theonethatshouldbenamedhard", false));
}
Attachment #8562023 - Flags: feedback?(mcmanus) → feedback+
(In reply to Patrick McManus [:mcmanus] from comment #309)

> what do you think of?

Looks good, and yeah putting it in the constructor makes sense.
My patch series squashed into a single one. 73 insertions(+), 21 deletions

I obsoleted all other patches attached to this bug.

Try-run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=69bc13d9d5e3
Attachment #284191 - Attachment is obsolete: true
Attachment #8394104 - Attachment is obsolete: true
Attachment #8440096 - Attachment is obsolete: true
Attachment #8559098 - Attachment is obsolete: true
Attachment #8559100 - Attachment is obsolete: true
Attachment #8559810 - Attachment is obsolete: true
Attachment #8559875 - Attachment is obsolete: true
Attachment #8561281 - Attachment is obsolete: true
Attachment #8562023 - Attachment is obsolete: true
Attachment #8562776 - Flags: review+
Severity: normal → critical
Status: REOPENED → ASSIGNED
Keywords: dataloss
How about taking a step back and fixing the actual download manager problem, where the download manager actually has enough information to determine that a download has failed, yet throws that information away and presents a successful download to the user? Worry about whether it's even practical to deal with all the weird edge cases that may or may not be issues for live content later.
(In reply to Peter da Silva from comment #318)
>
> where the download manager actually has enough information to determine that

you misunderstand the patches. Currently the download manager does not have this information - the patches are designed to provide it.
Keywords: checkin-needed
In the most common visible failure mode the download manager has enough information. It has the correct size of the download, and displays the correct size of the download during the download. The server or a proxy then terminates the download early, without crashing (eg, the server process terminates and the socket is closed normally during process cleanup, or the server crashes and the proxy closes the connection normally). The download manager then discards the original size and updates the display to show the download as having completed successfully.

This problem could be resolved _right now_ (or could have been resolved at any time in the past decade and change) without waiting for lower level fixes or breaking working web pages.
>In the most common visible failure mode the download manager has enough information.

From what I understand it's not only the failure modes, it's also the modes that work correctly now but *would break* with your proposal because the server gives the wrong length.
There are many possible user interface decisions that would avoid false positives being a problem. For example, a warning icon could be presented that would let the end user decide whether the download was successful or not, and accept it or request a retry.
In addition, the most likely false positives seen in the previous fix attempt rarely if ever apply to downloads, because downloads are rarely if ever dynamically compressed by the server and decompressed by the browser, since most download formats are already compressed.
Target Milestone: mozilla33 → ---
https://hg.mozilla.org/mozilla-central/rev/7abb1cb9739e
Status: ASSIGNED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1133156
backed it out for unintended regression on css transfer

https://bugzilla.mozilla.org/show_bug.cgi?id=1133156#c3
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I don't get it.  Why can't we leave the fix in place and just have the Hallmark users set the preference?
Because most users wouldn't know where to look, and would either complain or (more likely) switch to another browser. In addition, Hallmark may simply be a popular example of a more widespread problem.
Guys... you know... it only landed on Nightly,
which is pre-alpha version for testers and should be used for testing purposes only.
So IMO backing this out was an exaggeration,
especially when you can disable it and it's not causing any crashes.
Fixing the underlying problem is hard and may not be generally solvable. That was clear the first time it had to be backed out.

Modifying the download manager to let the end user know when downloads _may_ be incomplete is straightforward and easily understood.
[1] is another example of a website that breaks with network.http.enforce-framing.soft set to true, so it's not just Hallmark. That being said, I think the patch could have stayed in with the default flipped to false.

[1] http://forum.bioware.com/topic/529838-a-collection-of-tweaks-and-fixes-for-the-pc-version/
Depends on: 1133193
Depends on: 1133239
First I'll sob for a bit then I'll analyze these sites and see what made them end up as partial transfers, and come up with a new route forward.
(In reply to Virtual_ManPL [:Virtual] from comment #329)

> So IMO backing this out was an exaggeration,
> especially when you can disable it and it's not causing any crashes.

I tried disabling it, but it failed at least one test that way.. so the disabling patch was going to have to be more than a pref change.. made more sense just to do a clean backout at that point.
The part of the patch that checks that the nsHTTPCompressConv state machine ends cleanly, or otherwise clearly reports an error, is too trigger-happy. Removing that little piece seems to be what I need to do to make the sites that reportedly broke show up nicely again...
(In reply to Patrick McManus [:mcmanus] from comment #333)
> (In reply to Virtual_ManPL [:Virtual] from comment #329)
> 
> > So IMO backing this out was an exaggeration,
> > especially when you can disable it and it's not causing any crashes.
> 
> I tried disabling it, but it failed at least one test that way.. so the
> disabling patch was going to have to be more than a pref change.. made more
> sense just to do a clean backout at that point.

If that was the whole story then I have to agree,
as before I was only thinking it was backed out because a single site didn't render properly.

(In reply to Patrick McManus [:mcmanus] from comment #326)
> backed it out for unintended regression on css transfer
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1133156#c3

Was it really backed out properly, as I'm still seeing a new preference "network.http.enforce-framing.soft" in about:config?
(In reply to Patrick McManus [:mcmanus] from comment #333)

> I tried disabling it, but it failed at least one test that way.. so the
> disabling patch was going to have to be more than a pref change.. made more
> sense just to do a clean backout at that point.

Do you remember what test that still failed even with the pref set to false?
(In reply to Virtual_ManPL [:Virtual] from comment #335)

> 
> Was it really backed out properly, as I'm still seeing a new preference
> "network.http.enforce-framing.soft" in about:config?

I think there just hasn't been a merge from inbound to central all weekend.
(In reply to Daniel Stenberg [:bagder] from comment #336)
> (In reply to Patrick McManus [:mcmanus] from comment #333)
> 
> > I tried disabling it, but it failed at least one test that way.. so the
> > disabling patch was going to have to be more than a pref change.. made more
> > sense just to do a clean backout at that point.
> 
> Do you remember what test that still failed even with the pref set to false?

sorry - no. it was local not try so I don't have a history to go back to.
Depends on: 1133559
This new version of the patch no longer checks the gzip decompress state as it was clearly an unreliable method. I've tested this version of the patch on all those sites reported to break with the former version and they all render fine now.

The downside with this approach is that it doesn't detect prematurely cut off gzip-encoded transfers anymore, but I cannot figure out a reliable way to do that. :-/

Comments on this?
Attachment #8562776 - Attachment is obsolete: true
(In reply to Daniel Stenberg [:bagder] from comment #339)
> Comments on this?

It's been a while since I commented on this bug, but I took a look at the code and while I understand what it does, I believe that in its current form it will be difficult to read for anyone returning in a few months, including myself.

While source code comments can always be tweaked, I would more simply use a different approach. I'd expose a property on the channel giving access to an interface showing the potential issues (names are just examples):

httpChannel.decodingState == {
  incompleteChunk: false,
  missingLastChunk: true,
  badGzipState: false,
  ...
};

This can be done independently and land in another bug. We could always dump this information as part of the HTTP log. In fact, the logging in the current patch does not tell us exactly why we failed, and does not occur in case we didn't fail but observed an issue.

I'd use this bug to define which combinations would cause us to:
 - Raise an error for partial transfer, and/or
 - Reset the connection state (for new requests)
Since we're checking the appropriately named properties, what we're doing and why would become crystal clear.

This is all we need, but then we can file new bugs for the future:
 - Expose the information in the Network panel of Developer Tools.
 - Present ignorable warnings for some other cases in the Downloads Panel.
 - Provide more details in JavaScript exception messages for failed transfers.

(To be clear, the first two follow-ups require much more effort than what would appear intuitively, in particular changing what is displayed in the Downloads Panel for the Application Reputation checks is requiring multiple calendar months.)

I think the advantage of this approach is that it gives us better debuggability from the start, and better clarity. It will be much simpler to tweak things in the future when, for example, there will be less servers sending the wrong Content-Length for gzip-encoded transfers.
(In reply to Daniel Stenberg [:bagder] from comment #339)
> Created attachment 8565393 [details] [diff] [review]
> v6-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch
> 
> This new version of the patch no longer checks the gzip decompress state as
> it was clearly an unreliable method. I've tested this version of the patch
> on all those sites reported to break with the former version and they all
> render fine now.
> 
> The downside with this approach is that it doesn't detect prematurely cut
> off gzipped-encoded transfers anymore, but I cannot figure out a reliable to
> do that. :-/
> 
> Comments on this?

Your gzip idea is probably still sound... the first thing you need to determine in more detail is why it's failing the check but producing valid WordPress CSS.

I'm no compression expert, but I can think of two possible reasons for that:

1. How permissive is Firefox's decompressor? Maybe the site's producing a stream that's invalid but only because it's missing some kind of "we're done" end-cap. (eg. maybe it's not generating the 8-byte GZip footer containing the CRC32 checksum and uncompressed content length and/or maybe the last DEFLATE block has an incorrect "last block in stream?" bit)

2. Maybe you're checking it too early and it becomes valid at some later point before the WordPress CSS or whatever is processed.

I'd need to think more on the former case (does the API let you determine whether the truncation was at a DEFLATE block boundary?) and I have no experience with Firefox internals but, for the latter case, would something like this be possible?

1. The failed approach is modified so that, instead of reporting failure, the gzip check is deciding between "success" and some sort of "decision pending" behaviour.
2. "Decision pending" is resolved to pass/fail at whatever later point the browser currently uses to determine whether the WordPress CSS *actually* works.
(In reply to Daniel Stenberg [:bagder] from comment #339)
> Created attachment 8565393 [details] [diff] [review]
> v6-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch
> 
> This new version of the patch no longer checks the gzip decompress state as
> it was clearly an unreliable method. I've tested this version of the patch
> on all those sites reported to break with the former version and they all
> render fine now.
> 
> The downside with this approach is that it doesn't detect prematurely cut
> off gzip-encoded transfers anymore, but I cannot figure out a reliable way
> to do that. :-/
> 
> Comments on this?

Looking at the broken examples in bug 1133156 and bug 1133559, both of those serve the file with "Content-Encoding: deflate", but don't serve a standard body. This Stack Overflow answer helped me understand this: http://stackoverflow.com/a/1579506

1. Content-Encoding: gzip is meant to be DEFLATE-compressed complete with a gzip header and footer.
2. Content-Encoding: deflate is meant to be DEFLATE-compressed complete with zlib header and footer.

3. Content-Encoding: deflate (broken) is sometimes (incorrectly) DEFLATE-compressed with no header or footer.

The 3rd option is what you get from the PHP gzdeflate method (used at http://one.stanke.cz/deflate_bugtest/). The gzcompress method would generate the correct zlib header and footer for "deflate", or the gzencode method would generate the correct gzip header and footer for "gzip".

The 3rd option is what I think is being worked-around here: http://hg.mozilla.org/mozilla-central/annotate/008aa6494f90/netwerk/streamconv/converters/nsHTTPCompressConv.cpp#l214

"insert a dummy header and try again".

Perhaps in that block you could set a flag indicating that this is a broken (raw-without-metadata) deflate stream, and not perform the strictness check in OnStopRequest? The examples fail the strictness test presumably because they don't have a footer (missing checksum).

This can be seen with the gzip tool if I do the following:

1. curl --verbose --header "Accept-Encoding: deflate" --output parent.rawdeflate.html https://0920145h.index-education.net/pronote/parent.html?login=true
2. printf "\x78\x01" | cat - parent.rawdeflate.html > parent.zlibheader.html
3. ./zpipe -d < parent.zlibheader.html

1) Downloads the file and says we accept "deflate" encoding
2) Prepends the same dummy zlib header that Firefox does
3) Decompresses the resulting file using http://www.zlib.net/zpipe.c

The contents (HTML) are correctly decompressed, but the output shows a warning at the end: zpipe: invalid or incomplete deflate data

This is what I think the problem is (there is no zlib footer). Therefore, in the case of having prepended a dummy header, do not perform the strictness check.
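The curl/zpipe experiment above can be reproduced entirely with Python's zlib module (this is my illustration of the failure mode, not code from the patch): compress a body the way PHP's gzdeflate does (raw DEFLATE, negative wbits), prepend the dummy zlib header, and observe that the content decompresses fine while the stream never reaches a clean end.

```python
import zlib

# 1. What PHP's gzdeflate emits: a raw DEFLATE stream, no zlib header/footer.
#    (negative wbits = raw DEFLATE in both C zlib and the Python binding)
c = zlib.compressobj(wbits=-15)
body = c.compress(b"<html>hello</html>") + c.flush()

# 2. Prepend the same dummy zlib header (0x78 0x01) that Firefox inserts.
patched = b"\x78\x01" + body

# 3. Decompress as a zlib stream.
d = zlib.decompressobj()
out = d.decompress(patched)

# The content comes out correctly, but d.eof stays False: the Adler-32
# footer never arrives, so zlib considers the stream "invalid or
# incomplete" -- the same warning zpipe prints.
```

Here `out` equals the original HTML while `d.eof` is False, which is exactly why a dummy-header stream can never pass a strict end-of-stream check even when fully delivered.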
(In reply to Daniel Cater from comment #342)

> This is what I think the problem is (there is no zlib footer). Therefore, in
> the case of having prepended a dummy header, do not perform the strictness
> check.

Yes, it clearly seems to be the case. But then that begs the question:

Is the lack of a header really going to be a reliable hint that it is fine to not have the footer? I mean, what says there aren't a bunch of sites out there with a header but without the footer?

I'll do some tests of my own and I'll post a new patch for this soon to allow you (all) to comment and try out as well.
Okay, this is v7 and I left v6 around for reference. This version is close to the v5 that was the backed out version. This now checks that the gzip decoding stopped fine by checking two things:

1. It reached the end of the compressed stream with zlib signaling the end of it

but

2. The state machine reaches the end of a deflate block.

If it doesn't stop fine, it returns a PARTIAL TRANSFER error.

It works with all the failed cases that v5 broke and it also seems to properly detect broken transfers in my local test set.
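A rough Python analogue of check 1 above (an assumption on my part: Python's `decompressobj.eof` flag corresponds to C zlib returning Z_STREAM_END) looks like this. Note that Python's binding, like the usage described later in this thread, does not expose DEFLATE block boundaries, so check 2 has no direct analogue here.

```python
import zlib

def decode_stopped_fine(body: bytes) -> bool:
    """Did zlib signal end-of-stream by the time the body ended?"""
    d = zlib.decompressobj(wbits=47)  # 47 = auto-detect zlib or gzip header
    try:
        d.decompress(body)
    except zlib.error:
        return False
    return d.eof  # False would map to a PARTIAL TRANSFER error
```

For a well-formed zlib stream this returns True; chop off the 4-byte Adler-32 trailer and it returns False.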
(In reply to :Paolo Amadini from comment #340)

> While source code comments can always be tweaked, I would more simply use a
> different approach. I'd expose a property on the channel giving access to an
> interface showing the potential issues (names are just examples):

I can't object to that Paolo. It would be a good idea, but I would *really* prefer to do that separately from actually landing this bug. I still have this goal in life to have this issue closed and be done with before it turns 11 years old... :-)
(In reply to Daniel Stenberg [:bagder] from comment #345)
> (In reply to :Paolo Amadini from comment #340)
> 
> > While source code comments can always be tweaked, I would more simply use a
> > different approach. I'd expose a property on the channel giving access to an
> > interface showing the potential issues (names are just examples):
> 
> I can't object to that Paolo. It would be a good idea, but I would *really*
> prefer to do that separately from actually landing this bug. I still have
> this goal in life to have this issue closed and be done with before it turns
> 11 years old... :-)

That's a worthy goal. What can I say, let's hope this time the solution works everywhere!
(In reply to Daniel Stenberg [:bagder] from comment #344)

> Okay, this is v7 and I left v6 around for reference. This version is close
> to the v5 that was the backed out version. This now checks that the gzip
> decoding stopped fine by checking two things:

Turns out this version effectively cannot detect prematurely cut off deflate streams since it'll always reach the boundary = true condition. I'll need to do it differently.
(In reply to Daniel Stenberg [:bagder] from comment #343)
> Is the lack of a header really going to be a reliable hint that it is fine
> to not have the footer? I mean, what says there aren't a bunch of sites out
> there with a header but without the footer?

Unless there is any evidence that sites are doing this, then I don't think you need to handle that case.

What there is evidence of, is sites sending raw DEFLATE streams with "Content-Encoding: deflate" when actually they should be sending zlib streams. This is the case when using PHP's gzdeflate, and apparently older versions of IIS.

I think this case can be detected and ignored (at the point where the dummy header is inserted), whilst still providing stricter checks for valid deflate (zlib) streams which may get cut-off prematurely.
Adjusted again.

zlib doesn't really expose the deflate blocks from the stream the way we use it, so I could not detect when we end exactly on a block boundary. That made me opt to simply not try to detect them for deflate streams.

gzip streams are expected to end cleanly so the code detects premature cut-offs for those.

The patch seems to work for my local test cases and the previously reported problems when the patch landed.

Additionally, this patch includes two test cases that verify behavior on a cut-off deflate download and a cut-off gzip download.

The try-run (is in progress as I type this): https://treeherder.mozilla.org/#/jobs?repo=try&revision=424cbe864d98
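The shipped policy, as I read it, can be sketched in a few lines of Python (names and structure are mine, not Firefox's): enforce end-of-stream only for gzip bodies, where a trailer is mandatory, and let deflate bodies through because servers often send raw DEFLATE with nothing to validate.

```python
import gzip
import zlib

def looks_complete(encoding: str, body: bytes) -> bool:
    """Sketch of the v9 framing check: strict for gzip, lenient for deflate."""
    if encoding != "gzip":
        return True  # deflate: no reliable end-of-stream marker to enforce
    d = zlib.decompressobj(wbits=31)  # 31 = expect gzip framing
    try:
        d.decompress(body)
    except zlib.error:
        return False
    return d.eof  # True only once zlib has seen the gzip trailer

full = gzip.compress(b"x" * 4096)
assert looks_complete("gzip", full)                       # intact transfer
assert not looks_complete("gzip", full[: len(full) // 2])  # cut off mid-stream
```

A truncated gzip body fails the check and would be reported as a partial transfer; any deflate body, complete or not, passes.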
Attachment #8565393 - Attachment is obsolete: true
Attachment #8565932 - Attachment is obsolete: true
Comment on attachment 8567087 [details] [diff] [review]
v8-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch

Patrick, any comments on this slightly modified approach? It seems to be safer and yet still catches most cut off transfers. You did review+ it before but I wanted to get your opinion on this version.
Attachment #8567087 - Flags: review?(mcmanus)
let's get a try run with enforce-soft set to false to figure out if I screwed up the local test last time..
Comment on attachment 8567087 [details] [diff] [review]
v8-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch

Review of attachment 8567087 [details] [diff] [review]:
-----------------------------------------------------------------

I appreciate the tests.

Can we get a good comment somewhere.. something like

"framing integrity is enforced for content-encoding: gzip, but not for content-encoding: deflate - note that gzip vs deflate is determined by content sniffing and not necessarily via header." (is that true?)
Attachment #8567087 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #352)

> Can we get a good comment somewhere.. something like
> 
> "framing integrity is enforced for content-encoding: gzip, but not for
> content-encoding: deflate - note that gzip vs deflate is determined by
> content sniffing and not necessarily via header." (is that true?)

Right, I'll make sure to add a comment about this in that spirit.

But no, it is not true. The decompression algorithm is selected strictly based on the response header contents. It can be seen here: https://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#696
This version has an added comment about the gzip/deflate situation.

I also pushed a separate try-run with the soft pref switch to false by default (ie disabled) and it runs as fine as the previous try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c69c3c651a9

I'm ready to bite the bullet again.
Attachment #8567087 - Attachment is obsolete: true
Attachment #8568510 - Flags: review+
Keywords: checkin-needed
sorry this didn't apply cleanly:

renamed 237623 -> v9-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch
applying v9-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch
patching file netwerk/protocol/http/nsHttpTransaction.cpp
Hunk #1 FAILED at 123
1 out of 3 hunks FAILED -- saving rejects to file netwerk/protocol/http/nsHttpTransaction.cpp.rej
patching file netwerk/protocol/http/nsHttpTransaction.h
Hunk #1 FAILED at 282
1 out of 1 hunks FAILED -- saving rejects to file netwerk/protocol/http/nsHttpTransaction.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh v9-0001-bug-237623-introduce-soft-checks-for-HTTP-framing.patch

could you take a look, thanks!
Flags: needinfo?(daniel)
Keywords: checkin-needed
Roger that, here's a rebased version!
Attachment #8568510 - Attachment is obsolete: true
Flags: needinfo?(daniel)
Attachment #8569047 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/51a3652ce2ec
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
http://mirror.nexcess.net/ubuntu-releases/14.04.2/ubuntu-14.04.2-desktop-amd64.iso would make a good test. I downloaded it with Firefox and what do you know, it was incomplete. Downloading it again with wget and it's reporting premature closes and restarts on a regular basis.
(In reply to Peter da Silva from comment #359)
> http://mirror.nexcess.net/ubuntu-releases/14.04.2/ubuntu-14.04.2-desktop-
> amd64.iso would make a good test. I downloaded it with Firefox and what do
> you know, it was incomplete.

I presume you were using a Firefox version that doesn't have this fix, meaning < 39? 39 and later should notice the breakage and you can click retry.
I'm running the latest released version. 36.0.4

If I have to click "retry" every time, I think the better option will still be "copy the address and use wget".
(In reply to Peter da Silva from comment #361)
> I'm running the latest released version. 36.0.4
> 
> If I have to click "retry" every time, I think the better option will still
> be "copy the address and use wget".

Good point. Why not retry automatically when a download is interrupted?
Release Note Request (optional, but appreciated)
[Why is this notable]:  This had a release note in 33, https://www.mozilla.org/en-US/firefox/33.0/releasenotes/  but seems to only now be fixed, and Patrick suggested it needs a relnote. 
[Suggested wording]: Fix incomplete downloads being marked as complete by detecting broken HTTP/1.1 transfers (bug 237623)
[Links (documentation, blog post, etc)]:   

Should the wording be the same?
Flags: needinfo?(mcmanus)
re 364 lgtm
Flags: needinfo?(mcmanus)
Depends on: 1154426
Target Milestone: mozilla38 → mozilla39
Depends on: 1186830
Since Firefox 39 I have a download issue which might be related to this: if I download a file which the website reports as 400MB but which is actually 380MB, the download finishes at 380MB, but since Firefox assumes it should be 400MB it reports the download as failed, even though it is in fact completely downloaded. I then have to manually rename the downloaded file from *.zip.part to *.zip.
Depends on: 1196882
Depends on: 1189862
Depends on: 1326435
Attachment #8896773 - Attachment is obsolete: true
Attachment #8896775 - Attachment is obsolete: true
Attachment #8896776 - Attachment is obsolete: true
Attachment #8896777 - Attachment is obsolete: true
Attachment #8896778 - Attachment is obsolete: true
Attachment #8896779 - Attachment is obsolete: true