Open Bug 138117 Opened 23 years ago Updated 2 years ago

Completed downloads are not removed from Cache folder

Categories

(Toolkit :: Downloads API, defect, P3)

People

(Reporter: pgauriar, Unassigned)

Details

(Keywords: relnote)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:0.9.9+) Gecko/20020417
BuildID: 2002041703
After completing a download, the cached file is not removed from the cache folder. The cache folder will continue to grow no matter what the size limit is set to in the prefs.
Reproducible: Always
Steps to Reproduce:
1. Clean out the cache folder (so that you can observe the difference in step 3).
2. Go to the provided link and choose Save As... or Open With...
3. When the file is finished, check the cache folder.
Actual Results: The cached file (approximately 16 MB) is still in the cache folder, even though the download was completed.
Expected Results: The cached file should be removed upon completion of the download.
This is a major bug, since you could conceivably lose a lot of space very quickly (as I did, downloading nightlies every day for a week, plus Chimera builds). I think this would best be fixed by removing the cached file when the download is completed or interrupted (via loss of connection, a crash of Mozilla, etc.). If the download is simply paused, you're obviously going to want to keep the file.
This was seen on Mac OS X 10.1.3 and 10.1.4 on a PowerBook G3 500 MHz with a 12 GB hard drive and 512 MB RAM.
I have been experiencing the same problem on Mac OS 9, so it's not just specific to Fizilla.
The archives and encoded files are removed by StuffIt Expander after expanding, I suppose. Check the preferences of StuffIt Expander.
This isn't an issue with StuffIt. This is definitely a Mozilla issue. The completed download should be removed from the cache folder. StuffIt isn't responsible for that; Mozilla is. This really needs to be fixed by Moz 1.0. This is a serious efficiency problem. Mozilla goes over my cache limit daily because I download a lot of files and none of them are removed from the cache folder.
Just to clarify the problem a bit further: When you download a file, two copies are created: one goes to your designated download folder (say, the desktop), and has a normal filename like "archive.sit". A second copy is saved to the Mozilla cache folder under a hashed filename such as "0183A748d01". The problem is that this second copy is completely ignored by the cache manager and never gets deleted, even with the "Clear Disk Cache" button. Obviously, this leads to a rapid and persistent bloating of the cache folder. I wonder if this bug should be filed under "Networking: Cache"?
I totally missed that category. I'm moving it. Maybe work will be done on it.
Component: File Handling → Networking: Cache
Assignee: law → gordon
QA Contact: sairuh → tever
*** This bug has been confirmed by popular vote. ***
Status: UNCONFIRMED → NEW
Ever confirmed: true
I am also having this problem on Windows XP. Is there a separate bug for Windows that I missed, or should this be set to OS: All? This was a real problem for me when downloading Linux ISO files: my boot disk filled up and Windows crashed...
I'm going to go ahead and change this to OS->All, Platform->All based on comment 8.
OS: MacOS X → All
Hardware: Macintosh → All
*** Bug 142791 has been marked as a duplicate of this bug. ***
Proposed relnote: Downloaded files are never removed from the disk cache. This can be problematic if downloaded files are very large. Workaround: go to the profile directory and delete them manually.
With pre-downloading, a download may momentarily require storage for as many as three copies, depending on the placement of the user's Cache directory, the temporary directory used for the downloads, and the user-specified target. Despite its disclaimer of authoritativeness, the glossary accessible to the end user via the Help menu defines cache as "A collection of web page copies..." This would seem to obviate putting download data into the user's Cache directory. This patch suppresses storage of download data within the user's Cache directory. Leaving the metadata in the Cache (rather than scrubbing it) doesn't seem to affect things under Linux, and it minimizes the change to the codebase.
Keywords: patch
*** Bug 155298 has been marked as a duplicate of this bug. ***
Darin, any comments on the attachment? If JavaScript is involved in the download process, its lazy garbage collection may allow cache entry descriptors to stay in use much longer than necessary.
Priority: -- → P1
Target Milestone: --- → mozilla1.2beta
As I understand it, this bug is now only relevant to the 1.0 branch. Trunk builds since 1.2 no longer have this problem (downloaded files do get removed).
Keywords: mozilla1.0.1
Version: Trunk → 1.0 Branch
Let's mark this fixed, then.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
If this still happens on the 1.0 branch, the bug should stay open, since the version says "1.0 branch". I've cleared the milestone and sent this to Download Manager.
Status: RESOLVED → REOPENED
Component: Networking: Cache → Download Manager
QA Contact: tever → petersen
Resolution: FIXED → ---
Target Milestone: mozilla1.2beta → ---
reassigning to owner of Download component.
Assignee: gordon → blaker
Status: REOPENED → NEW
Nav triage team: removed nomination. Not relevant to trunk.
Keywords: nsbeta1
(In reply to comment #15)
> As I understand it this bug is now only relevant to the 1.0 branch. The trunk
> builds since 1.2 don't have this problem (the removing) anymore.
To make it clear: with trunk builds, downloads are removed when the cache limit is reached, but not directly after the download is completed. So the cache doesn't grow infinitely, but the files are not deleted as long as the disk cache limit hasn't been reached. I have now tested with Mozilla 1.0.2, and this release shows the same behavior as the trunk builds, so IMO this bug could be resolved. The deletion does not happen directly after the download, as the reporter wanted in comment 0, but IMHO this is a minor issue (the real problem was exceeding the size limit; see the dupe). A fix for bug 55307 would solve that. If not, a new bug (for trunk) should be filed to do that.
Product: Browser → Seamonkey
*** Bug 289890 has been marked as a duplicate of this bug. ***
*** Bug 270519 has been marked as a duplicate of this bug. ***
I can reproduce this on FF1.5b1 and SeaMonkey 1.0a. Re: bug 270247 comment 3, LiveHTTPHeaders confirms that the cache content is not being validated on subsequent requests to the same URL; no HEAD request is made. So, AFAICS (please correct me if I am wrong), this problem does in fact cause all current releases of Mozilla products to ignore the HTTP standard:
> 9.3 GET: The response to a GET request is cacheable if and only if it meets the requirements for HTTP caching described in section 13.
> 13.1.4 Explicit User Agent Warnings ... The user agent SHOULD NOT default to either non-transparent behavior, or behavior that results in abnormally ineffective caching, but MAY be explicitly configured to do so by an explicit action of the user.
*** Bug 270247 has been marked as a duplicate of this bug. ***
Aside from the cache limit issue, this bug means that once a file has been downloaded, an updated version of it cannot be downloaded without clearing the entire cache. This makes Mozilla quite a hassle to use when working with files that are frequently modified. Having to clear the cache regularly to ensure that only current versions of files are downloaded defeats the purpose of having a cache in the first place.
Can someone please provide specific steps to reproduce this problem? A link to a file to download, with details about what goes wrong would be appreciated. I personally have never experienced this problem. Firefox appears to follow the HTTP/1.1 specification's cache rules to the letter (with only some minor exceptions).
The problem is when an old file is updated on the remote server but is still in the browser's disk cache.
Steps to reproduce:
1. Set browser.cache.check_doc_frequency to 3.
2. Upload a file to a web server, and set its modification time to 1/1/2005.
3. Fetch the file.
4. Upload a new version of the file.
5. Fetch the file again.
6. Restart the browser.
7. Fetch the file again.
Steps 2 and 4 illustrate a use case rather than the cause; the problem can also be seen with a static file and LiveHTTPHeaders or a network analyser:
http://ftp.mozilla.org/pub/mozilla.org/mozilla/nightly/latest/mozilla-win32-stub-installer.exe
Expected results: ftp.mozilla.org doesn't appear to provide an expiry time or other age-controlling header, so I would expect the cache to validate its copy.
Actual results: the browser cache returns the stale version. For my download of mozilla-win32-stub-installer.exe in FF1.5b, the cache entry has an expiry date nine days after it was last fetched.
There are three workarounds:
1. Set browser.cache.check_doc_frequency to 1.
2. Hold Shift, right-click on the link, and select Save Link As...
3. The web server can explicitly require regular revalidation, but this needs to have been set up before the first download.
The problems in the last three duplicates are identical, and would appear to be resolved if 'Downloads' were removed from the cache. If this is not the right home for the problem, should I reopen bug 270247 as an enhancement to deal with this scenario?
The use case is working as designed. HTTP/1.1 says that a document served with a Last-modified header may be cached for a period of time determined heuristically by the browser. A value of 1/10th the period between now and when the file was last modified is recommended. So, when the server hosts a file like this, it is saying to all browsers that the file will not change again for a long time. Are you suggesting that downloading a file should always bypass the browser cache? I recommend marking this bug as invalid or wontfix.
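The heuristic described in the comment above can be sketched as follows. This is a simplified illustration, not Mozilla's cache code: it assumes a response that carries `Date` and `Last-Modified` but no `Expires` or `Cache-Control: max-age`, uses the 10% fraction that RFC 2616 section 13.2.4 gives as a typical (not mandatory) value, and ignores the `Age` header and clock-skew corrections. The function names and the dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

# Typical fraction suggested (not mandated) by RFC 2616, section 13.2.4.
HEURISTIC_FRACTION = 0.1


def heuristic_freshness_lifetime(date_header: str, last_modified_header: str) -> timedelta:
    """Hypothetical helper: heuristic freshness lifetime for a response that
    has Last-Modified but no explicit expiration information."""
    served_at = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    age_when_served = served_at - last_modified
    if age_when_served <= timedelta(0):
        # A Last-Modified in the future gives no basis for a heuristic.
        return timedelta(0)
    return age_when_served * HEURISTIC_FRACTION


def is_fresh(fetched_at: datetime, lifetime: timedelta, now: datetime) -> bool:
    """While fresh, a cached copy may be reused without revalidation."""
    return now - fetched_at < lifetime


# Illustrative dates only: a file last modified 90 days before it was served.
served = "Fri, 01 Apr 2005 00:00:00 GMT"
modified = "Sat, 01 Jan 2005 00:00:00 GMT"
lifetime = heuristic_freshness_lifetime(served, modified)
print(lifetime)  # 9 days, 0:00:00
```

Under these assumptions, a file last modified 90 days before it was fetched is treated as fresh for nine days. That is the same order of magnitude as the nine-day expiry reported for mozilla-win32-stub-installer.exe, and it shows why a server that sends only `Last-Modified` implicitly allows a long reuse window.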
> Are you suggesting that downloading a file should always
> bypass the browser cache?
No. I am suggesting that the cache should verify its contents in certain circumstances where the original headers did not provide explicit expiration times. In the scenario when the user has specifically requested a file, and the browser has classified it as a `download', the browser should not break semantic transparency; it should request a HEAD to be sure what it is giving the user is actually a download of the file they requested. The file could have been removed, the server could be down, etc. Once the cache contents are validated, the download should proceed using the cache.
> I recommend marking this bug as invalid or wontfix.
The problem in the original description appears to have been resolved. IMO, a toggle to disable the cache for files that are saved to disk elsewhere would be a useful improvement, for the same reason that bug 81640 is a P1.
Are we forgetting about corrupted downloads here? It does happen, and saving a corrupted download out of the cache is a very dumb thing.
I can see why it might be nice to force an end-to-end cache validation when downloading an item, but I'm not sure that I would implement that for every link click. Most downloads start from a link click that results in a file that the browser cannot render. In those cases, it would be bad to restart the download, because it is hard to know whether a download can be restarted without side effects. So, if we only validate explicit downloads (File->Save As), then we are not being consistent.
> In the scenario when the user has specifically requested a file, and the
> browser has classified it as a `download', the browser should not break
> semantic transparency...
We're not breaking semantic transparency, at least not according to RFC 2616.
> We're not breaking semantic transparency
The user agent is not fetching the file that is on the server, and it's not informing the user that it is not doing what they requested. Whenever a browser doesn't behave exactly like wget (barring bugs, of course), it's breaking semantic transparency. That is OK, but it should only do so with good cause (i.e. the user has specifically requested it, e.g. via a user pref), and it should keep the user in the loop.
Irrespective of HTTP caching issues, it is reasonable that files that are listed in the Download Manager don't need to be also retained in the disk cache. This bug only relates to items in the cache that are also in the Download Manager.
A use case that is more pertinent to this bug:
1. Set a cache size of 1000 KiB, and clear the cache.
2. Load this page in one tab: http://outreach.jach.hawaii.edu/pressroom/2003-estar/
3. In another tab, do a little browsing of images.google.com (small images only) until the disk cache is full.
4. View the contents of the disk cache: about:cache?device=disk
5. Go back to the first tab, right-click on estardiagram-large.png (full-size PNG, 230 kB), and save it to disk.
6. Refresh about:cache?device=disk.
7. Do a little more browsing of images.google.com (small images only), and keep an eye on the cache contents.
Expected results: No change to the existing items in the cache at step 5.
Actual results: 25% of the disk cache has been replaced with an image that has been saved outside of the cache. In step 7, the 230 kB image stays in my cache for quite a while, pushing out lots of useful little files.
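The effect in the use case above can be illustrated with a toy size-bounded cache. This sketch assumes plain least-recently-used eviction, which is not necessarily the policy Mozilla's disk cache uses. The class name, the `saved_to_disk` flag (modelling the behaviour requested in this bug), and the uniform 10 KiB thumbnail size are all hypothetical.

```python
from collections import OrderedDict


class SizeBoundedLRUCache:
    """Toy disk cache that evicts least-recently-used entries when full.

    Hypothetical illustration only; not Mozilla's eviction policy."""

    def __init__(self, capacity_kib: int) -> None:
        self.capacity_kib = capacity_kib
        self.used_kib = 0
        self.entries: "OrderedDict[str, int]" = OrderedDict()

    def store(self, key: str, size_kib: int, saved_to_disk: bool = False) -> list:
        """Store an entry and return the keys evicted to make room.

        saved_to_disk=True models the requested behaviour: a file the user
        saved elsewhere is not kept in the cache, so nothing is evicted."""
        if saved_to_disk or size_kib > self.capacity_kib:
            return []
        if key in self.entries:
            self.used_kib -= self.entries.pop(key)
        evicted = []
        # Drop the oldest entries until the new one fits.
        while self.used_kib + size_kib > self.capacity_kib:
            old_key, old_size = self.entries.popitem(last=False)
            self.used_kib -= old_size
            evicted.append(old_key)
        self.entries[key] = size_kib
        self.used_kib += size_kib
        return evicted


cache = SizeBoundedLRUCache(capacity_kib=1000)
for i in range(100):  # 100 small 10 KiB images fill the 1000 KiB cache
    cache.store(f"thumb-{i}", 10)

print(len(cache.store("estardiagram-large.png", 230)))  # 23
```

With these assumed sizes, one 230 KiB saved image displaces 23 small entries. With `saved_to_disk=True` it would displace none, which corresponds to the "Expected results" above.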
(In reply to comment #29)
> it should request a HEAD to be sure what it is giving the user is
> actually a download of the file they requested.
(Surely you mean GET with If-Modified-Since or with If-None-Match.)
> it is reasonable that files that are listed
> in the Download Manager don't need to be also retained in the disk cache.
What if I use File|Save Page As? Surely that shouldn't remove files from the cache. So that statement, in its generality, is not useful.
> (surely you mean GET with If-Modified-Since, or with If-None-Match)
Yes; HEAD was illustrative.
> What if I use File|Save Page As?
I did not realise that operation populated the Download Manager. This behaviour seems broken to me (btw, bug 143949 intends to fix this for images), and feels like an artifact from the days when View Source and Save Page As really did download fresh copies. Are there scenarios when a Save Page As won't be coming out of the cache?
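The conditional GET mentioned in the exchange above can be sketched as two small helpers. The helper names are hypothetical. The code assumes the cached response headers are stored in a dict with canonical capitalization (`ETag`, `Last-Modified`), and it only builds the request headers and interprets the status code. It does not send anything over the network.

```python
def revalidation_headers(cached_headers: dict) -> dict:
    """Hypothetical helper: build the validators for a conditional GET
    from the headers of a cached response."""
    conditional = {}
    etag = cached_headers.get("ETag")
    if etag is not None:
        conditional["If-None-Match"] = etag
    last_modified = cached_headers.get("Last-Modified")
    if last_modified is not None:
        conditional["If-Modified-Since"] = last_modified
    return conditional


def reuse_cached_copy(status: int) -> bool:
    """304 Not Modified: the cached body is still valid and can be saved.
    Any other status (e.g. 200 with a new body) means the cache is stale."""
    return status == 304


cached = {"ETag": '"abc123"', "Last-Modified": "Sat, 01 Jan 2005 00:00:00 GMT"}
print(revalidation_headers(cached))
```

The design point raised in the thread is that a single round trip is enough. If the server answers 304, the download can still be served from the cache. If the file has changed, the 200 response already carries the new body.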
Assignee: bross2 → download-manager
QA Contact: chrispetersen
GOOD WORK
Assignee: download → nobody
Component: Download & File Handling → Download Manager
Product: SeaMonkey → Toolkit
QA Contact: download.manager
Version: 1.0 Branch → unspecified
Moving to P3 because there has been no activity for at least 24 weeks.
Priority: P1 → P3

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --