Closed
Bug 588804
Opened 14 years ago
Closed 14 years ago
"File not found" error for (extant) HTTPS resource
Categories
(Core :: Networking: Cache, defect)
Tracking
Status: RESOLVED FIXED
blocking2.0: beta7+
People
(Reporter: shaver, Assigned: bjarne)
References
(Blocks 1 open bug)
Details
(Keywords: dogfood, regression)
Attachments
(6 files, 1 obsolete file)
(deleted), patch | michal: review- | Details | Diff | Splinter Review
(deleted), text/plain | Details
(deleted), application/x-gzip | Details
(deleted), application/x-gzip | Details
(deleted), patch | Details | Diff | Splinter Review
(deleted), patch | jst: review+ | Details | Diff | Splinter Review
Description
"Firefox can't find the file at https://book.aircanada.com/pl/AConline/en/img/caption.gif"
I noticed that when I visited aircanada.com for the second time today, I was getting broken images. When I tried to load just the image, I got a message that I think should only appear for file: URLs. Anyone have ideas about how that could happen?
Reporter
Comment 1•14 years ago
Restarted, it happened again on a later page. I guess I should get a networking log of it?
Reporter
Comment 2•14 years ago
If I shift-reload the "view image" URL (after the error message), it works, but a simple reload then gives me the error again. HTTP logging shows we hit the cache in the broken case, so I'll try again with cache logging. Adding some cache-y sorts to the CC list, too.
Reporter
Comment 3•14 years ago
the cache log is not very informative, since it just says "using the cached entry" and then no sign of an error (though it does say that it's closing the entry).
staralliance.gif (another image that was showing this behaviour) shows up in the about:cache, but the file to which it refers does not exist, perhaps explaining this behaviour!
I'm also seeing it now for scripts as well, on the same site.
I wonder if this is bad out-of-cache-space handling?
Number of entries: 12489
Maximum storage size: 256000 KiB
Storage in use: 255965 KiB
Reporter
Comment 4•14 years ago
Adding Byron too, thought I did that before.
Reporter
Updated•14 years ago
Component: Networking → Networking: Cache
QA Contact: networking → networking.cache
Comment 5•14 years ago
I can look into this.
Updated•14 years ago
Assignee: nobody → byronm
Updated•14 years ago
blocking2.0: --- → ?
Comment 6•14 years ago
Blocking, at least until we understand how widespread this is or what caused it. Do we know yet whether this is a regression? Sounds to me like it is; if so, we need the regression keyword.
blocking2.0: ? → betaN+
Reporter
Comment 7•14 years ago
I had never seen it until the cache size change, I'm gonna call it a regression.
Keywords: regression
Comment 8•14 years ago
Fwiw, I've also seen this when using view-source on https: pages. It'll show a file-not-found error page instead. Clearing the cache fixes it... temporarily. The problem reappears sometime later.
If you go to page info when you have broken-image icons in various places where they shouldn't be, e.g. for https://documents.google.com/images/doclist/cleardot.gif, the type, size and dimensions are all correct, but the "Media Preview" pane doesn't update at all. If I try to Save As the image, I get ".... could not be saved, because the source file could not be read." I see this sometimes when downloading files from bugzilla too, especially when trying to save a patch (even if I have the patch visible in the window, just hitting Control-S).
Updated•14 years ago
Blocks: http_cache
Updated•14 years ago
OS: Windows 7 → All
Hardware: x86 → All
Assignee
Comment 10•14 years ago
I agreed with Byron that I'd take over this, since he is back at school now.
Hard to reproduce. If anyone knows actions which typically trigger this problem, please post them here. A log would also be useful, of course. I'm currently playing with a local Apache with throttled upload bandwidth to see if I can provoke some errors in the cache, but no luck so far.
Assignee: byronm → bjarne
Reporter
Comment 11•14 years ago
My desktop computer, which has a profile/cache in this state saved against such a need, is due to arrive in California early next week, so I'll be able to repro then if needed. I am not really up to sending the cache to others, but I would be more than willing to do a windbg session to gather more info!
Assignee
Comment 12•14 years ago
I'm really not able to reproduce this situation... :( If anyone gets into it, it would also be very interesting to see the headers of a failing cache-entry (about:cache, click "List Cache Entries" of disk cache and then click on the entry itself).
Http*s* seems to be important here (although I don't see why it should make a difference). Can we assume that this issue is seen only/primarily for resources loaded with http*s* ?
Assignee
Comment 13•14 years ago
(In reply to comment #11)
> My desktop computer, which has a profile/cache in this state saved against such
> a need, is due to arrive in California early next week, so I'll be able to
> repro then if needed. I am not really up to sending the cache to others, but I
> would be more than willing to do a windbg session to gather more info!
It would be very useful if you would be able to use this machine to reproduce while logging. I'd appreciate a ping when you have time (and hardware) to try this.
Comment 14•14 years ago
I've just experienced this bug too. But on a release build without logging :-/ Anyway the cache info is:
Cache entry information
key: https://bug594882.bugzilla.mozilla.org/attachment.cgi?id=474574
fetch count: 16
last fetched: 2010-09-16 02:07:11
last modified: 2010-09-16 01:54:05
expires: 1970-01-01 01:00:00
Data size: 867
file on disk: /home/michal/.mozilla/firefox/6n4piytr.default/Cache/63C4C64Bd01
Security: This is a secure document.
Client: HTTP
request-method: GET
request-Accept-Encoding: gzip, deflate
response-head:
HTTP/1.1 200 OK
Server: Apache
X-Backend-Server: pm-app-bugs03
Vary: Accept-Encoding
Content-Type: text/plain; name="594882-sync-suspend-coutners.patch"; charset=
Content-Encoding: gzip
Strict-Transport-Security: max-age=2629744; includeSubDomains
Date: Wed, 15 Sep 2010 23:54:05 GMT
Content-Disposition: inline; filename="594882-sync-suspend-coutners.patch"
charset: ISO-8859-1
security-info: ...
necko:classified: 1
The interesting information is that although the data size is only 867 bytes it is stored on the disk. But in fact the data file on the disk doesn't exist.
Assignee
Comment 15•14 years ago
(In reply to comment #14)
> The interesting information is that although the data size is only 867 bytes it
> is stored on the disk. But in fact the data file on the disk doesn't exist.
You mean that it got its own file? It wasn't stored in a bucket-file? What about the real resource: is it 867 bytes or is it really larger? I observe it's gzipped: is 867 the unzipped or the zipped length? And what about the Content-Length header: is it really not present?
Assignee
Comment 16•14 years ago
An observation: 3.6 does not cache this attachment on disk. If memory-cache is enabled, it stores it there.
Assignee
Comment 17•14 years ago
Another observation: (FF3.6.9) If I load the attachment with plain http, I receive a 301 redirecting to the https-url. The 301 *is* stored in the disk-cache though, with this piece of info
file on disk: /home/bjarne/.mozilla/firefox/qewu2n5s.default/Cache/ABEB3347d01
and that file doesn't exist, just like in Michal's case.
I wonder what happens if the plain http-url returns a 200 with content on a subsequent request... the cache-entry already exists at that point and should be updated.
Assignee
Comment 18•14 years ago
Using my trunk-build and loading the resource from above with plain http, the 301-response no longer appears in the cache. This seems to be caused by the STS-support, and makes some sense.
However (with my trunk-build) the resource loaded over https is now cached on disk. This raises two questions: why has the policy changed since 3.6 to now allow the resource to be cached on disk? And how will this play with a cache from pre-STS times (or in a profile where STS is enabled and disabled - see bug #594990)?
Comment 19•14 years ago
> Why has the policy changed from 3.6 now allowing the resource to be cached on
> disk?
See bug 531801.
Assignee
Comment 20•14 years ago
(In reply to comment #19)
> > Why has the policy changed from 3.6 now allowing the resource to be cached on
> > disk?
>
> See bug 531801.
Right - thanks. That clears up some of this.
Assignee
Comment 22•14 years ago
(In reply to comment #17)
> Another observation: (FF3.6.9) If I load the attachment with plain http, I
> receive a 301 redirecting to the https-url. The 301 *is* stored in the
> disk-cache though, with this piece of info
This is reproducible on trunk with a non-sts host redirecting. Try e.g.
https://api-secure.recaptcha.net/challenge?k=6LckBgMAAAAAAHlkAGWVrdc3gorusTVlLjixmeFh
which redirects to google. The problem is that there is no content, hence no content is written to the cache-entry, which breaks some assumptions of the disk-cache. In fact, a similar effect can be provoked on any response with xpcshell or in a debugger.
I believe I have a fair idea of the core of this problem by now, and it may very well also cause bug #596443. However, I need some time to phrase it and clean up the xpcshell-test which demonstrates it.
Assignee
Comment 23•14 years ago
The core of the problem is that nsDiskCacheRecord::DataFile() is used as the primary indicator of whether an entry is stored in a separate file. If no data (as opposed to metadata) has been written to the entry (like in the case of 3xx-responses without content), this method returns 0, indicating that the entry is stored in a separate file, even when this is not the case.
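For illustration, here is a minimal sketch of that ambiguity. The real nsDiskCacheRecord packs this into location bits, so the names and layout below are my assumptions, not the actual necko code:

// Sketch only: a block-file selector of 0 means "stored in its own
// separate file" -- but a record whose data was never written still
// carries the initial value 0, so the two states are indistinguishable.
enum BlockFileIndex {
    kSeparateFile = 0,  // data lives in a standalone cache file
    kBlockFile1   = 1,  // _CACHE_001_
    kBlockFile2   = 2,  // _CACHE_002_
    kBlockFile3   = 3   // _CACHE_003_
};

PRBool StoredInSeparateFile(PRUint32 dataFileSelector)
{
    // The bug described above: this is also true for entries where no
    // data was ever written, because the selector never advanced past
    // its initial 0.
    return dataFileSelector == kSeparateFile;
}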
Any cache-able response is subject to the same problem, including 200-responses: If a 200 response for whatever reason is cut off before FF starts saving the content to file, it is (falsely) reported by the disk-cache to be stored in a separate file. Moreover, all subsequent requests to this resource *may* result in empty content returned from the cache.
Why would a response be cut off before writing content to the cache?
I can imagine various reasons ranging from bugs in necko to hick-ups in the network itself. Much of this is out of our control so we must make the cache or necko resilient to this issue in any case (IMHO).
Which entries will have this problem?
Obviously the entry must be cache-able and not subject to validation at the server. An important factor here is that the Content-Length header must not be present in the response, because the cache validates entries where this value differs from the size of the data stored in the cache. But if we get a response with no Content-Length header, sufficient expiration details to avoid validation with the server, and no content stored to the cache, we end up with a valid cache-entry with no data which also claims to reside in a non-existent separate file. Examples of this are 3xx responses without content, like the one in comment #22. But it can also happen for 200-responses.
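To make those conditions concrete, a hypothetical response meeting all of them (cache-able, explicit freshness so no server validation, no Content-Length, no body) would look like this; the host and header values are placeholders:

HTTP/1.1 301 Moved Permanently
Location: https://example.com/resource
Cache-Control: max-age=3600
Date: Wed, 15 Sep 2010 23:54:05 GMT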
How to fix?
The consequence for about:cache is that a filename for the entry is reported erroneously; this is easily fixed by e.g. checking that the file exists before reporting it. However, all subsequent retrievals of such entries return no data while claiming that the data is in a file, and we cannot check this for all resources everywhere in the code. So IMO this must be fixed in necko.
200 responses without data don't make sense (IMHO), so we can easily extend nsHttpChannel::CheckCache() to validate such entries. IMO this will handle the most severe cases.
3xx responses may or may not come with content, and the file-cache is not designed to cache the latter. However, the content of such responses is rarely used and probably even more rarely saved to a file by the user, so IMO we can get away here with fixing the about:cache code as indicated above.
A couple of open questions still remain, and I'll dig into these asap:
1) What actually happens on hard reload for such entries? I.e. how does the cache deal with replacing the content (assuming the content is actually loaded this time) of an entry which is believed to reside in a separate file when that file does not exist?
2) What happens if a cached 3xx response is validated and replaced with a 200 response? I.e. even if we fix about:cache to not wrongly report files, the cached resource gets replaced with a response that has actual content, but the file doesn't exist. (In fact, this is probably the same basic issue as above, but I'll leave it for now...)
Assignee
Comment 24•14 years ago
Attaching this for comments, but will fix the about:cache issue and try to figure out the open questions before submitting a proper fix.
Assignee
Comment 25•14 years ago
> The consequence for about:cache is that a filename for the entry is reported
> erroneously, and this is easily fixed by e.g. checking that the file exists
> before reporting it. However all subsequent retrievals of such entries return
> no data and claim that data is in a file, and we cannot check this for all
> resources everywhere in the code. So IMO this must be fixed in necko.
I am wondering if we can change the description and implementation of nsICacheEntryDescriptor::file to say that it throws and returns null if the file doesn't exist. Any thoughts?
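A rough sketch of what that could look like; the shape is my assumption, and GetFileInternal() is a hypothetical helper standing in for however the descriptor resolves its file today:

NS_IMETHODIMP
nsCacheEntryDescriptorSketch::GetFile(nsIFile **aResult)
{
    NS_ENSURE_ARG_POINTER(aResult);
    *aResult = nsnull;

    nsCOMPtr<nsIFile> file;
    nsresult rv = GetFileInternal(getter_AddRefs(file));  // hypothetical
    NS_ENSURE_SUCCESS(rv, rv);

    PRBool exists = PR_FALSE;
    rv = file->Exists(&exists);
    if (NS_FAILED(rv) || !exists)
        return NS_ERROR_NOT_AVAILABLE;  // throw instead of handing out a dead path

    file.forget(aResult);
    return NS_OK;
}

Callers would then get a failure instead of a path to a non-existent file.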
Assignee
Comment 26•14 years ago
> What actually happens on hard reload for such entries? I.e. how does the
> cache deal with replacing the content (assuming the content is actually loaded
> this time) in an entry which is believed to reside in a separate file, but this
> file does not exist.
The cache seems to handle this well on Linux. There are, however, a certain number of file-operations involved, and these have different implementations on Linux, OSX and Windows. I'll write a test and have the tryserver check this...
Assignee
Comment 27•14 years ago
The open questions from comment 23 were inspired by bug #596443, and following the reasoning in bug #596443, comment #6, I'll leave the two open questions about reloading/replacing a broken cache-entry out of this.
This patch makes sure a cached cut-off response is validated at the server. It also checks for the existence of any file reported by about:cache.
The test loads a resource twice. On the first call the handler cuts off the response; on the second it responds properly. If the entry was fetched from the cache on the second load, the test will fail.
Attachment #476939 -
Attachment is obsolete: true
Attachment #477699 -
Flags: review?(michal.novotny)
Comment 28•14 years ago
Comment on attachment 477699 [details] [diff] [review]
Validate cached response with no content, test, avoid reporting non-existent file in about:cache
> + PRBool fileExists = PR_FALSE;
> rv = descriptor->GetFile(getter_AddRefs(cacheFile));
> - if (NS_SUCCEEDED(rv)) {
> + if (NS_SUCCEEDED(rv))
> + rv = cacheFile->Exists(&fileExists);
> +
> + if (NS_SUCCEEDED(rv) && fileExists) {
> nsAutoString filePath;
> cacheFile->GetPath(filePath);
> APPEND_ROW("file on disk", NS_ConvertUTF16toUTF8(filePath));
> }
> else
> APPEND_ROW("file on disk", "none");
We should show "none" only when the datasize is 0 or the data is stored in the cache block files. Your change could hide some real problems where the separate file should exist but is missing (due to a bug in the cache code, or simply because the user has deleted it manually). In that case we should show "missing" (and maybe it should be red and bold).
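Something along these lines, perhaps (a sketch built on the quoted hunk; dataSize and storedInBlockFile are stand-ins for however the viewer can query those two facts):

rv = descriptor->GetFile(getter_AddRefs(cacheFile));
if (NS_SUCCEEDED(rv))
    rv = cacheFile->Exists(&fileExists);

if (NS_SUCCEEDED(rv) && fileExists) {
    nsAutoString filePath;
    cacheFile->GetPath(filePath);
    APPEND_ROW("file on disk", NS_ConvertUTF16toUTF8(filePath));
} else if (dataSize == 0 || storedInBlockFile) {
    // no separate file is expected for this entry
    APPEND_ROW("file on disk", "none");
} else {
    // a separate file should exist but doesn't -- flag it loudly
    APPEND_ROW("file on disk", "missing");
}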
> + } else if (size < 1) { // No data, no content-length. See bug #588804
> + LOG(("Validating due to no cached data and no Content-Length header"));
> + return NS_OK;
First of all, I don't see how this solves the problem described in comment #14. The size there is 867, so this code doesn't help at all. Maybe I'm missing something, but I don't think your comment #23 explains that problem.
Also, this change fixes only the case when there is no Content-Length and we have no data. It wouldn't solve the problem when we have partial data, i.e. corrupted images will not reload.
Attachment #477699 -
Flags: review?(michal.novotny) → review-
Assignee
Comment 29•14 years ago
(In reply to comment #28)
> We should show "none" only in case when the datasize is 0 or the data is stored
> in the cache block files. Your change can hide some real problems when the
> separate file should exist but is missing (due to bug in cache code or simply
> because user has deleted it manually). In this case we should show "missing"
> (and maybe it should be red and bold).
Maybe an assertion would be sufficient to flag real issues? I'm no expert on GUIs, but I believe a red and bold "missing" might be intimidating to a user.
> First of all I don't see how this solves the problem described in comment #14.
> Size is 867 so this code doesn't help at all. Maybe I'm missing something but I
> don't think that your comment #23 explains this problem.
>
> Also this change fixes only case when there is no content length and we have no
> data. This wouldn't solve problem when we have partial data, i.e. corrupted
> images will not reload.
Agreed - this only fixes the issue when *no* data is written. The real question is *how* we know that partial data has been written when there is no Content-Length header. The cache doesn't really have any way of knowing how much data to expect, afaics.
Perhaps this should be split into two separate bugs: one to deal with no writes (which has the unfortunate side-effect described in comment #23), and one to deal with partially written data (which does *not* suffer from the side-effect, but which of course results in corrupt data)?
Comment 30•14 years ago
(In reply to comment #29)
> Agreed - this only fixes the issue when *no* data is written. The question is
> actually *how* do we know that partial data has been written when there is no
> Content-Length header. The cache doesn't really have any way of knowing how
> much data to expect, afaics.
Google Chrome seems to discard a cache entry if a chunked transfer doesn't stop cleanly (i.e. there must be a final empty chunk). We should probably do the same. This would eliminate partial entries caused by network failures.
> Perhaps this should be split into two separate bugs: One to deal with no writes
> (which has the unfortunate side-effect described in comment #23), and one to
> deal with partially written data (which do *not* suffer from the side-effect,
> but which of-course results in corrupt data) ?
What side-effect do you mean? The only problem I see is that reading from such entries returns no data. But reading data from partial cache entries is the same problem.
Btw you wrote that "200 responses without data doesn't make sense". E.g. the main page on my server (http://michal.etc.cz/) is empty (not empty HTML, just an empty file). I understand that this isn't a valid argument; anyway, such pages exist... :)
Comment 31•14 years ago
> Perhaps this should be split into two separate bugs: One to deal with no writes
> and one to deal with partially written data ?
The problem in comment #14 doesn't fall into either of these bugs. This isn't a partial entry. The total size should be 867, so the whole response was received. But the cached data is simply missing.
Comment 32•14 years ago
(In reply to comment #30)
> Btw you wrote that "200 responses without data doesn't make sense". E.g. main
> page on my server (http://michal.etc.cz/) is empty (not empty HTML but just
> empty file). I understand that this isn't valid argument, anyway such pages
> exist... :)
A properly delimited 0-length 200 response might be sensible... consider if it was a CSV list of must-fix bugs for FF 4. 0 length would be ideal :)
Re the more important truncated-document bit, I would like to include truncated documents as "negative pipelining feedback" in my pipelining odyssey. For me I think it is as simple as checking !mResponseIsComplete in nsHttpTransaction::Done(), because I only want fully formed CL or chunked entries for pipelining. You probably need some check allowing EOF only where there was no CL or chunking on the transaction.
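A sketch of such a check; mResponseIsComplete is named in the comment above, but the other member names and the surrounding structure are my assumptions, not the real nsHttpTransaction code:

void nsHttpTransactionSketch::Done()
{
    // A response framed by Content-Length or chunked encoding must be
    // complete when the transaction ends; only an EOF-delimited response
    // (no CL, no chunking) may legitimately end at connection close.
    PRBool framed = mHaveContentLength || mChunkedEncoding;  // assumed flags
    if (framed && !mResponseIsComplete) {
        DoomCacheEntry();  // hypothetical helper: discard the partial entry
        // ...and, for pipelining, record it as negative feedback.
    }
}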
Assignee
Comment 33•14 years ago
(In reply to comment #30)
> What side-effect do you mean? The only problem I see is that reading from such
> entries returns no data. But reading data from partial cache entries is the
> same problem.
The side-effect appears when no data is ever written to the entry. As described in comment #23, nsDiskCacheRecord::DataFile() (*note* ...File()) returns 0 in such cases. This is interpreted in e.g. about:cache as if the entry has data in its own file (load e.g. the url from comment #22 in Fx4 (not 3.6, because it will cache it in memory, ref comment #19) and look up the original url in the disk-cache - it should be a 302 with data stored in its own, non-existent, file). When you get partial data, this side-effect is not a problem.
> Btw you wrote that "200 responses without data doesn't make sense". E.g. main
> page on my server (http://michal.etc.cz/) is empty (not empty HTML but just
> empty file). I understand that this isn't valid argument, anyway such pages
> exist... :)
I see your point (and Patrick's point below). However, your homepage sets Content-Length at least... :) Would we be willing to not re-use such pages from the cache if they have no Content-Length?
(In reply to comment #31)
> The total size should be 867, so whole response was received.
> But the cached data is simply missing.
Comment #14 says that the file is missing. Was there no data visible either?
I agree that this is odd... need to think more about this variation.
Comment 34•14 years ago
I see this pretty regularly on bugzilla.
Comment 35•14 years ago
Both RedHat Network and the management interface on our Zeus load balancers are almost unusable because of this. I constantly have to shift+reload to be able to actually see the status indicators.
Assignee
Comment 36•14 years ago
Ok - a few questions:
1) Are we still talking https? Or does this also happen over plain http?
2) Would any of you have the chance to dump info (headers, size, filename etc) for any such failing entry? (In particular if anything different from comment #14 comes up...) Also please dump stats for your disk-cache, btw.
3) justdave: is there a public URL which would allow me to reproduce?
Comment 37•14 years ago
I've been seeing this behaviour on two (unfortunately non-public) https sites. Images aren't showing up sporadically until I shift-reload. Not sure what other information I could provide that would be useful, but let me know.
Reporter
Comment 38•14 years ago
I see this all the time when viewing source of bugzilla attachments. Try clicking some small test cases and then using view-source? I can dump things in here when it happens next, but it basically happens all the time!
Comment 39•14 years ago
(In reply to comment #36)
> 1) Are we still talking https? Or does this also happen over plain http?
Yes, https only; I haven't noticed this on any http sites.
> 2) Would any of you have the chance to dump info (headers, size, filename etc)
> for any such failing entry? (In particular if anything different from comment
> #14 comes up...) Also please dump stats for your disk-cache, btw.
If you can provide instructions (or a pointer to a page with existing instructions) how to retrieve said info to supply it to you, sure.
> 3) justdave: is there a public URL which would allow me to reproduce?
Unfortunately, no. These are all internal admin sites behind logins, except for Beltzner's example.
Comment 40•14 years ago
Here's some screenshots (not that it'll help much as far as data is concerned)
Here's what the Zeus interface looks like when you first load the page (assuming you've been there before to have the images cached) :
https://people.mozilla.com/~justdave/zeusfirst.png
Here's what it looks like (and what it's supposed to look like) after a shift+reload:
https://people.mozilla.com/~justdave/zeusshiftreloaded.png
Note the enabled/disabled indicators on each of the virtual servers, the tabs across the top, and the traffic level/status indicator on the top left.
An examination of the header traffic via LiveHTTPHeaders shows that Minefield is sending a request for the missing files to the server with an "If-Modified-Since" header, and the server is responding with "304 Not Modified". The Media tab in the Page Info dialog shows the corresponding files with 0px x 0px listed as their sizes.
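In other words, the exchange presumably looks like this (the path and date are illustrative), and the 304 tells the browser to reuse cached data that isn't actually there:

GET /img/status.gif HTTP/1.1
If-Modified-Since: Thu, 14 Oct 2010 12:00:00 GMT

HTTP/1.1 304 Not Modified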
Assignee
Comment 41•14 years ago
(In reply to comment #39)
> If you can provide instructions (or a pointer to a page with existing
> instructions) how to retrieve said info to supply it to you, sure.
Sorry: open a new tab and type "about:cache" in the url-field. Then click on "List Cache Entries" for the disk device. Please dump the summary info (top of that page) for your disk-cache here. Then try to locate the entry for one of the missing resources (you need to know its url, or at least part of it, in order to find it). If you find it, click on the entry and copy the content of the resulting page here.
(In reply to comment #40)
> An examination of the header traffic via LiveHTTPHeaders shows that Minefield
> is sending a request for the missing files to the server with an
> "If-Modified-Since" header, and the server is responding with "304 Not
> Modified".
That is interesting! Do you think you can get the info mentioned above for a missing entry, both before and after shift-reload?
Comment 42•14 years ago
Yeah, I've also been seeing cache problems only on https; I see it regularly on bugzilla attachments, and I needed to clear my cache to pay a bill this morning (an https site, requiring login).
Assignee
Comment 43•14 years ago
Some follow-up questions: do you guys see this on 3.6 or 4? Does it seem to happen more frequently when the cache is close to full?
Reporter
Comment 44•14 years ago
4. Cache isn't even half-full.
Comment 45•14 years ago
This url might work: https://mail.mozilla.com/zimbra/public/blank.html
Comment 46•14 years ago
Only in 4 for me. BTW, I can reproduce it at will viewing this bug. Look at the CC list at the top of the bug. On initial load of the bug it's a bulleted list. The title of the bug is normal text on a white background. Upon shift-reloading the bug, the CC list is a scrollbox, and the bug title is on a gray box. So it's affecting CSS, too.
Comment 47•14 years ago
(In reply to comment #41)
> > An examination of the header traffic via LiveHTTPHeaders shows that Minefield
> > is sending a request for the missing files to the server with an
> > "If-Modified-Since" header, and the server is responding with "304 Not
> > Modified".
>
> That is interesting! Do you think you can get the info mentioned above for a
> missing entry, both before and after shift-reload?
Why is this interesting? IMO that's expected behavior. We have an invalid entry but with "correct" metadata, so we send a conditional request.
Comment 48•14 years ago
Comment 49•14 years ago
I can consistently reproduce this on a test machine hosted inside mozillalabs.com, which is sitting behind a Zeus SSL terminator.
By pressing Enter in the location bar, I see GET http://mozillaball.mozillalabs.com requests happening, with 301 responses.
If I press the reload button, however, I see a GET https://mozillaball.mozillalabs.com with a 304 response.
I tcpdumped the initial HTTP request. about:cache details for the entry are attached; there was no cache entry for http://mozillaball.mozillalabs.com.
The 304 request/response pair, as reported by the Web Console, follows.
The Last Modified date of the cache entry does NOT update if I press enter on the location bar; it DOES update if I press the reload button.
Assignee
Comment 50•14 years ago
(In reply to comment #49)
> I can consistently reproduce this on a test machine hosted inside
> mozillalabs.com,
Brilliant! :)
> I tcpdumped the initial HTTP request. about:cache details for the entry are
> attached;
Does the file /Users/michaelhanson/Library/Caches/Firefox/Profiles/fvfl3wpn.fres4.1/Cache/B9125F29d01 exist at this point?
> The Last Modified date of the cache entry does NOT update if I press enter on
> the location bar; it DOES update if I press the reload button.
Does anything else update in the cache-entry after reload? E.g. the "file on disk" field?
Any chance to get an nspr-log of this? See https://developer.mozilla.org/en/HTTP_Logging for details - please also add "cache:5" to the comma-separated NSPR_LOG_MODULES list...
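(For example - the exact module list and log path are up to you - start the build with something like: NSPR_LOG_MODULES=nsHttp:5,cache:5 NSPR_LOG_FILE=/tmp/cache.log ./firefox)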
Comment 51•14 years ago
(In reply to comment #44)
> 4. Cache isn't even half-full.
I think I got it. It is actually caused by the full cache. Although we now have a 250 MB cache size by default (and even much bigger via smart sizing), the limits for the block files _CACHE_00[1-3]_ haven't changed.
I.e. we have only:
8 MB for entries with size 1-1024 bytes
32 MB for entries with size 1025-4096 bytes
128 MB for entries with size 4097-16384 bytes
The problem arises when writing the data fails but the metadata is written successfully. HTTPS is just a red herring here. Metadata of an HTTPS response usually fits into _CACHE_002_ because of the large security-info. When the data is small enough to fit into _CACHE_001_ (a small patch, an image, etc...) and that file is full, the problem comes up.
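For reference, those limits are consistent with each block file carrying a 32768-block allocation bitmap (that bitmap size is my assumption, but the arithmetic works out):
32768 blocks x 256 B = 8 MB (entries of 1-1024 B, up to 4 blocks each)
32768 blocks x 1024 B = 32 MB (entries of 1025-4096 B)
32768 blocks x 4096 B = 128 MB (entries of 4097-16384 B)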
Comment 52•14 years ago
(In reply to comment #50)
> Does the file /Users/michaelhanson/Library/Caches/Firefox/Profiles/fvfl3wpn.fres4.1/Cache/B9125F29d01 exist at this point?
It does not.
>> The Last Modified date of the cache entry does NOT update if I press enter on
>> the location bar; it DOES update if I press the reload button.
>Does anything else update in the cache-entry after reload? E.g. the "file on
>disk" field?
The fetch count, last fetched, last modified, and expires fields change; the Date field of the response-head changes. The file on disk does not (and still does not exist).
> Any chance to get a nspr-log of this? See
> https://developer.mozilla.org/en/HTTP_Logging for details - please also add
> "cache:5" to the comma-separated NSPR_LOG_MODULES list...
Attached.
Assignee
Comment 53•14 years ago
(In reply to comment #51)
> The problem arises when writing of the data fails but metadata is written
> successfully. HTTPS is just a red herring here. Metadata of HTTPS response
> usually fits into _CACHE_002_ because of the large security-info. When the data
> is small enough to fit into _CACHE_001_ (small patch, image, etc...) and it is
> full, then the problem comes up.
This sounds like a real issue, yes. And it can explain why such cache entries claim to be stored in a separate file (i.e. the "response is cut off before content is written to cache" case discussed in comment #23). But wouldn't the else-block in nsDiskCacheDevice::DeactivateEntry() remove the metadata if writing the content fails?
Updated•14 years ago
Attachment #483600 -
Attachment mime type: text/plain → application/x-gzip
Comment 54•14 years ago
(In reply to comment #53)
> This sounds like a real issue, yes. And it can explain why such cache entries
> claim to be stored in a separate file (i.e. "response is cut off before content
> is written to cache" as discussed in comment #23). But wouldn't the else-block
> in nsDiskCacheDevice::DeactivateEntry() remove metadata if writing content
> fails?
I haven't debugged it deeply yet. I just tried to simulate the full cache by failing the appropriate nsDiskCacheBlockFile::AllocateBlocks() call in the debugger, and I can confirm that it creates the corrupted cache entry.
Assignee
Comment 55•14 years ago
(In reply to comment #52)
> > Any chance to get a nspr-log of this? See
> https://developer.mozilla.org/en/HTTP_Logging for details - please also add
> "cache:5" to the comma-separated NSPR_LOG_MODULES list...
>
> Attached.
Thanks! The (relevant) requests/response pairs are
- one request to http://... + 301-response redirecting to https
- another request to http://... + 301-response redirecting to https
- then the request with i-m-s header to https://... + the 304 response
(Not sure why there are two requests to the http-url, but never mind...)
So the https-url is stored in your cache, Necko believes the content is stored in a separate file, which doesn't exist. IMO we have the situation described in comment #23, and Michal may very well have figured out (in his comments above) the reason *why* the content is not stored (or "cut off" to use the term from comment #23).
Would you mind capturing another nspr-log while doing a hard reload, i.e. a shift-ctrl-r ? It should bypass the cache completely and reload the url....
Comment 56•14 years ago
(In reply to comment #55)
Log of force reload, attached.
Assignee
Comment 57•14 years ago
Thanks again! And now the page looks ok? What does the cache-entry look like?
Comment 58•14 years ago
The page looks fine; the cache-entry has a fetch count of 1 (before I click on it); the dates look reasonable.
The file on disk, however, is still a path to a non-existent file: /Users/michaelhanson/Library/Caches/Firefox/Profiles/fvfl3wpn.fres4.1/Cache/B9125F29d01
That's surprising... my about:cache shows 12754 files, 1048576 KiB max, 119584 KiB in use.
I visited a couple more pages and saw that new entries were appearing in the cache - 22k and 83k files appeared. I checked one of them and it was gzipped CSS from Yahoo, so the cache is still functioning at some level.
I'm not sure whether it's relevant to Michal's observation, but sorting the directory by size, the smallest file is 16389 bytes. No obvious pattern jumps out looking at the dates and sizes.
Comment 59•14 years ago
(In reply to comment #58)
> The file on disk, however, is still a path to a non-existent file:
> /Users/michaelhanson/Library/Caches/Firefox/Profiles/fvfl3wpn.fres4.1/Cache/B9125F29d01
That's OK. If it was caused by a full cache block file, then a force reload can't help. It won't free any space.
> That's surprising... my about:cache shows 12754 files, 1048576 KiB max, 119584
> KiB in use.
As I said, there are separate limits for small entries. So although only 10% of your cache is used, it isn't possible to cache entries of certain sizes. This is a bug and needs to be fixed.
Comment 60•14 years ago
The attached patch should fix the problem. Of course, it doesn't solve the other issue, namely that we stop caching small entries when the cache block files are full. I'll file a follow-up bug for that.
Assignee
Comment 61•14 years ago
(In reply to comment #60)
> Attached patch should fix the problem. Of course it doesn't solve the other
> issue that we stop caching small entries when cache block files are full.
Very nice catch! :) I really cannot say whether it resolves the issue seen here, since I've not been able to reproduce it, but maybe others can.
I'm still a little puzzled about why this only seems to happen when using https, though. And it also seems to be connected to encoded streams, but heck... your solution is valid in any case.
I've filed bug #604880 describing the particular issue you uncovered and made this bug depend on it. Bug #604880 should be trivial to get approved and landed, and we'll see if the issue seen here goes away. If it doesn't, there are other things to handle, but your fix would be valid in any case.
> I'll file a follow-up bug for this.
Bug number..?
Comment 62•14 years ago
(In reply to comment #61)
> Bug number..?
604897
Comment 63•14 years ago
(In reply to comment #61)
> I've filed bug #604880 describing the particular issue you uncovered and made
> this bug depend on it. Bug #604880 should be trivial to get approved and
> landed, and we'll see if the issue seen here goes away. If it doesn't, there
> are other things to handle, but your fix would be valid in any case.
Thanks, that's a good idea. Anyway, anybody having the problem with view-source or broken images could check in a debugger whether it is caused by a full cache block file. Set a breakpoint at nsDiskCacheMap::FindRecord() and do some disk cache activity (e.g. a pageload, or viewing details of some cache entry in about:cache?device=disk) that triggers this breakpoint. Then check if any of the bitmaps is full, e.g. in gdb:
x/1024wx mBlockFile[0].mBitMap
x/1024wx mBlockFile[1].mBitMap
x/1024wx mBlockFile[2].mBitMap
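(Assuming these really are plain allocation bit-vectors - my assumption - a completely full bitmap would print as nothing but 0xffffffff words.)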
Assignee
Comment 65•14 years ago
Comment 66•14 years ago
In case there are any problems with reproducibility, I have found that https://bug605048.bugzilla.mozilla.org/attachment.cgi?id=483877 consistently gives me the File not found page when I view source the first time, and then it works fine on reload.
Comment 67•14 years ago
Fwiw, I'm seeing what looks like this when I log in to both bankofamerica.com and ba.com: missing content on the page and layout, and in the case of ba.com, forms are not showing up properly.
Comment 68•14 years ago
This is a major website compatibility hit; we need to pull it into an earlier beta.
blocking2.0: betaN+ → beta8+
Assignee
Comment 69•14 years ago
There is a fair chance that bug #604880 will resolve this issue. Can we carry over the blocking-flag to bug #604880 and get it landed?
Assignee
Comment 71•14 years ago
Bug #604880 has landed - do you guys who experience this issue see any improvement?
Comment 72•14 years ago
Can't reproduce with my testcase any longer, which is good.
Comment 73•14 years ago
I am still seeing the problem in 20101021030744. The file-on-disk entry continues to point to a non-existent file; force reload works once but the problem reoccurs after that point.
Comment 74•14 years ago
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101021 Firefox/4.0b8pre -- the problem seems less pronounced than it used to be, but I'm still seeing it. The Zeus management GUI is no longer missing a lot of the icons that used to be, and the table frames are working again, but the status display and a few of the icons still don't show without a shift+reload.
Comment 75•14 years ago
(In reply to comment #74)
> the problem seems less pronounced than it used to be, but I'm still seeing it.
Did you try to clear the whole cache after upgrading FF? Also could you check the bitmaps as described in comment #63?
Comment 76•14 years ago
(In reply to comment #75)
> (In reply to comment #74)
> > the problem seems less pronounced than it used to be, but I'm still seeing it.
>
> Did you try to clear the whole cache after upgrading FF?
I had not, and after doing so, the problem does seem to be completely gone. The bank website I was having issues with and the Red Hat Network site also are no longer exhibiting this problem.
> Also could you check the bitmaps as described in comment #63?
No idea how to set a breakpoint, sorry.
Comment 77•14 years ago
Sounds like we need to ship a cache version increment with this, to make sure the cache is cleared.
Assignee
Comment 79•14 years ago
Unless anyone disagrees, I'll say Michal's patch for bug #604880 fixed this issue. This is a simple patch to bump the cache-version, as suggested by Jason in comment #77.
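Schematically, the bump is just a constant change; the actual constant name lives in the disk-cache code (nsDiskCacheMap.h, if memory serves) and the values below are illustrative:

// Incrementing the on-disk format version makes existing caches look
// stale, so they are discarded and rebuilt on first use.
-#define kCurrentVersion  0x00010013
+#define kCurrentVersion  0x00010014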
Attachment #485626 -
Flags: review?(jduell.mcbugs)
Updated•14 years ago
blocking2.0: beta8+ → beta7+
Comment 80•14 years ago
Comment on attachment 485626 [details] [diff] [review]
Bump disk-cache version
Given that this is a version bump change only, and we're blocking beta7 on it, and Jason is on semi-vacation, I'm taking the privilege to r=jst this, even if I'm not a peer!
Attachment #485626 -
Flags: review?(jduell.mcbugs) → review+
Comment 81•14 years ago
Cache version bump landed:
http://hg.mozilla.org/mozilla-central/rev/f91e0db149dd
Marking FIXED!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED