Closed Bug 177329 Opened 22 years ago Closed 8 years ago

nsIWebBrowserPersist can't persist things whose cache entries have expired

Categories

(Core Graveyard :: Embedding: APIs, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE
mozilla1.5alpha

People

(Reporter: bzbarsky, Assigned: adamlock)

References

(Depends on 1 open bug)

Details

(Keywords: embed, helpwanted, topembed+)

Attachments

(1 file, 5 obsolete files)

When a page _has_ a cache entry but the entry is expired (eg no-cache has been set) the persistence object will attempt to refetch it from the network instead of reading it from cache. This means that in the usual case (trying to save an invoice page) one of two things happens: 1) Nothing is saved or 2) The order is placed twice. Neither is really acceptable. The API needs to be fixed to accept not just a URL but also a cache key or cache token or nsIWebPageDescriptor (preferred) or something.
Blocks: 99642, 115174, 120809
Keywords: embed, topembed
QA Contact: depstein → dsirnapalli
Keywords: topembedtopembed+
Is this bug reproducible? My understanding is this pages in Mozilla are saved using SaveDocument, not SaveURI. This means that the document will be saved from the in-memory DOM and not from the network. Some of the URIs in the page may be refetched however but since no post data goes with them, it should not cause the problems described. nsIWebPageDescriptor is just a crappy abstraction of calling loadUri with LOAD_HISTORY from a session entry in the docshell. It's not really caching anything. The patch in bug 170722 adds referrer, extra headers & a cache key arg to the saveURI method. The cache key is the nsISupports that something has previously grabbed from the last time the channel was loaded. Unless this bug is describing something different I am tempted to dupe it.
> My understanding is this pages in Mozilla are saved using SaveDocument, not > SaveURI. "web page, complete" uses SaveDocument; "web page, HTML only" uses SaveURI. If you save "web page, HTML only" this bug is eminently reproducible. I doubt a cache key will help the particular problem described in this bug (expired cache entry), since cache keys do not pin entries in the cache; cache tokens do that, iirc. Of course nsIWebPageDescriptor won't pin anything in the cache either, for obvious reasons... A cache key _will_ allow us to retrieve the correct cache entry when it exists, which is certainly a step up from what we do now. The real problem is that we need a _coherent_ strategy for dealing with issues like this (and we should decide, for example, whether saving should work even if cache is disabled). ccing some necko/cache people in the hope that they will have something to say...
I'm not sure what to do in this situation either. I guess in the first instance if the page is https or contains postdata we should display a prompt dialog to give the user the chance to change their mind. Is there a way to test if a given URI has expired or is not in the cache in advance of trying to save it?
Target Milestone: --- → mozilla1.4beta
adam: no not exactly. i mean you could use the cache APIs to check, but that is not reliable since the cache is multithreaded and may be forced to evict something on a background thread. instead you should open the channel with the nsICachingChannel::LOAD_ONLY_FROM_CACHE load flag set. this will cause the load to fail with NS_ERROR_DOCUMENT_NOT_CACHED if the document does not exist in the cache. then you can prompt the user in response to this error.
So doing what Darin suggested, I can think of three places this logic can be put: 1. In the progress / download javascript that calls the persist object 2. In the persist object with a flag to turn on this behaviour. 3. In necko with a flag on the nsIRequest / nsIChannel to turn on this behaviour. The second and third would better satisfy the topembed+ against this bug. Darin, do you have any preference? I'll see if I can come up with a patch either way. The flag in both cases would be something that prompts the user if they wish to fetch from the network if the error code NS_ERROR_DOCUMENT_NOT_CACHED comes back the first time around. What happens in Mozilla if the user hits reload on a page in this situation?
Attached patch Work in progress (obsolete) (deleted) — Splinter Review
Still work in progress and untested. This demonstrates how the functionality might be made available through the persist object and a new PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED flag.
i don't think it makes sense for this prompting to live in necko. in general, user interactions should be left out of necko. in some cases, we obviously can't help it, but whenever we put user interactions in necko it makes it harder to customize the UI. (e.g., prompting vs. error pages.)
I'll proceed with the web browser persist approach. I'll probably need two new flags not one - one to enable prompting and another to enable LOAD_ONLY_FROM_CACHE. I'm having trouble producing a test page though. I've tried several ways to generate a page that should be refetched each time from the network: 1. A CGI script which contains an Expires header pointing only a few seconds into the future. 2. An html file containing <meta http-equiv="Pragma" content="no-cache"> and loaded via http. 3. Changing Mozilla cache settings to check the page every time it is loaded. It still wants to save the cache entry! The only way I discovered to force it to not be cached is to flush the cache after loading the page for the first time. Any ideas?
"pragma: no-cache" and loading via https should work....
Attached patch Work in progress 2 (obsolete) (deleted) — Splinter Review
Second work in progress. This one actually works although there are some big XXX comments where I've had to hack some flags to force the behaviour I want to test. I also need to think where the prompt text goes; probably into appstrings.properties along with the existing POST retry prompt. Another issue to be considered is that the caller is expecting saveURI to save just one URI, not two, implying one start state notification and one stop state notification. So I'll have to suppress the state notification in OnStopRequest from the first saveURI and in the OnStartRequest from the second to make it all seem like a single request. I hope they don't cache and compare the nsIRequests passed with the state changes for any reason...
Attachment #120498 - Attachment is obsolete: true
Attached patch Work in progress 2 (obsolete) (deleted) — Splinter Review
Almost there, just have to test out js side of things and see about suppressing the extra start / stop notifications.
Attachment #120639 - Attachment is obsolete: true
Attached patch Patch (obsolete) (deleted) — Splinter Review
This is the patch with one thing to clear up. What to do about shift click on links. These would normally bring up a save as dialog. At present with the patch applied, shift clicking will pose a "This is not in the cache message, do you want to proceed" message. I would like to suppress this but I am confused about how the JS is being called. All of the various "Save Page", "Save Link As...", shift+click all go through a single JS function SaveURL() in contentAreaUtils.js. The bit that confuses me is that the various places that call it do not supply all the arguments it wants. http://lxr.mozilla.org/seamonkey/search?string=saveURL What are the default values for they args they do not supply? For example, shift+click only supplies the first 3 args but never says what the aBypassCache arg value should be. It seems from the path it takes that aBypassCache is false, but is this right for shift click? If not, I can leave the code as is and just explicitly pass true instead. The same goes for the "Save Link As..." Anyway, the patch is complete apart from this issue. It implements the PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED and PERSIST_FLAGS_ONLY_FROM_CACHE and makes contentAreaUtils.js and promptDlg.js supply these flags when calling saveURI. I did not bother suppressing the extra nsIRequests and instead documented PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED to warn that it could result in multiple requests occurring.
Attachment #120994 - Attachment is obsolete: true
Args that are not passed to a JS function are "undefined", which tests as "false". There is an existing bug, iirc, about whether shitf-click should indeed bypass cache; it's tied up in discussion of what we should be doing.
What should I do about it for now? I will have to make it explicitly pass true or false for this value. If false is the current behaviour I can preserve that but either way I will have to add another bool to saveURL to supress the prompting behaviour for Save Link & shift+click. The current wording of the prompt doesn't make sense if you're never visited what you're trying to save anyway.
So what happens if neither PERSIST_FLAGS_FROM_CACHE nor PERSIST_FLAGS_BYPASS_CACHE is set? Perhaps the "save link as" case should simply not be setting either of those flags and allowing the normal cache policies to be used? That's the suggestion that seemed most reasonable from the other bug...
If neither BYPASS_CACHE or FROM_CACHE is specified, you get whatever the default behaviour from necko is. I don't know what that might be but presumably it is one or the other. I think it's better to be explicit here. Either shift+click should pass true, if it wants to force a refetch or false if it doesn't. Personally I think it should send false and let persist try the cache first, but we should have another arg on SaveURL that suppresses the prompt behaviour.
Or you could add another persist flag... "CACHE_PREFERRED" which is intermediate between the existing "CACHE_REQUIRED" and "CACHE_FORBIDDEN" flags, but a bit stronger than the default "do whatever Necko wants to" behavior... I think this would make a lot more sense as a flag than an arg to saveURI, but that's just me. And I'm not sure which is better for embedders.
FROM_CACHE is the equivalent of preferred since it will grab from the cache and then try the network if it has to. BYPASS_CACHE skips the check altogether. And the latest patch adds ONLY_FROM_CACHE which makes necko return an error code if the item is not in the cache. So, all the options are covered and map onto your choices like this: ONLY_FROM_CACHE == CACHE_REQUIRED FROM_CACHE == CACHE_PREFERRED BYPASS_CACHE == CACHE_FORBIDDEN
> And the latest patch adds ONLY_FROM_CACHE which makes necko return an error code > if the item is not in the cache. Ah. So the comment about things not being fetched should move from FROM_CACHE to ONLY_FROM_CACHE in the idl, and save link as should use FROM_CACHE. Right?
Attached patch Patch (obsolete) (deleted) — Splinter Review
Updated patch changes the documentation for PERSIST_FLAGS_FROM_CACHE to make it more obvious and the wording of the warning. More significantly, I searched for all calls to SaveURL and made the callers explicitly supply all the args the function expects. Since I pass in false, the behaviour is the same as now (i.e. bypassCache is false). The one notable change to behaviour I encountered was that if you shift+click on an unvisited link you get the warning message. This could probably be surpressed by adding another arg to saveURL to control the prompt behaviour. I can see this error might annoy some people :)
Attachment #121541 - Attachment is obsolete: true
Yep. I thought the idea was that we would only prompt in the "only from cache" case and would just silently fall back to getting off the network in the "preferred from cache" case?
Yes. The thing here, is that the persist object is being told to save some random URI, but needs to ensure that if the item is not there in the cache that it does not go off and get it without checking first. Hence, when the saveURL() js method supplies a bypassCache value of false, I assume they want to grab from the cache and pose the dialog before trying from the network. Most of the time, this is exactly what it probably means, how I wonder if 'grab from network' is implied by shift+clicking a link and whether the warning is irrelevant. If this other bug decides that shift+click should always refetch, it makes this problem go away since it will supply true as a bypassCache value and this issue won't exist any more. Boris, if you have a chance you might like to apply the patch and see if it does what you hope it does.
Missed the 1.4b train, will this be targetted for final 1.4 or 1.5 alpha? Please re-set the milestone accordingly.
Target Milestone: mozilla1.4beta → ---
Hmm, no sign of 1.4 final on the target milestones, so I'll put 1.5a for now.
Target Milestone: --- → mozilla1.5alpha
Attached patch Patch mk2 (deleted) — Splinter Review
New patch attempts to remove the annoying prompting by adding a new aPromptIfNotInCache to saveURL() so callers can control this behaviour. I have gone through the callers, supplying true or false for this value.
Attachment #122619 - Attachment is obsolete: true
Comment on attachment 124574 [details] [diff] [review] Patch mk2 Can I have an r/sr on this patch to prompt for expired cache entries? This patch contains both changes for the persist object and the xpfe. As the comments will show, there are probably some places where prompting should happen and others where it should not. The patch takes a best guess at these, so if you feel uncomfortable about these by all means just r/sr the persist part and we can deal with the other ones on a case by case basis.
Attachment #124574 - Flags: superreview?(bz-bugspam)
Attachment #124574 - Flags: review?(brade)
Kathy & Boris, can you take a look at the patch please and lend your opinion on the best way for this to go in please? As mentioned before, the patch contains both the client and persist object side changes. If you feel more comfortable about putting the persist changes in first and reviewing the other parts we can do that.
Adam, I'll try to get to this patch today or tomorrow. Sorry about the delay.
Comment on attachment 124574 [details] [diff] [review] Patch mk2 >Index: mozilla/embedding/components/ui/progressDlg/nsProgressDlg.js >+ if (!persistArgs.bypassCache && persistArgs.aPromptIfNotInCache) { Hmm... how is the persistArgs getting here? The only place where I see this window opened, window.arguments[1] is not even passed... In any case, there are no objects that have a "aPromptIfNotInCache" property with your changes (details below). >+ webBrowserPersist.persistFlags |= nsIWBP.PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED + >+ nsIWBP.PERSIST_FLAGS_ONLY_FROM_CACHE; please use '|' instead of '+' here. >Index: mozilla/docshell/resources/locale/en-US/appstrings.properties Hmm.. Why is the persistence object using this particular stringbundle? Shouldn't it use its own stringbundle instead of hijacking the docshell's? You probably want an em dash or something after "exercise caution". >Index: mozilla/embedding/components/webbrowserpersist/public/nsIWebBrowserPersist.idl So now we have the following options: NONE -- no flags BYPASS_CACHE -- no cache used FROM_CACHE -- try cache; if that fails, go to the network PROMPT_FOR_RETRY_IF_NOT_CACHED -- try cache; if that fails, prompt before retrying. ONLY_FROM_CACHE -- like FROM_CACHE but does not go to the network on a miss The question I have is how FROM_CACHE differs from NONE. Is there a difference? We should document it clearly, if so. Is there a difference between (PROMPT_FOR_RETRY_IF_NOT_CACHED | ONLY_FROM_CACHE) and (PROMPT_FOR_RETRY_IF_NOT_CACHED | FROM_CACHE) ? >+ * flag is only valid from saveURI when combined with >+ * PERSIST_FLAGS_FROM_CACHE and PERSIST_FLAGS_ONLY_FROM_CACHE. This makes it sound like I have to set all three flags at once. Perhaps the "and" should be an "or"? >+ /** Only fetch from the cache */ >+ const unsigned long PERSIST_FLAGS_ONLY_FROM_CACHE = 32768; This needs to be better documented, if only by explicitly contrasting the FROM_CACHE flag. On a side note, it may make the interface more readable to use things like (1 << 15) in place of "32768".... >Index: mozilla/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp >+ mHeaders.Append(aHeader); >+ mHeaders.Append(": "); >+ mHeaders.Append(aValue); >+ mHeaders.Append("\r\n"); mHeaders.Append(aHeader + NS_LITERAL_CSTRING(": ") + aValue + NS_LITERAL_CSTRING("\r\n")); >+ if (data && >+ status == NS_ERROR_DOCUMENT_NOT_CACHED && >+ mPersistFlags & PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED && >+ mPersistFlags & PERSIST_FLAGS_FROM_CACHE) What happens if ONLY_FROM_CACHE is set? >+ if (mProgressListener) >+ prompt = do_QueryInterface(mProgressListener); It sounded to me based on the IDL comments that the progress listener should be prepared to return an nsIPrompt via the nsIInterfaceRequestor's GetInterface... is that not the case? In any case, do_QueryInterface is null-safe, so you don't need the null-check. >+ HeaderVisitor *visitor = new HeaderVisitor; >+ nsCOMPtr<nsIHttpHeaderVisitor> headerVisitor = do_QueryInterface(visitor); Why not just: nsCOMPtr<nsIHttpHeaderVisitor> = new HeaderVisitor(); ? In any case, check for OOM before passing it to VisitRequestHeaders... >+ // Test for odd combinations of cache flags, e.g. asking to prompt if >+ // not cached, but bypassing the cache altogether. This should be documented in the IDL (it's not clear from the interface that setting RETRY_IF_NOT_CACHED and BYPASS_CACHE will just make saveURI throw). >Index: mozilla/xpfe/communicator/resources/content/contentAreaUtils.js >+ if (!useSaveDocument && aData.aPromptIfNotInCache) { The member in aData is called "promptIfNotInCache", no? >+ persist.persistFlags |= nsIWBP.PERSIST_FLAGS_PROMPT_FOR_RETRY_IF_NOT_CACHED + >+ nsIWBP.PERSIST_FLAGS_ONLY_FROM_CACHE; Why do you need to set ONLY_FROM_CACHE given that you have the PROMPT flag on? This looks like a fairly good aproach, now that I've looked at it in detail. It should certainly satisfy our internal "save as" consumers if they can just decide for themselves what behavior they want....
Attachment #124574 - Flags: superreview?(bzbarsky) → superreview-
Can this please be reassigned to someone who's got time for it? At present it's a dangerous bug for anyone who does online shopping or banking as forms can be silently reposted (see bug 115174).
Keywords: helpwanted
Adding myself to CC:, this critically affects my ability to use Firefox for anything serious.
Bug 288462 provides the full summary of all related issues, complete with RFC violations. Should be linked to this one.
(In reply to comment #0) > The API needs to be fixed to accept not just a > URL but also a cache key or cache token or nsIWebPageDescriptor (preferred) or > something. This was fixed in bug 84106; the remaining issue is probably the subject of bug 479296. The patch here is quite old, however it might be useful as a reference. Actually, the question about what to do when the page cannot be retrieved from the cache is still open, and is being addressed in bug 479296. Setting a dependency between this bug and bug 479296 may be useful.
Depends on: 486921
QA Contact: dsirnapalli → apis
Attachment #124574 - Flags: review?(brade)
Marking a bunch of bugs in the "Embedding: APIs" component INCOMPLETE in preparation to archive that component. If I have done this incorrectly, please reopen the bugs and move them to a more correct component as we don't have "embedding" APIs any more.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: