Closed Bug 40867 Opened 25 years ago Closed 23 years ago

Need means to reuse/reload current page without refetching from server

Categories

(Core :: DOM: Navigation, defect, P3)

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: jmd, Assigned: rpotts)

References

(Blocks 2 open bugs, )

Details

(5 keywords, Whiteboard: [Hixie-P1] partial fix is checked into 0.9.2 branch)

Attachments

(9 files, 2 obsolete files)

When viewing any page, we need to keep a copy of the source around instead of opening the file again (for the file protocol) or downloading it again (for the http and ftp protocols). The reason this must be done is simple. Save As, in other applications, means "save what I am currently viewing to a file that I will specify". In most cases, this is what Mozilla does with its Save As. However, because we are re-fetching the page, there is a chance that it has changed or been completely removed. This goes completely against the 'save what I am currently viewing to a file' meaning most users know Save As to have. One of the worst consequences of this is for web developers. Say you are designing a page and have a copy open in Mozilla. You make an unwanted change to the source file and are unable to undo it. Or maybe you even accidentally deleted the source file. No worries, you have a copy cached in Mozilla, right? Wrong. Also note that trying to save it after the file has been deleted will crash Mozilla if the file was accessed via ftp or file. I've opened bug 40792 on this problem. If the file was accessed via http and was deleted, Mozilla will quietly save the 404 page. This is bad. Very, very bad. Another problem is pages with dynamic content. You want to save the page you are viewing. Maybe it was loaded 2 hours ago, but that's the page you want saved, not whatever is there now. Yet another problem is large pages for users on dialup connections. The Lynx browser makes you redownload a page when you want to view its source. I found this infuriating when I was a dialup user trying to speed up my surfing by using Lynx. It made no sense to have to redownload what I seemingly was staring right at. For the composer, a bug was filed a month ago (bug 37023), which is related to this. davidr8 reported that composer was forcing you to save before you could view source.
beppe commented that it was more efficient to read from a file, and the bug was set to VERIFIED WONTFIX. This is exactly the same problem. Do not read the source from a file. Say I make a page in composer, save it, then edit that saved file in another editor. If I switch back to composer, I'm looking at the old version of the file. If I view source, I want, and expect, to see the old source. What if I needed to reference it? It's seemingly there; hell, I'm staring right at it! Keeping the pre-rendered page in memory is a huge bandwidth savings if it was accessed over http/ftp, and is just plain the right thing to do.
Great bug! I think it's the same as bug 6119, though.
That is only regarding view source. I proposed that the system be used for Save As, and view source. I'll put a note on that one.
Somehow, I would think the cache should take care of this, given a "Once Per Session" recheck pref (in Advanced|Cache in the prefs). Adding "perf" keyword.
Keywords: perf
Looks similar to bug 6119. Reassigning to Bill Law. Neeti
Assignee: gordon → law
Component: Networking: Cache → XML
I suggest that bug 6119, bug 17889, bug 37023, and bug 40792 (and probably a few other bugs as well) all be made dependent on this bug; and that this bug be upped to major severity, since it involves loss of data. Otherwise those other bugs are all going to run around doing various inappropriate things with the cache, when commands operating on the current file (save, print, view source, etc) shouldn't be dependent on the cache at all (e.g. if the sum total of files in the current Web page is larger than the disk cache size). This bug really isn't much to do with XML. Mozilla should hold the source of *all* files currently being used, whether they be XML, PNG, CSS, or whatever.
Changing component from XML to XPApps (as I think was intended) and nominating for nsbeta3 since I think this is a major problem - we need to use the cache for these things (and get things out of the cache even when expired in certain circumstances).
Component: XML → XP Apps
Keywords: nsbeta3
Nav triage team: Already works the way 4.x works.
Whiteboard: [nsbeta3-]
Removing nsbeta3- to trigger re-evaluation. PDT: This causes silent data loss! If you want to save the page currently being shown, but the page in question is no longer on the server, or is dynamically created, then Mozilla will silently lose the page, saving the 404 file not found page, or the different version in the case of dynamic content, without telling the user. This is also a problem with View Source, one that content developers find incredibly annoying with dynamic content. Say there is a bug on a dynamically generated page. The user right clicks on the page to view the source, but when the source comes up it is for a totally different page, since the server has generated a whole new document. There are many other examples of when this is critical (e.g. dialup connection is dropped before user saves page, view source on a POST operation such as after a money transaction over the net, saving NetCenter content that changes regularly, and so on and so forth). IE does this correctly, so marking 4xp. Moving up to 'major' severity, since this is a major loss of function. Is there a keyword for this kind of bug? Marking 'correctness' unless someone can think of a more appropriate keyword...
Severity: enhancement → major
Keywords: 4xp, correctness
Whiteboard: [nsbeta3-]
[To reverse the old maxim: it's not a feature, it's a bug! Resummarizing.]
Hardware: PC → All
Summary: [RFE] Mozilla Needs to Hold Source of Current Page in Memory → Hold source of current page in memory
nav triage team: nsbeta3-, while we believe this to be important, we are concerned that this would be too difficult to do safely, this late in the game.
Whiteboard: [nsbeta3-]
Sanity check: this bug is nsbeta3-, but blocks bug 17889 which is nsbeta3+.
Removing nsbeta3- again for re-evaluation. We have a lot more information on this bug now. Bugs 17889 and 6119 are closely related, and likely have the same fix. 17889 is nsbeta3+, and I just nominated 6119 for + and dogfood. Unless somebody can come up with a super-workaround for both bugs, this bug should really be fixed. Ian Hickson has excellent comments in bug 6119 describing the problem, and I copied comments from a dup of 6119 which may help towards a fix.
Whiteboard: [nsbeta3-]
adding cc
See also bug 39957... Okay, so I'm thinking about this now. I'm not sure what we currently do, but I would imagine that we keep a DOM tree for our current document if it's HTML or XML, and otherwise, we just display the document source. It would probably not be a Good Thing to keep both a DOM and the source of a large document around at once, but it might be the compromise we would have to make for beta3. In the long run, what might be possible is to have our object model code for XML/HTML store enough information to be reserialized exactly as it was in the source, such as whitespace, capitalization, comments, entity substitutions, etc. A lot of thought is being given to this stuff in the W3C's work on the infoset (http://www.w3.org/TR/xml-infoset) and this sort of stuff has been mulled over plenty in SGML's "groves" -- the issue of what data constitutes the "information represented by the document" and what is also in the document source but is considered extraneous... Anyway, the XML and HTML parsers could store this extra information in the object model (and hopefully by this time we'll have a unified object model behind all our DOMs), and thus avoid any duplication of information. That would be the intelligent solution to the view-source, save-as, and send-page bugs (see bug 6119). The printing and changing charset bugs don't need the exact source, so this isn't a direct solution to those, but the DOM does need to get held in memory for them... Now that I think about it, why should the printing and changing charset bugs even exist, if the DOM is kept in memory already (which it must, right?) Does the document actually get reloaded and reparsed before doing those things? That would be really messed up. I'm thinking that it might be two different underlying bugs: (1) we need to hold the source for the view-source/save-as type bug, and (2) we need to make print, change charset, etc. use the current DOM we have in memory. Does this sound like a reasonable analysis?
nav triage team: there are other ways to fix the specific user problems (like reading from the cache) that are much less of a hit than this suggested fix. nsbeta3- M Future to consider later.
Whiteboard: [nsbeta3-]
Target Milestone: --- → Future
Johng, that is incorrect. You can't read from the cache for a page which Mozilla has been told not to cache, because it's not in the cache. For example -- you want to save, or print, the order confirmation page for something you've bought over the Web, but the server (correctly) has told Mozilla not to cache that page for privacy reasons. So Mozilla goes to the cache, finds that the page isn't there, and then *reloads the page* ... so (unless the shop site is quite cleverly designed) you end up inadvertently doubling your order. Need I say, that is not desirable?
I'm not 100% positive it works that way. I think most sites are set up so that an order triggers a redirect to the receipt page. So a reload just reloads that receipt page. Going *back* triggers a "re-post form data?" confirmation dialog. But I'm out of my area of expertise, here. Regardless, the bottom line is that there is no way for us (speaking for the "front end") to tell Necko to do anything more than fetch it from the cache (if possible). See http://lxr.mozilla.org/seamonkey/source/netwerk/base/public/nsIChannel.idl#198 There is no "reload from the secret stash that's not the cache." I think, if the request is legitimate (which it very well might be), that it would have to be implemented, for the most part, in the http channel implementation, with a new load flag that we could then use to fetch the data in the various places where we want to reuse it (view source, save as, print, etc.). So, I'm resetting the component. At the least, Gagan will be able to comment on the subject with more authority than I possibly could. Re-summarizing, too.
Assignee: law → neeti
Component: XP Apps → Networking: Cache
Summary: Hold source of current page in memory → Need means to reload current page without refetching from server
Attached file IRC log: dr and dbaron, discussing (deleted) —
Not actually sure if this is a networking bug. I'm wondering what piece of code actually hangs on to the document while it's being rendered, while the DOM is being modified by scripts, etc. Who owns the current document?... Please have a look at the attachment... De-nominating for beta3, removing milestone, but marking blocker, since it blocks two closely-related [nsbeta3+][dogfood+] bugs. This needs to be on somebody's radar, I just don't know whose.
Severity: major → blocker
Keywords: nsbeta3
Summary: Need means to reload current page without refetching from server → Need means to reuse/reload current page without refetching from server
Whiteboard: [nsbeta3-]
Target Milestone: Future → ---
forgive me if i'm stating the obvious with any of this. even worse, i'm not quite sure what DOM means, used in the context here. i'm taking it to mean the html equivalent of java bytecode (optimized half-rendered page), so if i guessed wrong there, i guess you should stop reading. we need three source caches: the traditional disk and memory caches, plus one to hold the source of the current page, regardless of the settings of disk/memory caches, and whether or not the server told us to cache the current page. separately, there is the DOM cache (perhaps storing DOM for the last N pages would be a neat cpu/memory trade, but let's not stray). print/redraw and maybe character switching (17889) would ask for the DOM cache. view source/send page/save would ask for the source cache, which would fetch from the always-available current-page cache. the other alternative, the reserialization dr explained, seems like a pain, and error prone... maybe worth investigating in the future, i don't know.
jeremy: dom is the w3c standard object model interface for representing a document.
> print/redraw and maybe character switching (17889) would ask for the DOM
> cache. view source/send page/save would ask for the source cache, which would
> fetch from the always-available current-page cache.
that's one way of looking at it, but you have the right idea: use the object model where you don't need the source, to avoid reparsing; use the source where you do need it.
> the other alternative, the reserialization dr explained, seems like a pain,
> and error prone... maybe worth investigating in the future, i dont know.
to your credit, it is definitely a pain if not well-thought-out beforehand. there is talk, though, of redoing our dom code sometime in the future (because we currently have separate code for html, xul, and other doms). this would be a place to include the reserialization information for xml/html. but, like i said, that would be for the future. for the present, we just need to keep the source of the current document in memory. perhaps if we're worried about bloat, we could choose only to keep it in memory when it's the result of a form submission... something like that.
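A minimal sketch of the split discussed in the two comments above, assuming a window keeps both the original bytes and the parsed document for the page it shows. The names `LoadedDocument`, `NEEDS_SOURCE`, `NEEDS_DOM` and `data_for` are hypothetical and are not Mozilla APIs; the dict merely stands in for a real DOM.

```python
from dataclasses import dataclass


@dataclass
class LoadedDocument:
    """What a window keeps for the page it is showing (hypothetical)."""
    source_bytes: bytes  # exactly what arrived from the server
    dom: dict            # stand-in for the parsed object model


# Commands that need the exact original bytes.
NEEDS_SOURCE = {"view_source", "save_as", "send_page"}
# Commands that only need the already-parsed document.
NEEDS_DOM = {"print", "redraw"}


def data_for(command: str, doc: LoadedDocument):
    """Return the in-memory data a command should use; nothing is refetched."""
    if command in NEEDS_SOURCE:
        return doc.source_bytes
    if command in NEEDS_DOM:
        return doc.dom
    raise ValueError(f"unknown command: {command}")


doc = LoadedDocument(b"<HTML><body>hi</body></HTML>", {"tag": "html"})
print(data_for("save_as", doc))
print(data_for("print", doc))
```

Neither branch touches the network or a URL-keyed cache, which is the property the thread is asking for.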
Bug 17889 (to do with changing encoding), nsbeta3+, means you need to retain source at the *byte* level ...
No longer blocks: 6119
For pages that don't give special no-cache settings, we behave the same as 4.x. But if the page had headers that disabled caching, then we will hit this bug in 4.x too. It would be nice to fix this. Since we don't have time, we will fix this in a later release. The general idea of the fix is to keep all pages in the cache (including no-cache ones) and, on normal loads, not get them from the cache if they weren't supposed to be cached. Then add another reload policy that says to get the page from the cache, overriding these hints. Removing 4xp.
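The idea in the comment above can be sketched roughly as follows. This is an illustration under assumptions, not Necko's design: `LoadPolicy`, `FROM_CACHE_ALWAYS` and `Cache` are hypothetical names, and the example URL is made up.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class LoadPolicy(Enum):
    NORMAL = auto()             # honour the server's caching hints
    FROM_CACHE_ALWAYS = auto()  # hypothetical policy for view source / save as


class Cache:
    def __init__(self) -> None:
        # url -> (body, server asked us not to cache it)
        self._entries: Dict[str, Tuple[bytes, bool]] = {}

    def store(self, url: str, body: bytes, no_cache: bool) -> None:
        # Keep every page, even ones the server marked no-cache.
        self._entries[url] = (body, no_cache)

    def lookup(self, url: str, policy: LoadPolicy) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, no_cache = entry
        if no_cache and policy is LoadPolicy.NORMAL:
            return None  # normal loads behave as if it was never cached
        return body


cache = Cache()
cache.store("http://shop.example/receipt", b"order #1", no_cache=True)
print(cache.lookup("http://shop.example/receipt", LoadPolicy.NORMAL))
print(cache.lookup("http://shop.example/receipt", LoadPolicy.FROM_CACHE_ALWAYS))
```

A sketch like this still keys everything on the URL, so it does not by itself cover the case of an older copy that is still displayed after the cache entry has been replaced.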
Keywords: 4xp
Target Milestone: --- → Future
No longer blocks: 39957
We don't behave the same as 4.x here -- bug 20843 is what is the same as 4.x, that is, post results aren't cached, but in 4.x you at least were able to view the source or save the current page without refetching. That has nothing to do with the behavior of the cache. Bill Law summarized the bug best by saying "There is no 'reload from the secret stash that's not the cache'." This bug probably shouldn't even be in the Cache component. Also, adding dataloss (because it should have been there all along).
Keywords: dataloss
Is this the same as bug 50949? The bugs that depend on each seem pretty similar, and that one's fixed :)
As of the 09-27-00 build (give or take a day or two), the Advanced Debug option for "Enable Memory Cache" is gone, and documents do not seem to be getting cached in memory. Is this related to all this discussion of cache/no-cache here (bug 40867)? As I understand it, there are supposed to be two caches, disk and memory. If a document's HTTP header (set by the HTTP server) says "Don't cache", the document should not be stored in the disk cache. But shouldn't it still be stored in the memory cache, for re-use by Save As, Print, etc.?
atovar: there (are/should be) three caches: disk, mem, and current. current caches objects regardless of whether disk/mem are turned off, or if http sets no-cache. I say are/should be because i'm not sure of the status. if "are" is true, this bug is half closed.
Per law@netscape.com, this bug now also tracks the usage for View Source, i.e. "Make View source never load from the server". Thus nominating for mozilla0.9. See recent comments in bug 6119 for rationales.
Keywords: mozilla0.9
Blocks: 56346
When I type the url, "about:cache", I never see anything listed under memory cache, even when I have Debug / Enable Memory Cache set. Does this accurately show that the memory cache is never used? I was also thinking: Why is the memory cache disabled by default? Is this because we are relying on the operating system to provide sufficient in-memory caching for objects stored to the disk cache?
Blocks: 55583
This bug is a *major* pain for Zope developers, since on server exceptions we embed a traceback in an HTML comment in the error page, so that it can be inspected with View Source. Couldn't the original text simply be attached to the root node of the DOM as a read-only attribute?
Blocks: 6119
This seems to be an architectural problem, and a cascading one that's causing many other bugs. (With perhaps more bugs yet to be identified.) My understanding (correct me if I'm wrong) is that all these various functions (View Source, Save As, etc.) all ask for the page from the Necko library, which tries to fetch it from the normal memory or disk cache if possible, loading it from the network (or generating an error?) if it's not in the cache. I assume that the cache management subsystem is part of Necko, not external to it. Is this an accurate summary of how it works? Have I made any errors? Users and web developers alike have the same (reasonable) expectation -- that what they see in the browser window is what they'll get if they view the source, save it, etc. While that's often true, clearly it sometimes isn't. It can cause data loss and confusion, impede debugging of web applications, etc. I think the root cause of this problem is that these functions are asking Necko for the information, when Necko's function is to fetch data from URLs of all sorts. The very concept of "loading" the data to be viewed or saved is really nonsensical, because it's obviously ALREADY been loaded once! Forcing that "load" to come from the cache may improve the situation, but it doesn't solve the problem at all. It only masks the more obvious symptoms of the problem. I hate to suggest architectural changes, because they can often be problematic, needing to evaluate and fix any code that might contain implicit assumptions based on the old architecture. However, I believe the only solution that will ever fix this problem for good will probably need to be architectural in nature. Leaving Necko out of the picture for the moment, there should be an underlying relationship between the data structures that reflects the way we want to use them. 
For functions like "View Source" and "Save As", we want to make sure that we ALWAYS refer to the same source that is already being displayed in the window where the functions were triggered. This is a user requirement, and should be reflected in the data structures for maximum consistency and correctness. The simplest way to achieve this would be for the data structures associated with the window in question to contain or directly point to the exact source they are using. I assume they must already contain the DOM generated from that source, since that would be necessary to reflow the page if the window is resized. The simple solution would be to associate the original source with the DOM, and access THAT via the window's data structures for functions like "View Source" and "Save As", leaving Necko out of the picture entirely. This would necessarily be more correct and more reliable than asking Necko to fetch it from the cache. Older windows with outdated content would work as expected, despite the existence of updated content in the cache. POST results from form submission would work as expected. Clearing the cache wouldn't matter. Things would work much more as the user expects them to, and this would be a relatively small architectural change for a big improvement in correctness. The obvious downside of this simple solution is that it would use more memory; many pages would be saved that would duplicate information already in the cache. The better solution is probably a larger architectural change, separating the cache manager from the Necko library. After all, Necko depends on the caching functionality, and it would be best for these original source copies to be integrated cleanly with the existing cache -- but there's no good reason for Necko to be in the middle of functions operating on previously-loaded data. It would be cleaner to make the cache manager independent of Necko, while keeping Necko dependent on the cache manager. 
These functions on previously-loaded data could then depend on the independent cache manager without depending on Necko. This architectural change would make it impossible for these functions to unexpectedly reload the page from the network. An independent cache manager would need to be more flexible than Necko needs. (I don't know how flexible the current cache manager actually is; maybe it's already more capable than I imagine.) While Necko needs size-bounded in-memory and on-disk LRU caches keyed on URL information, functions such as "View Source" and "Save As" (AND Back/Forward history functions) need a reference-counted and/or garbage-collected memory management system, bounded only by available memory. When the LRU cache Necko needs happens to point to identical data, the cache manager could share the memory space. However, if Necko needs to replace its cache entry when content has been updated, or delete an entry, it must NOT change or discard any data still being referenced from, say, a browser window, if functions on previously-loaded data are to perform as expected. I've dodged a sticky issue. What should happen if the DOM is changed by JavaScript code between the time the page is loaded and when the "View Source" or "Save As" function is called? There are good arguments for modifying the source to match the DOM changes, but this is clearly MUCH more difficult than tracking a copy of the original source. Even if this can be done, sometimes the user is likely to want the original source despite the DOM changes, so it should only be optional. (I just filed bug 63892 about this, since it's really an enhancement anyhow.)
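The sharing rule described above (the LRU cache may replace or evict an entry, but data still referenced by a window must survive) can be illustrated with a small sketch. It leans on Python's own reference counting, and `CacheManager` and `Window` are hypothetical names chosen for this example rather than anything in the Mozilla tree.

```python
from collections import OrderedDict
from typing import Optional


class CacheManager:
    """Size-bounded LRU keyed on URL; bodies are immutable bytes objects."""

    def __init__(self, max_entries: int) -> None:
        self._lru: "OrderedDict[str, bytes]" = OrderedDict()
        self._max = max_entries

    def put(self, url: str, body: bytes) -> bytes:
        self._lru[url] = body          # replaces any older entry for this URL
        self._lru.move_to_end(url)
        while len(self._lru) > self._max:
            self._lru.popitem(last=False)  # evict the least recently used
        return body

    def get(self, url: str) -> Optional[bytes]:
        body = self._lru.get(url)
        if body is not None:
            self._lru.move_to_end(url)
        return body


class Window:
    """Holds its own reference to the bytes it was rendered from."""

    def __init__(self, url: str, source: bytes) -> None:
        self.url = url
        self.source = source


manager = CacheManager(max_entries=1)
window = Window("http://example.org/", manager.put("http://example.org/", b"old"))

manager.put("http://example.org/", b"new")  # server content was updated
manager.put("http://example.org/other", b"x")  # evicts the first URL entirely

print(window.source)                       # the window still has what it shows
print(manager.get("http://example.org/"))  # the cache no longer has that URL
```

While the cache and the window point at the same bytes object, no memory is duplicated; once the cache moves on, the window's reference alone keeps the old data alive until the window goes away.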
No longer blocks: 6119
Keywords: nsbeta1
Blocks: 6119
cc to self
neeti, are you going to fix this, or should it be reassigned to some non-Necko person? It shouldn't be Futured without helpwanted for long. /be
This is something I'd be happy to volunteer for, if nobody else is ready to work on it. However, I can't guarantee how much time I'll have available to work on it, or how long it would take me. I'm not familiar with the Necko code or the current cache architecture, so I'd have to study it. I also don't know all the places it would need to be linked in. (Some pointers would be helpful.) My point is, I'm willing to work on this, but can't promise I'll get it done in any reasonable timeframe, if at all. If nobody else would be working on it anyway, then there's nothing to lose. If someone with more time available can dedicate it to this, I can look for other things to work on. Should I wait and see if anyone picks this up? Should I start working towards it just in case? Should I not have said anything until I had a patch in hand?
Blocks: 57724
No longer blocks: 57724
Wouldn't it be nice if it were possible to generate the original source accurately (including formatting) from the information in the dom? If that were true, then none of this would be an issue, we could just use the dom to regenerate the source without having to keep an extra copy around. Not throwing away whitespace, and finding a way to represent whitespace in attributes, would be a nice memory efficient solution to this problem, and would have the side benefit that people would be much happier with composer since it wouldn't always be reformatting their documents.
That seems like a possible solution, but it would require a considerable number of extensions to the DOM (some HTML-only, some perhaps XML-only):
* remembering the original case of normalized attribute and element names and attribute values
* remembering the whitespace in a tag:
  + after the element name
  + around the = separating attribute name from value
  + after each attribute
* remembering the type of quotation marks around each attribute value
* remembering the entire contents of any markup declarations (e.g., DOCTYPE)
* remembering the delimiters of included marked sections
* remembering the contents and delimiters of ignored marked sections
* remembering which characters were originally entity references
* remembering how every entity reference was terminated
* properly remembering comment delimiters, since the W3C DOM Core doesn't fully describe SGML comment structure
* remembering whatever badly formed HTML (and perhaps XML, too, for view-source and save) is thrown at us
* ...what did I forget?
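To make a few of the items in that list concrete, here is a small sketch that rebuilds markup purely from parse events, using Python's standard `html.parser` as a stand-in for a parser that feeds a plain DOM. `NaiveSerializer` and `round_trip` are names invented for this example; Mozilla's parser and serializer work differently.

```python
from html.parser import HTMLParser


class NaiveSerializer(HTMLParser):
    """Rebuilds markup from parse events only, with no extra bookkeeping."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over normalized names and unquoted values.
        rendered = "".join(
            f' {name}="{value}"' if value is not None else f" {name}"
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def round_trip(source: str) -> str:
    parser = NaiveSerializer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


original = "<P  CLASS='x' >a &amp; b</P>"
rebuilt = round_trip(original)
print(original)
print(rebuilt)
print(rebuilt == original)
```

In this input the name case, the spacing inside the tag, the quote style and the entity reference are all gone after the round trip, so each would need its own bookkeeping to be restored.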
This is an HTTP issue, not an HTML issue. Since right now things don't work right for *any* document type, it is premature to be discussing a complex optimization for one particular document type. And if you ask me, reconstructing the *exact* source from the DOM in a way that works 100% of the time, and is more efficient than simply keeping a copy of the original source, sounds completely hopeless.
It's surely not more efficient in terms of CPU time to regenerate the original source from the DOM, even if it's extended to represent insignificant (to the DOM) aspects such as whitespace. More importantly, it's a significant increase in complexity that begs for subtle bugs where you may reconstruct the exact source 99.9% of the time, but never 100% of the time. This would really work more cleanly and reliably at a lower level. The best solution is probably to save the content, byte for byte. Then it's straightforward to "recreate" the original source, because it's not a complex process of reconstruction from the DOM. Also, saving a byte-for-byte copy allows for changing the character set (bug 17889) or potentially reinterpreting the content as a different MIME type entirely. (Suppose the MIME type specified by the server is wrong? How about a function to reinterpret the data under a user-specified MIME type?) Also, if memory efficiency is a concern, and CPU time is considered a worthwhile tradeoff, simply compressing the source in-memory with zlib is a viable option. The DOM could also be compressed if it's not actively in use (i.e. history)...
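As a rough illustration of the byte-for-byte approach, the sketch below keeps the received bytes compressed with zlib and decodes them on demand under whatever character set is requested. `StoredSource` is a hypothetical name; only the standard `zlib` module and `bytes.decode` are real APIs here.

```python
import zlib


class StoredSource:
    """Keeps the exact bytes of a page, compressed while not in use."""

    def __init__(self, raw: bytes) -> None:
        self._packed = zlib.compress(raw)

    def raw(self) -> bytes:
        # Decompression returns exactly the bytes that were stored.
        return zlib.decompress(self._packed)

    def text(self, charset: str) -> str:
        # Reinterpret the same bytes under a different character set.
        return self.raw().decode(charset, errors="replace")


page = "caf\u00e9".encode("utf-8")  # the bytes as the server sent them
stored = StoredSource(page)

print(stored.raw() == page)
print(stored.text("utf-8"))
print(stored.text("latin-1"))
```

Because the stored form is the byte stream rather than decoded text, switching the charset (bug 17889) is just another call to `text`, and Save As can write `raw()` unchanged.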
Regarding documents being reformatted by composer, a more straightforward solution (especially if the byte-for-byte source is to be saved for this bug) would be to associate parsed elements in the DOM with byte ranges in the saved source. This is probably much simpler than annotating the DOM to retain case and whitespace information. Wouldn't that be as effective? (Modified elements could replace byte ranges in place -- fancier versions could try to guess some of the formatting conventions by heuristic tests on the surrounding source...)
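A minimal sketch of the byte-range idea, assuming the parser has already recorded where each element sits in the saved source. `Span` and `replace_span` are hypothetical names, and the offsets are written by hand here instead of coming from a parser.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Byte range [start, end) of one parsed element in the saved source."""
    start: int
    end: int


def replace_span(source: bytes, span: Span, new_markup: bytes) -> bytes:
    """Splice an edited element in, leaving every other byte as it was."""
    return source[:span.start] + new_markup + source[span.end:]


source = b"<P>old</P>\n<!-- keep me -->\n"
first_paragraph = Span(0, 10)  # covers b"<P>old</P>"

edited = replace_span(source, first_paragraph, b"<P>new</P>")
print(edited)
```

Only the edited element is rewritten; the author's case, comments and whitespace elsewhere in the file are untouched. After a replacement of a different length, the spans of later elements would have to be shifted, which this sketch leaves out.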
IMO, viewing the source from the cache should be perfectly fine. If you need to debug, you should be able to control your server to ensure that the page can be cached or there should be a control in the browser to ignore "no-cache" HTTP headers. Please don't implement some drastic, complicated solution such as regenerating the source from the DOM or whatever. K.I.S.S.
I agree that reconstructing the source from the DOM is a drastic, complicated solution. However, I strongly disagree that simply reading from the cache is an acceptable solution. Not only do you have a problem with content that asks not to be cached, but also with older content that has been replaced in the cache with newer content (but may still be displayed in a window), form post results, missing data from manually clearing the cache, etc. The cache is inappropriate for the purpose, when the true intent of the functions (in the user's mind) is to operate on the EXACT data that they see in the window (or saw before). This is why I'm suggesting saving a byte-for-byte copy of the content, irrespective of whether that content should normally be in the cache. (Ideally, data thus saved would share memory with duplicate cache data for efficiency reasons...)
To reiterate: 1. You can't fetch the source from the cache, since it may not be there, or may have been replaced with a newer copy. I want the source for what I'm looking at, not what I'd get if I reloaded! 2. You can't reconstruct the source from the DOM, since the DOM may have changed, and it's terribly fragile to expect an accurate reconstruction even if it didn't change. A copy of the source needs to be associated with the window/frame/layer/whatever. It can be compressed. It may be able to be shared with the cache, but that's tricky. For now, why not just keep the compressed copy as an attribute of the appropriate DOM node, or history object, or whatever, and release it when the object it's attached to is released?
Speaking as someone who's already had to cancel a duplicate order caused by trying to save an order confirmation, I'd say pulling the data from cache is not sufficient.
You all have valid points. However, I think viewing the source is something that is only useful to developers, who will have the savvy to do what is necessary to look at the source (i.e. turn off the no-cache header in HTTP server or tell browser to ignore no-cache, don't reload page in another window, etc). 99% of ordinary users will never bother to view the source. Saving the current page is a different activity. Ideally, the page save function will offer the ability to save in a variety of formats, including HTML. However, I think that the data to save can just be generated from the DOM object and there is no requirement that the HTML (if the HTML format is chosen) be anything like the original source code. I agree that "Save" should absolutely never cause another request to be sent. In summary, I don't believe that "saving a page" is the same thing as viewing/saving the contents of the HTTP reply message that generated the page.
> Not only do you have a problem with content that asks not
> to be cached, but also with older content that has been replaced in the cache
> with newer content (but may still be displayed in a window), form post results,
> missing data from manually clearing the cache, etc.
A more flexible cache architecture could handle that (have a way of marking a mem-cached page as "window still visible, so don't overwrite until window goes away, but data not to be saved to disk cache" or some such).
> 2. You can't reconstruct the source from the DOM, since the DOM may have
> changed, and it's terribly fragile to expect an accurate reconstruction even if
> it didn't change.
If the DOM has changed, then so has the display in the window; don't you want to see the source for what you're really looking at, rather than what you loaded some time ago?
> to associate parsed elements in the DOM with byte ranges in the saved source.
That would be very useful for composer. The parser would have to do the association, though; trying to reconstruct it later would be error prone and probably CPU intensive.
cc'ing gordon, who is working on the new cache.
Nice summary, Evan. Personally, I agree with everything you said in it. Ken, bug 6119 was specific to viewing the source; this bug applies as much to saving a page as to viewing the source. You seem to be suggesting that saving a page generated from the DOM would be sufficient. As a user, I would NEVER want the browser doing this. Even ignoring the readability of the code, if the browser is just saving what it *thinks* the source represents, it invites errors to creep into saved pages that weren't in the original source. A bug in the HTML parser could cause a buggy page to be saved. (What if it didn't render properly and the user is saving the page in order to view it in another browser?) Unsupported HTML tags (or attributes) would be discarded. Also, Mozilla itself could fail to view the saved page as before, if there's a bug in the generation code. While regenerating the HTML from the DOM is possible, it's complex and risky, and offers little advantage over the straightforward solution of saving the original source in some fashion...
Akkana: See my comments (for this bug) of 2000-12-28 14:39 for some of my earlier thoughts about how an independent cache manager could meet the needs of Necko's LRU cache and this sort of caching of historical content simultaneously. As for DOM changes, I don't think it's cut-and-dried. Probably you'll want to see the original source sometimes and the source as modified by DOM changes at other times, depending on the circumstances. As a user, I certainly wouldn't want to lose the ability to save a byte-for-byte copy of the original source, even if the modified DOM could be used to generate a more-current version. Both options would be nice, but I believe the original source is critical... As far as the parser needing to associate the byte ranges within the DOM, that was implied. It would be senseless to try to figure that out afterwards; the parser would definitely need to do that part of the job. Neeti/Gordon: I guess I'm glad someone is going to be working on this, but I'm a little disappointed at the same time. I was hoping to take a stab at it; it sounded like fun! (But I wouldn't want it held up waiting on me, since I can't guarantee I'd find the time -- I don't have the luxury of being paid to hack Mozilla full-time...) If a wild hare makes me play with it anyway, I'll let you know. (But only if there's something worth seeing!)
Actually, when I look at the source, 99% of the time I am interested in the source as it came from the server. This is certainly the case when I try to save a page. If the DOM has been modified by client-side scripts, I would want to be able to save the original source and scripts and not have extra HTML that's generated from the modifications to the DOM. In an ideal world, we would have both options: 1) looking at the source of "what we see now". This would need to be generated from the DOM. 2) looking at the original source. This is most likely impossible to generate from the DOM because it does not necessarily correspond to the DOM (elements may have been added to the DOM through scripts). It looks like the original purpose of this bug was to cover option #2. Should we open a separate bug on option #1?
I wanted to disagree with KenW's comments. I think that it is not a safe assumption to make that a developer should be able to frob the cache directive if they're debugging server-generated pages. IMO, view-source should function independently of any cache directives. Furthermore, when I save a document, I want the original source in all cases that I've encountered so far. In my experience, I end up wanting to save a page so that I can do a diff against an earlier saved page. I think it's dangerous to think of saving a page as a non-developer function. Both viewing the source and saving the page should be lossless.
We have been working on some design changes to the cache that should make it easier to address these problems. I hope to push a proposal out to mozilla.org in the next few days. The decision to keep the source around for currently viewed pages doesn't belong to the cache, but some of the changes we want to make will make it easier for current pages to share data with the cache avoiding unnecessary duplication.
Adding self to CC.
*** Bug 67781 has been marked as a duplicate of this bug. ***
*** Bug 68677 has been marked as a duplicate of this bug. ***
I don't think 99% of the users never bother to view the source. Although HTML was not meant for such a purpose, there are many people out here that generate pages by hand, and thus want access to the source whenever requested. That includes Javascript that can modify the code: both the unaltered and the modified source are of interest in that case, but the unaltered one is the more important. When saving a page to the disk, the original page needs to be saved. The changes in the DOM will operate when loading the page in a browser, executing whatever javascripts etc. there are in the page. If the settings of the browser have changed (disabling Javascript...), so will the displayed page. I consider this as a matter of consistency in the browser rendering. A question I have is whether or not to keep images and embedded objects of the page in memory. Suppose I want to print a page: using the pictures from the DOM would be OK, wouldn't it? There is no use in refetching them.
cc'ing pavlov and saari for their view on images and printing. Short answer: sometimes (often?) the images need to be redecoded at a different resolution for printing.
Yeah, pulling the unmodified data out of the disk cache and redecoding it when we need to print would be ideal for images with greater than screen resolution dpi.
jido@respublica.fr: "don't think ... <people> never bother to <act>" is a double negative. While it is valid English, people tend to worry that the author is confused. The literal meaning is: "believe <people> <act>". Saving objects with the page is the subject of other bugs. Printing should use the current data. WYSIWYG
Printing should produce a WYSIWYG replication of what you see on screen in the highest possible resolution, and that will be bounded by the printing device or resolution of the image, whichever is smaller. We're not printing a 72 dpi image if we have a 300dpi copy sitting around.
so long as we know that the image @300dpi is related to the one @72 dpi and not something that could be entirely different.
Yes, both/all resolutions are derived from the same http get of the compressed data.
Blocks: 55055
Blocks: 68412
Gordon: do we have any progress on this bug? I'm currently working on a set of webpage scripts that have form POSTs all over them and debugging them is almost impossible due to this bug. It's driving me up the wall :(
I'm not sure if the problem I was encountering was related to this. When I was using some sort of web account, performing operations and submitting forms with cookies, mozilla sometimes re-posted my login information when the page directed me back to the main page (which should be cached), and the server thought I was attempting multiple logins and refused my request.
Cache bugs to Gordon
Assignee: neeti → gordon
Is the new Cache Manager going to fix this? It'd be a real handy thing to have, as a developer. e-commerce and other apps where reloading will totally screw you could use it, too.
The new cache itself won't fix this, but will enable the docShell to hold onto references to cache entries for the current page ensuring they won't get fetched from the server again. I'm not sure if docShell will take advantage of this however.
Whiteboard: [cache]
Resetting milestone from "Future" because at the time it was set it just meant "post netscape6 release", and we have milestones up to mozilla 1.2 by now.
Keywords: mozilla0.9.1
Target Milestone: Future → ---
Blocks: 74349
Depends on: 72519
Target Milestone: --- → mozilla0.9.1
Blocks: 64100
Any follow-up on this one now that the new cache is backing in the tree?
Keywords: mozilla0.9
From what I've learned about the new cache, we now have a way to keep the source for currently viewed pages in the cache using the hard reference capability. Just keep the hard reference around. This capability is not used yet, however.
If I understand this correctly-- at this stage this is just a docshell issue. ->docshell owner then.
Assignee: gordon → adamlock
Component: Networking: Cache → Embedding: Docshell
QA Contact: tever → adamlock
I'm slightly worried about letting the cache handle source storage. Consider this: user visits a dynamically generated page, and leaves the window open. A little later, they visit the page again in a different window, getting different results. Now the "same" URL is open in two different windows, with two different source texts. Can the cache handle this correctly, or will both windows seem to have the source text that was loaded more recently?
The cache can handle this correctly *if* the docShell holds strong references (cache tokens) to the items in the cache that it used to render itself. If docShell doesn't hold the strong references, viewing source on either window would use the latest version (possibly fetching a new third version from the net).
*** Bug 78740 has been marked as a duplicate of this bug. ***
Yes ... and it's much more efficient to hold it in the cache because then you have only one copy of the document ... why keep a 100K document around in your current page structure for a long period of time if it's already sitting in the cache? Here's the sequence of events you're talking about:

1. User surfs to http://www.yahoo.com.
   - http://www.yahoo.com comes down from the network. We grab a copy from the cache.
   - We grab a hard reference to that copy immediately.
   - We keep that hard reference around. This means the cache agrees to never get rid of it.
2. The page expires from the cache.
   - Since the entry has a hard reference, it stays around in memory but gets Doomed. Thus it cannot be searched.
   - Note that this is also what will happen if the page is set not to be cached at all.
3. The user surfs to http://www.yahoo.com in another browser.
   - Since the entry is doomed, the search through the cache does not find it.
   - http://www.yahoo.com comes down from the network. We grab this new copy from the cache.
   - We grab a hard reference to that copy immediately.
   - We keep that hard reference around. This means the cache agrees to never get rid of it.
   - Both copies of the page are in memory now; each hard reference refers to its own copy.
4. User closes the first browser.
   - The browser lets go of the first hard reference.
   - The cache realizes that the entry is doomed and so immediately dumps the document from memory.
5. User surfs somewhere else in the second browser.
   - The browser lets go of the second hard reference.
   - The cache entry is not removed yet (because it's not ready to expire) but when it expires it will be removed normally now instead of being Doomed.
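The five steps above can be modelled in a few lines. This is only a toy sketch under stated assumptions, not the real cache: `ToyCache`, `Entry`, `load`, `expire` and `release` are hypothetical names, and the "network" is a stand-in function that returns a new version string on every fetch.

```python
class Entry:
    """One cached copy of a document."""
    def __init__(self, url, body):
        self.url = url
        self.body = body
        self.refs = 0        # hard references currently held by windows
        self.doomed = False  # doomed entries can no longer be found by URL


class ToyCache:
    """Hypothetical stand-in for the cache described in the steps above."""
    def __init__(self):
        self.by_url = {}     # searchable (non-doomed) entries only

    def load(self, url, fetch):
        """Find or fetch the entry for url and take a hard reference on it."""
        entry = self.by_url.get(url)
        if entry is None:
            entry = Entry(url, fetch(url))
            self.by_url[url] = entry
        entry.refs += 1
        return entry

    def expire(self, url):
        """Expire url: drop it, or doom it if a window still holds it."""
        entry = self.by_url.pop(url, None)
        if entry is not None and entry.refs > 0:
            entry.doomed = True   # stays in memory, but is unsearchable

    def release(self, entry):
        """Drop one hard reference; return True if the copy is freed now."""
        entry.refs -= 1
        return entry.doomed and entry.refs == 0


# Stand-in network: every fetch returns a new version of the page.
versions = iter(range(1, 100))
fetch = lambda url: f"{url} v{next(versions)}"

cache = ToyCache()
first = cache.load("http://www.yahoo.com", fetch)    # step 1
cache.expire("http://www.yahoo.com")                 # step 2: doomed, not freed
second = cache.load("http://www.yahoo.com", fetch)   # step 3: search misses
freed_first = cache.release(first)                   # step 4: dumped at once
freed_second = cache.release(second)                 # step 5: kept until expiry
```

Each window keeps its own copy for as long as it holds its reference, which is the property View Source and Save As would rely on.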
Exactly!
Gordon, can the cache token be passed around? For example, view source loads in a different docshell than the original document (it's in a different <browser/> element...). Is there a way to pass the cache token to the new docshell? And are such things scriptable?
Yeah, they could be passed around, but some recipients may not have enough context to be able to use them. We would definitely want to enable their use by view-source. The sooner we can work out the details of what's required the better. I believe the http channel is scriptable (that's where you get and set cache tokens). There may still be a bug open to implement the setting of cache tokens on http channels; Darin would know for sure. We had been waiting to get an actual client for them first. Take a look at nsICachingChannel.idl. Cache tokens are just an opaque nsISupports that happen to contain an nsICacheEntryDescriptor, so a http channel knows how to use them to rehydrate itself.
this bug depends on a bug that is marked future. I'm reflecting that in this bug's milestone.
Target Milestone: mozilla0.9.1 → Future
Considering that this is a correctness/dataloss bug with 36 votes that has been targeted to mozilla0.9.1, and that the reason given for futuring bug 72519 is "there are no consumers for it", wouldn't it perhaps be better to ask whether the futuring of bug 72519 could be re-evaluated than to just future this bug, which obviously a lot of people consider to be very important? It's also not as if depending on a futured bug is an absolute impediment to this being fixed; I notice that 3 of the bugs "blocked" by this bug, and one other bug "blocked" by bug 72519, are already fixed anyway.
This no longer depends on bug 72519 anyway. The function we need is GetCacheToken() and that is implemented now. The bug covers both GetCacheToken() and SetCacheToken(); it is the latter that is not yet working, and that is not necessary for this bug. Everything necessary to fix this bug is now in Mozilla. There is no need to change the milestone unless the developers cannot do it in time.
clobbering milestone and killing dependency.
No longer depends on: 72519
Target Milestone: Future → ---
Question: should the browser keep references to the current page's images also? It seems like it should. Also if the current page contains frames we need to keep around references to the frameset as well as every single frame. Otherwise this fix is not accomplishing everything it should--namely a perfect representation of the current page for Save, View Source and Print.
Setting cache tokens probably *is* necessary for this bug to be fixed reliably. Simply holding a cache token won't guarantee that what you request from HTTP is the same thing that you're holding the cache token for.
OK, my apologies. Then there's something I don't understand. When I read bug 72519 I naturally assumed that GetCacheToken got you the nsICacheEntryDescriptor (hard reference) referring to the document. Indeed, reading the code, that seems to be the case. Hmm ... perhaps the hard reference is not always set up by the time you call GetCacheToken()? Is SetCacheToken() supposed to place the document into the cache and set up the hard reference? Or perhaps does http place the document into the cache automatically but SetCacheToken() has to be called to keep and hold a hard reference to the document?
GetCacheToken() will allow you to keep an entry in the cache, but only SetCacheToken() will allow you to recreate a channel to that entry. Otherwise you have to create a channel with a URL which may or may not be what the cache token refers to (the entry has been doomed).
Oh, I see. So what you're saying is we will do this:

* User types in URL
  - Browser creates HTTP channel
  - Browser has HTTP channel retrieve content from URL (possibly through cache)
  - Browser grabs hard reference from HTTP channel and keeps it around (GetCacheToken)
  - Browser destroys HTTP channel
* User hits View Source
  - View Source gets cache token from Browser
  - View Source creates HTTP channel
  - View Source has HTTP channel retrieve content using hard reference (SetCacheToken)
  - View Source destroys HTTP channel

Why not change the second part to this:

* User hits View source
  - View Source gets cache token from Browser
  - View Source grabs content directly from cache token

Then you don't need SetCacheToken(). HTTP channel seems like an unnecessary extra step in this case. Maybe it's impossible to get a stream out of the cache, and that's why it's done this way?
I think this also causes us to suck on the IBENCH tests.
No that's bug 61363. The IBENCH tests we're looking at aren't doing printing, saving, or view-source. This bug is for operations on the currently displayed page, so they don't result in refetching from the net. When the user wants to print, save, or view-source for the front-most window, they don't expect to get different results from the net.
Do you think we can get this resolved in time for mozilla0.9.1?
BTW, another situation when we should use the current version instead of refetching is the "Send page" function (which does not work correctly even in Netscape 4). With "Send page", not only might you end up losing the data, you may also end up e-mailing something very different from what you thought you were e-mailing...
-> 0.9.3
Target Milestone: --- → mozilla0.9.3
*** Bug 79758 has been marked as a duplicate of this bug. ***
Whiteboard: [cache] → [Hixie-P1] (HTTP) [cache]
Blocks: 83792
bug 83792 - same cause, more unwanted effects. Fix this NOW, please. (Sorry for the spam)
No longer blocks: 83792
Blocks: 84106
Blocks: 85128
Blocks: 86261
Keywords: nsenterprise
a possible (minimal) solution: toplevel page is loaded... before the necko channel goes away, QI to nsICachingChannel. then call nsICachingChannel::GetCacheKey. store the cache-key someplace. when you want to reload the document, just create a new necko channel, but before calling AsyncOpen, QI the new channel to nsICachingChannel, and call SetCacheKey, passing the stored cache-key to the channel. also, be sure to call SetLoadFlags(nsIRequest::LOAD_FROM_CACHE) on the necko channel as well to prevent the usual cache validation. the cache-key is used, for example, to distinguish the results from different POST requests on the same URL. this solution does not guarantee that data will not be fetched from the net, but it does do the "right thing" regardless of whether the cache is present or not, etc. and, for file->SaveAs and view-source it will almost always avoid refetching from the net.
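To make the flow above concrete, here is a small model of it. It is a sketch under loud assumptions: `ToyChannel` is a hypothetical stand-in for a necko channel, the cache is a plain dict, the key scheme is invented, and only the names `cache_key` and `LOAD_FROM_CACHE` echo the real API mentioned in the comment.

```python
LOAD_NORMAL = 0
LOAD_FROM_CACHE = 1   # stands in for nsIRequest::LOAD_FROM_CACHE


class ToyChannel:
    """Hypothetical channel: not the real nsICachingChannel."""
    def __init__(self, url, cache, network):
        self.url = url
        self.cache = cache        # dict: (url, cache_key) -> body
        self.network = network    # callable: url -> body
        self.cache_key = None
        self.load_flags = LOAD_NORMAL
        self.from_cache = False

    def open(self):
        key = (self.url, self.cache_key)
        if self.load_flags & LOAD_FROM_CACHE and key in self.cache:
            self.from_cache = True
            return self.cache[key]            # no validation, no refetch
        body = self.network(self.url)         # miss: fall back to the net
        if self.cache_key is None:
            self.cache_key = len(self.cache) + 1   # invented key scheme
        self.cache[(self.url, self.cache_key)] = body
        return body


hits = []
def network(url):
    hits.append(url)
    return f"response #{len(hits)}"

cache = {}
original = ToyChannel("http://example.org/order", cache, network)
shown = original.open()
stored_key = original.cache_key    # saved before the channel goes away

reload = ToyChannel("http://example.org/order", cache, network)
reload.cache_key = stored_key
reload.load_flags |= LOAD_FROM_CACHE
saved = reload.open()              # served from the cache entry

cache.clear()                      # e.g. the user clears the cache
retry = ToyChannel("http://example.org/order", cache, network)
retry.cache_key = stored_key
retry.load_flags |= LOAD_FROM_CACHE
refetched = retry.open()           # nothing cached, so it hits the net
```

The last three lines show the caveat in the comment: with the entry gone, the load still succeeds, but only by going back to the network.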
Folks, could someone please step up and make a damn architectural decision here? Over a year of thinking about this issue in this bug alone and we don't seem much closer to a solution. There's been a lot of good discussions and ideas kicked around, and a number of possible solutions proposed, but we still don't appear to have anyone setting a stake in the ground and saying "do this." (or better yet a group of core moz folk that can agree on a single solution.) I'm currently losing a $100 argument with buy.com over a duplicate order that resulted from printing a copy of the order summary. At this point mozilla is unusable for the average user's e-commerce (at least with sites as poorly engineered as buy.com). I've had to resort to another browser for any form of online purchase.
putterman: were you asking me about ecommerce and multiple orders?
Radha, what's the session history perspective on this?
PDT+ per selmer's request.
Whiteboard: [Hixie-P1] (HTTP) [cache] → [Hixie-P1] PDT+
Session History comes into picture only when you hit reload() which uses the VALIDATE_ALWAYS loadflag. Session History however stores the cachekey for the current page in mOSHE. This cacheKey is passed to nsICachingChannel when the loadtype is LOAD_HISTORY/RELOAD_NORMAL/RELOAD_CHARSET_CHANGE. If the solution to this bug is to do a similar thing for view-source/printing, then I think the current setup in nsDocShell::DoURILoad() can be extended to meet the needs here.
I'm not seeing any problems with printing POST results. On both the trunk and the 0.9.2 branch when printing the results of a form POST we *do not* reload from the network (and do another POST)... Are there any sites that exhibit this problem when printing? (ie. reposting when printing...) If not, I suspect that this is only a SaveAs issue... Because the printing code deals with documents in a completely different way... -- rick
Attached patch patch for Save As... case (deleted) — Splinter Review
The attached patch implements Darin's suggestion for passing in the cache key to the request generated by a Save As... operation. The patch attempts to load out of the cache. If there is post data and the stream isn't in the cache, it brings up a dialog asking if the user wants to repost. The Save As... menu item works as expected with the patch. I saw a problem with the Save As... context menu item, but it seemed as if the JS in nsContextMenu.js was passing a bogus false value for doNotValidate. Still investigating, but the patch should be set to go in as-is pending r/sr's.
Rick, was that a recent change in printing? I know it used to go back to the network (for get requests anyway) in pre 0.8 days... I had gotten burnt a few times printing the results of some internal tools we use and getting hard copy of the input forms instead. with current trunk builds here's the matrix from a quick test I just did: (pre vidur's patch which I just mid-air collided with... building with it now.)

              POST    GET
---------------------------
print         OK      OK
view source   OK      OK
save as       BAD!    OK

hmmm.... one in six... russian roulette anyone? (p.s. after four months, mention of the state attorney general last week seems to have finally solved the discussions with buy.com)
with that patch my test case is 100% OK.
Chris, how about "File -> Send page" (bug 86261)? I haven't checked it yet, but I am afraid it will add a bullet or two to your "Russian roulette".
              POST    GET
---------------------------
print         OK      OK
view source   OK      OK
save as       OK      OK
send page     BAD!    OK
http://ntiaotiant2.ntia.doc.gov/test/mozillatest is a much better test case from cweiss@iname.com on bug 55583. summary for those not on that bug:
save as: "successfully" re-requests! (number incremented)
view source: fails completely (server error, suspect GET instead of POST)
send page: fails completely (server error, ditto)
print: success :)
vidur/adam - any update to this PDT+ bug? Pls update status whiteboard. It seems from Chris Abbey's comments that the patch from Vidur works ok? If so, can this patch get r and sr and be checked in? It looks like the patch will fix Save As, but not Send Page. Perhaps that is a separate bug.
Latest patch looks good to me so r=adamlock@netscape.com. I think Send Page should be new bug.
Well, according to Chris Abbey's latest test results, _all_ methods currently re-request, so there is still a need for some universal method of avoiding re-requesting. But once such a method is implemented, making use of it becomes separate bugs - bug 55583 for "view source", bug 84106 for "save as" and bug 86261 for "send page". IMHO, what this all means is that the "save as" patch belongs to bug 84106 and not to this bug.
lchaing: my append on 6 July was with the trivial test case I attached; the append on 7 July, however, was with the more realistic testcase at the URL. (I'm not sure of the difference, aside from a cfm script vs a php script, but it clearly demonstrates one.) As a result I wouldn't hold much faith in the value of my append from 6 July if I were anyone.... ;) Aleksy: as rick pointed out, and the URL proves, print works fine... but then print is also a completely different beast in that it really doesn't *need* the original html, the dom is fine. Otherwise I agree with your take on this bug 100%.
The patch ensures that for Save As we first attempt to get the data from the cache and, only if that isn't possible (e.g. someone explicitly clears the cache), we put up a dialog asking if we should repost. The same pattern should be used for the View Source and Send Page cases as well, though the fixes for these two will require changing different code paths. I got a verbal sr=rpotts@netscape.com, so I'm inclined to get this into the trunk for some bake time.
Vidur, exactly, and this is why the patch belongs to bug 84106. What this bug is about is creating a mechanism that would ensure that if you are viewing a page, then you also have a cache token that will ensure that the source will be in the cache if you need it. This patch is doing the right thing (hopefully) and is addressing a related issue (as covered by bug 84106), but it is not addressing this bug.
Vidur, does that patch also fix "Save Frame As..."? It also uses the savePage() function, but I'm not sure the history entry you're getting has the post data for a frame... (in fact, based on my testing it seems not to).
Also how about "View Frame Source"? I was having trouble with that not working correctly for POSTed data with 0.9.2 today, which is what reminded me.
"View Frame Source" is what _I'm_ working on (bug 55583). That was my 'testing'. If there's a way to get the postData and cacheKey associated with a document in a frame, I have yet to find it.... Hence my previous comment in this bug.
Not sure if this is pertinent but thought someone should check. There have been a number of comments posted about "postData". Does this need to include (or does it already include) the cookie data at the time of posting? Obviously if you were refetching the page from the network that would then be important, but is it important for re-obtaining the page from the cache, or any other location? For instance, you go to the same page URL+Post data, but since the submission has already been posted once, the source could now change based on it being the second post. So seemingly identical "posts" could be different based on cookies, or the current time, etc, but would both appear to be the same in the cache. Does the second post overwrite the first cache entry? Should it? Glad to see this bug finally getting some heavy attention....
docshell member mOSHE has the cachekey and postdata for the document. This applies for subframes too.
Radha: is that member somehow scriptable? The problem is that we're working in JS here...

Wiggins: The important thing is the combination of cacheKey + postData. The behavior on load is then as follows:

  if (cacheKey && postData) {
    // try to load from cache using cachekey. If that fails pop up a warning
    // dialog before requesting from the network
  } else if (cacheKey) {
    // load from cache but do not prompt before re-requesting
  } else {
    // just load
  }
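The three branches above can be written out as a small runnable function. This is a sketch only: `plan_load` and the action strings are hypothetical, and `cache_hit` is an assumed flag standing for "the entry named by the cache key is still in the cache".

```python
def plan_load(cache_key, post_data, cache_hit):
    """Return the ordered actions for re-obtaining a page.

    cache_key / post_data are None when the document has none;
    cache_hit says whether the keyed entry is still in the cache.
    """
    if cache_key is not None and post_data is not None:
        # POST result: never silently repeat the POST.
        if cache_hit:
            return ["load from cache"]
        return ["prompt user", "repost to network"]
    if cache_key is not None:
        # Plain GET: re-requesting is harmless, so no prompt.
        if cache_hit:
            return ["load from cache"]
        return ["refetch from network"]
    # No cache key at all: an ordinary load.
    return ["normal load"]


post_hit = plan_load("key", b"a=1", cache_hit=True)
post_miss = plan_load("key", b"a=1", cache_hit=False)
get_miss = plan_load("key", None, cache_hit=False)
no_key = plan_load(None, None, cache_hit=False)
```

The only case that interrupts the user is a POST result whose cache entry has gone missing.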
mOSHE is a private member of nsDocShell, not available from JS.
is all or parts of this on the trunk or branch? what's left to do? can someone summarize and project an ETA?
Here's what needs to happen as far as I can tell:

1) An API is needed to get from a docshell (any docshell, not just the root of the docshell tree which has the session history attached to it) the information needed to recreate it (load from history or reload from the web with the same post data). At the moment this would be a cache key and a post data stream. These could be gotten as themselves, in the form of an nsISHEntry, or whatever.
2) A method needs to be added to the nsIWebNavigation interface that takes the information gotten in step #1 and loads a page using it. This could be similar to the current nsIDocShell::loadURI or something along those lines.

Once that's done, this bug can probably be marked as fixed. The dependent bugs can then handle using these APIs to make save page, save frame, send page, view source, view frame source, and so on work correctly. Ccing Rick and Jud who've been talking about this a bit lately, since they'd be the ones who could provide an ETA (and corrections in case I've gotten something wrong).
What's up with this bug? No current status and time is almost gone!
At chofmann's request, I'm posting this summary of status.

Currently, the following work correctly if there's post data associated with a page:
- print page
- print frame
- save page

The following still don't work correctly if there's post data (I believe they will always refetch with a HTTP GET):
- save frame
- view page source
- view frame source
- send page

For NS6.1 RTM we have the following options:

1) Disable the latter set of operations for pages that have post data. To do this correctly, specifically to even be able to determine whether the document in an individual frame has post data associated with it, we still need some amount of infrastructure (as opposed to chrome JS) change. Our options include:
   a) Expose the currently private nsISHEntry member of a docshell.
   b) Expose a mechanism for getting information about post data at the nsIWebNavigation level for both top-level content windows and individual frames.
   Once one of these changes is made, we'd need to make JS changes to check for the existence of post data before completing the corresponding operation.
2) Come up with a complete fix. Boris Zbarsky's description of such a fix (or at least the infrastructure for it) is on the money. We'd have to:
   a) Expose a mechanism for getting the post data at the nsIWebNavigation level (as in 1b).
   b) Allow for the post data to be passed in as an optional parameter to nsIWebNavigation::LoadURI (bug 46870).
   c) Either expose the notion of a cache key in a similar manner as the post data stream or hide the cache key completely. The latter option would require us to create an internal mapping between URIs and cache keys (including URIs that are not "officially" in the cache, such as those that have post data).
   Once these changes are made, we'd also need to make JS changes to retrieve the post data and pass it into the load methods used for the different operations.
3) Live with the status quo.

Rick Potts is looking at the infrastructure changes for 1 and 2. He will first work on the infrastructure changes needed for disabling (probably option 1b/2a). He will then check in an existing patch for 2b (see bug 46870). The ETA for these changes is 7/18. At chofmann's request, I will open bugs for making the chrome JS changes necessary to disable the features for pages with post data. If Rick can't get the complete fix done by 7/18, chofmann will find owners for the chrome bugs.
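Option 1 in the summary above boils down to a lookup keyed on whether the page has post data. A minimal sketch, assuming a hypothetical `enabled_ops` helper; the operation names are taken from the two lists in the summary, everything else is illustrative.

```python
# Operations the status summary reports as still refetching with a GET.
REFETCHING_OPS = {"save frame", "view page source",
                  "view frame source", "send page"}
# Operations reported as already working with post data.
SAFE_OPS = {"print page", "print frame", "save page"}
ALL_OPS = REFETCHING_OPS | SAFE_OPS


def enabled_ops(has_post_data):
    """Option 1: disable the unsafe operations when the page came from a POST."""
    if has_post_data:
        return set(SAFE_OPS)
    return set(ALL_OPS)


after_post = enabled_ops(True)
after_get = enabled_ops(False)
```

The chrome JS would still need some way to learn `has_post_data` for a frame, which is the infrastructure change described under 1a/1b.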
By "parts working" I assume you mean there was a checkin on the branch. If so, I'm leaning toward option 3.
A few comments on what vidur wrote:

> 1) Disable the latter set of operations for pages that have post data.

This would be a branch-only thing, I presume? On the trunk we would keep working toward option #2?

> The latter option would require us to create an internal mapping between URIs
> and cache keys

This may not work so well for view source... Consider the page at http://www.mozilla.org. To view its source we load the URL view-source:http://www.mozilla.org. The protocol handler for view-source: creates an HTTP channel for http://www.mozilla.org and loads it. _But_, we want to use the source from the original URL. So we want to take the cache key and post data corresponding to the http: url and load the view-source: URL with those. This does not mean that the cache key has to be exposed, it just means that there needs to be a way to load a url of the form view-source:whatever using the information for the "whatever" URL.
Assignee: adamlock → rpotts
> This would be a branch-only thing, I presume?

yes

> On the trunk we would keep working toward option #2?

yes

The most troublesome remaining problem for most users would be "send page". I could see users completing an ecommerce transaction and then attempting to send the page to themselves as a confirmation of the transaction. bug over to rpotts since he is working the infrastructure changes that are now the focus of this bug.
In order to make this work correctly, 3 different API changes must be made:

1. Provide a 'postData' attribute on nsIWebNavigation to allow JS to access the postData.
2. nsIWebNavigation::LoadURI(...) needs an additional 'postData' argument.
3. The underlying necko 'cache key' needs to be hidden from non-networking APIs.

Clearly, we can't make all of these changes on the branch :-) #3 alone requires a fair amount of work - and darin is currently gone :-(

So, here's my proposed plan...
-----------------------------
For the branch I can make API change #1 and expose a postData attribute to JS. This will be enough to allow all JS implementations to 'disable' if postData is present. Fortunately, this API change is *very* small and should not introduce regressions... it does not change any existing APIs and the only consumers will be the new JS code written to disable the features.

For the trunk I can land API change #1 and #2 and start working on #3... Once all three changes have been made, we will be able to 'correctly' implement these features in JS...

Does this sound reasonable to everyone? -- rick
sounds like a good approach to me.
Okay, I'm an outsider here, but I'm slightly worried about the #3 approach. Consider the following scenario. I'm assuming a page that simply provides an incrementing counter every time it's hit, but the same argument applies to almost any page that can give different results on successive hits with the same querystring and/or post data - anything that, for example, is time-sensitive (eg frequently updated news sites), relies on a backing database on the server which changes state, is content-negotiated (if the user changed their language prefs) or cookie based, etc etc etc.

User opens page in window 1. Page is fetched from server, returning "1". Cache entry #1 is created, containing "1". User leaves this window open on this site and goes to work in another window.

User later opens same page in window 2. Page is fetched from server again, returning "2". Cache entry #2 is created, containing "2". Cache entry #2 has the exact same URI, and also the exact same post data (if any) as #1.

User goes back to window 1 and selects "View source". If the "View source" implementation only has the uri and post data to go on, it has no way to distinguish between cache entries #1 and #2, but they contain different things! Since #2 was created later it will probably get that, which is wrong.

Surely a better way to go here would be to expose the cache key on the document directly (as an opaque pointer that you can't do anything with except load, of course) and use this for "view source", "send page" etc. Relying on the URI plus post data seems like a step backward. Am I missing something?
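The counter scenario above is easy to reproduce in a toy model. Everything here is hypothetical (the `server`, the two dict-based lookups, and the `load` helper are not Mozilla code); it only contrasts a lookup keyed on URI plus post data with a lookup through an opaque per-load key.

```python
import itertools

hit_counter = itertools.count(1)

def server(uri, post_data):
    """Stand-in server: returns an incrementing counter on every hit."""
    return str(next(hit_counter))

by_request = {}   # (uri, post_data) -> body; later loads overwrite earlier ones
by_key = {}       # opaque key -> body; every load keeps its own entry

def load(uri, post_data=None):
    body = server(uri, post_data)
    by_request[(uri, post_data)] = body
    key = object()            # opaque: usable only to look the entry up again
    by_key[key] = body
    return key, body

key1, shown1 = load("http://example.org/counter")   # window 1 shows "1"
key2, shown2 = load("http://example.org/counter")   # window 2 shows "2"

# "View source" in window 1, done two different ways:
source_by_request = by_request[("http://example.org/counter", None)]
source_by_key = by_key[key1]
```

Looking up by URI and post data hands window 1 the source of window 2's page; the opaque key returns what window 1 is actually displaying.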
I have the same concern as Stuart. More specifically, hiding the cache key should still somehow allow loading url "view-source:foo" and getting the source of "foo" from the corresponding window from cache.
>Does this sound reasonable to everyone? After the long wait, sounds like music to the ears :-) Although the reasonning so far throughout the bug has been that the cache key is "key" to fixing the bug as sballard has illustrated.
A second URL fetch performed with identical post data will not retrieve the same cache entry unless a "cache key" or "cache token" (from the original channel) was provided to the HTTP channel. The original cache key can not be recreated given an identical URL and identical POST data. I'm not clear on why we want/need to hide the cache key.
One of the issues is that there is not a "single" cache key associated with the document - correct? There is a cache key per URI loaded for the document. This means that to be "correct" someone needs to hold onto *all* of the cache keys - including the ones for linked CSS and JS... Holding onto the cache key for the document URI will definitely make things "better" but i don't believe that it is sufficient to be "correct". If these assumptions are correct, then I believe that someone has to hold onto *all* of the cache keys... And i'm not about to add an nsVoidArray argument (for them) to LoadURI(...) :-) I believe that we need to come up with a "correct" approach.
rick, when can you get: 1. Provide a 'postData' attribute on nsIWebNavigation to allow JS to access the postData. on the branch? let's make that happen and then we can take the pdt+ off this bug. vidur, can you add the bug numbers for the open bugs for making the chrome JS changes as a reference in this bug? those also need to get pdt+
No issues with the branch which can have its quick fix -- but if it takes adding a nsVoidArray argument for the correct fix later on the trunk, then there are 59 votes and XXX persons cc:ed here for that :-)
Can any netscape people comment on how 4.x handled this? (I never saw 4.x get it wrong, although I can't say I tried very hard). A single cache key would work fine for view source, at least. Is there any way that the document itself can hold cache keys to the other things that were loaded by it? That way everything gets cleaned up through normal refcounting and we only need the single cache key to the document. Perhaps the extra VoidArray should be added to the CacheKey object?
stop it... you all are hurting me :-) with nsVoidArray example rick meant to say that this is a bad idea... we clearly need to work a better solution than having to keep all the cache-keys (or any) Lets stay focussed on resolving this for both short term and eventually with a better solution past this release. And rick: next time you suggest something like a nsVoidArray of cache keys (without sarcasm/kidding tags around it)... I am gonna come looking for you with my nerf gun :-)
gagan, how many smiley faces do i need? i always thought that one was enough :-) but perhaps when the cache is involved i need a few more :-) :-) :-) -- rick
once again i seem to have ignited a hot topic :-) My main concern with exposing a cache key via the nsIWebNavigation API is that i do not believe that it is sufficient to guarantee that *all* content will come out of the cache. I believe that we may need some other mechanism in order to "get it right" 100% of the time. So, rather than put the cache key there "for now" (and realistically never revisit the issue until another PDT+ bug turns up) i think that we should hold off and consider what the correct solution is - for the trunk. I'll try to attach a patch for the branch (really soon) which exposes a read-only attribute for the postData on nsIWebNavigation... At least this will allow us to *not* do the "wrong thing" :-) Once the branch work has been done, i'll start looking into what the "right thing" is for the trunk... So, Gagan, get your nerf gun ready ;-)
I've just attached a patch which adds a read-only attribute to nsIWebNavigation allowing access to the postData (if there is any...) So with this patch, given an nsIDocShell you can QI to nsIWebNavigation and check for postData. When dealing with frames it is important to get the nsIDocShell associated with the particular frame in order to ensure that you are getting the right post data :-) -- rick
As far as that goes... we need to get frame docshells in response to context menu actions. At the moment when a context menu comes up we have the document involved (this.target.parentDocument) and the window involved (document.commandDispatcher.focusedWindow). Neither of these allows easy access to the corresponding docshell. One can walk the docshell tree and for each docshell compare nsIDocShell.document to this.target.parentDocument till a match is found (I have a function that does this lying around). Is there a better way?
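The "walk the docshell tree and compare documents" approach mentioned above can be sketched like this. It is an illustration only: the node shape (`document`, `children`) and the function name are hypothetical stand-ins, since real docshells expose their tree through XPCOM interfaces rather than plain arrays.

```javascript
// Hypothetical node shape: { document: <object>, children: [<node>, ...] }.
// Real docshells expose their tree through XPCOM interfaces instead.
function findDocShellForDocument(root, targetDocument) {
  const stack = [root];
  while (stack.length > 0) {
    const shell = stack.pop();
    if (shell.document === targetDocument) {
      return shell;
    }
    for (const child of shell.children) {
      stack.push(child);
    }
  }
  return null; // no docshell in this tree owns the document
}

// Usage with stand-in objects: a frameset with two frames.
const leftDoc = { title: "left frame" };
const rightDoc = { title: "right frame" };
const shellTree = {
  document: { title: "frameset" },
  children: [
    { document: leftDoc, children: [] },
    { document: rightDoc, children: [] },
  ],
};
const foundShell = findDocShellForDocument(shellTree, rightDoc);
```

The comparison is by object identity, which matches the idea of comparing nsIDocShell.document against this.target.parentDocument.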
from an nsIDOMDocument you can do the following to find the corresponding nsDocShell:

QI(nsIDOMDocument) -> nsIDocument
nsIDocument->GetScriptGlobalObject(...)
nsIScriptGlobalObject->GetDocShell(...)

and you're there!!!
nsIDocument is not scriptable... I'm working in JS.
In terms of the question of how to fix it "right", my suggestion on 2000-12-28 was to make a separate cache manager instead of trying to adapt Necko to do it. This still seems cleaner to me; am I alone in thinking along these lines?
Stuart, as far as I remember 4x indeed did it right most of the time, but not for the "send page".
Blocks: 91341
Blocks: 91342
The bugs to disable view source and send page on the branch are 91341 and 91342.
No longer blocks: 91341, 91342
The PDT+ on this bug on the branch is *just* for the patch necessary to make bug 91341 and 91342 possible (we've just PDT+ed those bugs.) Can this be checked in today?
Could someone review and superreview Rick's branch only patch? thanks!
Rick, we really need a way to get the docshell of a document from js. Do you know of one?
r=bbaetz for the branch. Having the input stream implement nsIRandomAccessStore doesn't help js, because that interface isn't scriptable, and we need this for js. For the branch, blake only needs to know if the data is post data or not, so that's OK. It needs work for the trunk, and possibly a different solution.
The input stream is rewound at http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#4526 and any loads that use the nsIWebNavigation APIs should be passing through there... Code that creates channels directly instead of using the nsDocShell stuff will have a C++ part that can rewind the stream.
Blake super-reviewed Rick Potts's branch-only change and also checked it into the branch. PDT should remove PDT+ from this as it is no longer a stopper for the branch.
Whiteboard: [Hixie-P1] PDT+ → [Hixie-P1] PDT+ branch fix is checked in.
hey blake, jst and I talked about the best way to get an nsIWebNavigation from an nsIDOMWindow... and as you know, currently, there is no easy way. The best solution we came up with was to add nsIWebNavigation as one of the interfaces that you could get via GetInterface(...). This would require two small changes in nsGlobalWindow.cpp:

1. Add nsIWebNavigation to the GetInterface(...) implementation.
2. Change the ClassInfo flags so nsIInterfaceRequestor was visible to JS via XPConnect.

Let me know if you think that these changes are necessary for the branch... I think that we will definitely want to do this on the trunk... since we need a way to map from an nsIDOMWindow to the embedding interfaces...

-- rick
Rick, thanks for the info. Last I heard was that we could live with it being a problem in View Frame Source on the branch. Lisa, does PDT want to pursue fixing it for view frame source too? [removing PDT+; there's nothing that needs verification here anyways]
Whiteboard: [Hixie-P1] PDT+ branch fix is checked in. → [Hixie-P1] branch fix is checked in.
No way Netscape 4x did it right most of the time... How many times did I curse Navigator trying to fetch the document again in offline mode for a printing or for a view source? And what about when View Source provided the source of the error page instead of the displayed one? Maybe it got better with 4.75 or 4.76 but it was a very late effort.
Netscape 4.x doesn't do it right. I just tested 4.76, and it's very broken. I'm sure 4.78 is much the same. I've uploaded an attachment with the Perl code of the CGI that I used to test with; it sends back the current date/time, the method used to call the CGI, and a link to itself to trigger the GET method, and a submit button to itself to trigger the POST method, so both can be tested. If you wait at least 1 second between tests, you can tell whether or not the CGI was reloaded.

Under 4.76 (Linux), here's what I found:

The GET method will perform the "View Source", "Print", "Send Page" and "Edit Page" functions, but ALL of these will re-fetch the page and run the CGI again, since the content has expired from the cache. (If it's in the cache, it will probably use that for some or all of these functions, but that's not correct behavior either.) All of these functions work with the GET method, but none of them uses the copy being displayed in the window.

The POST method is even worse. "Send Page" and "Edit Page" both reload the page with a GET request (not POST), running the CGI again. "Print" fails completely, sending nothing to the "lpr" process, which returns the error "lpr: stdin: empty input file". "View Source" will bring up a window, but the source displayed is that of the usual POST error message:

    <TITLE>Missing Post reply data</TITLE>
    <H1>Data Missing</H1>
    This document resulted from a POST operation and has expired from the cache. If you wish you can repost the form data to recreate the document by pressing the <b>reload</b> button.

Basically, NONE of the functions in Netscape 4.x that should be using the original source are correctly doing so. When they manage to, it's only because the content happens to be in the cache, and it could expire from the cache or be replaced with a different version at any time. For content that expires right away, the ONLY thing that works on the original data is cut & paste with the mouse into another window. That's all.
Netscape 4.x doesn't even come close to getting it right.
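For anyone who wants to repeat this kind of test without the attachment, the page described above can be approximated as follows. This is not the attached Perl CGI; it is a minimal sketch of the same idea, and `renderTestPage` and its parameters are hypothetical names. It only builds the HTML body; wiring it to a server and choosing the response headers is left out.

```javascript
// Build the HTML body for one request. `method` is "GET" or "POST",
// `selfUrl` is the script's own URL, `now` is the time of this request.
// Escaping of selfUrl is omitted because this is a test fixture only.
function renderTestPage(method, selfUrl, now) {
  return [
    "<html><head><title>Refetch test</title></head><body>",
    "<p>Generated: " + now.toISOString() + "</p>",
    "<p>Method: " + method + "</p>",
    '<p><a href="' + selfUrl + '">Reload with GET</a></p>',
    '<form method="POST" action="' + selfUrl + '">',
    '<input type="submit" value="Reload with POST">',
    "</form>",
    "</body></html>",
  ].join("\n");
}

// A fixed timestamp makes the sample output reproducible.
const samplePage = renderTestPage("POST", "/cgi-bin/test.cgi", new Date(0));
console.log(samplePage);
```

Because the timestamp is part of the body, any function that shows a different time from the one in the window must have hit the server again.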
As rpotts explained earlier, I have attached a patch to nsGlobalWindow.cpp and nsDOMClassInfo.cpp that exposes nsIInterfaceRequestor to JS. I need this functionality for some other purposes too.
Radha, don't expose new interfaces on the window object in JS unless it's absolutely required; in this case it isn't. Instead, allow scripts to QI the global object to this interface when this interface needs to be accessed. This means that the window object in JS won't have a getInterface() method on it; instead you have to do window.QueryInterface(Components.interfaces.nsIInterfaceRequestor).getInterface(...). This way we can support new functionality w/o polluting the global namespace on web pages. I'll attach a patch that does this part. Your change to nsGlobalWindow.cpp is fine, but don't check in the nsDOMClassInfo.cpp change in your patch. With my patch (once tested n' all that) you'll have sr=jst
Re: jst's patch, I'm assuming that the new inclusion of nsIXPCScriptable::DONT_ENUM_QUERYINTERFACE in the scriptable flags for Nodes, Arrays, etc. is not going to break anyone. sr=vidur
r=rpotts on jst's patch to nsGlobalWindow to expose nsIInterfaceRequestor via an explicit QI(...) and r=rpotts on radha's patch to add nsIWebNavigation to getInterface(...) once the DOM_CLASSINFO_MAP_ENTRY(nsIInterfaceRequestor) is removed...
Vidur, the scriptable flags remain unchanged for Nodes, Arrays etc. (they already have nsIXPCScriptable::DONT_ENUM_QUERYINTERFACE set), only the flags for Window are changed.
The latest patch consolidates the previous 2 patches (patch ids 43271, 43347) and a patch given by jband privately for an assertion problem. This patch securely exposes nsGlobalWindow interfaces to JS. I will be checking this in to the trunk. Reviews were given to individual patchlets.
I don't quite see how printing comes into this. Doesn't print work on the (currently displayed) DOM? IMHO it should, printing should be wysiwyg.
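Putting the pieces from the last few comments together, chrome JS could check for post data roughly as follows. This is a sketch under assumptions: it relies on the QI/getInterface chain jst describes and on the read-only postData attribute from the branch patch being reachable on nsIWebNavigation. `pageHasPostData`, `fakeCi` and `makeFakeWindow` are hypothetical names; the fake objects exist only so the snippet can run outside Mozilla.

```javascript
// Returns true when the window's current page was produced by a POST.
// `win` is a chrome-accessible window; `Ci` stands in for
// Components.interfaces. Assumes the read-only postData attribute
// discussed in this bug is available on nsIWebNavigation.
function pageHasPostData(win, Ci) {
  const webNav = win
    .QueryInterface(Ci.nsIInterfaceRequestor)
    .getInterface(Ci.nsIWebNavigation);
  return webNav.postData != null;
}

// Stand-ins so the sketch can run outside Mozilla; not real XPCOM objects.
const fakeCi = { nsIInterfaceRequestor: "requestor", nsIWebNavigation: "webnav" };
function makeFakeWindow(postData) {
  return {
    QueryInterface(iid) {
      return this;
    },
    getInterface(iid) {
      if (iid !== fakeCi.nsIWebNavigation) throw new Error("no such interface");
      return { postData: postData };
    },
  };
}

console.log(pageHasPostData(makeFakeWindow(null), fakeCi));         // false
console.log(pageHasPostData(makeFakeWindow("name=value"), fakeCi)); // true
```

In real chrome code `Ci` would be Components.interfaces and `win` would be the frame's own window, so that the post data belongs to the right frame.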
*** Bug 93157 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla0.9.3 → mozilla0.9.4
*** Bug 93890 has been marked as a duplicate of this bug. ***
*** Bug 94417 has been marked as a duplicate of this bug. ***
What's the latest on this? They want me to disable edit page/send page/view source again for pages with postdata on the trunk unless this gets fixed the proper way.
Depends on: 94205
Moving to nsenterprise+, adding nsBranch. I'm assuming the 0.9.2 branch fix hasn't been checked into the trunk as it wasn't the "complete, right fix."
Keywords: nsBranch → nsbranch
not happening for 0.9.4. some underlying work Rick is doing will get us closer to this, but we're not there yet.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Blocks: 99194
Checkpoint for nsbranch . . . Judson - Are we there yet on this one?
No longer blocks: 99194
Minusing this one per my email exchange with Judson. This looks like a good one to get, but it is not gonna be fixed until later in TM0.9.5.
Keywords: nsbranch → nsbranch-
marking nsenterprise-.
*** Bug 96159 has been marked as a duplicate of this bug. ***
-> 0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Blocks: 104166
Blocks: 107067
Keywords: nsbranch-
Whiteboard: [Hixie-P1] branch fix is checked in. → [Hixie-P1] partial fix is checked into 0.9.2 branch
->0.9.7 bug #94205 will be landing soon - hope :-)
Target Milestone: mozilla0.9.6 → mozilla0.9.7
*** Bug 106931 has been marked as a duplicate of this bug. ***
Is there something corrupt in Comment #158 (from 7-20-01) that causes it not to wrap in the Mozilla 0.9.6 window? I don't know if this is a regression in Mozilla (I doubt it) or something getting messed up in the bugzilla database. Also, I apologize to anyone who thinks this is trivial; it is, but I'm hoping it's easily fixed.
that was a bugzilla bug that was recently fixed i believe.
OK. To summarize the current situation and get the ball rolling again....

At the moment we can:

1) Get the right docshell
2) Get the postdata associated with the page
3) Call loadURI with this info (bug 94025 has been fixed).

This is not sufficient for doing view source completely correctly, since there could be multiple entries in the cache all with the same URI and same postdata but different content. It _is_ sufficient for fixing bug 64100.

Rick said:
> there is a cache key per URI loaded for the document. This means that to be
> "correct" someone needs to hold onto *all* of the cache keys - including the
> ones for linked CSS and JS...

This is true in general. In reality, however, we're dealing with the following operations:

A) save/save as. This uses nsIWebBrowserPersist now anyway, so I am not sure this discussion applies anymore... At the very least, this should be spun off into a separate bug on nsIWebBrowserPersist if it's still a problem. (bug 84106 should be retested). Using the PERSIST_FLAGS_FROM_CACHE flag should have the desired effect here.
B) Send page. Is this going to send all the linked materials together with the page? (I sincerely hope this is not the case; munging urls in the source to point them to the online content seems like a better solution, a la nsIWebBrowserPersist). If it is _not_ sending the linked stuff, then we don't need a "whole page" solution.
C) View (frame) source. This could not care less about things linked from the page/frame. All it wants is the source for the page/frame itself as it came off the wire.
D) Changing character set. This could actually benefit from the more general approach.

This seems to cover all the things mentioned in the bugs this blocks. I propose:

I) Bug 64100 should get reopened and fixed (this is not very difficult to do now, with the infrastructure we have in place).
II) Rick and company should decide whether we actually need a generic api for this (especially since intl seems to be opposed to the concept of fixing item D).

I'm getting very tempted to move view source into C++, from whence I could just call nsIDocShell::LoadURI directly and not have all these problems. The current apis would allow this to work just fine.
*** Bug 118487 has been marked as a duplicate of this bug. ***
What about the milestone setting? 0.9.7 is long gone, are we going to have this for 0.9.9 or 1.0??
*** Bug 55583 has been marked as a duplicate of this bug. ***
cc: self
I noted an interesting thing with the current implementation of view source. It does not send cookies to the server? I use cookies for session control. If you try to access the page without the cookie you get redirected to the login page. When I tried view source I got a blank page? This should only be the case when you open without the session cookie? I don't know if this is relevant or has any impact but I did not see any reference to this situation. I do not have a test case at this moment but I can supply one if the need should arise.
Technically View Source should not be sending anything as is its dependency on this bug. View Source should only show what was received by Mozilla from the server. It should not cause a second request.
This might be redundant information, but I just can't bring myself to read all the posts. It seems to me Internet Explorer makes heavy use of the disk cache, and is even efficient at it. When you view source and look at the file, it's not a file in memory, but rather a physical copy. So can't we just stream the source to memory AND disk when we're loading a page, and have view source just open the file at that particular time? Correct me if I'm wrong, but Internet Explorer's view source doesn't apply to dynamically generated content. (So it's not fetched from memory, but rather disk.) Does someone know how Opera handles this?
Mike: it is true that IE like mozilla will reuse content from the [disk or memory] cache when it is available. but, unlike mozilla, IE will generate HTML when the [disk or memory] cache does not contain the requested document. This bug is mostly solved already. Mozilla now puts every downloaded document into the cache, regardless of cache control headers (the headers are honored when fetching the documents from the cache for normal page loads). This does NOT guarantee that the document will exist in the cache when the user views or saves the source of a document in a random browser window, but it does pretty much guarantee that it'll work just after the document is loaded. NOTE: i'm only talking about HTTP and HTTPS, not FILE or FTP. though, i think FTP could be modified to use the cache for downloaded material in a similar fashion.
Is http://bugzilla.mozilla.org/show_bug.cgi?id=120809 related? An image gets refetched from the server every time, but definitely is in the cache.
Blocks: 120809
darin: If I go to a page that is generated on the fly (e.g. slashdot), open it in two windows (thus causing each window to have different content) and then save both of these windows, will the saved copies be different?
you will save the most recently fetched version of slashdot. is this a problem we want to solve? if so, then it means pinning documents in the cache, which sort of goes against the idea of having a fixed sized cache. i'm not saying it's impossible... indeed, the cache provides facilities for pinning an entry in the cache, but it certainly is not without its tradeoffs.
My opinion is that "yes, it is a problem we want to solve". I've got some CGI scripts which generate oodles of completely different types of content under various circumstances. While debugging, I'll often look at the sources of different outputs (using some non-Mozilla browser). Perhaps Yet Another Pref? "Save raw HTML source in cache to avoid new requests when using the view-source command." Turned off by default, easy for web developer types to turn on. Only folks who want the feature have to deal with the tradeoff. -matt
Yes, it is the problem we want to solve, but No, it should not be a pref. This is what you would expect to happen, that the page you look at gets saved. If we don't pin it there is no way to be sure that that happens. IMHO we also want to pin it for at least three other things: view-source, send-page and history. Pinning the html source should not be that expensive; how many html pages does a "default install doing normal browsing" contain, approximately? I have 171 files in my cache, though i guess some of them are images; i would happily waste 10-20 of that on pinned down pages.
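The tradeoff darin raises (pinning versus a fixed-size cache) can be seen in a small model. This is only a sketch and not Mozilla's cache: `PinningCache` and its methods are hypothetical, and eviction here simply drops the oldest-inserted unpinned entry.

```javascript
// Toy fixed-size cache with pinning. Not Mozilla's cache; all names
// here are hypothetical.
class PinningCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // Map keeps insertion order: oldest first
    this.pins = new Map();    // key -> number of windows holding a pin
  }

  put(key, value) {
    this.entries.delete(key); // re-inserting moves the key to "newest"
    this.entries.set(key, value);
    this.evictIfNeeded();
  }

  get(key) {
    return this.entries.get(key);
  }

  pin(key) {
    this.pins.set(key, (this.pins.get(key) || 0) + 1);
  }

  unpin(key) {
    const count = (this.pins.get(key) || 0) - 1;
    if (count > 0) this.pins.set(key, count);
    else this.pins.delete(key);
    this.evictIfNeeded();
  }

  // Drop the oldest unpinned entries until we fit. If everything is
  // pinned the cache is allowed to exceed its capacity: that is the cost.
  evictIfNeeded() {
    for (const key of this.entries.keys()) {
      if (this.entries.size <= this.capacity) break;
      if (!this.pins.has(key)) this.entries.delete(key);
    }
  }
}

const pinCache = new PinningCache(2);
pinCache.put("page-in-window-1", "<html>one</html>");
pinCache.pin("page-in-window-1"); // window 1 is still showing it
pinCache.put("other-a", "<html>a</html>");
pinCache.put("other-b", "<html>b</html>");
// "other-a" was evicted; the pinned page survived although it is oldest.
```

Once the window closes and calls `unpin`, the entry competes for space like any other, so the extra cost lasts only as long as the page is on screen.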
I might be naive, but when I say "save page as" or "send page", I mean "do it with the stuff I am seeing on my screen right now", and it is very disturbing for me to see a refetch when I do it. Many pages are relatively small and static, most servers are very responsive and many people have good connections, but this is not always the case, so IMHO, treating a refetch as a Very Expensive Operation and doing it only when absolutely necessary or when explicitly asked for is The Right Thing.
As I understand it, we _do_ want to avoid having another refresh. Unfortunately, that _doesn't_ necessarily mean "the contents on your screen right now." Why? Because you can change the stuff on your screen. Fill in forms, execute javascript, whatever. Things of that nature should, IMHO, not be reflected in "view source," "save page as" or "send page". But what about changing the character set? (that's bug 17889 btw) Changing the charset shouldn't cause a page to reset. If we do want to support a save/send page option which maintains the current state of the page, it should be considered a different feature and not interfere with the current effort to implement the current features.
I don't think that saving the page EXACTLY like it looks now is what anyone wants. The ideal situation would be to save the HTML document to the cache as the server gave it to you, and then also save all the other files the server sent over that the document needs to render correctly, such as StyleSheets, images, Java and other embedded things (Flash, etc). That way, whenever you want to either see the source, or save the entire page (HTML+dependencies) etc it is all already there; moz would just need to copy it elsewhere. The only reason that moz needs to refetch anything from the server is if the user hits Reload or the cache TTL expires. Or if the meta refresh is set, of course.
-- #197 From Darin Fisher 2002-01-27 16:14 said: -------
> you will save the most recently fetched version of slashdot. is this a problem
> we want to solve?

YES. View Source, Save As, Print -- they should ALL act on the presently viewed page. In the case of view source, we should always view the html for the page that is displayed in the window. If this means that I have 10 browser windows open, all of which requested some http://something/time.cgi URL (which outputs the time of day in html) at slightly different times, I should get 10 different view sources. Always. This is vital to being able to develop content for Mozilla, because all major server-side development these days is dynamic, and it is *categorically impossible* to test or debug dynamic content with Mozilla if it does a refetch when the user hits "view source."

The same goes for Save As and Print from a user's perspective. If I just hit the "submit" button to place an order, I don't want that posted again if I want to print the invoice page or save it to disk (which would place another order). And if I choose to place two different orders to the same site in two different windows (sequentially), and then save/print each invoice after each has been placed, it should "do the right thing" and save/print two different invoices.

------- #201 From Nick Lewycky 2002-01-27 18:14 said: -------
> As I understand it, we _do_ want to avoid having another refresh.
> Unfortunately,
> that _doesn't_ necessarily mean "the contents on your screen right now."
>
> Why? Because you can change the stuff on your screen. Fill in forms, execute
> javascript, whatever. Things of that nature should, IMHO, not be reflected in
> "view source," "save page as" or "send page". But what about changing the
> character set? (that's bug 17889 btw) Changing the charset shouldn't cause a
> page to reset.

Filling in forms doesn't change the source html. Executing javascript, in most cases, doesn't change the source html either (document.write() is the only exception I can think of, and I believe that is a Microsoftism), so the browser should just show the original html as sent by the server for save/view source.

But this does bring up an interesting print problem... Printing should, IMO, show the current form contents, as well as any dhtml changes via javascript, etc. That information is not contained in the source html.
From previous discussion, I understood that Print worked with the DOM representation of the page anyway. So it's probably Ok already. Who wants to test printing a filled form?
Hey, what's wrong with the formatting of my previous comment? Is it possible to edit a comment after posting it?
> Print Not relevant. Printing prints the DOM, not the source. So you always print exactly what you see. All the concrete examples people have given so far of cases where they would want to see separate source for the same url involve post data that was sent to the URL. In those cases, we _do_ cache each response separately.
The view source problem isn't limited to post data sent to the URL. A "timeofday.cgi" is a good example of a non-post request sent to a URL which returns dynamic content, and the "view source" should be different for each browser window/history/tab. This isn't a contrived example. JSPs work this way as well. As do ASPs. As do PHPs. etc. URL accesses may cause the server to generate and send dynamic content /without/ sending state information via (get/post/cookies/URLrewrite) which must be cached for view source.
(I got a midair collision with somebody saying the same thing as me; this is in response to comment #206) Not at all; it applies to *any* site whose content might change over time. If I visit slashdot.org at lunchtime, leave the window open, do some other browsing in other windows, including visiting slashdot.org again later with new stories, then go to view source of the original page, I expect to get the source of that page, not the source of slashdot.org at some arbitrary later time that I happened to view it. From a user's perspective, this is basic functionality. If the page is sitting right there in a window, why should it be impossible to view its source? I do understand the implementation reasons why it's hard, but it's also hard to implement a functional CSS engine and HTML4 viewer - that's not a reason to choose not to do it, or we'd still be using MozillaClassic :) It should *never* be possible to have a page in a window and not be able to view source, or send or save the page, and get any result other than the exact content of the window that is on your screen. Even if the page specifies no-store. That's what users expect; it should be what we do. No exceptions.
i think one point needs to be made clear: it is trivial (modulo some API issues) to fix mozilla to not hit the server for view->source and file->saveas nearly 99% of the time (including POST results). it is far more complex to ensure that this is correct 100% of the time, and it is also far more complex to ensure that view->source and file->saveas correspond to the page being viewed. of course, i understand why solving this is important, and i would love to see it solved. i also think that we have to be careful when we go to solve this problem, since there is a bloat factor to consider... even if it wouldn't be an issue for the majority of users, we still need to think about people with systems "circa" 266 MHz. moreover, some users disable their disk cache. we need a solution that works (or can be disabled) when there is no disk cache.
Blocks: 118487
Another thing I wanted to mention is "back" (the leftmost button in the toolbar). moz treats "back" as a _refetch_ (unless it's an anchor in this page). IMO this is wrong: "back" means "give me what I already saw", not "refetch". Again, treating a refetch as a Very Expensive Operation and doing it only when absolutely necessary or when explicitly asked for is The Right Thing.
sam: back currently loads the page from the cache. it is true that if that page is dynamically generated and has been more recently visited and hence modified that you will not see what you originally saw. fixing this would be difficult and probably should not be done as it would result in some serious bloat, especially for SSL pages which cannot be cached on disk for security reasons.
Does the cache not have a (timestamp visited), (timestamp last modified) for the entry?
yes it does, but how would that help this situation? sure it would allow you to determine that the page in the cache is not the same page as you previously visited, but then what? you'd still be without the original page, so why not just show what you've got in the cache.
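The point being made here, that timestamps can detect a mismatch but cannot bring the original page back, can be sketched as follows. The field names `loadedAt` and `fetchedAt` and the function are hypothetical bookkeeping invented for the illustration, not fields of the real cache.

```javascript
// Hypothetical bookkeeping: `loadedAt` is when the window received its
// page, `fetchedAt` is when the cache entry last came from the server.
function describeCacheEntry(loadedAt, entry) {
  if (entry === undefined) return "evicted";
  if (entry.fetchedAt > loadedAt) return "replaced by a newer fetch";
  return "same copy the window is showing";
}

const lunchtime = 1000;
const afternoon = 5000;
console.log(describeCacheEntry(lunchtime, { fetchedAt: lunchtime }));
console.log(describeCacheEntry(lunchtime, { fetchedAt: afternoon }));
console.log(describeCacheEntry(lunchtime, undefined));
```

In the second and third cases the browser knows the cached copy is not what the window shows, but the only remaining choices are to show the newer copy, refetch, or tell the user.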
What you're looking at is the need to cache everything but (invalidate/show/hide) the cache on specific requests (or specific components). It's agreed that Print, Save As, View Source and some other commands do not need to reload the URL from the network. One option to alleviate this is to cache all requests in both digested and raw formats regardless, but make them visible only to specific services (Print, Save As, View Source, etc.) The "stealth cached" items expire and are replaced/expired as a normally cached item (perhaps with a rule always cache). By using this method, you can have one efficient cache for the system.

* Cache everything
* On cache request, determine visibility to decide if you should report that the item is not in cache (even if it is) and must be reloaded from the network.
** If the do-not-cache pref is set, then the item isn’t visible in the cache (to the browser) and must be refetched.
** If Print, Save As, View Source, all cache is visible always.

This would involve some checking between components to see who is authorized to view the stealth cache, but I think it would improve performance in addition to eliminating double posts and gets. These are just suggestions and probably nothing new, but they might help in the discussion for a direction to go...
thomas: most of what you've described, mozilla already does (modulo some bugs). it achieves the effect using load flags. there are three principal load flags: LOAD_NORMAL, LOAD_FROM_CACHE, LOAD_BYPASS_CACHE. clicking on a link or entering an address in the URL bar sets the LOAD_NORMAL flag. file->saveas, view->source, file->print set LOAD_FROM_CACHE, and shift-reload sets LOAD_BYPASS_CACHE. i should really say that this is how mozilla is designed... there are bugs and mozilla may not set the load flags correctly in all cases. what mozilla does not do is "pin" entries in the cache. this means that a page you are viewing could be evicted from the cache if you spend sufficient time browsing in another window. as a result, file->saveas, etc. may require a server hit.
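A rough model of the design described above, combined with the earlier remark that every document is stored regardless of cache control headers while the headers are honored for normal loads. This is a sketch, not Necko: the three flag names come from the comment, but their values here, the `load` function, the `noStore` field and the `fetchFromServer` callback are all hypothetical.

```javascript
// Model only. The flag names come from the comment above; the values and
// every function here are hypothetical stand-ins, not Necko's API.
const LOAD_NORMAL = "normal";
const LOAD_FROM_CACHE = "from-cache";
const LOAD_BYPASS_CACHE = "bypass-cache";

// `store` is a Map from URL to { body, noStore }. `fetchFromServer` is a
// caller-supplied function returning { body, noStore } for a URL.
function load(url, flag, store, fetchFromServer) {
  const entry = store.get(url);
  const usable =
    entry !== undefined &&
    (flag === LOAD_FROM_CACHE ||                 // view source, save as, print
      (flag === LOAD_NORMAL && !entry.noStore)); // honor headers on normal loads
  if (usable) {
    return { body: entry.body, hitServer: false };
  }
  // Cache miss, bypass, or headers forbid reuse: go to the server and
  // keep the response regardless of its cache-control headers.
  const fresh = fetchFromServer(url);
  store.set(url, fresh);
  return { body: fresh.body, hitServer: true };
}

// Usage: a server whose responses are all marked no-store.
let serverHits = 0;
const fakeServer = (url) => ({ body: "response #" + ++serverHits, noStore: true });
const docStore = new Map();
const firstLoad = load("http://example.com/", LOAD_NORMAL, docStore, fakeServer);
const viewSource = load("http://example.com/", LOAD_FROM_CACHE, docStore, fakeServer);
const secondLoad = load("http://example.com/", LOAD_NORMAL, docStore, fakeServer);
const shiftReload = load("http://example.com/", LOAD_BYPASS_CACHE, docStore, fakeServer);
```

Note that the model has no pinning: if the entry is missing when LOAD_FROM_CACHE is requested, it falls through to the server, which is the "may require a server hit" case.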
if (cached(current_page)) {
    display_source(cache(current_page));
} else {
    echo "Sorry, the page in question was not found in the cache, click here to re-retrieve this page";
}

If a refetch is required, the user should be well aware of it (and mozilla should ask permission), period. Else I assume that's the source of the current page. This is just my take/solution on the whole thing, and what I expect should be done; obviously the real code will probably be quite a bit more complex than my lame pseudo code. :) Is it really that unacceptable to say "sorry I couldn't view the source, here's why:"?
steve: you have described exactly what mozilla already tries to do (again, modulo some bugs).
i think that a separate bug should be filed to lock a window's history in the disk cache, but i feel that full source versions of pages should not be stored in memory. if someone decides to turn off disk cache it should be acceptable to require a refetch from the server on view source. what people are attempting to add into this bug would be best implemented by changing the cache system--anything else is just a hack. i don't think anyone wants to see the cache system extended this close to 1.0. file a bug and dep it on this one. i see wins here if the cache is modified to provide for a true history, especially since the cache's traditional role is all but obsolete due to dynamic pages and fast connections. this really should be discussed in a news group.
darin@netscape.com 2002-01-30 11:54 wrote (#215): > what mozilla does not do is "pin" entries in the > cache. this means that a page > you are viewing could be evicted from the cache if > you spend sufficient time > browsing in another window. as a result, > file->saveas, etc. may require a > server hit. The only problem with this, and I don't mean to harp on this point if it is already abundantly clear, is that Mozilla's DOM is effectively a "memory cache", and it does get out of sync with the real cache. If I can see the page in the mozilla window, it is only reasonable to expect that I can view its source, save it to disk, or send it in email, WITHOUT requiring the browser to get it from the server again. My humble picks from the best solutions yet suggested: 1) Sync the caches. Sync the caches. Sync the caches. Either (a) purge DOM from memory when a page is purged from disk, or (b) pin entries in the cache. (a) is the same fix as popping up a "page is gone" requestor, but doesn't leave the user scratching his head wondering why in the world Netscape's engineers can't save the page to disk if they can render it right there on the screen, because it will no longer render to screen. If it's gone from the cache, it can't be rendered without a request. or 2) add yet another cache that mirrors the DOM, saving the pure HTML. Better yet, attach it somewhere as part of the DOM. This eliminates the sync problem because if Mozilla ever goes to the cache, it will reconstruct the DOM from (and with) the html. IMO, 1 is the best "fix it now!" solution and 2 is the best "real" fix, even though it is either ugly or less memory efficient. If I'm speaking out of my ****, forgive me. Trying to help.
> purge DOM from memory when a page is purged from disk You mean blank out the page the user is currently viewing? The only place the DOM is "cached" is in the actual view the user has. As soon as you view something else the DOM is destroyed.... > If it's gone from the cache, it can't be rendered without a request. Correct. If you hit "back" to go back to a page that's gone from the cache you will get a request (and an alert). The problem is when the source has already been rendered.
OK. And then we can come to the conclusion that the only correct solution is to hold a direct cache key to a pinned cache entry for every DOM that is held in mozilla. But this was already suggested. If I go back in history to another page and it's cached, I load it and then I get a new direct cache key. So there is no need to mess with direct cache keys and history. Certainly this behaviour is not perfect, but it can't be without other tradeoffs. Note that there are other bugs where some people want the entire DOM cached for the purpose of page history. In that case, the cache keys for the sources should be cached along with the DOM history.
Sorry for the unconstructive spam but... If the only way to fix this is to modify the cache to allow "pinning" pages in the cache, we've had over 1.5 years to realize this and so I *do* think Mozilla 1.0 should wait on it. Also, at this point there have been SO MANY examples and counter-examples that I don't remember what works and what doesn't. The basic idea seems to be: the original HTML for whatever is displayed (in a Mozilla window) should be accessible to various functions (Print, Save As, View Source, etc.) without a refetch. Is this not correct, i.e. do these functions already work as I just described?
It seems that we are down to 2 areas of misbehavior: 1. GET requests that have been evicted from the cache will hit the server *without* asking the user first. 2. POST requests do not work. To address these problems i'll expose some kind of nsIWebPageDescriptor interface that allows access to the following information: a) URI b) PostData c) Cache key There will also be a LoadURI(nsIWebPageDescriptor) -- somewhere... Additionally, i'll make sure that if the 'load-from-cache-only' load flag is set on the view-source channel (which it should be) then the user is prompted BEFORE a request is made to the server (in the case where the page has been evicted from the cache...) This is basically what boris proposed way back when in comment #186 ;-) -- rick
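A minimal sketch of the plan in the comment above, assuming a descriptor that carries the three pieces of information listed (URI, post data, cache key) and a load path that prompts before going to the server. `PageDescriptor`, `load_for_view_source`, and the callback parameters are hypothetical names chosen for illustration; they are not the eventual nsIWebPageDescriptor API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageDescriptor:
    # Hypothetical stand-in for the proposed nsIWebPageDescriptor.
    uri: str
    post_data: Optional[bytes]
    cache_key: int


def load_for_view_source(desc, cache, confirm_refetch, fetch):
    """Load the page named by `desc` with load-from-cache-only semantics.

    Returns the source text, or None if the entry has been evicted and
    the user declined the refetch.
    """
    source = cache.get(desc.cache_key)
    if source is not None:
        return source
    # Evicted from the cache: ask BEFORE any request reaches the server.
    if not confirm_refetch(desc):
        return None
    # Replaying post_data is what makes the POST case work at all.
    return fetch(desc.uri, desc.post_data)
```

Keying the lookup on the cache key rather than the URI is what would let two windows showing different results from the same URI each find their own source.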
If the majority of dynamic pages contain a "no-cache" meta tag, does the "cache-pinning" solution mean that we're back to square 1 for developers wishing to see their outputted page source, and for anyone wishing to save-as or send a dynamic page (e.g., an invoice)?
Question: "Pinning" refers to preventing the expiration from cache of any element currently displayed in a browser window? If not, what exactly does it mean?
Al: 'no-cache' documents are still put into the cache. they are there for the purposes of view->source, etc. normal page loads of a 'no-cache' document will bypass the cache.
How about no-store documents? Is there any way to ensure that those get pinned, say, only in the memory cache - so that they can still have view-source done on them, but will never ever enter the disk cache?
stuart: yes, even 'no-store' documents are cached (in the memory cache only) for the purposes of view->source, etc. this was only recently changed to be this way... previously 'no-store' documents were not cached.
*** Bug 120457 has been marked as a duplicate of this bug. ***
darin: Would it be possible to associate a particular cache entry with each document currently on the screen, and (through the magic of reference counting or something along those lines) guarantee that that cache entry not be overwritten so long as the relevant page is open? That would mean that the cache could end up containing three or four copies of a page with the same URI, so that view source would work (you would use these cache entries to load the page in view source or save as); as soon as those windows got closed, the entries in the cache would be deleted (or marked as out of date or whatever). In the real world this would not cause much bloat (how often do you visit the same page in multiple windows?) but it would solve the view-source-of-the-same-page-multiple-times problem.
hixie: the cache already provides support for exactly what you're describing. we'd just need to hook up docshell and friends to utilize it. there are issues, of course, that need to be resolved... consider the hypothetical example of an ever increasing number of windows (eg. you've just visited an evil website)... at what point does the cache refuse to hold items in the cache? does it ever refuse? does it honor or ignore its size limits? it seems to me like we need a solution that allows the cache to honor its size limits, otherwise we run the risk of serious meltdown resulting from a malicious website.
Also, the cache would only hold multiple entries for the same URI if HTTP thought they resulted in different documents. Otherwise, the reference held by each window points to the same cache entry.
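The reference-counting idea in the three comments above can be sketched as a toy cache whose entries cannot be evicted while any window holds them. This is an illustration only: `PinningCache` and its methods are hypothetical, and the size-limit question raised above is deliberately left out.

```python
class PinningCache:
    """Toy cache whose entries cannot be evicted while a window holds them."""

    def __init__(self):
        self._entries = {}  # key -> source text
        self._pins = {}     # key -> number of open windows using the entry

    def store(self, key, source):
        self._entries[key] = source

    def pin(self, key):
        """Called when a window starts displaying the document for `key`."""
        source = self._entries[key]  # raises KeyError if not cached
        self._pins[key] = self._pins.get(key, 0) + 1
        return source

    def unpin(self, key):
        """Called when the window closes or navigates away."""
        self._pins[key] -= 1
        if self._pins[key] == 0:
            del self._pins[key]

    def evict(self, key):
        """Evict `key` unless it is pinned; return True if it was removed."""
        if self._pins.get(key, 0) > 0:
            return False
        return self._entries.pop(key, None) is not None
```

Two windows showing the same document would pin the same key, so only one copy is held; different documents served from the same URI would need distinct keys.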
I'd venture to suggest that if we refuse to hold the source in the cache, we should refuse to open the window ( / display the page), period. My suggestion would be to allow the cache to expand a limited amount over its official maximum size for this purpose (eg 10% more, with a lower limit to handle the case where the cache is set to zero) and if loading a page would break this rule, refuse to load the page. The really difficult case would be what to do with a huge (bigger-than-cache) file being displayed in a pre-existing window. I'm not sure what we should do there - either truncate the page, or make an exception to the cache-size rules for a single page that's bigger than the cache for the duration of time that it's being displayed in a window. Either way, I don't think we should *ever* break the contract that if we can display something in a window, we can provide its source on demand exactly as it was received from the server.
Darin: A malicious website could exhaust the system memory just by launching a ton of windows, but the problem would be escalated by the additional cache footprint if pinned entries in the cache are never released as well. I think the overhead of the windows would have more of an impact. FWIW, the cache should honor its constraints. I don't think the cache should ever start refusing to add entries, but it definitely should start expiring entries, even pinned entries, when the situation you described exists. Could there be a warning or a notice to tell a user that they may want to increase these options if the pinned cache size gets too close to the limits?
thomas: and if we're talking about the memory cache? HTTPS only goes into the memory cache... also, what if the user has disabled their disk cache? then, currently, only the memory cache is used. meltdown is more serious and more common if there are not hard limits on the size of the memory cache.
> My suggestion would be to allow the cache to expand a limited amount over its > official maximum size for this purpose Currently-open document source (not DOM) should NOT count against cache. It's a functionality issue, not performance. If it's not available, Save and view source BREAK. Not performance "oh it's slow" break... dataloss "#!&%$" break. Save and view source should work perfectly if the user has 0 mem and 0 disk cache. Think lower-end machines with a network wide cache on the LAN. Why waste separate resources on the local machine? But certainly those users should still expect save and view source (etc) to work. Or think power users that know HTTP caching. Even they won't understand what a DOM cache is, and why what they see they can't save.
> we should refuse to open the window ( / display the page) Let's not lose sight of things here. This is a browser. Its primary purpose is to display pages. All else is secondary and, ultimately, unnecessary. We should display the page no matter what. If we can't show the source for a particular page we are displaying we should just say so ("The page has expired, ....").
I wouldn't go so far as to say non-browsing features are "unnecessary". After performing some sort of financial transaction via the web, saving the resulting confirmation/tracking page can certainly be considered necessary.
Keywords: topembed
There has been a lot of discussion here about how to handle this bug. I am not sure that this is the appropriate place for discussion, but since everyone else is doing it, I guess I will too. :) End users expect to be able to save, print, or view source of any web page that they are looking at. Any other behavior should be considered a flaw. In order to accomplish this, we need to be able to access the source for every document currently in any open window or tab. The consensus seems to be that the fix is to pin the source for each currently open document in either memory cache or disk cache. The debate seems to be about how to handle the various scenarios presented by the fact that memory cache and disk cache are both limited resources and too many open documents being pinned can potentially cause Mozilla to deplete one or both of those resources. To address the potential issues, I propose the following (*see footnote):
A) If disk cache is disabled:
1) The source for all documents currently in an active window or tab is pinned in memory cache until memory cache is exhausted.
2) When memory cache is exhausted, prompt the user that memory cache is running out and give 2 choices: a) close some open windows or tabs in order to free memory cache, b) enable disk cache.
B) If disk cache is enabled:
1) Store and pin open https document source in memory cache.
2) Store and pin open non-https document source to disk cache.
3) If memory cache is exhausted by pinned https document source, prompt the user that memory cache is running low and give 2 choices: a) close some open windows or tabs to free memory cache, b) break security and temporarily enable disk cache for https document source.
4) If the specified disk cache limit is reached, prompt the user that the disk cache limit has been reached and give 3 choices: a) close some of the open windows or tabs, b) temporarily allow the disk cache limit to be exceeded, c) increase disk cache limit.
C) Alternative:
1) Automatically store and pin non-https document source to disk even if disk cache is disabled or the disk cache limit has been reached.
2) If disk cache is disabled, these document source files can be deleted from disk as soon as the document is no longer in an open window.
3) If disk cache is enabled, these document source files just become unpinned cache files and are handled accordingly by the disk cache.
This alternative method of addressing the issue can be a preference setting so that users can avoid almost all of the prompts described in the previous method (the exception being when memory cache becomes full of open https source). The only real danger of this alternative method is in the case of the malicious website that opens an infinite number of windows and thus fills the user's hard drive with pinned source files, but we can just file a different bug for that, right? :D (*Note: I am by no means claiming that this solution is all my own original idea. Obviously others have previously presented many of the aspects of this solution; I am only trying to summarize and organize it all into one concise description of a solution. If anyone can think of a scenario that I have not addressed, please let me know --preferably via email.)
I'm going to put my two cents in. I'm not a mozilla developer but instead an end user. I develop web sites and thus have an understanding of what I expect of mozilla based on this. I don't understand a lot of the technical debate regarding this bug and I don't have time to. What I do know is what I expect the browser to do. Maybe by presenting my expectations you guys can figure out how to 'make it so'. When I view the source of a web page, I expect the browser to show the source for that page. If I've got two or three different 'versions' of the page open, then the source for each version should be correct for that version. So if I'm working on a CGI script and I make a change to the script (but haven't refreshed the browser window yet) I expect to see the original script's results, not the changes I've just made. NOTE: If I want an up to date view of the latest source I will refresh the page before viewing the source. If I haven't reloaded the page, then I want to see the source for the un-reloaded page, not the new source. The same goes for printing, or any other aspect of the page. As I said, I don't understand a lot of the technical aspects of caching and what-not. However I do expect the browser to act in a predictable way. This means that when I view source, or print, I get results based on the page as it is, not as it may be now. This means that if I have numerous versions of a page in various windows, and I have made changes to the underlying file, then each page should be different and reflect this in the source, etc. Hope this helps ;-]
Actually, now that I think about it a little more, this is what I think: The "cached" copies of documents that are *currently* being displayed in currently open browser windows (or tabs) should *not* be counted towards the total size of the cache, whether it be memory or disk. In other words, the maximum possible size of the cache should be the maximum size specified by the user *plus* the size of all documents currently open in browser windows. As soon as a window is closed, or the user browses to a different document (causing a document to no longer be displayed in an open window), its size should start counting towards the total cache size; at this point, therefore, older entries should be evicted in order to bring the cache down within its size limits. This works out right if the cache is disabled, too: then the default cache size is zero, so as soon as a document is no longer being viewed, it will become the oldest document in the cache and be immediately evicted. At first glance this seems like weird behavior (or at least it seemed weird to me) due to the timing of when things would be evicted. But actually it's exactly the behavior you would expect if you treat the "currently-viewed cache" as completely distinct from the regular cache. As soon as a page is no longer currently-viewed, it moves into the regular cache, and that has the same effect as what happens today when a page is put into the regular cache.
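The accounting rule proposed in the comment above can be sketched as follows. It assumes sizes in bytes and oldest-first eviction; `ViewAwareCache` and its method names are hypothetical and say nothing about how Mozilla's cache is really organized.

```python
from collections import OrderedDict


class ViewAwareCache:
    def __init__(self, limit):
        self.limit = limit            # user-specified cache size, in bytes
        self.viewed = {}              # url -> size; NOT counted toward limit
        self.regular = OrderedDict()  # url -> size; oldest entry first

    def open_page(self, url, size):
        # A displayed document lives in the "currently-viewed" set.
        self.regular.pop(url, None)
        self.viewed[url] = size

    def close_page(self, url):
        # Only now does the page start counting toward the limit.
        self.regular[url] = self.viewed.pop(url)
        while self.regular and sum(self.regular.values()) > self.limit:
            self.regular.popitem(last=False)  # evict the oldest entry

    def total_size(self):
        return sum(self.viewed.values()) + sum(self.regular.values())
```

With `limit=0` (cache disabled), a page survives exactly as long as it is displayed and is evicted the moment it is closed, which matches the behavior described in the comment.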
> should *not* be counted towards the total size of the cache The problem with this is that users limit cache sizes for a reason. I have a limited quota and I limit my cache such that cache will not push me over quota. If we go over that limit we better be doing it in some temp directory somewhere and not in the profile....
FWIW, I agree that currently displayed elements should not count towards the cache limits. As far as most users are concerned, a cache is where *done* items are stored for quick retrieval. I also think that the common user would *expect* a browser to fail gracefully and refuse to open new windows when, as they see it, "their system's memory has been taxed to the limit." In other words, don't be afraid to present the user with a refusal to perform. They are more likely to blame it on their computer, its lack of memory, or slow processor, :-| than on mozilla itself. (And, imho, rightfully so! It isn't mozilla's responsibility to make sure there are enough resources to open yet another item. That duty belongs to the user or, more to the point, the user's pocketbook.)
> I have a limited quota and I limit my cache such that This is nothing to do with disk quota. What I (and about 3 other people now) propose is keeping it in core. If it uses the normal memory cache mechanism, fine. If it's separate, whatever. Once the user leaves the page then Mozilla will decide whether to keep it in memory or disk cache, or neither. Users expect that the more pages they have actively open, the more memory usage will rise. This is how any other app works, cache or no cache.
just to bring this discussion back on track > consider the hypothetical example of an > ever increasing number of windows (eg. you've just visited an evil website)... i just tried this out to see what currently happens in this situation, mozilla crashed with 146 pages open (using 100% of cpu and all ram - 384meg). so basically mozilla is going to crash or the system is going to become completely unusable long before the size of total pages open exceeds size of cache. it would take 500 - 1000 pages to exceed the default disk cache size of 50000k so this should never become an issue. so all that needs to be done is to > hook up docshell and friends to utilize it.
So I think that (per benm's comments) we shouldn't try to put resource control into cache management routines (resources in the global meaning, that is, utilization of the machine's disk and CPU resources by Mozilla), and in this bug not worry about this again. A separate bug should be filed about implementing in the future some sort of global resource control in Mozilla that would prevent /Java/Javascript/Flash plugin/ any other plugin/ from executing DOS attacks that lock up the client machine (eg. by JavaScript opening too many windows, or injecting too much HTML code into the document). This feature would impose dynamic limits on memory allocation, cpu utilization, number of opened windows, etc. The limits would be dependent on the amount of resources available on the particular machine, would be calculated dynamically and the user would be notified gracefully if the limits were exceeded. This issue is _separate_. Please someone qualified (preferably someone who knows a proper component owner for that :-)) file a separate bug. The limits on cache size should only pertain to content that isn't displayed in any window (that is, to content that is not pinned). This may be difficult, but a browser isn't made in 6 days, especially one that is implemented correctly :-)
benm, olo: we should not try to design a system that makes this worse. i don't understand why current problems w/ many windows is justification for not solving this bug correctly. we should care about enforcing cache limits... the cache limits are a user preference. you may argue that they are not your preference, but then that is yet another preference. IMO we should strive to come up with a good compromise solution to this problem. we shouldn't race to solve this bug by pushing out any concern of cache limits.
darin: those of us proposing "ignoring" cache limits for currently-visible files are not proposing that the cache limits be weakened, but rather that the *meaning* of "cache limits" be redefined. When I wear my "developer" hat, I understand the reason why you might want to consider a currently-being-viewed page as part of the cache. But when I put on my "user" hat, I don't think of it as part of the cache at all: the cache is an *optional* feature, which can be disabled without hurting browsing. It stores *recently accessed* pages, for ready availability if I should view the same page *again*. From a user standpoint, even if not from an implementation one, this is significantly different than storing the *currently visible* page for ready access if I need to *do something else with that page, such as see its source*. So my position is that as a user, I would not expect the cache limits to include a limit on the number of pages I can keep open at once without losing functionality like "view source". When I, as a user, set a cache limit, I'm making a tradeoff of memory/disk versus performance, not memory/disk versus incorrectness. The difficulty comes when the disk cache comes into play, and the user specifies a location for that disk cache. By my theory, "currently viewed pages" should not be stored in that location because they do not count towards disk cache size, and the user might limit their cache size based on a known amount of free space in the cache location. On the other hand, it is vital that moving items from currently-viewed to cached (and back, if necessary) be an efficient operation, which may not be the case between two different cache locations, especially if they are on separate drives. The only easy solution I can think of would be to ensure that currently viewed documents only get stored in memory, not disk. But that adds to bloat. 
I guess the UI could give a disclaimer that the cache directory will also be used to store currently viewed pages and therefore will take more space than the limit you give it?
sballard: i agree with your conclusion... clearly it is bad to store stuff in the disk cache w/o counting toward the total size of the disk cache. and clearly storing in-use pages only in memory is bad too because of the bloat that would incur. and, i too agree that it would be very nice from the point of view of the user/webdeveloper to be able to count on view->source and file->saveas always giving me what i am looking at. i'm not really arguing against that as a goal. instead, i'm just saying that we need to weigh the cost of doing so, and consider that it might end up breaking down at some point. that there might need to be some threshold at which we simply cannot guarantee that the source for an in-use page is accessible on the local system. we should be able to set that threshold high enough so that 99% of the users will never know it exists. i'm suggesting that a threshold exist and that it be configurable.
How about a pref that goes with the caching prefs, and looks like this: "Allow [ 10]Mb of additional space in the disk cache for pages currently being viewed (necessary for "view source" and "save page", among other things)." Ideally, that number would default to something nice and high, but the option of setting it to "unlimited" should also be available (I'd certainly choose that, as a web developer). Perhaps:
      ( ) Unlimited
Allow                  additional space in the ... (as above)
      (*) [ 10]Mb of
This wording would clearly indicate that this amount would be *additional* to the amount already specified in the disk cache, so users would know to make sure to allow enough extra space for it on disk. If a user reduces that number, they can't complain if "view source" starts working wrong. But if they specify "unlimited", it should be absolutely impossible to get into the state where a currently-visible page cannot have its source viewed or saved (correctly!).
i'm going to put on mpt's hat 'no more prefs' you have 3 prefs (mem size, disk size, when to check for new objects), they should be enough. I will ask one question: What do you expect mozilla to do when it encounters a single page that requires say 1gb of memory to render? I have two versions of the page on my system, one with local images, and one with remote images. I've loaded the page on things ranging from 24.0 dialup to a fast university net connection. [My system has never had more than 1/2gb of physical ram] If I were to load that page and leave some other windows open, should i expect mozilla to keep around the source and images for all my open windows?
timeless- Yes. and give the user an enormous big red button that says "due to low system resources you will not be able to both load this entire page and reliably use your other windows" [close other windows] [do my best to keep them all]
A couple questions for timeless - After rendering those pages in a current version of Mozilla, does it do a refetch of the source when you request to view source? If you look at those pages in another browser (IE, Opera, whatever), are you able to view source without a refetch? And some comments for everyone - The way I see it the primary issue of concern in this bug is about the text source of the pages. If Mozilla were to pin the text files in cache and not pin any of the images, flash, sound, etc then that would resolve most of the problems we currently face, and since the text source files are generally fairly small, we are not likely to have the cache limit exceeded by pinned source unless disk cache is disabled or set to a very low limit. In regards to viewing or saving source, pinning only the text files would make it work the way it should. In regards to saving a whole page (text source, images, and all other components), there could still be some problems with dynamic content if everything did not get pinned, but we would not be any worse off than we are now (in regards to saving the whole page correctly). As for printing, I am not up to speed about how printing works now: does it depend on refetching or cached source, or is it entirely independent of this bug? I personally have not had any problems printing pages correctly lately, but I can not say for sure if that is due to recent printing code changes or just good luck.
It has been said before but I'll say it again: Printing is NOT affected by this bug in ANY way. Printing works on the DOM and doesn't use the source. So printing never refetches from server even if the source of the page has disappeared from all caches. It also works perfectly if you have 10 pages from the same url open and they all look different. If you don't like this behaviour then PLEASE don't argue about it here, this bug is spammy as is anyway. File a NEW bug and present your arguments THERE.
what are the chances of getting a fix for this before 099?
Keywords: nsbeta1
No longer blocks: 107067
> pin the text files in cache and not pin any of the images How do you make this distinction? What about SVG or other XML-based image formats? Those are definitely "images" and they definitely have a "source". (I can see not pinning things referenced by <img> tags or something, I guess).
Just something that should be kept in mind, there is such a thing as dynamically-generated images -- think weather maps, for example. So the "don't pin images" idea will not be bulletproof. Whether or not it's an acceptable tradeoff I couldn't really say.
excuse me sticking my oar in as well, but at least one bug depending on this assumes this will fix images, not just text files. btw i'm not quite sure what's going on with the dependencies - bug 118487 is dependent on this bug, but that is marked as a duplicate of 115174. shouldn't someone fix it so the original is the dependent? bug 120809 is the other one that wouldn't be fixed with a text-file-only pinning.
It's simple: pin the source text of the displayed URL. If you open an SVG image in its own window, then of course you pin its source. On the other hand, if the image is merely linked in the href of an HTML page, you don't pin it. The same applies to linked scripts and stylesheets.
> if the image is merely linked in the href of an HTML page, you don't pin it This would definitely be unacceptable. EVERY object on EVERY open page (windows and tabs) needs to have its original data available in system memory. Anything that can change by a DOM or what not would need a separate copy that I presume would be created on first-change.
_Why_ do you want every object to be immediately available? I think it's enough to have every object that can be directly saved/view-source'd (e.g. images, iframes, normal frames, the document itself) available. Things like .css or .js files seem to be unnecessary to me.
> Things like .css or .js files seem to be unnecessary to me. "Save as..." -> "web page complete" needs them. We should be able to load a complex page with all kinds of dynamic content and crazy nocache directives, lose the network connection, and still be able to save the whole page.
This bug's summary is "need means to reuse/reload current page. . . " not current image or current JavaScript or otherwise. In reality, everything on the web is static, except for numerous HTML entities and some images. To fix this bug, we have to focus on reusing the already downloaded HTML file. There is no way we can recreate an HTML file from the docshell exactly as it was. For "view source" and other operations, however, that is the requirement. The simplest and most effective solution is to keep a copy of the current HTML file in core. There would be no error messages, no cache keys, no pinning, and no prefs. It would cost memory, but sometimes you have to bite the bullet. For every open window that is displaying an HTML file (or files, if there are frames), Mozilla should store the HTML file(s) in core. As a later enhancement, we could gzip the HTML files so they will consume less RAM. I assume that after Mozilla downloads an HTML file for display, it generates a docshell, then it tosses the HTML file away. My suggestion is to not toss the HTML file away until the window closes or the window displays some other page. This will be hoggish when users start downloading 2GB HTML files every day, but that hasn't happened yet to any degree of frequency. Should that become common, we can address that problem at that time. Today, however, storing the HTML in core makes sense. Take the worst case scenario, a power user with 20 Mozilla windows open, each with 100k HTML files. That's only 2 Meg or so. As a power user, he should expect to use a lot of memory. Most users have one or two windows open, and most users download HTML files of less than 100k. Admittedly, Slashdot is the exception. In this case, however, the exception proves the rule. It would be reasonable to limit the size of any HTML file stored in core to an arbitrary file size limit, such as 1 megabyte. 
Then we can employ a simple formula such as: when view source (or a related operation) is called, use the HTML file from core. If not in core, use the HTML file in the memory cache. If not in the memory cache, use the HTML file in the disk cache. If not in the disk cache, re-download the file. When such operations ask for images, JavaScript files, and stylesheets, just get them from cache or redownload them. They're static anyway, so the only issue is performance. If a 100 MB HTML file is generated dynamically, heaven help that server farm. As for the possible SVG image problem, a separate bug should be filed for that. Plain text files (TXT's in the Win32 world) should not be stored in core, since they are going to be static files anyway. As for dynamic images, we can't fix that. Remember that printing is NOT affected by this bug. See comment 255. One may wish to send an image via e-mail without redownloading the image. Most dynamic images, however, will be "mapquest" maps (courtesy of Navigation Technologies Inc., BTW), and will be exactly the same when redownloaded. As for weather maps, there is not any real problem. If a user downloads a weather map at 12:00, and then waits 6 hours and finally sends the file by e-mail, and he expects the sent map to be just as it was at 12:00, he has a problem that Mozilla just can't solve. Most weather maps will be looked at and handled within a few minutes at most, making any change negligible. It is a safe assumption that most users would want the most current map anyway. Finally, many weather map sites force a redownload every so often, so there is little point in trying to "fix" this.
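The lookup order proposed in the comment above can be sketched as follows. This is only an illustration in Python, not Mozilla code: `find_source`, the three dictionaries and the `fetch` callback are hypothetical stand-ins for the in-core copy, the memory cache, the disk cache and the network.

```python
from typing import Callable, Dict, Tuple


def find_source(url: str,
                core: Dict[str, bytes],
                memory_cache: Dict[str, bytes],
                disk_cache: Dict[str, bytes],
                fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Return (where_found, source), trying each store in the proposed order."""
    stores = (
        ("core", core),
        ("memory cache", memory_cache),
        ("disk cache", disk_cache),
    )
    for name, store in stores:
        if url in store:
            return name, store[url]
    # Last resort: go back to the server. The result may differ from
    # what the window is currently displaying.
    return "network", fetch(url)
```

Only the final branch touches the server, so a refetch (and the risk of saving something other than what is displayed) happens only when all three local copies are missing.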
Let us first get a mechanism for pinning the *html* page (or image or whatever is currently being viewed) before starting to talk about all linked resources. The scope of this bug is about creating a mechanism for pinning things, in particular the current page. Everything else is a subject for another bug. Feel free to create a bug on "Make all linked resources to a page refetchable without refetching from server". I won't share my thoughts on whether I think that should be done in this bug, since it would only encourage further discussion. At least this is my opinion; rpotts, darin, do you agree?
Why can't we add a new attribute to the DOM, say document.source, which would be a read-only attribute containing the HTML code of the corresponding page?
Um, why on earth would we need to expose the original source of a document (potentially hundreds of kB) *through the DOM* as text? PS. I haven't followed this discussion, so I could be missing context here...
jst, that was just an idea I had. It's not like we _need_ to do it. I don't see, though, where the difference is between using the DOM to access it or some other way.
Any change we make to the DOM can affect any webpage out there (by introducing new names into the namespace in question), so to expose things through the DOM we need a really good reason to do so. And exposing huge strings to JS is in general not a good idea since they end up living in the JS GC world and potentially staying around in memory for a long time after being accessed.
People, the problem is not storing the data. The problem is storing the data _without adding too much bloat_. Just so that everybody understands, bloat in this context is memory consumption. So simply storing the data in the document is not an acceptable solution, since it will consume too much memory. Remember that mozilla is already criticized for using too much memory, and that is just for features like CSS/DOM/XML/HTML/JS, iow things that many people think are more important than the features that are blocked by this bug. I'm not saying that this is not important or that I don't think it should be solved. I'm just making sure you all understand what the problem is. Using the cache system and pinning things in the cache saves us much memory, since most of the time we will get the pinned source for free; it already exists in the cache anyway. So what we need to solve is the remaining cases where the current cache system isn't solving the problem. I.e. we need to give the cache the ability to guarantee that the source is available, as often as possible, preferably always. Please don't come with arguments like "anything less than 100% is unacceptable"; isn't it good if we can increase the number from the current, say 90%, to 99%? Sure, we should aim for 100%, but if we don't get there right away it's not the end of the world. There's always tomorrow.
The advantages of hanging the source off of the DOM are: 1- it's the most natural place (from an internal perspective) for it to go, since the DOM is what Mozilla keeps around to render/print the page, and whenever the page is renderable/printable, it should be view-sourceable/sendable/save-asable. 2- no ugly cache pinning/duplicating, etc problems 3- all the problems and nuances of pinning discussed in the last 50 or so comments go away. Viewable-but-not-saveasable goes away. The 0-size cache problem goes away. Even the oh-gee-this-page-is-7-terabytes-what-should-we-do? problem goes away because you simply run out of memory and go away or give an error. The disadvantages: 1- this exposes the .source node to the user (through javascript or whatnot), which really isn't a good idea or even beneficial to the user in really any way. 2- memory bloat. I question this, however -- doesn't representing the document as DOM take up more memory than representing it as plain html? Even if the DOM were a bit smaller than the html, we aren't talking about any orders of magnitude, are we? It seems to me that if you want to cut back on Mozilla's bloat, there are a thousand other things to do away with before you do away with "correct functionality" and "the right solution." I mean, for heaven's sake, purge all the forward/backward pages and make me reload those (from the cache or the server) before you make me reload the CURRENT page just to save it to disk. Can we somehow attach the source to the DOM internally without exposing it to the user? There is a 1-to-1 correspondence between pages that need the source held and pages which have a current DOM in memory. This is not true of any of the cache pinning solutions. I really don't think it is unreasonable in any way to make the browser keep a copy of the html of visible pages in memory -- isn't this how the most rudimentary browser you could conceive of would do it anyway? 
If we don't attach it to the DOM, is there some other reasonable place in memory we can stick the source, even if it means paging some of this out to disk once in a while to appease the memory-bloat argument? I'd hate to suggest the creation of yet another cache, but it seems to me (a completely non partial observer :) that page source needs to be held, and it seems equally obvious (from this week's traffic on this bug) that the traditional cache is not the place to be doing it unless you want to add all manner of kludges... I think I've now gone over my allotted 2 cents.
Yes, the in-memory DOM representation is larger than the source of the document, but whether or not it's an order of magnitude larger is beside the point; the point is that the source is large enough that we don't want to keep it in memory. Either way, keeping the source in memory, even if it is far smaller than the DOM representation of the source, is unnecessary bloat that we cannot in any way afford. We're way way too bloated as it is; if we'd start holding on to the source of the document in memory too we'd need a really really good reason to do that. I still haven't heard any reasons close to good enough to pay that cost (not that I've listened that closely, but still...).
>Can we somehow attach the source to the DOM internally without exposing it to >the user? Hm... from my understanding of DOM and IDL, this would be possible: Just add this to nsIDOMDocumentInternal: [noscript] nsAString getSource(); (or [noscript] string getSource(); or whatever an appropriate string type would be) This would, as I understand, eliminate problems with GC, as the function would only be called by C++ Code, which isn't GC'ed. Or is it?
Why does everyone seem to want to put everything in memory? The only thing that _absolutely_ needs to be put in memory is encrypted files. Personally I don't care if it takes 5 msec to see the source or 5 sec, just as long as I can actually see it. So why not just follow what others have already posted and dump everything we get from the server in e.g. /tmp; that includes images, html, xml etc. This enables view-source, save-as and other(?) features and makes them function correctly. This disk usage should not count against the setting for the cache, because it is not cache. The cache setting is to specify just how much memory/disk you are willing to spend on *caching* data! (I have never understood the memory cache of mozilla/netscape, why is it needed? The OS does just fine caching the files anyway, while being much better at handling the size of it...)
Various people suggested that this bug should refer to all accessible documents (including images and .js and .css files). However, we should think about the average use case. People who need save as/view source want it to save the correct HTML page they are looking at, without a refetch. When users use power-features like save entire page, they should expect a refetch of images or other non-critical data if it has been expired from cache. Also, they should expect a more updated version to be saved, just as if they reopened the URL in a new window - You don't expect to see the exact same thing. Therefore, I think only the currently-viewed HTML/xml/whatever source should be saved, and only if it's uncachable (if it's cachable, then we don't expect it to change, thus we may refetch). Maybe we should pin cachable documents to make them remain in cache. Another option is that if a cachable page is removed from cache and then refetched for save/view source we should check that the newly fetched page is the same as the old one (via hashing or other means), and if not warn the user with a message such as: "The page you are about to save/view may be different from the displayed page. Proceed with view/save?" I suggest adding a pref such as: Keep source of viewed pages: ( ) Keep the entire page (increases memory consumption). (*) Keep only the main page and frames. ( ) Do not keep source (Save and View Source may not work as expected). In my opinion, if a user views a very large static HTML file, it is very important to keep it because the user will most likely want to save the file (think the Jargon file in one big HTML document). In any case, if the fetch of original text is not applicable, I suggest a message similar to expired POST data: "The page you are trying to view/save has expired from cache. Refetch?"
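The "check that the newly fetched page is the same as the old one (via hashing or other means)" idea above can be sketched like this. It is a minimal Python illustration, not Mozilla code; `digest` and `needs_warning` are hypothetical names, and SHA-256 is just one possible choice of hash.

```python
import hashlib


def digest(source: bytes) -> str:
    """Fingerprint of the page source, computed when the page is first loaded."""
    return hashlib.sha256(source).hexdigest()


def needs_warning(displayed_digest: str, refetched: bytes) -> bool:
    """True if the refetched bytes differ from what the window is displaying."""
    return digest(refetched) != displayed_digest
```

Only the short digest has to be kept while the page is displayed, so this costs almost no memory. The trade-off is that it can detect a changed page but cannot recover the original source.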
The only reason we care about bloat is that it has a negative impact on performance (in virtual memory operating systems). If we keep the source file in DOM, we would increase Mozilla's memory consumption by a few hundred kilobytes on average. This would hurt performance. If we resolve this bug in a way that results in heavy disk access by every browser window, as suggested by comment 274 and many others, however, then performance will take a huge hit. The first option is better overall. I'll be quiet after this message. I've made the case for keeping the source in DOM. As suggested above, the potential security problem can be avoided. Finally, after a couple more of the keyword: footprint bugs are fixed, the memory use issue will balance out.
Responses to what various people have said, and my last comment in this bug till there is actual code involved: > downloads an HTML file for display, it generates a docshell, then it tosses > the HTML file away. You make it sound like we have the HTML and then drop it. We never have the entire HTML file at once (except in the cache). As packets come in necko sends notifications to the parser which processes the new data and waits for more. We would need to keep reallocating the data as more came in (this would be quite slow). > Plain text files [...] are going to be static files anyway. This is a completely bogus assumption... > I mean, for heaven's sake, purge all the forward/backward pages and make me > reload those (from the cache or the server) That's what we already do. All that stuff is stored in the disk cache and if it's expired from there we put up a dialog telling you so and saying that we will be reloading from the server. How often do you run across this dialog? > isn't this how the most rudimentary browser you could conceive of would do it > anyway? The most rudimentary browser I can conceive of would do a lot of things that scale poorly and are inefficient, including that one, yes. It would also not lay out most web pages out there... At this point, the thing I suggest is that we wait for Rick's nsIWebPageDescriptor work. That should allow immediate improvements in View Source and Save Page/Save Image that will solve the problems people are seeing in most cases. At that point we can see what issues still remain and how common they are, and discuss ways to solve them.
1. Can someone confirm that "90% of the time" the source-HTML is not refetched when doing View Source/Save? 2. Am I correct in this summary of the competing PRO arguments? - Pinning-the-cache is more "elegant" because it potentially does away with any memory bloat - attaching the source-HTML to the DOM is simpler since there's already the one-to-one relationship with browser windows 3. What does the memory cache DO anyway? I assumed the DOM for all open windows was stored separately since the memory footprint swells consistently as you open more windows. Or, is the DOM part of the memory-cache but the memory-cache is implemented per-window? 4. If we implement some form of cache-pinning, I would suggest a user-prompt like this, "A cached copy of this web page is no longer available. (You may need to increase the size of your disk/memory cache) Would you like to continue, by reloading the page from the web-site?" YES / NO 5. When you do a Save As in IE 5.5 it *reloads* most page elements, though I don't know about the HTML specifically.
Sorry for the additional spam, but I just did a quick test of the Save As. ------------------ I had this bug's Bugzilla page open in both Mozilla 20020129 and in IE 5.5. I did a Save As from both. Next, I opened a new IE window and submitted my previous set of comments. Then, after a minute-long pause for Bugzilla to process it, I re-did the Save As from both original windows; my comments weren't included. So, in this simple test case, both Mozilla and IE continued to save the displayed HTML rather than refetching. Perhaps a new bug(s) needs to be created for the specific instances where refetches occur?
I don't think we need a new bug; that's what this one is for. The easiest case is just a simple form using the post method. Type something in, hit submit and then view source: the page now shows the submitted text, but when you view source you still get the original source of the form. This is a huge issue since it makes mozilla hard to use for web developers, the people we want using mozilla to keep the web from becoming an IE-only wasteland. Just go to http://joshuaeichorn.com/mozilla/view_source_example.php to see the effect. I think the majority of users would be happy if this case was fixed.
Did we max the Bugzilla db or something? I'm seeing "only" the first 280 replies on the web-page now. ------------------- Mark, Joshua, thank you for the clarifications. Sorry, Joshua, but I got an "unknown domain" error when I tried to use your test page ------------------- There are way too many comments on this bug and I think it's because the title makes it sound like a widespread problem. Maybe it should be renamed, "Need improved means to reuse current page without refetching (see blocked bugs)." I also think we need to create more bugs (which will be marked as blocked-by-40867) describing the remaining problem areas. And, if we create two new bugs for the two possible solutions (marked as *blocking* 40867), we could then leave 40867 as the tracking bug. Specifically, create the following new bugs: - Need means to pin open-windows in cache - Need means to attach source-HTML to DOM - Need means to individually cache dynamic-URLs in multiple windows - Need means to prevent expiration of cached copy of still-displayed windows - Need means to "cache" (for re-use) pages marked no-cache (?) - Need means to cache/re-use HTML forms without refetch (bug 68412 or 84106?) I leave it for someone else (more knowledgeable than I) to do any of these suggestions. Finally, is there something specific about the HTTP info for http://weather.yahoo.com/forecast/USCA0982_f.html that causes it to refetch on Save As. If so, this could become another blocked-bug.
I think breaking this into separate bugs is a good idea at this point (actually, a whole lot sooner would have been better). - Need means to pin open-windows in cache - Need means to attach source-HTML to DOM These are separate issues and deserve separate bugs to discuss their merits. "Need means to pin open-windows in cache" should probably be named: - Open windows need to hold cache tokens for elements (or something like that). - Need means to individually cache dynamic-URLs in multiple windows - Need means to prevent expiration of cached copy of still-displayed windows - Need means to "cache" (for re-use) pages marked no-cache (?) These are not necessary. The first two are already supported by the cache service, and will be addressed if open windows hold cache tokens. We are already doing the third (though someone could open a bug to say we shouldn't). - Need means to cache/re-use HTML forms without refetch (bug 68412 or 84106?) I'm not quite sure what this means. Data received from a server will be cached, and reuse can be discussed in one of the bugs above. If this refers to data entered into a form by the user, then that would require a different bug, because that data is not stored in the cache.
Keywords: nsbeta1 → nsbeta1+
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Bug 115832 asks for page to be reused without refetching when doing File -> Edit Page.
Blocks: 115832
Let me see if I've got this straight. * This bug NEEDS to be fixed, ASAP. E-commerce is unsafe while this bug is outstanding; 'view source' works in a non-WYSIWYG, bandwidth-hogging, and generally unacceptable manner; etc. I personally will expect all the major issues related to this to be fixed by Mozilla 1.0, or I'll abandon Mozilla in disgust. * This bug has been open for a year and a half, and there's been a lot of confusion about its nature. THE BUG IS: "view source", "save", "save as", and "send page" should use the exact source received from the server for the page currently being viewed in the current window. (Some people think "Back" should too, though this is debatable). (This manifests itself in dozens of strange ways). Early on, *two* fully functional solutions were proposed. 1) Store the raw text (usually HTML) of a document (page, frame, etc.) as a node in the DOM, and grab that. People didn't like this because it added a bunch of text to the DOM, but it would work. No attempt has been made to implement it. 2) Store a 'hard reference', not URL-based, to the document's place in the cache (it's stored in raw form in the cache) -- make sure that the cache doesn't wipe out the document until the window containing it is gone by holding onto the hard reference -- grab the document source using this. (This is also known as the 'cache pinning' or 'cache keys' solution.) This required an improvement in the cache to support this functionality. These changes to the cache are DONE: see comment #74. So this solution is HALF-IMPLEMENTED. So now all that's needed is for the front end to actually use this. Any page in an open window needs a 'cache key' a.k.a. 'hard reference' which it holds on to. 'View Source' and friends need to use this key to retrieve the source. This has been waiting for nine months, and I can't figure out why. 1) Because nobody is willing to put the interface code in to hold a hard reference for each document being displayed in a window? 
2) Because nobody is willing to put the code in for the individual commands (View Source, etc.) to retrieve their information using this hard reference? 3) Because people are too confused about the source of the bug to realize that this is what needs to be done? If number 3 is the problem, I hope I've solved it. Number 2 is probably blocked by number 1. (Also number 2 deserves a separate bug for each command.) What's blocking number 1, which sounds like less than an hour's work for someone who knows the code ?!?!
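To make the 'hard reference' idea in solution 2 above concrete, here is a minimal Python sketch of a size-limited cache whose entries cannot be evicted while a window holds a pin on them. It is not the Mozilla cache service; `PinnableCache` and all of its methods are hypothetical names invented for this illustration, and the integer keys stand in for the non-URL-based cache keys described above.

```python
from collections import OrderedDict
from typing import Dict, Optional


class PinnableCache:
    """Oldest-first cache; pinned entries survive eviction even over the limit."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries = OrderedDict()      # key -> raw bytes, oldest first
        self._pins: Dict[int, int] = {}    # key -> number of holders
        self._next_key = 0

    def store(self, data: bytes, pin: bool = False) -> int:
        """Store data and return an opaque key that is not derived from a URL."""
        key = self._next_key
        self._next_key += 1
        self._entries[key] = data
        if pin:
            self._pins[key] = 1            # pin before evicting, so the new entry is safe
        self._evict()
        return key

    def unpin(self, key: int) -> None:
        """Called when the window closes or navigates away."""
        self._pins[key] -= 1
        if self._pins[key] == 0:
            del self._pins[key]
        self._evict()

    def get(self, key: int) -> Optional[bytes]:
        return self._entries.get(key)

    def _evict(self) -> None:
        # Drop the oldest unpinned entries until the cache fits its limit.
        for key in list(self._entries):
            if len(self._entries) <= self.max_entries:
                break
            if key not in self._pins:
                del self._entries[key]
```

In this sketch a window would call `store(data, pin=True)` when its page loads, hand the returned key to View Source or Save As, and call `unpin` on close. Pinned entries can push the cache over its configured limit, which is the memory trade-off debated throughout this bug.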
This bug deserves priority P1, for obvious reasons.
4) Because there's still back end work required to actually *use* cache keys. I believe there is back end work being done behind the scenes. I don't remember where Boris said it, but he is waiting on someone else so he can do the front end work. (IIRC)
Nathan: I think you are referring to rpotts' comment #223 and bzbarsky's comment #277. Currently it doesn't appear there's a bug filed on this, I could only see nsIWebPageDescriptor mentioned by Boris Zbarsky in bug 99642, so it really looks like Rick Potts works on this thing behind-the-scenes now. We have to wait.
Keywords: topembed → topembed+
Target Milestone: mozilla0.9.9 → mozilla1.0
I like the speed of the rendering engine, I like the way mozilla looks.... But I'm really getting angry when I want to do a simple view-page-source and get unexpected results! I lived with the bug for almost one year, expecting it to be fixed from version to version. Hell, no - almost 2 years (by reading these posts) and it's still there! I always had to keep a copy of netscape around and I can't say that I'm happy with this. I read many of the notes here, let me phrase some of my thoughts, in no particular order. - I never understood why a view-page/print/whatever would have to go and reread it from the network. The content might change at any time no matter what the headers or url or get/post data say. As somebody already said - even if the content changes and I open the same link in another window, I expect that each window will give me at print/save/view-source the correct information, i.e. the one they built their information on. - I don't get the memory consumption argument in this thread... I want a program to behave correctly and not to eat too many resources. In that order! So, I would have expected to be given correct information when doing a view-source. Only if that worked would I mind about the memory consumption. If it doesn't work... I consider it a bug and the whole program is already compromised, i.e. I wouldn't trust it anymore (since it's not doing things right). - I don't get the reason why not to keep all windows' _original_ html/js/images/etc bytes that are used to render a page? And I mean not _only_ the HTML files. You'll never know when somebody needs to save some javascript, some images or some flash file... Why not do it ? Memory consumption ? Wow - that's a good one. I really think that HTML files (the most frequent content type) will be about 1 order of magnitude smaller than their DOM representation. 
Not to mention compared to all the other structures needed to keep information regarding the window state, XML/HTML/DOM representation, XUL, theme and many other such things. - Somebody said that after opening more than 100 windows the system crashed while the browser's cache of 50Mb wasn't yet filled. Wow - that reinforces my thought that the original content retrieved from the web would not be the memory hog... So, then, again - why not keep it as is and make the browser behave as it should? Of course I also can't imagine somebody with more than 20-30 windows opened :-) - I don't mind where you keep the original bytes - be it cache, separate memory space or file system. Of course I think the memory is the normal place to use. I do care though that it should be kept! I would rather accept the browser telling me that it doesn't have enough memory to open new windows than have it give unexpected content when doing a preview/save/view-source. Of course, this would probably happen when reaching that >100 open windows case :-) - I wouldn't count the original bytes as belonging to the cache (even if it's stored somehow in there - pinned entries?). Some of the posters keep saying that the cache size limitations (memory cache size, file system cache size) _must_ be respected and that it can't keep all the original bytes. Ok - if you insist: what good is this, since I still can't predict or limit the amount the whole browser process uses? What good is it to know that the cache is limited to xx Mb and you have very strict rules to obey this (while not having the expected behaviour) and still - after opening 2-3 windows the process already has 30 MB in size and is increasing very much with each new opened window? - I don't want to switch to another browser, but if mozilla keeps behaving wrong.... - As a memory optimisation, you can always compare each new content to the ones that could be the same (same URL for example) and keep only one reference-counted memory block. 
Of course, the cache will also probably use the last-modified version of these URLs. And whenever a window gets closed, decrement the reference count of all objects it used. In fewer words - I care more about getting this wrong behaviour fixed than about memory consumption; otherwise we get a bad program. Sorry if this was a little bit too long - but I'm a bit angry about this behaviour.
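The reference-counting optimisation suggested at the end of the comment above could look roughly like this. It is a Python sketch under stated assumptions, not Mozilla code: `SharedSourceStore` and its methods are hypothetical names, and identical content is detected with a content hash rather than the same-URL comparison the comment mentions.

```python
import hashlib
from typing import Dict, List


class SharedSourceStore:
    """Keeps one copy of each distinct page source, shared by all windows using it."""

    def __init__(self) -> None:
        self._blocks: Dict[str, List] = {}   # digest -> [data, refcount]

    def acquire(self, data: bytes) -> str:
        """Called when a window loads a page; returns a handle to the shared block."""
        key = hashlib.sha256(data).hexdigest()
        block = self._blocks.get(key)
        if block is None:
            self._blocks[key] = [data, 1]
        else:
            block[1] += 1                    # same bytes already held: just count the new user
        return key

    def release(self, key: str) -> None:
        """Called when a window closes or navigates away."""
        block = self._blocks[key]
        block[1] -= 1
        if block[1] == 0:
            del self._blocks[key]            # last user gone: free the bytes

    def get(self, key: str) -> bytes:
        return self._blocks[key][0]

    def block_count(self) -> int:
        return len(self._blocks)
```

Two windows showing byte-identical pages then cost one copy of the source, while two windows showing different responses from the same URL each keep their own, which is the per-window correctness asked for earlier in the comment.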
Here's some initial work to add the infrastructure necessary for new windows to leverage cached content... More work is needed, but this gives an idea of the direction... -- rick
Rick, that looks pretty good at first glance. The only two issues I see on the view source front are: not handling "view frame source" (need to pass the page descriptor to openDialog() in nsContextMenu.js inside viewFrameSource()) and the code in viewsource.js that does argument handling. typeof(null) == "object", unfortunately. It should be fine to just assume the second arg to be the charset (if not null) and the third arg to be the descriptor (if not null). On a non-viewsource topic, the persistence object could use a page descriptor as well to save pages that are the result of post requests (right now it sets the post data stream, but not the cache key on the channel). It looks like you're trying to keep the descriptors as opaque as possible, so maybe it would make sense to have a "open channel using this descriptor" global helper that would open a channel, set the post data stream, cache key, whatever flags are needed to make sure the channel reads from cache, etc. This would encapsulate knowledge of what these descriptors actually are in just two places (with some bending over, one _could_ try to use this helper in docshell, but that seems uncalled-for to me). Thanks for doing this, Boris
hi boris, this patch smacks nsContextMenu.js a bit so that view-frame-source works.... i wish i could have avoided introducing the BrowserViewFrameSource() function and instead just called BrowserViewSourceOfWindow(...) passing in the 'focused' window. but i had *no* idea how to do this :-( i also added a try-block to the argument parsing in viewsource.js to deal with the situation where 'null' is passed. i believe it should just ignore the 'null' and keep looking for more args. let me know if you think this patch is the right direction... if so, i'll clean up the patch and land it... thanks, -- rick
I am adding my vote to the 174 of them that already exist, to show my annoyance of the bug, especially while troubleshooting a web script. However, from the very recent patches that I notice, it looks like this might be near fixed. I appreciate the hard work of the coders, and hope to see the results in the next version of Mozilla. Thanks, guys!
That looks reasonable... I have to admit I'm still a little confused as to why the code in viewsource.js can't just assume that the second arg (if non-null) is the charset and the third (if non-null) is the descriptor. That lets people add more args as needed and not have them accidentally used as descriptors or charsets (if a future fourth arg happens to be a string starting with "charset=") or the like...
hey boris, you're absolutely right!! i think it must have been paranoia (or lack of sleep) on my part -- it's getting so i can't tell the difference :-) i'll fix up viewsource.js to assume the following (possibly null) arguments: [0] - url [1] - charset [2] - page cookie thanks, -- rick
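The positional convention agreed on above ([0] url, [1] charset, [2] page cookie, each possibly null) can be sketched as follows. This is Python rather than the JavaScript of viewsource.js, and `ViewSourceArgs` and `parse_view_source_args` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class ViewSourceArgs(NamedTuple):
    url: Optional[str]
    charset: Optional[str]
    descriptor: Optional[object]


def parse_view_source_args(*args) -> ViewSourceArgs:
    """Interpret arguments purely by position; missing ones become None."""
    first_three = list(args[:3])
    first_three += [None] * (3 - len(first_three))
    # Anything past index 2 is ignored here, so a future fourth argument
    # can never be mistaken for a charset or a descriptor.
    return ViewSourceArgs(*first_three)
```

Because nothing is guessed from an argument's type or contents, a null in the charset slot is simply carried through as "no charset" and the descriptor is still read from the third slot.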
Comment on attachment 73999 [details] [diff] [review] initial patch allowing view-source to use cached content... Minor problem: nsIWebPageDescriptor.idl uses the wrong (old) license.
dude this bug sucks my ass. your fixing of this bug would please me greatly and I will be more than happy to send you beer if it is fixed.
Blocks: 132638
No longer blocks: 132638
Many, many kudos to Rick. (You're about to fix a bug with 188 (the third-most) votes; blocking eight bugs including three major; a correctness, data loss *and* performance issue; a Netscape 4 parity issue (despite lack of tag); a problem which prevents architecture stabilization and dates back several years! Your solution will also, it seems, fix 55583, a bug with 240 (the second-most) votes and 93 (the most) duplicates!) If you can get this fixed by Mozilla 1.0, you will be my hero. Is there anything we the observers can do to help make sure this actually gets in by 1.0? It doesn't seem to have any of the markers indicating that mozilla.org considers it a must-have-for-1.0 (although it obviously is a must-have to a lot of web developers, including me).
Attachment #73999 - Attachment is obsolete: true
Attachment #74439 - Attachment is obsolete: true
r=bzbarsky on the xpfe changes. The docshell changes look good to me too, with one possible caveat... could it become an issue that we have two distinct session history entries floating about that hold the same post data stream? Would it ever happen that we'd try to rewind the stream and then read it on two different threads or something evil like that (causing the stream to be rewound on one thread while it's being read from on another)? I don't recall ever running into that problem when I played with a similar approach that cloned the session history entry a while back, but....
- In nsDocShell::LoadPage():
+ nsCString spec, newSpec;
Use nsAutoCString, or whatever it's called nowadays.
+ newSpec.Append(NS_LITERAL_CSTRING("view-source:"));
+ newSpec.Append(spec);
do newSpec.Append(NS_LITERAL_CSTRING(...) + spec); to avoid double append and potentially double reallocs in the string code.
- In nsSHEntry::Clone():
+ rv = dest->QueryInterface(NS_GET_IID(nsISHEntry), (void**) aResult);
+ NS_RELEASE(dest);
Why not simply:
*aResult = dest;
I don't see the need for the QI call here, the compiler should do the right cast with the above and it's less code, and one fewer AddRef/Release. Other than that the changes look good to me, sr=jst
hey johnny, thanks for the comments -- i'll fix up the patch... The extra QI/Release in Clone() is purely habit :-) It came from when we used to hand-roll factory functions !! Since the [out] result was nsISupports the QI was necessary and we used to have a 'rule' about using this form :-) But you're absolutely right, when assigning a known class into a correctly typed result, the compiler will do the right thing... Hey, old habits die hard ;-) -- rick
cc'ing darin, who can comment on the postdata issue further. The session history changes look good to me. Make sure that subframe navigation continues to work fine. About cloning postdata, I don't think it will be an issue, because nsDocShell::GetCurrentDescriptor() primarily looks to send mOSHE as the page descriptor. mOSHE is either the entry for the current page, if the current page is done loading and docshell is in a stable state, or the entry for the previous page, if the user has just started loading a page and necko has not yet started the data transfer. mOSHE will be null only when session history is disabled, in which case mLSHE will also be null and GetCurrentDescriptor() will return an error. A sticky situation could be when docshell is loading a page with postdata and has just set the mOSHE in Embed() (i.e., data transfer and consumption has started) but the network is having trouble completing the postdata submission to the server (thereby possibly in the middle of reading the postdata stream) *and* the user does a view-source at this time, which will rewind the postdata stream. But I think while reading from a stream, necko maintains offset values which will enable it to read from where it left off last time. So, this shouldn't be an issue, but network experts can comment further.
What are the footprint impacts of the current patch? What if a large number of windows/tabs are open, or very large flat documents? If there are impacts, how can they be removed by people (embedders, etc) who don't need this? (No view source/etc available). This pagedescriptor - are we "pinning" the source in the cache as was mentioned? Which cache, memory or disk? What if we have no disk cache or it's turned off?
The current implementation incurs NO extra bloat... It merely allows the existing caching mechanisms to work... In the future we can deal with the cases where the cached content is not available. However, I believe that this new API is sufficient to deal with more complex 'pinning' strategies... Right now, I think it's important to leverage the current 'pinning' strategy. Especially since this should deal with 99.9% of the common cases (for view-source at least). -- rick
Comment on attachment 76086: New patch that addresses the previous comments... a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76086 - Flags: approval+
>In the future we can deal with the cases where the cached content is not >available. If my reading of comment #74 is correct, the current 'pinning' implementation in the cache ensures that the cached content is always available (provided Mozilla doesn't crash or run out of memory) for an open window or a frame in an open window. Even for pages which are 'not cached' (such pages will only be accessible through hard references, not through the regular cache mechanism). Since 'view source' as currently designed can only be applied to pages in an open window, this should deal with *100%* of the cases for 'view source'.
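The "hard reference" idea can be sketched with shared ownership: the open window keeps its own reference to the content, so the data outlives its removal from the cache's lookup table. This is a toy model only; ToyCache, ToyWindow and CachedPage are hypothetical names and say nothing about how the Mozilla cache is actually built.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct CachedPage { std::string source; };

// Toy cache: a lookup table from URL to content.
class ToyCache {
public:
    std::shared_ptr<CachedPage> Store(const std::string& url,
                                      const std::string& source) {
        auto page = std::make_shared<CachedPage>(CachedPage{source});
        table_[url] = page;
        return page;
    }
    // Eviction only drops the cache's own reference.
    void Evict(const std::string& url) { table_.erase(url); }
    std::shared_ptr<CachedPage> Lookup(const std::string& url) const {
        auto it = table_.find(url);
        if (it == table_.end()) return nullptr;
        return it->second;
    }
private:
    std::map<std::string, std::shared_ptr<CachedPage>> table_;
};

// An open window pins the page it displays by holding a reference.
struct ToyWindow {
    std::shared_ptr<CachedPage> pinned;
    std::string ViewSource() const {
        assert(pinned);
        return pinned->source;
    }
};
```

After Evict(), Lookup() no longer finds the page, yet the window's ViewSource() still returns the original bytes, which is the sense in which a 'not cached' page stays reachable only through hard references.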
Am I correct in assuming that this will mean hitting the back button will ALWAYS pull the cached copy? While pages that have "pragma: no cache" should not have a physical cache for good reasons, it still should be at least cached in the memory, so that when you hit the back button, you're not reloading the damn page again. IE has the same problem. This has major problems with CGI scripts with FORMs that disappear when you hit the back button. All of the contents disappear, and you have to start over again. Is this another bug, or directly linked to this one?
Brendan: You're probably thinking of bug 112564.
Patch checked in!!!
*sniff* rpotts, please tell me that's not an april fool's joke
It's not.
The test case now works. Congratulations on the good work. However, the problem isn't solved yet. *sigh* Go to a page which changes often, like http://gcc.gnu.org/ml/gcc-patches/2002-04/ Leave that window around for a while. Open the same page in another window. See, it's changed. Try 'view source' in the first window. You get the source for the newer version. Obviously the 'cache key' currently used is *not* exactly what we want. (Trying to figure out the maze of interfaces in Mozilla is still very difficult for me, so I thought it was.) Apparently what we want is now called a 'cache token', and is *still* not fully implemented. Please add dependency on bug 72519.
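The difference between looking a page up by key and holding on to a specific version can be sketched as follows. This is an illustration under assumptions: UrlKeyedCache and Token are made-up names, and the real 'cache key' and 'cache token' interfaces are not modelled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical "token": a handle on one specific, immutable version.
using Token = std::shared_ptr<const std::string>;

class UrlKeyedCache {
public:
    // Stores a version under its URL and returns a token for it.
    Token Put(const std::string& url, const std::string& body) {
        Token t = std::make_shared<std::string>(body);
        entries_[url] = t;   // replaces any older version for this URL
        return t;
    }
    // Key-based lookup: always the most recently stored version.
    Token Get(const std::string& url) const {
        auto it = entries_.find(url);
        if (it == entries_.end()) return nullptr;
        return it->second;
    }
private:
    std::map<std::string, Token> entries_;
};
```

If the first window remembers only the URL, a later Put() from the second window changes what Get() hands back; if it kept the Token from its own load, it still sees the version it displayed.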
Nathanael: reusing cache tokens was not part of the design of view-source, etc. for a good reason. it impairs the cache's ability to manage resources and to make room for newer content. that said, i can see the benefits of holding cache tokens for HTML documents, and since there aren't very many at one time, we probably could get away with pinning these in the cache. however, i think that's a second-level enhancement to view-source. the solution thus far ensures that we do the right thing on pages that are freshly navigated to, which IMO covers the majority of the view-source cases.
yeah, it is a second-level enhancement. I still think it's important but what's left isn't really a 'must be fixed by 1.0' kind of thing.
I'm sorry to spam a load of people with this message but I've been keeping an eye on this bug ever since I started using Mozilla as it irks me so much. What I've never fully understood through this whole period is /why/ Moz seems to have such difficulty in doing View Source when others like NS4.7x, IE6 handle it fine. Is there some fundamental flaw in the way Moz is/was written from the beginning that stops VS working like it does in the other browsers? I'd be grateful for an explanation (preferably through my EMail so this bug isn't tangled up) anyone can provide. If anyone can come up with a good reason as to why VS was never implemented and catered for from the beginning? Other than "Because" and "Most people don’t use it", I mean.
Because ;-) Marking fixed since it is.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Please open a new bug to deal with the case that's still broken, if you plan on marking this one fixed. (I'm referring to the situation where the same page is open in multiple windows with different contents; only one cache entry is preserved for the content, instead of one for each different set of content). Otherwise we still *don't* have a way to "reuse/reload current page without refetching from server" for all cases, so this bug is NOT fixed.
I would actually like to understand this bug better myself. A technical explanation of the issues would help... how does Netscape 4.x/IE 6 deal with the view-source issue? Why doesn't Mozilla use the same style? What are the advantages to Mozilla's method? And I have to agree that I don't think this is completely fixed, although it is 1000 times better than what it was! Good work!
I just checked and 4.x *does* get the remaining case right: I viewed the top story on slashdot.org and then opened the same page in a new window. I refreshed the new window until the number of posts for "-1" in the "select threshold" dropdown list went up, without touching the original window. Then I did "View source" in the original window and looked at the bit of the source that identifies the threshold dropdown. It indicated the original number, as it should have. (Oh, and I made sure I got to the story page just by clicking a link, so there was no postdata) I haven't tested, but from comments here Mozilla would have given the value from the other window. The fact that NS4.x gets this right is an indication that the tradeoffs described in this bug *aren't* necessary - although I didn't test with cache disabled or set too small to hold a slashdot page, so I can't say for sure what happens in that pathological case for 4.x. caillon, you marked this fixed, do you have an opinion on whether this bug should be kept open for the remaining cases or closed in favor of a new bug for the case where the same source is in two windows? How about a new bug for the pathological case (cache too small to hold all open windows) too?
Please don't re-use this bug for new issues, it's big enough as it is...
Thank you again for a fix to this bug. Remaining issue spun off to bug 136633
Verifying. Man, is it good to see this in!
Status: RESOLVED → VERIFIED
Blocks: majorbugs
No longer blocks: majorbugs
Why is this bug marked as VERIFIED FIXED in 2002 when Firefox 3.x still has this same issue? I still am unable to save, view source, etc. pages, images and others. It reloads them from the network every time. KenW: "I think that the data to save can just be generated from the DOM object and there is no requirement that the HTML (if the HTML format is chosen) be anything like the original source code." I disagree. There is a very strong requirement that the source saved be exactly, byte-for-byte what the server provided. These are the fundamental and essential functions of an HTTP browser: download, view, and save a copy of a web page. If you are not saving the original web page, then the browser has failed in its most basic functioning.
Bug 288462 provides the full summary of all related issues, complete with RFC violations. Should be linked to this one.
Here I was getting totally crazy why some $_POST (php) output I printed in <!-- --> didn't show up in view source... Really, if I want to get the source of a clean request to a URL, then I'll use wget! Not verified! Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.6) Gecko/2009020410 Fedora/3.0.6-1.fc10 Firefox/3.0.6