Closed Bug 472895 Opened 16 years ago Closed 6 years ago

Save page, HTML only, on document.write result document (wyciwyg document) completely fails (saves document that called write() instead)

Categories

(Firefox :: File Handling, defect)

defect
Not set
major

Tracking

()

RESOLVED WONTFIX

People

(Reporter: r_rom, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 I just paid my credit card bill and wanted to save the receipt. So, I saved a print view page in a file, but when I viewed its content, I saw unusable garbage that wasn't related to what I saw in the browser. Source view of the page I wanted to save reflected what I saw, but that couldn't be saved in a file, for some reason. This is a data lost bug. Reproducible: Always
Version: unspecified → 3.0 Branch
I meant to say "data loss" bug.
Exact steps to reproduce including a testcase is needed.
I think you'd need to have a Firefox developer with a credit card account at Chase. Until one becomes available, perhaps, somebody should address existing bugs related Firefox re-requesting data from a server upon saving pages. Maybe when they are fixed, this bug will be eliminated as well. Core functionality such as this is much more important than the "awesome bar" and other things, in my opinion. Having considered what I said above, let me know if I can provide more information.
Save as HTML or complete ? That you see the page in the browser doesn't mean that the original is available if the page should not be cached. The images for example are only in the memory as decoded image but the original file is gone, the html is in the memory as parsed html but the original html is gone.... And discussing which bugs should be fixed and which not doesn't belong into this bug and we always accept patches from you. Searching before filing your bug is a must and you should already found bug 84106, bug 288462.
Save as HTML. What needs to be saved is whatever reflects what's being viewed in the browser window. If it's generated HTML, so be it. View source shows the correct data that should be saved... at least, in my case with Chase. I will mark this bug as duplicate of bug 288462. The first one was filed before Firefox existed and is assigned to a person who is said not to be reading bug reports.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
I seriously doubt this specific bug is a duplicate of the general mess that bug 288462 is. That said, I can't make sense of what this bug is about. "print view page"? "source view"? If you can describe the exact steps you took which led to the problem, that might get us closer to actually fixing it. I don't need to be able to repeat the steps on the site, but I _do_ need to have those steps to see which codepath you were following.
Bug 288462 is meta bug for tracking purpose. Why this bug can be a DUP of meta bug? Reopening, and set 288462 in Blocks: of this bug. Please find appropriate bug to DUP if DUP, after checking dependency tree for meta Bug 288462.
Blocks: 288462
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
No longer blocks: 288462
Blocks: 288462
I will try to describe what happened with more detail. Hopefully, with this attention, the problem will get solved. I went through a bill payment process that consisted of a two or three steps (different pages), and then I got to a transaction confirmation page (the final page that displayed the receipt, so to speak). It had a button/link to print the receipt. I clicked it and got a new page that was more suitable (content-wise) for printing. This page also launched a dialog to print it. I canceled the dialog and wanted to save the page. Because I was aware of the data refetching problem in Firefox, I opened the source view window (Ctrl+u) and then switched back the web page window and saved it (Ctrl+s to save HTML only). Then I looked at the content of the resulting file and saw that wasn't nearly what I saw in the browser. Not even similar. As Boris has said in one of bug reports, useless. However, the source view window contained exactly (as far as I could tell) the code that reflected what I saw in the page window -- the receipt. What I tried to say in this bug report is that the source view window could display the proper HTML code, etc., but the browser couldn't save the same thing in the file. Apparently, it had to refetch it from the server. In 2 or 3 weeks I will be paying another bill and could watch for certain things if you tell me what to watch for. I could even let one of you do a remote admin session if this could help.
Roman, thank you very much for the clear description of the problem. Do you have any idea whether the page you're trying to save is a result of a POST request or a GET request? This could be bug 84106 if the former. If you don't know, would you be willing to get an HTTP log following the directions at http://www.mozilla.org/projects/netlib/http/http-debugging.html the next time you pay the bill? So start the browser, pay the bill, quit the browser, and then e-mail me the log? (Probably better to not attach it to the bug, since it might have some sort of personal information in it.)
Boris, if I had to guess, I'd say it was a POST request. I am certainly willing to get an HTTP log the next time I pay my bill. Will it be useful only to determine the type of request or will it provide more potentially useful data? That page seems to say that having a nightly trunk build of Mozilla is a precondition for getting a log. Is that correct or can I produce a log with a release version of FF 3.0.5 too?
Roman, the HTTP log will provide various other data too (e.g. the headers the server sends, whether we hit the cache, etc). You can produce a log with a release version. The "nightly trunk build" thing was true as of when the page was first written, right after the logging code landed. That was in 2003 or so. ;) Thanks again for helping get to the bottom of this!
Roman sent me a log. Looks like the relevant parts for purposes of this bug are: 0[932140]: POST /EPay/EpayMakePayment.aspx?AI=134902742 HTTP/1.1 ... 0[932140]: POST /EPay/epaymakepaymentverify.aspx HTTP/1.1 ... 0[932140]: POST /EPay/epayconfirmpayment.aspx HTTP/1.1 and then later on in the same log: 0[932140]: GET /EPay/epayconfirmpayment.aspx HTTP/1.1 The server response to this last GET is 660 bytes long, which is exactly the size of the incorrect data saved on disk. The server response to the POST to the same url is 19583 bytes long... So this definitely looks like bug 84106, at least assuming the log includes an attempt to save the confirmation page.
Bug 84106 is about "Not correctly retrieving post data...", but the problem here and in other reports seems to stem from the fact the Firefox contacts the server to save a file representation of which a user is looking at. It should just save that representation without contacting the server.
Yes, but the point is that it doesn't have the original text in memory necessarily, and getting it from cache correctly requires having the right cache token and postdata. In any case, I'll see what I can do about getting bug 84106 fixed, and then we should see whether this bug ends up fixed as a result.
Depends on: 84106
Please keep in mind that the page source view shows the correct code. So, one part of FF has the right info, but another doesn't? That source code is what has to be saved in my situation. But sure, fix whatever you think needs to be fixed. As long as things are improving beyond superficial things such as adding a star to bookmark pages, things are good :)
Page source reads it from cache, using exactly the same approach that I'm going to take for bug 84106. ;)
No longer depends on: 84106
Roman, if you get a chance to retest in a nightly build the next time you pay your bill, that would be much appreciated!
OK, so I should have been a little more careful in comment 13. The key issue there seems to be that there's a GET at all. Bug 84106 would have caused a re-POST of the data; instead we have no POST data at all. The latest log Roman put together for me has several (4) GETs for epayconfirmpayment.aspx after the single POST, and all are responded to with that 660 byte response as far as I can tell. I have no idea where those GETs are coming from, but I assume the final save-as is reading one of those GETs from cache for whatever reason.
Boris, do you remember this from my earlier comment: "...I got to a transaction confirmation page (the final page that displayed the receipt, so to speak). It had a button/link to print the receipt. I clicked it and got a new page that was more suitable (content-wise) for printing..."? Getting the page with printable content possibly made a GET request. I also viewed the source of the printable page, which might have made a GET request (is it possible?). And then came the Save As operation.
Oh, hmm... Indeed. The view-source shouldn't have been the thing that did that GET, since if it did it would have shown the error page. But more interestingly, the view-source you did is not showing up in this log, as far as I can tell. My best guess is that what happens is that the printable version is actually created via document.write(). That would show the written data in view-source, but trying to save would get something totally different, since save would use the URI of the page that made the write() call. This is the only scenario I can think of at this point consistent with what we're seeing. If you're interested, I could create a special build that would log all data being passed to document.write() to standard output so that one could see it when running "firefox -console". Alternately, I could create a build that would alert and bail out when trying to save a page generated via document.write. Let me know if either would be useful?
One detail I've forgotten to mention: the printable page is displayed in a new window. Boris, I will run any special build you think may help you eliminate this problem.
Roman, can you try the build at https://build.mozilla.org/tryserver-builds/2009-03-26_19:17-bzbarsky@mozilla.com-try-7f25ee8849d/ (whichever one is appropriate for your platform)?
Boris, your special build saved what I wanted. There's one strange thing that I noticed, though: the HTML document specified charset=UTF-8, but the saved file is in "UTF-16LE with Signature" (according to EmEditor 7.00.3), and Notepad displayed empty vertical rectangles throughout the file that are usually displayed when unrecognized (or undisplayable?) characters are encountered.
Yeah, that's because the document.write document is created in UTF-16 directly; that's what all JS works with. Sounds like Notepad just ignores the BOM and ends up trying to render U+0000 for the zero bytes in UTF-16. It's really not much of a text editor... ;) OK, sounds like we understand what the issue is here. Not really at all related to bug 288462. Paolo, would you be interested in picking this up? I think the necessary changes here are one of: 1a) Change nsIWebBrowserPersist to take an originating principal as an argument to saveURI and set it as the owner on the channel. or 1b) Allow wyciwyg:// channels without an owner. 2) Something like the patch above in terms of detecting wyciwyg documents and using that URI as the URI to save. We don't want to always use the docshell's URI, because for error pages that would leave to weird behavior, I think... jst, do you have a preference between (1a) and (1b) above?
No longer blocks: 288462
Severity: critical → major
Component: General → File Handling
QA Contact: general → file.handling
Summary: Page source doesn't equal what is saved in a file → Save page, HTML only, on document.write result document (wyciwyg document) completely fails
Version: 3.0 Branch → Trunk
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows Vista → All
Hardware: x86 → All
Summary: Save page, HTML only, on document.write result document (wyciwyg document) completely fails → Save page, HTML only, on document.write result document (wyciwyg document) completely fails (saves document that called write() instead)
Depends on: 486921
(In reply to comment #26) > Paolo, would you be interested in picking this up? I think the necessary > changes here are one of: > > 1a) Change nsIWebBrowserPersist to take an originating principal as an > argument > to saveURI and set it as the owner on the channel. > or > 1b) Allow wyciwyg:// channels without an owner. > > 2) Something like the patch above in terms of detecting wyciwyg documents and > using that URI as the URI to save. We don't want to always use the > docshell's URI, because for error pages that would leave to weird behavior, > I think... > > jst, do you have a preference between (1a) and (1b) above? It seems this is another case where the channel to be saved is set up incorrectly; solution (1a) is probably the most correct. I had a quick look at the docshell code, and if I understand it correctly the code sets the right owner on wyciwyg:// channels, since the URIs have the flag nsIProtocolHandler::URI_INHERITS_SECURITY_CONTEXT: http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#7586 Probably the save code should do the same thing. But I'm wondering, since we already have code that does everything that is needed for the channel setup, why don't we find a way to reuse it and simply pass the prepared channel to the saveChannel method of nsIWebBrowserPersist? I filed bug 486921 to track this possibility, which is also an interesting way to address issues like finding out the right cache key for subframes (bug 485196), which is not trivial, and so on (the complete list of dependent bugs is available in bug 486921). What do you think about this approach? To me, it seems feasible, I'd only need some help with understanding which operations performed by the docshell are only required if the page is actually loaded in the browser window, rather than saved separately (see last sentence of bug 486921 comment 0). Another aspect would be designing an interface that could be used from JavaScript.
(In reply to comment #26) > We don't want to always use the docshell's URI, because for error pages > that would leave to weird behavior, I think... This is an interesting, separate issue. Currently, when the user tries to save error pages, they have the option of saving them as "complete XHTML" or as "file only". If the page is saved as complete, then the contents of the error page itself are stored on disk; if the page is saved as file only, the browser tries to fetch the page again from the server. I don't know what happens with your patch applied, when the page is saved as file only. Maybe, the "Save As" behavior when an error page is displayed could be similar to the one triggered by a reload, with the difference that errors would be conveyed through the Download Manager. Since we don't have a "real" document at hand, we may also want to disallow saving an error page as complete. Another possibility (but I don't like it very much) is to disable the "Save As" command completely for error pages, however we should still handle the case where an error page is currently displayed in a subframe. This change would probably require a separate UI review.
> I had a quick look at the docshell code, and if I understand it correctly > the code sets the right owner on wyciwyg:// channels, since the URIs have > the flag nsIProtocolHandler::URI_INHERITS_SECURITY_CONTEXT: Yep. However, the owner really is needed only when parsing into a document, so it doesn't necessarily make sense to assert on lack of owner in cases like save as, like we do now. Reusing the docshell setup code from webbrowserpersist is not going to be very easy, nor necessarily desirable. For example, they want very different load flag setups.... > Currently, when the user tries to save error pages, they have the option of > saving them as "complete XHTML" or as "file only". Yeah. Honestly, "save" should just be disabled on error pages in the UI, I would think, as you describe in your last paragraph of comment 28. We should spin that off into a separate bug, as you say.
(In reply to comment #29) > > I had a quick look at the docshell code, and if I understand it correctly > > the code sets the right owner on wyciwyg:// channels, since the URIs have > > the flag nsIProtocolHandler::URI_INHERITS_SECURITY_CONTEXT: > > Yep. > > However, the owner really is needed only when parsing into a document, so it > doesn't necessarily make sense to assert on lack of owner in cases like save > as, like we do now. I didn't realize that. Are you suggesting to remove the owner check from nsWyciwygChannel.cpp entirely? > Reusing the docshell setup code from webbrowserpersist is not going to be very > easy, nor necessarily desirable. For example, they want very different load > flag setups.... Actually, I was thinking about replacing the saveURI call with saveChannel in JavaScript, without modifying the nsIWebBrowserPersist interface or code. In order to call saveChannel, the channel itself must be prepared in a way that is similar to what is done by the docshell. My idea of reusing the code is based on the consideration that most of the save-related bugs I saw are caused by differences in the channel setup process. I agree, however, that most of the docshell code mentioned in bug 486921 is unrelated to the actual channel setup. We may just reconstruct the logic and reimplement it in JavaScript, to prepare the channel for saveChannel. In both cases, I think a review of the docshell history loading process, noting the differences with "Save As", may be beneficial. Boris, maybe you're more indicated than me do this comparison; every time I look at that code, I notice a new line that requires me to do a three- or four-level deep MXRing in order to have a vague idea of what it does :-) For example, what about channels supporting offline application caches? "Save As" does not set them up in the same way the docshell does. > > Currently, when the user tries to save error pages, they have the option of > > saving them as "complete XHTML" or as "file only". > > Yeah. Honestly, "save" should just be disabled on error pages in the UI, I > would think, as you describe in your last paragraph of comment 28. We should > spin that off into a separate bug, as you say. Filed bug 488339, with "uiwanted" keyword; you may want to add your considerations there.
If we use nsIWebNavigation.currentURI.spec instead of document.location.href for saving the original file, we must make sure not to use it to determine the file name to use on save. Probably internalSave would need both URIs. I'd prefer to write some regression tests (bug 486342) before going on with fixing this. I'd have to write a specific test for the patch to be checked in, anyway.
> Are you suggesting to remove the owner check from nsWyciwygChannel.cpp entirely? Basically, yes. Maybe move it to somewhere in nsHTMLDocument or some such if we really care. > Actually, I was thinking about replacing the saveURI call with saveChannel > in JavaScript, without modifying the nsIWebBrowserPersist interface or code. Ah, I see. That might be interesting, yes. I have no idea what the right offline app cache behavior is here, to be honest. dcamp's the man for that. Getting some tests for this first makes a lot of sense to me, yes.
Boris, maybe you can have a look at bug 486921 attachment 372907 [details] [diff] [review]. It's an experiment, that fixes this bug and others by reusing the docshell code. I'd be interested in your opinion.
(In reply to comment #32) > Getting some tests for this first makes a lot of sense to me, yes. I started writing some basic infrastructure for "Save As" testing on bug 486342, maybe you want to have a look.

wyciwyg is being removed in bug 1489308, so maybe we just won't even attempt to support this kind of saving of a page created by document.write? I might be misunderstanding.

Depends on: 1489308

I suspect in the new world this is wontfix. Nothing will store the actual data passed to write(), the URL of the document that is being written to will be the URL of the document that called open(), so saving will just save that document.

I expect at this point this is wontfix.

Status: NEW → RESOLVED
Closed: 16 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: