Closed Bug 57724 Opened 24 years ago Closed 19 years ago

View source munging pages (does not display original page source as sent by server)

Categories

(Core :: DOM: HTML Parser, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: jruderman, Assigned: mrbkap)

References

Details

(Keywords: helpwanted, perf, testcase)

Attachments

(4 files, 3 obsolete files)

These are bugs where view-source doesn't display a page as originally sent. Assigned to me for tracking only; steal if you want to use this bug to fix things.
Adding five dependencies.
Depends on: 44186, 49030, 55583, 57717, 57722
Keywords: meta
Jesse: you also added Bug 49030. I (the reporter of 49030) just checked the page in question (http://www.handyshop.de), and it looks fine now. I think 49030 should be marked as fixed.
OS: Windows 98 → All
Hardware: PC → All
Depends on: 43267
Added related bugs involving view-source.
Depends on: 6119, 40867
This bug doesn't depend on bug 40867 directly. It depends on bug 40867 through bug 55583 (and maybe through bug 6119).
No longer depends on: 40867
It's a tracking bug, right? Isn't it easier to track the indirect dependencies if you list them directly? (Or do you tend to look at the dependency tree?)
I tend to look at the dependency tree, at least if there are more than three blockers. By just listing direct dependencies it's easier to see what the problems with view-source are.
Depends on: 63137
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
Target Milestone: mozilla0.9 → mozilla0.9.1
Adding bug 67938.
Depends on: 67938
Tracking bugs should probably belong to the QA. Giving bug to janc.
Assignee: harishd → janc
Status: ASSIGNED → NEW
updated qa contact.
QA Contact: janc → bsharma
I don't know who should get this, but I do know it's not mine... returning to Jesse
Assignee: janc → davidr8
Depends on: 70828
Resetting target milestone. Several of the bugs tracked by this bug are futured :(
Target Milestone: mozilla0.9.1 → ---
Bug 78342 should probably be added to this bug, or duped against one of the dependencies if I missed it.
Depends on: 83221
Adding bug 70828, view source makes double-quotes inside tags disappear.
Many of the bugs that this bug depends on seem to indicate that the parser cleans up the HTML. It seems that (for view source) the document should not go through the parser, or at least should go through the parser without being changed.
-> nobody (removing tracking bugs from the list of bugs I own)
Priority: P3 → --
-> nobody
Assignee: jesse → nobody
Depends on: 87726
Depends on: 91045
Some of the bugs that depend on this are already fixed in my tree. I'll fix the others too (exept bug 55583 which is not directly related to the others). This is no longer a meta bug. I'll dupe the bugs that depend on this.
Assignee: nobody → clarence
No longer depends on: 6119, 43267, 44186, 49030, 55583, 57717, 57722, 63137, 67938, 70828, 83221, 87726, 91045
Keywords: meta
Priority: -- → P3
Summary: [meta] View-source munging pages → View-source munging pages
Target Milestone: --- → mozilla0.9.4
Status: NEW → ASSIGNED
*** Bug 63137 has been marked as a duplicate of this bug. ***
*** Bug 87726 has been marked as a duplicate of this bug. ***
*** Bug 57722 has been marked as a duplicate of this bug. ***
*** Bug 89033 has been marked as a duplicate of this bug. ***
*** Bug 43267 has been marked as a duplicate of this bug. ***
*** Bug 57717 has been marked as a duplicate of this bug. ***
*** Bug 70828 has been marked as a duplicate of this bug. ***
*** Bug 83221 has been marked as a duplicate of this bug. ***
*** Bug 91045 has been marked as a duplicate of this bug. ***
*** Bug 70918 has been marked as a duplicate of this bug. ***
*** Bug 49030 has been marked as a duplicate of this bug. ***
So it's not lost in the duping, my testcase is attached to bug 43267, for a specific manifestation of this bug (extraneous quote removal).
Marking mostfreq as it has 13 dups.
Keywords: mostfreq
*** Bug 92196 has been marked as a duplicate of this bug. ***
*** Bug 91240 has been marked as a duplicate of this bug. ***
Andreas: your comments in bug 92196 mention generated content. Are you generating content with a stylesheet (using the "content" attribute)? If so, please check that it's selectable and copyable from view source. The initial reason view source was moved from XML to HTML was that Mozilla had a bug in which generated content was not selectable. See bug 12460....
Boris, no, the tokenizer will produce an additional token for view source.
Andreas: Good. :) Another question. While you're in there, can you possibly get line numbers working? The whole mLineNumber thing is badly off (for example, <table width="100%" height="100%"> will get counted as one line because the newline is included in the attribute value -- '"100%"\n'. Similar problems happen with doctype declarations and the like. CDATA sections seem to count as one line each, making this _really_ bad on pages with inline JS or CSS). I was looking at getting that working at some point, but it would involve changes to the tokenizer, for the reasons I just mentioned. Luckily, you're already changing the tokenizer.... just a thought... feel free to ignore this. :)
Boris, I've already done this for most tokens in the tokenizer, but that is not enough to make it work. Some code (probably in CNavDTD) has to read the newline count for *all* tokens. At the moment the newline count seems to be ignored for at least attribute and end tag tokens and for all code inside <noscript>.Also, it will not work for mac style line breaks (the scanner counts simply LFs and I do the same).
OK. Well, I was hoping. :)
Andreas: in the attachment you added to bug 92196 the &amp;s are incorrect since the source doesn't contain &amp;s but only &-chars. And why is mozilla trying to highlight stuff within the <xmp> tags? The only tag recogninzed after a <xmp> should be a </xmp> if I understood the HTML-2.0 definition correctly. The same goes for <listing>. I know that the page is HTML-2.0 but those pages are still out there.
carljohan, as I wrote in bug 92196, you'll be able to suppress the "amp;"s. For <xmp> and <listing> (and <plaintext>), you're mostly right. Its contents should be CDATA (but note that these elements were deprecated even in HTML 2.0). I'll investigate if I can simply fix it by setting the CDATA flag in nsElementTable.cpp. If not I'll file a separate bug on it.
Depends on: 92718
Clarence: Could you please post a patch ( even if incomplete )? I would like to take a look at it. Btw, watch out when you touch nsElementTable.cpp...it's very fragile. Note: All these problems started because of view source coloring.
Attached patch patch (work in progress) (obsolete) (deleted) — — Splinter Review
The patch is not a final version, but most of it should work. The basic idea is to create more smaller tokens instead of a single complex token for view-source, e.g. separate tokens for delimiters, attribute names and attribute values. The delimiter tokens slow down view-source notably, so I made a pref for viewing them as plain text ("view_source.styled_delimiters"). > Note: All these problems started because of view source coloring. I need it for rewriting HTML source in bug 40873 too. And maybe composer also could use it.
No longer depends on: 92718
Attached patch patch (still work in progress) (obsolete) (deleted) — — Splinter Review
The new patch does no longer freeze/crash on plain text documents and it includes a modified version of the patch attached to bug 91437.
Clarence: This patch is huge!!!! The bigger the patch the harder it is to get into 0.9.4. Btw, I haven't started reviewing your patch yet. > The delimiter tokens slow down view-source notably, so > I made a pref for viewing them as plain text > ("view_source.styled_delimiters") Who would want to view source in color if it's too slow? IMO, we need a completely different approach to solve the coloring problem ( very low priority though ).
> This patch is huge!!!! The bigger the patch the harder it is to get > into 0.9.4. Basically, it's a rewrite of most of nsHTMLTokenizer and nsHTMLTokens. Probably they were not written to handle view-source and can't do it properly without major changes. I thought about writing a own tokenizer for view-source, but I wanted to have view-source being in sync with the tokenizing process. And I found problems not directly related to view-source that I wanted to fix too (I'll summarize them later). Finally I noticed that performance in the tokenizer could be much better (I do not know how much it adds to the total page load time, but I hope that it will be notably faster). Yes, my patch has some risks and should get checked in early in a milestone cycle. I hope to finish it within a week, but even that might be too late for 0.9.4. I'd rather have a good solution for 0.9.5 than a quick hack for 0.9.4. > Who would want to view source in color if it's too slow? IMO, we need a completely > different approach to solve the coloring problem ( very low priority though ). If it shows the real source and has good syntax highlighting (reflecting what the tokenizer recognizes and what not), I'll accept a slow view-source. I see nochance to accelerate view-source besides of accelerating layout altogether. What we could do is incremental display for view-source. If we'd do that it wouldn't matter as much if it's slow. And, BTW, view-source without syntax highlighting should go through tokenizer as text/plain (if we want to use the tokenizer for text/plain at all (my last patch at least optimizes this)).
While you're recoding nsHTMLTokens.cpp, please take a look at BUG 92721.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Attached patch patch v 0.5 (obsolete) (deleted) — — Splinter Review
Currently I'm trying to resolve performance issues. My patch makes the tokenizer about 15% faster (depending on page), but most of that time is lost in the HTML content sink later. Some pages (especially Bugzilla bug lists) load even slower than before (up to 1.5%). View-source with syntax highlighting will be typically about 10-15% slower. I think I can reduce that number, but there will be a price to pay for correcting the errors. Very large documents slow down more (40% for a 1MB bug list), and if stylable delimiters are enabled (default in the patch, but should later be disabled by default) load times rise about 100-200%. Time spent in layout increases exponentially with the number of tokens in a <pre> element. Good news about performance: My patch makes tokenizer about 99.5% faster for plain text documents. This includes view-source without syntax highlighting, reducing total load time about 20-80% (depending on size of document and proportion between markup and text).
> Time spent in layout increases exponentially with the number of tokens in > a <pre> element. Um. That's very unfortunate... has anyone considered filing a bug on layout regarding that?? Also, would it be at all beneficial to create multiple small <pre> elements instead of one big honkin' one? It seems that this would alleviate the exponential growth problem a bit....
Filed bug 97229 for the frame construction time problem. Yes, I though about dividing the <pre> too, I think it's worth to implement it (see numbers in bug 97229). The reason for the slowdown in the content sink was that I've changed whitespace tokens to use nsSlidingSubstring instead of nsString. It's very slow for strings consisting of only one char, but maybe that's only on my system.
Andreas, you may want to email scc@mozilla.org and ask him how to efficiently use the string classes to do what you're trying to do with nsSlidingSubstring.... Chances are, he can suggest a more effective approach. Things like + mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue)); can almost certainly be done more efficiently (without creating and initializing an nsAutoString, for example).
Keywords: mostfreq
> + mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue)); I have undone this already. It was just a hack to use nsSlidingSubstring without changing LookupTag(), but it turned out that it had no benefit over the current approach without changing more of the parser code. Where I can't see an improvement I leave the code as it is now.
Will this fix bug 98149?
*** Bug 98149 has been marked as a duplicate of this bug. ***
Adding "patch". Is this considered blocked by bug 97229?
Keywords: patch
QA Contact: bsharma → moied
Attached file testcase - quotes being removed (deleted) —
Attached file testcase - lines being combined (deleted) —
Attaching my old testcase from a dupe, and a new instance I came across today here so they can be verified against when this patch is ready.
Keywords: testcase
This isn't going to make 0.9.5, obviously, so targeting for 0.9.6. Andreas: perhaps you could freshen the patch and/or list what the remaining issues if any are before we try to get reviews? Adding perf since the htmlparser rewrite of this has performance impacts (some positive).
Keywords: perf
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Is this rewrite absolutely necessary ( though I appreciate the effort )? It looks scary. Remember the existing code has evolved for quite some time and I'm not sure how well the patch above could be tested for regressions. Is this is the only solution to fix view-source problems? I would prefer a simpler solution focusing only on the problem ( view-source ). I'm very nervous about the patch ( though I haven't looked into it throughly ). Please don't make things complicated than what it's now.
Hmm. Perhaps the best thing to do here would be to arrange for a branch. After seeing the havoc linktoolbar wreaked, this seems way too scary to just check into the tree, regardless of review. OTOH, the view source implementation does have sort of an air of being tacked on, and it would be nice to see it improved. A branch would let us spin through a whole bunch of regression tests and so forth without disturbing the stability of the main tree-I think it's something we should seriously consider.
Whether we use this patch or not, we need view-source to show an accurate representation of the source, otherwise it is useless! I know this patch is intended to fix a lot of view-source bugs. Do we have a list of those bugs in one place?
I do like the idea of fixing viewsource problems but not at the cost of jeopardizing layout. If the intention is to fix viewsource problems then we need to a solution to target just that problem.
Sorry for not working on this for a long time. But I have a new patch now. Please note my detailed explanations at http://c07.de/mozilla/bug57724/changes.html (will attach this here too). Binaries for Linux are available at http://c07.de/mozilla/bug57724/ .
Attached patch patch, v0.9 (deleted) — — Splinter Review
Attached file explanations to the patch (deleted) —
Attachment #44427 - Attachment is obsolete: true
Attachment #44660 - Attachment is obsolete: true
Attachment #46746 - Attachment is obsolete: true
> No newline tokens for plain text documents (performance reasons). Line breaks > are contained in text tokens instead. How big a performance win is this? The reason I ask is the if you have actually implemented all this line-counting correctly we should be able to do things like insert named anchors in the source view at the beginnings of lines. This would allow scrolling to those anchors and the like. The other concern I have is that this patch will break line wrapping in view source as far as I can tell. Perhaps we can put the id="viewsource" on the <body> and change viewsource.js appropriately?
Forgot to say something about performance: - Normal performance (not view-source) is only slightly better, about -0.5% (page loading time without images etc.). - Performance for view source depends largely on the size of the page. Same performance for typical pages, up to 10% slower for small and/or simple pages (e.g. +7% for the Mozilla homepage), much faster for large and/or complex pages (e.g. -90% for an 1MB bug list). - Much faster for view-source without syntax highlighting and other text/plain documents (typical -50%). - Poorer performance if you enable the new pref for styled delimiters, except for very large documents. Can be more than 100% slower for small documents with many comments like <!-------------------------------------->.
>> No newline tokens for plain text documents (performance reasons). Line breaks >> are contained in text tokens instead. >How big a performance win is this? The reason I ask is the if you have actually >implemented all this line-counting correctly we should be able to do things like >insert named anchors in the source view at the beginnings of lines. This would >allow scrolling to those anchors and the like. Newline tokens are created only after markup (>), not for line breaks in text. There is no change in current behavior other than ignoring markup in plain text. Creating newline tokens for all line breaks would probably cost some performance. >The other concern I have is that this patch will break line wrapping in view >source as far as I can tell. Perhaps we can put the id="viewsource" on the ><body> and change viewsource.js appropriately? Will test this.
My patch doesn't break line wrapping.
Andreas, you tested this on a document big enough to trigger the splitting into 512-token blocks? And looked at the _end_ of the file? Basically, it looks like line wrapping will only work for the first <pre> with your changes, since that's the only one that has the #viewsource id attached to it.
Boris, you're right. Need to change that.
Some comments: The fact that the XML flag isn't set for XHTML as text/html is a bug, see bug 107904. We shouldn't support the weirdnesses of <plaintext>, <xmp>, and <listing>, all of which are obsolete-in fact, there's already a bug to remove <plaintext> support open, bug 88987. I will create testcases to check out some fun SGML stuff (particularly attribute handling) soon.
I've been using this for the past several days and haven't noticed any weirdnesses yet. Loading of view source seems slow on some pages, but not to a greater extent than I experience with normal builds. Pushing off to 0.9.7 while I'm at it, as this clearly won't be in for 0.9.6...
Target Milestone: mozilla0.9.6 → mozilla0.9.7
*** Bug 107611 has been marked as a duplicate of this bug. ***
Blocks: 105937
Blocks: 49030
Blocks: 70828
Blocks: 63137
Blocks: 87726
Blocks: 57722
Blocks: 89033
Blocks: 43267
Blocks: 57717
Blocks: 83221
Blocks: 91045
Blocks: 70918
Blocks: 92196
Blocks: 91240
Blocks: 98149
Blocks: 107611
Blocks: 91046
0.9.7 is out and the bug is still there... so I'd suggest to adjust the "target milestone" setting!
->1.01, I doubt this patch can be adequately tested before 1.0. :-(
Target Milestone: mozilla0.9.7 → mozilla1.0.1
What does 1.0.1 mean?? You want to release the "rockstable bugfree" 1.0 version including this bug??
The patch in this bug involves a destabilizing major tokenizer rewrite with unknown consequences for lots of parsing issues. So yes, unless this lands very soon (0.9.8 timeframe) this should not land before 1.0.
one of the purposes of the massive bug reopening/dependency creation was that the more serious of these bugs could be solved individually, without emplacing the entire patch here. (as far as that goes, if we're going to do a major rewrite, I think we should go whole-hog and do a separate view-source tokenizer).
I like Christopher's proposal for a separate viewsource tokenizer. This IMO would reduce maintenance cost and would limit regressions to viewsource only.
Agreed. A separate view-source tokenizer would be the way to solve the problem without abandoning syntax highlighting. In a separate view-source tokenizer, *only* the parsing needed for syntax highlighting would be done, and it could be audited to make sure it lost no information, even for totally invalid pages. It could probably be significantly simpler than the regular tokenizer! Meanwhile, the regular tokenizer could continue to do whatever 'corrections' and optimizations it wanted. However, that might take a while to implement. In the interests of getting something done about this in the *near* future, how about a way to turn off syntax highlighting, and have 'view source' *just* show the raw document? This could be a hidden pref. However, I'm pretty sure there's enough demand for 'view source' to show each page *exactly* as it came from the server, that that should be the *default*. Better to have a defaulted-to-off preference "Turn on syntax coloring in view source (and potentially lose information)".
I have to cast my vote in favour of a "raw", non-tokenised mode for view-source. That would fix most of these bugs for the web developers, and the majority of normal users won't care. Would this be hard to implement? (I suspect easier than a seperate view-source tokeniser)
It would be pretty trivial to do that, in fact (just have nsViewSourceChannel reset the type to text/plain). The losses are: 1) lack of syntax highlighting (a _major_ reason people prefer Mozilla's/Netscape's "view source" to IE's). 2) Some intl pages which set a charset via a META tag may no longer be view-sourceable in the right charset. We _do_ set the default charset on the webshell in the view source window under some conditions, but to be truthful I'm not sure what those are. In any case, that would only apply to "view source" from the menu not to "view-source:" urls or to view source launched from the JS console (!). In my opinion, item #2 is a _major_ drawback to the proposal. Item #1 is avoidable if we use the existing "view source highlighting" pref to control this behavior. Item #2 needs careful thought and input from intl folks who are more familiar with the issues.
The following observation might be related. If you copy something from view source and paste it to another application, additional empty lines are added at some places. pi
*** Bug 142044 has been marked as a duplicate of this bug. ***
What's the status on this? We have a patch for a long time now, but no activity. Also, maybe the blocking-list could be duped to this bug, looks like they all are special cases of this bug. pi
> What's the status on this? We have a patch for a long time now, but no activity. See comment 61 and comments thereafter by both Harish and Andreas. The blocking bugs _were_ dupped. They were _undupped_ because many of them can be fixed independently without a complete parser rewrite (which is what this patch is).
I guess the target milestone should be moved then. pi
Keywords: helpwanted
Target Milestone: mozilla1.0.1 → Future
Depends on: 149867
example: I'm writing on a php-project. In that project I've a telephone list. It will work with the following arguments (only the important will be mentioned) SM = sub menu (number) to show V = variable that decides if the form to fill in or the results will be shown (0 = form, 1 = execute sql and show results - without the form) Suchwort = words I'll look up in my database (will be divided thru explode in different words by blank). It's a field in my form. mypage.php?SM=6 ==> shows me the page with the form to fill in, which will be send like that: <form action="mypage.php?SM=6&V=1" method ="post"> mypage.php?SM=6&V=1 ==> examins different variables, if they are set. If they are set, they control the order, count of records per page,.. One check I'll do is: if (empty($Suchwort)) { $V=0; } That means: if someone comes over history/bookmark/.. to the page, it's not interesting, if V=1 or V=0. When I get back results back for my query and look up the sourcecode (Strg-U) of my page I'll get back the form. ==> I get a new request for mypage.php?SM=6&V=1 . And there's the problem: I don't send $Suchwort via adress, I send it through a form. mozilla doesn't remember that and finds an empty $Suchwort and resets V to the value zero. If testing Strg-U on a page called with mypage.php?SM=6&V=1&Suchwort=123 it will work without any problem. I think it would be better to analyze the page in the cache than sending a new request. If using dynamic pages you're getting problems when searching bugs in your source-code or when examining pages in www. I'm using: en-US, rv:1.2a, Gecko/20020907 MultiZilla/v1.1.22
Frank: wrong bug. this bug is for view source displaying the right page but with the source slightly modified.
quotation from first entry: These are bugs where view-source doesn't display a page as originally sent. ====== I thought, it'll fit, because "doesn't display a page as originally sent" is very general and opens different ways to interpret ;-) I found different bugs for "view-source" but no bug would fit. SDo I thought, I add the problem here (general).
Frank, this is why one reads all the comments on a bug before commenting.
Blocks: 154120
Depends on: 188609
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and <http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss bugs are of critical or possibly higher severity. Only changing open bugs to minimize unnecessary spam. Keywords to trigger this would be crash, topcrash, topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Severity: critical → minor
Keywords: dataloss
This is not minor. It's normal. Please fix that.
Severity: minor → normal
As a note, I'm considering removing the whole view-source feature entirely and just using an editor to view the source. The amount of time being spent on view-source is utterly disproportional to its usefulness (or rather the usefulness it offers over a text editor).
Take a look at bug 8589, which would do that elegantly. I tried to take a stab at it, but I couldn't work out the undocumented internals of Mozilla well enough to do it. If you have the knowledge of Mozilla internals to do it, pleeeeeeease do and we will love you forever!
Well, since the view-source channel now returns the application/x-view-source MIME type all the time, all that would be required would be to not register an internal handler for that type (see nsContentDLF.cpp). Then when you try to view source you would get the normal helper app dialog, select an application to view with, make sure "ask me every time" is unchecked, and be done with it. This is essentially a one-line change (not including the code that could be removed as a result, of course).
If internal view-source is removed, some of the code should probably be kept and moved to Composer for things like bug 58730. (In which case the source is munged anyway to achieve more compliant HTML).
bz: Anyway, how should removing view-source (which I really like because of syntax highlighting and such stuff) help to resolve this bug? I thought this bug is because the Cache doesn't give us the right source for the pages in the given cases - if that's true, the text editor would happen to get that wrong source in those cases, which would mean that bug wouldn't be solved.
KaiRo: No, it's the parser who munges the pages, not cache. aiui, if you disable syntax highlighting, it might work...
KaiRo, you're _way_ in the wrong bug, like Biesi said. Aaron, if the composer people want the code they can get it out of CVS (Attic, to be exact). Again, this is presuming we decide to remove the internal view-source. Given all the problems it's causing, I will likely do it by summer unless someone fixes the major issues with it (poor tokenizer, poor performance, etc).
bz, biesi: Oops, sorry, it seems I'm still a bit confused by the word "munge" ;-) Anyway, thanks for claifying I was thinking/writing of the wrong bug here... I still don't like view-source going away (we're getting nearer and nearer to do everything the same as IE, it seems) - but I have no right to complain as long as I'm not able to code anything which could make our situation better here :(
Viewsource should map to the source document 1:1. This is not completely true in mozilla because the viewsource content is processed by the same tokenizer that is tuned to handle quriky content. IMO, the viewsource content should be handled by a) a new tokenizer or b) avoid tokenizer completely ( may be by using a serializer instead ). Syntax highlighting might be an issue but Im sure we can come up with something :-)
In "view source", I see a normal space where there is &nbsp; in the original document. Is there a bug for that or should I file a new one? BTW, sorry if I get it wrong, but aren't the dependencies reversed? I.e. if this is a tracking bug, then this bug should _depend on_ fixing all the concrete view-source bugs, not vice versa...
> Is there a bug for that or should I file a new one? Is that regular view source, or view selection source? The dependencies are correct. This is not a tracking bug, this is a bug on the root cause of all those other bugs -- the fact that we use the normal HTML tokenizer for view source.
> > In "view source", I see a normal space where there is &nbsp; in the > > original document. Is there a bug for that or should I file a new one? > Is that regular view source, or view selection source? Ah, sorry, I didn't realize there was a difference. It only does this in view selection source. Is that a different bug than this? Also, sorry about my misunderstanding about the dependencies.
> Is that a different bug than this? Yes, bug 155635. And it's fixed in current builds.
No longer blocks: 49030
Can someone change the summary of this bug to use real words (i.e. not "munging") so that people have a better chance of finding this bug when they search Bugzilla? How about "View Source does not display original page source as sent by server."
Summary: View-source munging pages → View source munging pages (does not display original page source as sent by server)
Blocks: majorbugs
Ah, maybe this is the bug report I am looking for. I've just been commented at bug 55583 but that may deal with a different matter although the Summary describes my problem very good. I am a web developer and thus I need to see the html code that a server sends to the client. But if I use View Page Source on a dynamically created page resulting from a POST request, Mozilla asks me to send the POST request again. This means: Mozilla does not show the original html source code which it should get from the cache or somewhere. The problem is that the resending the POST request can result in a completely different page (e.g. if a transaction has finished after the first POST request). Want to try yourself? Go to http://democam.mobotixserver.de/cgi-bin/store2rom press the button, wait for the page to return and then do right-click -> View Page Source. Now a popup window will appear telling you that the "page you are trying to view contains POSTDATA that has expired from cache".
Nope, wrong bug. Notice this bug has to do with PARSER.
Daniel: nope, that would be a different bug - this one is about view-source showing not exactly what the server sent, such as not displaying redundant double-quotes in HTML tags etc. Anyway, your testcase works perfectly for me - perhaps you're using an old version? Or perhaps you didn't wait until the page finished loading before trying to view the source (I believe the problem you describe happens then, though I'm not sure)?
Mh, if it is really about "what the server sent" then check out this page which embedds a timestamp on every page sent out (only precise to the minute): http://www.theregister.co.uk - Open the URL using Mozilla on a *Windows* system. (Linux works ok) - Enter something in the text box (upper left corner) and press Go!. - See the time in the head bar on the page. - Wait one minute or brew a tea. - Now "View Page Source", confirm the pop up window. - When the source is displayed find "Updated:" You will see that the time has changed compared to the one shown on the rendered page. I think, this fits the summary "does not display original page source as sent by server". If this is the wrong bug could you please point me to a more appropriate one? Cheers Daniel Germany
Daniel, please read the whole bug, not just the summary (and in the case of a tracking bug its dependencies) before commenting. This bug covers the fact that what view-source shows is a tokenized version of the source that comes from the server; the tokenizer sometimes "fixes" the source or loses data... Your problem is totally unrelated to this bug; see bug 166786.
Boris, thanks for pointing me into the right direction. Then I'm off to bug 166786. :-) Goodbye.
Boris, you mean that mozilla gets a page and tokenizes it without keeping the original source before doing anything else. The contents of bug 188609 does not satisfy me. What can I do to have the real page source ? I haven't contributed to mozilla yet outside of submitting, commenting or voting for bugs. What can I begin with ?
That's correct. When doing view-source of a page, we start by getting the source from cache, tokenize it, then generate a document that has highlighted source. The tokenizing is only needed for syntax highlighting; if highlighting is off the tokenizing could very well be skipped. I'm not sure what you can begin with, because I'm not sure what you want to do; feel free to email me to discuss (the discussion does not seem directly relevant to this bug).
Many text editors manage to syntax-highlight code without screwing it up in any way. So can we.
Actually, I've managed to write malformed code that's made every single text editor I tried either give up on highlighting it or actually get so confused that it changed the data in the file (eg by not loading all of it). But yes, we can do better. All we need is to write a separate tokenizer for view-source, as discussed, instead of using the same one as pageload. Wanna do it? Or are you only willing to make comments that don't really take this bug anywhere?
I'd be quite happy to write a view-source tokeniser algorithm. But my attempts at finding where it would fit in the program code are driving me mad - I guess I need more than just the XUL and JS sources...?
Yes. The tokenizer would be instantiated from nsParser.cpp (right now that instantiates an nsHTMLTokenizer).
Remember while writing it that it has to handle malformed html, or stuff that isn't HTML at all; the output must be character-for-character the same as the imput, except possibly for the colors. (This shouldn't be too hard, but I want to make the requirements of the problem clear.) Probably each token should hold the exact original character string it's associated with, and there should be at least one token type for "I don't know what this is".
Is a tokenizer really required for view source? How important really is (poorly) attempting to color-code HTML in the browser? (in the editor I can understand). And please, no "I'll stop using Mozilla and shoot myself if you remove colored view source" comments from the non-developers. If MNG and other useful features are being removed to slim down Moz/Firebird, why not ditch this? It's buggy, a time sink for someone to redo it, and excess code (which leads to excess bugs). IE somehow manages to get by without a page source tokenizer. And it's not something I hear the masses clamoring for. (The real masses, not this bugs CC list) Unfortunately I suspect this may lead to "View Selection Source" going away too, which is (arguably) semi-useful to a small percentage of users. But that tends not to work well half the time anyway. Perhaps it would be a candidate for an XPI extension.
Do you ever use "view source"? I can tell you for a fact that the highlighting makes it a lot more readable... As a result, view-source is the #1 tool for me when I'm debugging web page issues. Please don't drag MNG in here -- if the imagelib owner wants to cut stuff, that's his decision, but he has no control over the parser and content code. Yes, view-source could be made into an extension, with a lot of work (a bunch of extensions, one per platform, since it uses C++, actually).
One last thing. Size-wise, view-source code is about 40KB, if anyone cares.
Here's a thought for implementing. Instead of rewriting the tokenizer, add fields to each token for "source-start" and "source-end", byte offsets into the raw source. Then to display color-coded source, we just display the raw source and walk the parse tree to decide how to color it.
I (used to) use view source routinely to debug pages which were not displaying properly. The Mozilla coloring tokenization process meant that view source *did not display* the source of the page. The current coloring system is *worse than nothing* for debugging web pages. I've ended up using "Save Page..." and opening the saved file in a text editor every time.
Blocks: 220386
Blocks: 172947
View source is downcasing tags in XHTML 1.0 strict. I wish to see the source as it is sent from the server even if it incorrect.
That is indeed the whole point of this bug.
view source does not show the original source again:( After a post, it shows the page before post. (However as I saw, I can reload the source :), I just recognize it now, that allows again) Cache is enabled and has 50 Mb. Sorry if this announce is not in the best place here, but I could not find the original bug. Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20031030
*** Bug 225209 has been marked as a duplicate of this bug. ***
Blocks: 240636
*** Bug 251528 has been marked as a duplicate of this bug. ***
Blocks: 262562
(In reply to comment #127) > Here's a thought for implementing. Instead of rewriting the tokenizer, add > fields to each token for "source-start" and "source-end", byte offsets into the > raw source. Then to display color-coded source, we just display the raw source > and walk the parse tree to decide how to color it. Sorry I've been quiet on this bug for a while. I think this would be a good idea. Not only would it solve the problem once and for all, it would save having to maintain two separate tokeniser algs in parallel, and consequently any chance of the two disagreeing over whether something is HTML or XHTML.
That assumes we have the raw source as some sort of contiguous thing in memory all at once. That almost never happens; the source is parsed incrementally as it comes in from the networking library.
Oh. I would've thought it saved it verbatim to the cache. If it doesn't, it should.
1) Not everything is saved in the cache. 2) You're suggesting we get the source twice (once to construct the parse tree, and once to do this walk thing).
(In reply to comment #137) > 1) Not everything is saved in the cache. Why not? Even if there's some dreaded 'expire' or 'no cache' directive on the page, surely it should be there temporarily? > 2) You're suggesting we get the source twice (once to construct the parse tree, > and once to do this walk thing). What are you on about? I'm suggesting that we get the source once, save it to the cache and tokenise/parse it.
When reading data from the cache, we can't get it all at once either -- it comes in chunks.
We don't have to wait until the whole page is retrieved and then parse it out of the cache. As each chunk comes in, send it both to the cache and to the tokeniser.
That's what we do right now for normal HTML loads. But when you want to do view source, we'd have to read the data from cache, generate a parse tree, then walk all the data _again_ to display it, per your suggestion. Assuming I understand your suggestion correctly.
Why not use the parse tree we already keep in memory, as character offsets into the cached source?
We don't have a parse tree in memory (at any point in time, actually). We have a DOM. Bloating all DOMs significantly for the benefit of view-source is not an option. Further, the DOM need not correspond to the original source in any way (see JS manipulation of the DOM). Finally, the default tokenizer/parser just drops a lot of stuff on the floor altogether; stuff that needs to show up in view-source.
(In reply to comment #143) > We don't have a parse tree in memory (at any point in time, > actually). What _do_ we use to view source at the moment then? Is there any reason it can't still be used to view source under the modifications proposed from comment 127 onwards? > Finally, the default tokenizer/parser just drops a lot of stuff on > the floor altogether; stuff that needs to show up in view-source. The view-source code would display the cached source as is, using the tokenise tree/parse tree/whatever merely to syntax-highlight.
> What _do_ we use to view source at the moment then? We read it from cache and retokenize/reparse it (with slightly different rules so we don't lose content). The content sink builds a different content model from the normal HTML one from the parse tree. Please see nsViewSourceHTML.cpp. > The view-source code would display the cached source as is, using the tokenise > tree/parse tree/whatever merely to syntax-highlight. That really can't work if the tree doesn't match the source, is the point.
What possible causes are there of the tree not matching the source?
I thought I covered that in comment 143? In any case, with Blake Kaplan's recent tokenizer work we're using the same tokenizer in both cases and just not dropping things in view-source. You'll note that most of the bugs blocked by this bug are in fact fixed; of the remaining ones, most are not actually bugs but rfes. So there is no need to do a wholesale rewrite of anything, is the point.
(In reply to comment #147) > I thought I covered that in comment 143? OK, so it wouldn't make sense to use the DOM as the whatever. > In any case, with Blake Kaplan's recent tokenizer work we're using the same > tokenizer in both cases and just not dropping things in view-source. You'll > note that most of the bugs blocked by this bug are in fact fixed; of the > remaining ones, most are not actually bugs but rfes. OK. I guess we can wait and see what Blake comes up with.
Blocks: XiHashed
Taking, as I ended up fixing the rest of these bugs.
Assignee: c → mrbkap
Status: ASSIGNED → NEW
...and marking this bug as FIXED! The rest of the bugs depending on this one are either not bugs in view source (such as bug 105937) or simply RFEs (in other words, we're now showing (hopefully!) the exact source code of the page, with some slight coloring tweaks remaining to be more expressive; but this bug as reported is no longer a problem).
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Congratulations, and thank you!
No longer blocks: majorbugs
I'm seeing this problem again, in FireFox build 20050823 with bfcache enabled. I can reliably reproduce it like this: 1. Visit http://www.flinthomes.net/ 2. Near the lower-left of the page, under "MLS QuickSearch", hit "Find" (you can leave the textbox empty) 3. On the resulting page, view the source. It's not the same source that was rendered (it's as if the page is re-requested without the appropriate POST variables). I couldn't find a more specific or recent bug dealing with this, so re-opening this one.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please file new bugs on new problems. If the problem only happens with bfcache enabled, please make sure that your bug blocks bug 274784 and cc me. As for the rest, this bug was about the right source being dealt with but not being _shown_ correctly due to parsing issues. So it sounds like a totally separate issue.
Status: REOPENED → RESOLVED
Closed: 20 years ago19 years ago
Resolution: --- → FIXED
Blocks: 320585
I have reported the same bug for a view years ago at http://www.mobildiscounter.de
same bug here at http://www.autoradio-test.org some month ago
(In reply to Jens from comment #154) > I have reported the same bug for a view years ago at > http://www.mobildiscounter.de (In reply to Carry from comment #155) > same bug here at http://www.autoradio-test.org some month ago View Source on these pages works for me. If you still see the problem in a fresh build, please file a new bug with detailed steps to reproduce.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: