Closed
Bug 57724
Opened 24 years ago
Closed 19 years ago
View source munging pages (does not display original page source as sent by server)
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
Future
People
(Reporter: jruderman, Assigned: mrbkap)
References
Details
(Keywords: helpwanted, perf, testcase)
Attachments
(4 files, 3 obsolete files)
These are bugs where view-source doesn't display a page as originally sent.
Assigned to me for tracking only; steal if you want to use this bug to fix
things.
Reporter | ||
Comment 1•24 years ago
|
||
Adding five dependencies.
Jesse:
you also added Bug 49030. I (the reporter of 49030) just checked the page in
question (http://www.handyshop.de), and it looks fine now. I think 49030 should
be marked as fixed.
Comment 3•24 years ago
|
||
Added related bugs involving view-source.
Reporter | ||
Comment 4•24 years ago
|
||
This bug doesn't depend on bug 40867 directly. It depends on bug 40867 through
bug 55583 (and maybe through bug 6119).
No longer depends on: 40867
Comment 5•24 years ago
|
||
It's a tracking bug, right? Isn't it easier to track the indirect dependencies
if you list them directly? (Or do you tend to look at the dependency tree?)
Reporter | ||
Comment 6•24 years ago
|
||
I tend to look at the dependency tree, at least if there are more than three
blockers. By just listing direct dependencies it's easier to see what the
problems with view-source are.
Tracking bugs should probably belong to the QA. Giving bug to janc.
Assignee: harishd → janc
Status: ASSIGNED → NEW
Comment 10•24 years ago
|
||
I don't know who should get this, but I do know it's not mine...
returning to Jesse
Assignee: janc → davidr8
Reporter | ||
Comment 11•24 years ago
|
||
Resetting target milestone. Several of the bugs tracked by this bug are
futured :(
Target Milestone: mozilla0.9.1 → ---
Comment 12•24 years ago
|
||
Bug 78342 should probably be added to this bug, or duped against one of the
dependencies if I missed it.
Reporter | ||
Comment 13•23 years ago
|
||
Adding bug 70828, view source makes double-quotes inside tags disappear.
Comment 14•23 years ago
|
||
Many of the bugs that this bug depends on seem to indicate that the parser
cleans up the HTML. It seems that (for view source) the document should not go
through the parser, or at least should go through the parser without being changed.
Reporter | ||
Comment 15•23 years ago
|
||
-> nobody (removing tracking bugs from the list of bugs I own)
Priority: P3 → --
Comment 17•23 years ago
|
||
Some of the bugs that depend on this are already fixed in my tree. I'll fix the
others too (exept bug 55583 which is not directly related to the others).
This is no longer a meta bug. I'll dupe the bugs that depend on this.
Updated•23 years ago
|
Status: NEW → ASSIGNED
Comment 18•23 years ago
|
||
*** Bug 63137 has been marked as a duplicate of this bug. ***
Comment 19•23 years ago
|
||
*** Bug 87726 has been marked as a duplicate of this bug. ***
Comment 20•23 years ago
|
||
*** Bug 57722 has been marked as a duplicate of this bug. ***
Comment 21•23 years ago
|
||
*** Bug 89033 has been marked as a duplicate of this bug. ***
Comment 22•23 years ago
|
||
*** Bug 43267 has been marked as a duplicate of this bug. ***
Comment 23•23 years ago
|
||
*** Bug 57717 has been marked as a duplicate of this bug. ***
Comment 24•23 years ago
|
||
*** Bug 70828 has been marked as a duplicate of this bug. ***
Comment 25•23 years ago
|
||
*** Bug 83221 has been marked as a duplicate of this bug. ***
Comment 26•23 years ago
|
||
*** Bug 91045 has been marked as a duplicate of this bug. ***
Comment 27•23 years ago
|
||
*** Bug 70918 has been marked as a duplicate of this bug. ***
Comment 28•23 years ago
|
||
*** Bug 49030 has been marked as a duplicate of this bug. ***
Comment 29•23 years ago
|
||
So it's not lost in the duping, my testcase is attached to bug 43267, for a
specific manifestation of this bug (extraneous quote removal).
Comment 31•23 years ago
|
||
*** Bug 92196 has been marked as a duplicate of this bug. ***
Comment 32•23 years ago
|
||
*** Bug 91240 has been marked as a duplicate of this bug. ***
Comment 33•23 years ago
|
||
Andreas: your comments in bug 92196 mention generated content. Are you
generating content with a stylesheet (using the "content" attribute)? If so,
please check that it's selectable and copyable from view source. The initial
reason view source was moved from XML to HTML was that Mozilla had a bug in
which generated content was not selectable. See bug 12460....
Comment 34•23 years ago
|
||
Boris, no, the tokenizer will produce an additional token for view source.
Comment 35•23 years ago
|
||
Andreas: Good. :) Another question. While you're in there, can you possibly
get line numbers working? The whole mLineNumber thing is badly off (for example,
<table width="100%"
height="100%">
will get counted as one line because the newline is included in the attribute
value -- '"100%"\n'. Similar problems happen with doctype declarations and the
like. CDATA sections seem to count as one line each, making this _really_ bad
on pages with inline JS or CSS).
I was looking at getting that working at some point, but it would involve
changes to the tokenizer, for the reasons I just mentioned. Luckily, you're
already changing the tokenizer....
just a thought... feel free to ignore this. :)
Comment 36•23 years ago
|
||
Boris, I've already done this for most tokens in the tokenizer, but that is not
enough to make it work. Some code (probably in CNavDTD) has to read the newline
count for *all* tokens. At the moment the newline count seems to be ignored for
at least attribute and end tag tokens and for all code inside <noscript>.Also,
it will not work for mac style line breaks (the scanner counts simply
LFs and I do the same).
Comment 37•23 years ago
|
||
OK. Well, I was hoping. :)
Comment 38•23 years ago
|
||
Andreas: in the attachment you added to bug 92196 the &s are
incorrect since the source doesn't contain &s but only &-chars.
And why is mozilla trying to highlight stuff within the <xmp> tags?
The only tag recogninzed after a <xmp> should be a </xmp> if I understood
the HTML-2.0 definition correctly. The same goes for <listing>.
I know that the page is HTML-2.0 but those pages are still out there.
Comment 39•23 years ago
|
||
carljohan, as I wrote in bug 92196, you'll be able to suppress the "amp;"s.
For <xmp> and <listing> (and <plaintext>), you're mostly right. Its contents
should be CDATA (but note that these elements were deprecated even in HTML 2.0).
I'll investigate if I can simply fix it by setting the CDATA flag in
nsElementTable.cpp. If not I'll file a separate bug on it.
Comment 40•23 years ago
|
||
Clarence: Could you please post a patch ( even if incomplete )? I would like to
take a look at it. Btw, watch out when you touch nsElementTable.cpp...it's very
fragile.
Note: All these problems started because of view source coloring.
Comment 41•23 years ago
|
||
Comment 42•23 years ago
|
||
The patch is not a final version, but most of it should work.
The basic idea is to create more smaller tokens instead of a single complex
token for view-source, e.g. separate tokens for delimiters, attribute names
and attribute values. The delimiter tokens slow down view-source notably, so
I made a pref for viewing them as plain text ("view_source.styled_delimiters").
> Note: All these problems started because of view source coloring.
I need it for rewriting HTML source in bug 40873 too. And maybe composer also
could use it.
Comment 43•23 years ago
|
||
Comment 44•23 years ago
|
||
The new patch does no longer freeze/crash on plain text documents and it
includes a modified version of the patch attached to bug 91437.
Comment 45•23 years ago
|
||
Clarence: This patch is huge!!!! The bigger the patch the harder it is to get
into 0.9.4. Btw, I haven't started reviewing your patch yet.
> The delimiter tokens slow down view-source notably, so
> I made a pref for viewing them as plain text
> ("view_source.styled_delimiters")
Who would want to view source in color if it's too slow? IMO, we need a completely
different approach to solve the coloring problem ( very low priority though ).
Comment 46•23 years ago
|
||
> This patch is huge!!!! The bigger the patch the harder it is to get
> into 0.9.4.
Basically, it's a rewrite of most of nsHTMLTokenizer and nsHTMLTokens. Probably
they were not written to handle view-source and can't do it properly without
major changes. I thought about writing a own tokenizer for view-source, but I
wanted to have view-source being in sync with the tokenizing process. And I
found problems not directly related to view-source that I wanted to fix too
(I'll summarize them later). Finally I noticed that performance in the tokenizer
could be much better (I do not know how much it adds to the total page load
time, but I hope that it will be notably faster).
Yes, my patch has some risks and should get checked in early in a milestone
cycle. I hope to finish it within a week, but even that might be too late for
0.9.4. I'd rather have a good solution for 0.9.5 than a quick hack for 0.9.4.
> Who would want to view source in color if it's too slow? IMO, we need a
completely
> different approach to solve the coloring problem ( very low priority though ).
If it shows the real source and has good syntax highlighting (reflecting what
the tokenizer recognizes and what not), I'll accept a slow view-source. I see
nochance to accelerate view-source besides of accelerating layout altogether.
What we could do is incremental display for view-source. If we'd do that it
wouldn't matter as much if it's slow. And, BTW, view-source without syntax
highlighting should go through tokenizer as text/plain (if we want to use the
tokenizer for text/plain at all (my last patch at least optimizes this)).
Comment 47•23 years ago
|
||
While you're recoding nsHTMLTokens.cpp, please take a look at BUG 92721.
Updated•23 years ago
|
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Comment 48•23 years ago
|
||
Comment 49•23 years ago
|
||
Currently I'm trying to resolve performance issues. My patch makes the
tokenizer about 15% faster (depending on page), but most of that time is
lost in the HTML content sink later. Some pages (especially Bugzilla bug
lists) load even slower than before (up to 1.5%).
View-source with syntax highlighting will be typically about 10-15% slower.
I think I can reduce that number, but there will be a price to pay for
correcting the errors. Very large documents slow down more (40% for a 1MB
bug list), and if stylable delimiters are enabled (default in the patch,
but should later be disabled by default) load times rise about 100-200%.
Time spent in layout increases exponentially with the number of tokens in
a <pre> element.
Good news about performance: My patch makes tokenizer about 99.5% faster
for plain text documents. This includes view-source without syntax
highlighting, reducing total load time about 20-80% (depending on size of
document and proportion between markup and text).
Comment 50•23 years ago
|
||
> Time spent in layout increases exponentially with the number of tokens in
> a <pre> element.
Um. That's very unfortunate... has anyone considered filing a bug on layout
regarding that??
Also, would it be at all beneficial to create multiple small <pre> elements
instead of one big honkin' one? It seems that this would alleviate the
exponential growth problem a bit....
Comment 51•23 years ago
|
||
Filed bug 97229 for the frame construction time problem. Yes, I though about
dividing the <pre> too, I think it's worth to implement it (see numbers in
bug 97229).
The reason for the slowdown in the content sink was that I've changed whitespace
tokens to use nsSlidingSubstring instead of nsString. It's very slow for strings
consisting of only one char, but maybe that's only on my system.
Comment 52•23 years ago
|
||
Andreas, you may want to email scc@mozilla.org and ask him how to efficiently
use the string classes to do what you're trying to do with
nsSlidingSubstring.... Chances are, he can suggest a more effective approach.
Things like
+ mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue));
can almost certainly be done more efficiently (without creating and initializing
an nsAutoString, for example).
Keywords: mostfreq
Comment 53•23 years ago
|
||
> + mTypeID = nsHTMLTags::LookupTag(nsAutoString(mTextValue));
I have undone this already. It was just a hack to use nsSlidingSubstring
without changing LookupTag(), but it turned out that it had no benefit over
the current approach without changing more of the parser code. Where I can't
see an improvement I leave the code as it is now.
Comment 54•23 years ago
|
||
Will this fix bug 98149?
Comment 55•23 years ago
|
||
*** Bug 98149 has been marked as a duplicate of this bug. ***
Comment 57•23 years ago
|
||
Comment 58•23 years ago
|
||
Comment 59•23 years ago
|
||
Attaching my old testcase from a dupe, and a new instance I came across today
here so they can be verified against when this patch is ready.
Keywords: testcase
Comment 60•23 years ago
|
||
This isn't going to make 0.9.5, obviously, so targeting for 0.9.6. Andreas:
perhaps you could freshen the patch and/or list what the remaining issues if any
are before we try to get reviews?
Adding perf since the htmlparser rewrite of this has performance impacts (some
positive).
Keywords: perf
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Comment 61•23 years ago
|
||
Is this rewrite absolutely necessary ( though I appreciate the effort )? It
looks scary. Remember the existing code has evolved for quite some time and I'm
not sure how well the patch above could be tested for regressions. Is this is
the only solution to fix view-source problems? I would prefer a simpler solution
focusing only on the problem ( view-source ). I'm very nervous about the patch (
though I haven't looked into it throughly ). Please don't make things
complicated than what it's now.
Comment 62•23 years ago
|
||
Hmm. Perhaps the best thing to do here would be to arrange for a branch. After
seeing the havoc linktoolbar wreaked, this seems way too scary to just check
into the tree, regardless of review. OTOH, the view source implementation does
have sort of an air of being tacked on, and it would be nice to see it improved.
A branch would let us spin through a whole bunch of regression tests and so
forth without disturbing the stability of the main tree-I think it's something
we should seriously consider.
Comment 63•23 years ago
|
||
Whether we use this patch or not, we need view-source to show an accurate
representation of the source, otherwise it is useless! I know this patch is
intended to fix a lot of view-source bugs. Do we have a list of those bugs in
one place?
Comment 64•23 years ago
|
||
I do like the idea of fixing viewsource problems but not at the cost of
jeopardizing layout. If the intention is to fix viewsource problems then we need
to a solution to target just that problem.
Comment 65•23 years ago
|
||
Sorry for not working on this for a long time.
But I have a new patch now. Please note my detailed explanations at
http://c07.de/mozilla/bug57724/changes.html (will attach this here too).
Binaries for Linux are available at http://c07.de/mozilla/bug57724/ .
Comment 66•23 years ago
|
||
Comment 67•23 years ago
|
||
Updated•23 years ago
|
Attachment #44427 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #44660 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #46746 -
Attachment is obsolete: true
Comment 68•23 years ago
|
||
> No newline tokens for plain text documents (performance reasons). Line breaks
> are contained in text tokens instead.
How big a performance win is this? The reason I ask is the if you have actually
implemented all this line-counting correctly we should be able to do things like
insert named anchors in the source view at the beginnings of lines. This would
allow scrolling to those anchors and the like.
The other concern I have is that this patch will break line wrapping in view
source as far as I can tell. Perhaps we can put the id="viewsource" on the
<body> and change viewsource.js appropriately?
Comment 69•23 years ago
|
||
Forgot to say something about performance:
- Normal performance (not view-source) is only slightly better, about -0.5%
(page loading time without images etc.).
- Performance for view source depends largely on the size of the page. Same
performance for typical pages, up to 10% slower for small and/or simple pages
(e.g. +7% for the Mozilla homepage), much faster for large and/or complex
pages (e.g. -90% for an 1MB bug list).
- Much faster for view-source without syntax highlighting and other text/plain
documents (typical -50%).
- Poorer performance if you enable the new pref for styled delimiters, except
for very large documents. Can be more than 100% slower for small
documents with many comments like <!-------------------------------------->.
Comment 70•23 years ago
|
||
>> No newline tokens for plain text documents (performance reasons). Line breaks
>> are contained in text tokens instead.
>How big a performance win is this? The reason I ask is the if you have actually
>implemented all this line-counting correctly we should be able to do things like
>insert named anchors in the source view at the beginnings of lines. This would
>allow scrolling to those anchors and the like.
Newline tokens are created only after markup (>), not for line breaks in text.
There is no change in current behavior other than ignoring markup in plain
text. Creating newline tokens for all line breaks would probably cost some
performance.
>The other concern I have is that this patch will break line wrapping in view
>source as far as I can tell. Perhaps we can put the id="viewsource" on the
><body> and change viewsource.js appropriately?
Will test this.
Comment 71•23 years ago
|
||
My patch doesn't break line wrapping.
Comment 72•23 years ago
|
||
Andreas, you tested this on a document big enough to trigger the splitting into
512-token blocks? And looked at the _end_ of the file?
Basically, it looks like line wrapping will only work for the first <pre> with
your changes, since that's the only one that has the #viewsource id attached to it.
Comment 73•23 years ago
|
||
Boris, you're right. Need to change that.
Comment 74•23 years ago
|
||
Some comments:
The fact that the XML flag isn't set for XHTML as text/html is a bug, see bug
107904. We shouldn't support the weirdnesses of <plaintext>, <xmp>, and
<listing>, all of which are obsolete-in fact, there's already a bug to remove
<plaintext> support open, bug 88987. I will create testcases to check out some
fun SGML stuff (particularly attribute handling) soon.
Comment 75•23 years ago
|
||
I've been using this for the past several days and haven't noticed any
weirdnesses yet. Loading of view source seems slow on some pages, but not to a
greater extent than I experience with normal builds.
Pushing off to 0.9.7 while I'm at it, as this clearly won't be in for 0.9.6...
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Comment 76•23 years ago
|
||
*** Bug 107611 has been marked as a duplicate of this bug. ***
Comment 77•23 years ago
|
||
0.9.7 is out and the bug is still there... so I'd suggest to adjust the "target
milestone" setting!
Comment 78•23 years ago
|
||
->1.01, I doubt this patch can be adequately tested before 1.0. :-(
Target Milestone: mozilla0.9.7 → mozilla1.0.1
Comment 79•23 years ago
|
||
What does 1.0.1 mean?? You want to release the "rockstable bugfree" 1.0 version
including this bug??
Comment 80•23 years ago
|
||
The patch in this bug involves a destabilizing major tokenizer rewrite with
unknown consequences for lots of parsing issues. So yes, unless this lands very
soon (0.9.8 timeframe) this should not land before 1.0.
Comment 81•23 years ago
|
||
one of the purposes of the massive bug reopening/dependency creation was that
the more serious of these bugs could be solved individually, without emplacing
the entire patch here. (as far as that goes, if we're going to do a major
rewrite, I think we should go whole-hog and do a separate view-source tokenizer).
Comment 82•23 years ago
|
||
I like Christopher's proposal for a separate viewsource tokenizer. This IMO
would reduce maintenance cost and would limit regressions to viewsource only.
Comment 83•23 years ago
|
||
Agreed. A separate view-source tokenizer would be the way to solve the problem
without abandoning syntax highlighting. In a separate view-source tokenizer,
*only* the parsing needed for syntax highlighting would be done, and it could be
audited to make sure it lost no information, even for totally invalid pages. It
could probably be significantly simpler than the regular tokenizer! Meanwhile,
the regular tokenizer could continue to do whatever 'corrections' and
optimizations it wanted.
However, that might take a while to implement. In the interests of getting
something done about this in the *near* future, how about a way to turn off
syntax highlighting, and have 'view source' *just* show the raw document?
This could be a hidden pref. However, I'm pretty sure there's enough demand for
'view source' to show each page *exactly* as it came from the server, that that
should be the *default*. Better to have a defaulted-to-off preference "Turn on
syntax coloring in view source (and potentially lose information)".
Comment 84•23 years ago
|
||
I have to cast my vote in favour of a "raw", non-tokenised mode for view-source.
That would fix most of these bugs for the web developers, and the majority of
normal users won't care. Would this be hard to implement? (I suspect easier
than a seperate view-source tokeniser)
Comment 85•23 years ago
|
||
It would be pretty trivial to do that, in fact (just have nsViewSourceChannel
reset the type to text/plain). The losses are:
1) lack of syntax highlighting (a _major_ reason people prefer
Mozilla's/Netscape's "view source" to IE's).
2) Some intl pages which set a charset via a META tag may no longer be
view-sourceable in the right charset. We _do_ set the default charset on
the webshell in the view source window under some conditions, but to be
truthful I'm not sure what those are. In any case, that would only apply
to "view source" from the menu not to "view-source:" urls or to view source
launched from the JS console (!).
In my opinion, item #2 is a _major_ drawback to the proposal.
Item #1 is avoidable if we use the existing "view source highlighting" pref to
control this behavior. Item #2 needs careful thought and input from intl folks
who are more familiar with the issues.
Comment 86•23 years ago
|
||
The following observation might be related. If you copy something from view
source and paste it to another application, additional empty lines are added at
some places.
pi
Comment 87•23 years ago
|
||
*** Bug 142044 has been marked as a duplicate of this bug. ***
Comment 88•22 years ago
|
||
What's the status on this? We have a patch for a long time now, but no activity.
Also, maybe the blocking-list could be duped to this bug, looks like they all
are special cases of this bug.
pi
Comment 89•22 years ago
|
||
> What's the status on this? We have a patch for a long time now, but no activity.
See comment 61 and comments thereafter by both Harish and Andreas.
The blocking bugs _were_ dupped. They were _undupped_ because many of them can
be fixed independently without a complete parser rewrite (which is what this
patch is).
Comment 90•22 years ago
|
||
I guess the target milestone should be moved then.
pi
Updated•22 years ago
|
Keywords: helpwanted
Target Milestone: mozilla1.0.1 → Future
Comment 91•22 years ago
|
||
example:
I'm writing on a php-project. In that project I've a telephone list. It will
work with the following arguments (only the important will be mentioned)
SM = sub menu (number) to show
V = variable that decides if the form to fill in or the results will be shown
(0 = form, 1 = execute sql and show results - without the form)
Suchwort = words I'll look up in my database (will be divided thru explode in
different words by blank). It's a field in my form.
mypage.php?SM=6
==> shows me the page with the form to fill in, which will be send like that:
<form action="mypage.php?SM=6&V=1" method ="post">
mypage.php?SM=6&V=1
==> examins different variables, if they are set. If they are set, they control
the order, count of records per page,..
One check I'll do is: if (empty($Suchwort)) { $V=0; }
That means: if someone comes over history/bookmark/.. to the page, it's not
interesting, if V=1 or V=0.
When I get back results back for my query and look up the sourcecode (Strg-U) of
my page I'll get back the form.
==> I get a new request for mypage.php?SM=6&V=1 . And there's the problem: I
don't send $Suchwort via adress, I send it through a form. mozilla doesn't
remember that and finds an empty $Suchwort and resets V to the value zero.
If testing Strg-U on a page called with mypage.php?SM=6&V=1&Suchwort=123 it will
work without any problem.
I think it would be better to analyze the page in the cache than sending a new
request. If using dynamic pages you're getting problems when searching bugs in
your source-code or when examining pages in www.
I'm using: en-US, rv:1.2a, Gecko/20020907 MultiZilla/v1.1.22
Comment 92•22 years ago
|
||
Frank: wrong bug. this bug is for view source displaying the right page but with
the source slightly modified.
Comment 93•22 years ago
|
||
quotation from first entry:
These are bugs where view-source doesn't display a page as originally sent.
======
I thought, it'll fit, because "doesn't display a page as originally sent" is
very general and opens different ways to interpret ;-)
I found different bugs for "view-source" but no bug would fit. SDo I thought, I
add the problem here (general).
Comment 94•22 years ago
|
||
Frank, this is why one reads all the comments on a bug before commenting.
Comment 95•22 years ago
|
||
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity. Only changing open bugs to
minimize unnecessary spam. Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Comment 96•22 years ago
|
||
This is not minor. It's normal. Please fix that.
Updated•22 years ago
|
Severity: minor → normal
Comment 97•22 years ago
|
||
As a note, I'm considering removing the whole view-source feature entirely and
just using an editor to view the source. The amount of time being spent on
view-source is utterly disproportional to its usefulness (or rather the
usefulness it offers over a text editor).
Comment 98•22 years ago
|
||
Take a look at bug 8589, which would do that elegantly. I tried to take a stab
at it, but I couldn't work out the undocumented internals of Mozilla well enough
to do it. If you have the knowledge of Mozilla internals to do it, pleeeeeeease
do and we will love you forever!
Comment 99•22 years ago
|
||
Well, since the view-source channel now returns the application/x-view-source
MIME type all the time, all that would be required would be to not register an
internal handler for that type (see nsContentDLF.cpp). Then when you try to
view source you would get the normal helper app dialog, select an application to
view with, make sure "ask me every time" is unchecked, and be done with it.
This is essentially a one-line change (not including the code that could be
removed as a result, of course).
Comment 100•22 years ago
|
||
If internal view-source is removed, some of the code should probably be kept and
moved to Composer for things like bug 58730. (In which case the source is
munged anyway to achieve more compliant HTML).
Comment 101•22 years ago
|
||
bz:
Anyway, how should removing view-source (which I really like because of syntax
highlighting and such stuff) help to resolve this bug?
I thought this bug is because the Cache doesn't give us the right source for the
pages in the given cases - if that's true, the text editor would happen to get
that wrong source in those cases, which would mean that bug wouldn't be solved.
Comment 102•22 years ago
|
||
KaiRo: No, it's the parser who munges the pages, not cache. aiui, if you disable
syntax highlighting, it might work...
Comment 103•22 years ago
|
||
KaiRo, you're _way_ in the wrong bug, like Biesi said.
Aaron, if the composer people want the code they can get it out of CVS (Attic,
to be exact). Again, this is presuming we decide to remove the internal
view-source. Given all the problems it's causing, I will likely do it by summer
unless someone fixes the major issues with it (poor tokenizer, poor performance,
etc).
Comment 104•22 years ago
|
||
bz, biesi:
Oops, sorry, it seems I'm still a bit confused by the word "munge" ;-)
Anyway, thanks for claifying I was thinking/writing of the wrong bug here...
I still don't like view-source going away (we're getting nearer and nearer to do
everything the same as IE, it seems) - but I have no right to complain as long
as I'm not able to code anything which could make our situation better here :(
Comment 105•22 years ago
|
||
Viewsource should map to the source document 1:1. This is not completely true in
mozilla because the viewsource content is processed by the same tokenizer that
is tuned to handle quriky content. IMO, the viewsource content should be handled
by a)
a new tokenizer or b) avoid tokenizer completely ( may be by using a serializer
instead ). Syntax highlighting might be an issue but Im sure we can come up with
something :-)
Comment 106•22 years ago
|
||
In "view source", I see a normal space where there is in the original
document. Is there a bug for that or should I file a new one?
BTW, sorry if I get it wrong, but aren't the dependencies reversed? I.e. if this
is a tracking bug, then this bug should _depend on_ fixing all the concrete
view-source bugs, not vice versa...
Comment 107•22 years ago
|
||
> Is there a bug for that or should I file a new one?
Is that regular view source, or view selection source?
The dependencies are correct. This is not a tracking bug, this is a bug on the
root cause of all those other bugs -- the fact that we use the normal HTML
tokenizer for view source.
Comment 108•22 years ago
|
||
> > In "view source", I see a normal space where there is in the
> > original document. Is there a bug for that or should I file a new one?
> Is that regular view source, or view selection source?
Ah, sorry, I didn't realize there was a difference. It only does this in view
selection source. Is that a different bug than this?
Also, sorry about my misunderstanding about the dependencies.
Comment 109•22 years ago
|
||
> Is that a different bug than this?
Yes, bug 155635. And it's fixed in current builds.
Comment 110•22 years ago
|
||
Can someone change the summary of this bug to use real words (i.e. not
"munging") so that people have a better chance of finding this bug when they
search Bugzilla?
How about "View Source does not display original page source as sent by server."
Updated•22 years ago
|
Summary: View-source munging pages → View source munging pages (does not display original page source as sent by server)
Comment 111•21 years ago
|
||
Ah, maybe this is the bug report I am looking for. I've just been commented at
bug 55583 but that may deal with a different matter although the Summary
describes my problem very good.
I am a web developer and thus I need to see the html code that a server
sends to the client. But if I use View Page Source on a dynamically created page
resulting from a POST request, Mozilla asks me to send the POST request again.
This means: Mozilla does not show the original html source code which it should
get from the cache or somewhere. The problem is that the resending the POST
request can result in a completely different page (e.g. if a transaction has
finished after the first POST request).
Want to try yourself?
Go to
http://democam.mobotixserver.de/cgi-bin/store2rom
press the button, wait for the page to return and then do
right-click -> View Page Source.
Now a popup window will appear telling you that the "page you are trying
to view contains POSTDATA that has expired from cache".
Comment 112•21 years ago
|
||
Nope, wrong bug. Notice this bug has to do with PARSER.
Comment 113•21 years ago
|
||
Daniel: nope, that would be a different bug - this one is about view-source
showing not exactly what the server sent, such as not displaying redundant
double-quotes in HTML tags etc.
Anyway, your testcase works perfectly for me - perhaps you're using an old
version? Or perhaps you didn't wait until the page finished loading before
trying to view the source (I believe the problem you describe happens then,
though I'm not sure)?
Comment 114•21 years ago
|
||
Mh, if it is really about "what the server sent" then check out this
page which embedds a timestamp on every page sent out (only precise
to the minute):
http://www.theregister.co.uk
- Open the URL using Mozilla on a *Windows* system. (Linux works ok)
- Enter something in the text box (upper left corner) and press Go!.
- See the time in the head bar on the page.
- Wait one minute or brew a tea.
- Now "View Page Source", confirm the pop up window.
- When the source is displayed find "Updated:" You will see that the time
has changed compared to the one shown on the rendered page.
I think, this fits the summary "does not display original page source as sent by
server". If this is the wrong bug could you please point me to a more
appropriate one?
Cheers
Daniel
Germany
Comment 115•21 years ago
|
||
Daniel, please read the whole bug, not just the summary (and in the case of a
tracking bug its dependencies) before commenting.
This bug covers the fact that what view-source shows is a tokenized version of
the source that comes from the server; the tokenizer sometimes "fixes" the
source or loses data...
Your problem is totally unrelated to this bug; see bug 166786.
Comment 116•21 years ago
|
||
Boris,
thanks for pointing me into the right direction. Then I'm off to bug 166786. :-)
Goodbye.
Comment 117•21 years ago
|
||
Boris, you mean that mozilla gets a page and tokenizes it without keeping the
original source before doing anything else.
The contents of bug 188609 does not satisfy me.
What can I do to have the real page source ?
I haven't contributed to mozilla yet outside of submitting, commenting
or voting for bugs. What can I begin with ?
Comment 118•21 years ago
|
||
That's correct. When doing view-source of a page, we start by getting the
source from cache, tokenize it, then generate a document that has highlighted
source.
The tokenizing is only needed for syntax highlighting; if highlighting is off
the tokenizing could very well be skipped.
I'm not sure what you can begin with, because I'm not sure what you want to do;
feel free to email me to discuss (the discussion does not seem directly relevant
to this bug).
Comment 119•21 years ago
|
||
Many text editors manage to syntax-highlight code without screwing it up in any
way. So can we.
Comment 120•21 years ago
|
||
Actually, I've managed to write malformed code that's made every single text
editor I tried either give up on highlighting it or actually get so confused
that it changed the data in the file (eg by not loading all of it).
But yes, we can do better. All we need is to write a separate tokenizer for
view-source, as discussed, instead of using the same one as pageload. Wanna do
it? Or are you only willing to make comments that don't really take this bug
anywhere?
Comment 121•21 years ago
|
||
I'd be quite happy to write a view-source tokeniser algorithm.
But my attempts at finding where it would fit in the program code are driving me
mad - I guess I need more than just the XUL and JS sources...?
Comment 122•21 years ago
|
||
Yes. The tokenizer would be instantiated from nsParser.cpp (right now that
instantiates an nsHTMLTokenizer).
Comment 123•21 years ago
|
||
Remember while writing it that it has to handle malformed html, or stuff that
isn't HTML at all; the output must be character-for-character the same as the
imput, except possibly for the colors. (This shouldn't be too hard, but I want
to make the requirements of the problem clear.) Probably each token should hold the
exact original character string it's associated with, and there should be at
least one token type for "I don't know what this is".
Comment 124•21 years ago
|
||
Is a tokenizer really required for view source? How important really is (poorly)
attempting to color-code HTML in the browser? (in the editor I can understand).
And please, no "I'll stop using Mozilla and shoot myself if you remove colored
view source" comments from the non-developers.
If MNG and other useful features are being removed to slim down Moz/Firebird,
why not ditch this? It's buggy, a time sink for someone to redo it, and excess
code (which leads to excess bugs).
IE somehow manages to get by without a page source tokenizer. And it's not
something I hear the masses clamoring for. (The real masses, not this bugs CC list)
Unfortunately I suspect this may lead to "View Selection Source" going away too,
which is (arguably) semi-useful to a small percentage of users. But that tends
not to work well half the time anyway. Perhaps it would be a candidate for an
XPI extension.
Comment 125•21 years ago
|
||
Do you ever use "view source"? I can tell you for a fact that the highlighting
makes it a lot more readable... As a result, view-source is the #1 tool for me
when I'm debugging web page issues.
Please don't drag MNG in here -- if the imagelib owner wants to cut stuff,
that's his decision, but he has no control over the parser and content code.
Yes, view-source could be made into an extension, with a lot of work (a bunch of
extensions, one per platform, since it uses C++, actually).
Comment 126•21 years ago
|
||
One last thing. Size-wise, view-source code is about 40KB, if anyone cares.
Comment 127•21 years ago
|
||
Here's a thought for implementing. Instead of rewriting the tokenizer, add
fields to each token for "source-start" and "source-end", byte offsets into the
raw source. Then to display color-coded source, we just display the raw source
and walk the parse tree to decide how to color it.
Comment 128•21 years ago
|
||
I (used to) use view source routinely to debug pages which were not displaying
properly. The Mozilla coloring tokenization process meant that view source *did
not display* the source of the page. The current coloring system is *worse than
nothing* for debugging web pages. I've ended up using "Save Page..." and
opening the saved file in a text editor every time.
Comment 129•21 years ago
|
||
View source is downcasing tags in XHTML 1.0 strict. I wish to see the source as
it is sent from the server even if it incorrect.
Comment 130•21 years ago
|
||
That is indeed the whole point of this bug.
Comment 131•21 years ago
|
||
view source does not show the original source again:( After a post, it shows the
page before post. (However as I saw, I can reload the source :), I just
recognize it now, that allows again) Cache is enabled and has 50 Mb.
Sorry if this announce is not in the best place here, but I could not find the
original bug.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20031030
Comment 132•21 years ago
|
||
*** Bug 225209 has been marked as a duplicate of this bug. ***
Comment 133•20 years ago
|
||
*** Bug 251528 has been marked as a duplicate of this bug. ***
Comment 134•20 years ago
|
||
(In reply to comment #127)
> Here's a thought for implementing. Instead of rewriting the tokenizer, add
> fields to each token for "source-start" and "source-end", byte offsets into the
> raw source. Then to display color-coded source, we just display the raw source
> and walk the parse tree to decide how to color it.
Sorry I've been quiet on this bug for a while. I think this would be a good
idea. Not only would it solve the problem once and for all, it would save
having to maintain two separate tokeniser algs in parallel, and consequently any
chance of the two disagreeing over whether something is HTML or XHTML.
Comment 135•20 years ago
|
||
That assumes we have the raw source as some sort of contiguous thing in memory
all at once. That almost never happens; the source is parsed incrementally as
it comes in from the networking library.
Comment 136•20 years ago
|
||
Oh. I would've thought it saved it verbatim to the cache. If it doesn't, it
should.
Comment 137•20 years ago
|
||
1) Not everything is saved in the cache.
2) You're suggesting we get the source twice (once to construct the parse tree,
and once to do this walk thing).
Comment 138•20 years ago
|
||
(In reply to comment #137)
> 1) Not everything is saved in the cache.
Why not? Even if there's some dreaded 'expire' or 'no cache' directive on the
page, surely it should be there temporarily?
> 2) You're suggesting we get the source twice (once to construct the parse tree,
> and once to do this walk thing).
What are you on about? I'm suggesting that we get the source once, save it to
the cache and tokenise/parse it.
Comment 139•20 years ago
|
||
When reading data from the cache, we can't get it all at once either -- it comes
in chunks.
Comment 140•20 years ago
|
||
We don't have to wait until the whole page is retrieved and then parse it out of
the cache. As each chunk comes in, send it both to the cache and to the tokeniser.
Comment 141•20 years ago
|
||
That's what we do right now for normal HTML loads.
But when you want to do view source, we'd have to read the data from cache,
generate a parse tree, then walk all the data _again_ to display it, per your
suggestion. Assuming I understand your suggestion correctly.
Comment 142•20 years ago
|
||
Why not use the parse tree we already keep in memory, as character offsets into
the cached source?
Comment 143•20 years ago
|
||
We don't have a parse tree in memory (at any point in time, actually). We have
a DOM. Bloating all DOMs significantly for the benefit of view-source is not an
option. Further, the DOM need not correspond to the original source in any way
(see JS manipulation of the DOM). Finally, the default tokenizer/parser just
drops a lot of stuff on the floor altogether; stuff that needs to show up in
view-source.
Comment 144•20 years ago
|
||
(In reply to comment #143)
> We don't have a parse tree in memory (at any point in time,
> actually).
What _do_ we use to view source at the moment then? Is there any reason it
can't still be used to view source under the modifications proposed from comment
127 onwards?
> Finally, the default tokenizer/parser just drops a lot of stuff on
> the floor altogether; stuff that needs to show up in view-source.
The view-source code would display the cached source as is, using the tokenise
tree/parse tree/whatever merely to syntax-highlight.
Comment 145•20 years ago
|
||
> What _do_ we use to view source at the moment then?
We read it from cache and retokenize/reparse it (with slightly different rules
so we don't lose content). The content sink builds a different content model
from the normal HTML one from the parse tree. Please see nsViewSourceHTML.cpp.
> The view-source code would display the cached source as is, using the tokenise
> tree/parse tree/whatever merely to syntax-highlight.
That really can't work if the tree doesn't match the source, is the point.
Comment 146•20 years ago
|
||
What possible causes are there of the tree not matching the source?
Comment 147•20 years ago
|
||
I thought I covered that in comment 143?
In any case, with Blake Kaplan's recent tokenizer work we're using the same
tokenizer in both cases and just not dropping things in view-source. You'll
note that most of the bugs blocked by this bug are in fact fixed; of the
remaining ones, most are not actually bugs but rfes.
So there is no need to do a wholesale rewrite of anything, is the point.
Comment 148•20 years ago
|
||
(In reply to comment #147)
> I thought I covered that in comment 143?
OK, so it wouldn't make sense to use the DOM as the whatever.
> In any case, with Blake Kaplan's recent tokenizer work we're using the same
> tokenizer in both cases and just not dropping things in view-source. You'll
> note that most of the bugs blocked by this bug are in fact fixed; of the
> remaining ones, most are not actually bugs but rfes.
OK. I guess we can wait and see what Blake comes up with.
Assignee | ||
Comment 149•20 years ago
|
||
Taking, as I ended up fixing the rest of these bugs.
Assignee: c → mrbkap
Status: ASSIGNED → NEW
Assignee | ||
Comment 150•20 years ago
|
||
...and marking this bug as FIXED! The rest of the bugs depending on this one are
either not bugs in view source (such as bug 105937) or simply RFEs (in other
words, we're now showing (hopefully!) the exact source code of the page, with
some slight coloring tweaks remaining to be more expressive; but this bug as
reported is no longer a problem).
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 151•20 years ago
|
||
Congratulations, and thank you!
Comment 152•19 years ago
|
||
I'm seeing this problem again, in FireFox build 20050823 with bfcache enabled.
I can reliably reproduce it like this:
1. Visit http://www.flinthomes.net/
2. Near the lower-left of the page, under "MLS QuickSearch", hit "Find" (you
can leave the textbox empty)
3. On the resulting page, view the source. It's not the same source that was
rendered (it's as if the page is re-requested without the appropriate POST
variables).
I couldn't find a more specific or recent bug dealing with this, so re-opening
this one.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 153•19 years ago
|
||
Please file new bugs on new problems. If the problem only happens with bfcache
enabled, please make sure that your bug blocks bug 274784 and cc me.
As for the rest, this bug was about the right source being dealt with but not
being _shown_ correctly due to parsing issues. So it sounds like a totally
separate issue.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
Comment 154•12 years ago
|
||
I have reported the same bug for a view years ago at http://www.mobildiscounter.de
Comment 155•12 years ago
|
||
same bug here at http://www.autoradio-test.org some month ago
(In reply to Jens from comment #154)
> I have reported the same bug for a view years ago at
> http://www.mobildiscounter.de
(In reply to Carry from comment #155)
> same bug here at http://www.autoradio-test.org some month ago
View Source on these pages works for me. If you still see the problem in a fresh build, please file a new bug with detailed steps to reproduce.
You need to log in
before you can comment on or make changes to this bug.
Description
•