Closed Bug 121616 Opened 23 years ago Closed 16 years ago

Page info reports wrong size for compressed pages

Categories

(SeaMonkey :: Page Info, defect)

Hardware: x86
OS: All
Type: defect
Priority: Not set
Severity: minor

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: davh, Assigned: db48x)

References


Details

For compressed pages, the compressed size rather than the actual size is
reported.
Both the compressed size (if any) and the actual size should be reported,
because each is interesting from its own point of view.
Mass-moving open bugs pertaining to Page Info to pmac@netscape.com as QA contact.

To find all bugspam pertaining to this, set your search string to
"BigBlueDestinyIsHere".
QA Contact: sairuh → pmac
Oh, you mean like the size of the image in memory? I think Mozilla converts them
to a 24bpp bitmap whenever they're needed, but I don't know if that information
is actually available to anything other than the layout frames themselves. I'll
have to look into it.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 82059
By "compressed" you mean "content-encoded", right?  If so, reporting the
"actual" size is what we do.  For a content-encoded HTTP transmission, the
encoding is an integral part of the body data.  This means that the version
stored in the cache is compressed, so reporting the "actual" size would involve
decompressing the whole thing...  In addition to which some sites actually use
content-encoding correctly (instead of as a substitute for transfer-encoding). 
And there we should _really_ not decompress.

Or are you talking about something else entirely?
Well, that depends on your definition of actual size. IMO it is the size of the
page I am currently looking at and not the size of a compressed version of it.

And yes, I am talking about Content-Encoding. However, few know the difference
between Content- and Transfer-Encoding. :(

>In addition to which some sites actually use
>content-encoding correctly (instead of as a substitute for transfer-encoding). 

I have yet to see a point in having both a content- and transfer-encoding, since
they basically mean the same thing: the browser has to uncompress the data to view it.

Why is the page stored compressed in the cache when it will never be used
compressed again? Whenever it is needed, it has to be decompressed.

>And there we should _really_ not decompress.

Mozilla already decompresses content-encoded pages...

> I have yet to see a point in having both a content- and transfer-encoding

Transfer-encoded data are to be decoded as soon as they come off the wire.
The encoding is just there for convenience of transmission.  For content-encoded
data, the encoding is an integral part of the data.  E.g., a .tar.gz file is
content-encoded, while HTML that's compressed before sending _should_ be
transfer-encoded.
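
To make the distinction concrete, here's roughly what the headers might look
like in each case (hypothetical values; "Transfer-Encoding: gzip" is allowed by
HTTP/1.1, though "chunked" is far more common in practice):

    Content-encoded (the gzip layer is part of the resource itself):
        Content-Type: application/x-tar
        Content-Encoding: gzip

    Transfer-encoded (the gzip layer exists only for the trip over the wire):
        Content-Type: text/html
        Transfer-Encoding: gzip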

> Well, that depends on your definition of actual size.

Right.  My point is that "actual size" is the "size of the data", which is
uncompressed for transfer-encoding and compressed for content-encoding (where
the encoding is just a property of the data).

> when it will never be used compressed again? 

Who says it won't?  Mozilla will always decompress before viewing, yes.  But for
saving to disk content-encoded data should _not_ be decoded since the encoding
is an integral part of the data.  (Yes, we currently decode in some cases when
saving, but only because sites are not using transfer-encoding.)

The point is, for content-encoded pages there may be legitimate reasons to need
the encoded version.  That's why we store it encoded. 


Component: XP Apps: GUI Features → Page Info
All discussion so far relates to what is meant by "size".  From a user
standpoint, the number of bytes in the window for the HTML file can be
important.  For example, I spend significant time as a user reading amateur
fiction on the Web.  For short chapters, I remain connected while reading.  For
longer chapters, I disconnect after loading the page.  (My ISP and phone company
both appreciate that.)  For really long chapters, I save the page on my hard
drive for reading later (possibly over several days).  This whole process does
not work if Mozilla tells me a 50 KB page is only half that size.  

Please fix this.  
Perhaps we should simply remove this from the UI, since no matter what we do
it's wrong from someone's standpoint...
> Perhaps we should simply remove this from the UI, since no matter what we do
> it's wrong from someone's standpoint... 

That's not a solution, Boris. =)

Is it possible to detect which content has encoding and which does not? If it is
possible, we should indicate content encoding in Page Info, something like this:

Size:   43.39 KB (44429 bytes in Content-Encoded form)
> Is it possible to detect which content has encoding and which does not?

"Sort of".  It's hard.  It may be impossible from the Page Info dialog.
Question:  What is the size of a page?  

Suggested Answer:  It depends on the client platform.  

Take the page in question and treat it as if it were downloaded via FTP onto a
local hard drive.  That is, if it is HTML, take only the HTML file without any
referenced .gif, .jpg, or other files.  If it is ASCII text, that's what it is.
The result approximates in size the file that would be obtained from Mozilla
if you select "Save Link Target As" from a right-click pull-down on a link to
the page, not the file you would get from "File > Save Page As", which roughly
approximates the visible page.

For a UNIX platform, use the size that would show when "ls -l" is executed.  

For a PC, use the size that you would see in Windows Explorer or a Properties
window.  Where a Properties window might show (for example) "19.0 KB (19,456
bytes), 24,576 bytes used", the number to use is 19.0 KB, because that matches
the size shown by Windows Explorer.

I'm not sufficiently familiar with the Mac to offer a suggestion for that
platform.  

In any case, "View > Page Info" should present sizes that are meaningful to users.
> Take the page in question and treat it as if it were downloaded via FTP onto a
> local hard drive.

FTP does not support transfer or content encodings... so there is in fact no way
to figure out what it would look like if that happened.

> The result approximates in size the file that would be obtained from
> Mozilla if you select "Save Link Target As" from a right-click pull-down

The size of that file depends on some mind-reading the "save as" code does as to
what the server "really means" when it says "Content-Encoding: gzip".

What does "meaningful to users" mean?  What information is the user looking for
exactly?  Most users who look at Page Info seem to want to know how long the
page would have taken to download based on the size, or something like that. 
You're not, but you seem to be the exception.

Again, I stand by comment 7.  No matter what we show here it will be "wrong" and
showing the decompressed size is not technically feasible anyway in most cases,
so we should just nix this item.  It's a holdover from the NS4 days, and NS4
does not support content encodings...
Perhaps there should be a small footnote noting that the window can't be
completely accurate; that way it doesn't deceive users.

For example, Slashdot is 12 KB according to Mozilla, but it's really a few times
larger.

Developers would appreciate this note, as checking page size is important.  And
Mozilla is a great developer platform thanks to JavaScript debugging, among
other features (and some great extensions).

Is a small footnote right below the size a possibility?  Just to note something
like "May be inaccurate if page is compressed", or something to that effect?
Couldn't there be a (##kb when uncompressed) addition after the current size
when looking at the page info for an object? Yes, this would require the object
to be decompressed, but surely this happened already to display it, so the
result can be cached?
> but surely this happened already to display it, so the result can be cached?

The result is never even available.  The decompression code uses a streaming
decompression algorithm that decompresses the data a chunk at a time; the total
length is not really ever known anywhere (the parser code just gets a bunch of
calls with "here is another data chunk").  Someone could calculate that length,
add it all up, pass it back out, etc., but doing this in the parser is hard
because of document.write calls and doing it anywhere else is pretty much
impossible.
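
To illustrate the streaming pattern being described (a minimal Python sketch;
the real code is C++, and the function here is invented for illustration):

    import zlib

    def decompressed_length(chunks):
        # Gzip data is decoded one chunk at a time, so the total length
        # is only known after the last chunk has been processed.
        # 16 + MAX_WBITS tells zlib to expect a gzip header.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        total = 0
        for chunk in chunks:  # "here is another data chunk"
            total += len(d.decompress(chunk))
        total += len(d.flush())
        return total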
Perhaps just write "Compressed Page" for Size when this is the case?

That way, only uncompressed pages show the page size.  We can't be wrong if we
don't say anything, right?

Would something like that be feasible?
> but doing this in the parser is hard because of document.write calls

Surely document.write does not call OnDataAvailable, which is the function that
knows about the length of the passed-in data?
I'm actually not sure what document.write does, exactly.  Chances are, you're
right that it does not call OnDataAvailable.  I'm also not keen on adding hooks
to the parser, content sink, and document, all for the sake of little-used Page
Info functionality, of course.
OK, I was confused by this too. Simply saying "(Compressed)" after the size (IF
it *was* compressed) would be enough to remove the confusion.
For bonus points, show the real size as well, but people can always save to disk
as a workaround.
See comment 1 of bug #263393, which cites the same URL as this one does.  A
quick test of View Info shows that the size by Mozilla is from GET while the
actual size is near what is obtained from HEAD.

Perhaps HEAD should be used to obtain the size to fix this bug.

However, bug #160454 requests that Mozilla no longer use HEAD when processing
"Save As" (and, extending to bug #263393, when processing "Save Link Target As").
Care needs to be taken to ensure HEAD is still used where appropriate.
> A quick test of View Info shows that the size by Mozilla is from GET

It's the size of the actual data returned by the server, as reported by the cache.

> Perhaps HEAD should be used to obtain the size to fix this bug.

That would only "fix" it for broken servers like that in bug 263393.  Any time
any headers differ between HEAD and GET, that's a bug in the server.
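
As an aside, that kind of server bug is easy to check for from outside the
browser; a rough Python sketch (host and path are placeholders):

    import http.client

    def compare_head_and_get(host, path="/"):
        # Any header that differs between HEAD and GET is a server bug.
        conn = http.client.HTTPConnection(host)
        conn.request("HEAD", path)
        head_len = conn.getresponse().getheader("Content-Length")
        conn.close()

        conn = http.client.HTTPConnection(host)
        conn.request("GET", path)
        resp = conn.getresponse()
        get_len = resp.getheader("Content-Length")
        body_len = len(resp.read())
        conn.close()

        # All three should agree on a well-behaved server.
        return head_len, get_len, body_len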
Whether the downloaded page is compressed or not, it seems the page info 'size'
is of limited value anyway.
As others have mentioned before, if the page was originally compressed, the
size reported is that of the compressed page (not the uncompressed, rendered
page). However, I've noticed that for pages that are not compressed to start
with, the value is wrong anyway (or is it?). For example, for www.xulplanet.com,
the size is reported as 8637 bytes, but if I save then it's 8663 on disk. So
which is correct?
Additionally, the size reported is for the HTML only, and doesn't include image
sizes or the effects of applying any CSS. The true size of everything that's
rendered (even for an uncompressed page) could be vastly different from what's
reported.
The 'size' field is so vague... at the very least it needs a better label. IMO
something does need to change somewhere.
(In reply to comment #21)
>  For example, for www.xulplanet.com,
> the size is reported as 8637 bytes, but if I save then it's 8663 on disk. So
> which is correct?

you probably saved as "web page, complete", which modifies the page, and is thus
useless for comparing size values.
(In reply to comment #22)
> you probably saved as "web page, complete", which modifies the page, and is thus
> useless for comparing size values.

Yes, that's it exactly (if saving as HTML only, then the sizes do match). But I
always do that ("web page, complete"), as I'm sure many others do, so this
anomaly will be common.
Probably the biggest use for this feature is checking/optimising download
speeds. For the most part seeing the compressed size is pretty useful.

However... are there any modern (say v4+) browsers which don't support
compression? If so, then they are not going to get the compressed page,
therefore it is irrelevant what *my* browser might show me, and you *would* want
to check the uncompressed size to understand what some users would get.
> are there any modern (say v4+) browsers which don't support compression?

NS4 is a v4+ browser.  It doesn't do HTTP/1.1.

Some proxy servers probably only do HTTP/1.0 too.

In general, if you want to find broken or silly clients (or servers), you sure can.
Okay.  I was on the wrong track with my comment 19.  

Please note, however, that page size is useful to me, per my comment 6.  I can
guess a page to be relatively short by the size of the vertical scrollbar
slider.  However, for longer pages, this is not effective; the slider seems
equally small for text pages of 50 KB and 100 KB.

If determining the size of the page in the browser window is not practical, I
suggest using the size of the file obtained from a "Save Page As".  For HTML
files, this would be the size of the file when selecting "Web Page, HTML only".
 I suggest this (1) because graphics (the most likely additional files for "Web
Page, complete") are often gratuitous and thus not saved and (2) because we can
get separate sizing information on all other components via the Media tab of
"View Page Info".  
Product: Browser → Seamonkey
Why is this bug specific to "Mozilla Application Suite" and "Linux"?
It is the same on Firefox and on other OSes, too.
See also bug 271370.
OK, I changed it to all OSes.
However, I don't see what to do about the Product; it really relates to Firefox
AND the Suite, which isn't an option.
(There are a lot of bugs which have this issue...)
OS: Linux → All
*** Bug 299453 has been marked as a duplicate of this bug. ***
Any news on this?
QA Contact: pmac
As per comment 7, page size is not shown in the General tab.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Well, I must have been dreaming, because size /is/ shown in the General tab.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Status: REOPENED → NEW
(In reply to comment #7)

> Perhaps we should simply remove this from the UI, since no matter what we do
> it's wrong from someone's standpoint...
If it's always wrong from someone's viewpoint, then there is no definitive "right" size. WONTFIX.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX