Open Bug 61363 (latemeta)
Opened 24 years ago
Updated 3 years ago
Make sure that chardetng-triggered encoding reload is read from the cache
Categories
(Core :: DOM: HTML Parser, enhancement, P5)
Tracking
NEW
Future
People
(Reporter: pollmann, Unassigned)
Details
(Keywords: helpwanted, testcase, Whiteboard: Please read comment 115.)
Attachments
(1 file)
patch (deleted)
This is a follow-on to bug 27006. We need to come up with a "real" fix for this
problem. Since I'm hoping that we can somehow force the reload to come from the
cache instead of the server, I'm starting out with this assigned to Gagan. I'll
send out an email trying to get a meeting set up so we can work out more details.
->to cache
Assignee: gagan → neeti
Component: Networking → Networking: Cache
QA Contact: tever → gordon
Target Milestone: --- → M1
Eric, what do you need from the cache and/or http?
Target Milestone: mozilla0.9 → mozilla0.9.1
Darin, this looks like a dup of the other <meta> charset bug you're working on.
Assignee: gordon → darin
Comment 6•24 years ago
actually, i'm going to use this bug to track this problem. we need:
1) support for overlapped i/o in the disk cache.
2) ability to background the first load and let it finish on its own.
If we had 1), then I wonder whether, for blocked layout, uninterrupted streaming of
data to the cache would be a better solution than filling up the pipes and
subsequently taking the socket off the select list. This way our network
requests would never pause for layout/other blocking events -- that means we'd be
fast. Maybe we need a PushToBackground (with a better name) on nsIRequest. The
implementation of PushToBackground would simply take all "end" listeners off and
continue to stream data to the cache. So consumers that are currently killing
our first channel would just push it to the background and make new requests for the
same URL. What say?
Comment 8•24 years ago
agreed.. we do need some sort of communication from the parser to http to tell
it to keep going "in the background"... there are two options i see...
1) parser could just eat all the data; then, http would not even need to be made
aware of what's going on.
2) parser could return a special error code from OnDataAvailable that would
instruct HTTP to not call OnDataAvailable anymore, but to just continue
streaming the data into the cache (sketched after this comment)... this error
code could perhaps be NS_BASE_STREAM_CLOSED.
I'm not sure that option 2 would be that much more efficient than option 1...
option 1 would be a lot easier to implement, but option 2 could be used by
any client of http.
gagan: i'm not sure we can completely background the download... as this would
require pushing it to another thread, which would be difficult.
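To make option 2 concrete, here is a minimal standalone sketch. This is a model of the idea, not actual necko code: the listener interface is simplified, and the NS_BASE_STREAM_CLOSED value is a made-up placeholder, not the real error code.

#include <cstddef>
#include <vector>

typedef unsigned int nsresult;
const nsresult NS_OK = 0;
const nsresult NS_BASE_STREAM_CLOSED = 0x80000001u; // placeholder value

struct StreamListener {
  virtual nsresult OnDataAvailable(const char* data, std::size_t count) = 0;
  virtual ~StreamListener() {}
};

// Channel-like producer: the cache entry always receives the full stream;
// once the consumer returns NS_BASE_STREAM_CLOSED it is detached and the
// rest of the load finishes "in the background" into the cache only.
void Pump(StreamListener* listener, const std::vector<char>& network,
          std::vector<char>& cacheEntry) {
  for (std::size_t i = 0; i < network.size(); ++i) {
    cacheEntry.push_back(network[i]);
    if (listener &&
        listener->OnDataAvailable(&network[i], 1) == NS_BASE_STREAM_CLOSED) {
      listener = 0; // stop calling OnDataAvailable; keep streaming to cache
    }
  }
}

A parser-side listener would return NS_BASE_STREAM_CLOSED once it had decided to restart with a new charset, letting the load complete into the cache untouched.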
either way we need overlapped I/O in the cache. Setting that as the first target
and giving this to gordon. This is a serious bug (ibench and double-post).
Assignee: darin → gordon
Keywords: topperf
Comment 10•24 years ago
*** Bug 78018 has been marked as a duplicate of this bug. ***
Comment 11•24 years ago
*** Bug 78494 has been marked as a duplicate of this bug. ***
Updated•24 years ago
Whiteboard: want for mozilla 0.9.1
Comment 12•24 years ago
In order to implement overlapped I/O in the cache, I'll need to finish Disk
Cache Level 2, which includes the necessary stream wrappers.
Depends on: 72507
Keywords: nsenterprise
Comment 14•23 years ago
can't make it to 0.9.2. pushing over...
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Comment 16•23 years ago
Removing nsenterprise nomination. Adding nsBranch.
Keywords: nsenterprise → nsBranch
Comment 17•23 years ago
Gordon/Gagan - This looks like a good one to take. How close are you to resolving
this one? If it can't be finished this week, pls mark it as nsbranch- for this
round.
Comment 18•23 years ago
We are not close on this. It's very doubtful this will be ready to land in the
next month.
Keywords: mozilla1.0
Comment 19•23 years ago
any chance this will make the MachV train?
Comment 20•23 years ago
Darin and I are backing off of supporting overlapped I/O in the cache (which was
the reason I was given this bug). We need to review the severity and potential
fixes, since necko has changed quite a bit since this bug was originally
reported. I'll meet with him and update the bug with our current thoughts.
Comment 21•23 years ago
cc'ing shaver, since i know he has comments on this one.
If you submit a POST form, and the returned HTML has a charset -- as is the case
with a number of e-commerce sites in Canada, where we have accents and things --
then you get the scary "resubmit your data?" dialog, sometimes twice. That
dialog is doubly scary when you're slinging around your credit card with
non-refundable tickets, so I've had to spin up IE for some purchases to keep my
blood pressure down.
I don't understand why we have to go to the network or the cache for this. When
we hit a <meta charset> tag, we just need to go back and fix up attribute values
to match the new character set, and then make sure that future content is
charset-parsed appropriately. I don't think it's ever possible for the charset
to change the structure of the document, because in that case we might not
really have seen <meta> and the whole thing collapses on itself.
"Overlapping I/O" sounds like a win other things (multiple copies of an image on
a page, where the second one is requested while the first one is still coming
in?), to be honest, but I don't think the right fix here involves any I/O driven
by a <meta>, just attribute fixup. And since overlapped I/O seems to be rocket
science, why not let a DOM/parser guy take a swing at it?
Comment 23•23 years ago
agreed... falling back on the cache/necko is just a hack solution at best.
-> parser
shaver: btw, imagelib already gates image requests to avoid multiple hits on the
disk cache / network for an image that appears more than once on a page.
Assignee: gordon → harishd
Component: Networking: Cache → Parser
QA Contact: gordon → moied
Comment 24•23 years ago
Do we understand the situations where the 'meta charset sniffer' is failing --
thus forcing us to reload the document?
our current sniffing code looks at the first buffer of data for a
meta-charset... so, i'm assuming that in situations where we reload, the server
has sent us a 'small' first packet of data...
is this *really* the case, or has our sniffer broken?
-- rick
Comment 25•23 years ago
Can't answer the question about server reloads of POST documents, but for
the GET case of a document, the sniffer is working (i.e., we don't do
the double GET as long as the meta tag is within 2K bytes of the beginning of
the document; otherwise, we do the double GET).
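For illustration, here is a sketch of the kind of first-buffer sniff being described. This is not the actual parser code: the 2K limit mirrors the behavior above, but the naive substring scan stands in for real tokenization, quoting, and attribute handling.

#include <algorithm>
#include <cctype>
#include <string>

// Illustrative only: scan at most the first 2K for "charset=" and return the
// value; returns "" when not found, in which case a later reload may happen.
std::string SniffMetaCharset(const std::string& firstBuffer) {
  std::string head = firstBuffer.substr(0, 2048);
  std::transform(head.begin(), head.end(), head.begin(),
                 [](unsigned char c) { return (char)std::tolower(c); });
  std::string::size_type pos = head.find("charset=");
  if (pos == std::string::npos) return "";
  pos += 8; // skip past "charset="
  std::string::size_type end = pos;
  while (end < head.size() &&
         (std::isalnum((unsigned char)head[end]) || head[end] == '-' ||
          head[end] == '_')) {
    ++end;
  }
  return head.substr(pos, end - pos);
}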
Comment 26•23 years ago
rpotts: what jrgm said... otherwise, we'd have seen a huge regression on ibench
times.
And duplicate stylesheets and script transclusions from frames in framesets?
Not to hijack this bug with netwerk talk, now that we've punted it back
(correctly, IMO) to the parser guys -- hi, Harish! -- but it seems like this is
a correctness issue for more than just <meta> tags. I don't see another bug in
which to beat this dying horse, but I'll be more than happy to take the
discussion to one that someone finds or files.
Comment 28•23 years ago
shaver: duplicate css and js loads are serialized. hopefully, this is not too
costly in practice.
Comment 29•23 years ago
>Do we understand the situations where the 'meta charset sniffer' is failing --
>thus forcing us to reload the document?
>
>our current sniffing code looks at the first buffer of data for a
>meta-charset... so, i'm assuming that in situations where we reload, the server
>has sent us a 'small' first packet of data...
>
>is this *really* the case, or has our sniffer broken?
First of all, the sniffing code was originally designed as an "imperfect" performance
tuning for "most of the cases" rather than a perfect, all-cases general solution.
You are right, it only looks at the first block. And it is possible in theory for
the meta tag to appear thousands of bytes later (I have seen large js in front of
it before).
Second, even if the meta sniffing code works correctly, we still need the reload
mechanism to work correctly for the charset-detector reload (which works by
examining bytes and doing frequency analysis).
Turn "Character Set: Auto-Detect" from "(Off)" to "All" and visit some
non-Latin-1 text file and you will see the reload kick in.
add shanjian
Why do we need to reload for the charset sniffer? Can't it just look at text
runs and attribute values to do frequency analysis, and then perform the in-place
switchover described above? The document structure had better not change due to
a charset shift, or there's nothing we can do without an explicit and correct
charset value in the headers.
Comment 31•23 years ago
I expect a reload hack will always be "easier" than fixing up a bunch of
strings in content node members. Cc'ing jst. But easier isn't always better
(nor is worse, always). If we can avoid reloads, let's do it.
ftang, is it possible shaver's canadian e-commerce website POST data reloads are
due to universalchardet and not a sniffer failure?
/be
A great test case is this: go to the URL I just added, and click:
[Click Here to Generate Currency Table]
You'll be treated to _four_ POST data alerts, two of each type.
(For bonus marks, just _try_ to use the back button or alt-left to go back in
history.)
Comment 33•23 years ago
Correct me if I'm wrong:
There is code in nsObserverBase::NotifyWebShell() to prevent reload for POST data.
Comment 34•23 years ago
If meta charset sniffing fails then we fall back on the tag-observer mechanism
to |reload| the document with a new charset. However, the code in nsObserverBase
would make sure that we don't reload POST data. Therefore, we should never (I
think) encounter the double-submit problem. The drawback, however, is that the
document wouldn't get the requested charset.
Comment 35•23 years ago
I don't think it's even possible to always correctly do the fixup in the content
nodes after we realize what the charset really should be. The bad conversion
that already happened could've actually lost information if there were
characters in the stream that were not convertible to whatever charset we
converted to, right?
Comment 36•23 years ago
there has to be some way to do this without going back to the cache/network for
the data. remember: the cache isn't guaranteed to be present. we need a
solution for this bug that doesn't involve going back to netlib.
Comment 37•23 years ago
jst: Is there a way to throw away the content nodes, that got generated before
encountering a META tag with charset, without reloading the document?
jst: aren't we storing text and attributes as UCS2 -- unless they were
all-ASCII, in which case we can trivially reinflate. From either of those
conditions, I think we should be able to reconstruct the original (on-wire) text
runs, if we haven't thrown away the original charset info, and then re-decode
with the new character set.
I thought, and that paragraph is very much predicated on this belief, that we
only converted to "native" charsets at the borders of the application: file
names, font/glyph work for rendering, etc. If that's not the case, I will just
go throw myself into traffic now.
Comment 39•23 years ago
If the conversion from the input stream to unicode (using the default charset)
and back to what we had in the input stream is reliably doable, then yes, we
could convert things back and re-convert once we know the correct charset. But
I'm not sure that's doable with our i18n converter code... ftang, thoughts?
Comment 40•23 years ago
Harish, yes, we can reset the document and start over if we have the data to
start over from.
Comment 41•23 years ago
Strange... I don't see any reposts on the www.xe.net URL when I click on
the [Click Here to Generate Currency Table] button.
I'm using Mozilla0.9.8 on Linux.
But when I choose to View-source on the generated table it is blank.
Updated•23 years ago
I'm underwater with 0.9.9 reviews and approvals, but I wanted to toss this up
for discussion. If people agree that it's a viable path, there are a bunch of
improvements that can be made as well: text nodes already know if they're
all-ASCII, for example, though I don't know how to ask them.
Big issues that I need assurance on:
- all parts of the DOM can handle having their cdata/text/attribute string
values set, including <script>, and will DTRT. (I fear re-running scripts!)
- the entire concept of re-encoding in the old charset and then decoding with
the new one is viable (see the sketch after this comment). (Ignore, for now,
the XXX comment about the new buffer having more characters.)
Be gentle, but be thorough!
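To make the proposed fixup concrete, here is a sketch under the assumption that the encodings involved are single-byte and can be modeled as 256-entry tables. All names here are hypothetical; real Gecko converters are considerably more involved.

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::vector<char32_t> UniString;

// Hypothetical single-byte codec modeled as a 256-entry decode table.
struct SingleByteCodec {
  char32_t toUnicode[256];
  std::map<char32_t, std::uint8_t> MakeEncoder() const {
    std::map<char32_t, std::uint8_t> enc;
    for (int b = 0; b < 256; ++b) enc[toUnicode[b]] = (std::uint8_t)b;
    return enc;
  }
};

// Fix up one already-decoded string: re-encode with the old codec to recover
// the on-wire bytes, then re-decode those bytes with the new codec. A tree
// walk would apply this to every text node and attribute value. Note that
// enc.at() throws if a character has no source byte -- exactly the lossy
// round-trip worry raised in the following comments.
UniString Redecode(const UniString& wrong, const SingleByteCodec& oldCodec,
                   const SingleByteCodec& newCodec) {
  std::map<char32_t, std::uint8_t> enc = oldCodec.MakeEncoder();
  UniString fixed;
  for (std::size_t i = 0; i < wrong.size(); ++i) {
    std::uint8_t wireByte = enc.at(wrong[i]);      // recover original byte
    fixed.push_back(newCodec.toUnicode[wireByte]); // decode with new charset
  }
  return fixed;
}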
Updated•23 years ago
Keywords: mozilla1.0+
Updated•23 years ago
Keywords: mozilla1.0
Comment 43•23 years ago
I can't say I had a close look or anything, but I really like this approach; it
would be light years ahead of what we have now (assuming it actually works, that
is :-)
Comment 44•23 years ago
We need Ftang's input too.
Comment 45•23 years ago
The proposal doesn't cover encoding the unparsed data with the correct charset
(the new charset).
Btw, if the approach works, can we remove the META tag sniffing code?
Yes, you're right: I forgot to mention that we need to update the parser's
notion of current charset.
smontagu has made me nervous about roundtripping illegal values. I'm hoping
he'll say more here.
Mike
Comment 47•23 years ago
*** Bug 129074 has been marked as a duplicate of this bug. ***
Comment 48•23 years ago
Shaver is working on this. Mike, should I assign this bug to you?
Comment 49•23 years ago
I wish I had paid attention to this bug earlier. I suggested the same approach when
ftang first explained mozilla's doc charset handling. However, I have to say
that the final patch might be more complicated than shaver's patch.
Is it possible to convert text back from unicode to the current character encoding
and reconvert to unicode with the new encoding? I want to share some of my
understanding. Theoretically, the answer is NO. It is true (or practically
true) that unicode covers almost all the native charsets we can
encounter today. But not all code points in a non-unicode encoding are valid.
For example, in iso-8859-1, code point 0x81 is not defined. If the incoming data
stream is encoded in windows-1251, 0x81 is a valid code point. Suppose somehow we use
iso-8859-1 to interpret the text data; code point 0x81 will be converted to
unicode U+FFFD. When we later try to convert this code point back, there is no
way to figure out where it came from. I believe this is the only scenario we
need to worry about. (It is possible that for some encodings, more than one code
point maps to a single unicode code point. If that is the case, it is a bug and
we can always fix it in the unicode conversion module.)
I could not figure out a perfect solution to this problem at this time, but I
would like to suggest 2 approaches for further discussion.
1) Could we always buffer the current page? Probably inside the parser?
2) We could use a series of unassigned code points in unicode for unassigned code
points and change our charset mapping tables. The aim is to make charset
conversion round-trip for any character. For a single-byte encoding, we have
at most 256 code points, and most of them should be assigned. For a multi-byte
encoding, we can interpret an illegal byte sequence byte by byte. This practice
must be kept as internal as possible. This should make mike's approach
feasible.
(We can't ignore the existence of illegal code points on many websites. In many
cases, an illegal code point usually suggests a wrong encoding. Interrupting the
process when meeting an invalid code point does not seem like a good idea.)
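A tiny standalone model of the 0x81 scenario above. The decoder here is an assumption: like the converter behavior described in this comment, it maps the byte that iso-8859-1 leaves unassigned to U+FFFD, which is where the information is lost.

#include <iostream>

// Model decoder: maps the unassigned byte 0x81 to U+FFFD, as described above.
char32_t DecodeByte(unsigned char b) {
  return (b == 0x81) ? 0xFFFD : (char32_t)b;
}

// Trying to go back: U+FFFD carries no memory of the source byte.
bool EncodeBack(char32_t c, unsigned char* out) {
  if (c == 0xFFFD || c > 0xFF) return false; // information already lost
  *out = (unsigned char)c;
  return true;
}

int main() {
  unsigned char wire = 0x81;     // a perfectly valid windows-1251 byte
  char32_t u = DecodeByte(wire); // misinterpreted as iso-8859-1: U+FFFD
  unsigned char back;
  if (!EncodeBack(u, &back)) {
    std::cout << "0x81 -> U+FFFD -> ? : the round trip is lossy\n";
  }
  return 0;
}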
Comment 50•23 years ago
This happens not only with Russian. Set any autodetection and you'll see it.
Comment 52•23 years ago
*** Bug 116217 has been marked as a duplicate of this bug. ***
Comment 53•23 years ago
updating summary
Summary: <meta> with charset should reload from cache, not server → <meta> with charset should NOT cause reload
Comment 54•23 years ago
*** Bug 131524 has been marked as a duplicate of this bug. ***
Comment 55•23 years ago
*** Bug 131966 has been marked as a duplicate of this bug. ***
Comment 56•23 years ago
Shaver: What's the status on this? Can this be done in the 1.0 time frame? If
not, let's move it to a more realistic milestone.
Comment 57•23 years ago
Looks like this is definitely not going to make the m1.0 train (Shaver?).
Giving the bug to Shaver so that he can target it to a more realistic milestone.
Assignee: harishd → shaver
Comment 58•23 years ago
*** Bug 135852 has been marked as a duplicate of this bug. ***
Comment 59•23 years ago
*** Bug 129196 has been marked as a duplicate of this bug. ***
Comment 60•23 years ago
Attempt to reduce dupes by adding "twice" and "two"
Summary: <meta> with charset should NOT cause reload → <meta> with charset should NOT cause reload (loads twice/two times)
Comment 61•23 years ago
*** Bug 117647 has been marked as a duplicate of this bug. ***
Comment 62•23 years ago
*** Bug 139659 has been marked as a duplicate of this bug. ***
Comment 63•23 years ago
*** Bug 102407 has been marked as a duplicate of this bug. ***
Comment 64•23 years ago
Adding topembed to this one since Bug 102407 was marked a duplicate of this. Many
sites from the evangelism effort demonstrate the POSTDATA popup problem. See
more info in Bug 102407.
Adding jaimejr and roger.
Keywords: topembed
Comment 65•23 years ago
topembed+. carrying topembed+ over from Bug 102407.
Updated•23 years ago
Comment 66•23 years ago
Seems like a few customers are interested in this one getting fixed soon. What
are the chances we could have a fix in the next week?
Comment 67•23 years ago
Take note of bug 81253. It definitely wasn't a complete fix for this bug, but it
dealt with the 90% case. Specifically, we do not reload if the META tag is in
the first buffer delivered by the server. Can someone confirm that the new bugs
are cases where the META is not in the first buffer? Or did the code change from
81253 just rot?
Comment 68•23 years ago
> Or did the code change from 81253 just rot?
It ain't rotten.
Comment 69•23 years ago
As vidur notes, for the most common case, if the document returned by GET or
POST has a "<meta http-equiv='Content-Type' content='text/html; charset=...'>"
within the first ~2k returned (and not beyond that point), then we do not
re-request the document.
The other bugs that have been marked as recent dupes involve charset
auto-detection and/or more elaborate form submission scenarios.
Comment 70•23 years ago
We may have to choose not to fix this in the 1.0 time frame because of the complexity
and risk. But we have to fix it sooner or later. It is just unacceptable for
websites that lack a meta charset but involve form submission.
Yeah, the roundtripping of illegal values makes this turn into something like
rocket science. I haven't had good success getting i18n brains on this one, no
doubt because they're swamped with 1.0/nsbeta issues as well.
Let's reconvene for 1.1alpha.
Status: NEW → ASSIGNED
Target Milestone: mozilla1.0 → mozilla1.1alpha
Comment 72•23 years ago
It's important to remember that the patch to
http://bugzilla.mozilla.org/show_bug.cgi?id=81253 looks for the META-charset in
the first buffer of data. This is at most ~2-4k; however, it is whatever the
network hands out to the parser... It 'could' be much less...
Perhaps, some of the remaining problems are due to servers which return a much
smaller block of data in the first response buffer...
-- rick
Comment 73•23 years ago
rick: yup that would also cause problems, but i think a large part of the
problem has to do with charset detection. say there is no meta tag... if we
don't know the charset of the document, and we try to sniff out the charset,
then there'll always be a charset reload. that seems like the killer here IMO.
it seems like we hit this problem *a lot* when "auto-detect" is enabled.
Comment 74•23 years ago
Sorry for the spam, but is there any chance of fixing this?
It's very annoying when using character set autodetection, and Russians etc. must
use this feature. I have heard many questions about this problem in Moz 1.0 PRx and
Netscape 7.0 PR1...
Updated•22 years ago
Whiteboard: [ADT2] → [ADT2 RTM]
Comment 75•22 years ago
A short term solution that we're considering is:
1) The default charset for a page should be set to the charset of the referrer
(both in the link and the form submission case). This is dealt with by bug 143579.
2) Auto-detection should not happen when there's POST data associated with a page.
Some pages may not be rendered correctly, but this solution should deal with the
common case. Reassigning this bug to Shanjian.
Comment 76•22 years ago
I am going to handle this problem in 102407 using the above approach, and leave this
bug open for the future.
Comment 77•22 years ago
Jaime, you might want to remove some keywords in this bug.
Comment 78•22 years ago
thanks shanjian!
removing nsbeta1+/[adt2 RTM], and strongly suggesting drivers remove mozilla1.0+
and EDT remove topembed+, as the short-term solution (safer, saner) will be
addressed in bug 102407, relegating this issue to WFM or just edge cases.
Keywords: nsbeta1+
Updated•22 years ago
Whiteboard: [ADT2 RTM]
Comment 79•22 years ago
Just porting my comment from bug 102407:
Why can't you keep loading the document to the end even though the meta charset
says it's in another charset and, after the document has finished, reload the
document from the cache in the same way that viewing source (finally!) works?
The performance could degrade, but at least Mozilla would be doing the right
thing -- and for big files, reloading the full thing from the cache would be
faster than loading from the server anyway. Asynchronous loading to the cache
would be cool, but it's needed for a feature that isn't used that much.
Performance can be improved later if *really* seen as important, but how often
does the charset change between page loads anyway? 9 times out of 10, I've seen
this bug because automatic charset detection has detected the charset
incorrectly and reloads the document even though it should be doing nothing.
I put up a little test at http://www.cc.jyu.fi/~mira/moz/moztest.php which uses
cookies to save the last 7 page loading times and changes the charset every now
and then, and sends the meta charset after 2K. Automatic reloading shows up as
sub-second reload times and flashing in the browser view.
Comment 81•22 years ago
We should look at fixing this one for the next release, because it is a
performance issue.
Comment 83•22 years ago
*** Bug 158331 has been marked as a duplicate of this bug. ***
Comment 84•22 years ago
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity. Only changing open bugs to
minimize unnecessary spam. Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: major → critical
Comment 85•22 years ago
*** Bug 88701 has been marked as a duplicate of this bug. ***
Updated•21 years ago
Summary: <meta> with charset should NOT cause reload (loads twice/two times) → <meta> with charset and autodetection should NOT cause reload (loads twice/two times)
Comment 87•21 years ago
*** Bug 171425 has been marked as a duplicate of this bug. ***
Comment 88•21 years ago
*** Bug 77702 has been marked as a duplicate of this bug. ***
Comment 89•21 years ago
*** Bug 137936 has been marked as a duplicate of this bug. ***
Comment 90•21 years ago
Could we update the target milestone for this 3 year old bug? I think we missed
1.2 ;-)
Updated•21 years ago
Assignee: shanjian → parser
Status: ASSIGNED → NEW
Priority: P2 → P1
Updated•21 years ago
Target Milestone: mozilla1.2beta → Future
Comment 91•21 years ago
*** Bug 235160 has been marked as a duplicate of this bug. ***
Comment 92•20 years ago
*** Bug 248610 has been marked as a duplicate of this bug. ***
Comment 93•20 years ago
*** Bug 287569 has been marked as a duplicate of this bug. ***
Comment 94•19 years ago
Status report? This bug has been marked "Severity: Critical" and "Priority: 1",
has the keyword "dataloss", and still there hasn't been even a status update in
the last 3+ years?
Can somebody comment on how hard it would be to implement my suggestion in
comment #79? I haven't hacked on Mozilla's C++ source, so I have no idea.
Here's the suggested algorithm again, reworded:
1. In case the meta charset (or any other heuristic) tells Mozilla that it's
using the incorrect charset, raise a flag that the document is displayed with an
incorrect character set.
2. Regardless of this problem, keep going until the document has been fully
transferred so that Mozilla has a full copy of it in the cache.
3. Reload the page from the cache with the correct charset. (I'm hoping that the
cache has a *binary* copy of the transferred data, not something that has gone
through the parser and is therefore hosed anyway.) If the View Source feature
can work without reloading the POSTed page, then it should be possible to reload
from the cache, too.
Comment 95•19 years ago
thank you for volunteering
Updated•19 years ago
Assignee: mira → mrbkap
Priority: P3 → --
QA Contact: moied → parser
Target Milestone: Future → ---
Updated•19 years ago
Priority: -- → P3
Target Milestone: --- → mozilla1.9alpha
Comment 96•19 years ago
I can't seem to reproduce this on the site in the URL. Can someone please update the URL with a testcase that shows this?
Target Milestone: mozilla1.9alpha → Future
Comment 97•19 years ago
(In reply to comment #96)
> I can't seem to reproduce this on the site in the URL. Can someone please
> update the URL with a testcase that shows this?
I was about to change the url from http://www.xe.net/ict/ to one I mentioned in comment #79 (http://www.cc.jyu.fi/~mira/moz/moztest.php) but I wasn't allowed to. The test case changes between iso-8859-1 and iso-8859-15 every second. Hit "Reload page via GET" link a couple of times (wait a few seconds between tries) to see the problem. The test case uses cookies for timing the requests from a single browser. You should be able to see the euro sign when the page text says "iso-8859-15" and there should be a generic currency sign when page text says "iso-8859-1". With GET this is true (the page loads twice if there's a problem) whereas with POST you get incorrect rendering. I have View - Character Encoding - Auto-Detect - (Off) set in case that matters.
I still see the problem with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20051108 Firefox/1.6a1.
A workaround for this bug is to include the meta charset declaration in the first 2048 bytes of a file.
Comment 98•18 years ago
hi,
i can confirm the problem for the new testcase, testing
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3 ID:2007030919
and
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a2pre ID:2007041704 [cairo]
Comment 100•16 years ago
I'm still seeing this problem on Firefox 2.0.0.14 ...
I use a page with
<META HTTP-EQUIV="Content-Language" CONTENT="ro">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
and all pages that contain this are requested twice from the server the first time they are requested. If they have already been requested during the current session, the server receives only one request.
This error is a big problem for me, since on generated pages, data is processed twice and I get incorrect data in the database because of this...
The only workaround so far was to
Comment 101•16 years ago
bug 463175 is reporting the same reload-twice problem. Always reproducible.
This bug still exists in 3.1.10.
This bug is serious, causes dataloss, and makes modalDialog stop working.
It tells me the default behaviour does not support "extended" charsets, which it really should.
If the page is somehow faulty, raise a flag and inform the user. This reload behaviour seems like a fix with good intentions, but it is not a good idea for any page that is dynamic, submits data back, or is used in a RIA.
Updated•16 years ago
Flags: blocking1.9.2?
Comment 103•16 years ago
This bug needs a test case. The test case should be on the web. That way we can see how other browsers interact with the same web page. The test case should clearly show how data loss occurs. It should also show how the modal dialog stops working. Any other defects should be clearly shown. There should be "Expected Behavior" and "Actual Behavior." The test case should be entered into the URL field of this bug report.
Comment 104•16 years ago
Check bug 463175. I made a test page where you can see the load-twice behaviour. The test page is in the URL field for that bug. That's where I thought it should go; I am new to this forum, still trying to get my head around how it works...
Here is that page:
http://beta.backbone.se/tools/test/loadtwice/loadtwice.html
Simple, reproducible, happens every time, no swapping of META tags needed as suggested above. I do not know why the URL for this bug points to xe.com.
Here is how showModalDialog stops working: When the modal dialog is first opened, the arguments are fine. But since the double-load behaviour then loads the page once more, the arguments are lost (returns null). This is actually how we first found this bug. It took a long time to backtrack that this was actually happening to ALL non-cached pages in FF without a charset tag. Difficult to believe, but there it is.
Data loss can occur in all sorts of ways. See the thread above.
Since I am a bit taken aback by this, I would like to emphasize three side effects that might speed things up on your end (I guess, if the right people read it). I take the chance of barking up the wrong tree, and if I do, please accept my apologies.
Three serious side effects I can think of:
1. Developers need to include the correct charset tag in ALL pages.
This is actually something we should have done in the first place. But for a big site or system, this is months of hard work for any developer.
Ironically this includes all(?) pages in this forum ;-)
2. Performance. New pages load twice; dynamic pages using ids in urls load twice.
3. Browser statistics are wrong. A large chunk of the FF penetration figures should be taken out. This might be the most serious.
Since I like irony, I will make a little experiment by writing the letter ä in this comment. Voila, this page is hit twice the first time it is requested by FF! Same as the page with bug 463175.
For the developer that cannot wait for this bug to be fixed, here is what we had to do (fixes side effect 1 above):
1. Convert all pages to UTF-8
2. Pray that the developer tool or HTML editor you use has support for UTF-8
3. Place the correct charset meta tag in the header of all pages
4. If your webserver is IIS, all parameters must be URI-encoded or you lose all extended characters
5. Rewrite your cookie routines to support extended characters as well
This took us four months, and still not everything is in place :-(
Hope this helps.
Good luck!
Comment 105•16 years ago
I suggest renaming this bug to:
<meta> with charset and autodetection OR charset missing, should NOT cause reload (loads twice/two times)
Comment 106•16 years ago
Testcase:
1. Set Character Encoding to Auto Detect.
2. Go to URL: http://beta.backbone.se/tools/test/loadtwice/loadtwice.html
Expected Results: Page loads once
Actual Results: Page loads twice
Flags: wanted1.9.2?
Keywords: testcase
Summary: <meta> with charset and autodetection should NOT cause reload (loads twice/two times) → <meta> tag with charset and autodetection OR charset missing, should NOT cause reload (loads twice/two times)
Comment 107•15 years ago
Unfortunately I don't think we can fix this for 1.9.2 as this is far from a trivial problem to fix, and we don't have anyone right now with the time to spend on this.
However, if people feel there's value in making the effects of this when it comes to showModalDialog() go away (i.e. if we preserve dialog arguments across reloads), I think we could do *that* for 1.9.2.
I'd like to hear what people think about doing the showModalDialog() part of this only for 1.9.2. I know it sucks to not fix the underlying problem here now, but as I said, it's not very easy to fix in our code, and I'd rather see us fix this for the HTML5 parser than worrying about it in the current parser code. Leaving this nominated for now until I hear some thoughts here.
Updated•15 years ago
Assignee: mrbkap → nobody
The HTML5 spec prescribes reparsing when the <meta> is so far from the start of the file that the prescan doesn't find it.
As for chardet, I've made the HTML5 parser only run chardet (if enabled) over the prescan buffer, so chardet-related reparses should be eliminated. However, the HTML5 parser needs more testing in CJK and Cyrillic locales to assess whether the setup is good enough.
Updated•15 years ago
Flags: wanted1.9.2?
Flags: wanted1.9.2-
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Information for Web authors seeing this problem and finding this report here in Bugzilla:
This problem can be 100% avoided by the Web page author by using HTML correctly as required by the HTML specification. There are three different solutions, any one of which can be used:
1) Configure your server to declare the character encoding in the Content-Type HTTP header. For example, if your HTML document is encoded as UTF-8 (the preferred encoding for Web pages), make your servers send the HTTP header
Content-Type: text/html; charset=utf-8
instead of
Content-Type: text/html
This solution works with any character encoding supported by Firefox.
OR
2) Make sure that you declare the character encoding of your HTML document using a "meta" element within the first 1024 bytes of your document. That is, if you are using UTF-8 (as you should considering that UTF-8 is the preferred encoding for Web pages), start your document with
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<title>…
and don't put comments, scripts or other stuff before <meta charset=utf-8>.
This solution works with any character encoding supported by Firefox except UTF-16 encodings, but UTF-16 should not be used for interchange anyway.
OR
3) Start your document with a BOM (byte order mark). If you're using UTF-8, make the first three bytes of your file be 0xEF, 0xBB, 0xBF. You probably should not use this method unless you're sure that the software you are using won't accidentally delete these three bytes.
This solution works only with UTF-8 and UTF-16, but UTF-16 should not be used for interchange anyway, which is why I did not give the magic bytes for UTF-16.
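For example, a page generator could emit the BOM explicitly. A minimal sketch; the output file name is hypothetical:

#include <cstdio>

int main() {
  std::FILE* f = std::fopen("page.html", "wb"); // hypothetical output file
  if (!f) return 1;
  const unsigned char bom[3] = {0xEF, 0xBB, 0xBF}; // UTF-8 BOM, first 3 bytes
  std::fwrite(bom, 1, sizeof bom, f);
  std::fputs("<!DOCTYPE html>\n<html>\n<head>\n<title>...</title>\n", f);
  std::fclose(f);
  return 0;
}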
- -
As for fixing this:
This bug is WONTFIX for practical purposes, since fixing this would take a substantial amount of work for very little gain. Anyone capable of fixing this will probably always have higher-priority things to work on.
But if this was to be fixed, the first step would be figuring out what WebKit and IE do. Without actually figuring that out, here are a couple of ideas how this could be fixed:
1) In the case of a late meta, if we want to continue to honor late metas (which isn't a given), we should keep the bytes that the HTML parser has already consumed and keep consuming the network stream into that buffer while causing the docshell to do a renavigation without hitting the network again but instead restarting the parser with the buffer mentioned earlier in this sentence.
2) In the case of chardet, it might be theoretically possible to replace chardet with a multi-encoding decoder with an internal buffer. The decoder would work like this: As long as the incoming bytes are ASCII-only, the decoder would immediately emit the corresponding Basic Latin characters. Upon seeing a non-ASCII byte, the decoder would accumulate bytes into its internal buffer until it can commit to a guess about their encoding. Upon committing to the guess, the decoder would emit its internal buffer decoded according to the guessed encoding. Thereafter, the decoder would act just like a normal decoder for that encoding. (A sketch follows at the end of this comment.)
But it would be a bad idea to pursue these ideas without first carefully finding out what WebKit and IE do. I hear that WebKit gets away with much less complexity in this area compared to what Gecko implements.
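A sketch of what idea 2 might look like, speculative per the caveat above: TryCommit() is a stub for the actual guessing heuristic, and DecodeCommitted() is an identity placeholder for the post-commit decoder.

#include <cstddef>
#include <vector>

// Pass ASCII through, buffer from the first non-ASCII byte, commit to a
// guess, then emit the buffer and continue as a normal decoder.
struct GuessingDecoder {
  bool committed;
  std::vector<unsigned char> pending;

  GuessingDecoder() : committed(false) {}

  // Stub: a real implementation would run frequency analysis over 'pending'
  // and refuse to commit until it is confident enough.
  bool TryCommit() const { return pending.size() >= 1024; }

  // Stub for the post-commit decoder; identity mapping as a placeholder.
  char32_t DecodeCommitted(unsigned char b) const { return b; }

  std::vector<char32_t> Feed(const unsigned char* data, std::size_t len) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < len; ++i) {
      unsigned char b = data[i];
      if (committed) {
        out.push_back(DecodeCommitted(b));
      } else if (b < 0x80 && pending.empty()) {
        out.push_back(b); // ASCII-only so far: emit immediately
      } else {
        pending.push_back(b); // hold bytes (in order) until we can guess
        if (TryCommit()) {
          committed = true;
          for (std::size_t j = 0; j < pending.size(); ++j)
            out.push_back(DecodeCommitted(pending[j]));
          pending.clear();
        }
      }
    }
    return out;
  }
};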
Alias: latemeta
Severity: major → enhancement
Priority: P3 → --
Summary: <meta> tag with charset and autodetection OR charset missing, should NOT cause reload (loads twice/two times) → Late charset <meta> or autodetection (chardet) should NOT cause reload (loads twice/two times)
Whiteboard: [adt2] → Please read comment 110.
Comment 112•6 years ago
I can't seem to reproduce this on the site in the URL. Can someone please update the URL with a testcase that shows this? There is code in nsObserverBase::NotifyWebShell() to prevent reload for POST data.
A great test case is this: go to the URL I just added, and click to get currency table
https://www.timehubzone.com/currencies
Flags: needinfo?(datehubzone)
> There is code in nsObserverBase::NotifyWebShell() to prevent reload for a POST data.
It should still be possible to reproduce this in the case of GET requests.
> A great test case is this: go to the URL I just added, and click to get currency table
> https://www.timehubzone.com/currencies
That site declares the encoding both on the HTTP layer and in early <meta>, so it shouldn't be possible to see this reload case there.
In general, I don't expect us to add complexity to cater for this long-tail legacy issue. If we want to never reload, we should revert bug 620106 and then stop honoring late <meta>, too.
Comment 114•3 years ago
Moving open bugs with topperf keyword to triage queue so they can be reassessed for performance priority.
Performance Impact: --- → ?
Keywords: topperf
The late meta aspect was fixed in bug 1701828.
The page can still be reloaded in the case where it doesn't declare an encoding and the detector's guess at the end of the stream differs from the guess made at </head>. The telemetry for how often this happened expired, and I've been too busy with other things to reinstate telemetry in this area.
In any case:
- Any page can avoid this perf problem by declaring the encoding, and pages that people browse the most declare their encoding.
- Even before bug 1701828, which extended the number of bytes that are considered for the initial guess, the detector-triggered reload case affected less than 1.5% of unlabeled page loads globally.
I think it's not useful to try to eliminate the remaining reload case, since it's better for pages to be readable than performantly unreadable.
I'm leaving this bug open for checking that the reload comes from the cache, though.
Severity: normal → S4
Flags: needinfo?(datehubzone)
Keywords: dataloss
Priority: -- → P5
Summary: Late charset <meta> or autodetection (chardet) should NOT cause reload (loads twice/two times) → Make sure that chardetng-triggered encoding reload is read from the cache
Whiteboard: Please read comment 110. → Please read comment 115.
Updated•3 years ago
Performance Impact: ? → ---
Updated•3 years ago
Restrict Comments: true